
For context, I'm trying to port some Perl code to Python from https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/normalize-punctuation.perl#L87, and it contains this regex substitution:

s/(\d) (\d)/$1.$2/g;

If I run this substitution on the input text 123 45, it returns the string with the space replaced by a dot. As a sanity check, I've tried it on the command line too:

echo "123 45" | perl -pe 's/(\d) (\d)/$1.$2/g;' 

[out]:

123.45

The same happens when I convert the regex to Python:

>>> import re
>>> r, s = r'(\d) (\d)', r'\g<1>.\g<2>'  # raw string so \g is not treated as a string escape
>>> print(re.sub(r, s, '123 45'))
123.45
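
To make the Python side of the comparison explicit, here is a minimal sketch of the two replacement spellings I know of. The function names are made up for illustration; only `re.compile` and `Pattern.sub` come from the standard library.

```python
import re

# Same pattern as in the Perl one-liner: digit, one ASCII space, digit.
DIGIT_SPACE_DIGIT = re.compile(r'(\d) (\d)')


def join_with_g_refs(text):
    """Replace 'digit space digit' using the explicit \\g<N> group syntax."""
    return DIGIT_SPACE_DIGIT.sub(r'\g<1>.\g<2>', text)


def join_with_short_refs(text):
    """Same substitution using the shorter \\N backreference syntax."""
    return DIGIT_SPACE_DIGIT.sub(r'\1.\2', text)


if __name__ == '__main__':
    print(join_with_g_refs('123 45'))
    print(join_with_short_refs('123 45'))
```

Both functions return the same result for the same input, so the choice between `\g<1>` and `\1` should not be what differs from Perl's `$1` here.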

But when I run the actual Moses script, the string comes back unchanged:

$ wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/normalize-punctuation.perl
--2019-03-19 12:33:09--  https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/normalize-punctuation.perl
Resolving raw.githubusercontent.com... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 905 [text/plain]
Saving to: 'normalize-punctuation.perl'

normalize-punctuation.perl    100%[================================================>]     905  --.-KB/s    in 0s      

2019-03-19 12:33:09 (8.72 MB/s) - 'normalize-punctuation.perl' saved [1912]

$ echo "123 45" > foobar

$ perl normalize-punctuation.perl < foobar
123 45

The same is true when I print the string before and after the regex inside the Moses code, i.e.

if ($language eq "de" || $language eq "es" || $language eq "cz" || $language eq "cs" || $language eq "fr") {
    s/(\d) (\d)/$1,$2/g;
    }
else {
    print $_;
    s/(\d) (\d)/$1.$2/g;
    print $_;
    }

[out]:

123 45
123 45
123 45

We see that before and after the regex, there's no change in the string.
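
One thing that may be worth checking (this is an assumption on my part, not something the output above confirms) is whether the whitespace inside the pattern in the script is really the plain ASCII space I typed on the command line. A rough sketch of such a check, with a hypothetical helper name:

```python
import unicodedata


def non_ascii_whitespace(text):
    """Return (index, code point, Unicode name) for each whitespace
    character in `text` that is not plain ASCII."""
    found = []
    for index, ch in enumerate(text):
        if ch.isspace() and ord(ch) > 127:
            name = unicodedata.name(ch, 'UNKNOWN')
            found.append((index, f'U+{ord(ch):04X}', name))
    return found


if __name__ == '__main__':
    # Paste the pattern exactly as it appears in the script in place of
    # this string; the one below uses an ordinary space.
    print(non_ascii_whitespace(r's/(\d) (\d)/$1.$2/g;'))
```

An empty list means the pattern contains only ASCII whitespace; any entry would mean the pattern matches a different character than the space in my test input.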

My questions are:

  • Is Python's \g<1>.\g<2> replacement equivalent to Perl's $1.$2?
  • Why didn't the Perl regex add the full stop . between the two digit groups in Moses?
  • How can I replicate the behavior of the Perl regex in Moses with a Python regex?
  • How can I replicate the Python behavior with the Perl regex in Moses?