:)入力ファイル(現在は英語)の音訳を元の形式(ヒンディー語)に戻すプロセスを試みています
サンプルまたは入力ファイルの一部は次のようになります。
E-k- b-u-d-z*dhi-m-aan- p-ksii#
E-k- ghn-e- j-ngg-l- m-e-ng E-k- b-h-u-t- UUNNc-aa p-e-dr thaa#
U-s- k-ii p-t-z*t-o-ng s-e- l-d-ii shaakhaay-e-ng m-j-*zb-uut- b-aaj-u-O-ng k-ii t-r-h- pheil-ii h-u-II thiing#
w-n- h-NNs-o-ng k-aa E-k- jhu-nhz*D- I-s- p-e-dr p-r- n-i-w-aas- k-r-t-aa thaa#
w-e- s-b- y-h-aaNN s-u-r-ksi-t- the- AUr- b-dre- AAr-aam- s-e- r-h-t-e- the-#
U-n- m-e-ng s-e- E-k- p-ksii b-h-u-t- b-u-d-z*dhi-m-aan- thaa#
I-s- b-u-d-z*dhi-m-aan- p-ksii n-e- E-k- d-i-n- p-e-dr k-ii j-dr m-e-ng s-e- E-k- l-t-aa k-o- U-g-t-e- d-e-khaa#
I-s- k-e- b-aar-e- m-e-ng U-s-n-e- d-uus-r-e- p-ksi-y-o-ng s-e- b-aat- k-ii#
"k-z*y-aa t-u-m-z*h-e-ng w-h- l-t-aa d-i-khaaII d-e-t-ii h-ei", U-s- n-e- U-n- s-e- p-uuchaa "t-u-m-z*h-e-ng I-s-e- n-Shz*T- k-r- d-e-n-aa c-aah-i-E-"#
"I-s-e- k-z*y-o-ng n-Shz*T- k-r- d-e-n-aa c-aah-i-E-?" h-NNs-o-ng n-e- AAshz*c-*ry- s-e- p-uuchaa "y-h- t-o- I-t-n-ii cho-T-ii s-e- h-ei#
h-m-e-ng y-h- k-z*y-aa h-aan-i- p-h-u-NNc-aa s-k-t-ii h-ei"#
"m-e-r-e- m-i-tro-ng," b-u-d-z*dhi-m-aan- p-ksii n-e- U-t-z*t-r- d-i-y-aa "w-h- cho-T-ii s-ii l-t-aa j-l-z*d-ii h-ii b-drii h-o- j-aay-e-g-ii#
y-h- h-m-aar-e- p-e-dr p-r- c-Dh*z k-r- U-s- s-e- l-i-p-T-t-ii j-aay-e-g-ii AUr- phi-r- m-o-T-ii AUr- m-j-*zb-uut- h-o- j-aay-e-g-ii"#
"t-o- k-z*y-aa h-u-AA"#
英語での同等の意味は次のとおりです。
A WISE OLD BIRD.
Deep in the forest stood a very tall tree.
Its leafy branches spread out like long arms.
This was the home of a flock of wild geese.
They were safe there.
One of the geese was a wild old bird.
One day this wise old bird noticed a small creeper growing at the foot of the tree.
He spoke to the other birds about it.
"Do you see that creeper ?" he said to them.
"You must destroy it."
"Why must we destroy it ?" asked the geese in surprise.
"It is so small.
What harm can it do?"
"My friends," replied the wise old bird, " that little creeper will soon grow.
私のスクリプトは次のようになります。
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import sys
CODEC = 'utf-8'
input_file=sys.argv[1]
output_file=sys.argv[2]
list1=[]
f=open(input_file,'r')
f1 = open(output_file,'w')
english_hindi_dict={'A' : u'अ' , 'AA' : u'आ ' , 'I' : u'इ' , 'II' : u'ई ' , 'U' : u'उ ' ,\
'UU' : u'ऊ' , 'r' : u'ऋ' , 'E' : u'ए' , 'ai' : u'ऐ' , 'O' : u'ओ' , 'AU' : u'औ' ,\
'k' : u'क' , 'kh' : u'ख' , 'g' : u'ग' , 'gh' : u'घ' , 'c' : u'च' , 'ch' : u'छ',\
'j': u'ज' , 'jh' : u'झ' , 'tr' : u'त्र' , 'T' : u'ट' , 'Th' : u'ठ' , 'D' : u'ड',\
'dr' : u'ड' , 'Dh' : u'ढ' , 'Na' : u'ण' , 'th' : u'त' , 'tha' : u'थ',\
'd' : u'द' , 'dh': u'ध' , 'n' : u'न' , 'p' : u'प' , 'ph' : u'फ' ,\
'b' : u'ब' , 'bh' : u'भ' , 'm' : u'म' , 'y' : u'य' , 'r' : u'र' , 'l' : u'ल' ,\
'w' : u'व' , 'sh' : u'श' , 'sha' : u'ष', 's' : u'स' , 'h' : u'ह' , 'ks' : u'क्ष' ,\
'i' : u'ि' , 'ii' : u'ी' , 'u' : u'ु' , 'uu' : u'ू' , 'e' : u'े' ,\
'aa' : u'ै' , 'o' : u'ो' , 'AU' : u'ौ' ,'H' : u'्' ,'mn' : u'ं' ,\
'NN' : u'ँ' , 'AW' : u'ॅ' , 'rr' : u'ृ' , '4' : u'४' , '6': u'६' , '8' : u'८',\
'2' : u'२' , '5' : u'५' , '3' : u'३' , '7' : u'७' , '9' : u'९' , '1' : u'१'}
for line in f:
#line=line.strip() to remove a line from its newline character....
#line=line.rstrip('.')
line=line.replace('-','')
line=line.replace('#','|') # i am using the or symbol for poornviram
#line=line.replace('।','')
#line = line.lower()
for word in line:
for ch in word:
if (ch in english_hindi_dict) :
translatedToken = english_hindi_dict[ch]
else :
translatedToken = ch
#{ translatedToken = english_hindi_dict[ch] }
#for ch in line:
f1.write(translatedToken)
#print translatedToken
#line = line.replace( char,english_hindi_dict[char] )
#list1.append(line)
f.close()
f1.write(' '.join(list1))
f1.close()
私が得ているエラーは次のとおりです。
python transliterate_eh_nw.py Hstory.txt op1.txt
Traceback (most recent call last):
File "transliterate_eh_nw.py", line 43, in <module>
f1.write(translatedToken)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u092f' in position 0: ordinal not in range(128)
このエラーの対処方法を教えてください。ありがとうございました..:)