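I'm trying to write a script that checks whether a string contains certain diacritics and, if so, converts it to another transliteration scheme. (Both are transliteration schemes for Sanskrit.)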

Here is my code:

$first = $_POST["first"];
$second = $_POST['second'];
$iast = array("a","A","ā","Ā","i","I","ī","Ī","u","U","ū","Ū","ṛ","Ṛ","ṝ","Ṝ","ḷ","Ḷ","ḹ","Ḹ","e","E","ai","Ai","o","O","au","Au","ṃ","Ṃ","ḥ","Ḥ","k","K","c","C","ṭ","Ṭ","t","T","p","P","kh","Kh","ch","Ch","ṭh","Ṭh","th","Th","ph","Ph","g","G","j","J","ḍ","Ḍ","d","D","b","B","gh","Gh","jh","Jh","ḍh","Ḍh","dh","Dh","bh","Bh","ṅ","Ṅ","ñ","Ñ","ṇ","Ṇ","n","N","m","M","y","Y","r","R","l","L","v","V","ś","Ś","ṣ","Ṣ","s","S","h","H");
$slp  = array("a","a","A","A","i","i","I","I","u","u","U","U","f","f","F","F","x","x","X","X","e","e","E", "E", "o","o","O", "O", "M","M","H","H","k","k","c","c","w","w","t","t","p","p","K", "K", "C", "C", "W", "W", "T", "T", "P", "P", "g","g","j","j","q","q","d","d","b","b","G", "G", "J", "J", "Q", "Q", "D", "D", "B", "B", "N","N","Y","Y","R","R","n","n","m","m","y","Y","r","r","l","l","v","v","S","S","z","z","s","s","h","h");

if (preg_match('/[āĀīĪūŪṛṚṝṜḷḶḹḸṃṂḥḤṭṬḍḌṅṄñÑṇṆśŚṣṢV]/',$first) || preg_match('/[āĀīĪūŪṛṚṝṜḷḶḹḸṃṂḥḤṭṬḍḌṅṄñÑṇṆśŚṣṢV]/',$second))
{
    $first = str_replace($iast,$slp,$first);
    $second = str_replace($iast,$slp,$second);
}

I get both $first and $second from an HTML form as user input.

Question: when I enter $first = "dhātṛ"; and $second = "aṃśaḥ"; the output is "DAtf" + "amsah". As the arrays show, the desired output is "DAtf" + "aMSaH".

I still can't figure out why it identified ṛ and converted it correctly to f, yet failed to replace the m and h with a dot below (ṃ and ḥ) with M and H respectively.

1 Answer

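The problem is the order of your translation arrays. str_replace() uses a dumb algorithm: starting with the first value of both arrays, it takes one search entry at a time and replaces every occurrence of it in the whole string with the matching entry of the replacement array, then moves on to the next pair.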

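At some point, "ṃ" gets replaced with an uppercase "M". Later on, uppercase "M" is replaced with lowercase "m". str_replace() does not remember that this M is really an already-replaced "ṃ", so it converts it again. The same happens to "ś" (S, then s) and "ḥ" (H, then h). "ṛ" survives because its replacement "f" never appears as a later search entry.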
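
The double replacement described above can be reproduced with a much smaller pair of arrays. This is a minimal sketch: the four-entry arrays are an illustrative subset that keeps the same relative order as the question's tables, not the full tables themselves.

```php
<?php
// Illustrative subset: "ṃ" => "M" comes before "M" => "m",
// in the same relative order as in the question's arrays.
$search  = array("ṃ", "ḥ", "M", "H");
$replace = array("M", "H", "m", "h");

// str_replace() applies each pair to the whole string in turn,
// so the output of one pair becomes the input of the next.
$step1 = str_replace("ṃ", "M", "aṃ");   // "aM"
$step2 = str_replace("M", "m", $step1); // "am"

// The array form does the same thing pair by pair.
$result = str_replace($search, $replace, "aṃḥ");
echo $result, "\n"; // prints "amh"
```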

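You can work around this by rearranging the replacement arrays. If the "easy" letters are translated first and the letters with diacritics later, you will probably avoid this trap. I successfully tested the correct translation of "ṃ" by moving the translations for "m" and "M" to the front of the arrays.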
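
A sketch of that reordering, again on a reduced and purely illustrative subset of the tables: the plain letters are listed first, so nothing produced by the later diacritic replacements is searched for again.

```php
<?php
// Illustrative subset, reordered: plain letters first, diacritics last.
$search  = array("m", "M", "s", "S", "h", "H", "ṃ", "ś", "ḥ");
$replace = array("m", "m", "s", "s", "h", "h", "M", "S", "H");

// By the time "ṃ" => "M" runs, the "M" => "m" pair has already been applied,
// so the freshly produced "M" is left alone.
$result = str_replace($search, $replace, "aṃśaḥ");
echo $result, "\n"; // prints "aMSaH"
```

The ordering stays fragile: every entry whose replacement also appears as a search value has to be placed after that search value, and each such case needs checking by hand.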

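On the other hand, you probably don't want to start rearranging the values in the arrays and checking every one of them to see whether str_replace() really does act on a single character twice. The algorithm should look at each character only once and convert it to the proper transliteration. strtr() looks like the PHP function that can do this. Its three-argument form works byte by byte, so it is only usable with single-byte encodings, and there is no mb_strtr() function. The two-argument form, which takes an array of 'from' => 'to' pairs, compares whole byte sequences and therefore also works on UTF-8 strings: it tries the longest keys first and does not search replaced text again.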
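
A minimal sketch of the single-pass approach, using the two-argument (array) form of strtr(), which matches keys as whole byte sequences and so can be used with UTF-8 input. The $map below is a hypothetical reduced table; the full one could presumably be built with array_combine($iast, $slp), assuming both arrays have the same length.

```php
<?php
// Hypothetical reduced IAST => SLP1 map, just large enough for the two test words.
$map = array(
    "ā"  => "A", "ṛ" => "f", "ṃ" => "M", "ḥ" => "H", "ś" => "S",
    "dh" => "D", "d" => "d", "t" => "t", "a" => "a",
    // Uppercase ASCII keys that caused the double replacement with str_replace().
    "M"  => "m", "H" => "h", "S" => "s", "A" => "a", "D" => "d",
);

// strtr() tries the longest keys first ("dh" before "d") and never
// searches text it has already replaced, so "ṃ" => "M" stays "M".
$first  = strtr("dhātṛ", $map);
$second = strtr("aṃśaḥ", $map);
echo $first, " + ", $second, "\n"; // prints "DAtf + aMSaH"
```

Because each position in the input is translated at most once, the order of the entries in $map no longer matters.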

answered 2013-11-09T11:49:11.020