regex - Splitting a word with regular expressions in MATLAB: start index of the 'split' parts?

Question

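My goal is to generate the phonetic transcription of an arbitrary word according to a set of rules.

First, I split the word into syllables. For example, I need an algorithm that finds 'ch' in a word and separates it like this: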

Input: 'aachbutcher'
Output: 'a' 'a' 'ch' 'b' 'u' 't' 'ch' 'e' 'r'

This is what I have so far:

check = regexp('aachbutcher','ch');  % start indices of every 'ch' (numeric array)

if ~isempty(check)                   % true when 'ch' was found

   % Request all four outputs explicitly so their order is unambiguous.
   [match, split, startIndex, endIndex] = regexp('aachbutcher','ch','match','split','start','end')

   % Now I split the 'aa', 'but' and 'er' into single characters:
   for i = 1:length(split)
       SingleLetters{i} = regexp(split{1,i},'.','match');
   end

end

My problem is this: how do I put the cells back together so that the result is formatted like the desired output? I only have the start indices of the matched parts ('ch'), not of the split parts ('aa', 'but', 'er').

Any ideas?

score 0 · Accepted Answer

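You don't need to work with indices or lengths. The logic is simple: take the first element of split, then the first element of match, then the second element of split, and so on. split always has exactly one more element than match, so the two interleave cleanly.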

% Only 'match' and 'split' are needed; the indices are not used.
[match, split] = regexp('aachbutcher','ch','match','split');

% Start with the first split piece ('aa'), broken into single characters:
SingleLetters = regexp(split{1,1},'.','match');

% Then alternate: the (i-1)-th match ('ch'), followed by the i-th split piece.
for i = 2:length(split)
   SingleLetters = [SingleLetters, match(i-1), regexp(split{1,i},'.','match')];
end
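
As a possible alternative that is not part of the original answer, the interleaving can be avoided by matching 'ch' or any single character in one pass. This is a minimal sketch that assumes alternatives in the pattern are tried from left to right, so 'ch' takes precedence over the single-character fallback; the variable names word and parts are only illustrative.

```matlab
word = 'aachbutcher';

% 'ch|.' matches the digraph 'ch' where it occurs and otherwise falls back
% to a single character, so the matches come out already in word order.
parts = regexp(word, 'ch|.', 'match');

disp(parts)
```

Here parts is a 1-by-9 cell array holding 'a' 'a' 'ch' 'b' 'u' 't' 'ch' 'e' 'r', the same layout as the desired output in the question.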
