excel - How to extract characters from a Korean string in VBA

Question

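I need to extract the initial letter from a Korean word in MS-Excel and MS-Access. When I use Left("한글", 1), it returns the first syllable, i.e. 한. What I need is the first letter, i.e. ㅎ. Is there a function that does this? Or at least an idiom?

If you know how to get the Unicode value from the string, I could work it out from there, but I'm sure I would be reinventing the wheel. (Yet again.)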

score 8 · Accepted Answer

Disclaimer: I know very little about Access or VBA, but what you have is a general Unicode problem, not one specific to those tools. I have retagged your question to add the tags related to this issue.

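Access is doing the right thing by returning 한. It really is the first character of that two-character string. What you want here is the canonical decomposition of this hangul into its constituent jamos, also known as Normalization Form D (NFD, the "D" standing for "decomposed"). The NFD form is ᄒ ᅡ ᆫ, of which you want the first character.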
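As a minimal sketch of that decomposition, here it is in Python, chosen only because its standard library ships a normalizer. The helper name `first_jamo` is made up for illustration, and this shows the idea rather than VBA code.

```python
import unicodedata


def first_jamo(text):
    """Return the first code point of the NFD form of `text` (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed[0]


if __name__ == "__main__":
    jamo = first_jamo("한글")
    print(jamo, hex(ord(jamo)))  # the leading consonant, U+1112
```

Note that the result is the conjoining jamo U+1112, not the standalone letter U+314E that the question shows.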

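Note also that, going by your example, you seem to want a function that returns the standalone hangul letter (ㅎ) equivalent to the jamo (ᄒ). These really are two different code points, because they represent different semantic units (a full-fledged hangul letter versus a part of a hangul syllable). There is no predefined mapping from the jamo to the standalone letter, but since the number of jamos is limited to a few dozen, a small lookup function is enough; the real work is done by the first function, NFD.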
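Such a lookup could be sketched as follows, again in Python. The table lists the compatibility letters for the 19 leading consonants in the same order as the jamo U+1100 to U+1112. The names `COMPAT_INITIALS` and `initial_letter` are invented for this example, and the table was written out by hand, so check it against the code charts before relying on it.

```python
import unicodedata

# Compatibility letters for the 19 leading consonants, in the order of
# U+1100..U+1112 (hand-written table, an assumption of this sketch).
COMPAT_INITIALS = [
    0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141,
    0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149,
    0x314A, 0x314B, 0x314C, 0x314D, 0x314E,
]


def initial_letter(word):
    """Return the standalone letter for the first consonant of a Hangul word."""
    lead = ord(unicodedata.normalize("NFD", word)[0])
    index = lead - 0x1100
    if not 0 <= index < len(COMPAT_INITIALS):
        raise ValueError("word does not start with a Hangul syllable")
    return chr(COMPAT_INITIALS[index])


if __name__ == "__main__":
    print(initial_letter("한글"))
```

Only the leading consonants are covered; vowels and trailing consonants would need their own tables.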

score 2 · Accepted Answer

Adding to Arthur's excellent answer, I want to point out that extracting jamo from hangeul syllables follows directly from the standard. While the solution isn't specific to Excel or Access (it's a Python module), it only involves arithmetic expressions, so it should be easy to translate to other languages. The formulas, as can be seen, are identical to those on page 109 of the standard. The decomposition is returned as a tuple of strings, which can easily be verified to correspond to the Hangul Jamo Code Chart.

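# Hangul syllable decomposition, using the constants from the Unicode Standard.
# Written for Python 3 (integer division with //, chr instead of unichr).

SBase = 0xAC00   # first precomposed syllable
LBase = 0x1100   # first leading consonant
VBase = 0x1161   # first vowel
TBase = 0x11A7   # one before the first trailing consonant
SCount = 11172
LCount = 19
VCount = 21
TCount = 28
NCount = VCount * TCount


def decompose(syllable):
    """Return the jamo of one precomposed Hangul syllable as a tuple of strings."""
    SIndex = ord(syllable) - SBase
    if not 0 <= SIndex < SCount:
        raise ValueError('not a precomposed Hangul syllable: %r' % syllable)

    L = LBase + SIndex // NCount
    V = VBase + (SIndex % NCount) // TCount
    T = TBase + SIndex % TCount

    # T == TBase means the syllable has no trailing consonant.
    if T == TBase:
        result = (L, V)
    else:
        result = (L, V, T)

    return tuple(map(chr, result))


if __name__ == '__main__':
    test_values = '항가있닭넓짧'

    for syllable in test_values:
        print(syllable, ':', *decompose(syllable))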

This is the output in my console:

항 : ᄒ ᅡ ᆼ
가 : ᄀ ᅡ
있 : ᄋ ᅵ ᆻ
닭 : ᄃ ᅡ ᆰ
넓 : ᄂ ᅥ ᆲ
짧 : ᄍ ᅡ ᆲ

score 1 · Accepted Answer

I think what you are looking for is a Byte Array: Dim aByte() As Byte, followed by aByte = "한글", should give you the two Unicode byte values for each character in the string.
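For illustration, the same bytes can be reproduced in Python, on the assumption that VBA stores strings as UTF-16 little-endian, so the byte array holds two bytes per character for these syllables. The function name `utf16_units` is hypothetical.

```python
def utf16_units(text):
    """Return the 16-bit code units of `text`, rebuilt from its UTF-16LE bytes."""
    raw = text.encode("utf-16-le")  # low byte first, then high byte
    return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw), 2)]


if __name__ == "__main__":
    print(list("한글".encode("utf-16-le")))       # the raw byte values
    print([hex(u) for u in utf16_units("한글")])  # one value per character
```

Each pair of bytes recombines into the code point of one syllable, which is the number the decomposition arithmetic needs as input.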

score 0 · Accepted Answer

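I think you have what you need, but it seems fairly involved. I don't know anything about this, but I recently did some investigation into handling Unicode and looked at all the Byte string functions, such as LeftB(), RightB(), InputB(), InStrB(), LenB(), AscB(), ChrB() and MidB(); there is also StrConv(), which takes a vbUnicode argument. These are all functions that I would expect to be used in any double-byte context, but I don't work in that environment, so I may be missing something very important.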
