python - Linking two lists of words and categories to my own corpus using Python

Question

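Well, I have thought about this over and over, but I am a Python beginner and cannot find a solution. This is what I need to do: I have a text file from LIWC that contains all kinds of Dutch words, each followed by numbers: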

aaien 12 13 32
aan 10
aanbad 12 13 14 57 58 38
...

Then I have a second text file from LIWC, with numbers followed by categories:

01:Pronoun
02:I
03:We
04:Self
05:You
06:Other
...

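And now I am supposed to link my own corpus of Dutch words to these categories. So first I have to link my Dutch words to the numbers that follow the Dutch words in the LIWC word list, and then I have to link those numbers to these categories... I thought it would be useful to build a dictionary from each of the two LIWC lists. This is what I have so far: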

with open('LIWC_words.txt', 'rU') as document:
    answer = {}
    for line in document:
        line = line.split()
        if not line:  #empty line
            continue
        answer[line[0]] = line[1:]

with open ('LIWC_categories.txt','rU') as document1:
    categoriesLIWC = {}
    for line in document1:
        line = line.strip()
        if not line:
            continue
        key, value = line.split(':')
        if key.isdigit():
            categoriesLIWC[int(key)] = value
        else:
            categoriesLIWC[key] = value

So now I have two dictionaries... but now I am stuck. Does anyone have an idea what I should do next? (I work with Python 2.6.5, because I mainly have to work with NLTK.)

score 0 · Accepted Answer

Here is one way to get your data into that format.

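dic = {}
ref = {}

# Read both files and split them into lines; 'with' closes each file afterwards.
with open('dic.txt', 'r') as f:
    tempdic = f.read().split('\n')
with open('ref.txt', 'r') as f:
    tempref = f.read().split('\n')

for line in tempdic:
    if line:  # skip empty lines
        line = line.split()
        dic[line[0]] = line[1:]
for line in tempref:
    if line:  # skip empty lines
        line = line.split(':')
        ref[line[0]] = line[1]
# Everything read from the files is a string, so at this point:
# dic = {'word1': ['01', '02', '03'], 'word2': ['02', '03'], ...}
# ref = {'01': 'ref1', '02': 'ref2', ...}
for word in dic:
    for indx in range(len(dic[word])):  # for each number after the word
        dic[word][indx] = ref[dic[word][indx]]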

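Let's say you start with {'apple': [1, 2, 3]}. dic['apple'][0] resolves to 1, so the right-hand side becomes ref[1], which could be 'pronoun'. That leaves you with {'apple': ['pronoun', 2, 3]}, and the remaining numbers are replaced in the following iterations.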
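To watch that replacement happen one step at a time, here is a small trace. It is only a sketch: the 'apple' entry comes from the explanation above, while the category names for 2 and 3 and the `states` list are made up for illustration.

```python
# Illustrative data: 'apple' and 'pronoun' come from the explanation above,
# the names for codes 2 and 3 are assumptions.
dic = {'apple': [1, 2, 3]}
ref = {1: 'pronoun', 2: 'I', 3: 'you'}

states = []  # hypothetical helper: snapshot of the list after each step
codes = dic['apple']
for indx in range(len(codes)):
    # Replace the code at this position with its category name, in place.
    codes[indx] = ref[codes[indx]]
    states.append(list(codes))
    print(codes)
```

Because the list is modified in place, each pass leaves the already-replaced names at the front and the not-yet-replaced numbers at the back.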

score 0 · Accepted Answer

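I am not sure exactly what final format you are trying to create. For example, you could build a dictionary in which dict['pronoun'] holds every line from document that contains '01'.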

#for example from this format
dic = {'word1': [1,2,3], 'word2':[3,2]}
ref = {1: 'pronoun', 2: 'I' , 3: 'you'}

out = {}

for word in dic:
  for entry in dic[word]:
    # setdefault creates the empty list the first time an entry is seen
    out.setdefault(entry, []).append(word)

print out
>>>{1: ['word1'], 2: ['word1', 'word2'], 3: ['word1', 'word2']}
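The result above is keyed by number. If you would rather look words up by category name, as in dict['pronoun'], a variant along these lines might work. It is a sketch on the same sample data; `by_category` is a name I made up, and falling back to the raw number when a code has no name is my own assumption.

```python
dic = {'word1': [1, 2, 3], 'word2': [3, 2]}
ref = {1: 'pronoun', 2: 'I', 3: 'you'}

by_category = {}  # hypothetical name: category name -> list of words
for word in sorted(dic):  # sorted only to make the word order predictable
    for code in dic[word]:
        # Assumption: keep the raw code as the key if it has no name in ref.
        name = ref.get(code, code)
        by_category.setdefault(name, []).append(word)

print(by_category['pronoun'])
print(by_category['I'])
```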

Or you could replace the numbers in document with the entries from document1.

#for example from this format
dic = {'word1': [1,2,3], 'word2':[3,2]}
ref = {1: 'pronoun', 2: 'I' , 3: 'you'}

for word in dic:
  for indx in range(len(dic[word])): 
    dic[word][indx] = ref[dic[word][indx]]

print dic
>>>{'word1': ['pronoun', 'I', 'you'], 'word2': ['you', 'I']}
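Once every word maps to its category names, linking your own corpus could be as simple as counting. The following is only a sketch: the `tokens` list stands in for your tokenized corpus and is made up, and words that are not in the dictionary are silently skipped, which may or may not be what you want.

```python
from collections import defaultdict

# Result of the replacement step above.
word_categories = {'word1': ['pronoun', 'I', 'you'], 'word2': ['you', 'I']}

# Hypothetical tokenized corpus; 'word3' is not in the dictionary.
tokens = ['word1', 'word3', 'word2', 'word1']

counts = defaultdict(int)
for token in tokens:
    # Unknown words contribute no categories (assumption).
    for category in word_categories.get(token, []):
        counts[category] += 1

print(dict(counts))
```

defaultdict is used instead of collections.Counter because Counter only arrived in Python 2.7.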

Otherwise, have you thought about a database?
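For what it is worth, a rough idea of the database route using the sqlite3 module from the standard library is shown here. The table and column names are made up for illustration, and the rows are the same sample data as above rather than real LIWC content.

```python
import sqlite3

# In-memory database, just for the sketch; table names are hypothetical.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE categories (code INTEGER PRIMARY KEY, name TEXT)')
cur.execute('CREATE TABLE word_codes (word TEXT, code INTEGER)')

cur.executemany('INSERT INTO categories VALUES (?, ?)',
                [(1, 'pronoun'), (2, 'I'), (3, 'you')])
cur.executemany('INSERT INTO word_codes VALUES (?, ?)',
                [('word1', 1), ('word1', 2), ('word1', 3),
                 ('word2', 3), ('word2', 2)])
conn.commit()

# Join the two tables to get the category names for one word.
cur.execute('SELECT c.name FROM word_codes w '
            'JOIN categories c ON c.code = w.code '
            'WHERE w.word = ? ORDER BY c.code', ('word2',))
names = [row[0] for row in cur.fetchall()]
print(names)
conn.close()
```

The join does the number-to-category lookup for you, so no dictionary needs rewriting in place.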
