nlp - GATE Twitter 品詞タガーを StanfordCoreNLP コードのモデルとしてどのように使用しますか?

Question

GATE Twitter 品詞タガーを StanfordCoreNLP コードのモデルとしてどのように使用しますか?

モデルはこちら: https://gate.ac.uk/wiki/twitter-postagger.html . ただし、モデルは StanfordCoreNLP 形式ではないようです。

Gateからモデルファイルをダウンロードして、クラスパスを入れてみました。ファイルは見つかりましたが、正しいヘッダーがありません:

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
//props.put("pos.model", "gate-EN-twitter-fast.model");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

スタックトレースは次のとおりです。

    Reading POS tagger model from gate-EN-twitter-fast.model ... Exception in thread "main" java.lang.RuntimeException: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:558)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:260)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:127)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:123)
at com.rincaro.mapreduce.apps.StanfordCoreNlpDemo.main(StanfordCoreNlpDemo.java:31)

    Caused by: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:857)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:755)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:289)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:253)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:88)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:76)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:556)
... 5 more

    Caused by: java.io.StreamCorruptedException: invalid stream header: EFBFBDEF
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at edu.stanford.nlp.tagger.maxent.TaggerConfig.readConfig(TaggerConfig.java:746)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:792)

score 1 · Accepted Answer

最近のリリース (v3.3.1 の 2014 年 4 月 11 日) のモデルを試してみたところ、次のコマンドでうまくいきました。

./corenlp.sh -file tweets.txt -pos.model gate-EN-twitter.model -ssplit.newlineIsSentenceBreak always

score 0 · Accepted Answer

Stanford 3.3.1 に対してテスト済み

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
props.put("pos.model", "gate-EN-twitter.model");
props.put("dcoref.score", true);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

サンプルツイート:

ikr smh 彼は fb に u を追加できるように姓を尋ねました lololol

gate-EN-twitter.model なしの結果

word: ikr :: pos: NN :: ne:O
word: smh :: pos: NN :: ne:O
word: he :: pos: PRP :: ne:O
word: asked :: pos: VBD :: ne:O
word: fir :: pos: NNP :: ne:O
word: yo :: pos: NNP :: ne:O
word: last :: pos: JJ :: ne:O
word: name :: pos: NN :: ne:O
word: so :: pos: IN :: ne:O
word: he :: pos: PRP :: ne:O
word: can :: pos: MD :: ne:O
word: add :: pos: VB :: ne:O
word: u :: pos: NN :: ne:O
word: on :: pos: IN :: ne:O
word: fb :: pos: NN :: ne:O
word: lololol :: pos: NN :: ne:O

gate-EN-twitter.model での結果

word: ikr :: pos: UH :: ne:O
word: smh :: pos: UH :: ne:O
word: he :: pos: PRP :: ne:O
word: asked :: pos: VBD :: ne:O
word: fir :: pos: IN :: ne:O
word: yo :: pos: PRP$ :: ne:O
word: last :: pos: JJ :: ne:O
word: name :: pos: NN :: ne:O
word: so :: pos: IN :: ne:O
word: he :: pos: PRP :: ne:O
word: can :: pos: MD :: ne:O
word: add :: pos: VB :: ne:O
word: u :: pos: PRP :: ne:O
word: on :: pos: IN :: ne:O
word: fb :: pos: NNP :: ne:O
word: lololol :: pos: UH :: ne:O

score 0 · Accepted Answer

また、これを StanfordCoreNLP の v3.6.0 (2016 年 4 月) に対してテストしたところ、POS タガーはすぐに (.NET コードから) 正常に動作しました。

nlp - GATE Twitter 品詞タガーを StanfordCoreNLP コードのモデルとしてどのように使用しますか?

3 に答える 3

Related

Reference