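I'm trying to generate an .arff file from a CSV data file I have. I'm completely new to Weka; I only started using it a day ago. As a beginner exercise, I'm trying a simple Twitter sentiment analysis with it. I generated the training data as a CSV file, with the following contents: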
tweet,affinScore,polarity
ATAUTHORcfoblog is giving away a $25 Amex gift card (enter to win over $600 in prizes!) http://t.co/JD8EP14c ,4,4
"American Express has always been my dark horse acquirer of ATAUTHORFoursquare. Bundle in Square-like payments & its a lite-retailer platform, no? ",0,1
African-American Demos Express Ethnic Identity Differently http://t.co/gInv4bKj via ATAUTHORmediapost ,0,3
Google ???????? Visa ? American Express http://t.co/eEZTSiHY ,0,4
Secrets to Success from Small-Business Owners : Lifestyle :: American Express OPEN Forum http://t.co/b85F8JX0 via ATAUTHOROpenForum ,2,1
RT ATAUTHORhunterwalk: American Express has always been my dark horse acquirer of ATAUTHORFoursquare. Bundle in Square-like payments & its a lite ... ,0,1
Winning Surveys $1500 american express Huggies Sweeps http://t.co/WoaTFowp ,4,1
I root for Square mostly because a small business that takes Square is also one that takes American Express. ,0,1
I dont know how bitch be acting American Express but they cards be saying DEBIT ON IT HAVE A ?? PLEASE!!! ,-5,2
Uh oh... RT ATAUTHORBlackArrowBella: I dont know how bitch be acting American Express but they cards be saying DEBIT ON IT HAVE A ?? PLEASE!!! ,-5,2
Just got another credit card. A Blue Sky card with American Express. Its gonna help pay for the honeymoon! ATAUTHORAmericanExpress ,-1,1
Follow ATAUTHORShaveMagazine and ReTweet this msg to be entered to #Win an American Express Gift card. Winners contacted bi-weekly by direct msg! ,2,4
American Express Gold zakelijk aanvragen: http://t.co/xheZwmbt ,0,3
RT ATAUTHORhunterwalk: American Express has always been my dark horse acquirer of ATAUTHORFoursquare. Bundle in Square-like payments & its a lite ... ,0,1
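Here, the first attribute is the actual tweet, the second is the AFINN score, and the third is the actual class label (1 = positive, 2 = negative, 3 = neutral, 4 = spam).
I then try to generate the .arff file with the following code: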
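Given those three columns, one plausible ARFF header declares the tweet as a `string` attribute, the score as `numeric`, and the class as a nominal attribute with the four labels. Weka's text filters such as `StringToWordVector` operate on string attributes, which is why the tweet column matters here. The sketch below builds such a header in plain Java with no Weka dependency; the class name `TweetArffHeader` and the relation name are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical helper: builds an ARFF header for the three CSV columns
 * (tweet, affinScore, polarity). Plain Java, no Weka dependency.
 */
final class TweetArffHeader {
    // Class labels from the question: 1=positive, 2=negative, 3=neutral, 4=spam.
    static final List<String> CLASS_LABELS = List.of("1", "2", "3", "4");

    static String build(String relationName) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relationName).append("\n\n");
        // Free text: a string attribute, not a nominal one with one value per tweet.
        sb.append("@attribute tweet string\n");
        // The AFINN score is an integer, so numeric fits.
        sb.append("@attribute affinScore numeric\n");
        // The class: nominal with a fixed set of four labels.
        sb.append("@attribute polarity {")
          .append(String.join(",", CLASS_LABELS))
          .append("}\n\n");
        sb.append("@data\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("tweets"));
    }
}
```

Whether `polarity` should stay nominal or be numeric depends on the classifier; nominal is the usual choice for a classification target.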
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

import java.io.File;

public class CSV2Arff {
  /**
   * takes 2 arguments:
   * - CSV input file
   * - ARFF output file
   */
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("\nUsage: CSV2Arff <input.csv> <output.arff>\n");
      System.exit(1);
    }

    // load CSV
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(args[0]));
    Instances data = loader.getDataSet();

    // save ARFF
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File(args[1]));
    saver.setDestination(new File(args[1]));
    saver.writeBatch();
  }
}
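The second tweet in the CSV is wrapped in double quotes because it contains a comma ("...lite-retailer platform, no?"), and the generated ARFF keeps it as one value, so the loader respects CSV quoting. Anyone converting without Weka would need the same behaviour. A minimal sketch of a quote-aware field splitter, assuming one record per line; `CsvLine` is a hypothetical name, and quoted fields spanning several lines are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical minimal CSV field splitter: honours double-quoted fields
 * (so commas inside a quoted tweet stay in the tweet) and "" as an
 * escaped quote. Quoted fields spanning several lines are not supported.
 */
final class CsvLine {
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(cur.toString()); // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        String row = "\"Bundle in Square-like payments & its a lite-retailer platform, no? \",0,1";
        System.out.println(parse(row).size() + " fields: " + parse(row));
    }
}
```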
This produces an .arff file like the following:
@relation file
@attribute tweet {_ATAUTHORcfoblog_is_giving_away_a_$25_Amex_gift_card_(enter_to_win_over_$600_in_prizes!)_http://t.co/JD8EP14c_,'American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite-retailer_platform,_no?_',African-American_Demos_Express_Ethnic_Identity_Differently_http://t.co/gInv4bKj_via__ATAUTHORmediapost_,Google_????????_Visa_?_American_Express__http://t.co/eEZTSiHY_,Secrets_to_Success_from_Small-Business_Owners_:_Lifestyle_::_American_Express_OPEN_Forum_http://t.co/b85F8JX0_via__ATAUTHOROpenForum_,RT__ATAUTHORhunterwalk:_American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite_..._
@data
_ATAUTHORcfoblog_is_giving_away_a_$25_Amex_gift_card_(enter_to_win_over_$600_in_prizes!)_http://t.co/JD8EP14c_,4,4
'American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite-retailer_platform,_no?_',0,1
African-American_Demos_Express_Ethnic_Identity_Differently_http://t.co/gInv4bKj_via__ATAUTHORmediapost_,0,3
Google_????????_Visa_?_American_Express__http://t.co/eEZTSiHY_,0,4
Secrets_to_Success_from_Small-Business_Owners_:_Lifestyle_::_American_Express_OPEN_Forum_http://t.co/b85F8JX0_via__ATAUTHOROpenForum_,2,1
RT__ATAUTHORhunterwalk:_American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite_..._,0,1
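In the output above, `tweet` is declared as a nominal attribute whose allowed values are the tweets themselves (with spaces turned into underscores). If the header instead declares `@attribute tweet string`, each data row carries the raw tweet as a single-quoted value. A sketch of formatting such a row; `ArffRow` is hypothetical, and the escaping (backslash, single quote, tab, CR, LF) is a conservative assumption rather than a copy of Weka's own quoting routine.

```java
/**
 * Hypothetical helper: formats one (tweet, affinScore, polarity) row as an
 * ARFF data line for a header that declares tweet as a string attribute.
 * The tweet is wrapped in single quotes; backslashes, single quotes and
 * control characters inside it are backslash-escaped (assumed escaping).
 */
final class ArffRow {
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    static String format(String tweet, int affinScore, String polarity) {
        // Only the tweet is quoted; score and class label are written as-is.
        return quote(tweet) + "," + affinScore + "," + polarity;
    }

    public static void main(String[] args) {
        System.out.println(format("I dont know how it's done", -5, "2"));
    }
}
```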
I'm new to Weka, but from what I've read, I suspect this ARFF file is not well-formed. Can anyone comment?
And if it is wrong, could someone point out exactly where I'm going wrong?