java - Confusion about data structures when implementing a perceptron in Java

Question

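I am trying to implement the perceptron algorithm in Java: just the single-layer kind, not a full neural network. The problem I am trying to solve is a classification problem.

What I need to do is create a bag-of-words feature vector for each document, where each document belongs to one of four categories: politics, science, sports, and atheism. Here is the data.

This is what I am trying to achieve (quoted directly from the first answer to this question):

Example: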

Document 1 = ["I", "am", "awesome"]
Document 2 = ["I", "am", "great", "great"]

The dictionary is:

["I", "am", "awesome", "great"]

So the documents as vectors look like this:

Document 1 = [1, 1, 1, 0]
Document 2 = [1, 1, 0, 2]
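
The mapping quoted above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `BagOfWordsExample` and the methods `buildDictionary` and `vectorize` are hypothetical and do not appear in the code from the question. The dictionary is kept in first-seen order so that each word has a fixed position in the vector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original code.
class BagOfWordsExample {
    // Build the dictionary in first-seen order so vector positions are stable.
    static List<String> buildDictionary(List<List<String>> documents) {
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> doc : documents) {
            seen.addAll(doc);
        }
        return new ArrayList<>(seen);
    }

    // Count how often each dictionary word occurs in one document.
    static int[] vectorize(List<String> document, List<String> dictionary) {
        int[] counts = new int[dictionary.size()];
        for (String word : document) {
            int index = dictionary.indexOf(word);
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("I", "am", "awesome");
        List<String> doc2 = Arrays.asList("I", "am", "great", "great");
        List<String> dictionary = buildDictionary(Arrays.asList(doc1, doc2));
        System.out.println(dictionary);                                   // [I, am, awesome, great]
        System.out.println(Arrays.toString(vectorize(doc1, dictionary))); // [1, 1, 1, 0]
        System.out.println(Arrays.toString(vectorize(doc2, dictionary))); // [1, 1, 0, 2]
    }
}
```

`List.indexOf` is linear in the dictionary size, which is fine for a toy example; for a real corpus a `Map<String, Integer>` from word to position would likely be the better choice.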

With this, you can do all kinds of fancy math and feed the vectors into your perceptron.
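
For context, here is a rough sketch of how such count vectors could be fed to a single-layer perceptron with more than two classes. It assumes one weight vector per category and the standard mistake-driven update (add the input to the true class, subtract it from the wrongly predicted class). The class name `PerceptronSketch` and its methods are hypothetical, and this is not necessarily the variant the assignment expects.

```java
// Hypothetical multiclass perceptron sketch, not part of the original code.
class PerceptronSketch {
    final double[][] weights; // one weight vector per class
    final double[] bias;      // one bias term per class

    PerceptronSketch(int numClasses, int numFeatures) {
        weights = new double[numClasses][numFeatures];
        bias = new double[numClasses];
    }

    // Predict the class whose weight vector gives the highest score.
    // Ties are resolved in favour of the lowest class index.
    int predict(int[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            double score = bias[c];
            for (int i = 0; i < x.length; i++) {
                score += weights[c][i] * x[i];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    // On each mistake, move the true class toward x and the predicted class away from it.
    void train(int[][] xs, int[] ys, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int n = 0; n < xs.length; n++) {
                int guess = predict(xs[n]);
                if (guess == ys[n]) {
                    continue;
                }
                for (int i = 0; i < xs[n].length; i++) {
                    weights[ys[n]][i] += xs[n][i];
                    weights[guess][i] -= xs[n][i];
                }
                bias[ys[n]] += 1;
                bias[guess] -= 1;
            }
        }
    }
}
```

With four categories, `numClasses` would be 4 and `numFeatures` would be the size of the global dictionary; each label in `ys` is then the index of the document's category.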

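I have been able to generate the global dictionary, and now I need to create one for each document. The folder structure is quite simple: `/politics/` contains many articles, and for each article I need to build a feature vector against the global dictionary. I think the iterator I am using is what is confusing me.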
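
One way to keep the "one vector per article" requirement separate from the directory traversal is to first collect the files of a category and then tokenize each file on its own. The sketch below uses `java.nio.file.Files.walk` for that; the class name `CorpusWalkSketch`, the method names, and the UTF-8 encoding are assumptions, not details taken from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original code.
class CorpusWalkSketch {
    // Return every regular file below one category directory, e.g. train/politics.
    static List<Path> listDocuments(Path categoryDir) throws IOException {
        try (Stream<Path> paths = Files.walk(categoryDir)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Read one document and split it into whitespace-separated tokens.
    // UTF-8 is an assumption; pass a different charset if the data needs it.
    static List<String> tokenize(Path document) throws IOException {
        List<String> tokens = new ArrayList<>();
        for (String line : Files.readAllLines(document, StandardCharsets.UTF_8)) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(word);
                }
            }
        }
        return tokens;
    }
}
```

Each `Path` returned by `listDocuments` then corresponds to exactly one feature vector, which avoids having the dictionary pass and the vector pass share reader state.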

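This is the main class:

import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class BagOfWords 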
{
    static Set<String> global_dict = new HashSet<String>();

    static boolean global_dict_complete = false; 

    static String path = "/home/Workbench/SUTD/ISTD_50.570/assignments/data/train";

    public static void main(String[] args) throws IOException 
    {
        //each of the different categories
        String[] categories = { "/atheism", "/politics", "/science", "/sports"};

        //cycle through all categories once to populate the global dict
        for(int cycle = 0; cycle < categories.length; cycle++)
        {
            String general_data_partition = path + categories[cycle]; 

            File file = new File( general_data_partition );
            Iterateur.iterateDirectory(file, global_dict, global_dict_complete);
        }   

        //after the global dict has been filled up,
        //cycle through again to populate a set of
        //words for each document and compare it to the
        //global dict. The flag is set before the loop so
        //that every category is processed, not only the last one.
        global_dict_complete = true;
        for(int cycle = 0; cycle < categories.length; cycle++)
        {

            String general_data_partition = path + categories[cycle]; 

            File file = new File( general_data_partition );
            Iterateur.iterateDirectory(file, global_dict, global_dict_complete);
        }

        //print the data structure
        //for (String s : global_dict)
            //System.out.println( s );
    }
}

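This iterates through the data structure:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Set;

public class Iterateur 
{
    static void iterateDirectory(File file, 
                             Set<String> global_dict, 
                             boolean global_dict_complete) throws IOException 
    {
        for (File f : file.listFiles()) 
        {
            if (f.isDirectory()) 
            {
                //recurse into the subdirectory f; passing file again would never terminate
                iterateDirectory(f, global_dict, global_dict_complete);
            } 
            else 
            {
                //try-with-resources closes the reader once the file has been processed
                try (BufferedReader br = new BufferedReader(new FileReader( f )))
                {
                    String line; 

                    while ((line = br.readLine()) != null) 
                    {
                        if (global_dict_complete == false)
                        {
                            Dictionary.populate_dict(file, f, line, br, global_dict);
                        }
                        else
                        {
                            FeatureVecteur.generateFeatureVecteur(file, f, line, br, global_dict);
                        }
                    }
                }
            }
        }
    }
}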

This fills the global dictionary:

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.util.Set;

public class Dictionary 
{

    public static void populate_dict(File file, 
                                 File f, 
                                 String line, 
                                 BufferedReader br, 
                                 Set<String> global_dict) throws IOException
    {
        //line already holds the first line read by the caller,
        //so process it before reading the rest of the file
        do
        {
            String[] words = line.split(" ");//those are your words

            for (String word : words) 
            {
                //Set.add ignores duplicates, so no contains() check is needed
                global_dict.add(word);
            }   
        }
        while ((line = br.readLine()) != null);
    }
}

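This is my first attempt at filling the document-specific dictionary:

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class FeatureVecteur 
{
    public static void generateFeatureVecteur(File file, 
                                          File f, 
                                          String line, 
                                          BufferedReader br, 
                                          Set<String> global_dict) throws IOException
    {
        Set<String> file_dict = new HashSet<String>();

        //line already holds the first line read by the caller,
        //so process it before reading the rest of the file
        do
        {
            String[] words = line.split(" ");//those are your words

            for (String word : words) 
            {
                //Set.add ignores duplicates, so no contains() check is needed
                file_dict.add(word);
            }   
        }
        while ((line = br.readLine()) != null);

        //file_dict now holds this document's distinct words;
        //it is not yet compared against global_dict
    }
}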
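
A `HashSet` has no defined order, so the words in `global_dict` do not yet have positions that a feature vector could use. One possible way to bridge that gap, sketched here under assumptions, is to assign each word an index once and then count tokens per document. The class `FeatureIndexSketch` and the methods `indexWords` and `toVector` are hypothetical names; sorting the words is just one way to make the order reproducible.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of the original code.
class FeatureIndexSketch {
    // Give every dictionary word a fixed position; sorting makes the order reproducible.
    static Map<String, Integer> indexWords(Set<String> globalDict) {
        Map<String, Integer> index = new HashMap<>();
        for (String word : new TreeSet<>(globalDict)) {
            index.put(word, index.size());
        }
        return index;
    }

    // Count the tokens of one document; words missing from the dictionary are skipped.
    static int[] toVector(List<String> tokens, Map<String, Integer> index) {
        int[] vector = new int[index.size()];
        for (String token : tokens) {
            Integer position = index.get(token);
            if (position != null) {
                vector[position]++;
            }
        }
        return vector;
    }
}
```

Note that a `Set<String>` per file, as in `file_dict`, only records whether a word occurs. To get counts such as the 2 for "great" in the example, the tokens themselves (or a `Map<String, Integer>` of counts) have to be kept.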
