csv - How to read a CSV file from HDFS?

Question

I have data in a CSV file, and I want to read that CSV file from HDFS.

Can anyone help me with the code?

I am new to Hadoop. Thanks in advance.

score 6 · Accepted Answer

The classes required for this are FileSystem, FSDataInputStream, and Path. The client should look something like this:

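import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadClient {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Load the cluster configuration so that FileSystem.get() resolves to HDFS.
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        try (FSDataInputStream inputStream = fs.open(new Path("/path/to/input/file"))) {
            // readChar() reads two bytes as one UTF-16 char; for CSV text, wrap the stream in a reader instead.
            System.out.println(inputStream.readChar());
        }
    }
}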

FSDataInputStream has several read methods. Choose the one that suits your needs.
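For a text format such as CSV, one common choice is to wrap the stream in a BufferedReader and read it line by line. Below is a minimal sketch of that idea, assuming the file is UTF-8 encoded. The class name CsvLineReader and the method readLines are hypothetical names made up for illustration. The method accepts any java.io.InputStream, so the stream returned by fs.open(...) could be passed in directly. The main method uses an in-memory stream as a stand-in so the sketch runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a text stream line by line.
final class CsvLineReader {

    // Reads every line of the stream as UTF-8 text (an assumption about the
    // file encoding) and closes the stream when done.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for fs.open(new Path("/path/to/input/file")).
        InputStream in = new ByteArrayInputStream(
                "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(in)) {
            System.out.println(line);
        }
    }
}
```

Collecting all lines into a list is only reasonable for small files. For a large file you would process each line inside the loop instead.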

With MapReduce (MR) it is even simpler:

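import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Replace Your_Wish with the output key and value types you need.
public static class YourMapper extends
        Mapper<LongWritable, Text, Your_Wish, Your_Wish> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // The framework does the reading for you:
        // value holds one line of your CSV file.
        String line = value.toString();
        // do your processing here
        // ...
        // Emit your output key and value.
        context.write(Your_Wish, Your_Wish);
    }
}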
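As an illustration of what the "do your processing here" step might look like, here is a minimal sketch written as plain Java so it can be tried without a cluster. It assumes a file with a header row and at least two columns, and turns each line into a key/value pair taken from the first two columns. The names CsvLineToPair and toPair are hypothetical. The offset parameter mirrors the LongWritable key, which with the default text input is the byte offset of the line within the file, so offset 0 is treated here as the header.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper holding the per-line logic a mapper could call.
final class CsvLineToPair {

    // offset: byte offset of the line in the file (what the LongWritable key carries).
    // Returns an empty Optional for lines that should not produce output.
    static Optional<Map.Entry<String, String>> toPair(long offset, String line) {
        if (offset == 0 || line.trim().isEmpty()) {
            return Optional.empty();           // assumed header row, or a blank line
        }
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length < 2) {
            return Optional.empty();           // too few columns for this sketch
        }
        Map.Entry<String, String> pair =
                new SimpleEntry<>(fields[0].trim(), fields[1].trim());
        return Optional.of(pair);
    }

    public static void main(String[] args) {
        System.out.println(toPair(0L, "id,name"));
        System.out.println(toPair(8L, "1,alice"));
    }
}
```

Inside map(), the equivalent call would be toPair(key.get(), value.toString()), followed by context.write(...) with the pair wrapped in whatever output types you chose. The plain split on commas does not handle quoted fields.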

score 2 · Accepted Answer

If you use MapReduce, you can use TextInputFormat to read the file line by line and parse each line in your mapper's map function.
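Parsing each line can be as simple as splitting on commas, but that breaks when a field contains a quoted comma. The sketch below shows one way to split a single line while respecting double quotes, assuming the common convention that a doubled quote inside a quoted field stands for a literal quote. CsvFieldSplitter and split are hypothetical names. In a real job, an established CSV library may be a better choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one CSV line into fields, honoring double quotes.
final class CsvFieldSplitter {

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');   // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);         // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;               // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());        // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1,\"Smith, John\",42"));
    }
}
```

In the mapper, this would be applied to value.toString() for each call to map().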

Another option is to develop (or find an existing) CSV input format for reading the data from the file.
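One situation where a dedicated CSV input format can help is a quoted field that contains a line break, because line-by-line reading would cut that record in two. The sketch below shows only the record-assembly idea in plain Java: physical lines are joined until the double quotes balance. CsvRecordAssembler and readRecords are hypothetical names. A real Hadoop input format would also need a record reader and would have to deal with records that cross input split boundaries, which is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups physical lines into logical CSV records.
final class CsvRecordAssembler {

    static List<String> readRecords(BufferedReader reader) {
        List<String> records = new ArrayList<>();
        StringBuilder record = new StringBuilder();
        boolean inQuotes = false;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (inQuotes) {
                    record.append('\n');       // restore the break readLine() removed
                }
                record.append(line);
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == '"') {
                        inQuotes = !inQuotes;  // a doubled quote toggles twice, so no net change
                    }
                }
                if (!inQuotes) {               // quotes balanced: the record is complete
                    records.add(record.toString());
                    record.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (inQuotes) {
            records.add(record.toString());    // unterminated quote: keep what was read
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "1,\"line one\nline two\"\n2,plain\n";
        for (String r : readRecords(new BufferedReader(new StringReader(data)))) {
            System.out.println("[" + r + "]");
        }
    }
}
```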

Here is an old tutorial; the logic is the same in newer versions: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html

If you are using a single process to read the data from the file, it is the same as reading a file from any other file system. There is a good example here: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs

HTH
