scala - スパークストリーミングファイルストリーム

Question

Spark Streaming でプログラミングしていますが、scala に問題があります。関数 StreamingContext.fileStream を使用しようとしています

この関数の定義は次のようになります。

def fileStream[K, V, F <: InputFormat[K, V]](directory: String)(implicit arg0: ClassManifest[K], arg1: ClassManifest[V], arg2: ClassManifest[F]): DStream[(K, V)]

新しいファイルの Hadoop 互換ファイルシステムを監視し、指定されたキーと値の型と入力形式を使用してそれらを読み取る入力ストリームを作成します。で始まるファイル名。は無視されます。K HDFS ファイルを読み取るためのキータイプ V HDFS ファイルを読み取るための値のタイプ F HDFS ファイルを読み取るための入力形式ディレクトリ新しいファイルを監視する HDFS ディレクトリ

Key と Value の型を渡す方法がわかりません。スパークストリーミングのマイコード:

val ssc = new StreamingContext(args(0), "StreamingReceiver", Seconds(1),
  System.getenv("SPARK_HOME"), Seq("/home/mesos/StreamingReceiver.jar"))

// Create a NetworkInputDStream on target ip:port and count the
val lines = ssc.fileStream("/home/sequenceFile")

Hadoop ファイルを書き込む Java コード:

public class MyDriver {

private static final String[] DATA = { "One, two, buckle my shoe",
        "Three, four, shut the door", "Five, six, pick up sticks",
        "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);
    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
        writer = SequenceFile.createWriter(fs, conf, path, key.getClass(),
                value.getClass());
        for (int i = 0; i < 100; i++) {
            key.set(100 - i);
            value.set(DATA[i % DATA.length]);
            System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key,
                    value);
            writer.append(key, value);
        }
    } finally {
        IOUtils.closeStream(writer);
    }
}

}

score 9 · Accepted Answer

を使用する場合fileStreamは、呼び出すときに 3 つの型パラメーターすべてを指定する必要があります。を呼び出す前にKey、、、Valueおよびタイプが何であるかを知る必要があります。InputFormatタイプがLongWritable, Textandの場合、次のようTextInputFormatに呼び出します。fileStream

val lines = ssc.fileStream[LongWritable, Text, TextInputFormat]("/home/sequenceFile")

これらの 3 つのタイプがたまたまあなたのタイプである場合は、textFileStream代わりに使用することをお勧めします。これは、タイプパラメーターを必要とせず、fileStream私が言及した 3 つのタイプを使用するための委任を行うためです。それを使用すると、次のようになります。

val lines = ssc.textFileStream("/home/sequenceFile")

scala - スパークストリーミングファイルストリーム

2 に答える 2

Related

Reference