apache-spark-sql - REST Web サービスでクエリをレポートするために Spark sql を使用できますか

Question

Spark に関するいくつかの基本的な質問。ジョブの処理のコンテキストでのみスパークを使用できますか?このユースケースでは、位置データとモーションデータのストリームがあり、これを調整して Cassandra テーブルに保存できます。いくつかの検索基準でいくつかのレポートを表示したい場合は、Spark (Spark SQL) を使用できますか?または、この目的のために cql に制限する必要がありますか? spark を使用できる場合、Tomcat サーバーにデプロイされた Web サービスから spark-sql を呼び出すにはどうすればよいでしょうか。

score 0 · Accepted Answer

次のような HTML アドレスを介して SQL リクエストを渡すことで、それを行うことができます。

http://yourwebsite.com/Requests?query=WOMAN

受信ポイントでは、アーキテクチャは次のようになります。

Tomcat+Servlet --> Apache Kafka/Flume --> Spark Streaming --> Spark SQL inside a SS closure

Tomcat の webapplication フォルダーにあるサーブレット (サーブレットが何かわからない場合は調べてください) には、次のようなものがあります。

public class QueryServlet extends HttpServlet{
    @Override
    public void doGet(ttpServletRequest request, HttpServletResponse response){
        String requestChoice = request.getQueryString().split("=")[0];
        String requestArgument = request.getQueryString().split("=")[1];
        KafkaProducer<String, String> producer;

            Properties properties = new Properties();
            properties.setProperty("bootstrap.servers", "localhost:9092");
            properties.setProperty("acks", "all");
            properties.setProperty("retries", "0");
            properties.setProperty("batch.size", "16384");
            properties.setProperty("auto.commit.interval.ms", "1000");
            properties.setProperty("linger.ms", "0");
            properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            properties.setProperty("block.on.buffer.full", "true");
            producer = new KafkaProducer<>(properties);
            producer.send(new ProducerRecord<String, String>(
                    requestChoice,
                    requestArgument));

Spark Streaming 実行中のアプリケーション (クエリをキャッチするために実行する必要があります。そうしないと、Spark の開始にかかる時間がわかります) では、Kafka レシーバーが必要です。

JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(batchInt*1000));

    Map<String, Integer> topicMap = new HashMap<>();
    topicMap.put("wearable", 1);

    //FIrst Dstream is a couple made by the topic and the value written to the topic
    JavaPairReceiverInputDStream<String, String> kafkaStream =
            KafkaUtils.createStream(jssc, "localhost:2181", "test", topicMap);

この後、何が起こるかというと、

GET 本体を設定するか、クエリに引数を指定して GET を実行します。
GET はサーブレットによってキャッチされ、すぐに Kafka Producer を作成、送信、閉じます (実際には Kafka ステップを回避することができます。Spark Streaming アプリに他の方法で情報を送信するだけです。SparkStreaming レシーバーを参照してください)。
Spark Streaming は、他の送信された Spark アプリケーションと同じように SparkSQL コードを操作しますが、他のクエリが来るのを待って実行を続けます。

もちろん、サーブレットではリクエストの有効性をチェックする必要がありますが、これが主な考え方です。または、少なくとも私が使用してきたアーキテクチャ

apache-spark-sql - REST Web サービスでクエリをレポートするために Spark sql を使用できますか

1 に答える 1

Related

Reference