
I am running Spark on Amazon EMR with YARN as the cluster manager. I am trying to write a Python app which starts up and caches data in memory. How can I allow other Python programs to access that cached data? i.e.

I start an app Pcache -> it caches data, and I keep that app running. Another user can then access that same cached data from a different instance.

My understanding was that it should be possible to get a handle on the already running SparkContext and access that data. Is that possible? Or do I need to set up an API on top of that Spark app to access the data, or maybe use something like Spark Job Server or Livy?


2 Answers


You cannot share a SparkContext across multiple processes. In practice, your options are to build the API yourself (one server holds the SparkContext, and its clients tell it what to do with it), or to use Spark Job Server, which is a generic implementation of that same idea.
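A minimal sketch of that server/client split, using only the standard library so the pattern is clear: a single long-running process owns the shared state and exposes it over HTTP, and other programs query it. Here a plain dict (`CACHE`) stands in for the DataFrame your Spark app would keep pinned with `.cache()`; the handler class and endpoint names are illustrative, not part of Spark Job Server or Livy. In a real deployment the handler would run a query against the one shared SparkContext instead of reading a dict.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stand-in for data the long-running Spark app would hold in memory.
CACHE = {"users": [{"id": 1, "name": "alice"}]}

class CacheHandler(BaseHTTPRequestHandler):
    """Serves the in-memory cache; real code would query the SparkContext."""

    def do_GET(self):
        key = self.path.lstrip("/")
        body = json.dumps(CACHE.get(key, [])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging for the example.
        pass

def start_server(port=0):
    """Start the cache server on a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    srv = start_server()
    port = srv.server_address[1]
    # A second program (the "other user") reads the shared cache over HTTP.
    with urlopen(f"http://127.0.0.1:{port}/users") as resp:
        print(json.loads(resp.read()))
    srv.shutdown()
```

Spark Job Server and Livy implement this same shape for you: they keep named Spark contexts alive server-side and let multiple clients submit work against them over REST, so you do not have to maintain the server yourself.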

answered 2016-02-03T10:29:01.350