Questions tagged “alluxio” - Stack Overflow Japanese site

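0 votes

1 answer

1447 views

scala - Spark Tachyon: How do I delete a file?

As an experiment in Scala, I am using Spark to create a sequence file on Tachyon and read it back. I would also like to delete the file from Tachyon using the Spark script.

I don't know Scala very well, and I can't find a reference on file path manipulation. I found a way to do this from Scala using Java, but I can't get it to work with Tachyon.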

2014-07-19T02:45:23.270

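0 votes

1 answer

416 views

amazon-s3 - Error setting up Tachyon with S3 as the under filesystem

I'm trying to set up Tachyon on an S3 file system. I'm completely new to Tachyon and am still reading up on it. My tachyon-env.sh is shown below:

However, when I try to format Tachyon, I get the following error:

Do I need to change the jets3t jar file, or is it something else? The question may be really basic, but that is exactly where I am right now. I have, however, run some basic tests with Tachyon.

Any help is appreciated!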

amazon-s3 alluxio

2014-10-29T17:52:36.197

0 votes

1 answer

1394 views

apache-spark - Resources/Documentation on how the failover process works for the Spark Driver (and its YARN Container) in yarn-cluster mode

I'm trying to understand if the Spark Driver is a single point of failure when deploying in cluster mode for Yarn. So I'd like to get a better grasp of the innards of the failover process regarding the YARN Container of the Spark Driver in this context.

I know that the Spark Driver will run in the Spark Application Master inside a YARN Container. The Spark Application Master will request resources from the YARN Resource Manager if required. But I haven't been able to find a document with enough detail about the failover process in the event of the YARN Container of the Spark Application Master (and Spark Driver) failing.

I'm trying to find detailed resources that would let me answer some questions about the following scenario: if the host machine of the YARN Container that runs the Spark Application Master / Spark Driver loses network connectivity for 1 hour:

Does the YARN Resource Manager spawn a new YARN Container with another Spark Application Master/Spark Driver?
In that case (spawning a new YARN Container), does it start the Spark Driver from scratch if at least 1 stage in 1 of the Executors had been completed and notified as such to the original Driver before it failed? Does the option used in persist() make a difference here? And will the new Spark Driver know that the executor had completed 1 stage? Would Tachyon help out in this scenario?
Does a failback process get triggered if network connectivity is recovered on the host machine of the original Spark Application Master's YARN Container? I guess that this behaviour can be controlled from YARN, but I don't know what the default is when deploying Spark in cluster mode.

I'd really appreciate it if you could point me to some documents / web pages where the architecture of Spark in yarn-cluster mode and the failover process are explored in detail.

apache-spark hadoop hadoop-yarn alluxio

2015-01-18T12:29:23.377

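0 votes

1 answer

436 views

apache-spark - Tachyon: Failed to rename during copyFromLocal command

I am building an application with Apache Spark. To make RDDs available to other applications, I am trying the following two approaches:

Using Tachyon
Using spark-jobserver

I am new to Tachyon. I have completed the following tasks described in "Running Tachyon on a Cluster":

I can access the UI at the master:19999 URL.

From the tachyon directory, I successfully created a directory with ./bin/tachyon tfs mkdir /Test, but when I try to run the copyFromLocal command, I get the following error: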

apache-spark alluxio

2015-01-21T12:17:01.080

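0 votes

1 answer

852 views

apache-spark - Is Tachyon implemented by default by the RDDs in Apache Spark?

I'm trying to understand Spark's in-memory features. In the process I came across Tachyon, which is essentially an in-memory data layer that provides fault tolerance without replication by using a lineage system, and reduces recomputation by checkpointing datasets. What confuses me is that all of these features also seem achievable with Spark's standard RDD system. So do RDDs implement Tachyon behind the curtain to provide these features? If not, what is the use of Tachyon when standard RDDs can do all of this work? Or am I making a mistake in relating the two? A detailed explanation or a link would be very helpful. Thank you.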
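The "fault tolerance without replication" idea mentioned in the question can be illustrated with a toy sketch. This is not how Spark or Tachyon is actually implemented; the `Dataset` class, its fields, and the sample data are hypothetical names invented for this example. It only shows the general principle: each derived dataset remembers its parent and the transformation that produced it, so a lost in-memory copy can be recomputed instead of being restored from a replica.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dataset:
    """Toy dataset that remembers how it was derived (its lineage)."""
    name: str
    parent: Optional["Dataset"] = None
    transform: Optional[Callable[[List[int]], List[int]]] = None
    cached: Optional[List[int]] = None  # in-memory copy; may be lost

    def compute(self) -> List[int]:
        # Serve from memory while the data is still there.
        if self.cached is not None:
            return self.cached
        # A source dataset has no lineage to replay.
        if self.parent is None or self.transform is None:
            raise RuntimeError(f"{self.name}: data lost and no lineage to replay")
        # Otherwise replay the recorded transformation on the parent.
        self.cached = self.transform(self.parent.compute())
        return self.cached


source = Dataset("source", cached=[1, 2, 3, 4])
doubled = Dataset("doubled", parent=source,
                  transform=lambda xs: [x * 2 for x in xs])
total = Dataset("total", parent=doubled,
                transform=lambda xs: [sum(xs)])

first = total.compute()

# Simulate losing the in-memory copies of both derived datasets.
doubled.cached = None
total.cached = None

# The result is rebuilt by replaying the lineage; no replica was kept.
recovered = total.compute()
```

In this sketch a checkpoint would correspond to keeping `cached` for an intermediate dataset in durable storage, so that replaying the lineage can stop there instead of going all the way back to the source.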

apache-spark bigdata rdd in-memory-database alluxio

2015-04-22T13:53:45.447

0 votes

1 answer

88 views

apache-spark - apache-spark deployment: standalone vs. multiple VMs

I have a single machine on which to deploy Spark, Hadoop, and Tachyon. Will Spark operations from hdfs/tachyon be faster on a single node using all cores/RAM, or on multiple VM nodes that split the resources evenly? RAM is under 200 GB.

"Performance and Scalability of Broadcast in Spark" is fairly old, but it suggests that increased network traffic could significantly affect the single-node vs. multiple-VMs question.

apache-spark hadoop hdfs alluxio

2015-05-21T17:12:14.913
