I'm running Spark 2.0 in standalone mode. I successfully configured it to launch on a server and was also able to set up an IPython/PySpark kernel as an option in Jupyter Notebook. Everything works, but I'm facing the problem that each notebook I launch gets all 4 of my workers assigned to its application. So if another person from my team tries to launch another notebook with the PySpark kernel, it simply does not work until I stop the first notebook and release all the workers.
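For context, this is roughly what each notebook kernel does when it starts (a minimal sketch, assuming the kernel just builds a plain SparkSession; the master URL is taken from the worker log further down, the app name is hypothetical). My understanding is that in standalone mode an application that does not set spark.cores.max claims every free core, which is why a single notebook ends up holding all four workers:

from pyspark.sql import SparkSession

# Sketch of what every notebook session effectively does on startup.
# Without spark.cores.max, a standalone application claims all free cores.
spark = (
    SparkSession.builder
    .master("spark://cerberus:7077")      # master URL as seen in the worker log below
    .appName("jupyter-notebook-session")  # hypothetical app name
    .getOrCreate()
)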
To solve this problem, I'm trying to follow the instructions on dynamic allocation from the Spark 2.0 documentation.
So, in my $SPARK_HOME/conf/spark-defaults.conf I have the following lines:
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.executorIdleTimeout 10
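(As an aside, my understanding is that the same three settings can also be passed per application instead of globally; a minimal sketch from the notebook side, assuming the session is built there:

from pyspark.sql import SparkSession

# Hypothetical per-application equivalent of the spark-defaults.conf entries above.
spark = (
    SparkSession.builder
    .master("spark://cerberus:7077")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.dynamicAllocation.executorIdleTimeout", "10")
    .getOrCreate()
)

Either way, dynamic allocation requires the external shuffle service to be reachable on every worker node, which is where the problem described below comes in.)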
Also, in $SPARK_HOME/conf/spark-env.sh I have:
export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_WORKER_INSTANCES=4
export SPARK_WORKER_CORES=1
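With this layout each machine runs four 1-core workers with 1 GB each, and executors ask for 512 MB. To see what a given notebook session actually received, I check it from the notebook itself; a small diagnostic sketch (assuming spark is the session object):

# Diagnostic sketch: inspect what the running notebook application was given.
sc = spark.sparkContext
print(sc.defaultParallelism)                                 # total cores granted to this app
print(sc.getConf().get("spark.cores.max", "not set"))        # core cap, unset in my case
print(sc.getConf().get("spark.executor.memory", "default"))  # executor memory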
But when I try to launch the workers using $SPARK_HOME/sbin/start-slaves.sh, only the first worker is successfully launched. The log from the first worker ends like this:
16/11/24 13:32:06 INFO Worker: Successfully registered with master spark://cerberus:7077
But the logs from workers 2-4 show this error:
INFO ExternalShuffleService: Starting shuffle service on port 7337 with useSasl = false
16/11/24 13:32:08 ERROR Inbox: Ignoring error
java.net.BindException: Address already in use
It seems (to me) that the first worker successfully starts the shuffle service on port 7337, but workers 2-4 do not know about it and try to launch another shuffle service on the same port.
The problem also occurs for all workers (1-4) if I first launch a shuffle service (using $SPARK_HOME/sbin/start-shuffle-service.sh) and then try to launch all the workers ($SPARK_HOME/sbin/start-slaves.sh).
Is there any option to get around this, so that all workers can verify whether a shuffle service is already running and connect to it instead of trying to create a new one?