distributed-computing - Around 5-10% of executors are LOST in my Mesos framework

Translated from: https://stackoverflow.com/questions/30494267 2015-05-27T22:34:52.370

Viewed 140 times

I have a 200-node Mesos cluster that can run around 2700 executors concurrently. Around 5-10% of my executors go LOST right at startup. They get only as far as extracting the executor tar file; the last lines in their logs are:

    WARNING: Logging before InitGoogleLogging() is written to STDERR
    I0617 21:35:09.947180 45885 fetcher.cpp:76] Fetching URI 'http://download_url/remote_executor.tgz'
    I0617 21:35:09.947273 45885 fetcher.cpp:126] Downloading 'http://download_url/remote_executor.tgz' to '/mesos_dir/remote_executor.tgz'
    I0617 21:35:57.551722 45885 fetcher.cpp:64] Extracted resource '/mesos_dir/remote_executor.tgz' into '/extracting_mesos_dir/'
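
The timestamps in the fetcher log above give a rough idea of how long the download and extraction took for this executor. The snippet below is a minimal sketch for computing that gap. It assumes the usual glog line prefix (severity letter, `mmdd`, time with microseconds) and that both lines were logged on the same day. The helper names `parse_glog_time` and `elapsed_seconds` are hypothetical and not part of Mesos.

```python
import re
from datetime import datetime

# glog prefix: severity letter, mmdd, hh:mm:ss.uuuuuu, then thread id and file:line]
GLOG_RE = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) ")


def parse_glog_time(line):
    """Return the timestamp of a glog line.

    glog does not log the year, so a placeholder year (2000) is used;
    only differences between timestamps are meaningful.
    """
    match = GLOG_RE.match(line)
    if match is None:
        raise ValueError("not a glog line: %r" % line)
    stamp = "2000" + match.group(1) + " " + match.group(2)
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


def elapsed_seconds(start_line, end_line):
    """Seconds between two glog lines (assumed to be from the same day)."""
    delta = parse_glog_time(end_line) - parse_glog_time(start_line)
    return delta.total_seconds()


downloading = ("I0617 21:35:09.947273 45885 fetcher.cpp:126] Downloading "
               "'http://download_url/remote_executor.tgz' to "
               "'/mesos_dir/remote_executor.tgz'")
extracted = ("I0617 21:35:57.551722 45885 fetcher.cpp:64] Extracted resource "
             "'/mesos_dir/remote_executor.tgz' into '/extracting_mesos_dir/'")

fetch_seconds = elapsed_seconds(downloading, extracted)
print("download + extract took %.1f s" % fetch_seconds)  # 47.6 s
```

For the log shown here, roughly 47.6 seconds pass between the start of the download and the end of the extraction. Running the same calculation over the logs of LOST and healthy executors would show whether the lost ones are consistently the slow fetches.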

Please let me know if anyone else has run into this issue.

I am using Python to implement both the scheduler and the executor. The executor code is a Python file that extends the base class `Executor`. I have implemented the `launchTask` method of the `Executor` class, which simply does the work the executor is supposed to do.

The executor info is:

    executor = mesos_pb2.ExecutorInfo()
    executor.executor_id.value = "executor-%s" % (str(task_id),)
    executor.command.value = 'python -m myexecutor'

    # where to download executor from
    tar_uri = '%s/remote_executor.tgz' % (
        self.conf.remote_executor_cache_url)
    executor.command.uris.add().value = tar_uri
    executor.name = 'some_executor_name'
    executor.source = "executor_test"
