python-3.x - MLflow: Active run ID does not match environment run ID

Question

OS: Ubuntu 18

Python: 3.6

MLflow: 1.4

I am trying to run an MLflow project. This is my project:

MLflow
- conda.yaml
- main.py
- prep_data.py
- learn.py

The project is heavily based on this repository: https://github.com/mlflow/mlflow/tree/master/examples/multistep_workflow

I am trying to run both the prep_data and learn scripts as an MLflow project, with the main.py script as the entry point. To run it, I use the following command:

mlflow run . -P experiment_name=testproject

However, I get the following error:

Traceback (most recent call last):
  File "prep_data.py", line 126, in <module>
    prep_data()
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "prep_data.py", line 65, in prep_data
    with mlflow.start_run() as active_run:
  File "/home/ubuntu/venv/lib/python3.6/site-packages/mlflow/tracking/fluent.py", line 129, in start_run
    "arguments".format(existing_run_id))
mlflow.exceptions.MlflowException: Cannot start run with ID 405b83bbb61046afa83b8dcd71b4db14 because active run ID does not match environment run ID. Make sure --experiment-name or --experiment-id matches experiment set with set_experiment(), or just use command-line arguments
Traceback (most recent call last):
  File "main.py", line 75, in <module>
    workflow()
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 61, in workflow
    }, experiment_name)
  File "main.py", line 40, in _get_or_run
    submitted_run = mlflow.run('.', entry_point=entry_point, parameters=params)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/mlflow/projects/__init__.py", line 287, in run
    _wait_for(submitted_run_obj)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/mlflow/projects/__init__.py", line 304, in _wait_for
    raise ExecutionException("Run (ID '%s') failed" % run_id)
mlflow.exceptions.ExecutionException: Run (ID '405b83bbb61046afa83b8dcd71b4db14') failed
2019/11/22 18:51:59 ERROR mlflow.cli: === Run (ID '62c229b2d9194b569a7b2bfc14338800') failed ===

I'm not sure I understand the error correctly, but it seems to say that I am using more than one experiment. However, I am sure that I am only using one (testproject). Browsing SO and GitHub issues suggested that I need to set the MLFLOW_TRACKING_URI environment variable, but none of them said how to set it. So I tried two different ways:

1) Exporting it before running the MLflow project: $ export MLFLOW_TRACKING_URI='http://127.0.0.1:5099'
2) Setting it at the top of the main.py script using Python: os.environ['MLFLOW_TRACKING_URI'] = 'http://127.0.0.1:5099'

Neither had any effect. Here you can see my project:

main.py

import os
import click
import mlflow
from mlflow.entities import RunStatus
def _already_ran(entry_point, params, experiment_name):
    """Return the most recent FINISHED run whose logged params match, or None."""
    # experiment = mlflow.get_experiment_by_name('{}_{}'.format(experiment_name, entry_point))
    experiment = mlflow.get_experiment_by_name(experiment_name)
    if experiment is None:
        return None
    experiment_id = experiment.experiment_id
    client = mlflow.tracking.MlflowClient()
    all_run_infos = reversed(client.list_run_infos(experiment_id))
    for run_info in all_run_infos:
        full_run = client.get_run(run_info.run_id)
        # Reset per run so one mismatching run does not reject all later ones.
        match_failed = False
        for p_key, p_val in params.items():
            run_value = full_run.data.params.get(p_key)
            if run_value != p_val:
                match_failed = True
                break
        if match_failed:
            continue
        # Skip runs that did not complete successfully.
        if run_info.to_proto().status != RunStatus.FINISHED:
            continue
        return client.get_run(run_info.run_id)
    return None


def _get_or_run(entry_point, params, experiment_name, use_cache=True):
    existing_run = _already_ran(entry_point, params, experiment_name)
    if use_cache and existing_run:
        return existing_run
    submitted_run = mlflow.run('.', entry_point=entry_point, parameters=params)
    return mlflow.tracking.MlflowClient().get_run(submitted_run.run_id)

@click.command()
@click.option("--experiment-name")
@click.option('--prep-data-time-avg', default='placeholder')
@click.option('--prep-data-sensor-id', default='placeholder')
@click.option('--learn-epochs', default=100, type=int)
@click.option('--learn-neurons', default=5, type=int)
@click.option('--learn-layers', default=2, type=int)
def workflow(experiment_name, prep_data_time_avg, prep_data_sensor_id, learn_epochs, learn_neurons, learn_layers):
    # mlflow.set_tracking_uri('http://127.0.0.1:5099')

    # mlflow.set_experiment(experiment_name)
    # with mlflow.start_run() as active_run:

    data_run = _get_or_run('prep_data', {
        'time_avg': prep_data_time_avg,
        'sensor_id': prep_data_sensor_id,
        'experiment_name': experiment_name
    }, experiment_name)

    learn_run = _get_or_run('learn', {
        'epochs': learn_epochs,
        'neurons': learn_neurons,
        'layers': learn_layers,
        'prep_data_run_id': data_run.run_id,
        'experiment_name': experiment_name,
    }, experiment_name)
if __name__ == '__main__':
    # os.environ['MLFLOW_TRACKING_URI'] = 'http://127.0.0.1:5099'
    workflow()

prep_data.py

import click
import mlflow


@click.command()
@click.option("--experiment-name")
@click.option('--time-avg', default='placeholder')
@click.option('--sensor-id', default='placeholder')
def prep_data(experiment_name, time_avg, sensor_id):
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run() as active_run:
        # logic code of prep_data
        pass


if __name__ == '__main__':
    prep_data()

I would appreciate any ideas on how to solve this problem.

Thank you very much!

Cheers, Raphael

score 1 · Accepted Answer

You need to pass the same experiment name to the mlflow CLI:

mlflow run . -P experiment_name=testproject --experiment-name testproject
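To see why the extra flag matters, here is a simplified model of the check that, as far as I can tell from the traceback, start_run() performs. It is a sketch, not MLflow's source: check_run_matches_experiment, ExperimentMismatch and the run_experiment_ids dictionary are hypothetical names made up for illustration. The assumption is that mlflow run creates the run up front, hands its ID to the script through the MLFLOW_RUN_ID environment variable, and start_run() then refuses to resume that run if set_experiment() selected a different experiment.

```python
RUN_ID_ENV_VAR = "MLFLOW_RUN_ID"  # set by `mlflow run` for the launched script


class ExperimentMismatch(Exception):
    """Stand-in for mlflow.exceptions.MlflowException in this sketch."""


def check_run_matches_experiment(active_experiment_id, run_experiment_ids, environ):
    """Return the run ID to resume, or None if the environment carries none.

    active_experiment_id: ID selected via set_experiment(), or None if it
        was never called.
    run_experiment_ids: hypothetical dict mapping run ID -> experiment ID;
        it stands in for a lookup against the tracking server.
    environ: mapping of environment variables (e.g. os.environ).
    """
    run_id = environ.get(RUN_ID_ENV_VAR)
    if run_id is None:
        # Script was started directly, so there is no pre-created run.
        return None
    run_experiment_id = run_experiment_ids[run_id]
    # The pre-created run must live in the experiment the script selected.
    if active_experiment_id is not None and active_experiment_id != run_experiment_id:
        raise ExperimentMismatch(
            "Cannot start run with ID {} because active run ID does not "
            "match environment run ID.".format(run_id)
        )
    return run_id
```

Without --experiment-name, the run is created in the default experiment, while the script's set_experiment('testproject') selects another one, so the two IDs differ and the check fails. With the flag, both refer to the same experiment.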

Details: https://www.mlflow.org/docs/latest/cli.html#mlflow-run
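The same idea should apply to the nested mlflow.run(...) calls in main.py: mlflow.run accepts an experiment_name keyword, so each step can be launched under the experiment that the step later selects with set_experiment(). A minimal sketch, assuming that is what you want; run_step and its run_fn parameter are hypothetical names, and run_fn exists only so the helper can be tried without a tracking server.

```python
def run_step(entry_point, params, experiment_name, run_fn=None):
    """Launch one project entry point under the given experiment.

    run_fn defaults to mlflow.run; passing a replacement is a hypothetical
    hook for testing and is not part of MLflow.
    """
    if run_fn is None:
        import mlflow  # imported lazily so the sketch loads without MLflow

        run_fn = mlflow.run
    # Pass the experiment explicitly instead of relying on the default one,
    # so the child run is created where set_experiment() will point.
    return run_fn(
        ".",
        entry_point=entry_point,
        parameters=params,
        experiment_name=experiment_name,
    )
```

In _get_or_run, this would replace the bare mlflow.run('.', entry_point=entry_point, parameters=params) call, for example run_step('prep_data', params, experiment_name).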
