python - How do I use a kfp Artifact with sklearn?

Question

I am trying to build a custom pipeline from Kubeflow Pipelines (kfp) components in Vertex AI (Google Cloud Platform). The pipeline steps are:

1. Read data from a BigQuery table
2. Create a pandas DataFrame
3. Use the DataFrame to train a K-Means model
4. Deploy the model to an endpoint

Here is the code for step 2. The pd.DataFrame type I found here did not work as an output type, so I had to use Output[Artifact] instead.

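# KFP SDK v1.8 style import; with KFP 2.x the same names live in kfp.dsl.
from kfp.v2.dsl import Artifact, Output, component

@component(base_image="python:3.9", packages_to_install=["google-cloud-bigquery","pandas","pyarrow"])
def create_dataframe(
    project: str,
    region: str,
    destination_dataset: str,
    destination_table_name: str,
    df: Output[Artifact],
):
    from google.cloud import bigquery

    # Look up the source table in BigQuery.
    client = bigquery.Client(project=project, location=region)
    dataset_ref = bigquery.DatasetReference(project, destination_dataset)
    table_ref = dataset_ref.table(destination_table_name)
    table = client.get_table(table_ref)

    # Note: this only rebinds the local name df to a DataFrame; nothing is
    # written to the file at df.path, so the output artifact stays empty.
    df = client.list_rows(table).to_dataframe()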
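In KFP, an `Output[Artifact]` parameter is not a variable you assign a value to: the component receives an artifact object whose `.path` attribute is a file location, and downstream steps only see what was written to that file. A minimal sketch of the write side follows. It assumes CSV as the on-disk format (a choice, not a KFP requirement), and it uses a `SimpleNamespace` as a hypothetical stand-in for the artifact so it runs without kfp installed; the helper name `save_dataframe_to_artifact` is illustrative.

```python
import os
import tempfile
from types import SimpleNamespace

import pandas as pd


def save_dataframe_to_artifact(frame: pd.DataFrame, artifact) -> None:
    """Write `frame` to the file the output artifact points at.

    Inside the real component this would be roughly:
        client.list_rows(table).to_dataframe().to_csv(df.path, index=False)
    """
    frame.to_csv(artifact.path, index=False)


# Hypothetical stand-in for Output[Artifact]; only `.path` is used.
tmp_dir = tempfile.mkdtemp()
out_artifact = SimpleNamespace(path=os.path.join(tmp_dir, "df.csv"))

frame = pd.DataFrame({"x": [1.0, 2.0, 10.0], "y": [1.0, 2.5, 11.0]})
save_dataframe_to_artifact(frame, out_artifact)
print(os.path.exists(out_artifact.path))
```

The kfp SDK also provides a `Dataset` artifact type (`Output[Dataset]` / `Input[Dataset]`), which may describe this output more precisely than the generic `Artifact`; the file-writing pattern is the same.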

Here is the code for step 3:

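# KFP SDK v1.8 style import; with KFP 2.x the same names live in kfp.dsl.
from kfp.v2.dsl import Artifact, Input, Model, Output, component

# The PyPI package is "scikit-learn"; the "sklearn" package name is deprecated.
@component(base_image="python:3.9", packages_to_install=["scikit-learn"])
def kmeans_training(
        dataset: Input[Artifact],
        model: Output[Model],
        num_clusters: int,
):
    from sklearn.cluster import KMeans
    # Rebinding model here shadows the Output[Model] parameter.
    model = KMeans(num_clusters, random_state=220417)
    # dataset is the Artifact object itself, not its data; this call raises the TypeError below.
    model.fit(dataset)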
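On the training side, the same idea applies in reverse: read the file at `dataset.path`, fit on the resulting array, and write the fitted model to `model.path`. The sketch below assumes the upstream step wrote a CSV, keeps only numeric columns (an assumption about the data), and uses `joblib` (installed as a scikit-learn dependency) to serialize the model. `SimpleNamespace` objects are hypothetical stand-ins for `Input[Artifact]` and `Output[Model]`, and `train_kmeans_from_artifacts` is an illustrative name, not a kfp API.

```python
import os
import tempfile
from types import SimpleNamespace

import joblib
import pandas as pd
from sklearn.cluster import KMeans


def train_kmeans_from_artifacts(dataset, model, num_clusters: int) -> KMeans:
    """Load the CSV behind `dataset`, fit K-Means, and save it to `model.path`."""
    frame = pd.read_csv(dataset.path)
    # KMeans needs numeric features; drop non-numeric columns (assumption about the data).
    features = frame.select_dtypes(include="number").to_numpy()
    # A separate local name, so the Output[Model] parameter is not shadowed.
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=220417)
    kmeans.fit(features)
    joblib.dump(kmeans, model.path)
    return kmeans


# Hypothetical stand-ins for Input[Artifact] and Output[Model].
tmp_dir = tempfile.mkdtemp()
dataset = SimpleNamespace(path=os.path.join(tmp_dir, "df.csv"))
model = SimpleNamespace(path=os.path.join(tmp_dir, "model.joblib"))

pd.DataFrame({"x": [1.0, 1.1, 9.0, 9.2], "y": [1.0, 0.9, 9.1, 8.8]}).to_csv(dataset.path, index=False)
fitted = train_kmeans_from_artifacts(dataset, model, num_clusters=2)
print(fitted.labels_)
```

Inside the real component, the body of `train_kmeans_from_artifacts` would go directly in `kmeans_training`, with `pandas` added to `packages_to_install`.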

The pipeline run stopped with the following error:

TypeError: float() argument must be a string or a number, not 'Artifact'
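The message follows from the fact that an artifact object is only a handle (metadata plus a file location), not the data. scikit-learn's input validation converts X to a float array with NumPy, and NumPy falls back to calling `float()` on an object it does not recognize as numeric. A minimal reproduction, using a hypothetical `Artifact` class as a stand-in for the kfp one and a hypothetical helper `as_float_array` that approximates that conversion:

```python
import numpy as np


class Artifact:
    """Hypothetical stand-in for a KFP artifact: a file location, not the data itself."""

    def __init__(self, path: str):
        self.path = path


def as_float_array(x):
    # Roughly the conversion scikit-learn's input validation performs on X.
    return np.asarray(x, dtype=np.float64)


try:
    as_float_array(Artifact("/tmp/df.csv"))
except TypeError as exc:
    print("TypeError:", exc)
```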

Is it possible to convert an Artifact into a numpy array or a DataFrame?
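The artifact itself cannot be converted, but the file it points to can be read. A minimal sketch, assuming the upstream step wrote a CSV to the artifact's `.path` (Parquet via `pyarrow` would work the same way with `read_parquet`); `artifact_to_dataframe` and `artifact_to_numpy` are hypothetical helper names, and `SimpleNamespace` stands in for `Input[Artifact]`.

```python
import os
import tempfile
from types import SimpleNamespace

import numpy as np
import pandas as pd


def artifact_to_dataframe(artifact) -> pd.DataFrame:
    """Read the CSV file an Input[Artifact] points at (CSV format is an assumption)."""
    return pd.read_csv(artifact.path)


def artifact_to_numpy(artifact) -> np.ndarray:
    """The numeric columns of the same data as a float array, ready for scikit-learn."""
    frame = artifact_to_dataframe(artifact)
    return frame.select_dtypes(include="number").to_numpy(dtype=float)


# Hypothetical stand-in for an Input[Artifact] written by an upstream step.
tmp_dir = tempfile.mkdtemp()
artifact = SimpleNamespace(path=os.path.join(tmp_dir, "df.csv"))
pd.DataFrame({"id": ["a", "b"], "x": [1, 2], "y": [3.5, 4.5]}).to_csv(artifact.path, index=False)

print(artifact_to_numpy(artifact))
```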
