Questions tagged [pytorch-lightning]

257 questions

0 votes

1 answer

742 views

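python - Problem running multiple models in PyTorch Lightning

I am developing a system that needs to train dozens of separate models (>50) with Lightning, each with its own TensorBoard plots and logs. My current implementation has one Trainer object per model, and once I go beyond ~90 Trainer objects I seem to hit this error. Interestingly, the error only appears when I run the .test() method, not .fit().

I have only just started with Lightning, so I am not sure whether one Trainer per model is the best approach. However, I need separate plots for each model, and using a single Trainer for multiple models appears to overwrite the results.

For reference, I define the lists of Trainers as follows:

For training:

And for testing:

Thanks!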

2020-08-03T10:21:58.077

0 votes

1 answer

518 views

pytorch - What is the correct way to implement gradient accumulation in pytorch?

Broadly there are two ways:

1. Call loss.backward() on every batch, but only call optimizer.step() and optimizer.zero_grad() every N batches. Is it the case that the gradients of the N batches are summed up? Hence to maintain the same learning rate per effective batch, we have to divide the learning rate by N?
2. Accumulate loss instead of gradient, and call (loss / N).backward() every N batches. This is easy to understand, but does it defeat the purpose of saving memory (because the gradients of the N batches are computed at once)? The learning rate doesn't need adjusting to maintain the same learning rate per effective batch, but should be multiplied by N if you want to maintain the same learning rate per example.

Which one is better, or more commonly used in packages such as pytorch-lightning? It seems that optimizer.zero_grad() is a perfect fit for gradient accumulation, therefore (1) should be recommended.
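A minimal sketch of a common variant of approach (1), not taken from the post or from pytorch-lightning's internals: backward() is called on every micro-batch, but each loss is first divided by N, so the summed gradients match the gradient of the mean loss over the whole effective batch and the learning rate can stay unchanged. The function names and the toy linear model are hypothetical. The check assumes equal-sized micro-batches and a mean-reduced loss.

```python
import torch
from torch import nn


def train_with_accumulation(model, optimizer, loss_fn, batches, accum_steps):
    """Step the optimizer once every `accum_steps` micro-batches."""
    optimizer.zero_grad()
    for step, (x, y) in enumerate(batches, start=1):
        # Scaling by 1/N turns the sum of N gradients into their average.
        loss = loss_fn(model(x), y) / accum_steps
        # backward() adds into .grad; only one micro-batch graph is alive.
        loss.backward()
        if step % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()


def accumulated_grads(model, loss_fn, batches):
    """Gradients after backward() on each micro-batch with loss / N."""
    model.zero_grad()
    for x, y in batches:
        (loss_fn(model(x), y) / len(batches)).backward()
    return [p.grad.clone() for p in model.parameters()]


def full_batch_grads(model, loss_fn, batches):
    """Gradients from one backward() over all micro-batches concatenated."""
    model.zero_grad()
    x = torch.cat([b[0] for b in batches])
    y = torch.cat([b[1] for b in batches])
    loss_fn(model(x), y).backward()
    return [p.grad.clone() for p in model.parameters()]


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(3, 1)
    batches = [(torch.randn(8, 3), torch.randn(8, 1)) for _ in range(4)]
    acc = accumulated_grads(model, nn.MSELoss(), batches)
    full = full_batch_grads(model, nn.MSELoss(), batches)
    print(all(torch.allclose(a, b, atol=1e-5) for a, b in zip(acc, full)))
```

Without the division by N, the accumulated gradient would be N times larger, which corresponds to the learning-rate adjustment asked about in (1). In pytorch-lightning, accumulation is configured through the Trainer argument `accumulate_grad_batches`.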

pytorch pytorch-lightning

2020-09-09T15:57:00.110
