python - Pruning a model does not improve inference speed or reduce model size

Question

I'm trying to prune a model in PyTorch using torch.nn.utils.prune, which provides two tensors:

one is the original weight,
the other is a mask of 0s and 1s that closes certain connections in the network.
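To make the two tensors concrete, here is a minimal sketch on a toy nn.Linear layer (the layer size and the 50% amount are assumptions chosen only for illustration, not values from the question). It shows where the original weight and the mask are stored after pruning.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
layer = nn.Linear(8, 4)  # toy layer, chosen only for illustration

# Prune the 50% of weights with the smallest absolute value
prune.l1_unstructured(layer, name="weight", amount=0.5)

param_names = sorted(name for name, _ in layer.named_parameters())
buffer_names = sorted(name for name, _ in layer.named_buffers())
print(param_names)   # the original values now live in weight_orig
print(buffer_names)  # the 0/1 mask is stored as a buffer

# layer.weight is recomputed as weight_orig * weight_mask: still a dense tensor
masked = layer.weight_orig * layer.weight_mask
same_shape = layer.weight.shape == layer.weight_orig.shape
zero_fraction = (layer.weight == 0).float().mean().item()
print(same_shape, zero_fraction)
```

The pruned weight keeps the shape of the original; the mask only sets entries to zero.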

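I tried both of the following approaches, but neither improves inference speed:

Run inference with the network as it is after pruning, so the mask first closes some connections and then inference runs.
Use the mask to zero out the original weights, remove the mask from the state_dict, and then run inference.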
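The second approach can be sketched as follows, again on a hypothetical 100x100 nn.Linear layer rather than the network from the question. It suggests why the model does not get smaller: prune.remove drops the mask from the state_dict, but the weight tensor keeps every element and stores the zeros explicitly.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
dense = nn.Linear(100, 100)   # hypothetical stand-in for a real layer
pruned = nn.Linear(100, 100)
pruned.load_state_dict(dense.state_dict())

prune.l1_unstructured(pruned, name="weight", amount=0.2)
keys_with_mask = sorted(pruned.state_dict().keys())

prune.remove(pruned, "weight")  # fold the mask into the weight permanently
keys_after_remove = sorted(pruned.state_dict().keys())

dense_elements = dense.weight.numel()
pruned_elements = pruned.weight.numel()
zeros = int((pruned.weight == 0).sum())

print(keys_with_mask)
print(keys_after_remove)
print(dense_elements, pruned_elements, zeros)
```

Both layers hold the same number of elements, and the standard dense kernels multiply every one of them, zero or not, which is consistent with the unchanged inference time reported above.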

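Is there a way to improve speed using the model tensor and the mask? Shouldn't skipping the multiplication of a non-zero float by 0 be faster than multiplying two floats together?
Here are my pruning function and the procedure I use to measure speed.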

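import copy

import torch
from torch import nn
import torch.nn.utils.prune as prune


def prune_net(net):
    """Globally prune the 20% of weights and biases with the smallest abs(value).

    Intended to be called each time a pruning iteration is reached.

    Args:
        net (nn.Module): the network to prune. It is left unmodified unless
            it already carries pruning masks (see the comment below).

    Return:
        newnet (nn.Module): a network whose Conv2d and Linear layers carry
            the masks that prune their weights
    """
    if not isinstance(net, nn.Module):
        raise TypeError('Invalid input. Must be nn.Module')
    # copy.copy is shallow: the copy would share its layers with net, so
    # pruning it would prune net as well. Deep-copy instead, except when net
    # is already pruned: its recomputed weights are non-leaf tensors, which
    # deepcopy rejects, so an already-pruned net is pruned further in place.
    newnet = net if prune.is_pruned(net) else copy.deepcopy(net)
    modules_list = []

    for name, module in newnet.named_modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            modules_list.append((module, 'weight'))
            if module.bias is not None:  # bias=False layers have no bias to prune
                modules_list.append((module, 'bias'))

    prune.global_unstructured(
        modules_list,
        pruning_method=prune.L1Unstructured,
        amount=0.2,
    )
    return newnet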

Inference speed test, first case:

import torch
from torch import nn
import torch.nn.utils.prune as prune
import torch.nn.functional as F
import time
from torch.autograd import Variable


torch.set_default_tensor_type('torch.cuda.FloatTensor')
old_net = init_your_net()

new_net = prune_net(old_net)
new_net = prune_net(new_net)

old_net.eval()
new_net.eval()

old_net = old_net.cuda()
new_net = new_net.cuda()
dataset = load_your_dataset()

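time_old = 0.0
time_new = 0.0

with torch.no_grad():  # gradients are not needed for inference
    for i in range(100):
        x = dataset[i].cuda()

        # new infer
        torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait before timing
        start_time = time.perf_counter()
        detections = new_net(x)
        torch.cuda.synchronize()
        time_new += time.perf_counter() - start_time

        # old infer
        start_time = time.perf_counter()
        detections = old_net(x)
        torch.cuda.synchronize()
        time_old += time.perf_counter() - start_time
print('old ', time_old)
print('new ', time_new)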

Inference speed test, second case:

import torch
from torch import nn
import torch.nn.utils.prune as prune
import torch.nn.functional as F
import time
from torch.autograd import Variable


torch.set_default_tensor_type('torch.cuda.FloatTensor')
old_net = init_your_net()

new_net = prune_net(old_net)
new_net = prune_net(new_net)
# Apply mask to model tensor and remove mask from state_dict
for name, module in new_net.named_modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        for param_name in ('weight', 'bias'):
            # Only parameters that were actually pruned have a *_orig entry;
            # prune.remove raises an error for anything else
            if hasattr(module, param_name + '_orig'):
                prune.remove(module, param_name)

old_net.eval()
new_net.eval()

old_net = old_net.cuda()
new_net = new_net.cuda()
dataset = load_your_dataset()

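time_old = 0.0
time_new = 0.0

with torch.no_grad():  # gradients are not needed for inference
    for i in range(100):
        x = dataset[i].cuda()

        # new infer
        torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait before timing
        start_time = time.perf_counter()
        detections = new_net(x)
        torch.cuda.synchronize()
        time_new += time.perf_counter() - start_time

        # old infer
        start_time = time.perf_counter()
        detections = old_net(x)
        torch.cuda.synchronize()
        time_old += time.perf_counter() - start_time
print('old ', time_old)
print('new ', time_new)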

Update
I found that torch has a sparse module that can reduce memory usage once enough parameters have been pruned, but it does not support nn.Module yet, only Tensor objects. Here are some useful links:
https://github.com/pytorch/pytorch/issues/36214#issuecomment-619586452
https://pytorch.org/docs/stable/sparse.html
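As a rough illustration of the Tensor-level sparse support mentioned in the update, the sketch below converts a mostly-zero weight matrix to COO format and multiplies it with torch.sparse.mm. The 64x64 shape and the roughly 90% sparsity are assumptions for the example, and the random mask stands in for a real pruning mask.

```python
import torch

torch.manual_seed(0)
weight = torch.randn(64, 64)
# Zero out roughly 90% of the entries to mimic a heavily pruned weight
mask = torch.rand(64, 64) > 0.9
pruned_weight = weight * mask

# COO format keeps only the non-zero values plus their indices
sparse_weight = pruned_weight.to_sparse().coalesce()
stored_values = sparse_weight.values().numel()
dense_values = pruned_weight.numel()
print(stored_values, dense_values)

# The sparse and dense products agree up to floating-point error
x = torch.randn(64, 8)
y_sparse = torch.sparse.mm(sparse_weight, x)
y_dense = pruned_weight @ x
print(torch.allclose(y_sparse, y_dense, atol=1e-5))
```

For a 2-D COO tensor each kept value also carries two int64 indices, so the format only saves memory when a large share of the entries is zero. Whether the sparse product is actually faster depends on the sparsity level and the hardware.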

score 0 · Accepted Answer

I am also trying to use pruning to speed up inference. However, what I found more useful was to use ONNX and ONNXRuntime instead. Here is a link that covers all the steps:

https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html

It cuts inference time by up to 85% with no loss of accuracy.
