C# - Parallel Extensions

Question

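I have an application that is IO heavy: it copies files, zips files, moves them around the file system, and copies them to backup servers.

I built this program as a single-threaded application. It runs in 2 minutes.

I built another version of this program with the Parallel Extensions, using Task, and it also runs in almost 2 minutes.

In other words, I didn't see any performance gain from using the Parallel Extensions, because the work is heavily IO bound.

Would I get the same results if I deployed the application to a blade server?

Do blade servers process IO faster, or over multiple channels, compared with my workstation?

Is there no benefit to using the Parallel Extensions with IO-bound applications?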

score 6 · Accepted Answer

If all you're doing is copying or moving files across the system, then the parallelism provided by the TPL isn't going to do you much good. Moving a file, for example, uses almost no CPU; it simply changes the file's location in the disk's directory record structure.

File compression is a different story. Here you're loading data and using the CPU to compress it before saving it out to disk. You might be able to use a pipeline or parallel loop to load/compress/save the data in a more efficient way. Instead of having one thread compress the files one after another, you could have multiple threads working on different files at the same time.

The following code compresses a batch of files sequentially and then in parallel. I get the following times on an i7 920 with an Intel X25 SSD, compressing 329 JPG images totalling 800 MB of data.

Sequential: 39901ms

Parallel: 12404ms

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        DirectoryInfo di = new DirectoryInfo(@"C:\temp");

        // Time the sequential version: one file at a time.
        Stopwatch sw = new Stopwatch();
        sw.Start();
        foreach (FileInfo fi in di.GetFiles("*.jpg"))
        {
            Compress(fi);
        }
        sw.Stop();
        Console.WriteLine("Sequential: " + sw.ElapsedMilliseconds);

        Console.WriteLine("Delete the result files and then rerun...");
        Console.ReadKey();

        // Time the parallel version: the TPL spreads the files across cores.
        sw.Reset();
        sw.Start();
        Parallel.ForEach(di.GetFiles("*.jpg"), (fi) => { Compress(fi); });
        sw.Stop();

        Console.WriteLine("Parallel: " + sw.ElapsedMilliseconds);
        Console.ReadKey();
    }

    // Writes a gzip-compressed copy of the file next to it as "<name>.gz".
    public static void Compress(FileInfo fi)
    {
        using (FileStream inFile = fi.OpenRead())
        {
            // Skip hidden files and files that are already compressed.
            if ((File.GetAttributes(fi.FullName)
                & FileAttributes.Hidden)
                != FileAttributes.Hidden && fi.Extension != ".gz")
            {
                using (FileStream outFile =
                            File.Create(fi.FullName + ".gz"))
                {
                    using (GZipStream gzip =
                        new GZipStream(outFile,
                        CompressionMode.Compress))
                    {
                        inFile.CopyTo(gzip);
                    }
                }
            }
        }
    }
}

For the compression code, see "How to: Compress Files".

score 1 · Accepted Answer

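If you're moving files around on one physical device, you're not going to see much performance benefit from making multiple parallel IO requests to that same device. The device is already operating many orders of magnitude slower than the CPU, so multiple requests made in parallel will still line up to be handled one by one on the device. Your parallel code is being serialized because it's all accessing the same device, which can't really handle more than one request at a time.

You might see a small perf improvement with parallel code if your disk controller implements "elevator seeking", "scatter-gather", or other out-of-order operations, but the perf difference will be relatively small.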

Where you should find a more rewarding perf difference for file I/O is when you're moving files between many different physical devices. You should be able to move or copy a file on disk A to some other location on disk A while also copying a file on disk B to disk C. With many physical devices, you don't have all the parallel requests stacking up waiting for the one device to fill all the requests.
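To make the multi-device idea concrete, here is a minimal sketch. It is only an illustration under stated assumptions: `PerDeviceCopier` and `CopyGroupedByVolume` are hypothetical names, and the code assumes that each path root (such as `C:\`) corresponds to a separate physical device, which is not guaranteed (several volumes can share one disk). Files on the same source volume are copied one at a time, while different volumes proceed in parallel.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class PerDeviceCopier
{
    // Hypothetical helper: copies every (Source, Destination) pair, using one
    // sequential worker per source volume so that this code never sends
    // competing requests to the same source device.
    public static void CopyGroupedByVolume(
        IEnumerable<(string Source, string Destination)> jobs)
    {
        // Assumption: the path root (e.g. "C:\") identifies one physical device.
        var groups = jobs.GroupBy(
            job => Path.GetPathRoot(Path.GetFullPath(job.Source)),
            StringComparer.OrdinalIgnoreCase);

        // Volumes run in parallel; files within one volume run one after another.
        Parallel.ForEach(groups, group =>
        {
            foreach (var job in group)
            {
                File.Copy(job.Source, job.Destination, overwrite: true);
            }
        });
    }
}
```

Grouping by source volume only is a simplification; a fuller version would probably also take the destination device into account.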

You'll probably see similar results with network I/O: If everything is going through one ethernet card / network segment you're not going to realize as much parallelism as when you have multiple ethernet cards and multiple network segments to work with.

score 0 · Accepted Answer

I think the Parallel Extensions could bring significant benefits for CPU-bound operations. I don't know how they would affect IO, though.

score 0 · Accepted Answer

It all depends on whether you are CPU bound or IO bound. I would suggest doing some performance testing to see where your bottlenecks are.
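One rough way to do that kind of test is to compare the CPU time a workload consumes with the wall-clock time it takes. The sketch below is an assumption-laden illustration, not a profiler: `BottleneckProbe` and `CpuToWallRatio` are hypothetical names, and the CPU time is measured for the whole process, so other threads running at the same time will skew the number.

```csharp
using System;
using System.Diagnostics;

public static class BottleneckProbe
{
    // Returns process CPU time divided by wall-clock time for the workload.
    // Values near 0 suggest the work mostly waits (likely IO bound);
    // values near or above 1 suggest at least one core was kept busy.
    public static double CpuToWallRatio(Action workload)
    {
        Process process = Process.GetCurrentProcess();
        TimeSpan cpuBefore = process.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();

        workload();

        wall.Stop();
        process.Refresh(); // discard any cached process information
        TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;

        // Guard against division by zero for extremely short workloads.
        return cpuUsed.TotalMilliseconds / Math.Max(1.0, wall.Elapsed.TotalMilliseconds);
    }
}
```

If the ratio stays low while the program copies or moves files, adding threads is unlikely to help; if it is high during compression, a parallel loop has something to work with.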

If you find you are moving and compressing a lot of files (to different disks, as a move on the same disk is just a file-table update), you might want to look at implementing a streaming file mover that compresses as it moves. This can save the extra IO of re-reading the files after moving them. I have done this with moving and checksumming, and in my case it was a huge performance bump.
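A minimal sketch of such a streaming mover follows. It is an illustration of the idea rather than the code I used: `StreamingMover` and `MoveCompressed` are hypothetical names, and gzip plus SHA-256 are assumed choices. The source is read once; each buffer is both hashed and written gzip-compressed to the destination, and the source is deleted afterwards.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public static class StreamingMover
{
    // Hypothetical helper: reads the source once, hashing the original bytes and
    // writing them gzip-compressed to destinationPath, then deletes the source.
    // Returns the SHA-256 of the uncompressed content as a hex string.
    public static string MoveCompressed(string sourcePath, string destinationPath)
    {
        byte[] hash;
        using (IncrementalHash hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
        {
            using (FileStream input = File.OpenRead(sourcePath))
            using (FileStream output = File.Create(destinationPath))
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] buffer = new byte[81920];
                int read;
                while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // The same buffer feeds the checksum and the compressor,
                    // so the file is never re-read from disk.
                    hasher.AppendData(buffer, 0, read);
                    gzip.Write(buffer, 0, read);
                }
            }
            hash = hasher.GetHashAndReset();
        }

        // Delete the source only after the compressed copy is fully written.
        File.Delete(sourcePath);
        return BitConverter.ToString(hash).Replace("-", "");
    }
}
```

A production version would likely verify the destination before deleting the source; that step is left out to keep the sketch short.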

Hope this helps.

score 0 · Accepted Answer

I have an application implemented in WinForms that processes ~7,800 URLs in approximately 5 minutes (it downloads each URL, parses the content, looks for specific pieces of data, and if it finds what it's looking for, does some additional processing of that data).

This specific application used to take between 26 and 30 minutes to run, but by changing the code to use the TPL (Task Parallel Library in .NET v4.0) it executes in just 5. The computer is a Dell T7500 workstation with dual quad-core Xeon processors (3 GHz), running with 24 GB of RAM and Windows 7 Ultimate 64-bit edition.

Though it's not exactly the same as your situation, this too is extremely IO intensive. The documentation on the TPL states it was originally conceived for processor-bound problem sets, but this doesn't rule out using it in IO situations (as my application demonstrates to me). If you have at least 4 cores and you're not seeing your processing time drop significantly, then it's possible you have other implementation issues that are preventing the TPL from really being efficient (locks, hard drive items, etc.). The book Parallel Programming with Microsoft .NET really helped me to understand "how" your code needs to be modified to really take advantage of all that power.
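As an illustration of overlapping many network requests, here is a minimal sketch. It is not the code from my application: it uses async/await with a `SemaphoreSlim` throttle (a newer style than the .NET 4.0 TPL calls mentioned above), and `BoundedFetcher`, `FetchAllAsync`, and the `fetch` delegate are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedFetcher
{
    // Runs fetch for every URL with at most maxConcurrency requests in flight.
    // Results are returned in the same order as the input URLs.
    public static async Task<string[]> FetchAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            IEnumerable<Task<string>> tasks = urls.Select(async url =>
            {
                // Wait for a free slot before starting the request.
                await gate.WaitAsync();
                try
                {
                    return await fetch(url);
                }
                finally
                {
                    gate.Release();
                }
            });

            // Task.WhenAll starts enumerating the tasks and awaits them all.
            return await Task.WhenAll(tasks);
        }
    }
}
```

With a shared `HttpClient` instance named `client`, the `fetch` argument could be `url => client.GetStringAsync(url)`. Passing the download step in as a delegate keeps the throttling logic separate from the parsing and processing that follow.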

Worth a look in my opinion.
