rsync - rsync --sparse transfers the entire file

Question

I have several VM images that need to be synced every day. The VM files are sparse.

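To save network traffic, I want to transfer only the actual data in the images. I tried rsync with the --sparse option, but monitoring the network traffic showed that the full file size goes over the network, not just the space the data actually uses.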

With rsync -zv --sparse, only the actual size is sent over the network and everything is fine. However, I don't want to compress the files because of the CPU usage.

Shouldn't the --sparse option transfer only the actual data and create the "null data" locally, to save network traffic?
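For context, the rsync man page describes --sparse as trying to handle sparse files so they take up less space on the destination. That makes it a property of how the receiver writes the file, not of what crosses the network. Below is a minimal sketch of that receiver-side idea in Python. It assumes a hypothetical `write_sparse` helper and a fixed 4 KiB block size. All-zero blocks are skipped with a seek, which leaves holes on filesystems that support them, and every other block is written normally.

```python
import os


def write_sparse(path: str, data: bytes, block_size: int = 4096) -> None:
    """Write data to path, seeking over all-zero blocks instead of writing them.

    Hypothetical helper. On filesystems with hole support the skipped
    regions become holes, so they use no disk blocks on the destination.
    """
    zero_block = bytes(block_size)
    with open(path, "wb") as f:
        for offset in range(0, len(data), block_size):
            chunk = data[offset:offset + block_size]
            if chunk == zero_block[:len(chunk)]:
                f.seek(len(chunk), os.SEEK_CUR)  # leave a hole instead of writing zeros
            else:
                f.write(chunk)
        # A trailing run of zeros was only seeked over, which does not extend
        # the file; truncate() sets the final length (the tail stays a hole).
        f.truncate(len(data))
```

Note that the zeros must already be present in `data` before they can be skipped. In a remote copy they would still have had to travel over the link, unless something else (such as compression) shrank them on the way.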

Is there a workaround that doesn't involve compression?

Thanks!

score 11 · Accepted Answer

Take a look at this discussion, specifically this answer.

It seems that the solution is to run rsync --sparse followed by rsync --inplace.

On the first (--sparse) call, also use --ignore-existing to prevent already-transferred sparse files from being overwritten, and -z to save network bandwidth.

The second call, --inplace, should ~~update only modified chunks~~. Here, compression is optional.
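The two calls can be spelled out as argument lists. This is a sketch that uses only the flags named in this answer. `two_pass_commands`, `vm.img` and `backup:/images/` are hypothetical names, and any other options your setup needs (archive mode, ssh settings) are left out.

```python
import shlex


def two_pass_commands(src: str, dest: str, compress_first: bool = True) -> list:
    """Build the two rsync invocations described above (hypothetical helper)."""
    # Pass 1: create files missing on the destination, writing zero runs as
    # holes (--sparse) and leaving files that already exist untouched.
    first = ["rsync", "--sparse", "--ignore-existing"]
    if compress_first:
        first.append("-z")  # compress during the initial full transfer
    # Pass 2: update existing files in place rather than via a temporary copy;
    # compression is optional here.
    second = ["rsync", "--inplace"]
    return [first + [src, dest], second + [src, dest]]


if __name__ == "__main__":
    for cmd in two_pass_commands("vm.img", "backup:/images/"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it.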

Also see this post.

Update

I believe the suggestions above won't solve your problem. I also believe that rsync is not the right tool for the task. You should search for other tools which will give you a good balance between network and disk I/O efficiency.

Rsync was designed for efficient usage of a single resource, the network. It assumes reading and writing to the network is much more expensive than reading and writing the source and destination files.

"We assume that the two machines are connected by a low-bandwidth high-latency bi-directional communications link." (The rsync algorithm, abstract)

The algorithm can be summarized in four steps:

1. The receiving side β sends checksums of blocks of size S of the destination file B.
2. The sending side α identifies blocks that match in the source file A, at any offset.
3. α sends β a list of instructions, each of which is either verbatim (non-matching) data or a reference to a matching block.
4. β reconstructs the whole file from those instructions.
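The four steps can be sketched in Python. This is a simplified model, not rsync's implementation. It uses a weak rolling checksum in the style of the rsync technical report (two 16-bit sums that are updated in constant time when the window slides by one byte) and, for convenience, MD5 as the strong checksum. It also ignores a trailing partial block of B. The function names are hypothetical.

```python
import hashlib

M = 1 << 16  # both halves of the weak checksum are kept modulo 2**16


def weak_sum(block: bytes) -> tuple:
    """Weak checksum (a, b) of a block, rsync-report style."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def signatures(dest: bytes, size: int) -> dict:
    """Step 1 (receiver β): weak -> [(strong, index)] for each full block of B."""
    table = {}
    for index in range(len(dest) // size):
        block = dest[index * size:(index + 1) * size]
        strong = hashlib.md5(block).digest()
        table.setdefault(weak_sum(block), []).append((strong, index))
    return table


def delta(src: bytes, table: dict, size: int) -> list:
    """Steps 2-3 (sender α): match blocks at any offset of A, emit instructions.

    ("data", bytes) carries verbatim bytes; ("block", index) refers to
    block `index` of B.
    """
    ops, literal, i = [], bytearray(), 0
    weak = weak_sum(src[:size]) if len(src) >= size else None
    while i + size <= len(src):
        match = None
        if weak in table:
            # Weak sums collide easily, so confirm with the strong checksum.
            strong = hashlib.md5(src[i:i + size]).digest()
            match = next((idx for s, idx in table[weak] if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("block", match))
            i += size
            if i + size <= len(src):
                weak = weak_sum(src[i:i + size])
        else:
            literal.append(src[i])
            if i + size < len(src):
                # Roll the window one byte: drop src[i], add src[i + size].
                a, b = weak
                a = (a - src[i] + src[i + size]) % M
                b = (b - size * src[i] + a) % M
                weak = (a, b)
            i += 1
    literal += src[i:]  # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def reconstruct(dest: bytes, ops: list, size: int) -> bytes:
    """Step 4 (receiver β): rebuild A from B and the instruction list."""
    out = bytearray()
    for kind, value in ops:
        out += dest[value * size:(value + 1) * size] if kind == "block" else value
    return bytes(out)
```

Only literal data travels as raw bytes; each matched block costs a single small reference. Nothing in this model checks whether a block is all zeros. When B does not exist yet (an empty B), a new, mostly empty image is sent entirely as literal data.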

Notice that rsync normally reconstructs the file B as a temporary file T, then replaces B with T. In this case it must write the whole file.

The --inplace option does not, as one might imagine, relieve rsync of writing the blocks matched by α, because they can match at different offsets. Scanning B a second time to compute fresh checksums of its current data would be prohibitively expensive. A block that matches at the same offset it was read from in step one could be skipped, but rsync does not do that. In the case of a sparse file, a null block of B would match every null block of A, and would have to be rewritten.
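The point about null blocks can be illustrated with block-aligned matching alone. The sketch assumes, as a simplification, that each distinct block content in B resolves to its first occurrence. `plan_inplace_writes` is a hypothetical helper, not part of rsync. It counts the blocks an in-place update writes, and how many of the resulting references point back at their own offset (the only blocks a skip-same-offset optimization could avoid).

```python
import hashlib


def plan_inplace_writes(old: bytes, new: bytes, size: int) -> tuple:
    """Return (blocks written, references pointing at their own offset).

    Hypothetical helper: block-aligned matching only, and each block content
    of B (old) resolves to its FIRST occurrence in B.
    """
    first_index = {}
    for index in range(len(old) // size):
        digest = hashlib.md5(old[index * size:(index + 1) * size]).digest()
        first_index.setdefault(digest, index)
    written = same_offset = 0
    for index in range(len(new) // size):
        digest = hashlib.md5(new[index * size:(index + 1) * size]).digest()
        written += 1  # matched or literal, the block is written into B
        if first_index.get(digest) == index:
            same_offset += 1  # data already in place; a candidate for skipping
    return written, same_offset


if __name__ == "__main__":
    S = 8
    image = b"D" * S + bytes(50 * S) + b"E" * S  # data, 50 null blocks, data
    print(plan_inplace_writes(image, image, S))  # (52, 3)
```

Even for an image compared with an identical copy of itself, all 50 null blocks resolve to the first null block of B. Only 3 of the 52 written blocks point at their own offset.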

The --inplace option only makes rsync write directly to B instead of T; it still rewrites the whole file.

score 2 · Accepted Answer

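Try setting the compression level to its lowest value (with the option --compress-level=1). The lowest compression level seems to be enough to reduce the traffic for sparse files. However, I don't know how CPU usage will be affected.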
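Why even the fastest level helps can be checked with Python's zlib module. Treating it as a stand-in for rsync's compressor is an assumption; this is not rsync's code path. Runs of zeros, like the unallocated regions of a sparse image, shrink to a tiny fraction of their size at level 1. The sketch does not measure the CPU cost on real, non-zero image data, which is the open question above.

```python
import zlib


def compressed_size(data: bytes, level: int) -> int:
    """Bytes produced by zlib at the given level (1 = fastest, 9 = smallest)."""
    return len(zlib.compress(data, level))


if __name__ == "__main__":
    hole = bytes(1 << 20)  # 1 MiB of zeros, like an unallocated region of an image
    for level in (1, 9):
        print(level, compressed_size(hole, level))
```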
