bash - Comparing the result of a heavy process with an existing file using a named pipe

Question

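I am trying to find a way to compare an existing file with the result of a process (a heavy one, not to be repeated) and to clobber the existing file with that result, without first writing the result to a temporary file (it would be a large temporary file, about the same size as the existing file; let's be efficient and not take twice the space we need).

I would like to replace the regular file /tmp/replace_with_that (see below) with a fifo. But of course, doing so with the code below would just lock the script, because the /tmp/replace_with_that fifo cannot be read until the existing file has been compared with the named pipe /tmp/test_against_this.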

#!/bin/bash

mkfifo /tmp/test_against_this
: > /tmp/replace_with_that    

echo 'A B C D' >/some/existing/file

{
  #A very heavy process not to repeat;
  #Solved: we used a named pipe.
  #Its large output should not be sent to a file
  #To solve: using this code, we write the output to a regular file

  for LETTER in "A B C D E"  
  do  
      echo $LETTER      
  done  

} | tee /tmp/test_against_this /tmp/replace_with_that >/dev/null &  

if cmp -s /some/existing/file /tmp/test_against_this
then  
    echo Exact copy
    #Don't do a thing to /some/existing/file
else
    echo Differs
    #Clobber /some/existing/file with /tmp/replace_with_that
    cat /tmp/replace_with_that >/some/existing/file
fi  

rm -f /tmp/test_against_this  
rm -f /tmp/replace_with_that
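
The part of the script above that already works, comparing the existing file with the stream through one named pipe, can be isolated in a minimal sketch. The helper name `compare_via_fifo` and the `mktemp` paths are hypothetical, chosen only for illustration; the sketch does not address the second output (`/tmp/replace_with_that`), which is the open problem.

```shell
#!/bin/bash

# Hypothetical helper: compares file "$1" with the output of the command given
# as the remaining arguments, passing that output through a single FIFO.
compare_via_fifo() {
    local existing="$1"; shift
    local dir fifo writer_pid status=0

    # Without a reader the background writer would block forever, so bail out early.
    [[ -f "$existing" ]] || return 2

    dir=$(mktemp -d)
    fifo="$dir/fifo"
    mkfifo "$fifo"

    "$@" >"$fifo" &                 # writer: blocks until cmp opens the FIFO
    writer_pid=$!

    cmp -s "$existing" "$fifo" || status=$?

    wait "$writer_pid" 2>/dev/null || true   # reap the writer, ignore its status
    rm -rf "$dir"
    return "$status"
}

# Illustration with a throwaway file standing in for /some/existing/file.
demo_file=$(mktemp)
printf 'A B C D\n' >"$demo_file"

if compare_via_fifo "$demo_file" printf 'A B C D E\n'; then
    echo "Exact copy"
else
    echo "Differs"
fi

rm -f "$demo_file"
```

With a single FIFO there is exactly one reader (`cmp`) for the one writer, so nothing waits on a pipe that nobody is reading.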

score 0 · Accepted Answer

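For completeness, here is my own answer (I wanted to explore the use of pipes):

I was trying to find a way to compare a stream and an existing file on the fly, without overwriting the existing file unnecessarily (leaving it as is when the stream and the file are exact copies) and without creating sometimes large temporary files (the product of a heavy process such as mysqldump, for instance). The solution had to rely only on pipes (named and anonymous), plus possibly a few very small temporary files.

The checksum solution suggested by twalberg is fine, but md5sum calls on large files are processor intensive (processing time grows linearly with file size). cmp is faster.

Example calls of the functions listed below: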
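
As a minimal sketch of the comparison step alone (not the full solution): `cmp -s` accepts `-` for standard input, so a stream can be compared with a file without being stored first. The helper name `stream_matches_file` and the temporary file are hypothetical, used only for illustration.

```shell
#!/bin/bash

# Hypothetical helper: exits 0 when stdin is byte-identical to the file "$1".
stream_matches_file() {
    cmp -s "$1" -
}

# Illustration with a throwaway file created by mktemp.
demo_file=$(mktemp)
printf 'A B C D\n' >"$demo_file"

# Identical stream: cmp has to read to the end before it can report a match.
if printf 'A B C D\n' | stream_matches_file "$demo_file"; then
    echo "Exact copy"
else
    echo "Differs"
fi

# Different stream: cmp can stop at the first differing byte.
if printf 'A B C D E\n' | stream_matches_file "$demo_file"; then
    echo "Exact copy"
else
    echo "Differs"
fi

rm -f "$demo_file"
```

Because cmp stops at the first difference, identical inputs are its worst case, whereas a checksum always has to read both inputs completely.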

#!/bin/bash

mkfifo /tmp/fifo

mysqldump --skip-comments $HOST $USER $PASSWORD $DB >/tmp/fifo &

create_or_replace /some/existing/dump /tmp/fifo

#This also works, but depending on the anonymous fifo setup, seems less robust

create_or_replace /some/existing/dump <(mysqldump --skip-comments $HOST $USER $PASSWORD $DB)

Functions:

#!/bin/bash

checkdiff(){
    local originalfilepath="$1"
    local differs="$2"
    local streamsize="$3"
    local timeoutseconds="$4"
    local originalfilesize=$(stat -c '%s' "$originalfilepath")
    local starttime
    local stoptime

    #Hackish: we can't know for sure when the wc subprocess will have produced the streamsize file
    starttime=$(date +%s)
    stoptime=$(( $starttime + $timeoutseconds ))
    while ([[ ! -f "$streamsize" ]] && (( $stoptime > $(date +%s) ))); do :; done;

    if ([[ ! -f "$streamsize" ]] || (( $originalfilesize == $(cat "$streamsize" | head -1) )))
    then
        #Using streams that were exact copies of files to compare with,
        #on average, with just a few test runs:
        #diff slowest, md5sum 2% faster than diff, and cmp method 5% faster than md5sum
        #Did not test, but on large unequal files,
        #cmp method should be way ahead of the 2 other methods
        #since equal files is the worst case scenario for cmp

        #diff -q --speed-large-files <(sort "$originalfilepath") <(sort -) >"$differs"
        #( [[ $(md5sum "$originalfilepath" | cut -b-32) = $(md5sum - | cut -b-32) ]] && : || echo -n '1' ) >"$differs" 
        ( cmp -s "$originalfilepath" - && : || echo -n '1' ) >"$differs"
    else
        echo -n '1' >"$differs"
    fi
}

create_or_replace(){

    local originalfilepath="$1"
    local newfilepath="$2" #Should be a pipe, but could be a regular file
    local differs="$originalfilepath.differs"
    local streamsize="$originalfilepath.size"
    local timeoutseconds=30
    local starttime
    local stoptime

    if [[ -f "$originalfilepath" ]]
    then
        #Cleanup
        [[ -f "$differs" ]] && rm -f "$differs"
        [[ -f "$streamsize" ]] && rm -f "$streamsize"

        #cat the pipe, get its size in bytes (wc -c, to match stat -c '%s'), check for differences between the stream and the file and pipe the stream into the original file if all checks show a diff
        cat "$newfilepath" |
        tee >(wc -c - | cut -f1 -d' ' >"$streamsize") >(checkdiff "$originalfilepath" "$differs" "$streamsize" "$timeoutseconds") | {

                #Hackish: we can't know for sure when the checkdiff subprocess will have produced the differs file
                starttime=$(date +%s)
                stoptime=$(( $starttime + $timeoutseconds ))
                while ([[ ! -f "$differs" ]] && (( $stoptime > $(date +%s) ))); do :; done;

                [[ ! -f "$differs" ]] || [[ ! -z $(cat "$differs" | head -1) ]] && cat - >"$originalfilepath"
        }

        #Cleanup
        [[ -f "$differs" ]] && rm -f "$differs"
        [[ -f "$streamsize" ]] && rm -f "$streamsize"

    else
        cat "$newfilepath" >"$originalfilepath"
    fi
}

score 0 · Accepted Answer

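I would suggest a different approach:

1. Generate an MD5/SHA1/SHA256/whatever hash of the existing file
2. Run the heavy process and replace the output file
3. Generate a hash of the new file
4. If the hashes match, the files were the same; if not, the new file is different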
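
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `heavy_process`, `file_hash` and `replace_and_report` are hypothetical names, `heavy_process` is a stand-in for the real command, and `sha256sum` (GNU coreutils) is assumed to be available.

```shell
#!/bin/bash

# Hypothetical stand-in for the heavy process; it writes its result to stdout.
heavy_process() {
    printf 'A B C D E\n'
}

# Hypothetical helper: prints only the hex digest of file "$1".
file_hash() {
    sha256sum "$1" | cut -d' ' -f1
}

# Runs heavy_process into "$1" and reports whether the content changed.
replace_and_report() {
    local target="$1"
    local old_hash new_hash

    old_hash=$(file_hash "$target")     # 1. hash the existing file
    heavy_process >"$target"            # 2. overwrite it with the new output
    new_hash=$(file_hash "$target")     # 3. hash the new file

    if [[ "$old_hash" == "$new_hash" ]]; then   # 4. compare the hashes
        echo "Exact copy"
    else
        echo "Differs"
    fi
}

# Illustration with a throwaway file standing in for the existing file.
demo_file=$(mktemp)
printf 'A B C D\n' >"$demo_file"
replace_and_report "$demo_file"
rm -f "$demo_file"
```

Note that in this sketch the file is rewritten on every run; the hashes only tell you afterwards whether its content changed.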
