I have been experimenting with archive/tar and compress/gzip to automate the processing of some backups.
My problem: I have a mix of .tar and .tar.gz files, so I want to compute the MD5 hash of the .tar.gz file and the MD5 hash of the .tar file it contains, ideally in a single pass.
The example code I have so far produces correct hashes for the files inside the archive and for the .tar.gz itself, but the hash for the .tar is wrong and I can't figure out why.
I looked at tar/reader.go and saw that it does some skipping, but I assumed everything goes through the io.Reader interface, so the TeeReader should still see all the bytes.
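package main

import (
	"archive/tar"
	"compress/gzip"
	"crypto/md5"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	tgz, err := os.Open("tb.tar.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer tgz.Close()

	// Hash the compressed bytes as the gzip reader consumes them.
	gzMd5 := md5.New()
	gz, err := gzip.NewReader(io.TeeReader(tgz, gzMd5))
	if err != nil {
		log.Fatal(err)
	}

	// Hash the decompressed bytes as the tar reader consumes them.
	tarMd5 := md5.New()
	tr := tar.NewReader(io.TeeReader(gz, tarMd5))

	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// Hash the contents of the current archive entry.
		fileMd5 := md5.New()
		if _, err := io.Copy(fileMd5, tr); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%x %s\n", fileMd5.Sum(nil), hdr.Name)
	}

	fmt.Printf("%x tb.tar\n", tarMd5.Sum(nil))
	fmt.Printf("%x tb.tar.gz\n", gzMd5.Sum(nil))
}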
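As a side check on the assumption that a TeeReader catches all the bytes: it only forwards what the downstream consumer actually reads. The sketch below is a minimal illustration of that, not part of the program above. The helper names `hashFirstN` and `fullHash` are made up for this example, and `io.Discard` assumes Go 1.16 or newer (older releases would use `ioutil.Discard`).

```go
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"strings"
)

// hashFirstN (hypothetical helper) reads only the first n bytes of src
// through a TeeReader and returns the MD5 of whatever the tee observed.
// If drain is true, the remaining bytes are read and discarded as well.
func hashFirstN(src string, n int64, drain bool) string {
	h := md5.New()
	tee := io.TeeReader(strings.NewReader(src), h)
	// The consumer stops early, like a parser that hits an end marker.
	io.CopyN(io.Discard, tee, n)
	if drain {
		// Pull the rest through the tee so the hash sees every byte.
		io.Copy(io.Discard, tee)
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

// fullHash returns the MD5 of the complete input for comparison.
func fullHash(src string) string {
	return fmt.Sprintf("%x", md5.Sum([]byte(src)))
}

func main() {
	const data = "payload|trailing padding"
	fmt.Println(hashFirstN(data, 7, false) == fullHash(data)) // false: partial read
	fmt.Println(hashFirstN(data, 7, true) == fullHash(data))  // true: drained
}
```

So if the tar reader stops reading before the end of the decompressed stream, the hash behind the tee would cover only a prefix of the .tar.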
Now for the following example:
$ echo "a" > a.txt
$ echo "b" > b.txt
$ tar cf tb.tar a.txt b.txt
$ gzip -c tb.tar > tb.tar.gz
$ md5sum a.txt b.txt tb.tar tb.tar.gz
60b725f10c9c85c70d97880dfe8191b3 a.txt
3b5d5c3712955042212316173ccf37be b.txt
501352dcd8fbd0b8e3e887f7dafd9392 tb.tar
90d6ba204493d8e54d3b3b155bb7f370 tb.tar.gz
On Linux Mint 14 (based on Ubuntu 12.04) with Go 1.0.2 from the Ubuntu repositories, the output of my Go program is:
$ go run tarmd5.go
60b725f10c9c85c70d97880dfe8191b3 a.txt
3b5d5c3712955042212316173ccf37be b.txt
a26ddab1c324780ccb5199ef4dc38691 tb.tar
90d6ba204493d8e54d3b3b155bb7f370 tb.tar.gz
So all hashes except the one for tb.tar are as expected. (Of course, if you reproduce this example, your .tar and .tar.gz hashes will differ from mine because of different timestamps.)
Any hint on how to get this to work would be greatly appreciated. I would really prefer to do it in one pass, though (with the TeeReaders).
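One possible direction, offered as a sketch rather than a confirmed diagnosis: GNU tar normally pads an archive with zero blocks up to a full record (10240 bytes by default), while tar.Reader returns io.EOF as soon as it sees the end-of-archive marker. If that is what happens here, the padding never passes through the TeeReader, and draining the rest of the stream after the loop should fix the .tar hash. The names `hashTarGz`, `buildPaddedTarGz`, and `check` are hypothetical, the in-memory archive with manual padding only imitates what GNU tar writes, and `io.Discard` assumes Go 1.16 or newer (`ioutil.Discard` on older releases).

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"crypto/md5"
	"fmt"
	"io"
)

// hashTarGz (hypothetical helper) reads a .tar.gz stream once and returns the
// hex MD5 of the decompressed tar stream and of the compressed stream.
func hashTarGz(r io.Reader) (tarSum, gzSum string, err error) {
	gzMd5 := md5.New()
	gz, err := gzip.NewReader(io.TeeReader(r, gzMd5))
	if err != nil {
		return "", "", err
	}
	tarMd5 := md5.New()
	tarStream := io.TeeReader(gz, tarMd5)
	tr := tar.NewReader(tarStream)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break // end-of-archive marker seen; padding may still be unread
		}
		if err != nil {
			return "", "", err
		}
		fileMd5 := md5.New()
		if _, err := io.Copy(fileMd5, tr); err != nil {
			return "", "", err
		}
		fmt.Printf("%x %s\n", fileMd5.Sum(nil), hdr.Name)
	}
	// Drain the bytes tar.Reader never asked for, so both tees see them.
	if _, err := io.Copy(io.Discard, tarStream); err != nil {
		return "", "", err
	}
	return fmt.Sprintf("%x", tarMd5.Sum(nil)), fmt.Sprintf("%x", gzMd5.Sum(nil)), nil
}

// buildPaddedTarGz (hypothetical helper) creates a small archive in memory
// and pads it to a 10240-byte record, imitating GNU tar's default layout.
func buildPaddedTarGz() (tarBytes, gzBytes []byte, err error) {
	var tarBuf bytes.Buffer
	tw := tar.NewWriter(&tarBuf)
	files := []struct{ name, body string }{{"a.txt", "a\n"}, {"b.txt", "b\n"}}
	for _, f := range files {
		hdr := &tar.Header{Name: f.name, Mode: 0644, Size: int64(len(f.body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, nil, err
		}
		if _, err := tw.Write([]byte(f.body)); err != nil {
			return nil, nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, nil, err
	}
	pad := (10240 - tarBuf.Len()%10240) % 10240
	tarBuf.Write(make([]byte, pad)) // zero padding after the end marker

	var gzBuf bytes.Buffer
	zw := gzip.NewWriter(&gzBuf)
	if _, err := zw.Write(tarBuf.Bytes()); err != nil {
		return nil, nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, nil, err
	}
	return tarBuf.Bytes(), gzBuf.Bytes(), nil
}

// check compares the one-pass hashes with hashes of the raw byte slices.
func check() (tarOK, gzOK bool, err error) {
	tarBytes, gzBytes, err := buildPaddedTarGz()
	if err != nil {
		return false, false, err
	}
	tarSum, gzSum, err := hashTarGz(bytes.NewReader(gzBytes))
	if err != nil {
		return false, false, err
	}
	return tarSum == fmt.Sprintf("%x", md5.Sum(tarBytes)),
		gzSum == fmt.Sprintf("%x", md5.Sum(gzBytes)), nil
}

func main() {
	tarOK, gzOK, err := check()
	fmt.Println(tarOK, gzOK, err)
}
```

Draining the decompressed stream also makes the gzip reader consume its trailer, which matters for the .tar.gz hash on inputs too large to fit in the reader's internal buffer.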