2024-03-27

tar or tgz?


A thing I've done over the last while is automate a lot of my sysadmin, using systemd timers to hit scala-cli scripts.

I've built for myself a little framework that is incredibly in need of documentation, but that lets me define scripts very flexibly and can provide great step-by-step information about what happens and anything that goes wrong in brightly colored HTML e-mails. I love it.

Much of what I do is back things up to a cloud service using rclone.

Last night I wrote a script just to back up some directory. I ran into what I consider an age-old dilemma.

Sometime early in my geekish career, I picked up the nugget that its best to keep important backups as straight tar files rather than tgz (or tbz or whatever), because if some bit gets corrupted, most of a straight tar will remain recoverable, while the compressed archive will just be toast.

Is that right? Is it a real concern? I don't think I've ever experienced a corrupted archive, tar or tgz, but of course backup is a form of insurance, the whole point is to be resilient to tail risks.

Still, searching the interwebs, I don't see a lot of people recommending uncompressed archives. Space is more of a bottleneck to me than CPU or time, so if the resilience advantage isn't significant, I'd compress.

What do you think?


Update: Feel free to comment here