this post was submitted on 17 Apr 2025
45 points (97.9% liked)


I'm working on a project to back up my family photos from TrueNAS to Blu-ray disks. I have other, more traditional backups based on restic and zfs send/receive, but I don't like the fact that I could delete every copy using only the mouse and keyboard from my main PC. I want something that can't be ransomwared and that I can't screw up once created.

The dataset is currently about 2TB, and we're adding about 200GB per year. It's a lot of disks, but manageably so. I've purchased good quality 50GB blank disks and a burner, as well as a nice box and some silica gel packs to keep them cool, dark, dry, and generally protected. I'll be making one big initial backup, and then I'll run incremental backups ~monthly to capture new photos and edits to existing ones, at which time I'll also spot-check a disk or two for read errors using DVDisaster. I'm hoping to get 10 years out of this arrangement, though longer is of course better.

I've got most of the pieces worked out, but the last big question I need to answer is which software I will actually use to create the archive files. I've narrowed it down to two options: dar and bog-standard GNU tar. Both can create multipart, incremental backups, which is the core capability I need.
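
For reference, roughly how I'd expect to invoke each one for a split, incremental run - untested, with placeholder archive names and a 45G slice/volume size, so the exact flags still need checking against the manuals:

```
# dar: full backup cut into ~45 GiB slices, then a differential run
# that uses the full archive as its reference (-A).
dar -c /backups/photos_full -R /mnt/photos -s 45G
dar -c /backups/photos_incr1 -R /mnt/photos -s 45G -A /backups/photos_full

# GNU tar: level-0 backup driven by a .snar snapshot file, split into
# ~45 GiB volumes with -M/-L (no compression; tar can't compress
# multi-volume archives). Re-running with the same .snar captures only
# what changed since the previous run. Tar prompts at each volume
# boundary; give one -f per volume or use --new-volume-script so the
# volumes land in separate files.
tar -cf photos_full.tar -M -L 45G \
    --listed-incremental=photos.snar /mnt/photos
tar -cf photos_incr1.tar -M -L 45G \
    --listed-incremental=photos.snar /mnt/photos
```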

Dar Advantages (that I care about):

  • This is exactly what it's designed to do.
  • It can detect and tolerate data corruption. (I'll be adding ECC data to the disks using DVDisaster, but defense in depth is nice.)
  • More robust file change detection; it appears to be hash-based?
  • It allows me to create a database I can use to locate and restore individual files without searching through many disks.

Dar disadvantages:

  • It appears to be a pretty obscure, generally inactive project. The documentation looks straight out of the early 2000s, and the site doesn't even have HTTPS. I worry it will go offline, or that I'll run into some weird bug that ruins the show.
  • Doesn't detect renames. Will back up a whole new copy. (Problematic if I get to reorganizing)
  • I can't find a maintained GUI project for it, and my wife ain't about to learn a CLI. Would be nice if I'm not the only person in the world who could get photos off of these disks.

Tar Advantages (that I care about):

  • battle-tested, reliable, not going anywhere
  • It's already installed on every Linux and Mac machine, and it's trivial to put on a Windows PC.
  • Correctly detects renames, does not create new copies.
  • There are maintained GUIs available; non-nerds may be able to access the archives.

Tar disadvantages:

  • I don't see an easy way to locate individual files, beyond grepping through snar metadata files (that aren't really meant for that).
  • The file change detection logic makes me nervous - it appears to be based on modification time and inode numbers. The photos are in a ZFS dataset on TrueNAS, mounted on my local machine via SMB. I don't even know what an inode number is; how can I be sure they won't change somehow? Am I stuck with this exact NAS setup until I'm ready to make a whole new base backup? This many Blu-rays aren't cheap, and burning them will take a while - I don't want to do it unnecessarily.

I'm genuinely conflicted, but I'm leaning towards dar. Does anyone else have any experience with this sort of thing? Is there another option I'm missing? Any input is greatly appreciated!

top 30 comments
[–] [email protected] 10 points 3 days ago* (last edited 3 days ago) (3 children)

I don’t like the fact that I could delete every copy using only the mouse and keyboard from my main PC. I want something that can’t be ransomwared and that I can’t screw up once created.

Lots of ways to get around that without having to go the route of burning a hundred Blu-rays with complicated (and risky) archive splitting and merging. Just a handful of external HDDs that you "zfs send" to and cycle on some regular schedule would handle that. So buy 3 drives, back up your data to all 3 of them, then unplug 2 and put them somewhere safe (desk at work, friend or family member's house, etc.). Continue backing up to the one you keep local for the next ~month and then rotate the drives. So at any given time you have an on-site copy that's up-to-date, and two off-site copies that are no more than 1 and 2 months old respectively. Immune to ransomware, accidental deletion, fire, flood, etc. and super easy to maintain and restore from.
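
A minimal sketch of that rotation, assuming the NAS pool is "tank" and the external drive carries a pool called "backup1" (untested; adjust names and the snapshot scheme to taste):

```
# First time per drive: full replication of the dataset.
zfs snapshot tank/photos@2025-04-01
zfs send -R tank/photos@2025-04-01 | zfs receive -F backup1/photos

# Each month after that: send only what changed since the last snapshot,
# then export the pool and unplug the drive.
zfs snapshot tank/photos@2025-05-01
zfs send -R -i tank/photos@2025-04-01 tank/photos@2025-05-01 \
  | zfs receive -F backup1/photos
zpool export backup1
```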

[–] [email protected] 2 points 2 days ago (1 children)

I do this, except the offline copies are Raspberry Pis: they grab an update, then turn their network card off and go dark for about a month. At random, they turn the network card back on, pull a fresh copy, and go dark again. Safe from ransomware, and automatic.
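
Roughly this on each Pi (hypothetical sketch only - interface, host, and paths are made up, and the pull could just as well be a zfs send/receive if the Pi runs ZFS):

```
#!/bin/sh
# Wake up, pull a fresh copy, go dark again.
ip link set eth0 up
sleep 30                     # give DHCP a moment to settle
rsync -a nas:/mnt/tank/photos/ /srv/photos-backup/
ip link set eth0 down
```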

[–] [email protected] 3 points 2 days ago

Unless they are in different cities, they wouldn't be safe from a fire, lightning strike, earthquake, flood, tsunami, typhoon, hurricane, etc. (remove whichever ones are not relevant to where you live).

[–] [email protected] 1 points 3 days ago

To add to this… I've added a layer of protection against accidental deletion and dumb-fingering by making each year of my photo archive into a separate zfs dataset. Then each year I set the finished dataset to read-only and create a new one.

Manual, but effective enough. I also have automatic snapshots against dumb fingering, but this helps against ones I don’t notice before the snapshots expire.
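
The yearly ritual is just a couple of commands, something like this (dataset names assumed):

```
# Freeze last year's photos, start a fresh dataset for the new year.
zfs set readonly=on tank/photos/2024
zfs create tank/photos/2025
```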

[–] [email protected] 1 points 3 days ago

Yeah, you're probably right. I already bought all the stuff, though. This project is halfway vibes-based; something about spinning rust just feels fragile, you know?

I'm definitely moving away from the complex archive split & merge solution. fpart can make lists of files that add up to a given size, and fd can find files modified since a given date. Little bit of plumbing and I've got incremental backups that show up as plain files & folders on a disk.

[–] [email protected] 5 points 3 days ago (1 children)

This is the sort of thing bacula was made for - physical backups spread out over multiple removable media (tapes mostly, but it can work with optical drives).

https://www.bacula.org/free-tape-backup-software/

It tracks where it puts your files, so it does have its own db that also needs backing up. But if you want to restore without needing to search manually through dozens of disks this is what you need.

[–] [email protected] 1 points 3 days ago* (last edited 3 days ago)

Hey cool, I hadn't heard of bacula! Looks like a really robust project. I did look into tape storage, but I can't find a tape drive for a reasonable price that doesn't have a high jank factor (internal, 5.25" drives with weird enterprise connectors and such).

I'm digging through their docs and I can't find anything about optical media, except for a page in the manual for an old version saying not to use it. Am I missing something? It seems heavily geared towards tapes.

[–] [email protected] 5 points 3 days ago (2 children)

You can't really easily locate the last version of a file on append-only media without writing an index in a footer somewhere, and even then, if you're trying to pull an older version you'd still need to traverse the whole medium.

That said, you use ZFS, so you can literally just zfs send it. ZFS already knows everything that needs to be known, so it'll be a perfect incremental. But you'd definitely need to restore the entire dataset to pull anything out of it, reapplying every incremental one by one, and if just one is unreadable the whole chain is unrecoverable - though the same is true of tar incrementals. It'll be as perfect and efficient as possible, since ZFS knows the exact change set it needs to bundle up. The stream is one-directional, which is why you can just zfs send into a file and burn it to a disc.

Since ZFS can easily tell you the difference between two snapshots, it also wouldn't be too hard to make a Python script that writes the full new version of changed files and catalogs what file and what version is on which disc, for a more random access pattern.
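
For example (untested; pool, snapshot, and chunk-size values are placeholders):

```
# Full stream, cut into disc-sized chunks instead of sent to another pool.
zfs snapshot tank/photos@2025-04
zfs send -R tank/photos@2025-04 | split -b 45G - photos-full.zfs.

# Next month: a stream containing only the changes since then.
zfs snapshot tank/photos@2025-05
zfs send -i tank/photos@2025-04 tank/photos@2025-05 \
    | split -b 45G - photos-2025-05.zfs.

# Restore = concatenate the chunks back into zfs receive, oldest stream first.
cat photos-full.zfs.* | zfs receive -F tank/photos-restore

# For a file-level catalog, list exactly what changed between snapshots.
zfs diff tank/photos@2025-04 tank/photos@2025-05
```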

But really, for Blu-rays I think I'd just do it the old-fashioned way: sort the photos to fit on a disc, label it with what's on it, and if I update it, make a v2 of it on the next disc.

[–] [email protected] 3 points 3 days ago* (last edited 3 days ago) (1 children)

Ohhh boy, after so many people suggested I put plain files directly on the disks, I went back and rethought some things. I think I'm landing on a solution that does everything and doesn't require me to manually manage all these files:

  • fd (and any number of other programs) can produce lists of files that have been modified since a given date.
  • fpart can produce lists of files that add up to a given size.
  • xorrisofs can accept lists of files to add to an ISO

So if I fd a list of new files (or don't for the first backup), pipe them into fpart to chunk them up, and then pass these lists into xorrisofs to create ISOs, I've solved almost every problem.
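
Roughly what I have in mind (untested - dates, sizes, and output names are placeholders, and the exact flags still need checking against each tool's man page):

```
#!/bin/sh
cd /mnt/photos || exit 1

# 1. List files modified since the last backup (drop the filter for the
#    first, full run).
fd --type f --changed-within '2025-03-01 00:00:00' > changed.txt

# 2. Split the list into ~45 GB chunks; fpart writes one list per chunk.
fpart -s 45000000000 -i changed.txt -o chunk

# 3. One ISO per chunk. -graft-points with "path=path" pathspecs keeps
#    the directory structure instead of flattening files into the root.
for list in chunk.[0-9]*; do
    sed 's/.*/&=&/' "$list" > "$list.graft"
    xorrisofs -r -J -graft-points -o "backup-$(date +%F)-${list##*.}.iso" \
        -path-list "$list.graft"
done
```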

  • The disks have plain files and folders on them, no special software is needed to read them. My wife could connect a drive, pop the disk in, and the photos would be right there organized by folder.
  • Incremental updates can be accomplished by keeping track of whenever the last backup was.
  • The fpart lists are also a greppable index; I can use them to find particular files easily.
  • Corruption only affects that particular file, not the whole archive.
  • A full restore can be accomplished with rsync or other basic tools.

Downsides:

  • Change detection is naive. Just mtime. Good enough?
  • Renames will still produce new copies. Solution: don't rename files. They're organized well enough, stop messing with it.
  • Deletions will be disregarded. I could solve this with some sort of indexing scheme, but I don't think I care enough to bother.
  • There isn't much rhyme or reason to how fpart splits up files. The first backup will be a bit chaotic. I don't think I really care.
  • If I rsync -a some files into the dataset, which have mtimes older than the last backup, they won't get slurped up in the next one. Can be solved by checking that all files are already in the existing fpart indices, or by just not doing that.

Honestly those downsides look quite tolerable given the benefits. Is there some software that will produce and track a checksum database?

Off to do some testing to make sure these things work like I think they do!

[–] [email protected] 1 points 1 day ago (1 children)

Your first two points can be mitigated by using checksums. It's trivial to name each file after its checksum, but ugly. Save checksums separately? Save checksums in file metadata (EXIF?)? This can be a bit tricky 🤣 I believe zfs already has the checksums, so the job would be to just compare lists.

Restoring stays just as easy; creation gets more complicated, and thus more prone to errors.

[–] [email protected] 1 points 1 day ago (1 children)

I’ve been thinking through how I’d write this. With so many files it’s probably worth using sqlite, and then I can match them up by joining on the hash. Deletions and new files can be found with different join conditions. I found a tool called ‘hashdeep’ that can checksum everything, though for incremental runs I’ll probably skip hashing if the size, times, and filename haven’t changed. I’m thinking nushell for the plumbing? It runs everywhere, though they have breaking changes frequently. Maybe rust?

ZFS checksums are done at the block level, and after compression and encryption. I don’t think they’re meant for this purpose.
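
For the hashing part, something like this with hashdeep (untested; I'd double-check the audit flags against the man page before trusting it):

```
# Full manifest for this run (sha256 of every file, recursively).
hashdeep -r -c sha256 /mnt/photos > manifest-2025-05.txt

# Compare the tree against last month's manifest; -x prints only files
# whose hash is NOT in the known set, i.e. new or changed files.
hashdeep -r -c sha256 -x -k manifest-2025-04.txt /mnt/photos > changed.txt
```

Loading two manifests into sqlite and joining on the hash column would then classify adds, deletes, and renames.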

[–] [email protected] 2 points 1 day ago (1 children)

Never heard of nushell, but it sounds interesting... it's not the default anywhere yet, though. I'd go for bash, perl, or maybe python. Your comments on zfs make a lot of sense, and invalidate my respective thoughts :D

[–] [email protected] 1 points 1 day ago

I only looked how zfs tracks checksums because of your suggestion! Hashing 2TB will take a minute, would be nice to avoid.

Nushell is neat, I’m using it as my login shell. Good for this kind of data-wrangling but also a pre-1.0 moving target.

[–] [email protected] 2 points 3 days ago* (last edited 3 days ago)

Woah, that's cool! I didn't know you could just zfs send anywhere. I suppose I'd have to split it up manually with split or something to get 50GB chunks?

Dar has dar_manager which you can use to create a database of snapshots and slices that you can use to locate individual files, but honestly if I'm using this backup it'll almost certainly be a full restore after some cataclysm. If I just want a few files I'll use one of my other, always-online backups.

Edit: Clicked save before I was finished

I'm more concerned with robustness than efficiency. Dar will warn you about corruption, which should only affect that particular file and not the whole archive. Tar will let you read past errors so the whole archive won't be ruined, but I'm not sure how bad the effects would be. I'm really not a fan of a solution that needs every part of every disk to be read perfectly.

I could chunk them up manually, but we're talking about 2TB of lumpy data, spread across hundreds of thousands of files. I'll definitely need some sort of tooling to track changes, I'm not doing that manually and I bounce around the photo library changing metadata all the time.

[–] [email protected] 2 points 3 days ago

I did (am doing) something very similar. I definitely have issues with my indexing, but I’m just ordering it manually by year/date for now.

I’m doing a little extra for parity though. I’m using 50-100gb discs for the data, and using 25gb discs as a full parity disc via dvdisaster for each disc I burn. Hopefully that reduces the risk of the parity data also being unreadable, and gives MORE parity data without eating into my actual data discs. It’s hard enough to break up the archives into 100gb chunks as is.

Need to look into bacula as suggested by another poster.

[–] [email protected] 2 points 3 days ago (1 children)

This is an interesting problem for the same use case which I've been thinking about lately.

Are you using standard BluRay, or M-Discs?

My plan was to simply copy files. These are photos, and IME they don't benefit from compression (I stopped shooting raw when I switched to Fujifilm, and the JPGs coming from the camera were better than anything I could produce from raw in Darktable). Without compression, putting them in tarballs only adds another level of indirection; I can just checksum images directly after writing, and access them directly when I need to. I was going to use the smallest M-Disc for an index, copy and modify it when it changed, and version that.

I tend not to change photos after they've been processed through my workflow, so in my case I'm not as concerned with the "most recent version" of an image. In any case, the index would reflect which disc the latest version of an image lives on, if something did change.

For the years I did shoot raw, I'm archiving those as DNG.

For the sensitive photos, I have a Rube Goldberg plan that will hopefully result in anyone with the passkey being able to mount that image. There aren't many of those, and that set hasn't been added to in years, so it'll go on one disc with the software necessary to mount it.

My main objective is accessibility after I'm gone, so having as few tools in the way as possible trumps other concerns. I see no value in creating tarballs - attach the device, pop in the index (if necessary), find the disc with the file, pop that in, and view the image.

Key to this is

  • the data doesn't change over time
  • the data is already compressed in the file format, and does not benefit from extra compression
[–] [email protected] 2 points 3 days ago (1 children)

I'm using standard BD-DLs. M-Discs are almost triple the price, and this project is already too costly. I'm not looking for centuries of longevity; I'm using optical media because it's read-only once written. I've read that properly stored Blu-rays should be good for 10 or 20 years, which is good enough for me. I'll make another copy when the read errors start getting bad.

Copying files directly would work, but my library is real big and that sounds tedious. I have photos going back to the 80s and curating, tagging, and editing them is an ongoing job. (This data is saved in XMP sidecars alongside the original photos). I also won't be encrypting or compressing them for the same reasons you mentioned.

For me, the benefit of the archive tool is to automatically split it up into disk-sized chunks. That and to automatically detect changes and save a new version; your first key doesn't hold true for this dataset. You're right though, I'm sacrificing accessibility for the rest of the family. I'm hoping to address this with thorough documentation and static binaries on every disk.

[–] [email protected] 1 points 1 day ago (1 children)

The densities I'm seeing on M-Discs - 100GB, $5 per, a couple years ago - seemed acceptable to me. $50 for a TB? How big is your archive? Mine still fits in a 2TB disk.

Copying files directly would work, but my library is real big and that sounds tedious.

I mean, putting it in an archive isn't going to make it any smaller. Compression on even lossless compressed images doesn't often help.

And we're talking about 100GB discs. Is squeezing that last 10MB out of the disk by splitting an image across two disks worth it?

The metadata is a different matter. I'd have to think about how to handle the sidecar data... but that you could almost keep on a DVD-RW, because there's no way that's going to be anywhere near as large as the photos themselves. Is your photo editor DB bigger than 4GB?

I never change the originals. When I tag and edit, that information is kept separate from the source images - so I never have multiple versions of pictures, unless I export them for printing, or something, and those are ephemeral and can be re-exported by the editor with the original and the sidecar. Music, and photos, I always keep the originals isolated from the application.

This is good, though; it's helping me clarify how I want to archive this stuff. Right now mine is just backed up on multiple disks and once in B2, but I've been thinking about how to archive for long term storage.

I think I'm going to go the M-Disc route, with sidecar data on SSD and backed up to Blu-ray RW. The trick will be letting darktable know that the source images are on different media, but I'm pretty sure I saw an option for that. For sure, we're not the first people to approach this problem.

The whole static binary thing - I'm going that route with an encrypted share for financial and account info, in case I die, but that's another topic.

[–] [email protected] 2 points 1 day ago (2 children)

Where I live (not the US) I’m seeing closer to $240 per TB for M-disc. My whole archive is just a bit over 2TB, though I’m also including exported jpgs in case I can’t get a working copy of darktable that can render my edits. It’s set to save xmp sidecars on edit so I don’t bother with backing up the database.

I mostly wanted a tool to divide up the images into disk-sized chunks, and to automatically track changes to existing files, such as sidecar edits or new photos. I’m now seeing I can do both of those and still get files directly on the disk, so that’s what I’ll be doing.

I’d be careful with using SSDs for long term, offline storage. I hear they lose data if not powered for a long time. IMO metadata is small enough to just save a new copy when it changes

[–] [email protected] 1 points 9 hours ago (1 children)

This is more expensive in your country?

https://a.co/d/9DiKeie

That's a little over $11 USD per 100 GB disk. Is it just more expensive where you live, or is it shipping?

I'd be really surprised if these weren't manufactured in Asia somewhere.

[–] [email protected] 1 points 9 hours ago (1 children)

My options look like this:

https://allegro.pl/kategoria/nosniki-blu-ray-257291?m-disc=tak

Exchange rate is 3.76 PLN to 1 USD, which is actually the best I’ve seen in years

[–] [email protected] 1 points 9 hours ago (1 children)

Just out of curiosity, is the product on Amazon, and is it that same price?

[–] [email protected] 1 points 8 hours ago (1 children)
[–] [email protected] 1 points 1 hour ago

Shit, that's way more expensive. If only you knew someone in the US who would buy a few boxes and ship them to you...

But, seriously, yeah, that basically eliminates it as an option.

[–] [email protected] 2 points 1 day ago

It'd be more space-efficient to store a qcow2 image of Linux with a minimal desktop and basically only darktable on it. The VM format hasn't changed in decades.

Shoot: a bootable disc containing Linux and the software you need to access the images, a qcow2 image of the same on a separate track, and just darktable on a third. Best case, you pop in the disc and run darktable. Or you fire up a VM with the image. Worst case, boot into Linux. This may be the way I go, although - again - the source images are the important part.

I’d be careful with using SSDs for long term, offline storage.

What I meant was: keep the master sidecar data on SSD for regular use, and back it up occasionally to a RW disc, probably with a simple cp -r to a dated directory. This works for me because my sources don't change, except to add data, which is usually stored in date directories anyway.

You're also wanting to archive the exported files, and sometimes those change? Surely that's much less data? If you're like me, I'll shoot 128xB and end up using a tiny fraction of the shots. I'm not sure what I'd do for that - probably BD-RW. The longevity isn't great, but it's by definition mutable data, and in any case the most recent version can easily be regenerated as long as I have the sidecar and source image secured.

Burning the sidecar to disk is less about storage and more about backup, because that is mutable. I suppose an append backup snapshot to M-Disc periodically would be boots and suspenders, and frankly the sidecar data is so tiny I could probably append such snapshots to a single disc for years before it all gets used. Although... sidecar data would compress well. Probably simply tgz, then, since it's always existed, and always will, even if gzip has been superseded by better algorithms.

BTW, I just learned about the b3 hashing algorithm (about which I'm chagrined, because I thought I kept an eye out on the topic of compression and hashing). It's astonishingly fast - for the verification part, is what I'm suggesting.
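
e.g., something like this for the verification step (untested; paths are placeholders):

```
# Before burning: manifest of every file, relative to the photo root.
cd /mnt/photos && find . -type f -exec b3sum {} + > /tmp/photos.b3

# After burning: verify the mounted disc against the manifest.
cd /media/bluray && b3sum --check /tmp/photos.b3
```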

[–] [email protected] 0 points 3 days ago (1 children)

I recommend looking into a borg/borgmatic setup.

[–] [email protected] 2 points 3 days ago (2 children)

Can borg back up to write-once optical media spread over multiple disks? I'm looking through their docs and I can't find anything like that. I see an append-only mode but that seems more focused on preventing hacked clients from corrupting data on a server.

[–] [email protected] 2 points 3 days ago

I’m not sure it would intelligently handle that on its own. There’d need to be some manual work on your end.

[–] [email protected] 2 points 3 days ago (1 children)

Borg is amazing but I don't think it's a good fit for your case.

[–] [email protected] 2 points 3 days ago

Yeah, I already use restic which is extremely similar and I don't believe it could do this either. Both are awesome projects though