this post was submitted on 22 Mar 2024
23 points (96.0% liked)

Selfhosted


Basically title. Is it common to use some kind of RAID for backing up other RAIDs or do people just go with single drives?

[–] [email protected] 16 points 7 months ago* (last edited 7 months ago)

Two single drives means two full copies, one of which you can keep at a friend's place. With two mirrored drives, if you accidentally overwrite a backup, the error hits both drives at once, unless you have snapshots or incremental backups.
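To make the overwrite point concrete, here is a toy Python model. Plain dicts stand in for drives, no real RAID or filesystem is involved, and the class names are made up for illustration. A mirror applies every write to both members, so a mistaken overwrite lands everywhere at once. Two independently written drives keep the older copy on whichever drive wasn't touched.

```python
class MirroredPair:
    """Every write lands on both drives at once (RAID 1 style)."""

    def __init__(self):
        self.drives = [{}, {}]

    def write(self, name, data):
        for drive in self.drives:
            drive[name] = data


class IndependentCopies:
    """Each drive is written separately, e.g. on different days."""

    def __init__(self):
        self.drives = [{}, {}]

    def write(self, drive_index, name, data):
        self.drives[drive_index][name] = data


mirror = MirroredPair()
mirror.write("backup.tar", "good data")
mirror.write("backup.tar", "oops")  # accidental overwrite hits both drives

independent = IndependentCopies()
independent.write(0, "backup.tar", "good data")
independent.write(1, "backup.tar", "good data")
independent.write(0, "backup.tar", "oops")  # the mistake only reaches drive 0

print([d["backup.tar"] for d in mirror.drives])       # ['oops', 'oops']
print([d["backup.tar"] for d in independent.drives])  # ['oops', 'good data']
```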

Lots of good backup advice on this podcast https://2.5admins.com/

[–] [email protected] 9 points 7 months ago* (last edited 7 months ago) (1 children)

It depends on your needs. How much do you value your data? Can you re-create / re-download it in case of a disk failure?

In some cases, like a typical home user with a few writes per day or even per week, simply having a second disk that's updated daily with rsync may be a better choice. Consider that if you have two mechanical disks spinning 24/7, they may well fail at around the same time (or during a RAID rebuild), and you'll end up losing all your data. Having one active disk (shared on the network and spinning) and the other spun down, only powered on once a day by a cron rsync job, means your second disk will last a LOT longer and you'll be safer.
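In practice the daily job would just be rsync itself called from cron (for example `rsync -a /data/ /mnt/backup/`, with made-up paths). As a rough stand-in for what that job does, here is a stdlib-only Python sketch of a one-way copy that skips files whose size and modification time already match, similar in spirit to rsync's quick check. It does not spin disks up or down, and unlike `rsync --delete` it never removes anything from the destination.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path


def sync_one_way(src: Path, dst: Path) -> list[str]:
    """Copy new or changed files from src to dst; return the relative paths copied.

    Files deleted from src are left in dst (no --delete behaviour).
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            src_file = Path(root) / name
            rel = src_file.relative_to(src)
            dst_file = dst / rel
            # shallow=True trusts matching type/size/mtime, else compares contents
            if dst_file.exists() and filecmp.cmp(src_file, dst_file, shallow=True):
                continue
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)  # copy2 also preserves mtime
            copied.append(rel.as_posix())
    return sorted(copied)


# Demo with throwaway directories standing in for the live and backup disks.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    src, dst = Path(src_dir), Path(dst_dir)
    (src / "photos").mkdir()
    (src / "photos" / "cat.jpg").write_bytes(b"not really a jpeg")
    (src / "notes.txt").write_text("hello")
    first_run = sync_one_way(src, dst)
    second_run = sync_one_way(src, dst)

print(first_run)   # ['notes.txt', 'photos/cat.jpg']
print(second_run)  # []
```

The second run copies nothing, which is what keeps a daily run against a mostly idle second disk short.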

[–] [email protected] 2 points 7 months ago* (last edited 7 months ago) (2 children)

Well, AFAIK spinning up and down and the related temperature changes do the most damage. I'm not sure a disk that's spun up daily will outlast one that mostly idles 24/7. Maybe if you do it only weekly?

[–] [email protected] 1 points 7 months ago

I am not sure if a disk that is spun up daily will outlast one that mostly idles 24/7. Maybe if you do it only weekly?

Well, I do it weekly in a specific case but I also have other systems running daily. I guess it also depends on the use case / amount of data written and how damaging it can be if the "hot" drive breaks between the syncs.

[–] [email protected] 1 points 7 months ago (1 children)

Without any cold hard data, this isn't worth discussing.

[–] [email protected] 0 points 7 months ago

The "cold hard data" is that 100% of the people that would be able to collect this "cold hard data" run their drives 24/7.

[–] [email protected] 5 points 7 months ago

I would recommend avoiding RAID for backups. It's preferable to have two separate backup disks in two distinct systems rather than relying on mirrored backup disks. If there's a human error on the backup machine, you risk losing both backups simultaneously. Additionally, unforeseen events like a system failure due to a lightning strike could compromise your data. Ideally, you should have two backups stored in two different locations.

[–] [email protected] 4 points 7 months ago

As others said, it depends on your use case. There are lots of good discussions here about mirroring vs. single disks, different vendors, etc. Some backup systems may want a large filesystem available that would not otherwise be attainable without RAID 5/6.

Enterprise backups tend to fall along the recommendation called 3-2-1:

  • 3 copies of the data, of which
  • 2 are backups, and
  • 1 is off-site (and preferably offline)
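Here is a small Python sketch of how the copies-backups-offsite labels used in this comment (3-2-1, 3-2-0, 4-3-0) can be computed for a given setup. The `Copy` class and the device names are hypothetical. It follows this comment's reading of 3-2-1, where 2 of the copies are backups. Another common phrasing asks for two different storage media instead, which this does not model, and it doesn't check the "preferably offline" part either.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    name: str
    is_backup: bool  # False for the live/primary copy
    offsite: bool


def counts(copies: list[Copy]) -> tuple[int, int, int]:
    """Return (total copies, backup copies, off-site copies)."""
    total = len(copies)
    backups = sum(c.is_backup for c in copies)
    offsite = sum(c.offsite for c in copies)
    return total, backups, offsite


def rule_label(copies: list[Copy]) -> str:
    return "-".join(str(n) for n in counts(copies))


def meets_3_2_1(copies: list[Copy]) -> bool:
    total, backups, offsite = counts(copies)
    return total >= 3 and backups >= 2 and offsite >= 1


home = [
    Copy("laptop", is_backup=False, offsite=False),
    Copy("NAS", is_backup=True, offsite=False),
    Copy("external HDD", is_backup=True, offsite=False),
]
print(rule_label(home))   # 3-2-0
print(meets_3_2_1(home))  # False
```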

On my home system, I have 3-2-0 for most data and 4-3-0 for my most important virtual machines. My home system doesn't have an off-site copy, but I do have two external hard drives connected to my NAS.

  • All devices are backed up to the NAS for fast recovery, with an RPO between 24h and 1w
  • The NAS backs up various parts of itself to the external hard drives every 24h
    • Data is split up by role and convenience factor - just putting stuff together like Tetris pieces, spreading out the NAS between the two drives
    • The most critical data for me to have first during a recovery is backed up to BOTH external disks
  • Coincidentally, the two drives are from different vendors. I didn't plan it that way initially: the Seagate drive was a gift and the WD drive was on sale
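The "Tetris" split can be sketched in Python as a greedy assignment. Datasets marked critical go to both external drives, and the rest are placed largest-first on whichever drive has more free space. The dataset names, sizes, and the greedy rule are all assumptions for illustration. The commenter describes doing this by hand, by role and convenience.

```python
def plan_two_drives(datasets: dict[str, tuple[int, bool]], capacities: tuple[int, int]):
    """Split datasets (name -> (size_gb, critical)) across two backup drives.

    Returns (names_per_drive, free_gb_per_drive).
    """
    free = list(capacities)
    plan: list[list[str]] = [[], []]

    # Critical data goes to both drives, so either one can start a recovery.
    for name, (size, critical) in datasets.items():
        if critical:
            for i in (0, 1):
                plan[i].append(name)
                free[i] -= size
    if min(free) < 0:
        raise ValueError("critical data alone does not fit on both drives")

    # Everything else is stored once, largest items first (greedy heuristic).
    others = sorted(
        ((name, size) for name, (size, critical) in datasets.items() if not critical),
        key=lambda item: item[1],
        reverse=True,
    )
    for name, size in others:
        i = 0 if free[0] >= free[1] else 1  # drive with the most free space
        if free[i] < size:
            raise ValueError(f"{name} ({size} GB) does not fit on either drive")
        plan[i].append(name)
        free[i] -= size
    return plan, free


# Hypothetical datasets (GB) and two external drives of 2 TB and 3 TB.
datasets = {
    "vm-images": (500, True),
    "photos": (800, False),
    "media": (1500, False),
    "documents": (50, True),
}
plan, free = plan_two_drives(datasets, (2000, 3000))
print(plan)  # [['vm-images', 'documents', 'photos'], ['vm-images', 'documents', 'media']]
print(free)  # [650, 950]
```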

Story time

I had one of my two backup drives fail a few months ago. Literally nothing of value was lost: I just went down to the electronics shop and bought a bigger drive from the same vendor (preserving the one-drive-per-vendor approach), reformatted the disk, recreated the backup job, and ran the first transfer. Pretty much not a big deal, since all the data was still in 2 other places: the source itself and the NAS primary array.

The most important thing to determine when you plan a backup is how valuable the data is to you. That's how much you might be willing to spend on keeping it safe.

[–] [email protected] 3 points 7 months ago

So many people didn't read the post and are going off about how RAID isn't a backup.

There are a few things to consider. How much data is it? How is it connected? How reliable do you want it to be? Where is it going to be? How are you backing it up? How will you monitor the disk(s) and backup process for failures?

Is it at some place that will be a pain to deal with if a hard drive dies, like a friend's house? Then I'd use RAID, so a single dead drive isn't an immediate reason to go fix it or to go without backups.

Is it a small enough amount of data that you could keep a complete third copy if you didn't put the disks in RAID? Then I'd probably make multiple copies and not use RAID.

Are you dealing with something like Veeam doing backup chains, with an initial full copy and then incrementals with changes so you can go back to different days? Go with RAID, because reconfiguring can be a hassle, and spreading a full and its incrementals across JBOD disks could cost you all the backups if the disk with the full backup is lost.
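Here is a minimal Python sketch of why a chain split across JBOD disks is fragile. It assumes a simple forward-incremental chain, where one full backup is followed by incrementals that each depend on everything before them. Real backup tools may build and store chains differently, and the disk labels and days here are made up.

```python
def restore_chain(backups, target_day, lost_disks=frozenset()):
    """Return the backups needed to restore target_day, or None if impossible.

    backups: list of (day, kind, disk) sorted by day; kind is "full" or "incr".
    A forward-incremental restore needs the latest full at or before
    target_day plus every incremental after it, up to target_day.
    """
    chain = []
    for day, kind, disk in backups:
        if day > target_day:
            break
        if kind == "full":
            chain = [(day, kind, disk)]  # a new full starts a fresh chain
        else:
            chain.append((day, kind, disk))
    if not chain or chain[0][1] != "full":
        return None  # no full backup to start from
    if any(disk in lost_disks for _day, _kind, disk in chain):
        return None  # a missing link anywhere makes this restore point unusable
    return chain


# Hypothetical JBOD layout: the full lives on disk A, incrementals alternate.
backups = [(0, "full", "A"), (1, "incr", "B"), (2, "incr", "A"), (3, "incr", "B")]
print(restore_chain(backups, 3) is not None)                 # True
print([restore_chain(backups, d, {"A"}) for d in range(4)])  # [None, None, None, None]
print(restore_chain(backups, 0, {"B"}))                      # [(0, 'full', 'A')]
```

Losing disk A (the one holding the full) wipes out every restore point, while losing disk B still leaves the day-0 full.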

Either is a valid choice; it depends on your particular needs.

[–] [email protected] 2 points 7 months ago

Generally speaking, fault protection schemes need only account for one fault at a time, unless you're a really large business, or some other entity with extra-stringent data protection requirements.

RAID protects against drive failure faults. Backups protect against drive failure faults as well, but also things like accidental deletions or overwrites of data.

In order for RAID on backups to make sense when you already have RAID on your main storage, you'd have to consider drive failures and other data loss likely to occur simultaneously. In other words, RAID on your backups only protects you from a drive failure occurring WHILE you're trying to restore a backup. Or, more generally, WHILE that backup is in use, say, if you have a legal requirement to keep a history of all your data for X years (I would argue data like this shouldn't be classified as backups, though).

[–] [email protected] 1 points 7 months ago* (last edited 7 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters | More Letters
NAS | Network-Attached Storage
PSU | Power Supply Unit
RAID | Redundant Array of Independent Disks for mass storage
SSD | Solid State Drive mass storage
ZFS | Solaris/Linux filesystem focusing on data integrity

5 acronyms in this thread; the most compressed thread commented on today has 8 acronyms.

[Thread #622 for this sub, first seen 22nd Mar 2024, 23:15]

[–] [email protected] 1 points 7 months ago (2 children)

You could always use Unraid for the backup if you're trying to be storage-efficient, but it's really no better than RAID 5.

[–] [email protected] 3 points 7 months ago (2 children)

Obligatory "TrueNAS is free" comment

[–] [email protected] 3 points 7 months ago* (last edited 7 months ago) (1 children)

Unraid’s “killer feature” is the ability to mix and match disparate drive sizes, only requiring the parity drive to be at least as large as your largest data disk, à la mergerfs/SnapRAID. Also, ZFS chugs RAM like there’s no tomorrow, so it's not really an option for underpowered devices like some NASes. But yeah, TrueNAS is nice.
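To put numbers on the mix-and-match point, here is a rough Python comparison of usable space with one dedicated parity drive (the Unraid/SnapRAID style described above, using the largest drive as parity) versus classic RAID 5, where every member effectively only contributes as much as the smallest drive. It ignores filesystem overhead, and the drive sizes are made up.

```python
def single_parity_usable(drives_tb: list[int]) -> int:
    """One dedicated parity drive (Unraid/SnapRAID style).

    Parity must be at least as large as the largest data drive, so the
    largest drive is used for parity and all others count in full.
    """
    return sum(drives_tb) - max(drives_tb)


def raid5_usable(drives_tb: list[int]) -> int:
    """Classic RAID 5: each member contributes only as much as the smallest
    drive, and one member's worth of space holds parity."""
    return (len(drives_tb) - 1) * min(drives_tb)


drives = [4, 4, 8, 12]  # hypothetical mixed drive sizes in TB
print(single_parity_usable(drives))  # 16
print(raid5_usable(drives))          # 12
```

With identical drives the two come out the same; the gap only appears once sizes are mixed.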

[–] [email protected] 2 points 7 months ago

That's a very budget-friendly choice by Unraid, accepting varying drive sizes. As a backup destination, especially a cold backup, the RAM requirements of ZFS should matter less. I got lots of use out of my TrueNAS box with 16 GB, and my dedicated cold backup build has just 8 GB of RAM for 5x 1 TB WD Blue (gasp!) HDDs. I always wanted to try other NAS platforms, but I'm away from all my tech for a few years.

[–] [email protected] 2 points 7 months ago (1 children)

Lol.

If you have a spare box doing little and a bunch of drives, it (or Unraid) is a reasonable solution. Proxmox can also build RAID from random drive sizes: I'm running one with 3 drives using ZFS RAID 0, and it has a terabyte of storage.

Yep, it's gonna suck when one of those drives fails.

[–] [email protected] 1 points 7 months ago

Well, as long as you're aware of the risk and prepared for it, it's not so bad to run in a volatile way like that. I ran my TN box for almost a decade on the same USB boot drive before I finally caved and picked up three Intel enterprise SSDs for the job, with one as a cold spare. Nothing in the box was critical or would be missed for more than a few beers of crying.

[–] [email protected] 1 points 7 months ago

Yep, all RAID has the same kinds of issues - largely sensitivity to X number of drive failures. Which is part of why we see RAID 6 (double parity), Mirroring, RAID 1-0, etc, all as mechanisms to provide compensation for disk failure within the RAID.
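Here is a rough Python lookup of the worst-case number of drive failures each layout survives. It assumes RAID 1 is an n-way mirror and RAID 10 is built from 2-way mirror pairs. Real arrays can sometimes survive more (RAID 10 can lose one drive per pair if you're lucky), but only the guaranteed minimum is shown.

```python
def guaranteed_failures_tolerated(level: str, n_drives: int) -> int:
    """Worst-case number of drive failures the array survives without data loss."""
    if level == "raid0":
        return 0             # striping only: any single failure loses the array
    if level == "raid1":
        return n_drives - 1  # n-way mirror: one surviving drive is enough
    if level == "raid5":
        return 1             # single parity
    if level == "raid6":
        return 2             # double parity
    if level == "raid10":
        return 1             # two failures in the same mirror pair lose the array
    raise ValueError(f"unknown RAID level: {level}")


for level in ("raid0", "raid1", "raid5", "raid6", "raid10"):
    print(level, guaranteed_failures_tolerated(level, 4))
```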

In the SMB space, RAID 10 seems to be the favorite approach today for NAS/virtualization hosts (ESX, etc.), with backups going to a cloud provider such as iland or Barracuda.

[–] [email protected] 1 points 7 months ago

I have 1 off-site backup and two 10 TB external drives that hold duplicate backups.

[–] [email protected] 1 points 7 months ago

SnapRAID to a single parity drive works well if you're fine with daily snapshots of up to 6 drives.
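Single parity, whether in RAID 5 or in a SnapRAID setup with one parity drive, rests on the same basic XOR idea. This toy Python sketch is not SnapRAID's actual code or on-disk format, and the "disks" are short byte strings. The parity block is the XOR of all data blocks, so any one lost block equals the XOR of the surviving blocks and the parity.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_disks = [b"photos..", b"docs....", b"music..."]  # three 8-byte "disks"
parity = xor_blocks(data_disks)  # what the single parity disk stores

# Pretend disk 1 died: XOR the survivors with the parity to rebuild it.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'docs....'
```

With only one parity block, a second simultaneous loss cannot be rebuilt this way.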

[–] [email protected] 1 points 7 months ago

A mirrored RAID with a filesystem that does error correction based on checksums (btrfs/ZFS), plus incremental backups with snapshots, is probably the safest... and you should still have another off-site backup if it's really important data.

But for most home use, a single backup drive that you update regularly is sufficient in 95% of cases.

[–] [email protected] 1 points 7 months ago

I have a tiny archive of my own consisting of a 1 TB and a 2 TB USB HDD from different vendors. Whenever I want to save something, I put it on both. Btrfs snapshots make that really easy.

[–] [email protected] 1 points 7 months ago

I use RAID for the data, but the backups go to simple single disks. My reasoning is that I already have a RAID and redundancy, and I don't have an unlimited budget. It already takes 2 disk failures to wreck the RAID, and then the backup has to fail too. That's probably a fire, ransomware, or a deliberate effort. Adding one more disk of redundancy probably wouldn't change much, but it would cost money and add complexity.

Also, this way I don't need to care about buying disks of a certain size and going through painful migrations more than necessary. I can reuse drives with mismatched sizes and swap them into the backup pool.

[–] [email protected] 1 points 7 months ago (1 children)

I would go with RAID on the backup system too. You don't want all your backups disappearing because one drive fails.

[–] [email protected] 3 points 7 months ago (1 children)

Depends on them not choosing the wrong RAID type :)

[–] [email protected] 3 points 7 months ago (1 children)

That is why I say "RAID0 is not RAID."

[–] [email protected] 2 points 7 months ago

Where? Not in what I replied to.

[–] [email protected] 0 points 7 months ago

Object storage is really popular for backups now because you can make it immutable by protocol standard and with erasure coding you can have fault tolerance across locations.

[–] [email protected] 0 points 7 months ago (1 children)

RAID. No question. Or two individual drives each alternating full backups, which is what I do.
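One simple way to alternate full backups between two drives is to pick the drive from the ISO week number. This Python sketch assumes weekly fulls, and the drive labels are hypothetical.

```python
from datetime import date


def drive_for(day: date, drives: tuple[str, str] = ("backup-A", "backup-B")) -> str:
    """Pick which drive gets this week's full backup by ISO week parity."""
    week = day.isocalendar()[1]
    return drives[week % 2]


print(drive_for(date(2024, 1, 1)))  # ISO week 1 -> backup-B
print(drive_for(date(2024, 1, 8)))  # ISO week 2 -> backup-A
```

One caveat of this design: ISO years with 53 weeks end on an odd week and the next year starts on week 1, so the same drive gets two fulls in a row at that boundary. Keeping a running counter of completed backups avoids that.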

I just plugged in a new drive to replace an SSD that locked and wouldn't write new backups. It failed a format attempt. I immediately ordered a replacement. Remember the rule: one is none.

And for fuck's sake, have an offsite backup.

[–] [email protected] 0 points 7 months ago

RAID is a choice if you're (generally) trying to maximize storage capacity against cost of drive capacity. It was born out of a lack of drives of sufficient capacity.

Mirroring is useful for protection against hardware failures - it's not a backup.

Follow the 3:3 rule: 3 backups, in 3 different "locations". Locations in quotes because 2 different cloud storage providers count as 2 different locations.

Whether your "local" backup (in your location, at a friend's house, etc) uses RAID depends on your requirements, cost sensitivity, etc.

I have a couple of RAID setups only because I always have spare drives around, and it's relatively cheap to build a box to run something like Unraid or TrueNAS, which can take advantage of mixed drive sizes.

My current setup is an old file server with a large drive that is currently replicating to an external drive, a small NAS, and Crashplan.

Not an ideal setup, since 2 backups are local (though my NAS is easy to grab and run with; it weighs about 10 lbs).

Next phase is to move to Storj.io and switch to a proper backup tool like Borg.

[–] [email protected] -4 points 7 months ago (1 children)

In an ideal world, any storage should be RAID or some form thereof. For the storage where backups are kept, definitely yes: RAID should be a very high priority.

[–] [email protected] 1 points 7 months ago (1 children)

I haven't needed RAID for years, because my storage needs were small enough to fit on currently available drives.

Which is why my file server has a single 4TB data drive, with an external attached for mirroring on a schedule, plus a NAS also mirrored on a schedule, and Crashplan.

The NAS was recently added, and it's RAID 5, only because it was free and I had the drives sitting around collecting dust. Hopefully I can switch it to RAID 6 once deduplication is finished.

Technically only Crashplan is a real backup in my setup. The rest is just local redundancy.

I'd prefer to not use RAID if I can avoid it.

[–] [email protected] 0 points 7 months ago* (last edited 7 months ago) (1 children)

RAID isn't only for when a drive fails; it can also be used against slow corruption of files. If you love your data, use RAID.

[–] [email protected] 1 points 7 months ago

That is just a specific type of drive failure and only certain software RAID solutions are able to even detect corruption through the use of checksums. Typical "dumb" RAID will happily pass on corrupted data returned by the drives.
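The difference between "dumb" RAID and checksumming filesystems can be sketched in a few lines of Python. Keep a checksum separately from the data, verify each mirror copy on read, return the first one that matches, and overwrite copies that don't. This is only roughly the idea behind btrfs/ZFS mirrors (their checksums, layouts, and repair paths are far more involved), and bytearrays stand in for disks.

```python
import hashlib


def read_verified(copies: list[bytearray], expected_sha256: str) -> bytes:
    """Return data from the first mirror copy whose SHA-256 matches.

    Copies that fail verification are overwritten with the good data
    (self-healing). A 'dumb' mirror would simply return copies[0].
    """
    good = None
    for data in copies:
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            good = bytes(data)
            break
    if good is None:
        raise OSError("every copy failed checksum verification")
    for data in copies:
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            data[:] = good  # repair the corrupted copy in place
    return good


block = b"family photos"
checksum = hashlib.sha256(block).hexdigest()  # stored apart from the data
disk_a = bytearray(block)
disk_a[0] ^= 0x01                             # simulate a flipped bit on disk A
disk_b = bytearray(block)

print(read_verified([disk_a, disk_b], checksum))  # b'family photos'
print(bytes(disk_a) == block)                     # True: disk A was repaired
```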

RAID only serves to prevent downtime due to drive failure. If your system has very high uptime requirements and a drive just dropping out must not affect the availability of your system, that's where you use RAID.

If you want to preserve data however, there are much greater hazards than drive failure: Ransomware, user error, machine failure (PSU blows up), facility failure (basement flooded) are all similarly likely. RAID protects against exactly none of those.

Proper backups do provide at least decent mitigation against most of these hazards in addition to failure of any one drive.

If you love your data, you make backups of it.

With a handful of modern drives (<~10) and a restore time of 1 week, you can expect storage uptime of >99.68%. If you don't need more than that, you don't need RAID. I'd also argue that if you do indeed need more than that, you probably also need higher uptime in other components than the drives through redundant computers at which point the benefit of RAID in any one of those redundant computers diminishes.
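The comment doesn't say which annual failure rate (AFR) the 99.68% figure assumes. As a worked sketch, assume each failure takes storage offline for the full restore time and failures never overlap. Expected downtime per year is then n_drives * afr * restore_days. With 10 drives and a 7-day restore, an assumed AFR of about 1.67% per drive reproduces the quoted number.

```python
def expected_uptime(n_drives: int, afr: float, restore_days: float) -> float:
    """Fraction of a 365-day year that storage without RAID is available.

    Assumes every drive failure causes restore_days of downtime and that
    failures never overlap, so downtime = n_drives * afr * restore_days.
    """
    downtime_days = n_drives * afr * restore_days
    return 1 - downtime_days / 365


# afr=0.0167 is an assumed per-drive annual failure rate, not from the comment.
print(f"{expected_uptime(10, 0.0167, 7):.2%}")  # 99.68%
```

Fewer drives or a faster restore push the figure higher, which matches the comment's "<~10 drives" and ">99.68%" framing.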