this post was submitted on 09 Sep 2023
Selfhosted

Hello.

My setup is:

  • Lenovo M920q mini PC with Proxmox installed (this doesn't have IPMI, only vPro, which annoys me)
  • Fujitsu TX1320 M3 with TrueNAS Core installed - ZFS + RAID1 (this is a low-end "enterprise grade" server, and best of all, it has IPMI).

The Proxmox PC keeps all its CTs and 1 VM on the TrueNAS using iSCSI.

The idea behind my setup was that it felt nice that the TrueNAS would handle all the storage heavy lifting - ZFS, RAID etc., while the Proxmox mini PC would be a "compute-only" node that has a naked Proxmox install with some config.

The problem with that is if the TrueNAS machine loses power or is restarted, the Proxmox CTs/VMs switch their filesystem to read-only and stop responding to requests. This is because the iSCSI connection is interrupted. When the TrueNAS is back online, Proxmox doesn't make any attempt to restart the VMs/CTs - they'd still be broken.
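One way to confirm this symptom from inside a guest is to check whether the root filesystem has been remounted read-only. The following is only a minimal sketch: it assumes a Linux guest whose mount table is readable at /proc/mounts, and the function name `read_only_mounts` is made up for illustration.

```python
def read_only_mounts(mounts_text, watched=("/",)):
    """Return the watched mount points that are mounted read-only.

    mounts_text is the content of /proc/mounts, where each line has the form:
    device mount_point fs_type options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if mount_point in watched and "ro" in options:
            found.append(mount_point)
    return found


# Usage inside a guest (assumes Linux):
#   with open("/proc/mounts") as f:
#       print(read_only_mounts(f.read()))
```

A monitoring check could call this periodically and raise an alert when the list is non-empty, so the broken state is noticed before services start failing.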

It's annoying to have to VPN in to the Proxmox web UI and wait 15 minutes until all the CTs/VMs have restarted and are working again over the "alive" iSCSI connection.

I was wondering what my options are for removing this dependency chain.

I'm really into the idea of decommissioning the Proxmox node because I'm scared I won't be able to change the power state of the machine over VPN if something goes wrong, since it only has vPro and not IPMI like the TrueNAS machine. By doing that, I'd consolidate the storage and the compute into the TrueNAS machine.

Options I can think of:

  1. Decommission the Proxmox node and move all Debian VMs/CTs to TrueNAS BSD jails. Is that even possible? Will all my Debian VMs work in BSD?
  2. Decommission the Proxmox node, switch TrueNAS Core to TrueNAS Scale and move CTs/VMs to TrueNAS Scale's Linux VMs
  3. Keep the Proxmox node and somehow figure out how to get Proxmox to refresh the CTs/VMs on iSCSI connection loss.
  4. Keep the Proxmox PC, but switch it to ESXi, hoping that it handles the iSCSI failure more gracefully
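Option 3 could be approximated with a small watchdog on the Proxmox host that notices when the iSCSI portal comes back and then restarts the guests. This is a rough sketch under several assumptions, not a tested solution: the portal address and the guest IDs are hypothetical placeholders, and it assumes the host's own iSCSI session has already reconnected by the time the portal answers on TCP port 3260, which is not guaranteed. It uses `qm stop`/`qm start` and `pct stop`/`pct start` rather than a graceful reboot, on the assumption that a guest with a read-only root may not shut down cleanly.

```python
import socket
import subprocess
import time

PORTAL = ("192.0.2.10", 3260)  # hypothetical TrueNAS address; 3260 is the standard iSCSI port
VM_IDS = [100]                 # hypothetical Proxmox VM IDs
CT_IDS = [101, 102]            # hypothetical Proxmox container IDs


def portal_up(address, timeout=3.0):
    """Return True if a TCP connection to the iSCSI portal succeeds."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False


def recovery_commands(was_up, is_up, vm_ids, ct_ids):
    """Build stop/start commands, but only on a down -> up transition."""
    if was_up or not is_up:
        return []
    commands = []
    for vm_id in vm_ids:
        commands.append(["qm", "stop", str(vm_id)])
        commands.append(["qm", "start", str(vm_id)])
    for ct_id in ct_ids:
        commands.append(["pct", "stop", str(ct_id)])
        commands.append(["pct", "start", str(ct_id)])
    return commands


def watch(interval=30):
    """Poll the portal forever and restart guests when it comes back."""
    was_up = portal_up(PORTAL)
    while True:
        time.sleep(interval)
        is_up = portal_up(PORTAL)
        for command in recovery_commands(was_up, is_up, VM_IDS, CT_IDS):
            subprocess.run(command, check=False)  # keep going if one guest fails
        was_up = is_up
```

Calling `watch()` from a systemd service on the Proxmox host would keep it running. It would only automate the manual restart and would not prevent the outage itself.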

EDIT: I didn't make it clear at first - TrueNAS stores more data than just VMs - documents, Linux ISOs (TM), photos, Syncthing

[–] [email protected] 5 points 1 year ago (1 children)

2 is the only possible one of those options.

You could also make the current setup more reliable by adding a UPS and/or second storage node for redundancy, so that when one goes down, the other is still available. Presumably TrueNAS supports this.

But nothing is going to help you recover if the iSCSI link is broken. It's up to the host and guest OS to re-establish the link, and to the guest it usually looks like the hard drive has been unplugged, and I don't know any OS that considers that a supported and recoverable condition.

[–] [email protected] 1 points 1 year ago

Thanks for making it clear that the iSCSI target powering down is in fact one of the grimmer scenarios. I couldn't make out how bad a situation it was. In an enterprise environment, a SAN being down would require some type of incident report.

UPS - as you suggested - would solve most of my problems to be honest.