Hello.
My setup is:
- Lenovo M920q mini PC with Proxmox installed (this doesn't have IPMI, only vPro, and it's annoying me)
- Fujitsu TX1320 M3 with TrueNAS Core installed - ZFS + RAID1 (this is a low-end "enterprise grade" server, and best thing - it has IPMI).
The Proxmox PC keeps all its CTs and 1 VM on the TrueNAS using iSCSI.
The idea behind my setup was that it felt nice that the TrueNAS would handle all the storage heavy lifting - ZFS, RAID etc., while the Proxmox mini PC would be a "compute-only" node that has a naked Proxmox install with some config.
The problem with that is if the TrueNAS machine loses power or is restarted, the Proxmox CTs/VMs switch their filesystem to read-only and stop responding to requests. This is because the iSCSI connection is interrupted. When the TrueNAS is back online, Proxmox doesn't make any attempt to restart the VMs/CTs - they'd still be broken.
It's annoying to have to VPN in to the Proxmox web UI and wait 15 minutes until all the CTs/VMs are restarted and functioning again on the "alive" iSCSI connection.
I was wondering what my options are here to remove the dependency chain?
I'm really into the idea of decommissioning the Proxmox node because I'm scared I won't be able to (over VPN) change the power state of the machine if something goes wrong, since it only has vPro and not IPMI like the TrueNAS machine. By doing that, I'd consolidate the storage and the compute into the TrueNAS machine.
Options I can think of:
- Decommission the Proxmox node and move all Debian VMs/CTs to TrueNAS BSD jails. Is that even possible? Will all my Debian VMs work in BSD?
- Decommission the Proxmox node, switch TrueNAS Core to TrueNAS Scale and move CTs/VMs to TrueNAS Scale's Linux VMs
- Keep the Proxmox node and somehow figure out how to get Proxmox to refresh the CTs/VMs on iSCSI connection loss.
- Keep the Proxmox PC, but switch it to ESXi, hoping that it handles the iSCSI failure more gracefully
EDIT: I didn't make it clear at first - TrueNAS stores more data than just VMs - documents, Linux ISOs (TM), photos, Syncthing
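For the third option (keep Proxmox and recover the guests automatically), a watchdog on the Proxmox host could look roughly like the sketch below. It is only an illustration under assumptions: the portal address `192.0.2.10` and the guest IDs are placeholders, the iSCSI target is assumed to listen on the default port 3260, and the host's iSCSI session is assumed to reconnect on its own once the portal is back. It uses the standard `qm stop`/`qm start` and `pct stop`/`pct start` commands, and a hard stop/start is a blunt instrument for guests whose filesystems have already gone read-only.

```python
import socket
import subprocess
import time

PORTAL_HOST = "192.0.2.10"  # placeholder TrueNAS address (assumption)
PORTAL_PORT = 3260          # default iSCSI target port


def portal_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to the iSCSI portal succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_restart(was_up, is_up):
    """Restart guests only when the portal goes from down to up."""
    return (not was_up) and is_up


def restart_commands(vm_ids, ct_ids):
    """Build stop/start commands for the given VM and CT IDs."""
    commands = []
    for vmid in vm_ids:
        commands.append(["qm", "stop", str(vmid)])
        commands.append(["qm", "start", str(vmid)])
    for ctid in ct_ids:
        commands.append(["pct", "stop", str(ctid)])
        commands.append(["pct", "start", str(ctid)])
    return commands


def watch(host, port, vm_ids, ct_ids, interval=30.0, max_polls=None,
          probe=portal_reachable, run=subprocess.run, sleep=time.sleep):
    """Poll the portal and restart guests after it comes back online."""
    was_up = True
    polls = 0
    while max_polls is None or polls < max_polls:
        is_up = probe(host, port)
        if should_restart(was_up, is_up):
            for cmd in restart_commands(vm_ids, ct_ids):
                run(cmd, check=False)
        was_up = is_up
        polls += 1
        sleep(interval)


# Example (guest IDs are placeholders); runs until interrupted:
# watch(PORTAL_HOST, PORTAL_PORT, vm_ids=[100], ct_ids=[101, 102])
```

Something like this would have to run as a service on the Proxmox node itself, and it only removes the manual restart step, not the dependency on the TrueNAS box.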
Get rid of iSCSI. Instead, use TrueNAS Scale for the NAS and use a zvol on TrueNAS to run a VM of Proxmox Backup Server. Run Proxmox on the other box with local VMs and just back up the VMs to Proxmox Backup Server at a rate you are comfortable with (e.g. once a night). Map NFS shares from TrueNAS directly to any Docker containers you are running on your VMs, map CIFS shares to any Windows VMs, and map NFS shares directly to any Linux things. This is way more resilient, gets local NVMe speeds for the VMs, and still keeps the bulk of your files on the NAS, while also not abusing your 1 Gbit Ethernet for VM traffic, just for file transfer (the VM I/O happens at multi-GB/s speeds on the local NVMe in the Proxmox server).
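To put a rough number on the 1 Gbit point, here is a back-of-the-envelope sketch in Python. The 100 GB backup size is a made-up example and the 0.9 efficiency factor is an assumption about protocol overhead, not a measurement.

```python
def transfer_minutes(size_gb, link_gbit, efficiency=0.9):
    """Estimate minutes needed to move size_gb gigabytes over a link.

    size_gb    -- payload in decimal gigabytes (hypothetical backup size)
    link_gbit  -- nominal link speed in Gbit/s
    efficiency -- assumed fraction of the link usable for payload
    """
    gigabits = size_gb * 8                      # bytes to bits
    seconds = gigabits / (link_gbit * efficiency)
    return seconds / 60


# 1 Gbit/s is at most 125 MB/s, so a hypothetical 100 GB nightly backup
# takes roughly a quarter of an hour once overhead is included.
print(round(transfer_minutes(100, 1.0), 1))
```

That is fine for a once-a-night backup job, but it is the ceiling for every disk read and write when the VMs themselves live on iSCSI over the same link.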
For a home environment, this is the correct idea
I would argue it's the correct idea up to a fairly decently sized business. Basically anything where you don't have the budget or the need for super fault-tolerant systems (i.e. where it's OK to very rarely have a 20-minute to one-hour outage in order to save 50k+ of IT hardware costs). You can take the above and go the next step to a high-availability Proxmox cluster to further reduce potential downtime before you step into the realm of needing VMware and very expensive, highly available, fast storage as well. It gets even more true when you start messing around with TrueNAS and differential-speed vdevs (i.e. build a super fast NVMe one with 10-25 gig networking for some applications, and a cheaper spinning-rust one with maybe 10 gig networking for bulk storage). It's also nice that, by running Proxmox Backup Server on a zvol, you can take advantage of both ZFS replication/snapshotting and cloud targets (jstor/wasabi S3 bucket, another TrueNAS server at a different location) for that zvol as well as for the other data you are sharing as datasets.
I would have assumed that most businesses invest in resilient hardware, but perhaps not. Thanks for the note.