You say that XMPP is much lighter, but I think that is mostly because Synapse is not very efficient. Other implementations are fairly light, and even my Synapse uses fairly few resources. You should also check that you are making an apples-to-apples comparison, with the large rooms, media and message history you would typically see on a common Matrix server.
I have a Prosody server running with about 10 concurrent users (friends/family). I just checked and it's using 32 MB of RAM, and local storage is in the megabytes. The database I'm using as a backend for message history and such is about 70 MB. The only other data is a temporary cache for uploaded media, which varies depending on what's uploaded. How does that compare with a typical Matrix server for friends and family?
Not sure what the problem with Gajim was. There's a distro package already in Arch, although I usually use the AUR packages from git. I haven't had any problem with either of them, though.
You should be able to limit MUCs so that they're only for local users of a server and not accessible to people outside it. Or you can make them invite-only, require a password, etc. So there are options there.
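For what it's worth, the invite-only and password options are standard room configuration fields in XEP-0045, so they should work the same way across servers. Here is a rough sketch in Python of the form a room owner submits to turn them on. The function name `build_room_config` is made up for illustration, and this only builds the XML payload; it does not send anything or talk to a real server. Restricting a room to local users is a server-side setting rather than one of these fields, so it is not covered here.

```python
import xml.etree.ElementTree as ET


def build_room_config(members_only=True, password=None):
    """Build a XEP-0045 owner config form (hypothetical helper, payload only)."""
    # Owner queries live in the muc#owner namespace.
    query = ET.Element("query", {"xmlns": "http://jabber.org/protocol/muc#owner"})
    # The settings travel in a XEP-0004 data form of type "submit".
    form = ET.SubElement(query, "x", {"xmlns": "jabber:x:data", "type": "submit"})

    def add_field(var, value):
        field = ET.SubElement(form, "field", {"var": var})
        ET.SubElement(field, "value").text = value

    # FORM_TYPE tells the server which kind of form this is.
    add_field("FORM_TYPE", "http://jabber.org/protocol/muc#roomconfig")
    # Members-only means people can only join once they are on the member list.
    add_field("muc#roomconfig_membersonly", "1" if members_only else "0")
    # Password protection is a flag plus the secret itself.
    add_field("muc#roomconfig_passwordprotectedroom", "1" if password else "0")
    if password:
        add_field("muc#roomconfig_roomsecret", password)
    return query


if __name__ == "__main__":
    payload = build_room_config(members_only=True, password="example-secret")
    print(ET.tostring(payload, encoding="unicode"))
```

In practice you would rarely build this by hand; as far as I know, most clients expose the same fields in their room settings dialog.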
One-to-one video and audio chat works pretty well with XMPP. I don't think group video/audio works, but I've never tried. I know that's a really popular thing on Discord, so it'd be nice if XMPP had that too. I think there's some work toward it, but I don't know where that's at.