Yeah, it's not a potato, but it's not that powerful either. Still, it should easily run 7B/8B/9B models, and maybe 13B ones too.
running them in Python with Huggingface's Transformers library (from local models
That's your problem right there. Python is great for building LLMs but horrible at running them. With a computer as modest as yours, every bit of performance counts.
Just try Ollama or llama.cpp. Their GitHub repos are also a goldmine of other projects you could try.
llama.cpp can offload part of the model to the GPU for much faster inference.
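If you still want to drive it from a script, here is a rough sketch of the partial offload idea using the llama-cpp-python bindings (the heavy lifting still happens in llama.cpp's native code). The helper `layers_that_fit`, the 1 GB reserve, and the "every layer is the same size" estimate are my own assumptions for illustration, not something llama.cpp computes for you, so treat the result as a starting point and adjust it if you run out of VRAM.

```python
def layers_that_fit(model_size_gb, n_layers, free_vram_gb, reserve_gb=1.0):
    """Rough guess at how many layers fit in VRAM.

    Assumption: every layer takes an equal share of the model file size,
    and reserve_gb is kept free for the context cache and scratch buffers.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(free_vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


def load_partially_offloaded(model_path, n_gpu_layers):
    """Load a GGUF model with some layers on the GPU, the rest on the CPU."""
    # Imported here so the estimate above works without the package installed.
    from llama_cpp import Llama

    return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers, n_ctx=2048)


if __name__ == "__main__":
    # Hypothetical numbers: a 4 GB quantized model with 32 layers, 3 GB free VRAM.
    print(layers_that_fit(4.0, 32, 3.0))
```

You would then pass that number to `load_partially_offloaded("path/to/model.gguf", n)`, where the path is a placeholder for whatever GGUF file you downloaded.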
Piper is a pretty decent, very lightweight TTS engine that runs directly on your CPU, if you want to add TTS capabilities to your setup.
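A minimal sketch of calling Piper from a script, assuming the standalone `piper` executable is on your PATH and that you have downloaded a voice file (the voice name below is just an example). It pipes the text to Piper on stdin and writes a WAV file. The helper names are mine.

```python
import subprocess


def build_piper_command(voice_path, wav_path):
    """Build the argument list for the piper CLI (assumed to be on PATH)."""
    return ["piper", "--model", voice_path, "--output_file", wav_path]


def speak_to_file(text, voice_path, wav_path):
    """Synthesize `text` into a WAV file by piping it to piper on stdin."""
    subprocess.run(
        build_piper_command(voice_path, wav_path),
        input=text.encode("utf-8"),
        check=True,  # raise if piper exits with an error
    )


if __name__ == "__main__":
    # Example voice file name; swap in whichever voice you downloaded.
    print(build_piper_command("en_US-lessac-medium.onnx", "hello.wav"))
```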
Good luck and happy tinkering!
It's AI and your voice won't be used for training if you use a local model.
Use Whisper for STT. It runs on your computer, so nothing leaves your machine. You can pick the model size based on how powerful your computer is: the bigger the model, the better it is at transcribing.
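Here is a small sketch of picking a size with the `openai-whisper` package. The memory figures are the approximate requirements I remember from the Whisper README, so treat them as ballpark numbers, and `pick_whisper_size` is just a helper I made up for illustration.

```python
# (model name, approximate memory needed in GB), smallest to largest.
# Ballpark figures, not exact requirements.
WHISPER_SIZES = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_whisper_size(memory_gb):
    """Return the largest model whose rough memory need fits the budget."""
    chosen = "tiny"  # fall back to the smallest model
    for name, needed_gb in WHISPER_SIZES:
        if needed_gb <= memory_gb:
            chosen = name
    return chosen


def transcribe_locally(audio_path, memory_gb):
    """Transcribe a file fully offline with the chosen model size."""
    # Imported here so the picker works without the package installed.
    import whisper

    model = whisper.load_model(pick_whisper_size(memory_gb))
    return model.transcribe(audio_path)["text"]


if __name__ == "__main__":
    print(pick_whisper_size(4))
```

The model weights are downloaded once on first use; after that, transcription runs entirely on your own hardware.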