this post was submitted on 22 Mar 2024

68 points (92.5% liked)


Getting Started with Self Hosted LLM (lemmy.zip)

submitted 7 months ago by [email protected] to c/[email protected]


I've been playing around with Ollama in a VM on my machine and it is really useful.

To get started, make sure you have capable hardware. You will need fairly recent hardware, so that old computer you have lying around may not be enough. I created a VM on my laptop with KVM and gave it 8 GB of RAM and 12 cores.
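A rough sketch of that hardware check, assuming the RAM guidance the Ollama README gave at the time (about 8 GB for 7B models, 16 GB for 13B, 32 GB for 33B). The function names are made up for illustration, and the RAM probe is best-effort and mainly works on Linux.

```python
import os
from typing import Optional

# Rule of thumb from the Ollama README (assumption: still current):
# model size in billions of parameters -> suggested minimum RAM in GB.
README_MIN_RAM_GB = {7: 8, 13: 16, 33: 32}


def min_ram_gb(params_billion: int) -> int:
    """Return the suggested RAM for the smallest listed size that covers the model."""
    for size in sorted(README_MIN_RAM_GB):
        if params_billion <= size:
            return README_MIN_RAM_GB[size]
    raise ValueError(f"no guidance for {params_billion}B models")


def total_ram_gib() -> Optional[float]:
    """Best-effort physical RAM in GiB; returns None where sysconf is unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    print("CPU cores:", os.cpu_count())
    ram = total_ram_gib()
    print("RAM (GiB):", "unknown" if ram is None else round(ram, 1))
    print("Suggested RAM for a 7B model (GB):", min_ram_gb(7))
```

Run it inside the VM rather than on the host, since the VM's allocation is what Ollama actually sees.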

Next, read the README. You can find it in the GitHub repo:

https://github.com/ollama/ollama

Once you run the install script, you will need to download models. I would download Llama 2, Mistral and LLaVA. As an example, you can pull down llama2 with:

ollama pull llama2
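If you want all three models in one go, something like the following sketch could wrap the same `ollama pull` command. It assumes the `ollama` binary is on your PATH and that the library names are `llama2`, `mistral` and `llava`; `pull_command` and `pull_all` are hypothetical helper names, not part of Ollama.

```python
import shutil
import subprocess
from typing import List

# Assumed library names for the three models mentioned above.
MODELS = ["llama2", "mistral", "llava"]


def pull_command(model: str) -> List[str]:
    """Build the argument list for `ollama pull <model>`."""
    return ["ollama", "pull", model]


def pull_all(models: List[str] = MODELS) -> None:
    """Pull each model in turn, stopping at the first failure."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama not found on PATH; run the install script first")
    for model in models:
        subprocess.run(pull_command(model), check=True)
```

Calling `pull_all()` starts the downloads. The models are several gigabytes each, so expect it to take a while.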

Ollama models are available in the online repo. You can see all of them here: https://ollama.com/library

Once they are downloaded, you need to set up Open WebUI. First, install Docker; I am going to assume you already know how to do that. Once Docker is installed, pull and deploy Open WebUI with this command. Notice it's a little different from the command in the Open WebUI docs.

docker run -d --net=host -e OLLAMA_BASE_URL="http://localhost:11434" -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Notice that the networking is shared with the host. This is needed so the container can reach Ollama, which listens on the host's localhost:11434 by default. I am also setting the OLLAMA_BASE_URL environment variable to point Open WebUI at Ollama.
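If Open WebUI comes up but shows no models, it can help to confirm that Ollama answers at that URL. Below is a minimal sketch, assuming Ollama's `/api/tags` endpoint (which lists locally pulled models) and the default address used above; `parse_model_names` and `list_models` are illustrative names.

```python
import json
import os
import urllib.error
import urllib.request
from typing import List, Optional

# Same address the container is told to use; override via the environment.
BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")


def parse_model_names(body: bytes) -> List[str]:
    """Extract model names from an /api/tags JSON response body."""
    data = json.loads(body)
    return [model["name"] for model in data.get("models", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 5.0) -> Optional[List[str]]:
    """Return the pulled model names, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(resp.read())
    except (urllib.error.URLError, OSError):
        return None
```

A result of `None` from `list_models()` suggests Ollama is not running or not listening on that address. An empty list means it is reachable but no models have been pulled yet.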

Once that's done, open the host IP on port 8080 in a browser and create an account. After that you should be all set.

top 21 comments


[–] [email protected] 8 points 7 months ago (1 children)

There's also llamafile, super simple: download and run it.

[–] [email protected] 1 points 7 months ago (1 children)

Not as cool or flexible though

[–] [email protected] 5 points 7 months ago (1 children)

Iirc it can run anything llama.cpp can because it just uses that under the hood.

[–] [email protected] 2 points 7 months ago (1 children)

Except you can't control it as easily. I like the UI and toolset of open web UI

[–] [email protected] 4 points 7 months ago

Ok. I haven't tried it so I'll take your word for it. I'm just offering an easier alternative since the topic was "getting started"

[–] [email protected] 7 points 7 months ago

Personally I've really enjoyed text-generation-webui. It made it really easy to ramp up and learn. Very cool stuff you got though, I'll probably be looking at a comparison between them!

[–] [email protected] 5 points 7 months ago (2 children)

Does it work out okay with 12 cores purely on CPU? About how fast is the interaction?

I played around a little with Ollama and gpt4all but it seemed to me like it wasn't fast enough to be useful on pure CPU, but if I could just throw cores at it then I might revisit the issue.

[–] [email protected] 3 points 7 months ago

It wasn't usable a few months ago. However, when I set up Ollama it was "fast" and it works OK. Responses take anywhere from instant to 5 minutes. LLaVA seems to take the longest, which makes sense. Llama 2 is fairly fast unless you ask it for obscure information.
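To put a number on "fast" for your own hardware, one option is to time a prompt through Ollama's HTTP API. This is a sketch, assuming the default address and the `eval_count` / `eval_duration` (nanoseconds) fields that `/api/generate` returns in non-streaming mode; the helper names are made up for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address


def generate(prompt: str, model: str = "llama2", timeout: float = 600.0) -> Dict[str, Any]:
    """Send one non-streaming prompt to /api/generate and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def tokens_per_second(result: Dict[str, Any]) -> float:
    """Generation speed from eval_count (tokens) and eval_duration (nanoseconds)."""
    seconds = result["eval_duration"] / 1e9
    return result["eval_count"] / seconds
```

For example, `result = generate("Why is the sky blue?")` followed by `tokens_per_second(result)` gives the generation rate for that one response. The long timeout is deliberate, since CPU-only responses can take minutes.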

[–] [email protected] 2 points 7 months ago* (last edited 7 months ago)

For the life of me I can't remember the scores I was getting on gpt4all. But given that you tried it, I'm guessing you'll most likely take a liking to LM Studio or perhaps Jan.ai, both GUI tools. If the lack of open source bothers you, go for Jan.ai; if not, go with LM Studio. LM Studio in particular allows for full and partial GPU offloading, so if you have a semi-capable GPU without quite enough VRAM, you can load part of the model onto the GPU to speed up inference. As a side note, on pure CPU with my old Ryzen 1600 I was looking at 6/it. That isn't all that much, but glass half full, it's still faster than the average typing speed and takes the load off having to think about how to creatively word things.

[–] [email protected] 3 points 7 months ago (1 children)

Thanks for sharing this. It's a shame that most AI tech is hidden behind steep price tags and cloud subscriptions, while even midrange PCs can run interesting AI models.

[–] [email protected] 3 points 7 months ago

Check out LM Studio. Totally free and a very nice GUI. You can do some very cool things with it. That and Kobold AI. Very simple self-hosting, with no need for configuration either. Essentially download and open.

[–] [email protected] 3 points 7 months ago

Ollama has been great for self-hosting, but also check out vLLM, as it's the new shiny self-hosting toy.

[–] [email protected] 3 points 7 months ago (1 children)

The biggest thing that I want to learn is how to either A: add "tools" for the AI to run, or B: "fine-tune" the model by feeding it data that's relevant to me.

[–] [email protected] 2 points 7 months ago (1 children)

You can teach it things and upload documents for it to process

[–] [email protected] 2 points 7 months ago (1 children)

Yeah, I couldn't find that on ollama; but I did find it in text-generation-webui - which is a little more complicated, but for me, I think it might help springboard me into understanding a few more things.

[–] [email protected] 2 points 7 months ago (1 children)

Ollama is just the backend. You need Open WebUI or a similar application to use it.

[–] [email protected] 2 points 7 months ago (1 children)

Check AnythingLLM out, it's just an AppImage.

[–] [email protected] 1 points 7 months ago (1 children)

Not as maintainable long term and it doesn't have user management

[–] [email protected] 1 points 7 months ago (1 children)

There's a dockerized version if you need those

https://github.com/Mintplex-Labs/anything-llm/blob/master/docker/HOW_TO_USE_DOCKER.md

[–] [email protected] 1 points 7 months ago* (last edited 7 months ago) (1 children)

So why is it better than Open WebUI? It seems like each has its own use case.

I'll give it a try just for fun but it doesn't seem to be better as far as I can tell

[–] [email protected] 1 points 7 months ago

No idea if it's better; it's the thing I tried and it was pretty seamless to set up. With my aging hardware and AMD GPU, I have been pretty much sitting on the sidelines with this whole LLM thing.