Technology


Mozilla lays off 60 people, wants to build AI into Firefox (arstechnica.com)

submitted 1 year ago by [email protected] to c/[email protected]


[–] [email protected] 0 points 1 year ago (1 children)

Sorry, but has anyone in this thread actually tried running local LLMs on CPU? You can easily run a 7B model at various quantization levels (e.g. 5-bit) and get a general-purpose, promptable LLM. Yeah, of course it's going to take ~4GB of RAM (the weights are memory-mapped and paged in as needed), but you can easily fine-tune smaller, more specialized models (like the translation one mentioned above) and get surprising intelligence at a fraction of the resources.

Take, for example, phi-2, which performs as well as 13B-param models with only 2.7B params. Yeah, that's still going to take 1.5GB of RAM, which Firefox wouldn't reasonably ship, but many lighter-weight specialized tasks could easily use something like a fine-tuned 0.3B model with quantization.
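For anyone who wants to sanity-check the RAM figures above, here is a rough back-of-the-envelope sketch. It only counts the weights (parameters × bits per weight ÷ 8) and ignores the KV cache, activations, and runtime overhead, so real usage will be somewhat higher. The function name is made up for illustration, and the 4.5-bit and 4-bit widths for phi-2 and the 0.3B model are my assumptions, since only the 7B case has a stated bit width.

```python
def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and per-block quantization metadata.
    """
    # (params_billions * 1e9 weights) * (bits / 8 bytes each) / 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


# Bit widths other than the 5-bit 7B case are assumptions for illustration.
examples = [
    ("7B model, 5-bit", 7.0, 5.0),
    ("phi-2 (2.7B), ~4.5-bit (assumed)", 2.7, 4.5),
    ("0.3B model, 4-bit (assumed)", 0.3, 4.0),
]

for label, params_b, bits in examples:
    print(f"{label}: ~{weights_size_gb(params_b, bits):.2f} GB")
```

That comes out to roughly 4.4 GB, 1.5 GB, and 0.15 GB respectively, which lines up with the numbers quoted above.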

[–] [email protected] 1 points 1 year ago

Yes, I did. And yes, it is possible. But it's terribly slow in comparison, which makes it less useful. The output very quickly devolves into random mumbling or gets stuck in weird loops. It also hogs resources that other tasks you may be doing actually need.

I mainly test dev AI solutions, and moving from 1B to 7B models made their suggestions vastly more pertinent. Moving from a CPU implementation (Ryzen 7 3700X) to a GPU one (RTX 3080 Ti) made them fast enough to use for quick completions and immediate suggestions without breaking workflow, and it freed up resources for the IDE, build tools, and the actual software being run. On CPU there was a multi-second delay, which made them useless for this use case.