this post was submitted on 22 Dec 2023

Technology


Apple wants AI to run directly on its hardware instead of in the cloud: iPhone maker wants to catch up to its rivals when it comes to AI.

[email protected] · 12 points · 10 months ago (last edited 10 months ago)

It's already possible. A 4-bit quant of Phi-1.5 (1.3B parameters, roughly as smart as a 7B model) takes about 1 GB of RAM. Phi-2 (2.7B parameters, roughly as smart as a 13B model) was recently released, and with 4-bit quantization it would likely take about 2 GB of RAM (I haven't tried it yet). The research-only license on these models discourages people from porting them to Android, so they focus instead on weak 3B models or on bigger (7B+) models, which heavily limits any potential usability.
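The RAM figures above follow from simple arithmetic: parameters × bits per weight ÷ 8 gives the bytes needed for the weights. A minimal sketch, assuming the weights dominate memory and ignoring the KV cache, activations and runtime overhead (which is why real usage lands closer to the ~1 GB and ~2 GB quoted). The function name `quantized_weight_gb` is hypothetical, and the 4.5 bits/weight case assumes a block format that stores one 16-bit scale per 32 weights.

```python
# Rough weight-memory estimate for a quantized model.
# Assumption: only the weights are counted; KV cache, activations and
# runtime overhead are ignored, so real RAM use will be somewhat higher.


def quantized_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 4.0 = pure 4-bit weights; 4.5 = 4 bits plus one 16-bit scale per
    # 32 weights (4 + 16/32 = 4.5), a common shape for block formats.
    for name, params in [("Phi-1.5", 1.3), ("Phi-2", 2.7)]:
        for bits in (4.0, 4.5):
            gb = quantized_weight_gb(params, bits)
            print(f"{name}: {bits} bits/weight -> {gb:.2f} GB of weights")
```

For Phi-1.5 this gives about 0.65 GB of raw 4-bit weights and for Phi-2 about 1.35 GB, so the ~1 GB and ~2 GB totals leave room for the context cache and the runtime itself.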

  1. Apple could mimic and improve on the Phi models' training to make their own powerful but small model, then leverage their full knowledge of and control over the hardware architecture to squeeze out every drop of performance. Kinda like how some people used their deep knowledge of console architectures to make those consoles do things that seemed impossible.

Or

  2. Apple's engineers will choose, either due to time constraints or laziness, to simply use llama.cpp, which will certainly implement this flash attention, and then use an already available model whose license allows commercial use, like Mistral, add some secret-sauce optimizations for their hardware, and voilà.
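For readers wondering what a "4-bit quant" actually does to a model, here is a minimal sketch of block-wise symmetric 4-bit quantization: the weights are split into blocks of 32, each block keeps one floating-point scale, and every weight is stored as a small integer. This is loosely in the spirit of llama.cpp's 4-bit formats but is not its actual storage layout or code; `BLOCK_SIZE`, `quantize_block`, `quantize` and `dequantize` are hypothetical names chosen for illustration.

```python
# Hypothetical block-wise symmetric 4-bit quantization sketch
# (illustrative only; not llama.cpp's real format or implementation).

BLOCK_SIZE = 32  # weights per block; each block gets its own scale


def quantize_block(block: list[float]) -> tuple[float, list[int]]:
    """Map each weight to an integer code; codes fit in a signed 4-bit range."""
    max_abs = max(abs(x) for x in block)
    # Scale so the largest-magnitude weight maps to +/-7.
    # An all-zero block gets scale 1.0 to avoid dividing by zero.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Clamp to [-8, 7] (the signed 4-bit range) as a safety net against
    # floating-point edge cases; in practice codes stay within [-7, 7].
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Recover approximate weights from one block."""
    return [scale * c for c in codes]


def quantize(weights: list[float]) -> list[tuple[float, list[int]]]:
    """Split weights into blocks and quantize each independently."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


def dequantize(blocks: list[tuple[float, list[int]]]) -> list[float]:
    """Concatenate the dequantized blocks back into one flat list."""
    out: list[float] = []
    for scale, codes in blocks:
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    import random

    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(100)]
    restored = dequantize(quantize(weights))
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(weights)} weights, max reconstruction error {max_err:.5f}")
```

Because rounding moves each weight by at most half a step, the reconstruction error in a block is bounded by `scale / 2`. Smaller blocks mean tighter scales and better accuracy, at the cost of storing more scales.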

I bet on 2.