this post was submitted on 15 Oct 2024
463 points (97.1% liked)

Technology

(page 2) 47 comments
[–] [email protected] 9 points 23 hours ago

Someone needs to pull the plug on all of that stuff.

[–] [email protected] 15 points 1 day ago

They predict, not reason....

[–] [email protected] 92 points 1 day ago* (last edited 1 day ago) (2 children)

One time I exposed deep cracks in my calculator's ability to write words with upside down numbers. I only ever managed to write BOOBS and hELLhOLE.
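For anyone who wants to rediscover the trick: each calculator digit looks like a letter when the display is rotated 180°, so you type the word's letters as digits in reverse. A quick sketch using the usual hobbyist digit-to-letter mapping:

```python
# Upside-down calculator trick: a digit reads as a letter when the
# display is flipped 180 degrees, so a word is typed as its letters'
# digit equivalents in reverse order.
FLIPPED = {"O": "0", "I": "1", "Z": "2", "E": "3", "h": "4",
           "S": "5", "g": "6", "L": "7", "B": "8", "G": "9"}

def word_to_number(word):
    """Return the digits to type so the flipped display reads `word`."""
    return "".join(FLIPPED[letter] for letter in reversed(word))

print(word_to_number("BOOBS"))     # 58008
print(word_to_number("hELLhOLE"))  # 37047734
```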

LLMs aren't reasoning. They can do some stuff okay, but they aren't thinking. Maybe if you had hundreds of them with unique training data all voting on proposals you could get something along the lines of a kind of recognition, but at that point you might as well just simulate cortical columns and try to do Jeff Hawkins' idea.

[–] [email protected] 44 points 1 day ago

LLMs aren't reasoning. They can do some stuff okay, but they aren't thinking

and the more people realize it, the better. which is why it's good that research like this from a reputable company makes headlines.

[–] [email protected] 2 points 1 day ago (2 children)
[–] [email protected] 39 points 1 day ago (2 children)

Are you telling me Apple hasn't seen through the grift and is approaching this with an open mind just to learn how full of bullshit most of the claims from the likes of Altman are? And now they're sharing their gruesome discoveries with everyone while they're unveiling them?

[–] [email protected] 50 points 1 day ago

I would argue that Apple Intelligence™️ is evidence they never bought the grift. It's very focused on tailored models scoped to the specific tasks that AI does well: creative, non-critical tasks like assisting with text processing/transforming, image generation, and photo manipulation.

The Siri integrations seem more like they're using the LLM to stitch together the APIs that were already exposed between apps (used by shortcuts, etc.); each having internal logic and validation that's entirely programmed (and documented) by humans. They market it as a whole lot more, but they market every new product as some significant milestone for mankind ... even when it's a feature that other phones have had for years, but in an iPhone!

[–] [email protected] -2 points 1 day ago (1 children)

What's an example of a claim Altman has made that you'd consider bullshit?

[–] [email protected] 9 points 1 day ago (1 children)

The entirety of "open" ai is complete bullshit. They're no longer even pretending to be nonprofit at all and there is nothing "open" about them since like 2018.

[–] [email protected] 36 points 1 day ago (2 children)

The fun part isn't even what Apple said - that the emperor is naked - but why it's doing it. It's a nice bullet to fire at all four of its GAFAM competitors.

[–] [email protected] 28 points 1 day ago

This right here - this isn't conscientious analysis of tech or intellectual honesty or whatever; it's a calculated shot at its competitors, who are desperately trying to prevent the generative AI market's house of cards from falling.

[–] [email protected] 18 points 1 day ago

They're a publicly traded company.

Their executives need something to point to so they can push back against pressure to jump on the trend.

[–] [email protected] 1 points 1 day ago* (last edited 1 day ago) (2 children)

Are we not flawed too? Does that not make AI... human?

[–] [email protected] -3 points 1 day ago (1 children)

Real headline: Apple research presents possible improvements in benchmarking LLMs.

[–] [email protected] 20 points 1 day ago (2 children)

Not even close. The paper is questioning LLMs' ability to reason. The article talks about fundamental flaws of LLMs and how we might need different approaches to achieve reasoning. The benchmark is only used to prove the point; it is definitely not the headline.

[–] [email protected] -3 points 1 day ago* (last edited 1 day ago) (1 children)

You say “Not even close.” in response to the suggestion that Apple’s research can be used to improve benchmarks for AI performance, but then later say the article talks about how we might need different approaches to achieve reasoning.

Now, mind you - achieving reasoning can only happen if the model is accurate and works well. And to have a good model, you must have good benchmarks.

Not to belabor the point, but here’s what the article and study says:

The article talks at length about the reliance on a standardized set of questions - GSM8K, and how the questions themselves may have made their way into the training data. It notes that modifying the questions dynamically leads to decreases in performance of the tested models, even if the complexity of the problem to be solved has not gone up.
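A toy sketch of what "modifying the questions dynamically" means (the template, names, and numbers below are invented for illustration, not taken from the paper): hold the problem's logical structure fixed while resampling the surface details, so memorised benchmark answers stop helping.

```python
import random

# Toy GSM-Symbolic-style perturbation: the template fixes the logical
# structure; names and quantities are resampled per variant. All
# names/values here are made up for illustration.
TEMPLATE = ("{name} picks {a} apples on Monday and {b} more on Tuesday. "
            "How many apples does {name} have now?")

def make_variant(rng):
    name = rng.choice(["Ava", "Liam", "Noor", "Kenji"])
    a, b = rng.randint(2, 40), rng.randint(2, 40)
    question = TEMPLATE.format(name=name, a=a, b=b)
    answer = a + b  # ground truth follows from the structure, not memorisation
    return question, answer

question, answer = make_variant(random.Random(0))
```

If a model's accuracy drops on these variants even though the underlying arithmetic is identical, that's evidence it was matching surface patterns rather than reasoning.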

The third sentence of the paper (Abstract section) says this: “While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics.” The rest of the abstract goes on to discuss (paraphrased in layman’s terms) that LLMs are ‘studying for the test’ and not generally achieving real reasoning capabilities.

By presenting their methodology - dynamically changing the evaluation criteria to reduce data pollution and require models be capable of eliminating red herrings - the Apple researchers are offering a possible way benchmarking can be improved.
Which is what the person you replied to stated.

The commenter is fairly close, it seems.

[–] [email protected] 3 points 19 hours ago

Adding the benchmark back into the training process doesn't mean you get an LLM that can weed out irrelevant data; what you get is an LLM that can pass the new metric, and you have to design a new metric with different semantic patterns to actually know if it's "eliminating red herrings".
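To illustrate what "eliminating red herrings" means here (a made-up, GSM-NoOp-style example, not taken from the paper): append a clause that contains a tempting number but has no bearing on the answer, and check whether the answer changes.

```python
# Made-up "no-op" red herring: the size remark contains a number (five)
# but is irrelevant to the total. A reasoning system should still say
# 21; a pattern-matcher may be tempted to subtract 5.
base_question = "A crate holds 12 melons and a second crate holds 9 melons."
red_herring = " Five of the melons are slightly smaller than average."
question = base_question + red_herring + " How many melons are there in total?"

correct_answer = 12 + 9  # the size remark changes nothing

print(correct_answer)  # 21
```

The point of the parent comment: once a specific set of such clauses leaks into training data, the model can learn to ignore *those* clauses without gaining any general ability to ignore irrelevant information.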

[–] [email protected] 0 points 1 day ago

Once there’s a benchmark, LLMs can optimise for it. This is just another piece of news where people call “game over” but the money poured into R&D isn’t stopping anytime soon. Wasn’t synthetic data supposed to be game over for LLMs? Its limitations have been identified and it’s still being leveraged.
