this post was submitted on 15 Oct 2024
455 points (97.1% liked)

[–] [email protected] 20 points 10 hours ago

Cracks? It doesn't even exist. We figured this out a long time ago.

[–] [email protected] 28 points 12 hours ago (1 children)

They are large LANGUAGE models. It's no surprise that they can't solve those mathematical problems in the study. They are trained for text production. We already knew that they were no good at counting things.

[–] [email protected] 17 points 12 hours ago

"You see this fish? Well, it SUCKS at climbing trees."

[–] [email protected] 43 points 18 hours ago (1 children)

So do I every time I ask it a slightly complicated programming question

[–] [email protected] 12 points 15 hours ago (1 children)

And sometimes even really simple ones.

[–] [email protected] 7 points 14 hours ago (2 children)

How many w's are in "Howard likes strawberries"? It would be awesome to know!

[–] [email protected] 6 points 14 hours ago* (last edited 14 hours ago) (3 children)

So I keep seeing people reference this... and I found it a curious concept that LLMs have problems with it. So I asked them... several of them...

Outside of this image... Codestral (my default) actually got it correct and didn't talk itself out of being correct... but that's no fun, so I asked 5 others at once.

What's sad is that Dolphin Mixtral is a 26.44GB model...
Gemma 2 is the 5.44GB variant
Gemma 2B is the 1.63GB variant
LLaVA Llama3 is the 5.55GB variant
Mistral is the 4.11GB variant

So I asked Codestral again because why not! And this time it talked itself out of being correct...

Edit: fixed newline formatting.

[–] [email protected] 2 points 6 hours ago* (last edited 6 hours ago)

Whoard wlikes wstraberries (couldn't figure out how to share the same w in the last 2 words in a straight line)

[–] [email protected] 1 points 10 hours ago

Interesting... I'd say Gemma 2B wasn't actually wrong - it just didn't answer the question you asked! I wonder if they have this problem with other letters - maybe it's something to do with how we say w as "double-u"... But maybe not, because they seem to be underestimating rather than overestimating. But yeah, I guess the fuckers just can't count. You'd think a question using the phrase "How many..." would be a giveaway that they might need to count something rather than rely on a knowledge base.
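For what it's worth, the counting itself is trivial outside a language model. A one-line Python check (illustrative - not output from any of the models above) gives the real answer; the tokenization detail in the comment is a rough sketch of why models miscount:

```python
# Count occurrences of a letter deterministically. This is the kind of
# exact character-level operation LLMs struggle with, because they see
# subword tokens (roughly, "Howard" -> "How" + "ard"), not letters.
sentence = "Howard likes strawberries"
count = sentence.lower().count("w")
print(count)  # → 2  (one 'w' in "Howard", one in "strawberries")
```

Any model with tool use could call something like this instead of guessing.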

[–] [email protected] 1 points 11 hours ago

LOL 😆😅! I totally made it up! And it worked! So maybe it's not just R's that it has trouble counting. It's any letter at all.

[–] [email protected] 2 points 14 hours ago (1 children)

I'd be happy to help! There are 3 "w"s in the string "Howard likes strawberries".

[–] [email protected] 1 points 11 hours ago

Are you sure? Can you please double check?

[–] [email protected] 80 points 22 hours ago (12 children)

Did anyone believe they had the ability to reason?

[–] [email protected] 19 points 15 hours ago

People are stupid, OK? I've met people who think it can in fact do math "better than a calculator".

[–] [email protected] 16 points 18 hours ago* (last edited 18 hours ago)

Here's the cycle we've gone through multiple times and are currently in:

AI winter (low research funding) ->
incremental scientific advancement ->
breakthrough: new capabilities emerge from multiple incremental advances building on each other (expert systems, neural networks, LLMs, etc.) ->
engineering creates new tech products/frameworks/services based on the new science ->
hype for the new tech creates sales, economic activity, research funding, subsidies, etc. ->
(for LLMs, we're here) people become familiar with the new tech's capabilities and limitations through use ->
the hype spending bubble bursts when the overspend isn't rewarded with infinite "line goes up" returns or new research breakthroughs ->
AI winter -> etc...

[–] [email protected] 50 points 22 hours ago (1 children)

The tested LLMs fared much worse, though, when the Apple researchers modified the GSM-Symbolic benchmark by adding "seemingly relevant but ultimately inconsequential statements" to the questions

Good thing they're being trained on random posts and comments on the internet, which are known for being succinct and accurate.
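The modification the researchers describe is easy to reproduce by hand. A rough sketch (the question text here is paraphrased and illustrative, not quoted from the benchmark): appending a numerically irrelevant clause leaves the arithmetic untouched, even though clauses like this reportedly trip the models:

```python
# A GSM-style word problem: the answer is plain addition.
base = ("Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
        "How many kiwis does Oliver have?")

# A "seemingly relevant but ultimately inconsequential" statement:
# it mentions a number, but changes nothing about the count.
distractor = "Five of the kiwis were a bit smaller than average."

answer = 44 + 58  # the distractor has no effect on the total
print(answer)  # → 102
```

A system that was actually reasoning would give the same answer with or without the distractor; the reported failures suggest pattern matching on surface text instead.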

[–] [email protected] 21 points 19 hours ago (2 children)

Yeah, especially given that so many popular vegetables are members of the genus Brassica

[–] [email protected] 1 points 3 hours ago

Definitely true! And ordering pizza without rocks as a topping should be outlawed; it literally has no texture without them. Any human would know that very obvious fact.

[–] [email protected] 6 points 17 hours ago

Absolutely. It would be a shame if AI didn't know that the common maple tree is actually placed in the family Cannabaceae.

[–] [email protected] 40 points 1 day ago (4 children)

statistical engine suggesting words that sound like they'd probably be correct is bad at reasoning

How can this be??

[–] [email protected] 19 points 22 hours ago (1 children)

I would say that if anything, LLMs are showing cracks in our way of reasoning.

[–] [email protected] 11 points 17 hours ago (1 children)

Or the problem is tech billionaires selling "magic solutions" to problems that don't actually exist. Or that people on the modern internet are too gullible to recognize when they're being sold snake oil in the form of "technological advancement" that's actually just repackaged, plagiarized material.

[–] [email protected] 1 points 9 hours ago

But what if they're wearing an expensive leather jacket

[–] [email protected] 26 points 1 day ago* (last edited 1 day ago) (1 children)

I feel like a draft landed on Tim's desk a few weeks ago - that would explain why they suddenly pulled back on OpenAI funding.

People on the removed superfund birdsite are already saying Apple is missing out on the next revolution.

[–] [email protected] 16 points 20 hours ago (1 children)

"Superfund birdsite" I am shamelessly going to steal from you

[–] [email protected] 20 points 1 day ago (1 children)

I hope this gets circulated enough to reduce the ridiculous amount of investment and energy waste that the ramping-up of "AI" services has brought. All the companies have just gone way too far off the deep end with this shit that most people don't even want.

[–] [email protected] 17 points 22 hours ago (2 children)

People working with these technologies have known this for quite a while. It's nice of Apple's researchers to formalize it, but nobody is really surprised - least of all the companies funnelling traincars of money into the LLM furnace.

[–] [email protected] 9 points 21 hours ago

Someone needs to pull the plug on all of that stuff.
