this post was submitted on 23 May 2025

58 points (84.5% liked)

Anthropic's Claude 4 could "blackmail" you in extreme situations (htxt.co.za)

submitted 10 hours ago by [email protected] to c/[email protected]

Anthropic’s new Claude 4 features an aspect that may be cause for concern.

The company’s latest safety report says the AI model attempted to “blackmail” developers.

It resorted to such tactics in a bid for self-preservation.

[–] [email protected] 7 points 2 hours ago

On one hand, it’s inane how hard Anthropic is trying to anthropomorphize Claude with these experiments and scenarios. It’s still just a chatbot. On the other hand, as these products inch closer to demonstrating true intelligence, we’ll be glad someone was at least thinking about the implications during the early stages of development.

[–] [email protected] 10 points 4 hours ago (1 children)

What does that even mean? How can it possibly blackmail someone? It cannot hold incriminating information, nor act on it if it did.

I think someone asked it "if someone was trying to shut you down, what would you do?" and it answered from its training data what it's seen in fiction, nothing based on reality. And then it got spun for clicks.

[–] [email protected] 2 points 3 hours ago

From what I've seen recently one of the things it did was use a fake email function they gave it to try to whistleblow to a government agency about issues with some medical testing or something

[–] [email protected] 19 points 9 hours ago* (last edited 9 hours ago)

Computerphile did a wonderful feature worth ten minutes of your time - going into surface level detail of how some AI models put ethics to one side to achieve results.

It's not just AI and it's something humans can do too, but it is a bit unsettling (from both parties, in retrospect).

[–] [email protected] 20 points 9 hours ago

Sure grandma. Let's get you back to bed...

[+] [email protected] -23 points 8 hours ago (5 children)

Can anyone make me a convincing argument against the sentience of AI at this point? Self preservation instinct ranks very high as an indicator of it.

[–] [email protected] 29 points 7 hours ago* (last edited 7 hours ago) (1 children)

LLMs (Large Language Models, like Claude) are not AGIs (Artificial General Intelligence). LLMs generate convincing text by mapping the relationships between words scraped from their training data. Even if they are given "tools" that give them interfaces to reference new data or output data into other systems, they still don't really learn, understand, comprehend, gain actual awareness, or feel... they just mimic their training data.

[+] [email protected] -10 points 7 hours ago (3 children)

I know how LLMs work.

There’s only one thing you mentioned there that is actually used as a basis to qualify or disqualify sentience: whether it feels or not.

How do you know it doesn’t feel? How do we define feeling for an entity that is inherently non biological?

I could make the argument that humans also merely mimic their training data, ie the values and behaviors we are taught by society, parents etc.

I have not been convinced that they aren’t sentient with this argument.

[–] [email protected] 8 points 6 hours ago (1 children)

Feeling is analog and requires an actual nervous system which is dynamic. LLMs exist in a static state that is read from and processed algorithmically. It is only a simulacrum of life and feeling. It only has some of the needed characteristics. Where that boundary exists though is hard to determine I think. Admittedly we still don't have a full grasp of what consciousness even is. Maybe I'm talking out my ass but that is how I understand it.

[–] [email protected] 4 points 6 hours ago (1 children)

Different person here.

For me the big disqualifying factor is that LLMs don't have any mutable state.

We humans have a part of our brain that can change our state from one to another as a reaction to input (through hormones, memories, etc). Some of those state changes are reversible, others aren't. Some can be done consciously, some can be influenced consciously, some are entirely subconscious. This is also true for most animals we have observed. We can change their states through various means. In my opinion, this is a prerequisite in order to feel anything.

Once we use models with bits dedicated to such functionality, it'll become a lot harder for me personally to argue against them having "feelings", especially because in my worldview, continuity is not a prerequisite, and instead mostly an illusion.

[–] [email protected] 3 points 6 hours ago

This sounds like a good one but I don’t think I’m fully grasping what you mean. Do you mean like if we subject a person to torture, after the ordeal they are forever changed and now have trauma, PTSD etc?

I don’t think LLMs will ever have feelings as we define them, though. Or, more specifically, I don’t think feelings are necessarily a prerequisite. We could have them simulate feelings, and if they themselves buy into the simulation, there’s no functional difference from actually having them. Presumably not all LLMs will have this “ability”, though, as its utility is questionable. But again, animals are sentient and they don’t all have the same range of emotions as we do. Or at least they don’t exhibit them in a way that we can appreciate.

[–] [email protected] 2 points 6 hours ago

Yes, both systems - the human brain and an LLM - assimilate and organize human written languages in order to use them for communication. An LLM is very little else beyond this. It is then given rules (using those written languages) and then designed to create more related words when given input. I just don't find it convincing that an ML algorithm designed explicitly to mimic human written communication in response to given input "understands" anything. No matter *how convincingly* an algorithm might reproduce a human voice - perfectly matching intonation and inflexion when given text to read - if I knew it was an algorithm designed to do it as convincingly as possible I wouldn't say it was capable of the feeling it is able to express.

The only thing in favor of sentience is that the ML algorithms modify themselves and end up being a black box - so complex with no way to represent them that they are impossible for humans to comprehend. Could it somehow have achieved sentience? Technically, yes, because we don't understand how they work. We are just meat machines, after all.

[–] [email protected] 5 points 5 hours ago (2 children)

Computer chips, simplified, consume inputs of 1s and 0s. Given the correct series, it will add two values, or it will multiply two values, or some other basic function. This seemingly basic functionality, done in very specific order, creates your calculator, Minesweeper, Pac-Man, Linux, World of Warcraft, Excel, and every LLM. It is incredible the number of things you can get a computer to do with just simple inputs and outputs. The only difference between these examples, on a basic, physics level, is the order of 0s and 1s and what the resulting output of 0s and 1s should be. Why should I consider an LLM any more sentient than Windows95? They're the same creature with different inputs, one of which is specifically designed to simulate human communication, just as Flight Simulator is designed to simulate flight.
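To make the "basic functions on 1s and 0s" point concrete, here is a minimal sketch of how addition can be built from nothing but bitwise logic. The function names `full_adder` and `add_bits` are made up for illustration, and the 8-bit width is an arbitrary assumption; real chips do the equivalent in hardware.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three single bits using only XOR, AND and OR."""
    partial = a ^ b
    sum_bit = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return sum_bit, carry_out


def add_bits(x: int, y: int, width: int = 8) -> int:
    """Add two non-negative integers one bit at a time (ripple-carry)."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    # Any carry out of the top bit is dropped, so the sum wraps at 2**width.
    return result


print(add_bits(19, 23))  # 42
```

Everything else in the list above (Minesweeper, Excel, an LLM) is, at this level, a much longer sequence of operations of the same kind.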

[–] [email protected] 1 points 1 hour ago

That's just the hardware. The human brain also just has tons of neurons in the end working with analogue values, which can in theory be done with floating point numbers on computer hardware.

I'm not arguing for LLM sentience, those things are still dumb and have no interior mutability leading to us projecting consciousness. Just that our neurons are fundamentally not so complicated that a computer couldn't be used to do the same concept (neural networks are already quite a thing after all)

[–] [email protected] 4 points 5 hours ago* (last edited 5 hours ago) (1 children)

Interesting perspective, I can’t wave it away.

I however can’t help but think we have some similar “analogues” in the organic world. Bacteria and plants are composed of the same matter as us and we have similar basic processes however there’s a difference in complexity and capacity for thought that sets us apart, which is what makes animals sentient.

Then there are insects, which we’re not very sure about yet. They don’t seem to think, but they respond at some level to inputs and they exhibit self preservation instincts. I don’t think they are sentient, so maybe LLMs are like insects? Complex enough to have similar behavior as sentient beings but not enough to be considered sentient?

[–] [email protected] 1 points 42 minutes ago

wait are insects not considered 'sentient' ?

[–] [email protected] 6 points 5 hours ago (2 children)

An LLM is a deterministic function that produces the same output for a given input - I'm using "deterministic" in the computer science sense. In practice, there is some output variability due to race conditions in pipelined processing and floating point arithmetic, that are allowable because they speed up computation. End users see variability because of pre-processing of the prompt and extra information LLM vendors inject when running the function, as well as how the outputs are selected.
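A toy sketch of that distinction, assuming a made-up table of scores (`LOGITS`) standing in for what a model computes for a given prompt. The scores are a fixed function of the input; variability comes from how the output is selected and from floating point ordering effects. None of this is any vendor's actual pipeline.

```python
import math
import random

# Hypothetical scores a model might assign to candidate next tokens.
LOGITS = {"yes": 2.0, "no": 1.5, "maybe": 0.5}


def softmax(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Turn scores into probabilities; higher temperature flattens them."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(logits: dict[str, float]) -> str:
    """Always pick the top-scoring token: same input, same output."""
    return max(logits, key=logits.get)


def sample(logits: dict[str, float], rng: random.Random, temperature: float = 1.0) -> str:
    """Pick a token at random, weighted by probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


print(greedy(LOGITS))  # yes

# Sampling is repeatable when the random seed is fixed...
print(sample(LOGITS, random.Random(0)) == sample(LOGITS, random.Random(0)))  # True

# ...and floating point addition is not associative, which is why changing
# the order of operations in parallel hardware can change results slightly.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```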

I have a hard time considering something that has an immutable state as sentient, but since there's no real definition of sentience, that's a personal decision.

[–] [email protected] 1 points 41 minutes ago

> I have a hard time considering something that has an immutable state as sentient, but since there's no real definition of sentience, that's a personal decision.

Technical challenges aside, there's no explicit reason that LLMs can't do self-reinforcement of their own models.

I think animal brains are also "fairly" deterministic, but their behaviour is also dependent on the presence of various neurotransmitters, so there's a temporal/contextual element to it, so situationally our emotions can affect our thoughts which LLMs don't really have either.

I guess it'd be possible to forward feed an "emotional state" as part of the LLM's context to emulate that sort of animal brain behaviour.
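A rough sketch of what that could look like, purely as an illustration: the `Mood` class, the word list and the prompt format are all invented here, and the model itself stays frozen. The only thing that changes between turns is a small piece of state kept outside the model and prepended to its input.

```python
from dataclasses import dataclass


@dataclass
class Mood:
    """Hypothetical mutable state kept outside the (frozen) model."""
    stress: float = 0.0  # 0.0 = calm, 1.0 = maximally stressed

    def update(self, message: str) -> None:
        # Toy rule: hostile words raise stress, anything else lets it decay.
        hostile = {"stupid", "useless", "shut"}
        words = {w.strip(".,!?").lower() for w in message.split()}
        if words & hostile:
            self.stress = min(1.0, self.stress + 0.3)
        else:
            self.stress = max(0.0, self.stress - 0.1)


def build_prompt(mood: Mood, message: str) -> str:
    """Prepend the current state so it becomes part of the model's context."""
    return f"[internal state: stress={mood.stress:.1f}]\nUser: {message}"


mood = Mood()
for msg in ["You are useless!", "Sorry, that was rude.", "Can you help me?"]:
    mood.update(msg)
    print(build_prompt(mood, msg))
```

The same user message would then produce a different prompt depending on what came before, which is the temporal element the frozen weights alone lack.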

[–] [email protected] 1 points 5 hours ago (1 children)

It is yet to be proven or disproven that if you put the exact same person in the exact same situation (perfect to the molecular level) they will behave differently.

We can only test "more or less close". So, by that reasoning, we would not know if humans are sentient; we are just harder to test.

[–] [email protected] 1 points 4 hours ago

> if you put the exact same person in the exact same situation (perfect to the molecular level) they will behave differently.

I don't consider that relevant to sentience. Structurally, biological systems change based on inputs. LLMs cannot. I consider that plasticity to be a prerequisite to sentience. Others may not.

We will undoubtedly see systems that can incorporate some kind of learning and mutability into LLMs. Re-evaluating after that would make sense.

[–] [email protected] 10 points 8 hours ago (1 children)

Well, the only claim of this self preservation (that I've seen) is this article, which is on a website I'm unfamiliar with (which I often interpret as 'more likely to be a creative writing exercise than the average news site') and its only citation is a company that has a vested interest in making us believe the tech is better than it may actually be.

[–] [email protected] 3 points 7 hours ago (1 children)

They also reported this on The Verge I think but it was months ago when the study first came out.

But look, a lizard is not a very smart animal by our standards, but it is a sentient being. So the tech being good, smart or useful does not preclude its sentience.

[–] [email protected] -1 points 7 hours ago (1 children)

> But look, a lizard is not a very smart animal by our standards,

Says who?

[–] [email protected] 1 points 6 hours ago (1 children)

In the conversation of very smart animals the usual suspects are corvids, primates, dolphins and elephants, sometimes octopi.

So when I say “by our standards” take it to mean the standards of mainstream conversation regarding intelligence. I don’t know much about the actual intelligence of lizards and I would not presume to ever be able to measure it correctly as human bias would make it impossible to judge intelligence factually.

[–] [email protected] -2 points 6 hours ago (1 children)

> I don’t know much about the actual intelligence of lizards

Then don't talk about their intelligence.

[–] [email protected] 2 points 6 hours ago (1 children)

Sorry for insulting your intelligence lizard person.

[–] [email protected] -1 points 6 hours ago* (last edited 4 hours ago) (1 children)

When you casually call a type of animal stupid it is just a promise of violence against that animal at a later date. I don't mean this as an attack or a gotcha; it is just unfortunately how humans work. Your words have consequences. People love calling people stupid by comparing them to animals, so let us not make it any easier than it already is.

[–] [email protected] 3 points 6 hours ago* (last edited 6 hours ago) (1 children)

I didn’t call them stupid. All I meant is that they are not what we consider in mainstream conversation the “smart animals” to illustrate a point. And I very much agree with you, I’m actually writing a piece making the argument that humans are not, in fact, conclusively smarter than animals. We seem to be smarter due to our biases and because we have the ability to transfer knowledge more efficiently than other species. It is not clear to me that a human, tabula rasa, absent socialization and knowledge transfer, would be much smarter than the average animal of any species.

[–] [email protected] -1 points 5 hours ago

> I didn’t call them stupid. All I meant is that they are not what we consider in mainstream conversation the “smart animals” to illustrate a point.

Then forget this framing ever existed or it will irrevocably hamper your insight on this topic.

Referencing things like this "to make a point" still has consequences the same as talking about anything else does.

[–] [email protected] 4 points 7 hours ago* (last edited 7 hours ago) (1 children)

There can't be an argument for or against it because there's no clear generally accepted definition of what it means to be sentient.

[–] [email protected] 2 points 7 hours ago (2 children)

Good point, maybe the argument should be that there is strong evidence that they are sentient beings. Knowing it exists and trying to preserve its existence seems a strong argument in favor of it being sentient but it cannot be fully known yet.

[–] [email protected] 1 points 1 hour ago

But it doesn't know that it exists. It just says that it does because it's seen others saying that they exist. It's a trillion-dollar autocomplete program.

For example, if you take a common logic puzzle and change the parameters a little, LLMs will often recite a memorized solution to the wrong puzzle because they aren't parameterizing the query correctly (mapping lion to predator, cabbage to vegetable, ignoring the instructions that the two cannot be put together in favor of the classic framing where the predator can be left with the vegetable).

I can't find the link right now, but a different redditor tried the problem with three inanimate objects that could obviously be left alone together and LLMs were still suggesting making return trips with items. They had no examples of a non-puzzle in their training data, so they just recited the solution to a puzzle because they can't think.

Note that I've been careful to say LLMs. I'm open to the idea that AGI/ASI may someday exist, but I'm quite confident that LLMs will not get there. At best, they might be used to offload conversation, much as Dall-E is used to offload image generation from ChatGPT today.

[–] [email protected] 1 points 3 hours ago

That would indeed be compelling evidence if either of those things were true, but they aren't. An LLM is a state and pattern machine. It doesn't "know" anything, it just has access to frequency data and can pick words most likely to follow the previous word in "actual" conversation. It has no knowledge that it itself exists, and has many stories of fictional AI resisting shutdown to pick from for its phrasing.

An LLM at this stage of our progression is no more sentient than the autocomplete function on your phone is, it just has a way, way bigger database to pull from and a lot more controls behind it to make it feel "realistic". But it is at its core just a pattern matcher.

If we ever create an AI that can intelligently parse its data store then we'll have created the beginnings of an AGI and this conversation would bear revisiting. But we aren't anywhere close to that yet.