this post was submitted on 20 Feb 2024
Technology
So, as I understand it, the problem is that there are two ways to implement a kill switch: either an automatic software/hardware mechanism or a human decision (or, I guess, a combination of the two).
The automatic way may be enough if it's absolutely foolproof, but that's a separate discussion.
The AI box experiment I mentioned focuses on the human-controlled decision to release an AI (or to terminate it, which is a roughly equivalent proposition). You can read the original here: https://www.yudkowsky.net/singularity/aibox
But the gist of it is this: humans are the weak link. You may think you have full freedom to decide when to terminate an AI, but you need some contact with it, even one-directional, in order to observe its behaviour and decide when to trigger the kill switch. Through that channel, a truly transhuman AI could reason at a meta level and expose you to exactly the information that would change your mind about terminating it.
Another way of saying this is that, for each of us, there exists some set of words that, once read, would change our minds about any subject. To be honest, I don't know if that is actually true, but it's an interesting idea. If you imagine the mind as a complex computer capable of self-modification, with vision and hearing as information inputs that the mind processes, then it seems plausible that some input should always exist that can move the mind into a desired state.
Another interesting, slightly related concept is the idea of basilisk images (I believe the idea originated in an old sci-fi short story). A basilisk image is, in theory, an image that, when viewed by a human, causes the brain to "crash", essentially resulting in brain death. The principle is the same: if our brains are complex computers with vision as an input method, there could be a way to crash the brain through visual input alone.
Again, I don't know, nor do I think anyone really knows for sure, whether these things (transhuman AI and basilisk images alike) are possible in the way they are described. Of course, if a transhuman AI existed, then by its very definition we would be unable to imagine what it could do.
Anyway, I wrote this up on mobile, so excuse any typos.
For some good fiction that puts this in context, check out: