overview for SatanicNotMessianic

A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data in c/[email protected]

[–] [email protected] 2 points 1 year ago (7 children)

Describe it. Imagine I’ve never encountered a cat, because I’m from Mars.

A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data in c/[email protected]

[–] [email protected] 2 points 1 year ago (9 children)

Could you outline what you think a human cognitive model of “cat” looks like without referring to anything non-cat?

A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data in c/[email protected]

[–] [email protected] 5 points 1 year ago (11 children)

No, I disagree. Human knowledge is semantic in nature. “A cat walks across a room” is very close, in semantic space, to “The dog walked through the bedroom” even though the two sentences share no words. Cat maps to dog, across maps to through, room maps to bedroom, and walks maps to walked. We can trace how “volcano” maps onto “migraine” using a semantic network derived from human subject survey results.

LLMs absolutely have a model of “cats.” “Cat” is a region in an N dimensional semantic vector space that can be measured against every other concept for proximity, which is a metric space measure of relatedness. This idea has been leveraged since the days of latent semantic analysis and all of the work that went into that research.
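To make the proximity idea concrete, here is a minimal sketch. The vectors are hypothetical, hand-picked four-dimensional stand-ins (real models learn far larger ones), and cosine similarity is just one common proximity measure, not necessarily the one any particular model uses.

```python
import math

# Hypothetical toy "embeddings", hand-picked for illustration only.
# Real models learn vectors with hundreds or thousands of dimensions.
TOY_VECTORS = {
    "cat":     [0.9, 0.8, 0.1, 0.0],
    "dog":     [0.8, 0.9, 0.2, 0.0],
    "volcano": [0.0, 0.1, 0.9, 0.8],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat_dog = cosine_similarity(TOY_VECTORS["cat"], TOY_VECTORS["dog"])
cat_volcano = cosine_similarity(TOY_VECTORS["cat"], TOY_VECTORS["volcano"])

# "cat" should land much closer to "dog" than to "volcano".
print(f"cat~dog: {cat_dog:.3f}  cat~volcano: {cat_volcano:.3f}")
```

With these made-up numbers, "cat" scores much closer to "dog" than to "volcano", which is the kind of relatedness measurement described above.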

For context, I’m thinking in terms of cognitive linguistics as described by researchers like Fauconnier and Lakoff who explore how conceptual bundling and metaphor define and constrain human thought. Those concepts imply that a realization can be made in a metric space such that the distance between ideas is related to how different those ideas are, which can in turn be inferred by contextual usage observed over many occurrences.

The biggest difference between a large model (as primitive as they are, but we’re talking about model-building as a concept here) and human modeling is that human knowledge is embodied. At the end of the day we exist in a physical, social, and informational universe that a model trained on the artifacts can only reproduce as a secondary phenomenon.

But that’s worlds apart from saying that the cross-linking and mutual dependencies in a metric concept-space are not remotely analogous between humans and large models.

A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data in c/[email protected]

[–] [email protected] 10 points 1 year ago (13 children)

It’s actually because they do know things in a way that’s analogous to how people know things.

Let’s say you wanted to forget that cats exist. You’d have to forget every cat meme you’ve ever seen, of course, but your entire knowledge of memes would also have to change. You’d have to forget that you knew how a huge part of the trend started with “i can haz cheeseburger.”

You’d have to forget that you owned a cat, which will change your entire memory of your life history about adopting the cat, getting home in time to feed it, and how it interacted with your other animals or family. Almost every aspect of your life is affected when you own an animal, and all of those would have to somehow be remembered in a no-cat context. Depending on how broadly we define “cat,” you might even need to radically change your understanding of African ecosystems, the history of sailing, evolutionary biology, and so on. Your understanding of mice and rats would have to change. Your understanding of dogs would have to change. Your memory of cartoons would have to change - can you even remember Jerry without Tom? Those are just off the top of my head at 8 in the morning. The ramifications would be huge.

Concepts are all interconnected, and that’s how this class of AI works. I’ve owned cats most of my life, so they’re a huge part of my personal memory and self-definition. They’re also ubiquitous in culture. Hundreds of thousands to millions of concepts relate to cats in some way, and each one of them would need to change, as would each concept that relates to those concepts. Pretty much everything is connected to everything else and as new data are added, they’re added in such a way that they relate to virtually everything that’s already there. Removing cats might not seem to change your knowledge of quarks, but there’s some very very small linkage between the two.

Smaller impact memories are also difficult. That guy with the weird mustache you saw during your vacation to Madrid ten years ago probably doesn’t have that much of a cascading effect, but because Esteban (you never knew his name) has such a tiny impact, it’s also very difficult to detect and remove. His removal won’t affect much of anything in terms of your memory or recall, but if you’re suddenly legally obligated to demonstrate you’ve successfully removed him from your memory, it will be tough.

Basically, the laws were written at a time when people were records in a database and each had their own row. Forgetting a person just meant deleting that row. That’s not the case with these systems.

The thing is that we don’t compel researchers to re-train their models on a data set if someone requests their removal. If you have traditional research on obesity, for instance, and you have a regression model that’s looking at various contributing factors, you do not have to start all over again if someone requests their data be deleted. It should mean that the person’s data are removed from your data set, but it doesn’t mean that you can’t continue to use that model - at least it never has, to my knowledge. Your right to be forgotten doesn’t translate to you being allowed to invalidate the scientific models generated that glom together your data with that of tens of thousands of others. You can be left out of the next round of research on that dataset, but I have never heard of people being legally compelled to regenerate a model based on that.
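As a rough illustration of the regression case, here is a sketch with made-up numbers. The participant IDs, columns, and values are all hypothetical; the point is only that deleting a row from the data set is easy, while the already-fitted coefficients carry a small, unlabeled trace of everyone who was in it.

```python
# Hypothetical study rows: (person_id, weekly_exercise_hours, bmi).
# All values are invented for illustration.
records = [
    ("p01", 0.0, 31.0),
    ("p02", 1.0, 30.2),
    ("p03", 2.0, 28.9),
    ("p04", 3.0, 28.1),
    ("p05", 4.0, 26.8),
    ("p06", 5.0, 26.1),
]

def fit_line(rows):
    """Ordinary least squares for bmi = intercept + slope * hours."""
    n = len(rows)
    mean_x = sum(x for _, x, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for _, x, y in rows)
    sxx = sum((x - mean_x) ** 2 for _, x, _ in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# The "published" model: two numbers, with no field naming any participant.
published_model = fit_line(records)

# Honoring a deletion request on the data set is a one-line filter...
remaining = [row for row in records if row[0] != "p03"]

# ...but the published coefficients do not change unless someone refits.
refit_model = fit_line(remaining)

print("published (intercept, slope):", published_model)
print("refit without p03:           ", refit_model)
```

The refit coefficients differ only slightly from the published ones, so p03's contribution is real but diffuse, and nothing in the published pair of numbers can be pointed to as "their" part.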

There are absolutely novel legal questions that are going to be involved here, but I just wanted to clarify that it’s really not a simple answer from any perspective.

Please stop with the fearmongering, if you are a US lemmy host follow preserve and report guidelines, stay vigilant, follow ongoing discussions of fixes and solutions and you should be fine. in c/[email protected]

[–] [email protected] 3 points 1 year ago

Yeah, my client crashed when I was trying to edit it. Thanks for the reminder!

Please stop with the fearmongering, if you are a US lemmy host follow preserve and report guidelines, stay vigilant, follow ongoing discussions of fixes and solutions and you should be fine. in c/[email protected]

[–] [email protected] 20 points 1 year ago* (last edited 1 year ago) (3 children)

There should be a full write up from a lawyer - or, better yet, an organization like the EFF. Because lemmy.world is such a prominent instance, it would probably garner some attention if the people who run it were to approach them.

People would still have to decide what their own risk tolerances are. Some might think that even if safe harbor applies, getting swatted or doxxed just isn’t worth the risk.

Others might look at it, weigh their rights under the current laws, and decide it’s important to be part of the project. A solid communication on the specific application of S230 to a host of a federated service would go a long way.

I worked as a sys admin for a while in college in the mid-90s, and it was a time when ISPs were trying to get considered common carriers. Common carrier covers phone companies from liability if people use their service to commit crimes. The key provision of common carrier status was that the company exercised no control whatsoever over what went across their wires.

In order to make the same argument, the systems I helped manage had a policy of no policing. You could remove a newsgroup from usenet, but you couldn’t do any other kind of content-oriented filtering. The argument went that as soon as you start moderating, you’re now responsible for moderating it all. True or not, that’s the argument made and policy adopted on multiple university networks and private ISPs. And to be clear, we’re not talking about a company like facebook or reddit, which have full control over their content. We’re talking about things like the web in general, such as it was, and usenet.

Usenet is probably the best example, and I knew some BBS operators who hosted usenet content. The only BBS owners that got arrested (as far as I know) were arrested for being the primary host of illegal material.

S230 or otherwise, someone should try to get a pro bono opinion from a lawyer (or lawyers) who knows the subject.

Edit: Looks like EFF already did a write up. With the number of concerned people posting on this topic, this link should be in every official reply and as a post in the topic.

Elon Musk’s FSD v12 demo includes a near miss at a red light and doxxing Mark Zuckerberg — 45-minute video was meant to demonstrate v12 of Tesla’s Full Self-Driving but ended up being a list of thi... in c/[email protected]

[–] [email protected] 15 points 1 year ago (1 children)

That’s not how it works, unfortunately. That’s how people want it to work, but it’s not how it works.

This is just more of Elon’s pie in the sky.

Elon Musk’s FSD v12 demo includes a near miss at a red light and doxxing Mark Zuckerberg — 45-minute video was meant to demonstrate v12 of Tesla’s Full Self-Driving but ended up being a list of thi... in c/[email protected]

[–] [email protected] 26 points 1 year ago (3 children)

I do this kind of thing for a living, and have done so for going on 30 years. I study complex systems and how they use learning and adaptation.

Musk’s approach to these systems is idiotic and shows no understanding of or appreciation for how complex systems - animals, in particular - actually work. He wanted to avoid giving his vehicles lidar, for instance, because animals can navigate the world without it. Yet he didn’t give them either the perceptual or cognitive capabilities that animals have, nor did he take into account that the problems of animal locomotion solved by evolution are very different from the problems solved by people driving vehicles. It, of course, didn’t work, and now Tesla is trailing the pack on self-driving capabilities while the big three German car makers and others prep class 3 vehicles for shipping.

If he is trying to chatgpt his way out of the corner he’s painted himself into, he’s just going to make it worse - and, amusingly, for the same reasons. Vision is just one dimension of sensation, and cars are not people, or antelopes, or fish, or whatever his current analogy is.

This is just Elon Eloning again. No one predicts a car coming towards them is going to do a California stop at a stop sign. If I’m pulling into an intersection and I see someone rolling through a stop sign, I’m hitting the brakes because obviously a) they didn’t see me and b) they don’t know the rules of the road. Elon’s cars have a problem with cross traffic and emergency vehicles anyway; making the logic fuzzier is not going to improve the situation. If he thinks throwing video and telemetry data at a large model is going to overcome his under-engineered autonomous system, I suspect he’s going to be in for a rude discovery.

If there’s anything kids today can learn from Elon (or from Trump for that matter), it’s how to be so confidently wrong that people throw money at you. The problem is that if you’re not already born into wealth and privilege, you’re likely to merely become the owner of the most successful line of car dealerships in a suburban county in Pennsylvania, or else in prison for fraud.

X, formerly Twitter, faces 2,200 arbitration cases and filing fees over $3 million in c/[email protected]

[–] [email protected] 15 points 1 year ago

If Elmo owned up to and apologized for his transphobia, resolving not to do that kind of thing again in the future, I would be more than happy to call his microblogging service whatever he ends up deciding to name it. I’m not sure this is going to be enough to convince him, but go ahead and forward that along if you think it will help.

You are free to go, I guess... in c/[email protected]

[–] [email protected] 9 points 1 year ago

You still can’t use the 5th to infer anything about the defendant in a criminal case. In a civil case, the court can take a person’s refusal to answer into account.

X, formerly Twitter, faces 2,200 arbitration cases and filing fees over $3 million in c/[email protected]

[–] [email protected] 58 points 1 year ago (2 children)

I continue to use “twitter” because Musk is a transphobe. If he feels obligated to deadname or misgender people, or defend those who do, I don’t see the need to follow what he wants to identify as, either.

[Rant] It is way too time consuming to clean up your digital presence in c/[email protected]

[–] [email protected] 11 points 1 year ago (1 children)

The problem is twofold. The first part is that companies cannot be trusted to act in good faith when it comes to complying with the intent of laws they disagree with. This doesn’t apply to every company, but it applies to enough of them to make life difficult. I think it was Enron who, when ordered to supply prosecutors with emails, opted to print them out and hand over reams of paper that then had to be re-scanned. This is the same approach as companies that require physical mail to delete a record and who only do so for locations where it’s required by law. There’s no reason that it cannot be done more easily with a login and password. When I was deleting my reddit accounts, I had to use a script to delete all of my posts and comments because reddit did not support that functionality.

The second, related problem is that the legislators writing the laws aren’t skilled technologists, and that technology keeps evolving. It’s like having people with no background in finance writing laws to regulate Wall Street (which also happens). Cynical people might think this is seen as a feature, not a bug.