this post was submitted on 18 Oct 2023
98 points (96.2% liked)

Technology


Vechev and his team found that the large language models that power advanced chatbots can accurately infer an alarming amount of personal information about users—including their race, location, occupation, and more—from conversations that appear innocuous.

[–] [email protected] 29 points 1 year ago (14 children)

“It's not even clear how you fix this problem,” says Martin Vechev, a computer science professor at ETH Zürich in Switzerland who led the research.

You fix this problem with locally-run models that do not send your conversations to a cloud provider. That is the only real technical solution.

Unfortunately, the larger models are way too big to run client-side. You could launder your prompts through a smaller LLM to standardize phrasing (e.g. removing idiosyncrasies or local dialects), but there's only so far you can go with that, because language is deeply personal, and the things people will use chatbots for are deeply personal.
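A minimal sketch of that laundering step, assuming the rewrite happens entirely on your own machine before anything is sent out. The names `launder_prompt` and `rule_based_rewrite` are hypothetical, and the handful of regex rules is only a stand-in for a small local model; a real setup would pass in a function that calls a locally hosted LLM instead.

```python
import re
from typing import Callable

# Hypothetical stand-in for a small local model: a few hand-written rules
# that replace informal or regional phrasing with a neutral form.
_RULES = [
    (r"\by'all\b", "you all"),
    (r"\bgonna\b", "going to"),
    (r"\bwanna\b", "want to"),
    (r"\bcolour\b", "color"),
]


def rule_based_rewrite(text: str) -> str:
    """Apply each rewrite rule in order, ignoring case."""
    for pattern, replacement in _RULES:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


def launder_prompt(
    prompt: str,
    rewrite: Callable[[str], str] = rule_based_rewrite,
) -> str:
    """Normalize phrasing locally, then collapse runs of whitespace."""
    normalized = rewrite(prompt)
    return " ".join(normalized.split())


if __name__ == "__main__":
    print(launder_prompt("Y'all  gonna love this colour"))
```

The `rewrite` parameter is the swap point: anything that maps a string to a string can go there. This only touches surface phrasing, so the content of what you ask can still give you away.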

This is by no means exclusive to LLMs, of course. Google has your lifetime search history and they can glean all kinds of information from that alone. If you're older than ~30 or so, you might remember these same conversations from when Gmail first launched. You'd have to be crazy to let Google store all your personal emails for all eternity! And yet everybody does it (myself included, though I'm somewhat ashamed to admit it).

This same problem exists with pretty much any cloud service. When you send data to a third party, they're going to have that data. And I guarantee you are leaking more information about yourself than you realize. You can even tell someone's age and gender with fairly high accuracy from a small sample of their mouse movements.

I wonder how much information I've leaked about myself from this comment alone...

[–] [email protected] 4 points 1 year ago

Unfortunately, the larger models are way too big to run client-side.

There is some hope. Mistral is pretty incredible for its size, and it's a 7B model. There are finetunes on top of that which make it even better. My favorite right now is Open Hermes 2.

There's still room for improvement, and it's getting better and better.
