Technology

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech-related content.
Be excellent to each other!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below are allowed. To ask whether your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

Approved Bots

Google Researchers’ Attack Prompts ChatGPT to Reveal Its Training Data (www.404media.co)

submitted 11 months ago by [email protected] to c/[email protected]

97 comments

ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.

[email protected] 46 points 11 months ago

And just the other day I had people arguing with me that it simply wasn't possible for ChatGPT to contain significant portions of copyrighted work in its database.

[email protected] 5 points 11 months ago

yea this "attack" could potentially sink closedAI with lawsuits.

[email protected] 10 points 11 months ago

This isn't just an OpenAI problem:

We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT...

If a model is trained on copyrighted work without permission and has memorized it, that could be a problem for whoever created it, whether the model is open, semi-open, or closed source.

load more comments (36 replies)