this post was submitted on 21 May 2024
509 points (95.4% liked)

Technology

59374 readers
3794 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[โ€“] [email protected] 2 points 5 months ago (1 children)

my main question is: how much csam was fed into the model for training so that it could recreate more

i think it'd be worth investigating the training data usued for the model

[โ€“] [email protected] 24 points 5 months ago* (last edited 5 months ago)

This did happen a while back, with researchers finding thousands of hashes of CSAM images in LAION-2B. Still, IIRC it was something like a fraction of a fraction of 1%, and they weren't actually available in the dataset because they had already been removed from the internet.

You could still make AI CSAM even if you were 100% sure that none of the training images included it since that's what these models are made for - being able to combine concepts without needing to have seen them before. If you hold the AI's hand enough with prompt engineering, textual inversion and img2img you can get it to generate pretty much anything. That's the power and danger of these things.