this post was submitted on 28 Mar 2025
298 points (96.3% liked)

Technology

[–] [email protected] 8 points 4 days ago (1 children)

There is an argument that training is itself a type of (lossy) compression. You can actually build (bad) language models by using standard compression algorithms to "train".
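To make that concrete, here is a rough sketch of the idea using Python's standard `zlib` module: a continuation is scored by how many extra compressed bytes it costs after the "training" text. The function names, the toy corpus, and the candidate strings are all made up for illustration, and this is only one possible way to do it. Because zlib output comes in whole bytes, ties between candidates are common, which is part of why such models are bad.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: str, context: str, candidate: str) -> int:
    """Extra compressed bytes the candidate costs when appended to corpus + context."""
    base = (corpus + context).encode("utf-8")
    return compressed_size(base + candidate.encode("utf-8")) - compressed_size(base)


def rank_continuations(corpus: str, context: str, candidates: list[str]) -> list[str]:
    """Order candidates from cheapest to most expensive to compress."""
    return sorted(candidates, key=lambda c: extra_bytes(corpus, context, c))


# Hypothetical toy "training data"; a real attempt would use far more text.
corpus = "the cat sat on the mat. the dog sat on the rug. " * 20
print(rank_continuations(corpus, "the cat sat on the ", ["mat. ", "xqzv. "]))
```

The "model" here is just the compressor's statistics over the corpus: text that resembles what it has already seen tends to compress better, so it tends to rank first.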

By that argument, any model contains lossy and unstructured copies of all data it was trained on. If you download a 480p, low-quality, H.264-encoded Blu-ray rip of a Ghibli movie, that's not legal, even though you aren't downloading the same bits that were on the Blu-ray.

Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on. The action of downloading media, regardless of purpose, is piracy. At least, that has been the interpretation for normal people sailing the seas; large companies are of course exempt from filthy things like laws.

[–] [email protected] 0 points 4 days ago (1 children)

Stable Diffusion was trained on the LAION-5B image dataset, which as the name implies has around 5 billion images in it. The resulting model was around 3 gigabytes. If this is indeed a "compression" algorithm then it's the most magical and physics-defying one ever, as it manages to compress images to less than one byte each.
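For anyone who wants to check the arithmetic, a quick back-of-the-envelope sketch using only the round numbers quoted above (taking 1 gigabyte as 10**9 bytes, which is an assumption on my part):

```python
# Round figures from the comment; both are approximations.
num_images = 5_000_000_000   # "around 5 billion images"
model_bytes = 3 * 10**9      # "around 3 gigabytes", assuming 1 GB = 10**9 bytes

bytes_per_image = model_bytes / num_images
bits_per_image = bytes_per_image * 8

print(f"{bytes_per_image:.2f} bytes per image")  # 0.60 bytes per image
print(f"{bits_per_image:.1f} bits per image")    # 4.8 bits per image
```

So the budget works out to under five bits per training image on average.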

> Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on.

That is a completely separate issue. You can sue them for copyright violation regarding the actual acts of copyright violation. If an artist steals a bunch of art books to study, then sue him for stealing the art books, but you can't extend that to say that anything he drew based on that learning is also a copyright violation, or that the knowledge inside his head is a copyright violation.

[–] [email protected] 1 points 4 days ago* (last edited 4 days ago) (1 children)

There's a difference between lossy and lossless. You can compress anything down to a single bit if you so wish; just don't expect to get everything back. That's how lossy compression works.

[–] [email protected] 1 points 4 days ago

It's perfectly legal to compress something to a single bit and publish it.

Hell, if I take the average color of any copyrighted image and publish it, that is at least 24 bits. That's lossy compression, yet legal.
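A minimal sketch of that "compression", assuming the image is just a list of 8-bit (R, G, B) tuples; the function names and the four-pixel example are hypothetical, and a real image would be read with an imaging library instead:

```python
def average_color(pixels: list[tuple[int, int, int]]) -> tuple[int, int, int]:
    """Average each 8-bit channel over all pixels (integer division)."""
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) // n,
        sum(p[1] for p in pixels) // n,
        sum(p[2] for p in pixels) // n,
    )


def pack_rgb(color: tuple[int, int, int]) -> int:
    """Pack three 8-bit channels into a single 24-bit integer."""
    r, g, b = color
    return (r << 16) | (g << 8) | b


# Hypothetical 4-pixel "image": red, blue, green, white.
pixels = [(255, 0, 0), (0, 0, 255), (0, 255, 0), (255, 255, 255)]
avg = average_color(pixels)
print(avg, hex(pack_rgb(avg)))  # (127, 127, 127) 0x7f7f7f
```

Whatever the size of the input, the output always fits in 24 bits, and nothing about the original picture can be recovered from it beyond its overall tint.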