this post was submitted on 21 Feb 2024
165 points (95.1% liked)

Why The New York Times might win its copyright lawsuit against OpenAI

The AI community needs to take copyright lawsuits seriously.

top 39 comments
[–] [email protected] 61 points 8 months ago* (last edited 8 months ago) (3 children)

Some of the prior cases described in this article, as precedents that could spell trouble for OpenAI, frankly sound like miscarriages of justice. Using copyright to prevent organizations from photocopying articles for internal use? What the heck?

If anything, my take-home message is that the reach of copyright law is too long and needs to be taken down a peg.

[–] [email protected] 28 points 8 months ago (2 children)

You are not wrong that monopolies granted by copyright are regularly and unfairly abused.

That being said, AI trainers are getting away with plagiarism right now. More importantly, it's not just a violation involving a single copy; it's potentially the creation of tools that enable mass derivative copies. Authors that create training data need to be compensated.

[–] [email protected] -4 points 8 months ago (1 children)

> Authors that create training data need to be compensated.

There should not be a problem with that. The people who work on training datasets are already being paid.

The reason you are getting downvoted is that these lawsuits are not about that. These are about giving money to corporations like the NYT - or Reddit, or Facebook, etc - for the "intellectual property" that they already have lying around. It's pure grift.

Because the creation of all that is already paid for, that leaves all the more money for lawyers and PR campaigns to extract money for nothing from society.

[–] [email protected] 15 points 8 months ago (1 children)

> There should not be a problem with that. The people who work on training datasets are already being paid.

How are the people whose articles and comments are being scraped compensated?

> Because the creation of all that is already paid for

"This perfectly good movie has already been made and paid for, that means I can watch it without compensating the studio."

I do not agree with Reddit selling the comments of their users. Even so, that's a ridiculous statement to make.

[–] [email protected] 6 points 8 months ago

> If anything, my take-home message is that the reach of copyright law is too long and needs to be taken down a peg.

Exactly! Copyright law is terrible. We need to hold AI companies to the same standard that everyone else is held to. Then we might actually get big corporations lobbying to improve copyright law for once. Giving them a free pass right now would be a terrible waste of an opportunity in addition to being an injustice.

[–] [email protected] 1 points 8 months ago

I think the photocopying thing models fairly well with user licenses for software. Without commenting on whether that's right in the grand scheme of things, I can see that as analogous. Most folks accept that they need individual user licenses for software, right? I get that photocopying can't be controlled the same way software can, but the case was in the 90s? I mean, these things aren't about whether the provider of the article/software faces increased marginal cost for additional copies/users, but that the user/company is getting more use than they paid for. License agreements. Seems like a problem with the terms of licenses and laws rather than how they were judged as following them or not. Their use didn't seem to be transformative, and the for-profit nature of their use sort of overruled the "research" fair use.

I also think the mp3.com thing sucks, but again, the way the law is, that's a reasonable/logical outcome. Same thing that will kill someone offering ebooks to people who show a proof of purchase.

I don't know the solution to the situation with NYT/OpenAI. It's a pretty bad look to be able to spit out an article nearly verbatim. We do need copyright reform, but I think that's at the feet of the legislators, not judges. I only need to see the recent Alabama IVF court ruling to be reminded of the danger of more... interpretative rulings.

[–] [email protected] 7 points 8 months ago

This is the best summary I could come up with:


In its blog post responding to the Times lawsuit, OpenAI wrote that “training AI models using publicly available Internet materials is fair use, as supported by long-standing and widely accepted precedents.”

The most important of these precedents is a 2015 decision that allowed Google to scan millions of copyrighted books to create a search engine.

Stability AI and Anthropic will undoubtedly make similar arguments as they face copyright lawsuits of their own.

But fewer people remember MP3.com, a music startup that tried harder to color inside the lines but still got crushed in the courts.

When a customer wanted to add a CD to their collection, they would put it in their CD-ROM drive just long enough to prove they owned it.

“Defendant purchased tens of thousands of popular CDs in which plaintiffs held the copyrights, and, without authorization, copied their recordings onto its computer servers,” wrote Judge Jed Rakoff in a decision against MP3.com.


The original article contains 644 words, the summary contains 155 words. Saved 76%. I'm a bot and I'm open source!