this post was submitted on 08 Jan 2024
407 points (96.1% liked)
The problem is not that it's regurgitating. The problem is that it was trained on NYT articles and other data in violation of copyright law. Regurgitation is just evidence of that.
I've seen and heard your argument made before, not just for LLMs but also for text-to-image programs. My counterpoint is that humans learn in a very similar way to these programs: by taking stuff we've seen or read and developing a style inspired by it. The programs also don't just recite texts from memory; instead, they create new ones based on the probabilities of certain words and phrases occurring in the parts of their training data related to the prompt. In an oversimplified but accurate enough comparison, saying these programs violate copyright law is like saying every cosmic horror writer is plagiarising Lovecraft, or that every surrealist painter is copying Dali.
Machines aren’t people and it’s fine and reasonable to have different standards for each.
But is it reasonable to have different standards for someone creating a picture with a paintbrush than for someone creating the same picture with a machine learning model?
Yes, given that one is creating art and the other is typing words into the plagiarism machine.
This is called assuming the consequent. Either you're not trying to make a persuasive argument or you're doing it very, very badly.