AlexanderESmith

joined 5 months ago
[–] [email protected] 5 points 5 months ago (22 children)

"Your honor, we can use whatever data we want because model training is probably fair use, or whatever".

I don't know what's worse, the fact that you think creators don't have the right to dictate how their works are used, or that you apparently have no idea what fair use is.

This might help; https://copyright.gov/fair-use/

[–] [email protected] 7 points 5 months ago

This "fair use" argument is excellent if used specifically in the context of "education, not commercialization". Best one I've seen yet, actually.

The only problem is that perplexity.ai isn't marketing itself as educational, or as a commentary on the work, or as parody. They tout themselves as a search engine. They also have paid "pro" and "enterprise" plans. Do you think they're specifically contextualizing their training data based on which user is asking the question? I absolutely do not.

[–] [email protected] 9 points 5 months ago

In fairness, a lot of the more exceptional engineers I've worked with couldn't write their way out of a wet paper bag.

On top of that, even great technical writers are often bad at picking - or sticking with - an appropriate target audience.

[–] [email protected] 3 points 5 months ago* (last edited 5 months ago) (24 children)

you got some criticism and now you’re saying everyone else is a bot or has an agenda

Please look up ad hominem, and stop doing it. Yes, their responses are a distraction from the topic at hand, but so were the random posts calling OP paranoid. I'd have been on the defensive too.

[Our company] publish[es] open source work ... anyone is free to use it for any purpose, AI training included

Great, I hope this makes the models better. But you made that decision. OP clearly didn't. In fact, they attempted to use several methods to explicitly block it, and the model trainers did it anyway.

I think that the anti-AI hysteria is stupid virtue signaling for luddites

Many of the most outspoken critics of using stolen data to train generative models work in the tech industry, myself included (I've been in the industry for over two decades). We're far from Luddites.

LLMs are here

I've heard this used as a justification for using them, and reasonable people can discuss the merits of the technology in various contexts. However, this is not a justification for defending the blatant theft of content to train the models.

whether or not they train on your random project isn’t going to affect them in any meaningful way

And yet, they did it while ignoring explicit instructions to the contrary.

there are more than enough fully open source works to train on

I agree, and model trainers should use that content, instead of whatever they happen to grab off every site they happen to scrape.

Better to have your work included so that the LLM can recommend it to people or answer questions about it

I agree if you give permission for model trainers to do so. That's not what happened here.

[–] [email protected] 17 points 5 months ago (7 children)

"The world seeing [their] work" is not equal to "Some random company selling access to their regurgitated content, used without permission after explicitly attempting to block it".

LLMs and image generators that weren't trained on content wholly owned by the group creating the model are theft.

That's not to say LLMs and image generators are innately thievery. It's like the whole "illegal mp3" argument: mp3s are just files containing compressed audio. If they contain copyrighted work, and were obtained illegitimately, THEN they're thievery. Same with content generators.

[–] [email protected] 5 points 5 months ago

Eh. This is not a new argument, and not the first evidence of it. I don't think you're gonna be high on their list of retaliation targets, if you register at all (to say nothing of the low-to-middling reach of the fediverse in general).

Hell, just look at photographers/painters v. image generators, or the novel/article/technical authors v. ... practically all LLMs really, or any other of a dozen major stories about "AI" absorbing content and spitting out huge chunks of essentially unmodified code/writing/images.

[–] [email protected] 6 points 5 months ago

I agree that their replies are a little... over the top. That's all kind of a distraction from the main topic though, isn't it? Do we really need to be rendering armchair diagnoses about someone we know very little about?

I mean, if I posted a legitimate concern - with evidence - and I was dog-piled with a bunch of responses calling me a nutter, I'd probably go on the defensive too. Some people don't know how to handle criticism or stressful interactions; that doesn't mean we should necessarily write them (or their verified concerns) off.

[–] [email protected] 2 points 5 months ago (1 children)

I'm not quite sure whose argument you're making here. It reads like you agree with OP and me (e.g. "LLMs shouldn't be using other people's content without permission", et al).

But you called OP paranoid... I assumed that was because you thought OP only imagined their content was being used without permission. And it's extremely clear that this is exactly what's happening...

What am I missing?

[–] [email protected] 0 points 5 months ago* (last edited 5 months ago) (8 children)

It's not paranoia if you have proof that they're stealing your content without permission or compensation.

You come off as an AI bro apologist. What they're doing isn't okay.

[–] [email protected] 51 points 5 months ago (1 children)

I was wondering what that ominous music was when I woke up this morning

[–] [email protected] 5 points 5 months ago

I always use the browser versions (partly because I don't like installing things, and partly because I run Linux), so it pretty much always shows me as away. And I don't care.

[–] [email protected] 4 points 5 months ago

rsync can resume partial transfers, but you really should break that file up. Trying to do it in one go is crazy.
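Both ideas can be sketched in a few lines. The rsync invocation is commented out since it needs a real remote; the hostname and file names are placeholders, not from the original post. `--partial` keeps half-finished files on the receiver, and `--append-verify` continues them while re-checking the transferred data. Chunking the file first means a dropped connection only costs you one chunk:

```shell
# Resuming: keep partial files and continue them on retry
# (placeholder remote; uncomment with a real host/path).
# rsync -av --partial --append-verify bigfile.bin user@backup:/data/

# Chunking: demo with a small file so each step is verifiable.
head -c 1048576 /dev/urandom > demo.bin      # 1 MiB of test data
split -b 262144 demo.bin demo.part.          # four 256 KiB chunks
# ...transfer the chunks individually, then reassemble and verify:
cat demo.part.* > demo_rejoined.bin
cmp -s demo.bin demo_rejoined.bin && echo "chunks reassemble cleanly"
```

`split` names the chunks alphabetically (`demo.part.aa`, `demo.part.ab`, ...), so the glob in `cat` concatenates them back in the original order.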
