this post was submitted on 21 Jan 2024

827 points (95.0% liked)

Nightshade, the free tool that ‘poisons’ AI models, is now available for artists to use (venturebeat.com)

submitted 1 year ago by [email protected] to c/[email protected]

[–] [email protected] 223 points 1 year ago (3 children)

Reminder that this is made by Ben Zhao, the University of Chicago professor who stole open source code for his last data poisoning scheme.

[–] [email protected] 67 points 1 year ago (2 children)

Pardon my ignorance but how do you steal code if it's open source?

[–] [email protected] 224 points 1 year ago (9 children)

You don’t follow the license that it was distributed under.

Commonly, this happens when you use open source code in your project under a license that requires your project to be open source too, but then you keep yours closed source.

[–] [email protected] 78 points 1 year ago (5 children)

He took GPLv3 code, which is a copyleft license that requires you share your source code and license your project under the same terms as the code you used. You also can't distribute your project as a binary-only or proprietary software. When pressed, they only released the code for their front end, remaining in violation of GPLv3.

[–] [email protected] 25 points 1 year ago* (last edited 1 year ago)

And as I said there, it is utterly hypocritical for him to sell snake oil to artists, allegedly to help them fight copyright violations, while committing actual copyright violations.

[–] [email protected] 106 points 1 year ago (13 children)

Is there a similar tool that will "poison" my personal tracked data? Like, I know I'm going to be tracked and have a profile built on me by nearly everywhere online. Is there a tool that I can use to muddy that profile so it doesn't know if I'm a trans Brazilian pet store owner, a Nigerian bowling alley systems engineer, or a Beverly Hills sanitation worker who moonlights as a practice subject for budding proctologists?

[–] [email protected] 126 points 1 year ago (3 children)

The only way to taint your behavioral data so that you don’t get lumped into a targetable cohort is to behave like a maniac. As I’ve said in a past comment here, when you fill out forms, pretend your gender, race, and age are fluid. Also, pretend you’re nomadic. Then behave erratic as fuck when shopping online - pay for bibles, butt plugs, taxidermy, and PETA donations.

Your data will be absolute trash. You’ll also be miserable because you’re going to be visiting the Amazon drop off center with gag balls and porcelain Jesus figurines to return every week.

[–] [email protected] 42 points 1 year ago (1 children)

Then behave erratic as fuck when shopping online - pay for bibles, butt plugs, taxidermy, and PETA donations.

...in the same transaction. It all needs to be bought and then shipped together. Not only to fuck with the algorithm, but also to fuck with the delivery guy. Because we usually know what you ordered. Especially when it's in the soft bag packaging. Might as well make everyone outside your personal circle think you're a bit psychologically disturbed, just to be safe.

[–] [email protected] 20 points 1 year ago (1 children)

How? Aren't most items in boxes even in the bags? It's not like they just toss a butt plug into a bag and ship it...right?

[–] [email protected] 35 points 1 year ago (1 children)

The browser addon "AdNauseum" can help with that, although it's not a complete solution.

[–] [email protected] 28 points 1 year ago (4 children)

That and trackmenot.

It searches random shit in the background.

https://www.trackmenot.io/

[–] [email protected] 19 points 1 year ago (2 children)

Is there a similar tool that will “poison” my personal tracked data? Like, I know I’m going to be tracked and have a profile built on me by nearly everywhere online. Is there a tool that I can use to muddy that profile so it doesn’t know if I’m a trans Brazilian pet store owner, a Nigerian bowling alley systems engineer, or a Beverly Hills sanitation worker who moonlights as a practice subject for budding proctologists?

Have you considered just being utterly incoherent, and not making sense as a person? That could work.

[–] [email protected] 27 points 1 year ago

According to my exes, yes.

[–] [email protected] 89 points 1 year ago (48 children)

The tool's creators are seeking to make it so that AI model developers must pay artists to train on data from them that is uncorrupted.

That's not something a technical solution will work for. We need copyright laws to be updated.

[–] [email protected] 25 points 1 year ago (14 children)

You should check out this article by Kit Walsh, a senior staff attorney at the EFF. The EFF is a digital rights group who recently won a historic case: border guards now need a warrant to search your phone.

A few quotes:

First, copyright law doesn’t prevent you from making factual observations about a work or copying the facts embodied in a work (this is called the “idea/expression distinction”). Rather, copyright forbids you from copying the work’s creative expression in a way that could substitute for the original, and from making “derivative works” when those works copy too much creative expression from the original.

Second, even if a person makes a copy or a derivative work, the use is not infringing if it is a “fair use.” Whether a use is fair depends on a number of factors, including the purpose of the use, the nature of the original work, how much is used, and potential harm to the market for the original work.

and

Even if a court concludes that a model is a derivative work under copyright law, creating the model is likely a lawful fair use. Fair use protects reverse engineering, indexing for search engines, and other forms of analysis that create new knowledge about works or bodies of works. Here, the fact that the model is used to create new works weighs in favor of fair use as does the fact that the model consists of original analysis of the training images in comparison with one another.

[–] [email protected] 58 points 1 year ago (38 children)

Explanation of how this works.

These "AI models" (meaning the free and open Stable Diffusion in particular) consist of different parts. The important parts here are the VAE and the actual "image maker" (U-Net).

A VAE (Variational AutoEncoder) is a kind of AI that can be used to compress data. In image generators, a VAE is used to compress the images. The actual image AI only works on the smaller, compressed image (the latent representation), which means it takes a less powerful computer (and uses less energy). It’s that which makes it possible to run Stable Diffusion at home.

This attack targets the VAE. The image is altered so that its latent representation is that of a very different image, while the image itself still looks roughly the same to humans. Say you take images of a cat and of a dog. You put both of them through the VAE to get their latent representations. Now you alter the image of the cat until its latent representation is similar to that of the dog. You alter it only in small ways and use methods to check that it still looks similar to humans. So, what the actual image maker AI "sees" is very different from the image the human sees.

Obviously, this only works if you have access to the VAE used by the image generator. So, it only works against open source AI; basically only Stable Diffusion at this point. Companies that use a closed source VAE cannot be attacked in this way.
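The cat/dog procedure described above can be sketched in a few lines. This is only a toy illustration, not Nightshade's actual code: the "encoder" here is a small random linear map standing in for a real VAE encoder, the "still looks similar to humans" check is replaced by a simple per-pixel bound `eps`, and all names (`encode`, `poison`, `W`, `cat`, `dog`) are made up for the example. Note that the optimisation needs the encoder weights, which is the white-box access mentioned above.

```python
import random

def encode(weights, pixels):
    """Toy linear 'encoder': latent = weights @ pixels (stand-in for a VAE encoder)."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]

def latent_distance_sq(weights, pixels, target_latent):
    """Squared Euclidean distance between the latent of `pixels` and a target latent."""
    latent = encode(weights, pixels)
    return sum((a - b) ** 2 for a, b in zip(latent, target_latent))

def poison(weights, source, target, eps=0.05, steps=200):
    """Nudge `source` so its latent approaches that of `target`,
    while every pixel stays within +/- eps of the original value."""
    target_latent = encode(weights, target)
    # Step size 1/L, where L is an upper bound on the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(w * w for row in weights for w in row)
    step = 1.0 / lipschitz
    delta = [0.0] * len(source)
    for _ in range(steps):
        current = [s + d for s, d in zip(source, delta)]
        residual = [a - b for a, b in zip(encode(weights, current), target_latent)]
        # Gradient of ||W x - z_target||^2 with respect to x is 2 * W^T * residual.
        grad = [2.0 * sum(weights[i][j] * residual[i] for i in range(len(weights)))
                for j in range(len(source))]
        # Gradient step, then clip the perturbation back into the eps-box.
        delta = [max(-eps, min(eps, d - step * g)) for d, g in zip(delta, grad)]
    return [s + d for s, d in zip(source, delta)]

rng = random.Random(0)
N_PIXELS, N_LATENT = 16, 4  # tiny sizes, purely for illustration
W = [[rng.uniform(-1, 1) for _ in range(N_PIXELS)] for _ in range(N_LATENT)]
cat = [rng.random() for _ in range(N_PIXELS)]
dog = [rng.random() for _ in range(N_PIXELS)]

poisoned_cat = poison(W, cat, dog)
before = latent_distance_sq(W, cat, encode(W, dog))
after = latent_distance_sq(W, poisoned_cat, encode(W, dog))
max_change = max(abs(p - c) for p, c in zip(poisoned_cat, cat))
print(f"latent distance to dog: before={before:.4f}, after={after:.4f}")
print(f"largest per-pixel change: {max_change:.4f}")
```

In a real setting the linear map would be a neural encoder, the gradient would come from automatic differentiation, and the pixel bound would presumably be replaced by a perceptual similarity measure.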

I guess it makes sense if your ideology is that information must be owned and everything should make money for someone. I guess some people see cyberpunk dystopia as a desirable future. I wonder if it bothers them that all the tools they used are free (e.g., the method to check if images look similar to humans).

It doesn’t seem to be a very effective attack but it may have some long-term PR effect. Training an AI costs a fair amount of money. People who give that away for free probably still have some ulterior motive, such as being liked. If instead you get the full hate of a few anarcho-capitalists that threaten digital vandalism, you may be deterred. Well, my two cents.

[–] [email protected] 20 points 1 year ago* (last edited 1 year ago)

So, it only works against open source AI; basically only Stable Diffusion at this point.

I very much doubt it even works against the multitude of VAEs out there. There aren't just the ones derived from Stability AI's models but also ones simply intended to be faster (at a loss of quality): TAESD can also encode and has a completely different architecture, so it is very unlikely to be fooled by the same attack vector. Failing that, you can use a simple affine transformation to convert between latent and RGB space (that's what "latent2rgb" is) and compare outputs to know whether the big VAE model got fooled into generating something unrelated. That thing just doesn't have any attack surface; there are several orders of magnitude too few weights in there.

Which means that there's an undefeatable way to detect that the VAE was defeated. Which means it's only a matter of processing power until Nightshade is defeated, no human input needed. They'll of course again train and try to fool the now hardened VAE, starting another round, ultimately achieving nothing but making the VAE harder and harder to defeat.
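One way to read the affine-check idea above, as a rough sketch: map each latent vector to an approximate RGB colour with a fixed affine transform, then compare that preview against a downscaled copy of the actual image. A large mismatch suggests the latent does not describe what the image shows. The coefficients, bias, threshold, and function names below are hypothetical placeholders, not the values used by any real "latent2rgb" implementation.

```python
# Hypothetical 4-channel latent -> RGB coefficients; real values depend on the VAE.
LATENT_TO_RGB = [
    [0.30, 0.20, 0.20],
    [0.20, 0.30, 0.20],
    [-0.10, 0.20, 0.30],
    [-0.20, -0.20, -0.40],
]
BIAS = [0.5, 0.5, 0.5]  # also a placeholder

def preview_pixel(latent_pixel):
    """Affine map from one latent vector to an approximate RGB colour."""
    return [
        BIAS[c] + sum(latent_pixel[k] * LATENT_TO_RGB[k][c]
                      for k in range(len(latent_pixel)))
        for c in range(3)
    ]

def mean_abs_diff(latents, downscaled_rgb):
    """Average absolute colour difference between the affine preview and the image."""
    total, count = 0.0, 0
    for latent_pixel, rgb in zip(latents, downscaled_rgb):
        for approx, actual in zip(preview_pixel(latent_pixel), rgb):
            total += abs(approx - actual)
            count += 1
    return total / count

def looks_poisoned(latents, downscaled_rgb, threshold=0.15):
    """Flag an image whose latent preview disagrees strongly with its pixels."""
    return mean_abs_diff(latents, downscaled_rgb) > threshold

# Toy data: two latent positions and the matching downscaled image colours.
clean_latents = [[0.5, 0.0, 0.0, 0.0], [0.0, 0.5, 0.0, 0.0]]
image_rgb = [preview_pixel(z) for z in clean_latents]  # consistent by construction
poisoned_latents = [[-1.0, -1.0, 0.0, 1.0], [-1.0, -1.0, 0.0, 1.0]]  # unrelated content

print(looks_poisoned(clean_latents, image_rgb))
print(looks_poisoned(poisoned_latents, image_rgb))
```

The appeal of such a check is that the affine map has only a handful of parameters, which is the "too few weights" point made above.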

It's like with Russia: They've already lost the war but they haven't noticed, yet -- though I wouldn't be too sure that Nightshade devs themselves aren't aware of that: What they're doing is a powerful way to grift a lot of money from artists without a technical bone in their body.

[–] [email protected] 55 points 1 year ago (2 children)

Begun, the AI Wars have.

[–] [email protected] 23 points 1 year ago (1 children)

Excited to see the guys that made Nightshade get sued in a Silicon Valley district court, because they're something something mumble mumble intellectual property national security.

[–] [email protected] 47 points 1 year ago (12 children)

They already stole GPLv2 code for their last data poisoning scheme and remain in violation of that license. They're just grifters.

[–] [email protected] 53 points 1 year ago (8 children)

This doesn't work outside of laboratory conditions.

It's the equivalent of "doctors find cure for cancer (in mice)."

[–] [email protected] 18 points 1 year ago (2 children)

I like that example. Every time you hear about some discovery that x kills 100% of cancer cells in a petri dish, you always have to think: so does bleach.

[–] [email protected] 45 points 1 year ago (3 children)

Apparently people who specialize in AI/ML have a very hard time trying to replicate the desired results when training models with 'poisoned' data. Is that true?

[–] [email protected] 42 points 1 year ago* (last edited 1 year ago) (7 children)

I've only heard that running images through a VAE just once seems to break the Nightshade effect, but no one's really published anything yet.

You can finetune models on known bad and incoherent images to help them output better images if the trained embedding is used in the negative prompt. So there's a chance that making a lot of purposefully bad data could actually make models better by helping the model recognize bad output and avoid it.

[–] [email protected] 41 points 1 year ago (3 children)

In the long run this will only improve the strength of models as they adapt to the changes this introduces and get that much stronger for it.

[–] [email protected] 41 points 1 year ago (2 children)

It's not FOSS and I don't see a way to review if what they claim is actually true.

It may just be a way to help differentiate legitimate human-made work from machine-generated work, thus helping AI training models.

I can't demonstrate that either, because its license expressly forbids adapting the software for other uses:

Edit, alter, modify, adapt, translate or otherwise change the whole or any part of the Software nor permit the whole or any part of the Software to be combined with or become incorporated in any other software, nor decompile, disassemble or reverse engineer the Software or attempt to do any such things

sauce: https://nightshade.cs.uchicago.edu/downloads.html

[–] [email protected] 19 points 1 year ago (1 children)

The EULA also prohibits using Nightshade "for any commercial purpose", so arguably if you make money from your art—in any way—you're not allowed to use Nightshade to "poison" it.

[–] [email protected] 40 points 1 year ago (18 children)

Fascinating that they develop this tool and then only release Windows and macOS versions.

[–] [email protected] 33 points 1 year ago

Ironic that they used an AI picture for the article...

[–] [email protected] 31 points 1 year ago (12 children)

big companies already have all your uncorrupted artwork, all this does is eliminate any new competition from cropping up.

[–] [email protected] 27 points 1 year ago

I bet that before the end of this year this tool will be one of the things that helped improve the performance and quality of AI.

[–] [email protected] 22 points 1 year ago (8 children)

is anyone else excited to see poisoned AI artwork? This might be the element that makes it weird enough.

Also, re: the guy lol'ing that someone says this is illegal - it might be. is it wrong? absolutely not. does the woefully broad computer fraud and abuse act contain language that this might violate? it depends, the CFAA has two requirements for something to be in violation of it.

1. the act in question affects a government computer, a financial institution's computer, OR a computer "which is used in or affecting interstate or foreign commerce or communication" (that last one is the biggie because it means that almost 100% of internet activity falls under its auspices)
2. the act "knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer;" (with 'protected computer' being defined in 1)

Quotes are from the law directly, as quoted at https://en.wikipedia.org/wiki/Computer_Fraud_and_Abuse_Act

the poisoned artwork is information created with the intent of causing it to be transmitted to computers across state or international borders and damaging those computers. Using this technique to protect what's yours might be a felony in the US, and because it would be considered intentionally damaging a protected computer by the knowing transmission of information designed to cause damage, you could face up to 10 years in prison for it. Which is fun because the people stealing from you face absolutely no retribution at all for their theft, they don't even have to give you some of the money they use your art to make, but if you try to stop them you go to prison for a decade.

The CFAA is the same law that Reddit co-founder Aaron Swartz was prosecuted under. His crime was downloading things from JSTOR that he had a right to download as an account holder, but more quickly than they felt he should have. He was charged with 13 felonies and faced 50 years and over a million dollars in fines alongside a lifetime ban from ever using an internet connected computer again when he died by suicide. The charges were then dropped.

[–] [email protected] 19 points 1 year ago (2 children)

It's not damaging a computer; it's poisoning the models AI uses to create the images. The program will work just fine, and as expected given the model that it has. The difference is that the model might not be accurate. It's like saying you're breaking a screen if you're now looking at a low-res version of an image.

[–] [email protected] 22 points 1 year ago

Oily snakes slither such that back and forth looks like production..

[–] [email protected] 17 points 1 year ago

Ah, another arms race has begun. Just be wary, what one person creates another will circumvent.
