This new data poisoning tool lets artists fight back against generative AI

Elle@lemmy.world · 1 year ago

This new data poisoning tool lets artists fight back against generative AI

egeres@lemmy.world · 1 year ago

Here’s the paper: https://arxiv.org/pdf/2302.04222.pdf

I find it very interesting that someone went in this direction to try to find a way to mitigate plagiarism. This is very akin to adversarial attacks in neural networks (you can read more in this short review https://arxiv.org/pdf/2303.06032.pdf)

I saw some comments saying that you could just build an AI that detects poisoned images, but that wouldn’t be feasible with a simple NN classifier or feature-based approaches. This technique changes the artist style itself to something the AI would see differently in the latent space, yet, visually perceived as the same image. So if you’re changing to a different style the AI has learned, it’s fair to assume it will be realistic and coherent. Although maaaaaaaybe you could detect poisoned images with some dark magic tho, get the targeted AI then analyze the latent space to see if the image has been tampered with

On the other hand, I think if you build more robust features and just scale the data this problems might go away with more regularization in the network. Plus, it assumes you have the target of one AI generation tool, there are a dozen of these, and if someone trains with a few more images in a cluster, that’s it, you shifted the features and the poisoned images are invalid

V H@lemmy.stad.social · 1 year ago

Trying to detect poisoned images is the wrong approach. Include them in the training set and the training process itself will eventually correct for it.

I think if you build more robust features

Diffusion approaches etc. do not involve any conscious “building” of features in the first place. The features are trained by training the net to match images with text features correctly, and then “just” repeatedly predict how to denoise an image to get closer to a match with the text features. If the input includes poisoned images, so what? It’s no different than e.g. compression artifacts, or noise.

These tools all try to counter models trained without images using them in the training set with at most fine-tuning, but all they show is that models trained without having seen many images using that particular tool will struggle.

But in reality, the massive problem with this is that we’d expect any such tool that becomes widespread to be self-defeating, in that they become a source for images that will work their way into the models at a sufficient volume that the model will learn them. In doing so they will make the models more robust against noise and artifacts, and so make the job harder for the next generation of these tools.

In other words, these tools basically act like a manual adversarial training source, and in the long run the main benefit coming out of them will be that they’ll prod and probe at failure modes of the models and help remove them.

RubberElectrons@lemmy.world · 1 year ago

Just to start with, not very experienced with neural networks at all beyond messing with openCV for my graduation project.

Anyway, that these countermeasures expose “failure modes” in the training isn’t a great reason to stop doing this, e.g. scammers come up with a new technique, we collectively respond with our own countermeasures. From what I understand, the training process itself has a vulnerability to poisoned inputs just because of how long it takes to train on multiple variations of the same datasets.

If the network feedbacks itself, then cool! It has developed its own style, which is fine. The goal is to stop people from outright copying existing artists style.

V H@lemmy.stad.social · 1 year ago

It doesn’t need to “develop its own style”. That’s the point. The more examples of these adversarial images are in the training set, the better it will learn to disregard the adversarial modifications, and still learn the same style. As much as you might want to stop it from learning a given style, as long as the style can be seen, it can be copied - both by humans and AI’s.

RubberElectrons@lemmy.world · 1 year ago

There’s a lot of interesting detail to your side of the discussion I may not yet have the knowledge of. How does the eye see? We find edges, gradients, repeating patterns which become textures, etc etc… But our systems can be misdirected, see the blue/yellow dress for example. NNsbhave the luxury of being rapidly iterated I guess, compared to our lifespans.

I’m asking questions I don’t know answers to here: if the only source of input data for a network is subtly corrupted, won’t that guarantee corrupted output as well? I don’t see how one can “train out” the corruption which misdirects the network without access to some pristine data.

Don’t get me wrong, I’m not naive enough to believe this is foolproof, but I do want to understand why this technique doesn’t actually work, and by extension better understand how training a nn actually works.

V H@lemmy.stad.social · 1 year ago

Quick iteration is definitely the big thing. (The eye is fun because it’s so “badly designed” - we’re stuck in a local maxima that just happens to be “good enough” for us to not overcome the big glaring problems)

And yes, if all the inputs are corrupted, the output will likely be too. But 1) they won’t all be, and as long as there’s a good mix that will “teach” the network over time that the difference between a “corrupted cat” and an “uncorrupted cat” are irrelevant, because both will have most of the same labels associated with them. 2) these tools work by introducing corruption that humans aren’t meant to notice, so if the output has the same kind of corruption it doesn’t matter. It only matters to the extent the network “miscorrupts” the output in ways we do notice enough so that it becomes a cost drag on training to train it out.

But you can improve on that pretty much with feedback: Train a small network to recognize corruption, and then feed corrupted images back in as negative examples to teach it that those specific things are particularly bad.

Picking up and labelling small sample sets of types of corruption humans will notice is pretty much the worst case realistic effect these tools will end up having. But each such countermeasure will contribute to training sets that make further corruption progressively harder. Ultimately these tools are strictly limited because they can’t introduce anything that makes the images uglier to humans, and so you “just” need to teach the models more about the limits of human vision, and in the long run that will benefit the models in any case.

barsoap@lemm.ee · edit-2 2 months ago

Removed by mod

nandeEbisu@lemmy.world · 1 year ago

Haven’t read the paper so not sure about the specifics, but if it relies on subtle changes, would rounding color values or down sampling the image blur that noise away?

RubberElectrons@lemmy.world · edit-2 1 year ago

Wondering the same thing. Slight loss of detail but still successfully gets the gist of the original data.

For that matter, how does the poisoning hold up against regular old jpg compression?

Eta: read the paper, they account for this in section 7. It seems pretty robust on paper, by the time you’ve smoothed out the perturbed pixels, youve also smoothed out the image to where the end result is a bit of a murky mess.