Since ChatGPT burst onto the scene nearly a year ago, the generative AI era has kicked into high gear, but so too has the opposition.
A number of artists, entertainers, performers and even record labels have filed lawsuits against AI companies, some against ChatGPT maker OpenAI, based on the “secret sauce” behind all these new tools: training data. That is, these AI models would not work without accessing large amounts of multimedia and learning from it, including written material and images produced by artists who had no prior knowledge of, and were given no chance to oppose, the use of their work to train new commercial AI products.
Many of these AI model training datasets include material scraped from the web, a practice that artists by and large supported when it was used to index their material for search results, but which many now oppose because it enables the creation of competing work through AI.
But even without filing lawsuits, artists have a chance to fight back against AI using tech. MIT Technology Review got an exclusive look at a new open source tool still in development called Nightshade, which artists can apply to their imagery before uploading it to the web. It alters pixels in a way that is invisible to the human eye but “poisons” the art for any AI models seeking to train on it.
Where Nightshade came from
Nightshade was developed by University of Chicago researchers under computer science professor Ben Zhao and will be added as an optional setting to their prior product Glaze, another online tool that can cloak digital artwork and alter its pixels to confuse AI models about its style.
In the case of Nightshade, the counterattack for artists against AI goes a bit further: it causes AI models to learn the wrong names of the objects and scenery they are looking at.
For example, the researchers poisoned images of dogs to include information in the pixels that made them appear to an AI model as cats.
After sampling and learning from just 50 poisoned image samples, the AI began generating images of dogs with strange legs and unsettling appearances.
After 100 poison samples, it reliably generated a cat when asked by a user for a dog. After 300, any request for a dog returned a near perfect looking cat.
The researchers used Stable Diffusion, an open source text-to-image generation model, to test Nightshade and obtain the aforementioned results.
Thanks to the nature of the way generative AI models work — by grouping conceptually similar words and ideas into spatial clusters known as “embeddings” — Nightshade also managed to trick Stable Diffusion into returning cats when prompted with the words “husky,” “puppy” and “wolf.”
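The spillover to related words can be pictured with a toy example. The sketch below is only an illustration of the general idea of embeddings, not the researchers' code or Stable Diffusion's actual text encoder: the three-dimensional vectors, the `TOY_EMBEDDINGS` table, the `neighbors` helper and the 0.9 threshold are all made up for this example. It measures how close each word sits to “dog” using cosine similarity.

```python
import math

# Hypothetical, hand-picked 3-dimensional embeddings for illustration only.
# Real text encoders learn vectors with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "dog":   (0.90, 0.10, 0.05),
    "puppy": (0.85, 0.20, 0.05),
    "husky": (0.88, 0.05, 0.15),
    "wolf":  (0.80, 0.05, 0.30),
    "cat":   (0.10, 0.90, 0.05),
    "car":   (0.05, 0.05, 0.95),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def neighbors(word, threshold=0.9):
    """Return the other words whose similarity to `word` meets the threshold,
    most similar first. The threshold is an arbitrary choice for this sketch."""
    target = TOY_EMBEDDINGS[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in TOY_EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, score in scored if score >= threshold]


if __name__ == "__main__":
    print(neighbors("dog"))
```

With these made-up vectors, “puppy,” “husky” and “wolf” all land close to “dog,” while “cat” and “car” fall well below the threshold. That closeness is the intuition behind why corrupting a model's notion of “dog” can also affect prompts for its neighbors.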
Moreover, Nightshade’s data poisoning technique is difficult to defend against, as it requires AI model developers to weed out any images that contain poisoned pixels, which are by design not obvious to the human eye and may be difficult even for software data scraping tools to detect.
Any poisoned images that were already ingested for an AI training dataset would also need to be detected and removed. If an AI model were already trained on them, it would likely need to be re-trained.
While the researchers acknowledge their work could be used for malicious purposes, their “hope is that it will help tip the power balance back from AI companies towards artists, by creating a powerful deterrent against disrespecting artists’ copyright and intellectual property,” according to the MIT Tech Review article on their work.
Hours after MIT Tech Review published its article, the Glaze project from Zhao’s team at the University of Chicago posted a thread of short messages on the social platform X (formerly Twitter) explaining more about the impetus for Nightshade and how it works. The “power asymmetry between AI companies and content owners is ridiculous,” they posted.
The researchers have submitted a paper on Nightshade for peer review to the computer security conference USENIX, according to the report.