Fuck that guy first of all.
What makes me think is, what about all that cartoon porn showing cartoon kids? What about hentai showing younger kids? What’s the difference if all are fake and being distributed online as well?
Not defending him.
I think there’s certainly an argument here. What if the hentai was more lifelike? What if the AI stuff was less realistic? Where’s the line?
At least in the US, courts have been pretty shitty at defining things like “obscenity”. This AI stuff might force them to delineate more clearly.
What if someone draws their own CSAM and they’re terrible at drawing but it’s still recognizable as CSAM?
Ethically is one question, but the law is written such that it’s pretty narrowly covering only photograph-style visual depictions that are virtually indistinguishable from an actual child engaged in explicit conduct in the view of a reasonable person that is also lacking in any other artistic or cultural significance.
Or in short: if it looks like an actual image of actual children being actually explicit, then it’s illegal. Makes sense.
So it’s all good as long as they have elf ears or that counts as realistic too?
Two things:
- please don’t generate child like pornography. Legal or not it’s disturbing and gross to even think about.
- Yes, per the law it must be “virtually indistinguishable”: “the term ‘indistinguishable’ used with respect to a depiction, means virtually indistinguishable, in that the depiction is such that an ordinary person viewing the depiction would conclude that the depiction is of an actual minor engaged in sexually explicit conduct. This definition does not apply to depictions that are drawings, cartoons, sculptures, or paintings depicting minors or adults.” If it looks like a “real” elf and not a child wearing an elf costume it would be fine.
So long as an ordinary person would know that it’s not a real child being abused, or a real child being depicted (placing a real child’s face on a compromising photo), it’s protected, albeit extremely unpleasant, speech.
While I think Hentai showing that stuff is disgusting, AI is worse because you need to get the training material from somewhere, so it’s far from victimless. Edit: I just learned that it does not have to be in the dataset, though there should be regulations that force the companies to open source the data set.
It’s not what’s used by all of them, but it’s pretty popular.
One thing to consider, if this turned out to be accepted, it would make it much harder to prosecute actual csam, they could claim “ai generated” for actual images
I get this position, truly, but I struggle to reconcile it with the feeling that artwork of something and photos of it aren’t equal. In a binary way they are, but with more precision they’re pretty far apart. But I’m not arguing against it, I’m just not super clear how I feel about it yet.
I’m a professional artist and have no issue banning ai generated CSAM. People can call it self expression if they want, but that doesn’t change the real world consequences of it.
Allowing ai generated CSAM basically creates camouflage for real CSAM. As ai gets more advanced it will become harder to tell the difference. The scum making real CSAM will be emboldened to make even more because they can hide it amongst the increasing amounts of ai generated versions, or simply tag it as AI generated. Now authorities will have to sift through all of it trying to decipher what’s artificial and what isn’t.
Identifying, tracing, and convicting child abusers will become even more difficult as more and more of that material is generated and uploaded to various sites with real CSAM mixed in.
Even with hyper realistic paintings you can still tell it’s a painting. Anime loli stuff can never be mistaken for real CSAM. Do I find that sort of art distasteful? Yep. But it’s not creating an environment where real abusers can distribute CSAM and have a higher possibility of getting away with it.
I guess my question is, why would anyone continue to “consume” – or create – real csam? If fake and real are both illegal, but one involves minimal risk and 0 children, the only reason to create real csam is for the cruelty – and while I’m sure there’s a market for that, it’s got to be a much smaller market. My guess is the vast majority of “consumers” of this content would opt for the fake stuff if it took some of the risk off the table.
I can’t imagine a world where we didn’t ban ai generated csam, like, imagine being a politician and explaining that policy to your constituents. It’s just not happening. And i get the core point of that kind of legislation – the whole concept of csam needs the aura of prosecution to keep it from being normalized – and normalization would embolden worse crimes. But imagine if ai made real csam too much trouble to produce.
AI generated csam could put real csam out of business. If possession of fake csam had a lesser penalty than the real thing, the real stuff would be much harder to share, much less monetize. I don’t think we have the data to confirm this but my guess is that most pedophiles aren’t sociopaths and recognize their desires are wrong, and if you gave them a way to deal with it that didn’t actually hurt chicken, that would be huge. And you could seriously throw the book at anyone still going after the real thing when ai content exists.
Obviously that was supposed to be children not chicken but my phone preferred chicken and I’m leaving it.
I try to think about it this way. Simulated rape porn exists, and yet terrible people still upload actual recordings of rapes to porn sites. And despite the copious amounts of the fake stuff available all over the internet… rape statistics haven’t gone down and there’s still sexual assaults happening.
I don’t think porn causes rape btw, but I don’t think it prevents it either. It’s the same with CSAM.
Criminally horrible people are going to be horrible.
So long as the generation is without actual model examples that are actual minors there’s nothing technically illegal about having sexual material of what appears to be a child. They would then have a mens rea question and a content question: what actually defines, in a visual sense, a child? Could those same things equally define a person of smaller stature? And finally, could someone like tiny texie be charged for producing csam, as she by all appearance, out of context, looks to be a child?
The problem is that the only way to train an AI model is on real images, so the model can’t exist without crimes and suffering having been committed.
This isn’t true. AI can generate tan people if you show them the color tan and a pale person – or green people or purple people. That’s all ai does, whether it’s image or text generation – it can create things it hasn’t seen by smooshing together things it has seen.
And this is proven by reality: ai CAN generate csam, but it’s trained on that huge image database, which is constantly scanned for illegal content.
Real images that don’t have to be of csam but rather of children, it could theoretically train anything sexual with legal sexual content and let the ai connect the dots.
It is illegal in Canada to have sexual depictions of a child whether it’s a real image or you’ve just sat down and drawn it yourself. The rationale being that behavior escalates, and looking at images leads to wanting more.
It borders on thought crime which I feel kind of high about but only pedophiles suffer which I feel great about. There’s no legitimate reason to have a sexualized image of a child, whether computer generated, hand drawn, or whatever.
This article isn’t about Canada homeboy.
Also that theory is not provable and never will be, morality crime is thought crime and thought crime is horseshit. We criminalize criminal acts not criminal thoughts.
Similarly, you didn’t actually offer a counterpoint to any of my points.
It’s not a difficult test. If a person can’t reasonably distinguish it from an actual child, then it’s CSAM.
Just to play devil’s advocate:
What about hentai where little girls get fondled by tentacles? (Please please please don’t make this be my most up voted post)
(Please please please don’t make this be my most up voted post)
I can downvote to prevent that if you like
Yeah, no. The commenter has stated actual child, not cartoon one. It is a different discussion entirely, and a good one too. Because artwork is a part of freedom of expression. An artwork CAN be made without hurting anyone or abusing anyone. We fully know that a human has creative capabilities to come up with something without having those actual something exist beforehand. It implies that humans can come up with CSAM without ever having seen a CSAM.
And yet, it is still actually illegal in every state. CSAM of any kind in any medium is legally identical. Hand drawn stick figures with ages written under them is enough for some judges/prosecutors.
Honestly, I am of the firm belief that the FBI should set up a portal that provides user account bound access to their seized materials. This may seem extreme and abhorrent, but it provides MANY benefits.
- They are able to eliminate the black market for it by providing free, legal access to already existing materials, no more children will be harmed in the production of “new materials”.
- They can mandate that accounts are only able to be made by those actively pursuing mental health treatments for their mental illness. It is a mental illness long before it is a crime.
- They are able to monitor who is accessing and from where, and are able to coordinate efforts with mental health providers to give better treatment.
- They can compile statistical data on the prevailing patterns of access to get a better analytical understanding of how those with the mental illness behave so they can better police those who still utilize extra-legal avenues.
Always keep in mind that this is a mental illness. Often times it is rooted in the person’s own traumatic past. Many were themselves victims of sexual abuse as children and are as much victims as the children they abuse. I am not, in ANY way, absolving them of the harm that they have done and they absolutely should repent for it. What I am attempting to articulate is that we need to, as a society, avoid vilifying them into boogy-people so we can justify hate and violence. They are people, they are mentally ill, they can be treated, and they can be healthy. It is no different than something like BPD, Malignant Narcissism, or Munchausen by Proxy. All can do real harm, all should face consequences of their harm, but those three are all so normalized at this point that unless the abuse results in death, most people will handwave the actions and push for treatment. Now I feel we have gotten too lax on these (and others) and are far too harsh on others. All mental illnesses deserve ardent and effective treatment.
Nay, I just replied to you in the context of the commenter. The other commenter stated about real life children so your point about hentai is irrelevant to him. I do know the legal definition of CSAM is the end result and not the act. And hence, why I stated that yours is a different discussion entirely.
Edit: Sorry I read it again and I think I didn’t get my point across very well. I think your point about artwork falls into the debate about the definition of CSAM. Why? Because the word abuse implies an abusive act is being done. But the current definition states that what matters is the end result only. This poses a problem in my opinion because it touches on your freedom of expression. By the current definition, art has its limit.
Yeah but then it gets very messy and complicated fast. What about photo perfect AI pornography of minors? When and where do you draw the line?
What he probably means is that for a “photo”, an actual act of photography must be performed. While “artwork” can be fully digital. Now, legal definition aside, the two acts are indeed different even if the resulting “image” is a bit-by-bit equivalent. A computer could just output something akin to a photograph but no actual act of photography has taken place. I said the legal definition aside because I know the legal definition only looks at the resulting image. Just trying to convey the commenter words better.
Edit to clarify a few things.
This would also outlaw “teen” porn as they are explicitly trying to look more childlike as well as models that only appear to be minors.
I get the reason people think it’s a good thing but all censorship has to be narrowly tailored to content lest it be too vague or overly broad.
And nothing was lost…
But in seriousness, as you said they are models who are in the industry, verified, etc. It’s not impossible to have a white-list of actors, and if anything there should be more scrutiny on the unknown “actresses” portraying teenagers…
Except jobs dude, you may not like their work but it’s work. That law ignores verified age, that’s a not insignificant part of my point…
I hate the future
that was just the present
I find it interesting that the relabeling of CP to CSAM weakens their argument here. “CP generated by AI is still CP” makes sense, but if there’s no abusee, it’s just CSM. Makes me wonder if they would have not rebranded if they knew about the proliferation of AI pornography.
The problem is that it makes distributing real CSAM easier. If a government declares “these types of images are okay if they’re fake”, you’ve given plausible deniability to real CSAM distributors who can now claim that the material is AI generated, placing the burden on the legal system to prove it to the contrary. The end result will be a lot of real material flying under the radar because of weak evidence, and continued abuse of children.
Better to just blanket ban the entire concept and save us all the trouble, in my opinion. Back before it was so easy to generate photorealistic images, it was easier to overlook victimless CP because illustrations are easy to tell apart from reality, but times have changed, and so should the laws.
Not necessarily. There’s been a lot of advances in watermarking AI outputs.
As well, there’s the opposite argument.
Right now, pedophile rings have very high price points to access CSAM or require users to upload original CSAM content, adding a significant motivator to actually harm children.
The same way rule 34 artists were very upset with AI being able to create what they were getting commissions to create, AI generated CSAM would be a significant dilution of the market.
Is the average user really going to risk prison, pay a huge amount of money or harm a child with an even greater prison risk when effectively identical material is available for free?
Pretty much overnight the CSAM dark markets would lose the vast majority of their market value and the only remaining offerings would be ones that could demonstrate they weren’t artificial to justify the higher price point, which would undermine the notion of plausible deniability.
Legalization of AI generated CSAM would decimate the existing CSAM markets.
That said, the real question that needs to be answered from a social responsibility perspective is what net effect CSAM access by pedophiles has on their proclivity to offend. If there’s a negative effect then it’s an open and shut case that it should be legalized. If it’s a positive effect then we should probably keep it very much illegal, even if that continues to enable dark markets for the real thing.
Not necessarily. There’s been a lot of advances in watermarking AI outputs.
That presumes that the image generation is being done by some corporation or government entity that adds the watermarks to AI outputs and doesn’t add them to non-AI outputs. I’m not thrilled that AI of this sort exists at all, but given that it does, I’d rather not have it controlled by such entities. We’re heading towards a world where we can all run that stuff on our own computers and control the watermarks ourselves. Is that good or bad? Probably bad, but having it under the exclusive control of megacorps has to be even worse.
How about any photo realistic image without a watermark is illegal? And the watermark kind of has to be traced back to author so you can’t just add it to real CP?
If you can generate the watermarks, you can put them on non-AI images.
Well the watermark would be a kind of signature that leads back to a registered artist.
I think it makes sense to enforce this for all AI art, basically label it in a way that can be traced back to who produced it.
And if you don’t want people to know you produced it, then you probably shouldn’t share it
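To make the "signature that leads back to an author" idea a bit more concrete, here is a rough sketch and nothing more. It assumes each author holds a secret key (the key and the image bytes below are made up), and it uses an HMAC from Python's standard library as a stand-in; a real scheme would more likely use public-key signatures so anyone could verify without knowing the secret.

```python
import hashlib
import hmac

def sign_image(image_bytes, author_key):
    # Tag that depends on both the exact image bytes and the author's key.
    return hmac.new(author_key, image_bytes, hashlib.sha256).hexdigest()

def verify_image(image_bytes, author_key, tag):
    # Recompute the tag and compare in constant time.
    expected = sign_image(image_bytes, author_key)
    return hmac.compare_digest(expected, tag)

# Hypothetical key and placeholder image contents, for illustration only.
author_key = b"hypothetical-registered-artist-key"
generated = b"bytes of an AI-generated image"
photo = b"bytes of a real photograph"

tag = sign_image(generated, author_key)
ok = verify_image(generated, author_key, tag)                # tag matches these bytes
tampered = verify_image(generated + b"!", author_key, tag)   # any change breaks it

# The key holder can tag any bytes at all, including a real photo.
photo_tag = sign_image(photo, author_key)
photo_ok = verify_image(photo, author_key, photo_tag)
```

Note what this does and doesn't show: the tag ties an image to whoever held the key, but it says nothing about how the image was made, which is the objection raised above about putting watermarks on non-AI images.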
Sorry but the concept of a “registered artist” sounds dystopian.
Is the average user really going to risk prison, pay a huge amount of money or harm a child with an even greater prison risk when effectively identical material is available for free?
Average users aren’t pedophiles and it would appear that yes they would considering he did exactly that. He had access to tools that generated the material for free, which he then used to entice boys.
I agree, just the linguistics are interesting.
Better to just blanket ban the entire concept and save us all the trouble, in my opinion.
That’s the issue though, blindly banning things that can be victimless crimes never ends, like prohibition.
Well, you don’t hear many people decrying the places that already have. Canada, many US states, and parts of Europe too have outlawed sexual imagery of children, real or fake.
I am just proposing that that should be the standard approach going forward, for the sole fact that the fake stuff is identical to the real stuff and real stuff can be used to make more convincing “fake” stuff.
Isn’t Canada’s law based on age and not if they “look like children”, so all they have to say is that the subject isn’t human and is over 18 years of age?
My entire point was that things like this become a game of whack-a-mole.
I don’t think that’s a good standard, reminds me of 0 tolerance policies and war on drugs.
placing the burden on the legal system to prove it to the contrary.
That’s how it should be. Everyone is innocent until proven otherwise.
Right, but what I am suggesting is that laws should be worded to criminalize any sexualized depiction of children, not just ones with a real victim. It is no longer as simple to prove a photograph or video is actual CSAM with a real victim, making it easier for real abuse to avoid detection.
This same “think about the children” argument is used when advocating for stuff such as banning encryption as well, which in its current form enables the easy spreading of such content, AI generated or not. I do not agree with that. It’s a slippery slope despite the good intentions. We’re not criminalizing fictional depictions of violence either. I don’t see how this is any different. I don’t care what people are jerking off to as long as they’re not hurting anyone and I don’t think you should either. Banning it hasn’t gotten rid of actual CSAM content and it sure won’t work for AI generated stuff either. No one benefits from the police running after people creating/sharing fictional content.
I think you’re painting a false equivalency. This isn’t about surveillance or incitement or any other pre-crime hypotheticals, but simply adjusting what material is considered infringing in light of new developments which can prevent justice from being carried out on actual cases of abuse.
How do you prove what is fictional versus what is real? Unless there is some way to determine with near 100% certainty that a given image or video is AI generated and not real, or even that an AI generated image wasn’t trained on real images of abuse, you invite scenarios where real images of abuse get passed off as “fictional content” and make it easier for predators to victimize more children.
Have to agree. Because I have no clue what CSAM is. My first glance at the title made me think it was CSPAN (the TV channel)… So CP is a better identifier, as I at least recognize the initialism.
If we could stop turning everything, and especially important things, into acronyms and initialisms that’d be great.
Who’s even in charge of that lol
A generative AI could not generate CSAM without access to CSAM training data. Abuse was a necessary step in the generation.
oh man, i love the future, we haven’t solved world hunger, or reduced carbon emissions to 0, and we are on the brink of a world war, but now we have AI’s that can generate CSAM and fake footage on the fly 💀
Technically we’ve solved world hunger. We’ve just not fixed it, as the greedy fucks who hoard most of the resources of this world don’t see immediate capital gains from just helping people.
Pretty much the only real problem is billionaires being in control.
True that. We have the means to fix so many problems, we just have a very very very small few that reeeeally don’t like to do anything good with their money, and instead choose to hoard it, at the expense of everyone else.
Oh cmon they don’t hoard the money. They use it to pay each other/politicians to make sure the status quo remains.
They hoard rights and powers, usually. The right to control property and capital far in excess of reasonable private comfort, the right to a share of a company’s profit for using that property and capital, the right to influence its course and all the powers deriving from that.
Honestly not as bad as I would have thought it would be by now with fake propaganda videos, but the quality isn’t there yet I suppose.
Can’t generate Abuse Material without Abuse. Generative AI does not need any indecent training to be able to produce indecent material.
But it is a nice story to shock and scare many people so i guess the goal is reached.
Quick things to note.
One, yes, some models were trained on CSAM. In AI you’ll have checkpoints in a model. As a model learns new things, you have a new checkpoint. SD1.5 was the base model used in this. SD1.5 itself was not trained on any CSAM, but people have given additional training to SD1.5 to create new checkpoints that have CSAM baked in. Likely, this is what this person was using.
Two, yes, you can get something out of a model that was never in the model to begin with. It’s complicated, but a way to think about it is, a program draws raw pixels to the screen. Your GPU applies some math to smooth that out. That math adds additional information that the program never distinctly pushed to your screen.
Models have tensors which long story short, is a way to express an average way pixels should land to arrive at some object. This is why you see six fingered people in AI art. There wasn’t any six fingered person fed into the model, what you are seeing the averaging of weights pushing pixels between two different relationships for the word “hand”. That averaging is adding new information in the expression of an additional finger.
I won’t deep dive into the maths of it. But there’s ways to coax new ways to average weights to arrive at new outcomes. The training part is what tells the relationship between A and C to be B’. But if we wanted D’ as the outcome, we could retrain the model to have C and E averaging OR we could use things called LoRAs (Low-Rank Adaptation) to nudge the outcome from B’ to D’. This doesn’t require us to retrain the model, we are just providing guidance on ways to average things that the model has already seen. Retraining on C and E to D’ is the route older models and checkpoints used to take, and it requires a lot of images. Taking the outcome B’ and putting a thumb on the scale to push it to D’ is an easier route that just requires a generalized teaching of how to skew the weights.
I know this is massively summarizing things and yeah I get it, it’s a bit hard to conceptualize how we can go from something like MSAA to generating CSAM. And yeah, I’m skipping over a lot of steps here. But at the end of the day, those tensors are just numbers that tell the program how to push pixels around given a word. You can maths those numbers to give results that the numbers weren’t originally arranged to do in the first place. AI models are not databases, they aren’t recalling pixel for pixel images they’ve seen before, they’re averaging out averages of averages.
I think this case will be a slam dunk because it’s highly likely this person’s model was an SD1.5 checkpoint that was trained on very bad things. But with the ability to change how the averaging itself works, rather than the source tensors in the model, you can teach new ways for a model to average weights to obtain results the model didn’t originally have, without any kind of source material to train the model. It’s like the difference between Spatial antialiasing and MSAA.
Shouldn’t the companies who have the CSAM face consequences for possession of it? Seems like a double standard.
The government should be shutting down the source material.
In the eyes of the law, intent does matter, as well as how it’s responded to.
For csam material, you have to knowingly possess it or have sought to possess it. The AI companies use a project that indexes everything on the Internet, like Google, but with publicly available free output.
They use this data via another project, https://laion.ai/ , which uses the data to find images with descriptions attached, do some tricks to validate that the descriptions make sense, and then publish a list of “location of the image, description of the image” pairs.
The AI companies use that list to grab the images and train an AI on them in conjunction with the description.
So, people at Stanford were doing research on the laion dataset when they found the instances of csam. The laion project pulled their datasets from being available while things were checked and new safeguards put in place.
The AI companies also pulled their models (if public) while the images were removed from the data set and new safeguards implemented.
Most of the csam images in the dataset were already gone by the time the AI companies would have attempted to access them, but some were not. A very obvious lack of intent to acquire the material, in fact a lack of awareness the material was possessed at all, transparency in response, taking steps to prevent further distribution, and taking action to prevent it from happening again both provide a defense against accusations and make anyone interested less likely to want to make those accusations.
On the other hand, the people who generated the images were knowingly doing so, which is a nono.
They wouldn’t be able to generate it had there been none in the training data, so I assume the labelling and verification systems you talk about aren’t very good.
That’s not accurate. The systems are designed to generate previously unseen concepts or images by combining known concepts.
It’s why it can give you an image of a pony using a hangglider, despite never having seen that. It knows what ponies look like, and it knows what hanggliding looks like, so it can find a way to put both into the image. Where it doesn’t know, it will make stuff up from what it does know, often requiring potentially very detailed user explanation to describe how a horse would fit in a hangglider, or that it shouldn’t have a little person sticking out of its back.
I think it would just create adults naked with children’s faces unless it actually had CSAM… Which it probably does have.
Again, that’s not how it works.
Could you hypothetically describe csam without describing an adult with a child’s head, or specifying that it’s a naked child?
That’s what a person trying to generate csam would need to do, because it doesn’t have those concepts.
If you just asked it directly, like I said “horse flying a hangglider” before, you would get what you describe because it’s using the only “naked” it knows.
You would need to specifically ask it to de-emphasize adult characteristics and emphasize child characteristics. That doesn’t mean that it was trained on that content.
For context from the article:
The DOJ alleged that evidence from his laptop showed that Anderegg “used extremely specific and explicit prompts to create these images,” including “specific ‘negative’ prompts—that is, prompts that direct the GenAI model on what not to include in generated content—to avoid creating images that depict adults.”
Removed by mod
Good
The cat’s out of the bag on this. It’s enforceable for now to try and ban it, maybe. Because the models are mostly online and intensive.
In 2028 though, when you can train your own model and generate your own local images without burning a server farm? This has to happen for ML to keep growing and catch on.
welp. Then there is infinite fake child porn. Because you cannot police every device and model.
Because of how tech companies have handled this technology, this is not an if scenario. This is guaranteed now.
Uhhh these types of images kinda already require local models…
I remember when they tried to do the same with CRISPR. Glad that didn’t take off and remained largely limited to the industry and academia. But then again, Wuhan …
I wanna know if this applies to copyrighted content as well. For example, if by any chance a whole ass book was outputted by a LLM, does the output retain the original copyright?
If it completely rewrites a book whose copyright is owned by a large corporation or publishing company in the US, they’ll probably take whatever company is responsible for it, if it’s a public LLM, behind the shed and shoot them to death with legal battles. So, I’m gonna assume yes.
I sure hope so. It is important because otherwise copyright will mean jackshit.
*Rant I truly hope politicians spend their time on more pressing issues than squabbling among themselves. Climate change, technological advancement that outpaces our legal framework, consumer protection. So much shit to do.
Is CSAM a commonly used term? I had to look it up.
It is
I read that it’s more accurate to say “child sexual abuse material” than child porn because it carries the message of just how bad the stuff is better than just calling it porn, and it sounds more professional.
And I suppose it’s also saying that the form it’s in doesn’t matter. Any type of material is the same.
It is amazing how Lemmy can usually be such a well informed audience but for some reason when it comes to AI people simply refuse to acknowledge that it was trained on CSAM https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
And don’t understand how generative AI combines existing concepts to synthesize images - it doesn’t have the ability to create novel concepts.
AI models don’t resynthesize their training data. They use their training data to determine parameters which enable them to predict a response to an input.
Consider a simple model (too simple to be called AI but really the underlying concepts are very similar) - a linear regression. In linear regression we produce a model which follows a straight line through the “middle” of our training data. We can then use this to predict values outside the range of the original data - albeit with less certainty about the likely error.
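If it helps, here is a tiny sketch of that regression point in plain Python. The data points are made up for illustration (roughly y = 2x + 1), and the helper names are mine, not from any library. The fitted line only ever saw x between 0 and 4, yet it will happily produce a value at x = 10.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Made-up training data covering only x in [0, 4].
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]

model = fit_line(train_x, train_y)
within_range = predict(model, 2.5)    # interpolation inside the training range
outside_range = predict(model, 10.0)  # extrapolation to an input it never saw
```

The model stores two numbers (slope and intercept), not the five training points, and the prediction at x = 10 comes from those parameters. How trustworthy that extrapolated value is depends on whether the straight-line assumption still holds out there, which the code cannot tell you.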
In the same way, an LLM can give answers to questions that were never asked in its training data - it’s not taking that data and shuffling it around, it’s synthesising an answer by predicting tokens. Also similarly, it does this less well the further outside the training data you go. Feed them the right gibberish and it doesn’t know how to respond. ChatGPT is very good at dealing with nonsense, but if you’ve ever worked with simpler LLMs you’ll know that typos can throw them off notably… They still respond OK, but things get weirder as they go.
Now it’s certainly true that (at least some) models were trained on CSAM, but it’s also definitely possible that a model that wasn’t could still produce sexual content featuring children. Its training set need only contain enough disparate elements for it to correctly predict what the prompt is asking for. For example, if the training set contained images of children it will “know” what children look like, and if it contains pornography it will “know” what pornography looks like - conceivably it could mix these two together to produce generated CSAM. It will probably look odd, if I had to guess? Like LLMs struggling with typos, and regression models being unreliable outside their training range, image generation of something totally outside the training set is going to be a bit weird, but it will still work.
None of this is to defend generating AI CSAM, to be clear, just to say that it is possible to generate things that a model hasn’t “seen”.
Okay for anyone who might be confused on how a model that’s not been trained on something can come up with something it wasn’t trained for, a rough example of this is antialiasing.
In the simplest of terms antialiasing looks at a vector over a particular grid, sees what percentage of each grid cell it is covering, and then applies that percentage to shade the image and reduce the jaggies.
There’s no information to do this in the vector itself, it’s the math that is what is giving the extra information. We’re creating information from a source that did not originally have it. Now, yeah this is really simple approach and it might have you go “well technically we didn’t create any new information”.
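If it helps, the coverage idea can be sketched in a few lines of Python. This is only a toy, assuming a made-up shape (everything under the line y = 0.5 * x) and estimating coverage by sampling a 4x4 grid of points inside each pixel; real renderers compute coverage in more sophisticated ways, and the function names here are my own.

```python
def coverage(px, py, inside, samples=4):
    """Estimate the fraction of pixel (px, py) covered by a shape.

    The pixel spans [px, px + 1) x [py, py + 1). We test a samples x samples
    grid of points inside it and count how many fall inside the shape.
    """
    hits = 0
    for i in range(samples):
        for j in range(samples):
            x = px + (i + 0.5) / samples
            y = py + (j + 0.5) / samples
            if inside(x, y):
                hits += 1
    return hits / (samples * samples)

def below_edge(x, y):
    # Hypothetical shape for illustration: the region under y = 0.5 * x.
    return y < 0.5 * x

def render(width, height, inside):
    """Return rows of shade values in [0, 1], one per pixel."""
    return [[coverage(px, py, inside) for px in range(width)]
            for py in range(height)]

image = render(4, 2, below_edge)
for row in image:
    print(" ".join(f"{shade:.2f}" for shade in row))
```

The edge itself only says "inside" or "outside" at any given point. The in-between shades (0.25, 0.75 and so on) come from the math applied to the pixel grid, not from anything stored in the edge.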
At the end of the day, a tensor is a bunch of numbers that give weights to how pixels should arrange themselves on the canvas. We have weights that show us how to fall pixels to an adult. We have weights that show us how to fall pixels to children. We have weights that show us how to fall pixels to a nude adult. There’s ways to adapt the lower order ranking of weights to find new approximations. I mean, that’s literally what LoRAs do. I mean that’s literally their name, Low-Rank Adaptation. As you train on this new novel approach, you can wrap that into a textual inversion. That’s what that does, it allows an ontological approach to particular weights within a model.
Another way to think of this. Six finger people in AI art. I assure you that no model was fed six fingered subjects, so where do they come from? The answer is that the six finger person is a complex “averaging” of the tensors that make up the model’s weights. We’re getting new information where there originally was none.
We have to remember that these models ARE NOT databases. They are just multidimensional weights that tell pixels from a random seed where to go to in the next step in the diffusion process. If you text2image “hand” then there’s a set of weights that push pixels around to form the average value of a hand. What it settles into could be a four fingered hand, five fingers, or six fingers, depends on the seed and how hard the diffuser should follow the guidance scale for that particular prompt’s weight. But it’s distinctly not recalling pixel for pixel some image it has seen earlier. It just has a bunch of averages of where pixels should go if someone says hand.
You can generate something new from the average of complex tensors. You can put your thumb on the scale for some of those weights, give new maths to find new averages, and then when it’s getting close to the target you’re after use a textual inversion to give a label to this “new” average you’ve discovered in the weights.
Antialiasing doesn’t feel like new information is being added, but it is. That’s how we can take the actual pixels being pushed out by a program and turn it into a smooth line that the program did not distinctly produce. I get that it feels like a stretch to go from antialiasing to generating completely novel information. But it’s just numbers driving where pixels get moved to, it’s maths, there’s not really a lot of magic in these things. And given enough energy, anyone can push numbers to do things they weren’t supposed to do in the first place.
The way models that come from folks who need their models to be on the up and up is to ensure that particular averages don’t happen. Like say we want to avoid outcome B’, but you can average A and C to arrive at B’. Then what you need is to add a negative weight to the formula. This is basically training A and C to average to something like R’ that’s really far from the point that we want to avoid. But like any number, if we know the outcome is R’ for an average of A and C, we can add low rank weights that don’t require new layers within the model. We can just say, anything with R’ needs -P’ weight, now because of averages we could land on C’ but we could also land on A’ or B’ our target. We don’t need to recalculate the approximation of the weights that A and C give R’ within the model.
Not all models use the same training sets, and not all future models would either.
Generating images of humans of different ages doesn’t require having images of that type for humans of all ages.
Like, no one is arguing your link. Some models definitely used training data with that, but your claim that the type of image discussed is “novel” simply isn’t accurate to how these models can combine concepts.
And don’t understand how generative AI combines existing concepts to synthesize images - it doesn’t have the ability to create novel concepts.
Imagine someone asks you to shoop up some pr0n showing Donald Duck and Darth Vader. You’ve probably never seen that combination in your “training set” (past experience) but it doesn’t exactly take creating novel concepts to fulfill the request. It’s just combining existing ones. Web search on “how stable diffusion works” finds some promising looking articles. I read one a while back and found it understandable. Stable Diffusion was one of the first widely available programs of this kind, and the newer ones are mostly bigger and fancier versions of the same idea.
Of course idk what the big models out there are actually trained on (basically everything they can get, probably not checked too carefully) but just because some combination can be generated in the output doesn’t mean it must have existed in the input. You can test that yourself easily enough, by giving weird and random enough queries.
No, you’re quite right that the combination didn’t need to exist in the input for an output to be generated - this shit is so interesting because you can throw stuff like “A medieval castle but with Iranian architecture with a samurai standing on the ramparts” at it and get something neat out. I’ve leveraged AI image generation for visual D&D references and it’s excellent at combining comprehended concepts… but it can’t innovate a new thing - it excels at mixing things but it isn’t creative or novel. So I don’t disagree with anything you’ve said - but I’d reaffirm that it currently can make CSAM because it’s trained on CSAM and, in my opinion, it would be unable to generate CSAM (at least to the quality level that would decrease demand for CSAM among pedos) without having CSAM in the training set.
it currently can make CSAM because it’s trained on CSAM
That is a non sequitur. I don’t see any reason to believe such a cause and effect relationship. The claim is at least falsifiable in principle though. Remove whatever CSAM found its way into the training set, re-run the training to make a new model, and put the same queries in again. I think you are saying that the new model should be incapable of producing CSAM images, but I’m extremely skeptical, as your medieval castle example shows. If you’re now saying the quality of the images might be subtly different, that’s the no true Scotsman fallacy and I’m not impressed. Synthetic images in general look impressive but not exactly real. So I have no idea how realistic the stuff this person was arrested for was.
I think there are two arguments going on here, though
- It doesn’t need to be trained on that data to produce it
- It was actually trained on that data.
Most people arguing point 1 would be willing to concede point 2, especially since you linked evidence of it.
I think it’s impossible to produce CSAM without training data of CSAM (though this is just an opinion). Young people don’t look like adults when naked so I don’t think there’s any way an AI would hallucinate CSAM without some examples to train on.
In this hypothetical, the AI would be trained on fully clothed adults and children. As well as what many of those same adults look like unclothed. It might not get things completely right on its initial attempt, but with some minor prompting it should be able to get pretty close. That said, the AI will know the correct head size proportions from just the clothed datasets. It could probably even infer limb proportions from the clothed datasets as well.
It could definitely get head and limb proportions correct, but there are some pretty basic changes that happen with puberty that the AI would not be able to reverse engineer.
There are legit, non-CSAM types of images that would still make these changes apparent, though. Not every picture of a naked child is CSAM. Family photos from the beach, photos in biology textbooks, even comic-style illustrated children’s books will allow inferences about what real humans look like. So no, I don’t think that an image generation model has to be trained on any CSAM in order to be able to produce convincing CSAM.
This is a fair point - if we allow a model to be trained on non-sexualizing minor nudity it likely could sexualize those models without actually requiring sexualized minors to do so. I’m still not certain if that’s a good thing, but I do agree with you.
Yeah, it certainly still feels icky, especially since a lot of those materials in all likelihood will still have ended up in the model without the original photo subjects knowing about it or consenting. But that’s at least much better than having a model straight up trained on CSAM, and at least hypothetically, there is a way to make this process entirely “clean”.
This is the part of the conversation where I have to admit that you could be right, but I don’t know enough to say one way or the other. And since I have no plans to become a pediatrician, I don’t intend to go find out.
it was trained on CSAM
In that case, why haven’t the people who made the AI models been arrested?
Dunno, probably because they didn’t knowingly train it on CSAM - maybe because it’s difficult to prove what actually goes into neural network configuration so it’s unclear how strongly weighted it is… and lastly, maybe because this stuff is so cloaked in obscurity and proprietaryness that nobody is confident how such a case would go.
Then we should be able to charge AI (the developers more so) for the same disgusting crime, and shut AI down.
Camera-makers, too. And people who make pencils. Lock the whole lot up, the sickos.
Camera makers and pencil makers (and the users of those devices) aren’t making massive server farms that spy on every drop of information they can get ahold of.
If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.
Now when that’s the case, well where did the devs get the training data?.. 🤔
If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.
That’s not how generative AI works. It’s capable of creating images that include novel elements that weren’t in the training set.
Go ahead and ask one to generate a bonkers image description that doesn’t exist in its training data and there’s a good chance it’ll be able to make one for you. The classic example is an “avocado chair”, which an early image generator was able to produce many plausible images of despite only having been trained on images of avocados and chairs. It understood the two general concepts and was able to figure out how to meld them into a common depiction.
Yes, I’ve tried similar silly things. I’ve asked AI to render an image of Mr. Bean hugging Pennywise the clown. And it delivered, something randomly silly looking, but still not far off base.
But when it comes to inappropriate material, well the AI shouldn’t be able to generate any such thing in the first place, unless the developers have allowed it to train from inappropriate sources…
The trainers didn’t train the image generator on images of Mr. Bean hugging Pennywise, and yet it’s able to generate images of Mr. Bean hugging Pennywise. Yet you insist that it can’t generate inappropriate images without having been specifically trained on inappropriate images? Why is that suddenly different?
The trainers taught it what Mr. Bean looks like and what Pennywise looks like - it took those concepts and combined them to create your image. To make CSAM it was, unfortunately, trained on CSAM https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model’s capabilities.
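For anyone who wants to check that percentage, here's the arithmetic as a quick Python sketch. The two counts are the ones quoted above; the variable names are just mine.

```python
suspected_images = 3_226          # suspected images, as quoted above
dataset_size = 5_800_000_000      # 5.8 billion images in the dataset

share = suspected_images / dataset_size
share_percent = share * 100

print(f"{share_percent:.5f}%")                      # rounded to five decimal places
print(f"about 1 in {dataset_size // suspected_images:,} images")
```

So the "about 0.00006%" figure holds up as a rounded value; unrounded it is closer to 0.0000556%.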
Who is responsible then? Cuz the devs basically gotta let the AI go to town on many websites and documents for any sort of training set.
So you mean to say, you can’t blame the developers, because they just made a tool (one that scrapes data from everywhere possible), can’t blame the tool (don’t mind that AI is scraping all your data), and can’t blame the end users, because some dirty minded people search or post inappropriate things…?
So where’s the blame go?
First, you need to figure out exactly what it is that the “blame” is for.
If the problem is the abuse of children, well, none of that actually happened in this case so there’s no blame to begin with.
If the problem is possession of CSAM, then that’s on the guy who generated them since they didn’t exist at any point before then. The trainers wouldn’t have needed to have any of that in the training set so if you want to blame them you’re going to need to do a completely separate investigation into that, the ability of the AI to generate images like that doesn’t prove anything.
If the problem is the creation of CSAM, then again, it’s the guy who generated them.
If it’s the provision of general-purpose art tools that were later used to create CSAM, then sure, the AI trainers are in trouble. As are the camera makers and the pencil makers, as I mentioned sarcastically in my first comment.
…no
That’d be like outlawing hammers because someone figured out they make a great murder weapon.
Just because you can use a tool for crime, doesn’t mean that tool was designed/intended for crime.
That’d be like outlawing hammers because someone figured out they make a great murder weapon.
Just because you can use a tool for crime, doesn’t mean that tool was designed/intended for crime.
Not exactly. This would be more akin to a company that will 3D print metal parts and assemble them for you. You use this service and have them create and assemble a gun for you. Then you use that weapon in a violent crime. Should the company have known better that you were having them create an illegal weapon on your behalf?
The person who was charged was using Stable Diffusion to generate the images on their own computer, entirely with their own resources. So it’s akin to a company that sells 3D printers selling a printer to someone, who then uses it to build a gun.
Sadly that’s what most of the gun laws are designed around. Book banning and anti-abortion laws both limit tools because of what a small minority choose to do with the tool.
AI image generation shouldn’t be considered in obscenity laws. His distribution of pornography to minors should be the issue, because not everyone stuck with that disease should be deprived of tools that can be used to keep them away from hurting others.
Using AI images to increase charges should be wrong. A pedophile contacting and distributing pornography to children should be all that it takes to charge a person. This will just set up a new precedent that is beyond the scope of the judiciary.
It would be more like outlawing ivory grand pianos because they require dead elephants to make - the AI models under question here were trained on abuse.
A person (the arrested software engineer from the article) acquired a tool (a copy of Stable Diffusion, available on github) and used it to commit crime (trained it to generate CSAM + used it to generate CSAM).
That has nothing to do with the developer of the AI, and everything to do with the person using it. (hence the arrest…)
I stand by my analogy.
Unfortunately the developer trained it on some CSAM which I think means they’re not free of guilt - we really need to rebuild these models from the ground up to be free of that taint.
Reading that article:
Given it’s a public dataset not owned or maintained by the developers of Stable Diffusion, I wouldn’t consider that their fault either.
I think it’s reasonable to expect a dataset like that should have had screening measures to prevent that kind of data being imported in the first place. It shouldn’t be on users (here meaning the devs of Stable Diffusion) of that data to ensure there’s no illegal content within the billions of images in a public dataset.
That’s a different story now that users have been informed of the content within this particular data, but I don’t think it should have been assumed to be their responsibility from the beginning.
Sounds to me it would be more like outlawing grand pianos because of all of the dead elephants - while some people are claiming that it is possible to make a grand piano without killing elephants.
There’s CSAM in the training set[1] used for these models so some elephants have been murdered to make this piano.
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model’s capabilities.
I know. So to confirm, you’re saying that you’re okay with AI generated CSAM as long as the training data for the model didn’t include any CSAM?
No, I’m not - I still have ethical objections and I don’t believe CSAM could be generated without some CSAM in the training set. I think it’s generally problematic to sexually fantasize about underage persons though I know that’s an extremely unpopular opinion here.
So why are you posting all over this thread about how CSAM was included in the training set if that is in your opinion ultimately irrelevant with regards to the topic of the post and discussion, the morality of using AI to generate CSAM?
That’s not the point. You don’t train a hammer from millions of user inputs.
You gotta ask, if the AI can produce inappropriate material, then where did the developers get the training data, and what exactly did they train those AI models for?
Do… Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?
Or are you arguing that we should be allowed to do what’s been done in the article? (arrest and charge the individual responsible for training their copy of an AI model to generate CSAM)
One, AI image generators can and will spit out content vastly different than anything in the training dataset (this ofc can be influenced greatly by user input). This can be fed back into the training data to push the model towards the desired outcome. Examples of the desired outcome are not required at all. (IE you don’t have to feed it CSAM to get CSAM, you just have to consistently push it more and more towards that goal)
Two, anyone can host an AI model; it’s not reserved for big corporations and their server farms. You can host your own copy and train it however you’d like on whatever material you’ve got. (that’s literally how Stable Diffusion is used) This kind of explicit material is being created by individuals using AI software they’ve downloaded/purchased/stolen and then trained themselves. They aren’t buying a CSAM generator ready to use off the open market… (nor are they getting this material from publicly operating AI models)
They are acquiring a tool and moulding it into a weapon of their own volition.
Some tools you can just use immediately, others have a setup process first. AI is just a tool, like a hammer. It can be used appropriately, or not. The developer isn’t responsible for how you decide to use it.
Then that settles it. It’s whoever allows bad data into the training data.
Do… Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?
Yes. Because they did (not intentionally though)
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model’s capabilities.
I think that’s a bit of a stretch. If it was being marketed as “make your fantasy, no matter how illegal it is,” then yeah. But just because I use a tool someone else made doesn’t mean they should be held liable.
Check my other comments. My thought was compared to a hammer.
Hammers aren’t trained to act or respond on their own from millions of user inputs.
Image AIs also don’t act or respond on their own. You have to prompt them.
And if I prompted AI for something inappropriate, and it gave me a relevant image, then that means the AI had inappropriate material in its training data.
No, you keep repeating this but it remains untrue no matter how many times you say it. An image generator is able to create novel images that are not directly taken from its training data. That’s the whole point of image AIs.
An image generator is able to create novel images that are not directly taken from its training data. That’s the whole point of image AIs.
I just want to clarify that you’ve bought the Silicon Valley hype for AI but that is very much not the truth. It can create nothing novel - it can merely combine concepts and themes and styles in an incredibly complex manner… but it can never create anything novel.
What it’s able and intended to do is beside the point, if it’s also capable of generating inappropriate material.
Let me spell it more clearly. AI wouldn’t know what a pussy looked like if it was never exposed to that sort of data set. It wouldn’t know other inappropriate things if it wasn’t exposed to that data set either.
Do you see where I’m going with this? AI only knows what people allow it to learn…
You realize that there are perfectly legal photographs of female genitals out there? I’ve heard it’s actually a rather popular photography subject on the Internet.
Do you see where I’m going with this? AI only knows what people allow it to learn…
Yes, but the point here is that the AI doesn’t need to learn from any actually illegal images. You can train it on perfectly legal images of adults in pornographic situations, and also perfectly legal images of children in non-pornographic situations, and then when you ask it to generate child porn it has all the concepts it needs to generate novel images of child porn for you. The fact that it’s capable of that does not in any way imply that the trainers fed it child porn in the training set, or had any intention of it being used in that specific way.
As others have analogized in this thread, if you murder someone with a hammer that doesn’t make the people who manufactured the hammer guilty of anything. Hammers are perfectly legal. It’s how you used it that is illegal.
I learned how to write by reading. The AI did the same, more or less, no?
The AI didn’t learn to draw or generate photos from blind words though…
Oh, it learned from art? Like how human artists learn?
AI hasn’t exactly kicked out a Picasso with a naked young girl missing an ear yet has it?
I sure hope not!
But if it can, then that seriously indicates it must have some bad training data in the system…
I won’t be testing these hypotheses.
It in fact does have bad training data! https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
Can we do guns next?
I’d rather not fart bullets, but thank you for inviting me to the party.
I’m not sure why you’re picking this situation for an anti-AI rant. Of course there are a lot of ways that large companies will try to use AI that will harm society. But this is a situation where we already have laws on the books to lock up the people who are specifically doing terrible things. Good.
If you want to try to stand up and tell us about how AI is going to damage society, pick an area where people are using it legally and show us the harms there. Find something that’s legal but immoral and unethical, and then you’ll get a lot of support.
Totally dismissing inappropriate usage, AI can be funny and entertaining, but on the flip side it’s also taking people’s jobs.
It shouldn’t take a book, let alone 3 seconds of common sense thought, to realize that.