Generative models does not work like that, if it were so, how do you explain that I can generate a picture of a purple six legged cat throwing lasers from the eyes in space?
In a very very very simplified way, the models are trained that from noise it de noises it until the image is “restored”. A part of the model learns to remove noise until a drawing of a child is restored, another learns to restore the image of a drawing of a nude woman. Basically you say to the model that from noise it has to restore the drawing of a nude child it combines the two proceses (also it is trained to combine things in a way that makes sense).
This, when I decided to join Mastodon I was prompted to choose a server and had to research which one should join and understand how it works.
It is called UX friction and is well studied in sign up and checkout processes, the more steps the user has to perform the more likely it abandons it.