And how much do they cost? And how do you like them?
All mine are with Bing, it’s free but you only get 15 full speed image generations a day. After that they take a few minutes per prompt.
I dislike that it has restrictions so I can’t make the fully unhinged pictures I would like 😂
Yeah, that is what I use too, but the limitations kind of suck.
I need a Nic Cage Wolf man picture in my life, but Bing blocks celebrity prompts.
I use Stable Diffusion with automatic1111’s webui, run locally on an AMD GPU. I use the card for gaming and encoding too, so the cost for just AI is basically free. The webui is excellent, and I learn about new things it can do every time I use it. Setting it up took some time, but nothing beyond what I am familiar with. I do loathe that so much data science/AI stuff is python based, because python’s dependency management is an unruly beast, but oh well.
because python’s dependency management is an unruly beast
Note that Automatic1111 is, by default, set up to run in a venv – a sort of little isolated Python install – so it won’t smack into the system packages, at any rate.
I think that the current version of Automatic1111 – I’m running off the dev branch – also pulled down the appropriate ROCm pytorch that AMD wants into its little venv, but I’m pretty sure that I recall needing to manually install that some months back, on an older version of Automatic1111. Other than that, I don’t think that I had to do anything significant with Python packages; there’s just a script that one runs to launch the package, and it also automatically downloads anything it needs the first time.
by default, set up to run in a venv
It does, but since I’m running inside a container, I disable that behavior, and run it as a user package. Some extensions also require additional libraries, but they don’t pull the correct ROCm dependencies and I have to modify part of the install scripts to manually define the correct versions.
The main webui code is excellent, even if sometimes the documentation is out of step because of how fast everything moves. It’s the extensions, which are not always at the same level of quality, that make fiddling with python dependencies a bit of extra work.
You use it on Linux? I have used it in Windows (6700xt) and it is slow af (2 it/s or even in s/it range), apparently it should be a lot faster in Linux but haven’t tested it.
I run it in a container on a NixOS host yes, eventually I’ll learn how to do it in a flake but my nix skills aren’t quite there yet. EDIT: I use a 6900xt, and some quick runs I did give me roughly 10 it/s. Which feels reasonably fast, only a couple seconds per image.
I recognize that those are words and numbers…
No worries, I’ll link to some Arch Wiki stuff to help explain. Containers are a very cool system for isolating environments. Similar to how Python uses a venv to contain all the dependencies for a Python program, containers let you have a full environment beyond just the Python stuff. I use podman to actually run the container on my computer. You use a Containerfile to define what you want this environment to look like, and docker/podman does all the hard work for you by making an image file that holds the whole thing in one place, separate from your real OS.
This is my start script.
#!/usr/bin/env bash
podman run -it --rm --name stablediff2 -p 7860:7860 \
  -e COMMANDLINE_ARGS="--api --listen --port 7860 --enable-insecure-extension-access --medvram-sdxl --cors-allow-origins *" \
  --device /dev/dri:/dev/dri \
  --device /dev/kfd:/dev/kfd \
  -v ./models:/dockerx/stable-diffusion-webui/models:z \
  -v ./repos:/dockerx/stable-diffusion-webui/repositories:z \
  -v ./extensions:/dockerx/stable-diffusion-webui/extensions:z \
  -v ./embeddings:/dockerx/stable-diffusion-webui/embeddings:z \
  -v ./outputs:/dockerx/stable-diffusion-webui/outputs:z \
  -v ./inputfiles:/dockerx/stable-diffusion-webui/inputfiles:z \
  localhost/stablediffusion:latest
This is just telling podman to start the container, give it an actual terminal to connect to, remove the container if it stops running, give it a name, and tell it what ports it can run on.
podman run -it --rm --name stablediff2 -p 7860:7860
These are the arguments passed to the webui start script itself, mostly for my own convenience. The medvram-sdxl flag is not required, since my card has enough VRAM, but without it I can’t be doing anything else with the card. So I sacrifice a bit of generation speed for more free memory for the rest of my computer. I’m running this locally, so insecure extension access also doesn’t matter since I’m the only one using this; it just lets me install extensions directly from the webui.
-e COMMANDLINE_ARGS="--api --listen --port 7860 --enable-insecure-extension-access --medvram-sdxl --cors-allow-origins *" \
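Since `--api` is in those arguments, the webui can also be driven from a script instead of the browser. Here is a minimal sketch, assuming the container from the start script is up on port 7860 and that the txt2img endpoint is at `/sdapi/v1/txt2img`, as it is in the Automatic1111 builds I know of. `txt2img_payload` is a made-up helper name, and the prompt and step count are only examples.

```bash
# Hypothetical helper: build a minimal JSON body for a txt2img request.
# Naive on purpose: the prompt must not contain double quotes or backslashes.
txt2img_payload() {
  printf '{"prompt": "%s", "steps": %d}\n' "$1" "$2"
}

# Print the request body so you can inspect it before sending anything.
txt2img_payload "a wolf man portrait" 20

# To actually send it (needs the webui running with --api):
# curl -s -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
#   -H 'Content-Type: application/json' \
#   -d "$(txt2img_payload "a wolf man portrait" 20)"
```

As far as I know, the response is JSON with the generated images base64-encoded in an `images` array, so you would still need to decode them to get PNG files.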
These are just the device files that correspond to my GPU, so that the container has access to it. Without this, the container would only have access to CPU-based generation. Everything else is just the folders that hold my models, extensions, etc. You have to give the container exactly what you want it to have, because it’s isolated away from your normal files unless you tell it otherwise.
--device /dev/dri:/dev/dri \
--device /dev/kfd:/dev/kfd \
This is iterations per second, I believe. It’s basically a measure of how fast Stable Diffusion is running a particular generation of an image. It lets people compare performance across different software and hardware configurations.
10 it/s
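To put the it/s numbers from this thread into wall-clock terms, here is a rough sketch. The 20-step count is an assumption (your sampler settings may differ), and `seconds_per_image` is a made-up helper name, not something the webui provides.

```bash
# Hypothetical helper: rough seconds per image = sampling steps / iterations per second.
# Ignores model loading, VAE decoding and any other overhead.
seconds_per_image() {
  LC_ALL=C awk -v steps="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", steps / rate }'
}

seconds_per_image 20 10   # roughly 10 it/s (the 6900 XT runs above), assuming 20 steps
seconds_per_image 20 2    # roughly 2 it/s (the 6700 XT on Windows), assuming 20 steps
```

Under that assumption it works out to about 2 seconds per image at 10 it/s versus about 10 seconds at 2 it/s. When the webui reports s/it instead, the speed is below one iteration per second, and you multiply by the step count rather than dividing.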
NixOS is the name of the GNU/Linux operating system I’m using. Similar to how macOS is different from Windows, NixOS is another type of operating system. I’ve only been using it for a few months, but it’s extremely cool. Before that I mostly used Debian and Fedora, but the main difference between NixOS and them is that you can define your whole OS in configuration files, and then the tools it’s designed around build your system for you. So instead of, say, installing a program, opening it up, going into settings, and changing everything to be how you like it, you can just make a file that lists everything the way you want it from the start, and Nix installs the program and sets it all up in one go. It has a pretty big learning curve, and its features are so numerous that I have yet to take full advantage of them. Probably not the best to start with if you are new to GNU/Linux systems, but once you see the benefits of why it does things differently, it’s awesome.
Hopefully that explains most of the words I used. Pardon my formatting, as I don’t know markdown very well and I think I separated everything okay. :)
Wow, this is fascinating. Looks like I will need to devote some time to this.
I’m using Bing Image Creator mostly. It’s powered by DALL-E 3.
I’ve used https://www.craiyon.com/ for one because I purposely wanted it to be crappy.
Stable Diffusion 1.5 on my own PC (with an old GTX 970). As that card is too old to run SDXL, I use getimg.ai, which offers XL and a selection of XL-based models. The reason I use this one is that I can buy credits without a subscription. I am fine paying 10€ every so often but less keen to pay 10€ a month just for a fun toy.
Where can I find some well-explained step by step instructions for running an AI directly on my Mac?
I use that package https://easydiffusion.github.io/
If you’re geek enough to be here, you should be able to figure out how to use it; it might require a couple of evenings of tweaking.
Stable Diffusion on a 1080 Ti. Not the fastest, and it can’t handle some models, but it runs. Cost nothing but time and a slight uptick in my already high power bill (and thousands in hardware).
I just use bing cuz im a dweeb and its free and easy
You say dweeb like it’s a bad thing
Does anyone know if SDXL can run split tasks with SLI cards? I’ve been thinking of building a dual A80 tesla rig since they are so cheap but I want to be able to render on all 48gb as one.
For OP – I run totally on OpenAI using API calls.
You can’t just increase your VRAM limit like that for single tasks, like working on a single massive high-resolution image.
There might be some way to get a series of queued tasks split.
googles
According to this, not in Automatic1111 currently, but there’s some other frontend that can:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/1621
StableSwarmUI supports this out of the box. An ex-employee of Stability.ai made it.
https://github.com/Stability-AI/StableSwarmUI
Motivations
The “Swarm” name is in reference to the original key function of the UI: enabling a ‘swarm’ of GPUs to all generate images for the same user at once (especially for large grid generations).
I use ChatGPT premium almost every day, mostly for coding, rarely for image generation. $20/month. It can write/refactor decent (not great) code faster than me if I can type out what I want faster than just writing the code myself. Dalle-3 through ChatGPT produces pretty good images and seems to understand the prompts better than SD (ChatGPT actually writes the prompt for you, so that might have something to do with it). It’s much better than Dalle-2, but they’ve put guardrails on it so you can’t ask to do things like create images in the style of a modern artist.
I’ve messed around with Automatic1111 and SD a little bit. ControlNet is very nice for when you need to have control over the output. I would draw shitty outlines with Inkscape, then use SD+ControlNet to kind of fill everything else in. Free and open source model and software. Ran it on an RTX 3090, which cost me $800 a year ago.
Messed around with DeepFloyd IF on replicate.ai for a while, which was very nice. It seemed to understand the prompts much better than SD. I think it was $2/hr, with each image generation using something like 30s of GPU time. Cold starts can take minutes though, which is annoying.
I use OpenAI’s API in a prototype application; both GPT-4 and Dalle-3. GPT-4 is by far the most well-behaved and “knowledgeable” LLM, but all the guardrails put on it can be annoying. Dalle-3 is pretty good, but not sure if it’s the best. The cost isn’t significant yet while prototyping.
I get ads, news, and video recommendations served to me, which probably use some kind of multi-armed bandit AI algorithm. Costs me my privacy. I don’t like it; I rate it 0/10.
I’m actually really curious to see how the 10 core GPU on my M2 Mac Mini performs.
I haven’t tried using Macs. I’ve heard their GPUs are kinda slow (compared to high-end discrete GPU), but have unified memory so you can run very large models.
I bought 3090s because I needed to train a classifier. It took months of training 24/7, so it was cheaper to buy 3090s than pay for cloud compute time. A 3090 is probably overkill for just running SDXL (unless they release an even larger model in the future).
I’m using Stable Diffusion. It’s open-source, and the software doesn’t cost anything…but you run it locally, so you need a graphics card capable of handling it, and it has a serious hunger for VRAM. I’m using an AMD Radeon RX 7900 XTX with 24GB of VRAM; that’s something like $1k. Your CPU doesn’t really matter, as the GPU does the heavy-lifting. An 8GB card probably is going to have problems running out of memory running the SDXL models that are current; I’d probably go with a 16GB+ card if I intended to run it.
Part of the “AI” here is the software, Stable Diffusion, and part is the model, the “memory” or “knowledge” of the AI; I use realmixXL 1.5; as with most other Stable Diffusion models, it’s downloadable from civitai.com, and costs nothing.
Relative to the other AIs people on here are likely to use…hmm. Well, the major commercial services tend to censor and have various content filters. If you want to generate pornography, Stable Diffusion is probably where you want to be. Ditto for things like images of celebrities, which I understand the online services are aiming at also restricting (and maybe art styles of artists, not sure where the commercial services are going with that). And if you’re a programmer and want to write software that interacts directly with it, SD’s where you’ll want to be.
Both Midjourney and DALL-E have a natural-language processing pass, so the idea is that one feeds them somewhat English-looking sentences, which are easier to write. You’ll generally get something not entirely unreasonable in Stable Diffusion if you do that, but normally one just feeds it a list of tokens, a list of keywords.
The commercial services will also data-mine what you’re doing; Stable Diffusion stuff stays on your local computer. That could matter to you, if you take issue with things like Bing or Google data-mining searches.
Stable Diffusion tends to provide for a lot of control, and there are a lot of new extensions being put out regularly for it by researchers.
On the other hand, you’re limited to the hardware that you have. You’re paying for that hardware, even if it’s not actually running; whatever premium pricing commercial online services have or will have, they’ll ultimately be able to spread out the cost of hardware across many users by keeping the compute cards more-or-less constantly in use. If you use your GPU 1% of its lifetime…well, if they can keep the compute cards in their datacenters at an average of 80% activity, they’re getting 80 times as much good out of their parallel compute hardware.
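The 80x figure is just the ratio of the two utilization rates. A tiny sketch of that arithmetic follows; `utilization_ratio` is a made-up helper, and the 80% and 1% inputs are the illustrative numbers from the paragraph above, not measurements.

```bash
# Hypothetical helper: how many times more use a busy card gets than a mostly idle one.
# Arguments are utilization percentages as whole numbers (shell integer division only).
utilization_ratio() {
  echo $(( $1 / $2 ))
}

utilization_ratio 80 1   # datacenter card at 80% vs a home GPU at 1%
```

The same ratio is roughly how much further a service can spread the hardware cost per generated image, before electricity, staff and profit margin come into it.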
I don’t know what hardware Midjourney and DALL-E are running on, but I would wager that if they aren’t yet, they will be running on compute cards with more VRAM than most home users are going to be able to get ahold of, so they’ll probably have an advantage in terms of the potential model size. That’s a guess.
There may be other features added down the line in the natural-language-processing layer in the commercial services, and AFAICT, that’s not really a primary focus of Stable Diffusion development. My guess is that one of the big commercial services will probably wind up looking something like Instagram – it’ll become approachable enough for everyone and get a huge user base – and I suspect that it won’t be Stable Diffusion because of the lack of the NLP layer, the fact that it doesn’t aim to take in English-language-looking sentences.
Stable Diffusion also has multiple frontends. I generally use the (popular) Automatic1111 frontend, which is more-analogous to the Midjourney or DALL-E UIs. Another notable frontend is ComfyUI; this works more like image-processing software, where one builds up a directed graph of operations, and then as you make changes, the graph will recompute stuff as needed. That’s probably more-useful for compositing complex scenes, but it’s also slower to put together a scene; one isn’t just plonking in some search terms.
Adobe also has some kind of generative AI effort (“Firefly”) going on, which I assume they’re gonna integrate with their graphics processing software. I’ve got no experience with it, but if you’re a serious Photoshop user, you might want to look into it, see what it’s like, since I’d guess that whatever they come up with, they’ll probably do a reasonable job of integrating it with their traditional image-processing software.
There are a number of users of all of SD, Midjourney, and DALL-E/Bing on this community; you’ll get solid images out of all of them, as things stand.
I should also mention for completeness that one can “rent” a computer with a large GPU and use it remotely on places like vast.ai, if you just want to dabble a bit. But then you’re also kind of in the position of keeping the GPU idle a fair bit of the time, just as if you had it locally.
There may be someone running online Stable Diffusion-based services out there, but I haven’t gone looking to get an appraisal of what the state of affairs there is.
EDIT: I should also note that you can run Stable Diffusion on your CPU. It will be very, very, very slow, and unless you just want to take a look at the UI or something, you are probably going to go bonkers pretty quickly if you try doing any significant work on the CPU. Might work if you just want to occasionally upscale an image – something that it’s pretty good at.
What if I have quad 12-core Xeons with 196GB of RAM?
How slow are we talking? Would a prompt I can run on Mage.space in 3min take my system hours? or days?
What if I have quad 12-core Xeons with 196GB of RAM?
I have a 24-core i9-13900 and 128GB of RAM and I briefly tried it and recall it being what I’d call unusably slow. That being said, I also just discovered that my water cooler’s pump has been broken and the poor CPU had been running with zero cooling for the past six months and throttling the bejesus out of itself, so maybe it’d be possible to improve on that a bit.
If you seriously want to try it, I’d just give it a spin. Won’t cost you more than the time to download and install it, and you’ll know how it performs. And you’ll get to try the UI.
I just don’t want to give the impression to people that they’re gonna be happy with on-CPU performance and then have them be disappointed, hence the qualifiers.
EDIT: Here’s a fork designed specifically for the CPU that uses a bunch of other optimizations (like the turbo “do a generation in only a couple iterations” thing, which I understand has some quality tradeoffs) that says that it can get down into practical times for a CPU, just a couple of seconds. It can’t do 1024x1024 images, though.
https://github.com/rupeshs/fastsdcpu
I haven’t used it, though. And I don’t think that that “turbo” approach lets you use arbitrary models.
zoo.replicate.dev, similar to craiyon, but more options for running different models.
CodeLLaMa. It costs 15GB of vram which my gpu has
I mostly prefer to use Stable Diffusion with ComfyUI. But Microsoft Image Creator/Bing Image Creator is really neat in its flexibility to blend different things together, which Stable Diffusion can’t do without a lot of tinkering. But Microsoft’s solutions are really annoying to use. Rate limited, censored to heck and back, and at the moment it’s not even letting me make images because it’s faking that “there’s too many users”. Basically softbanning me.