Stable Diffusion AI

Deep Dish

Stable Diffusion is the best AI image generator. I've been using it for six months, looking into how I can use AI in my art pipeline.

To support that claim, Stable Diffusion:

- is free and open source
- runs locally on your own computer
- has thousands of checkpoint models to choose from, which you can mix
- lets you train your own models (LoRAs)
- is highly customizable
- doesn't censor your image generation
- has ControlNet for advanced control over images
- produces image quality that rivals the best of Midjourney, especially since the release of SDXL

Stable Diffusion is a model rather than a specific program. There are a variety of user interfaces to choose from, with Automatic1111 as the most popular implementation. Others include InvokeAI, SD.Next, and Leonardo.ai.

Because it runs on your own computer rather than the cloud, you don't need to worry about cloud employees or other people using your images. Over at Midjourney, you have to pay an additional $20 a month on top of your base subscription to have privacy, turning privacy into a privilege rather than a right.

Because you use your own GPU, you don't run into censorship. Midjourney, DALL-E, NightCafe, DreamStudio, and Adobe Generative Fill all impose "community standards" censorship, which is odd because image generation tools are not social media. DreamStudio is based on Stable Diffusion, but because it runs in the cloud on their GPUs, their terms of service go full-blown social justice warrior on you. Censorship defeats the point of using these tools, especially when innocuous prompts get filtered in the middle of client work. As long as companies are controlling your images, Stable Diffusion remains king.

The downside to Stable Diffusion is that installing it can be a pain in the butt and the installations are huge (InvokeAI clocked in at 80GB). The developers didn't want to spend time developing a standalone program, so it runs in your web browser. You have to install Python, and dealing with Python is finicky at times.

Unlimited Upscaling

DALL-E limits your images to 1024 x 1024 and Midjourney to 2048 x 2048, but with Stable Diffusion you can upscale to any size you want, as long as the dimensions are divisible by 8 (the model works in a latent space at 1/8 resolution). I once did somewhere around 20,000 x 20,000.

This works because ESRGAN upscalers send the image to your GPU in tiles, letting you render images far larger than your GPU could handle in one pass.
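The tiling idea can be sketched in a few lines of Python. The upscaler here is a stand-in (plain nearest-neighbour instead of a real ESRGAN model), but the flow is the same: cut the image into tiles small enough for the GPU, upscale each tile independently, and stitch the results back together.

```python
def upscale_tile(tile, factor):
    # Stand-in for a real ESRGAN model: nearest-neighbour upscale of one tile.
    return [[px for px in row for _ in range(factor)]
            for row in tile for _ in range(factor)]

def upscale_tiled(image, factor, tile_size):
    # Process the image in tile_size x tile_size chunks so no single
    # upscale call has to fit the whole image in GPU memory.
    h, w = len(image), len(image[0])
    out = [[0] * (w * factor) for _ in range(h * factor)]
    for ty in range(0, h, tile_size):
        for tx in range(0, w, tile_size):
            tile = [row[tx:tx + tile_size] for row in image[ty:ty + tile_size]]
            for dy, row in enumerate(upscale_tile(tile, factor)):
                for dx, px in enumerate(row):
                    out[ty * factor + dy][tx * factor + dx] = px
    return out
```

Real tiled upscalers also overlap and blend the tile edges so you don't see seams; that part is left out here.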

Here is a link to upscalers:


I used to love using Topaz Gigapixel AI for upscaling, but this blows it out of the water.

Nodes

For the best Stable Diffusion experience, there is ComfyUI. By "best" I don't mean "easiest to use," I mean the most powerful.

ComfyUI is a dedicated node editor, and nodes are daunting: they're visual programming. Nodes scare people away because a really complex graph turns into a flying spaghetti monster, but they're enormously flexible and customizable, and you can organize the spaghetti.

I have experience with nodes from Maya, 3DS Max, Substance Designer, and Blackmagic Fusion, so when I heard about ComfyUI, I instantly felt at home.

The advantages of nodes:

- You learn how things work under the hood, you get a deep understanding of low-level operations. There is a learning curve, but that's okay, because you will climb it. Start small and simple and gradually work your way up to more complexity.

- Experiment with different ideas non-destructively; you can rewire a branch instead of undoing your way back.

- Easier to troubleshoot problems by tracing back through the wiring of nodes.

- Build custom workflows that would be impossible in other programs. You can chain multiple steps together instead of doing them manually, like always upscaling an image before it's saved.
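To make the chaining idea concrete, here is a toy sketch of how a node graph evaluates. Everything in it is made up for illustration (it is not ComfyUI's actual API): each node names an operation and its inputs, an input can reference another node's output, and asking for the final node pulls the whole chain, with a cache so shared upstream nodes only run once. "Always upscale before saving" becomes wiring rather than a manual step.

```python
# Toy operations standing in for real image nodes (load, upscale, save).
OPS = {
    "constant": lambda value: value,
    "scale":    lambda x, factor: x * factor,
    "identity": lambda x: x,
}

def evaluate(graph, node_id, cache=None):
    """Compute a node's output by first resolving everything upstream of it."""
    cache = {} if cache is None else cache
    if node_id not in cache:
        node = graph[node_id]
        args = {}
        for name, value in node.get("inputs", {}).items():
            if isinstance(value, tuple) and value[0] == "node":
                value = evaluate(graph, value[1], cache)  # follow the wire
            args[name] = value
        cache[node_id] = OPS[node["op"]](**args)
    return cache[node_id]

# Upscale is wired in between load and save, so it can never be skipped.
graph = {
    "load":    {"op": "constant", "inputs": {"value": 10}},
    "upscale": {"op": "scale",    "inputs": {"x": ("node", "load"), "factor": 4}},
    "save":    {"op": "identity", "inputs": {"x": ("node", "upscale")}},
}
```

This is also why troubleshooting is easier: a bad output can be traced back through the wires one node at a time.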

ComfyUI has a lot of custom nodes. My favorite is WAS Node Suite, which has over 190 custom nodes, including nodes for compositing, image adjustments, and animation.

If you want a mixture of nodes and a normal interface, you can try ComfyBox, StableSwarmUI, or InvokeAI.

I personally use ComfyUI for image generation, and Automatic1111 for inpainting and outpainting.

When you're using AI, you often have to render a lot of images to find the best one. And when you do find an image you really like, your job isn't done. To get the best possible version, lock the seed number to keep the composition of the image the same and try different sampling methods and/or schedulers, which very often produce an even better take on the image. Alternatively, in ComfyUI you can change the noise used to generate the image, which also gives you different variations.
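The role of the seed is easy to demonstrate: the seed fully determines the starting noise, so locking it pins the composition while you swap samplers, and changing it (or perturbing the noise directly, as ComfyUI allows) gives you variations. A stand-in sketch using Python's stdlib RNG in place of the real latent-noise generator:

```python
import random

def latent_noise(seed, n=8):
    # Stand-in for the latent noise tensor a diffusion model starts from:
    # the seed fully determines every value.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

locked  = latent_noise(42)  # same seed -> same starting noise -> same composition
again   = latent_noise(42)
variant = latent_noise(43)  # new seed -> new noise -> a different variation
```

Two different samplers fed the same locked noise will walk that identical starting point toward differently rendered versions of the same composition, which is why seed-locking plus sampler-swapping works.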

In Stable Diffusion, information about the image generation is usually baked into the metadata of the PNG, so you can load up an image and see what prompt was used. ComfyUI bakes in all of the workflow details and can even load the workflow back from the image.
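You can peek at that metadata without any imaging library: PNG stores this kind of text in plain tEXt chunks, which a few lines of stdlib Python can walk. (The chunk-writing helpers below exist only to build a test image; files from your generator already contain the chunks, and the exact keyword names vary by tool, so treat the ones here as examples.)

```python
import struct
import zlib

def png_text_chunks(data):
    """Return {keyword: text} from the tEXt chunks of a PNG byte string."""
    out, pos = {}, 8  # skip the 8-byte PNG signature
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        if ctype == b"tEXt":
            key, _, value = data[pos + 8:pos + 8 + length].partition(b"\x00")
            out[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

def _chunk(ctype, payload):
    # A PNG chunk: length, type, payload, CRC computed over type + payload.
    return (struct.pack(">I", len(payload)) + ctype + payload
            + struct.pack(">I", zlib.crc32(ctype + payload)))

def tiny_png_with_text(key, value):
    # Minimal valid 1x1 greyscale PNG carrying one tEXt chunk, for testing.
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
            + _chunk(b"tEXt", key.encode("latin-1") + b"\x00"
                     + value.encode("latin-1"))
            + _chunk(b"IDAT", zlib.compress(b"\x00\x00"))
            + _chunk(b"IEND", b""))
```

Point `png_text_chunks` at the bytes of a generated image and the prompt and settings come back as ordinary strings.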

Au revoir!
 