The Basics

2024, Mar 28    

I have a billion interests. I have learned the hard way (well, still learning, really) that it does no good to jump among all of them constantly. So I’ve had to prioritize. I spent a couple years learning to draw, and I’m certainly no expert on it. But I know enough that I’m ready for a new creative challenge: creating with AI image models.

It’s a nice dovetail between my day job (web developer, transitioning to analyst and in-house solutions engineer) and my personal pursuits of trying to express myself in some meaningful way.

I had to put a few other things on hold (learning to play drums, learning day trading, learning to paint better with watercolors) so I could do this. I’m still pursuing learning Spanish, because that seems like a separate enough train of thought. (Train of thought number three, describing a reasonable worldview that can encompass both faith and reason without belittling either, continues apace. We won’t get into that right now.)

I’m learning a ton about AI in general, and it’s helping my work with Midjourney.

Why Midjourney?

There are several options that seem to work well for people: DALL-E, Leonardo, Playground, Ideogram, and several ways to run the Stable Diffusion system both locally and online. I’ve played with all of them, and I was blown away by how much better Midjourney (MJ) understood what I wanted. That was, of course, back in ancient prehistory, late August of 2023, so all the models have changed and improved since. I’m still very happy with MJ, and since I can only afford one paid plan at a time, I’m going with MJ’s $30-a-month option so I can create unlimited images (they’re generated in “relaxed” mode, which means they take a bit longer).

My understanding of how Midjourney works is constantly evolving, but here’s a current snapshot.

The Seed

The seed value determines what random jumble of pixels the model starts with to sculpt into your image. If you don’t specify one, the system will give you one at random. The same seed is used again if you vary or remix your image, to make sure you’re starting from roughly the same starting point. I don’t know for sure, but like a lot of other people I suspect that seeds work best within your current session. If you go back and use the same prompt with a seed you used days ago, you’ll likely get something very different.
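You can also pin the seed yourself with the --seed parameter, which accepts a whole number from 0 to 4294967295. A hypothetical pair of prompts (mine, not the ones pictured below) might look like:

a lighthouse on a rocky coast at dusk --seed 12345

a lighthouse on a rocky coast at dusk, loose gouache painting --seed 12345

Both jobs start from the same jumble of pixels, so the compositions tend to rhyme even though the prompts differ.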

Two very different prompts that start from the same seed.

The Model

Midjourney, like other image generators, is driven by a language model’s interpretation of your prompt. Even though we use the term “artificial intelligence,” I don’t think of the model as “thinking.” It’s really just calculating what you probably want based on your input. Part of that calculation is the model’s own idea of what looks good, or what a reasonable image matching the text of your prompt would look like. I understand when people get upset that artists are somehow cheating by using AI; I simply don’t see it that way. But that’s a different can of worms that I will get into eventually.

The point for our purposes is this: you will never write a prompt detailed enough to get everything you want. We depend on the model to make literally millions of decisions so we don’t have to. So at the beginning of the process especially, we want to involve the model in that decision making. We do that with high --stylize values, or at least by leaving them at the default (100, on a scale of 0 to 1000).
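To hand the model even more control explicitly, you could append a high value; this variant is hypothetical, not one of the images shown here:

soft watercolor illustration of a quirky house shaped like a camera on the moon --stylize 750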

soft watercolor illustration of a quirky house shaped like a camera on the moon. The Earth rises in the background. The camera house is a lunar observatory, with telescopes pointed at distant galaxies. Astronauts gather to watch a solar eclipse, capturing the moment for posterity.

Later on, when we have the basics of what we want and we’re fine-tuning, we want to take the model out of it more, so that we have more control over what we’re changing. This is a good time to use --style raw, which removes the model’s own sense of aesthetics, along with low --stylize values. I find that too low isn’t good either: if I use --s 50 or lower, I often find that faces don’t look like faces or horses don’t look like horses anymore. You have to find the sweet spot between control of the image and the model understanding what you want.

a quirky house shaped like a camera on the moon. The Earth rises in the background. The camera house is a lunar observatory, with telescopes pointed at distant galaxies. Astronauts gather to watch a solar eclipse, capturing the moment for posterity. --style raw --stylize 75

Then at the end, when we’re looking to get some customized styling, it can sometimes be helpful to involve the model more heavily. In the example below, the :: markers split the prompt into separately weighted parts, --sref points the model at style reference images, --sw sets how strongly those references apply, and --iw sets the weight of image input relative to the text.

soft watercolor illustration of a quirky house shaped like a camera on the moon. The Earth rises in the background. The camera house is a lunar observatory, with telescopes pointed at distant galaxies. Astronauts gather to watch a solar eclipse, capturing the moment for posterity. ::1 clipart isolated on white background ::1 --no background, color to edges --sref https://s.mj.run/E0TAfN7GXf0 https://s.mj.run/Sc3VFcUg7fQ https://s.mj.run/GdgerUI078k ::2 https://s.mj.run/5XFrjuGM1V4 ::2 --sw 250 --stylize 600 --iw 1.2

The Prompt

You can find a million opinions on the best way to build a text prompt for Midjourney. I won’t pretend to be an expert (everyone who says they’re an expert is probably also pretending, imo). My current guidelines for v6 vary depending on what I’m trying to accomplish. This is my basic process, although I don’t always follow it:

I start in ideation mode (not sure what I’m going to end up with yet). I’ll use abstract language that could be wildly interpreted. Sometimes I put in bits of poetry or text, tell a story, or add wikipedia-style info about what I want in the image. This is an excellent place to try out words and phrases just to see how the model interprets them. A YouTuber named Thaeyne does a great job of finding interesting keywords to put into their prompts; those videos are often worth a watch. I sometimes use a non-zero --chaos value as well, which gives you more variance between the corners of your four-up image.
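--chaos takes a value from 0 to 100, with 0 as the default. A hypothetical ideation prompt (not the one pictured below) might pair it with loose, interpretable language:

the taste of autumn, imagined as a still life of apples --chaos 30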

three apples sit in a triangle shape on the floor of an autumnal forest. leaves litter the ground around.

After I get a general idea of what will work, I move on to composition mode. I want control of what is pictured and the general way it’s presented. For this I try to keep mostly to describing things that can be seen and that don’t require a lot of interpretation. You might get predictable results from something basic like “the rabbit has a smug look on his face.” You almost certainly won’t get predictable results from “the rabbit represents the struggle of the human spirit to actualize itself.”
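A composition-mode prompt might look something like this (a hypothetical example, sticking to things a camera could actually see, with the low-stylize raw settings from earlier to keep the model’s hands off):

a smug rabbit sits on a tree stump in a sunlit forest clearing, viewed from a low angle --style raw --stylize 75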

My final step is what I think of as styling mode. Here is where I dial in the overall aesthetic of the image. In my examples here, I’m starting by specifying “2d flat vector art on a white background” to guide the model from the beginning. This is because my final product will be an ink & watercolor look that I can remove the background from. The 2d vector art part keeps the shapes and colors simple, so the result doesn’t turn labyrinthine when the styling pass layers detail onto an already detailed image.

three apples sit in a triangle shape on the floor of an autumnal forest. leaves litter the ground around. ::1 watercolor clipart isolated on white background ::1 --no background detail --sref https://s.mj.run/n6EeUhwVkWc https://s.mj.run/aj78h_-hZ78 https://s.mj.run/ahs_w9Gca_0 --sw 500 --stylize 250

Does it work?

Well, it’s not a perfect process, as you can see from these examples. The model can be weirdly obstinate about ignoring certain parts of what I want. Rewording sometimes helps, as do a few other strategies I have for manipulating the image. But that’s a tale for another time.