If you haven’t heard, DALL-E 2 is a natural-language text-to-image generator, meaning if you type a prompt like –
“A sloth rollerskating during golden hour, digital art.”
You may get something like this –
If that sounds like some form of magic, you aren’t too far off. Over the last two weeks, I have generated well over 2,000 images, both for entertainment and to push the medium to its extremes to see what it can and, more importantly, what it can’t do.
The technical side of DALL-E 2 really gives it that magic feel. You can see OpenAI’s explanation here: https://vimeo.com/692375454
The general gist is that the AI engine was trained on millions of tagged images that capture the context of objects and scenes. The system takes your input and attempts to parse out what you actually want in your image. This isn’t a perfect relationship. For instance:
Though we might say a baker is “throwing” a pizza, DALL-E 2 takes that a bit literally.
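Part of what makes this work is CLIP, an OpenAI model that learned to score how well a caption matches an image, and which DALL-E 2 builds on internally. As a rough illustration (not DALL-E 2’s actual pipeline), here is a minimal sketch using the open-source CLIP checkpoint from Hugging Face; the image filename and captions are hypothetical placeholders:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Open-source CLIP checkpoint; DALL-E 2 relies on CLIP-style embeddings internally.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("baker.png")  # hypothetical: one of the generated images
captions = [
    "a baker throwing pizza dough in the air",
    "a baker hurling a whole pizza across the kitchen",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
scores = model(**inputs).logits_per_image.softmax(dim=1)

# A higher probability means CLIP thinks that caption better describes the image.
for caption, score in zip(captions, scores[0].tolist()):
    print(f"{score:.2f}  {caption}")
```

If the second caption scores higher, you get a sense of why the literal reading wins: the system matches your words against what it has seen, not against what you meant.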
The generator learned by adding random noise, similar to the fuzz that shows up on a TV without a signal (if you remember that from the old days), to an original image. It then rebuilds the original from the noise, learning how to create the image. This is also the method it uses to build out variations of an original.
To generate something new, the idea is to start with pure random noise and slowly turn it into an image matching the text you typed in, going over the image again and again, refining pixels until it is rendered into what the system believes you asked for.
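To make that loop concrete, here is a toy numpy sketch of the reverse, denoising process. It is emphatically not the real model: a real diffusion model predicts the noise at each step with a trained neural network conditioned on the text embedding, while here the “prediction” is faked with a known target, purely to show the shape of the iteration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an image: a 1-D array of pixel values in [-1, 1].
target = np.linspace(-1.0, 1.0, 64)    # what the text prompt "means"

steps = 50
x = rng.standard_normal(target.shape)  # start from pure TV static

for t in range(steps):
    # A real diffusion model uses a trained network to predict the noise
    # to remove at each step. Here we cheat with the known target, just
    # to show the repeated refine-toward-the-image loop.
    predicted_noise = x - target
    x = x - (1.0 / (steps - t)) * predicted_noise  # strip away a fraction

print(np.abs(x - target).max())  # ~0: the static has become the "image"
```

Each pass removes a little more of the noise, which is why the same starting static, steered by different text, can end up as completely different images.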
You can read a deeper dive of the technology here: https://medium.com/augmented-startups/how-does-dall-e-2-work-e6d492a2667f
DALL-E can generate some pretty amazing images. One aspect that my friends and family have noted is that the small thumbnails look amazing. When you first see a set of images, they usually surprise and delight.
However, as you zoom into the individual images, you notice imperfections like extra hands, unformed faces, and oddly placed objects. These errors don’t negate that the system is impressive; the art is just less likely to end up hanging on my wall.
It should be noted here that DALL-E 2 currently doesn’t do photorealistic faces. In fact, OpenAI has an algorithm that deliberately disrupts rendering them, and their policy forbids sharing photorealistic faces publicly. I can tell you, though, that the faces come out distorted just enough to be scary.
DALL-E 2 handles digital art prompts really well. The system seems especially good at rendering animals in digital scenes. This piece is one of my favorites that it has produced.
The generator also does a great job with impressionist paintings. As noted, the details don’t always look sharp, so giving it a medium to imitate that is soft and forgiving produces some awe-inspiring art.
Last, I would say that DALL-E 2 naturally excels at highly stylized images of familiar objects. It seems that the stylizing helps it form the image well and produces some incredible art.
NOTE: One last trick I found with DALL-E 2 is to ask for the scene to be during golden hour. This addition brings a warmth and magic that I found highly desirable in most images. In fact, as I have produced more and more images, most of my prompts now contain that phrase somewhere.
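During the beta I used the web interface, but if you wanted to script the same trick today, here is a minimal sketch against the openai Python package’s image endpoint. The base prompt is just an example, and it assumes an API key is set in your environment:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

base_prompt = "A sloth rollerskating"
# The trick: tack "during golden hour" onto the end of the prompt.
prompt = f"{base_prompt} during golden hour, digital art"

response = client.images.generate(
    model="dall-e-2",
    prompt=prompt,
    n=4,                # number of images to generate per request
    size="1024x1024",
)
for image in response.data:
    print(image.url)   # temporary URL for each generated image
```

Appending the phrase programmatically makes it easy to rerun a whole batch of prompts with and without it and compare the warmth side by side.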
There are also things that DALL-E 2 doesn’t handle well, such as phrases the system struggles to understand or scenes it struggles to render. In these situations, the system gives you either an awkward image or, worst case, something out of your nightmares.
Take, for example, this series I did with babies. I expected that it would look like the early work from Pixar.
For the most part, DALL-E surprised me with cute babies doing various activities. However, when I asked DALL-E 2 to give me a baby vacuuming with a “Persian rug,” it clearly didn’t know how to handle it. That one phrase set it off to somehow merge grandmas with babies, a creepy Benjamin Button situation, in my opinion.
Similarly, it may mess up what you are asking for. For instance, I wanted to see if DALL-E 2 would produce a pencil wrestling a vacuum. However, the generated images were missing the pencil entirely, and DALL-E filled in the rest. Humorous photos, but not what I was hoping for.
DALL-E will sometimes choose a style that is just plain bad. Maybe this is a matter of taste, and someone out there wants low-quality art, but for me, it raises a real question about what kind of art you will ultimately get. Here is an example of six images of pelicans complaining about the weather. Note the final image’s puffin-like style.
DALL-E 2 cannot spell. If words are imprinted onto the image, they resemble ideas of English words but are almost never correct.
Last, when rendering people, animals, and the like, DALL-E 2 sometimes doesn’t finish. There will be missing limbs, detached legs, and more. These, in particular, are cases where the small thumbnail looks right, but the errors become very apparent after enlarging the image and zooming in.
One of the most prominent questions swirling around DALL-E is how OpenAI will end up offering it after the beta. One idea floated in a user survey was charging per search, asking how much someone would be willing to pay.
This concept prompted me to think through how I might use it in the future if searching moved to a paid model.
I will most likely pay if I use DALL-E 2 for work because work can justify the expense. However, when I go looking for stock images today, I don’t pay for each attempt to find one that fits. Often it is an exploratory process that helps me refine what type of photo or design I am looking for.
Paying per search doesn’t seem feasible for me because I am not confident I can get the results I need within a number of searches that would make it cost-effective.
If OpenAI wants to charge me (and I assume they need to move to a paid model eventually), here is how I see it: if they can find a way to make searching more reliable, or build a system that creates lasting value over time (not just novelty), then I can see a paid version becoming quite valuable.