Written by Thomas Clapper
Category: Design
Jul 15

Why Hello DALL-E 2

What is DALL-E 2?

If you haven’t heard, DALL-E 2 is a natural-language text-to-image generator, meaning if you type the prompt –


“A sloth rollerskating during golden hour, digital art.”


You may get something like this –

"a sloth rollerskating during golden hour, digital art"



If that sounds like some form of magic, you aren’t too far off. Over the last two weeks, I have generated well over 2,000 images, both for entertainment and to push the medium to its extreme to see what it can and, more importantly, can’t do.



How it works

The technical side of DALL-E 2 really gives it that magic feel. You can see their explanation here: https://vimeo.com/692375454


The general gist is that the AI engine has been trained on millions of tagged images that capture the context of objects and scenes. The system takes your input and attempts to parse out what you actually want in your image. This isn't a perfect relationship. For instance:

"a baby tossing pizza, digital art" vs "a baby tossing pizza dough, digital art"



Though we might say a baker was throwing pizza, DALL-E 2 takes that a bit literally.

The generator learned by taking an original image and degrading it into random noise, similar to the fuzz that shows up on a TV without a signal (if you remember that from the old days). It then rebuilds the original from the noise, learning how images are constructed. This is the same method it uses to build out variations of an original.

To generate a new image, the system starts with random noise and slowly shapes it into the equivalent of the text you typed in, going over the image again and again and adjusting pixels until it is rendered into what the system believes you asked for.

You can read a deeper dive of the technology here: https://medium.com/augmented-startups/how-does-dall-e-2-work-e6d492a2667f
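The noise-adding idea above can be sketched in a few lines of Python. To be clear, this is a toy illustration, not DALL-E 2's actual model: a real diffusion system trains a neural network to predict and remove the noise at each step, guided by the text prompt, while here the "image" is just a small random array and only the forward (noising) direction is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 8x8 grayscale "image" with values in [0, 1].
image = rng.random((8, 8))

def forward_diffuse(x, steps, beta=0.1, rng=rng):
    """Forward process: at each step, blend the image with a little
    Gaussian noise. After enough steps the signal is essentially gone,
    leaving something like TV static."""
    for _ in range(steps):
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.normal(size=x.shape)
    return x

noisy = forward_diffuse(image, steps=50)

# After 50 steps only (1 - beta)^(steps/2), roughly 7%, of the original
# signal survives. A diffusion model learns to run this process in
# reverse, step by step, which is how it turns pure noise into a picture.
```

The reverse direction is the hard part that takes millions of training images; the forward direction shown here is just cheap arithmetic.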

What it can do well

DALL-E can generate some pretty amazing images. One aspect that my friends and family have noted is that the small thumbnails look amazing. When you first see the set of images, they usually surprise and delight.

"laptop in a coffee shop, digital art"



However, as you zoom into the individual images, you notice imperfections like extra hands, unformed faces, and oddly placed objects. These errors don't negate how impressive the system is – the art is just less likely to be hung on my wall.

Closeup of a "laptop in a coffee shop, digital art"


It should be noted here that DALL-E 2 currently doesn’t do photorealistic faces. In fact, they have an algorithm that deliberately disrupts the rendering of faces. Further, their policy forbids sharing photorealistic faces, so I can't post examples here. I can say that the faces are disrupted just enough to be scary.


DALL-E 2 handles digital art prompts really well. The system seems to be especially good at rendering animals in digital scenes. This piece is one of my favorites it has produced.

"a vole as a detective looking for the last clue to solve a mystery, digital art"



The generator also does a great job with impressionist paintings. As noted, the details don’t always look sharp on the images, so giving it a medium to copy that is soft and forgiving produces some awe-inspiring art.

"Fireworks over Mississippi River, by Monet"


Last, I would say that DALL-E 2 naturally excels at highly stylized images of familiar objects. It seems that the stylizing helps it form the image well and produces some incredible art.

"a synth wave cup of coffee"


NOTE: One last trick I found with DALL-E 2 is to ask for the scene to be during golden hour. This addition adds a warmth and magic that I found highly desirable in most images. In fact, as I have produced more and more images, most contain that phrase somewhere in the text.



What prompts make nightmare images

There are also things that DALL-E 2 doesn’t handle well, such as phrases the system struggles to understand or images it struggles to render. In these situations, the system either gives an awkward image or, in the worst case, something out of nightmares.


Take, for example, this series I did with babies. I expected that it would look like the early work from Pixar.

Screenshot from the Pixar short "Tin Toy"


For the most part, DALL-E surprised me with cute babies doing various activities. However, when I asked DALL-E 2 to give me a baby vacuuming with a “Persian rug,” it clearly didn’t know how to handle it. That one phrase set it off to somehow merge grandmas with babies — a creepy Benjamin Button situation, in my opinion.

"a baby vacuuming a persian rug in a living room, digital art" vs. "a baby vacuuming a living room, digital art"


Similarly, it may mess up what you are asking. For instance, I wanted to see if DALL-E 2 would produce a pencil wrestling a vacuum. However, the created image was missing the pencil entirely, and DALL-E filled in the rest. Humorous photos, but not what I was hoping for.

"A vacuum holding a pencil in a headlock while wrestling in a living room"



DALL-E will sometimes choose a style that is just plain bad. Maybe this is a matter of taste, and someone out there wants low-quality art, but for me, there is a real question about what kind of art you will ultimately get. Here is an example from a set of six images of puffins complaining about the weather. Note the style of the final puffin.

"A puffin complaining about the rain, digital art"


DALL-E 2 cannot spell. If words are imprinted onto the image, they resemble ideas of English words but are almost never correct.


Lastly, when rendering people, animals, and the like, sometimes DALL-E 2 doesn’t finish. There will be missing limbs, detached legs, and more. These, in particular, are examples where the small thumbnail looks right, but after enlarging the image and zooming in, the errors are very apparent.

"A 3d render of a plant wearing glasses, studying hard for a test at their desk"


Can they actually charge for usage?

One of the most prominent questions swirling around DALL-E is how OpenAI will end up offering DALL-E after the beta. One idea presented in a user survey was how much someone would be willing to pay per search.


This concept prompted me to think through how I might use it in the future if searching moved to a paid model.

On the pros of DALL-E 2:

  • I have enjoyed the creativity of mismatching and mashing concepts in new ways
  • Some of the art was so delightful I have talked about hanging it on my wall
  • It is entertaining to search with other people, like my young kids, who think it is amazing


Cons of DALL-E:

  • Not every search gives me what I was looking for – or even a usable image
  • There is a novelty that wears off over time as I run out of ideas
  • The photos are a decent size but would need to be much higher resolution to be usable outside of small images on the web


I will most likely pay if I use DALL-E 2 for work because work can justify the expense. However, when I go out to look for stock images, I am not charged for trying to find one that fits. Often it is an exploratory process that helps me hone what type of photo or design I am looking for.


DALL-E 2’s concept of paying per search doesn’t seem feasible because I am not confident I can get the results I need within a number of searches that would make it cost-effective.


Where OpenAI could get me to pay

If OpenAI wants to charge me (which I would assume they need to create a paid model eventually), I believe I would use the following:


  • All images that have been made by others are searchable, like a stock image site, and can be purchased inexpensively as a very high-quality render (8K or so)
  • If you want to go searching for an image, you can run up to 50 searches a day, but you still only pay for images you “buy.”
  • You can decide if you buy a copy or you buy the sole rights (a copy would go into the general stock image site)
  • DALL-E 2 adds a feature where, once a model is made – let’s say a cute personified mouse as a detective – you can call that model again to put the character into new scenes over and over
  • This addition would allow me access to all the images I need for a website with a “model” or two. I currently run into the issue of Shutterstock only having 3 images of a particular person when I want them across 10 images on a website.


If OpenAI can find a way to make searching more reliable OR create a system that provides constant value over time (not just novelty), I can see a paid version becoming quite valuable.


See more examples of images