In my previous post, I discussed how humans, whether they
are artists or laymen, are using Artificial Intelligence to create digital
works of art. Currently, there are
several AI art generative tools that one can choose from, such as DALL-E
2, DreamStudio (Stable Diffusion), Midjourney, NightCafe, and Prodia, just to
name a few. Some of these programs are
free to use, but many charge a recurring subscription fee (monthly or yearly).
Eager to try out this new technology
and see how far it could be pushed, I chose to experiment with Bing Image Creator,
which is a heavily moderated version of DALL-E 2 (and is free to use with a
Microsoft account). The way this tool
works is quite simple: first you type a word or sentence describing an idea
you have, then you click the “Create” button.
The time it takes the AI model to
generate art from your idea depends on how specific your request is.
For my first generation request, I simply
typed in the word “dog”. After a minute
of waiting, the AI program generated the four images of a dog you see below.
Initially, I was very excited that the
resulting images actually looked like a dog.
However, questions gradually came to mind: “Why did the AI generate images of
these particular dog breeds? Why did it choose to generate hyper-realistic
images of dogs as opposed to, say, hand-drawn illustrations or 3D models? Why are they all headshots and not full-body
views?” I understand that users must be
more precise with their words in order to get varied results, but I wonder why
these particular images (of dogs) are the program’s defaults. Interestingly, when I reverse-image-searched
a few of these generations through Google, I found that a few websites were using very
similar images of these dogs. In fact,
some images had the Bing AI art watermark in the lower left-hand corner.
Next, I decided to repeat the same prompt,
as I was curious to see if the AI would generate art of the same dog
breed. Once again, four images appeared;
the color, lighting, and position of the dog’s head were all the same. Perhaps the programmers (or those who built
the original algorithm) chose a Retriever to be the AI’s default idea of a “dog”.
Eager to create something different, I
decided to add the detail “with alien” to my original prompt. In these new generations, I finally got four
different breeds of dogs, different species of aliens, and varied head
positions. I was astonished by the
uniqueness of each alien’s features (the number of eyes, the colors of their
leathery skin), and I especially enjoyed the expressions on each of the dogs’
faces (some scared and others confused).
Subsequently, I decided to modify the
sentence even further by adding the words “playing catch.” It was at this point that I started to notice
that the AI program seemed to be struggling with merging several figures into
one image. For example, you may notice
that there are distortions around the eyes in some of the dogs’ faces. I was intrigued that the program seemed
to interpret the request “Dog playing catch with alien” in multiple ways. In two images, a dog and alien are playing
catch with a ball (just as I requested), but in another image, it looks like
the alien has taken on a football shape and is perhaps being caught by the dog
(like a chew toy). What I also found
interesting was that in two images, the AI chose to include a UFO, even
though I never requested that in the original prompt.
In the final step of my experiment with
AI-generated art, I added a few more words to this ever-growing sentence: “Dog
playing catch with alien at Fenway Park, photograph.” By adding the word “photograph,” I hoped to
make the final image appear more realistic, with no blurring and crystal-clear
detail, rather than a digital illustration with painterly brushstrokes. In the end, I am very happy with how three of
the four art generations turned out. In
each image, the viewer can clearly see that there is a dog and an alien
throwing a ball back and forth, that the location is a baseball stadium, and
that the AI used the correct colors of Fenway Park. The AI program really pushed itself to create
dynamic movement in both the dog and alien bodies (specifically outstretched
arms and bent knees). One question I
would have for the AI artist is, “Why are all of the dogs portrayed in profile
view and not three-quarter view?” I
wonder if the program is capable of producing images where the dog has its back
towards the camera.
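The experiment above amounts to iterative prompt refinement: starting from a single word and adding detail at each step to narrow what the model is asked to draw. The short Python sketch below lists the four prompts in the order they were tried, along with a crude word-count proxy for specificity; the `specificity` helper is a hypothetical illustration, not part of Bing Image Creator or any real API.

```python
# The four prompts from the experiment, in the order they were tried.
# Each step adds detail, which narrows what the model is asked to draw.
prompts = [
    "dog",
    "dog with alien",
    "dog playing catch with alien",
    "dog playing catch with alien at Fenway Park, photograph",
]

def specificity(prompt: str) -> int:
    """A crude proxy for prompt specificity: the number of words."""
    return len(prompt.split())

# Each refinement is strictly more specific than the last.
scores = [specificity(p) for p in prompts]
assert scores == sorted(scores) and len(set(scores)) == len(scores)
print(scores)  # [1, 3, 5, 9]
```

This mirrors the pattern the experiment suggests: vague prompts fall back on the model's defaults (a Retriever headshot), while each added phrase pushes the output toward a specific scene, style, and setting.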
If you would like to learn more about Bing
Image Creator, click this link:
https://www.bing.com/images/create/help?FORM=GENHLP
For those of you who are unfamiliar with the concept of AI
art, check out this article:
https://www.techtarget.com/searchenterpriseai/definition/AI-art-artificial-intelligence-art