Exploring the power of generative adversarial networks

The Art of Forgery

© Lead Image © Aliaksandr-Marko, 123RF.com

Auction houses are selling AI-based artwork that looks like it came from the grand masters. The Internet is peppered with photos of people who don't exist, and the movie industry dreams of resurrecting dead stars. Enter the world of generative adversarial networks.

Machine learning models that could recognize objects in images – and even create entirely new images – were once no more than a pipe dream. Although the AI world discussed various strategies, a satisfactory solution proved elusive. Then in 2014, after an animated discussion in a Montreal bar, Ian Goodfellow came up with a bright idea.

At a fellow student's doctoral party, Goodfellow and his colleagues were discussing a project that involved mathematically determining everything that makes up a photograph. Their idea was to feed this information into a machine so that it could create its own images. At first, Goodfellow declared that it would never work. After all, there were too many parameters to consider, and it would be hard to include them all. But back home, the problem was still on Goodfellow's mind, and he actually found the solution that same night: Neural networks could teach a computer to create realistic photos.

His plan required two networks, the generator and the discriminator, interacting as counterparts. The best way to understand this idea is through an analogy. On one side is an art forger (the generator), who wants to paint a picture in the style of Vincent van Gogh and sell it to an auction house as an original. On the other side, an art detective and genuine van Gogh connoisseur (the discriminator) at the auction house tries to identify forgeries. At first, the forger is quite inexperienced, and the detective immediately recognizes that the painting is not a real van Gogh. Nevertheless, the counterfeiter does not even think of giving up. The forger keeps practicing and trying to foist new and better paintings off on the detective. In each round, the painting looks more like an original by the famous painter, until the detective finally classifies it as genuine.

This story clearly describes the idea behind GANs. Two neural networks – a generator and a discriminator – play against each other and learn from each other. Initially, the generator receives a random signal and generates an image from it. Combined with instances of the training dataset (real images), this output forms the input for the second network, the discriminator. The discriminator assigns each image to either the training dataset or the generator and receives feedback on whether or not it was correct. Through backpropagation, the discriminator's classification then returns a signal to the generator, which uses this feedback to adjust its output accordingly.

The game carries on for as many iterations as it takes for both networks to learn enough from each other that the discriminator can no longer tell where an image came from. The generator part of a GAN learns to produce fake data by following the discriminator's feedback, until it convinces the discriminator to classify its output as genuine.

From AI to Art

One of the best-known examples of what GANs are capable of in practical terms is the painting "Portrait of Edmond Belamy" [1] from the collection "La Famille de Belamy" (Figure 1). Auctioned at Christie's for $432,500 in 2018, the artwork is signed by the algorithm that created it. The painting also points to the mathematical and game-theoretical foundation of GANs: the minimax strategy, an algorithm used to determine the optimal strategy for finite two-person zero-sum games such as checkers, nine men's morris, or chess. With its help and a training dataset of 15,000 classical portraits, the Parisian artist collective Obvious generated the likeness of Edmond Belamy, as well as those of his relatives [2].
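In their original paper, Goodfellow and his colleagues formalized this contest as a minimax game over a value function V(D, G), which the discriminator D tries to maximize and the generator G tries to minimize:

```latex
\min_G \max_D V(D,G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The first term rewards the discriminator for accepting real samples x; the second rewards it for rejecting generated samples G(z) – which is exactly what the generator tries to prevent.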

Figure 1: The Paris-based Obvious collective, which is made up of AI experts and artists exploring the creative potential of artificial intelligence, used a GAN to generate this painting (Source: Obvious).

Today, AI works of art are flooding art markets and the Internet. In addition, you will find websites and apps that allow users to generate their own artificial works in numerous styles using keywords or uploaded images. Figure 2, for example, comes from NightCafé [3] and demonstrates what AI makes of the buzzword "time machine" after three training runs. Users can achieve quite respectable results in a short time, as Figure 3 shows. But if you want to refine your image down to the smallest detail, you have to buy credit points.

Figure 2: Image generated by NightCafé. © rolffimages, 123RF.com
Figure 3: A catchword and style parameters are all you need to generate a scene in cyberpunk style.

GANs don't just mimic brushstrokes. Among other things, they create extremely authentic photos of people. The website thispersondoesnotexist.com [4] provides some impressive examples. The site is backed by AI developer Phillip Wang and NVIDIA's StyleGAN [5]. On each refresh, StyleGAN generates a new, almost frighteningly realistic image of a person who is completely fictitious (Figure 4). Right off the bat, it's very difficult to identify the image as a fake.

Figure 4: NVIDIA's StyleGAN delivers such good results that viewers often can't tell if it's a real person or not (Source: thispersondoesnotexist.com).

Jevin West and Carl Bergstrom of the University of Washington, as part of the Calling Bullshit Project, offer a few hints on their website whichfaceisreal.com that can help users spot the fakes [6]. There are, for example, errors in the images that look like water stains and clearly identify a photo as generated by StyleGAN, or details of the hairline or the earlobes that do not quite match. Sometimes irregularities appear in the background, too: The AI doesn't pay much attention to the background, because it is trained to create faces.

GANs and Moving Images

Faces are also the subject of another, still fairly unexplored, application for GANs: the movie industry. Experts have long recognized the potential of GAN technology for film. It can be used, for example, to correct problematic blips in lip-synched series or movies. The actors' facial expressions and lip movements often do not match the dialog spoken in another language, and the audience finds this dissonance distracting.

GANs and deep fakes solve the problem. Deep fakes replace the facial expressions and lip movements from the original recording. To create them, application developers need to feed movies or series in a specific language into training datasets. The GAN can then model new movements on the actors' faces to match the lip-synched speech.

But this is just a small teaser of what GANs could do in the context of movies. In various projects, researchers are working on resurrecting the deceased through AI. For example, developers at MIT resurrected Richard Nixon in 2020, letting him bemoan a failed moon mission in a fake speech to the nation [7]. The same method could theoretically be applied to long-deceased Hollywood celebrities.

GANTheftAuto

The traditional way to develop computer games is to cast them in countless lines of code. Programming simple variants does not pose any special challenge for AI, however. A set of training data and NVIDIA's GameGAN generator [8], for example, is all you need for a fully interactive game world to emerge at the end. The Pacman version created by an NVIDIA AI, or an Intel model that can be used to render far more realistic scenes in video games, demonstrates how far the technology has advanced.

However, this by no means marks the limits of what is possible. In 2021, AI developers Harrison Kinsley and Daniel Kukiela reached the next level with GANTheftAuto [9] (Figure 5). With the help of GameGAN, they managed to generate a playable demo version of the 3D game Grand Theft Auto (GTA) V. To do this, the AI – as in NVIDIA's Pacman project – has to do exactly one thing: play, play, and play again.

Figure 5: By running through a road section in GTA V over and over again, the AI generates a visually realistic demo (Source: YouTube).

Admittedly, the classic action-adventure game is far more complex, with its racing and third-person shooter influences. The training overhead increases massively with this complexity, which is why Kinsley and Kukiela initially concentrated on a single street. They had their AI run the course over and over again in numerous iterations, collecting its own training material as it went. In the process, GameGAN learned to distinguish between the car and the environment.

The bottom line: GANTheftAuto is still far removed from the graphical precision of full-fledged video games, but it is worth watching and likely to be trendsetting. The AI managed to copy details from GTA V, such as the reflection of sunlight in the rear window or the shadows cast by the car, and reproduced them correctly, as Kinsley explains in a demo video on YouTube [10].

Resources

As you can probably guess, a system of two adversarial neural networks is a complex thing, and programming one from scratch is a difficult road unless you have considerable experience with AI. Still, several resources are available for those who wish to further explore this fascinating field.

First of all, Ian Goodfellow's original GAN code is still available on GitHub [11], and you are free to download it yourself and experiment. The code is mostly in Python, and the authors include the following note: "We are an academic lab, not a software company, and we have no personnel devoted to documenting and maintaining this research code. Therefore this code is offered with absolutely no support." The GitHub page makes reference to the original June 2014 article "Generative Adversarial Networks" by Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. The article, which is also available today for free download at the arxiv.org website [12], offers a technical introduction that is a good starting point if you are looking for more information on GANs. The first two sentences of the abstract succinctly sum up this promising technique: "We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake."

You'll also find several other GAN implementations, some showcasing different AI development tools, including a Torch implementation [13] and a TensorFlow-based lightweight library for training GANs [14]. Other projects let you use GANs to edit images [15], generate images from text [16], and even generate anime characters [17].

If you want to explore GANs but you're not quite ready to dive down into the code, you will also find some illuminating demo sites online that will give you a closer look. One example is GAN Lab [18] (Figure 6), an application developed by the Polo Club of Data Science, a group of programmers and scientists affiliated with Georgia Tech, that lets you experiment with GANs in your browser window. To maximize its effect as a teaching tool, GAN Lab takes a very simple approach. Rather than generating a computerized painting or a fake video, the Lab simply generates a scattering of data points to match a sample. The user can choose a preconfigured sample data distribution pattern or define a custom pattern.

Figure 6: GAN Lab lets you watch a GAN at work, with a very simple example to maximize the illustrative effect.

Curse, Blessing, or Both?

In the years since Ian Goodfellow got the ball rolling, GANs have taken several fields by storm, and the technology is still rapidly evolving. Generative AI is already delivering impressive results, especially in the context of images and video, and the technique is still in its infancy. Future possibilities include assisting with medical imaging methods, such as X-rays, CT scans, or MRIs. With the help of an AI-modeled disease progression, doctors could adjust their treatment at an early stage to improve outcomes.

But for all the hype surrounding GANs, the technology also has its downsides: It drastically simplifies the process of creating fake content. The Internet has played an important role in publishing and disseminating false information for many years, and more convincing fake videos could compound the problem significantly. The best way to prepare for this challenge is to raise awareness about the power of GANs.

Infos

  1. Portrait of Edmond de Belamy: https://en.wikipedia.org/wiki/Edmond_de_Belamy
  2. Obvious: https://obvious-art.com/page-projects/
  3. NightCafé: https://creator.nightcafe.studio/my-creations
  4. Thispersondoesnotexist.com: https://thispersondoesnotexist.com/
  5. StyleGAN: https://github.com/NVlabs/stylegan
  6. Whichfaceisreal.com: https://www.whichfaceisreal.com/
  7. Nixon Deepfake: https://www.scientificamerican.com/article/a-nixon-deepfake-a-moon-disaster-speech-and-an-information-ecosystem-at-risk1/
  8. GameGAN: https://nv-tlabs.github.io/gameGAN/
  9. GANTheftAuto on GitHub: https://github.com/Sentdex/GANTheftAuto
  10. GANTheftAuto on YouTube: https://www.youtube.com/watch?v=udPY5rQVoW0
  11. Code and Hyperparameters for the Paper "Generative Adversarial Networks": https://github.com/goodfeli/adversarial
  12. "Generative Adversarial Networks" by Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio: https://arxiv.org/abs/1406.2661
  13. gans-collection.torch: https://github.com/nashory/gans-collection.torch
  14. Tooling for GANs in TensorFlow: https://github.com/tensorflow/gan
  15. Invertible Conditional GANs for Image Editing: https://github.com/Guim3/IcGAN
  16. TAC-GAN: https://github.com/dashayushman/TAC-GAN
  17. animeGAN: https://github.com/jayleicn/animeGAN
  18. GAN Lab: https://poloclub.github.io/ganlab/