Generative Deep Learning

Ecole Nationale Supérieure de Cognitique

Baptiste Pesquet


Summary

  • Neural Style Transfer
  • Generative Adversarial Networks (GAN)

Neural Style Transfer

Neural Style Transfer in a nutshell

  • Reproduce an image with a new artistic style provided by another image.
  • Blend a content image and a style reference image in a stylized output image.
  • First described in A Neural Algorithm of Artistic Style by Gatys et al. (2015). Many refinements and variations since.

Example (Prisma app)

Prisma style transfer example

Underlying idea

As always: loss minimization.

Total loss = content loss + style loss (+ total variation loss), each term weighted by a coefficient.

The content loss

  • Content = high-level structure of an image.
  • Can be captured by an upper layer of a convolutional neural network.
  • Content loss for a layer = distance between the feature maps of the content and generated images.
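
Below, a minimal sketch of this layer-wise content loss, assuming TensorFlow and feature maps already extracted from a pretrained convnet such as VGG19 (the mean squared distance is one common choice of metric):

```python
import tensorflow as tf

def content_loss(content_features, generated_features):
    """Content loss for one layer: mean squared distance between the
    feature maps of the content image and of the generated image."""
    return tf.reduce_mean(tf.square(generated_features - content_features))
```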

The style loss

  • Style = low-level features of an image (textures, colors, visual patterns).
  • Can be captured by using correlations across the different feature maps (filter responses) of a convnet.
  • Feature correlations are computed via a Gram matrix (the matrix of pairwise inner products between the flattened feature maps of a given layer).
  • Style loss for a layer = distance between the Gram matrices of the feature maps for the style and generated images.
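
A possible TensorFlow implementation, assuming feature maps of shape (height, width, channels):

```python
import tensorflow as tf

def gram_matrix(features):
    """Pairwise inner products between the flattened feature maps
    (channels) of a layer, normalized by the number of positions."""
    channels = tf.shape(features)[-1]
    flat = tf.reshape(features, (-1, channels))      # (positions, channels)
    gram = tf.matmul(flat, flat, transpose_a=True)   # (channels, channels)
    return gram / tf.cast(tf.shape(flat)[0], tf.float32)

def style_loss(style_features, generated_features):
    """Style loss for one layer: mean squared distance between the Gram
    matrices of the style image and of the generated image."""
    return tf.reduce_mean(
        tf.square(gram_matrix(generated_features) - gram_matrix(style_features)))
```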

The total variation loss

  • Sum of the absolute differences between neighboring pixel values in an image. Measures how much noise is in the image.
  • Encourages spatial continuity in the generated image (denoising effect).
  • Acts as a regularization loss.
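
A sketch for a batch of images; note that TensorFlow also provides an equivalent built-in, tf.image.total_variation:

```python
import tensorflow as tf

def total_variation_loss(images):
    """Sum of absolute differences between neighboring pixel values,
    along both spatial axes. Input shape: (batch, height, width, channels)."""
    dh = tf.abs(images[:, 1:, :, :] - images[:, :-1, :, :])  # vertical neighbors
    dw = tf.abs(images[:, :, 1:, :] - images[:, :, :-1, :])  # horizontal neighbors
    return tf.reduce_sum(dh) + tf.reduce_sum(dw)
```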

Gradient descent

  • Objective: minimize the total loss.
  • Optimizer: L-BFGS (original choice made by Gatys et al.) or Adam.
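
A minimal Adam-based optimization loop in TensorFlow; compute_losses is a hypothetical helper returning the three losses for the current image, and the loss weights are illustrative, not the paper's values:

```python
import tensorflow as tf

# Illustrative weights for combining the three losses (not the paper's values)
CONTENT_WEIGHT, STYLE_WEIGHT, TV_WEIGHT = 1.0, 1e-2, 1e-4

# content_image: assumed preloaded, preprocessed image tensor
generated = tf.Variable(content_image)  # the image itself is the trainable variable
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)

for step in range(1000):
    with tf.GradientTape() as tape:
        # compute_losses: hypothetical helper computing the three losses
        # from feature maps of a pretrained convnet (e.g. VGG19)
        c_loss, s_loss, tv_loss = compute_losses(generated)
        total_loss = (CONTENT_WEIGHT * c_loss
                      + STYLE_WEIGHT * s_loss
                      + TV_WEIGHT * tv_loss)
    grads = tape.gradient(total_loss, generated)
    optimizer.apply_gradients([(grads, generated)])  # update the image directly
```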

Animation of style transfer

Generative Adversarial Networks (GAN)

GAN in a nutshell

  • Simultaneously train two models:
    • One tries to generate realistic data.
    • The other tries to discriminate between real and generated data.
  • Each model is trained to best the other.
  • First described in Generative Adversarial Nets by Goodfellow et al. (2014).
  • NIPS 2016 Tutorial

GAN overview

GAN process

Training process

  • The generator creates images from random noise.
  • Generated images are mixed with real ones.
  • The discriminator is trained on these mixed images.
  • The generator’s parameters are updated in a direction that makes the discriminator more likely to classify generated data as “real”.
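
A minimal TensorFlow sketch of one such training step, assuming generator and discriminator are Keras models and that the discriminator outputs raw logits (names and latent size are illustrative):

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def train_step(real_images, generator, discriminator, gen_opt, disc_opt,
               latent_dim=100):
    # The generator creates images from random noise
    noise = tf.random.normal((tf.shape(real_images)[0], latent_dim))
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator: classify real images as 1 and generated ones as 0
        disc_loss = (cross_entropy(tf.ones_like(real_logits), real_logits)
                     + cross_entropy(tf.zeros_like(fake_logits), fake_logits))
        # Generator: make the discriminator classify generated images as "real"
        gen_loss = cross_entropy(tf.ones_like(fake_logits), fake_logits)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_opt.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
    gen_opt.apply_gradients(zip(gen_grads, generator.trainable_variables))
```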

Specificities and gotchas

  • A GAN is a dynamic system that evolves at each training step.
  • Interestingly, the generator never sees images from the training set directly: all its information comes from the discriminator.
  • Training can be tricky: noisy generated data, vanishing gradients, domination of one side…
  • GAN convergence theory is an active area of research.
  • GAN Open Questions

GAN progress on face generation

GAN progress from 2014 to 2018

The GAN landscape

GAN flavours

Some GAN flavours

  • DCGAN (2016): use deep convolutional networks for the generator and the discriminator (a generator sketch follows this list).
  • CycleGAN (2017): image-to-image translation in the absence of any paired training examples.
  • StyleGAN (2019): fine control of output images.
  • GAN - The Story So Far
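
As mentioned above, a minimal Keras sketch of a DCGAN-style generator that upsamples a noise vector into a 64×64 RGB image with transposed convolutions (layer sizes and hyperparameters are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_dcgan_generator(latent_dim=100):
    """Maps a latent noise vector to a 64x64 RGB image by
    progressively upsampling with transposed convolutions."""
    return tf.keras.Sequential([
        layers.Dense(8 * 8 * 256, use_bias=False, input_shape=(latent_dim,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((8, 8, 256)),
        layers.Conv2DTranspose(128, 5, strides=2, padding="same", use_bias=False),  # 16x16
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", use_bias=False),   # 32x32
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"), # 64x64
    ])
```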

GAN use cases: not just images!