
Ethics in Generative AI : Detecting Fake Faces in Videos

June 26, 2019

Technology is inherently about humans, and it is perilous to ignore its social and psychological impact while creating it. As engineers, we must be aware of the unintended consequences of the technology we create.

With the advent of automotive AI and the recent impact of social media platforms on elections, ethics in AI has become one of the major areas of research. A few important questions in ethical AI (the list is not exhaustive) are:

Algorithmic Bias : ML algorithms trained on biased data reinforce that bias in their results and recommendations. For example, machines taught by photos and text from the internet learn a sexist view of women.

  • Below is an example that you can try out for yourself in Google Translate. Having learnt that the vector representation (word vector) of “SHE” is more closely related to the word nurse than to the word doctor, the system reinforces that bias unintentionally when a sentence is translated to Turkish and back to English [ReadMore]
Left: English to Turkish. Right: Turkish to English.

Autonomy & System Design : Whose life should your self-driving car save in case of an unavoidable accident? [ReadMore]

Governance in AI : What are the labor and regulation laws relating to automation and robots? [ReadMore]

Generative AI : Images and videos now created by algorithms (GANs) are virtually indistinguishable from real ones. This is leading to widespread dissemination of fake news. Check out this popular video where Barack Obama is speaking words he has never uttered in real life.

Fast.ai has a great set of blog posts on AI ethics here. Read more about the governance of AI in an MIT course here. I would like to focus more on the problems and ethics of generative AI.

Generative Adversarial Networks (GAN)

GANs are a class of machine learning models that allow us to create images algorithmically. They are being used in a myriad of innovative ways.

Create faces of people that never existed. This website creates a new human face every time you refresh the webpage.

Generate scenery from doodles. It’s like a colouring book picture that describes where a tree is, where the sun is, where the sky is.

Super resolution of images, i.e. creating a high resolution image from a low resolution one.

Apart from the innovative applications that GANs enable, their negative uses are much more terrifying. One such trend is the use of GANs to swap faces and generate fake videos.

 

DeepFakes is a term given to face-swapping techniques based on deep learning algorithms. DeepFakes have been used to create fake celebrity pornographic videos, fake news and malicious hoaxes.

Detecting fake faces in videos via FaceForensics++

This blog post shows in greater detail one of the ways to create face-swapping videos using GANs.

  • Owing to the realistic results, it is difficult for the human eye to separate fake images from real ones, especially at low or mobile resolution
  • Easy access to FakeApp, a desktop application for creating face-swapped videos, means that no technical knowledge is needed to produce them

Spotting such videos and images at scale is very difficult, especially on a mobile screen:

  • Fake artefacts in synthetic images are generally not visible to the human eye
  • Artefacts similar to fake ones are introduced into pristine images by compression or resizing
  • Almost all videos and images on social media undergo some form of compression for efficient storage

A conventional approach to forgery detection is multimedia forensics. It aims to ensure the authenticity and origin of an image or video, driven by handcrafted features that capture the expected statistical or physics-based artefacts that occur during image formation. Some of it is based on the observation that faces in DeepFake videos do not blink, a flaw that an expert forger can easily overcome [ReadMore]

There are still a few visible artefacts in synthetically generated images that can be used to separate them from their pristine counterparts:

  • Blurring over non facial areas
  • No Blinking of eyelids in fake videos
  • Change of texture along the edges of the face
  • Contour distortion in faces across frames

A simple approach is to build a supervised deep learning classifier that assigns any input image to one of two categories: real or fake. Training such a network requires a huge amount of training data with pairs of real and corresponding fake images. Since DeepFake is an umbrella term for all deep learning based fake image creation algorithms, the set of algorithms keeps expanding and evolving, which makes maintaining such a training dataset difficult.

FaceForensics++ is a research paper that implements a similar approach to the detection of fake images. It uses XceptionNet as its base model to perform binary classification. The Xception architecture is a deep convolutional neural network inspired by InceptionV3, in which the Inception modules are replaced with depthwise separable convolutions. With roughly the same number of parameters as InceptionV3, it performs better because it uses those parameters more efficiently.
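To give a feel for why depthwise separable convolutions use parameters efficiently, here is a small sketch that counts the weights of a single layer both ways. The layer sizes (3x3 kernel, 128 input channels, 256 output channels) are made up for illustration and are not taken from the Xception or FaceForensics++ papers, and bias terms are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a regular k x k convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise separable convolution (bias terms ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer sizes only, chosen for this example.
standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable)  # 294912 33920
```

For this hypothetical layer the separable version needs roughly a ninth of the weights, so a network with a fixed parameter budget can afford more or wider layers.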

The paper divides image manipulation techniques into two major modes:

  • Identity modification : These methods replace the face of a person with the face of another person. This is known as face swapping. [DeepFakes & Face-Swap]
  • Expression modification : These methods transfer the facial expressions of one person to another person in real time. [Face2Face]

The paper introduces a novel dataset of manipulated facial imagery composed of more than 1.5 million images from 1,000 videos, with pristine (i.e., real) sources and target ground truth to enable supervised learning. It collects images from three major sources:

  • Face2Face is a facial reenactment system that transfers the expressions of a source video to a target video while maintaining the identity of the target person.
  • FaceSwap is a conventional graphics-based approach to transfer the face region from a source video to a target video. The implementation is computationally lightweight and runs quite fast on a CPU alone.
  • DeepFakes is a synonym for face replacement that is based on deep learning. A face in a target sequence is replaced by a face that has been observed in a source video or image collection. There are various public implementations of DeepFakes available, most notably FakeApp.

To overcome the compression issues, the paper trains three networks for videos at different compression rates:

  • Raw: No Compression
  • High Quality Compression [c23]
  • Low Quality Compression [c40]
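As a hedged illustration of how such compressed variants could be produced, the sketch below builds (but does not run) an ffmpeg command for each level. It assumes that the labels c23 and c40 correspond to H.264 constant rate factors of 23 and 40, which is my reading of the paper; the helper name and file names are hypothetical.

```python
from typing import List

# Assumption: the labels map to H.264 constant rate factors of the same value.
CRF_BY_LEVEL = {"c23": 23, "c40": 40}


def ffmpeg_compress_command(src: str, dst: str, level: str) -> List[str]:
    """Build an ffmpeg argument list that re-encodes src at the given level."""
    if level not in CRF_BY_LEVEL:
        raise ValueError(f"unknown compression level: {level}")
    crf = CRF_BY_LEVEL[level]
    # -c:v libx264 selects the H.264 encoder, -crf sets the quality factor
    # (a higher value means stronger compression and lower quality).
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst]


# Hypothetical file names, for illustration only.
print(" ".join(ffmpeg_compress_command("input.mp4", "input_c40.mp4", "c40")))
```

The returned list can be passed to `subprocess.run` on a machine where ffmpeg is installed.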

The resulting trained network takes a video stream as input, detects a face in each frame and classifies it as real or fake.
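The per-frame flow described above can be sketched roughly as follows. This is not the paper's code: `detect_face` and `fake_probability` are hypothetical callables standing in for a real face detector and the trained network, and the 0.5 threshold is an assumption.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
Face = TypeVar("Face")


def classify_video(
    frames: Iterable[Frame],
    detect_face: Callable[[Frame], Optional[Face]],
    fake_probability: Callable[[Face], float],
    threshold: float = 0.5,
) -> List[Tuple[int, str]]:
    """Label every frame that contains a face as 'real' or 'fake'."""
    labels = []
    for index, frame in enumerate(frames):
        face = detect_face(frame)
        if face is None:
            continue  # frames without a detected face are skipped
        label = "fake" if fake_probability(face) >= threshold else "real"
        labels.append((index, label))
    return labels


# Toy stand-ins so the sketch runs without a real detector or network:
# each "frame" already carries the score its "face" would receive.
toy_frames = [{"face": 0.9}, {"face": None}, {"face": 0.2}]
result = classify_video(toy_frames, lambda f: f["face"], lambda face: face)
print(result)  # [(0, 'fake'), (2, 'real')]
```

Keeping the detector and classifier as parameters makes it easy to swap in one of the three compression-specific models.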

I created a repository of 13 DeepFakes videos juxtaposed with their real counterparts from playlists on YouTube. The trained XceptionNet classifies each frame with a detected face into one of two categories, real or fake. The results below were inferred with the XceptionNet models trained by the authors of the above paper.

Confusion matrix for all three compression level models

The highest accuracy and F-score numbers are obtained using the c23 model. But the precision values (for the hypothesis that a frame is fake) are still low. The network predicts a lot of the real frames as fake, which is undesirable because any real video passed through this network will also be tagged incorrectly. This is also visible in the gifs below.

Nicolas Cage on Amy Adams
Trump on Putin
John Oliver on Jimmy Fallon
Nicolas Cage on Russell Crowe

You can find all the inferred videos in the YouTube playlist here.
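For readers who want to see how the accuracy, precision and F-score discussed above follow from a confusion matrix, here is a minimal sketch with "fake" as the positive class. The counts are invented for illustration and are not the results of my experiment.

```python
from typing import Dict


def frame_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute frame-level metrics, treating 'fake' as the positive class.

    tp: fake frames predicted fake    fp: real frames predicted fake
    fn: fake frames predicted real    tn: real frames predicted real
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Made-up counts: many real frames are flagged as fake (fp is large),
# so precision suffers even though most fake frames are caught.
metrics = frame_metrics(tp=80, fp=60, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A large number of real frames tagged as fake shows up as a large `fp`, which pulls precision down without affecting recall.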

The ever-evolving nature of DeepFake algorithms and the distortions introduced into pristine videos by compression and resizing are a few of the major reasons that make such classification approaches difficult.

Spurious artefacts can be seen in pristine videos after hard compression

Since aggregating fake videos at scale is a laborious process, researchers working on generative algorithms could help by publicly sharing a repository of the fake data created by their algorithms.

This paper provides a great dataset for further research and is available publicly. It also sets a benchmark for face forensics using a data-driven approach. It is already being used in transfer learning techniques for other domains. We can also process the frames across the time axis to verify temporal smoothness. Since DeepFakes create faces independently across frames, we should expect the transition to be less smooth than in a real video. There are a few more deep learning based papers, especially ones that take temporal consistency into account for forensics, which I would like to investigate in upcoming posts.
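As a very rough sketch of the temporal smoothness idea, the snippet below measures how much a per-frame signal jumps between consecutive frames. The signal could be a facial landmark coordinate or a per-frame fake probability; that choice, the helper name and the toy values are all assumptions of mine, not something taken from the paper.

```python
from typing import Sequence


def temporal_jitter(signal: Sequence[float]) -> float:
    """Mean absolute change of a per-frame signal between consecutive frames."""
    if len(signal) < 2:
        return 0.0  # a single frame has no transitions to measure
    diffs = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return sum(diffs) / len(diffs)


# Toy values: a smoothly drifting signal versus one that jumps around.
smooth_signal = [0.0, 0.1, 0.2, 0.3]
jumpy_signal = [0.0, 0.5, 0.1, 0.6]

print(temporal_jitter(smooth_signal) < temporal_jitter(jumpy_signal))  # True
```

A higher value suggests less smooth transitions, which by the reasoning above would be more typical of frames generated independently of each other.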