New Machine Learning Advance Produces Deepfakes: Videos Approaching Real-Life Quality

Oct.02.2018

Author: Justin Brunnette

Category: IT News

Falsified information has received a great deal of publicity in the American media recently, though the weaponization of information is not a particularly new issue. Even outside the realm of politics, it matters that individuals can evaluate information and think critically about what is valid and what is not. This will no doubt become more difficult as AI is applied to synthesizing media.

We have already seen some innovation in this area: at the end of last year, Nvidia presented AI image-rendering results that change the weather in an image, turn day into night, or add snow to a summer photograph, with results that appear nearly indistinguishable from real photos. Now researchers at Carnegie Mellon University have developed a new method, built on generative adversarial networks (GANs), that synthesizes movement in images to a convincing degree.

The researchers employed a class of algorithms called GANs, which are composed of two models: a “generator” and a “discriminator”. As in other GAN-based systems, the two are trained against each other: the “generator” model makes images, while the “discriminator” in a sense tests the results.

The discriminator model is taught, let’s say, President Obama’s speech patterns very closely, learning small details such as how his head shifts after certain words, the pace of his speech after specific words, or how he uses hand gestures over the course of a conversation. The generator model then learns to produce images in a style that will trick the discriminator model. The discriminator model tests those images and scores the effectiveness of the generator model.
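The back-and-forth between the two models can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions, not the researchers' implementation: the "images" are single numbers, the generator is a line `a * z + b`, the discriminator is a one-input logistic classifier, and every name here (`train_step`, `generator`, `discriminator`, the learning rate, the target mean of 4.0) is hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def discriminator(x, w, c):
    """Score in (0, 1): how strongly the discriminator believes x is real."""
    return sigmoid(w * x + c)


def generator(z, a, b):
    """Turn a noise value z into a fake sample (toy stand-in for an image)."""
    return a * z + b


def train_step(params, real, noise, lr=0.1):
    """One adversarial round: update the discriminator, then the generator."""
    w, c, a, b = params
    n = len(real)

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
    fake = [generator(z, a, b) for z in noise]
    grad_w = grad_c = 0.0
    for x_real, x_fake in zip(real, fake):
        d_real = discriminator(x_real, w, c)
        d_fake = discriminator(x_fake, w, c)
        grad_w += (1.0 - d_real) * x_real - d_fake * x_fake
        grad_c += (1.0 - d_real) - d_fake
    w += lr * grad_w / n
    c += lr * grad_c / n

    # Generator: gradient ascent on log D(fake), i.e. try to fool the judge.
    grad_a = grad_b = 0.0
    for z in noise:
        d_fake = discriminator(generator(z, a, b), w, c)
        grad_a += (1.0 - d_fake) * w * z
        grad_b += (1.0 - d_fake) * w
    a += lr * grad_a / n
    b += lr * grad_b / n

    # The discriminator's average score on fresh fakes rates the generator.
    score = sum(discriminator(generator(z, a, b), w, c) for z in noise) / n
    return (w, c, a, b), score


# Demo with assumed toy data: "real" samples cluster around 4.0.
random.seed(0)
params = (0.0, 0.0, 1.0, 0.0)  # w, c, a, b
for _ in range(200):
    real_batch = [random.gauss(4.0, 0.5) for _ in range(32)]
    noise_batch = [random.gauss(0.0, 1.0) for _ in range(32)]
    params, score = train_step(params, real_batch, noise_batch, lr=0.05)
print("generator offset b:", params[3], "discriminator score on fakes:", score)
```

Real systems replace both toy functions with deep neural networks and numbers with video frames, but the alternating update is the same idea: the discriminator's score is the only feedback the generator receives.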

The GAN model is used to translate one image into another. The researchers, for example, rendered footage of President Obama speaking from footage of President Trump. An advantage of the GAN model is that it is able to predict the next trajectories or movements in an image from only a single image. The GAN model is also unsupervised, meaning that the training footage does not have to be manually aligned; instead the model is self-correcting, so it can quickly improve its results and learn new processes on its own.
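One common way to train image translation without manually matched pairs is a cycle-consistency check: translate an image to the other style, translate it back, and penalize any difference from the original. The sketch below illustrates that general idea only. This article does not spell out the losses the researchers used, so treat the technique as an assumption, and note that `cycle_loss`, `brighten`, `darken`, and `bad_darken` are hypothetical names with lists of numbers standing in for images.

```python
def cycle_loss(images, to_target, to_source):
    """Mean absolute error between each image and its round-trip version."""
    total = 0.0
    count = 0
    for image in images:
        restored = to_source(to_target(image))
        for original_px, restored_px in zip(image, restored):
            total += abs(original_px - restored_px)
            count += 1
    return total / count


# Toy "translators" acting on lists of pixel intensities (assumed examples).
def brighten(image):
    return [2.0 * p + 1.0 for p in image]


def darken(image):
    # Exact inverse of brighten, so a round trip loses nothing.
    return [(p - 1.0) / 2.0 for p in image]


def bad_darken(image):
    # Imperfect inverse: every pixel comes back 0.5 too bright.
    return [p / 2.0 for p in image]


frames = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
good = cycle_loss(frames, brighten, darken)
bad = cycle_loss(frames, brighten, bad_darken)
print(good, bad)  # prints 0.0 0.5
```

A loss of zero means the round trip reproduces the input exactly, which is the signal that lets such a model correct itself without anyone pairing up frames by hand.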

The result is that the researchers can make footage of people appearing to say anything they like. One example of what is possible is the footage of President Obama speaking words recorded by comedian Jordan Peele, seen at the YouTube link below:

https://youtu.be/cQ54GDm1eL0

This model has much potential in movie production, for example replacing noticeably unrealistic CGI with a more realistic rendering from the GAN model. But since the algorithm is only going to improve from here, it is well within the realm of possibility that more and more video footage will be falsified with this type of model. In the era of “Fake News”, with information manipulation on the rise, it is always better to be aware of what the technology can do.

Original Article: https://arxiv.org/pdf/1808.05174.pdf