DeepMind’s AI Can Recreate Entire Landscapes From Single Pictures

Jun.18.2018

Author: Justin Brunnette

Category: IT News


The recent breakthroughs in image rendering have been extraordinary. Just last year, Nvidia’s neural network research was able to generate photorealistic manipulations such as adding snow or rain to photographs. Although neural networks are still a very limited form of AI, they have proven to have effective predictive capabilities. Last week, Google’s subsidiary DeepMind published research showing that its neural networks can create various viewpoints of a scene from just a single image.
 
When fed an image from a single vantage point that contains objects of various sizes and shading, the AI can produce a full 3D rendering, predicting and estimating where related objects should stand in the scene. In previous iterations of this research, neural networks needed pictures of scenes with labels for objects or various depths. This required millions of images, usually each hand-labeled by human programmers. DeepMind has been able to circumvent this requirement by training its neural network with a method called the Generative Query Network (GQN).
 
The GQN model is made up of two networks: a representation network and a generation network. The representation network is fed various perspectives of a scene, and the generation network then predicts the scene from a queried perspective. As the representation network accumulates more and more images of a scene, it distills the information the generation network needs, such as object identities, positions, colors and room layout.
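To make that division of labor concrete, below is a minimal sketch of the two-network structure in PyTorch. The layer sizes, the plain fully-connected generator, the 7-dimensional viewpoint vector and the use of summation to pool per-view representations are simplifying assumptions for illustration, not DeepMind’s published architecture.

```python
# Simplified two-network GQN-style sketch (illustrative assumptions, not
# DeepMind's exact architecture).
import torch
import torch.nn as nn


class RepresentationNetwork(nn.Module):
    """Encodes one (image, camera viewpoint) pair into a scene representation."""

    def __init__(self, r_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> (N, 64, 1, 1)
        )
        self.fc = nn.Linear(64 + 7, r_dim)  # image features + viewpoint vector

    def forward(self, image, viewpoint):
        feats = self.conv(image).flatten(1)
        return self.fc(torch.cat([feats, viewpoint], dim=1))


class GenerationNetwork(nn.Module):
    """Predicts the view from a query viewpoint, given the scene representation."""

    def __init__(self, r_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(r_dim + 7, 512), nn.ReLU(),
            nn.Linear(512, 3 * 64 * 64), nn.Sigmoid(),
        )

    def forward(self, scene_repr, query_viewpoint):
        pixels = self.mlp(torch.cat([scene_repr, query_viewpoint], dim=1))
        return pixels.view(-1, 3, 64, 64)


# Pool the representations of several observed views of one scene by summing
# them, then render the scene from an unseen query viewpoint.
repr_net, gen_net = RepresentationNetwork(), GenerationNetwork()
images = torch.rand(3, 3, 64, 64)   # three observed 64x64 views of a scene
viewpoints = torch.rand(3, 7)       # camera position/orientation per view
scene_repr = repr_net(images, viewpoints).sum(dim=0, keepdim=True)
query_viewpoint = torch.rand(1, 7)
predicted_view = gen_net(scene_repr, query_viewpoint)  # (1, 3, 64, 64) image
```

The key idea the sketch tries to capture is that adding more observed views only changes the pooled scene representation, not the generator, so the same generation network can be queried from any viewpoint.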
 
The GQN model essentially learns by itself how to extract this information from the pixels it receives. From this collection of data, it picks up statistical patterns such as the typical colors of the sky or the symmetries of certain objects. Handling such regularities statistically frees up some of its capacity for the more abstract details of a scene.
 
As DeepMind’s researchers describe it, “Much like infants and animals, the GQN learns by trying to make sense of its observations of the world around it.” The GQN is essentially able to ‘imagine’ new scenes that have not been observed and to predict what objects would look like from the other side. This understanding of spatial relationships allows it to control a virtual robot arm, moving a ball around and even self-correcting.
 
There are still limitations to this model, as it has only been trained on computer-generated scenery. DeepMind hopes to extend the technique to real-life scenes and photographs. This sort of scene understanding may contribute much to research on querying scenes across space and time, as well as to virtual and augmented reality development.
 
For those who are curious, the datasets used in their experiments are publicly available on GitHub: https://github.com/deepmind/gqn-datasets
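For readers who want to poke at the raw data, the sketch below shows how TFRecord-style files could be inspected with TensorFlow’s tf.data API. The feature keys (‘frames’, ‘cameras’) and the file name are assumptions for illustration only; the reader code in the repository is the authoritative reference for the actual record format.

```python
# Illustrative sketch: inspect one serialized record with tf.data.
# The feature keys and file name are hypothetical placeholders.
import tensorflow as tf


def parse_example(serialized):
    # Hypothetical schema: a variable number of encoded views of a scene plus
    # the flattened camera (viewpoint) parameters for each view.
    return tf.io.parse_single_example(
        serialized,
        {
            "frames": tf.io.VarLenFeature(tf.string),
            "cameras": tf.io.VarLenFeature(tf.float32),
        },
    )


# Placeholder path: point this at a record file obtained from the repository.
dataset = tf.data.TFRecordDataset(["rooms_ring_camera-train.tfrecord"])
for example in dataset.take(1).map(parse_example):
    print("encoded views:", example["frames"].dense_shape.numpy())
    print("camera values:", example["cameras"].dense_shape.numpy())
```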

Original Article: https://deepmind.com/blog/neural-scene-representation-and-rendering/