Deep-reinforcement learning aims to make robots learn faster than ever

Newsroom24x7 Desk

San Francisco: A new artificial intelligence startup called Osaro aims to turbocharge industrial robots, enabling the machines to learn faster than ever. Osaro, backed by $3.3 million in investments from the likes of Peter Thiel and Jerry Yang, claims it has refined the algorithms that take deep-reinforcement learning to the next level, delivering the same superhuman AI performance but over 100 times as fast.

Deep-reinforcement learning is an offshoot of a more familiar technique, deep learning — a method of using multiple layers of neural networks to efficiently process and organize mountains of raw data. Deep learning now underlies many of the best facial recognition, video classification, and text or speech recognition systems from Google, Microsoft, and IBM Watson. Deep-reinforcement learning adds control to the mix, building on deep learning’s ability to accurately classify inputs. These learning systems train themselves automatically by repeating a task over and over again until they reach their goal. The power of deep reinforcement lies in the premise that the system can discover behaviors that a human would not have guessed or thought to hand-code.
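The core loop described above — repeating a task and updating value estimates until the goal is reached — can be sketched with a minimal tabular Q-learning example. This is an illustrative toy (a five-cell corridor with a goal at one end), not Osaro's system, and it omits the deep neural networks a real deep-RL agent would use in place of the lookup table:

```python
import random

# Minimal tabular sketch of the reinforcement-learning loop:
# the agent repeats the task (walking a 5-cell corridor to a goal)
# until trial and error discovers the rewarding behavior.
N_STATES = 5            # cells 0..4; the goal is cell 4
ACTIONS = [+1, -1]      # step right or left
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy: mostly exploit, occasionally explore
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            # Q-learning update toward reward plus discounted future value
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

def greedy_path(q):
    """Follow the learned policy greedily from the start cell."""
    s, path = 0, [0]
    while s != N_STATES - 1 and len(path) < 20:
        a = max(ACTIONS, key=lambda act: q[(s, act)])
        s = min(max(s + a, 0), N_STATES - 1)
        path.append(s)
    return path

q = train()
print(greedy_path(q))  # → [0, 1, 2, 3, 4]
```

Nothing in the loop hand-codes "go right"; the rightward policy emerges purely from repeated trials and the reward signal, which is the premise the article describes.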

Training a new AI system from a blank slate, however, can take a long time. A robot is a physically embodied system that takes time to move through space, and Osaro says that using basic deep-reinforcement learning to teach a robot to pick up a cup from scratch could literally take a year or more. To accelerate training, Osaro has modeled its data-collection technique on the way people learn most activities: by watching other people. Its program starts by observing a human play several games, then uses those behaviors as a jumping-off point for its own training. It doesn’t copy the human, and one doesn’t have to play precisely or very well; one just has to give it a reasonable idea of what to do.
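The idea of using demonstrations as a jumping-off point can be sketched by seeding the learner's value estimates from an imperfect human demo before normal reinforcement learning refines them. This is a hedged toy illustration on the same corridor task, not Osaro's actual method; the function names and the demonstration format are assumptions made for the example:

```python
import random

# Sketch: bias initial value estimates with a human demonstration,
# then let ordinary Q-learning refine them. The demo need not be
# perfect or copied exactly; it only gives a reasonable idea of
# what to do.
N_STATES, GOAL = 5, 4
ACTIONS = [+1, -1]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def seed_from_demo(demo):
    """Give demonstrated (state, action) pairs a small head start."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for s, a in demo:
        q[(s, a)] += 0.1
    return q

def refine(q, episodes=50, seed=1):
    """Standard Q-learning on top of the demonstration-seeded values."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == GOAL else 0.0
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

# An imperfect demo: mostly rightward, with one wrong move at state 2.
demo = [(0, +1), (1, +1), (2, -1), (2, +1), (3, +1)]
q = refine(seed_from_demo(demo))

path, s = [0], 0
while s != GOAL and len(path) < 20:
    s = min(max(s + max(ACTIONS, key=lambda act: q[(s, act)]), 0),
            N_STATES - 1)
    path.append(s)
print(path)  # → [0, 1, 2, 3, 4]
```

Because the seeded values already point the learner roughly the right way, far fewer trial-and-error episodes are needed than when starting from a blank slate, which mirrors the speed-up the article attributes to learning from observation.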

Osaro’s first application for its deep-reinforcement learning technology is likely to be high-volume manufacturing, where reprogramming assembly line robots can currently take weeks of effort from highly skilled (and highly paid) professionals. In the future, the plan is to be able to give a robot three buckets of parts, show it a finished product, and simply say, ‘Make something like this.’ That remains a distant goal, however; for now, the team’s next step is to run simulated robotic demos in a virtual environment called Gazebo before launching with industrial robot manufacturers and their customers in 2017.