Reinforcement Learning will Supercharge Machine Learning

April 2018

Lately there has been a stream of news about algorithms that learn on their own and achieve superhuman performance in finding optimal solutions to various problems. The application that made the biggest splash in the AI field is a computer program that became the best Go player in the world, defeating world champions and even starring in a Netflix documentary.

The technique behind these self-learning algorithms is called Reinforcement Learning. While it is not a new concept, recent research has yielded promising results that have been applied to problems ranging from gaming to cybersecurity, and it is even making an impact on the design of energy-efficient systems.


What is Reinforcement Learning?

Before describing some of those new applications, it is worth spending a few minutes to explain what Reinforcement Learning is all about. At its core, Reinforcement Learning (RL) is a technique in which a system learns a sequence of decisions that helps it reach a long-term objective. RL works by trial and error: the system tries an action, observes the outcome, and evaluates the action based on that outcome. Imagine a robot taking its first steps in a maze. The robot is rewarded for every step it takes towards the maze’s exit and penalized when it is blocked by the maze’s walls. Eventually the robot will not only find its way out of the maze, but will also find the most rewarding path. Whether the most rewarding path is the shortest one, the one with the fewest turns or the one that covers the most ground is completely up to us, the designers of the algorithm.
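To make the maze example concrete, here is a minimal sketch using tabular Q-learning, one common RL method. Everything in it is an assumption made for illustration: the 4x4 maze layout, the reward values (a small cost per step, a larger penalty for hitting a wall, a bonus at the exit) and the learning settings. With these rewards, "most rewarding" means "shortest".

```python
import random

# Hypothetical 4x4 maze: 'S' start, 'E' exit, '#' wall, '.' free cell.
MAZE = [
    "S.#.",
    ".##.",
    "....",
    "#.#E",
]
START = (0, 0)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    inside = 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])
    if not inside or MAZE[nr][nc] == "#":
        return state, -1.0, False      # blocked: penalty, robot stays put
    if MAZE[nr][nc] == "E":
        return (nr, nc), 10.0, True    # reached the exit: big reward
    return (nr, nc), -0.1, False       # ordinary step: small cost


def train(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q[(state, action)], an estimate of long-term reward."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):           # cap the length of one attempt
            if rng.random() < epsilon:  # occasionally try a random move
                action = rng.randrange(len(ACTIONS))
            else:                       # otherwise take the best known move
                action = max(range(len(ACTIONS)),
                             key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(
                q.get((nxt, a), 0.0) for a in range(len(ACTIONS)))
            old = q.get((state, action), 0.0)
            # Q-learning update: nudge the estimate towards
            # reward + discounted value of the best follow-up move.
            q[(state, action)] = old + alpha * (
                reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, limit=20):
    """Follow the highest-valued action from the start cell."""
    state, path = START, [START]
    for _ in range(limit):
        action = max(range(len(ACTIONS)),
                     key=lambda a: q.get((state, a), 0.0))
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Changing the numbers returned by `step` changes what the robot considers the best path, which is exactly the design freedom described above.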




Reinforcement Learning Could Help Forge the Mona Lisa

In March 2018, researchers at DeepMind, the London-based AI research company, published a paper presenting SPIRAL, a software agent that “learns” how to generate meaningful images by leveraging RL. SPIRAL was given a task: interact with a drawing program to copy an image and make the copy as close as possible to the original. Incentivized by rewards, SPIRAL tried different options and maximized its reward by getting closer to its objective, which was to fool another AI algorithm that specializes in telling originals from replicas. In the art world, this process could be compared to an artist trying to recreate an exact copy of da Vinci’s Mona Lisa while a world-renowned expert takes a close look at each painting, looks for the master’s signature brush strokes and decides whether the painting is the original masterpiece or just another replica. As the expert becomes less and less sure about which painting is the original and which is the replica, the artist earns more rewards and produces better, more accurate copies of the original piece. SPIRAL achieved impressive results, performing on par with or better than previous deep learning agents for image generation. Since SPIRAL leverages Reinforcement Learning, it is completely self-taught, meaning that no human intervention and no labeled training data are required to guide the agent’s learning process.


Why this is important

As recent applications of RL show, the technique is best applied to tasks in which an intelligent agent uses sensory information (think IoT) to learn about its environment, as in the robot-in-the-maze example earlier, and to learn the best behavior (winning a Go game, navigating a maze or creating exact replicas).

Yet, as previously pointed out, enterprises should not just marvel at new technologies as they are being developed but always seek out the most practical uses that bring value to their organization.

The process of designing an optimal machine learning algorithm is laborious, requiring validation and many trial-and-error iterations. Furthermore, a great deal of intuition and intimate knowledge of the algorithm’s architecture is often required. Recent studies from Berkeley and MIT show that RL can be used to speed up some of the crucial steps of machine learning algorithm development by automating their design. A Google Brain publication has already demonstrated that a better algorithm can be generated automatically using RL-based approaches. By leveraging a reward scheme, the RL algorithm learns how to select the best-performing machine learning algorithm, design an optimal architecture and find the best parameter settings for it.
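The idea of a reward scheme guiding model selection can be illustrated with a deliberately simplified sketch. It uses an epsilon-greedy multi-armed bandit, one of the simplest forms of RL, and not the method from the publications mentioned above. The candidate list, the `validation_score` stub and its made-up accuracy numbers are all hypothetical; in practice the reward would come from actually training and validating each candidate.

```python
import random

# Hypothetical candidates: (algorithm name, hyperparameter setting).
CANDIDATES = [
    ("decision_tree", {"max_depth": 3}),
    ("decision_tree", {"max_depth": 10}),
    ("knn", {"k": 5}),
    ("linear", {"l2": 0.1}),
]

# Made-up mean validation accuracies standing in for real training runs.
ASSUMED_MEAN_SCORE = [0.71, 0.78, 0.74, 0.69]


def validation_score(index, rng):
    """Stub: noisy validation accuracy for candidate `index`."""
    return ASSUMED_MEAN_SCORE[index] + rng.gauss(0.0, 0.02)


def search(trials=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: the reward is the validation score."""
    rng = random.Random(seed)
    n = len(CANDIDATES)
    counts = [0] * n
    # Start optimistic (1.0) so every candidate gets tried at least once.
    values = [1.0] * n
    for _ in range(trials):
        if rng.random() < epsilon:      # explore a random candidate
            i = rng.randrange(n)
        else:                           # exploit the best estimate so far
            i = max(range(n), key=lambda j: values[j])
        reward = validation_score(i, rng)
        counts[i] += 1
        # Running mean of the rewards observed for candidate i.
        values[i] += (reward - values[i]) / counts[i]
    best = max(range(n), key=lambda j: values[j])
    return CANDIDATES[best], values


if __name__ == "__main__":
    best, estimates = search()
    print(best, [round(v, 3) for v in estimates])
```

The approaches described in the research go much further, for example by generating whole architectures step by step, but the loop is the same: propose a design, measure it, and use the measurement as the reward.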


Wrapping up

A few years ago, the availability of deep learning algorithms, combined with powerful processing, helped automate the way data scientists find valuable characteristics (or features) to build their machine learning models. This new capability, automatic feature engineering, has helped speed up machine learning model development. We anticipate that adding RL to the data scientist’s toolkit and leveraging it in machine learning algorithm design will supercharge the AI field across all industries yet again.
