Artificial Intelligence (AI) is getting more advanced and humanlike every day, which makes it both creepy and fantastic at the same time. But can you imagine robots competent enough to learn from their own mistakes? OpenAI researchers have developed an algorithm that does exactly this.
OpenAI is a non-profit AI research organization sponsored by various companies and individuals. Its aim is to make safe artificial general intelligence (AGI) as widely available as possible.
The new algorithm, called Hindsight Experience Replay (HER), makes robots a little more humanlike by giving them the ability to learn from their failures. The key is that whatever outcome an attempt actually reaches is treated as a small success, even if the attempt technically failed at its original goal. Ordinarily, a robot trained by trial and error learns little from a failed attempt, because the only signal it receives is whether it achieved its programmed goal. HER is OpenAI's way around that limitation.
In short, HER helps the agent learn from its own experience, much as we do in everyday life. OpenAI has released the technique in an open-source software package, together with eight simulated robotics environments (based on the Fetch and ShadowHand robots) for Gym, a toolkit created for reinforcement learning (RL).
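To give a feel for what a goal-based environment looks like, here is a minimal sketch in plain Python. It imitates the general shape of Gym's goal-based robotics interface, where each observation is a dictionary holding `observation`, `achieved_goal` and `desired_goal`, and the reward can be recomputed for any goal. `ToyReachEnv`, its one-dimensional world and all of its parameters are hypothetical stand-ins for illustration; this is not one of the released Fetch or ShadowHand environments.

```python
class ToyReachEnv:
    """Hypothetical 1-D stand-in for a goal-based robotics environment."""

    def __init__(self, goal=5, tolerance=0, max_steps=10):
        self.goal = goal              # position the agent is asked to reach
        self.tolerance = tolerance    # how close counts as success
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def _obs(self):
        # Goal-based observations keep the achieved and desired goals separate.
        return {
            "observation": self.position,
            "achieved_goal": self.position,
            "desired_goal": self.goal,
        }

    def compute_reward(self, achieved_goal, desired_goal, info=None):
        # Sparse reward (assumed convention): 0 on success, -1 otherwise.
        close_enough = abs(achieved_goal - desired_goal) <= self.tolerance
        return 0.0 if close_enough else -1.0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self._obs()

    def step(self, action):
        # The action is a move of -1 or +1 along the line.
        self.position += action
        self.steps += 1
        obs = self._obs()
        reward = self.compute_reward(obs["achieved_goal"], obs["desired_goal"])
        done = reward == 0.0 or self.steps >= self.max_steps
        return obs, reward, done, {"is_success": reward == 0.0}


env = ToyReachEnv(goal=2)
first_obs = env.reset()
obs_1, reward_1, done_1, info_1 = env.step(+1)   # one step short of the goal
obs_2, reward_2, done_2, info_2 = env.step(+1)   # goal reached
```

Because `compute_reward` takes the goal as an argument instead of reading it from the environment, the same recorded step can later be scored against a different goal, which is what HER relies on.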
“HER allows an agent to learn from failures.”
The researchers pointed out that the main idea behind HER mirrors something humans do regularly: we learn from failure as well as from success.
Imagine you are playing football. You mean to score a goal, but you miss, and the ball lands just a few centimetres from where you aimed. Yes, you failed, but you also learned something. The next time you kick the ball, you have a better idea of the force and direction you need to score. Learning from mistakes is essential to building skills. By achieving small, secondary goals, we move closer to the original one. So why not pretend that the outcome you got was the one you planned?
“This helps us learn how to achieve all possible goals.”
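The "pretend it was planned" trick can be sketched in a few lines. The example below assumes the simplest variant, in which the substitute goal is whatever state the episode ended in; the tuple layout, the function names and the -1/0 reward convention are illustrative assumptions rather than OpenAI's actual code.

```python
def sparse_reward(achieved, goal):
    # Assumed convention: 0 when the goal is reached, -1 otherwise.
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_goal(episode):
    """Replay an episode as if its final state had been the goal all along.

    episode: list of (state, action, next_state, goal) tuples.
    Returns a list of (state, action, next_state, new_goal, reward) tuples.
    """
    hindsight_goal = episode[-1][2]  # the state actually reached at the end
    relabeled = []
    for state, action, next_state, _original_goal in episode:
        reward = sparse_reward(next_state, hindsight_goal)
        relabeled.append((state, action, next_state, hindsight_goal, reward))
    return relabeled


# A failed attempt: the agent wanted to reach position 5 but stopped at 3.
episode = [(0, +1, 1, 5), (1, +1, 2, 5), (2, +1, 3, 5)]

original_rewards = [sparse_reward(ns, g) for (_, _, ns, g) in episode]
hindsight_rewards = [t[4] for t in relabel_with_final_goal(episode)]
```

Scored against the original goal, every step of this attempt is a failure. Scored against the hindsight goal, the last step becomes a success, so the agent gets a usable learning signal from an episode that would otherwise teach it nothing.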
HER does not depend on a carefully hand-crafted reward function. The reward can stay sparse and binary, just 0 and 1 (failure and success respectively), rather than something more complex. With rewards like these, an agent that never reaches its primary goal would normally receive no useful signal. HER replays each attempt as if the outcome it actually reached had been the goal, so the agent learns to achieve many secondary goals on the way to the primary one.
Although Hindsight Experience Replay is clearly an improvement, its performance still differs significantly between the two types of rewards, sparse and dense.
The accompanying line graph indicates that DDPG (Deep Deterministic Policy Gradient) + HER with sparse rewards outperforms the other DDPG variants, while DDPG + HER with dense rewards performs worse.
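For readers unfamiliar with the two reward types being compared, the sketch below shows one common way to define each. The function names, the distance tolerance and the -1/0 convention for the sparse case are assumptions made for illustration (a 0/1 convention differs only by a constant shift); the exact reward definitions used in the experiments are not given in this article.

```python
import math


def sparse_reward(achieved, goal, tolerance=0.05):
    # Binary signal: 0 if the achieved position is within `tolerance`
    # of the goal, -1 otherwise. It says nothing about how close a miss was.
    return 0.0 if math.dist(achieved, goal) <= tolerance else -1.0


def dense_reward(achieved, goal):
    # Shaped signal: the negative distance to the goal, so every step
    # gets graded feedback about how far away the agent still is.
    return -math.dist(achieved, goal)


goal = (1.0, 0.0)
near_miss = (0.98, 0.0)   # inside the tolerance
far_miss = (0.0, 0.0)     # one unit away

sparse_near = sparse_reward(near_miss, goal)
sparse_far = sparse_reward(far_miss, goal)
dense_near = dense_reward(near_miss, goal)
dense_far = dense_reward(far_miss, goal)
```

The dense version looks more informative, which makes the reported result notable: with HER, the plain sparse signal is the one that performs better.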
Taken together, this shows that HER offers a new take on Q-learning: instead of a bare “positive” or “negative” outcome, each attempt yields at least one “non-negative” result, because the agent has achieved at least some goal.
The OpenAI team reports that, despite the success already achieved, it plans to keep developing HER. In its view, the approach still leaves a lot of room for improvement. Among other suggestions, the researchers point to the following:
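As a rough illustration of how a goal enters the Q-learning update, here is a minimal tabular sketch in which the goal is part of the lookup key. Tabular Q-learning is used here only as a simplification (the results discussed in this article use DDPG), and the function name, learning rate, discount factor and -1/0 rewards are all assumptions.

```python
from collections import defaultdict

ACTIONS = (-1, +1)  # move left or right along a line


def q_update(q, state, action, reward, next_state, goal, done,
             alpha=0.5, gamma=0.9):
    """One Q-learning update where values are indexed by (state, goal, action)."""
    if done:
        best_next = 0.0  # no future value once the goal has been reached
    else:
        best_next = max(q[(next_state, goal, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    key = (state, goal, action)
    q[key] += alpha * (target - q[key])
    return q[key]


q = defaultdict(float)  # every unseen entry starts at 0.0

# The same step (from 2 to 3) stored twice in the replay buffer:
# once with the original goal 5, which was missed (reward -1) ...
value_for_original_goal = q_update(q, 2, +1, -1.0, 3, goal=5, done=False)
# ... and once with the hindsight goal 3, which was reached (reward 0).
value_for_hindsight_goal = q_update(q, 2, +1, 0.0, 3, goal=3, done=True)
```

The same physical step ends up with a negative value for the goal it missed and a non-negative value for the goal it reached, so one failed attempt updates the agent's knowledge about two different goals.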
- Combining HER and HRL (Hierarchical Reinforcement Learning)
- Making value functions richer
- Combining HER with other recent advances in RL
Of course, designing reward functions for RL is not easy, and there are plenty of hidden obstacles to keep in mind along the way. Even so, OpenAI's developers strongly encourage you to test their new algorithm and share suggestions and feedback. HER has made significant progress in improving artificial intelligence, though further research remains to be done.