Is Reinforcement Learning a Slow Learner?

Yann LeCun, Chief AI Scientist at Facebook and Professor at NYU, recently gave a webinar with the ACM (Association for Computing Machinery) titled “The Power and Limits of Deep Learning”. In it, he expressed dismay at the current performance of AI systems and gave his thoughts on the directions research should take.

Current AI systems have no ability to model the real world. For example, babies learn quite early that a truck that drives off a platform and hovers in the air is unexpected. Current AI systems do not have this ability. After many, many training examples they might be able to predict this type of behavior, but only for very specific vehicles. “Sure, I know a red fire truck will fall down, but I have no idea what this Prius is going to do. Let’s watch…”

The same thing happens in the simpler task of image recognition. A human can get the idea of an elephant from a few images, but our most sophisticated image recognition systems need many thousands of training examples to recognize a new object. Even then, such a system will have difficulty recognizing a different view (an elephant’s rear end? an elephant with its trunk hidden behind a wall?) if it has not been trained specifically on those types of views.

Similarly, Reinforcement Learning (RL), a technique used to train AI systems to do things like play video games at (or above) human level, is a slow learner. It takes 83 hours of real-time play for an RL system to reach a level that a human player can reach in 15 minutes, which is roughly 330 times as long.

Two basic approaches used in AI are (1) supervised learning and (2) unsupervised learning. In supervised learning, a system is trained by showing it an image (or other training example) along with the desired response. “Hello computer. This image is a car. This next image is a bird.” This goes on for millions of images. (The ImageNet dataset, used in a lot of benchmark tests, has over 15 million images, and a subset of over 1 million images is often used for training.) The training is also repeated over that same set many times (many “epochs”). Unsupervised learning, on the other hand, tries to make sense of the data without any human “supervision”. An example is a clustering algorithm, which tries to group items into clusters so that the items within each cluster are similar to each other in some way.
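To make the difference concrete, here is a minimal Python sketch that runs both approaches on the same toy data. Everything in it is an illustrative assumption rather than anything from Prof. LeCun’s talk: the six 2-D points stand in for image features, the “car” and “bird” labels are made up, the supervised learner is a simple nearest-centroid classifier, and the unsupervised learner is a basic k-means clustering with a deterministic farthest-point initialization chosen so the result is repeatable.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def fit_supervised(points, labels):
    """Supervised: use the human-provided labels to build one centroid per class."""
    by_label = {}
    for p, y in zip(points, labels):
        by_label.setdefault(y, []).append(p)
    return {y: mean(ps) for y, ps in by_label.items()}


def predict(centroids, point):
    """Return the label whose class centroid is nearest to the point."""
    return min(centroids, key=lambda y: sq_dist(centroids[y], point))


def kmeans(points, k, iterations=10):
    """Unsupervised: group points into k clusters without using any labels."""
    # Assumed initialization: start from the first point, then repeatedly add
    # the point farthest from the centers chosen so far (deterministic).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(centers[i], p))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: sq_dist(centers[i], p)) for p in points]


# Toy data: two well-separated groups of made-up feature vectors.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = ["car", "car", "car", "bird", "bird", "bird"]

# Supervised: the labels are part of the training input.
centroids = fit_supervised(points, labels)
prediction = predict(centroids, (4.8, 5.3))
print(prediction)  # bird

# Unsupervised: only the points are given; the groups are discovered.
assignments = kmeans(points, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Note that the clustering recovers the same two groups as the labels, but it only knows them as cluster 0 and cluster 1. Attaching the names “car” and “bird” still takes a human, which is exactly the supervision the first approach relies on.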

Prof. LeCun’s suggestion is that unsupervised learning, or what he calls self-supervised learning, might provide a better approach. As he put it, “Prediction is the essence of intelligence.” It remains to be seen whether computers can learn to generate predictions from just a few examples.


  1. Karen Hao, Technology Review, The AI technique that could imbue machines with the ability to reason
  2. Yann LeCun, The Power and Limits of Deep Learning
