Monday, September 5, 2016

Why is Deep RL exciting?

Every so often, a friend asks what is so exciting or different about what has (unfortunately) come to be called Deep RL (other than the obviously exciting new applications). Here are some points I made in my talk at the IJCAI 2016 Deep RL workshop.

1. Renewed focus on Learning Representations in addition to the usual focus on function approximation (this is a subtle but consequential difference).
  • More of us visualize and inspect the learned representations, and if we find them inadequate we redesign elements of the neural network.
  • Collectively we are building intuitions (and hopefully soon theory) about architecture and representations.
  • Remarkable increase in stability of the resulting function approximators (overcoming some of the fear left by earlier negative collective experiences and worst-case theoretical results on function approximation and RL).
2. Renewed focus on (cognitive?) architecture
  • In addition to the standard RL candidates (e.g., actor-critic), there is exploration of many new architectures.
  • The use of recurrence, episodic memory, gating, etc.
3. Renewed interest in the most foundational of RL algorithms
  • In a parallel to Supervised Learning, it turns out we may not need algorithms designed to exploit special structure in particular classes of domains. Witness the increased use of Q-learning, TD actor-critic, policy gradient, etc. (a minimal sketch of one such foundational update appears after this list). Of course, algorithmic innovation continues.
  • This in turn makes it far easier for non-RL people to enter the field and do interesting and useful work; they don't necessarily have to catch up on a couple of decades of the more sophisticated RL algorithms.
4. RL for control of internal processing
  • Control of memory read/write, as well as other forms of "attention", is a really exciting application of RL towards the old AI goal of managing internal computations intelligently.
5. (We can) Borrow, exploit, and build upon really rapid progress in Deep Learning for Perception
  • This has been crucial for renewing and expanding interest in RL.
  • In particular, it has freed RL from the shackles of "illustrative empirical work". Witness successes in Visual RL but also increasingly Spoken/Textual RL.
6. (We can) Borrow, incorporate/exploit, and build upon use of Computer Science to deal with large volumes of high-dimensional data
  • Caffe, Theano, Torch, TensorFlow, etc. (Hopefully we will get a lot of RL stuff added to these.)
  • Asynchronous, distributed, parallel computations, with clever use of memory and GPUs to scale Deep RL.
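To make point 3 concrete, here is a minimal sketch of one of those foundational algorithms, tabular Q-learning, in Python. The environment interface (env.reset(), env.step()) and the hyperparameter values are illustrative assumptions, not something from this post; deep RL replaces the table with a neural-network function approximator but keeps essentially the same one-step update.

    import random
    from collections import defaultdict

    # Assumed (hypothetical) environment interface:
    #   env.reset() -> state
    #   env.step(action) -> (next_state, reward, done)

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning:
        Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        Q = defaultdict(float)  # maps (state, action) pairs to value estimates

        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                # Epsilon-greedy action selection.
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])

                next_state, reward, done = env.step(action)

                # One-step, off-policy temporal-difference update.
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state

        return Q

The same structure, with the table swapped for a learned function and minibatched updates, underlies much of the deep RL work discussed above.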


Sunday, September 4, 2016

Three Waves of AI

At a recent meeting, I heard from John Launchbury a DARPA synthesis of AI, a worldview so to speak, that (somewhat to my surprise) I found reasonably consistent with my own view of a useful way to partition AI work. Here is my version.

Handcrafted AIs. Expert systems fall into this category, but so do logic-based as well as probability-based planning, problem-solving, reasoning, and inference systems. The main inclusion criterion is that experts write down (much of) the knowledge and skills of such systems. In "expert systems", rules determine the behavior/skill. In planning systems, experts write down a suitably abstracted world-simulator from which general-purpose algorithms derive behavior (and so, for example, dynamic programming on MDPs falls into this category). Reasoning and inference using general-purpose algorithms on manually crafted graphical models also falls into this category. Over the decades, emphasis has shifted from logic to probabilities, and more recently to mixing logic and probabilities. A lot of great ongoing AI effort, in both industrial applications and academic research, falls into this category. Arguably, at least in academia, this wave has crested, so that many of those invested in and creating these ideas are pushing them towards the next wave.

Statistical, Data-driven AIs. Both Supervised learning (SL) and Reinforcement learning (RL) systems fall into this category. In the past decade much of the excitement in AI within industry and academia has been in SL, because of the increasing availability of large-scale labeled data and computation. Deep Learning (DL) has been a major and resurgent force in spreading these ideas through many industries. RL had largely been confined to academia but in the last few years has broken out into industry (thanks largely to the success of DeepMind, cf. AlphaGo). What separates these AIs from Handcrafted AIs is that there is a significant component of learning from data (often this learning is statistical in nature). What separates these AIs from the next category is that they involve well-defined tasks (of classification, regression, or reward maximization). We have not yet reached the crest of this wave within AI in industry and academia.

Continual Learning* AIs. This new wave of AI is yet to take shape, though its genesis reaches back decades to the earliest goals of AI. These third-wave AIs will perform contextual adaptation and learning: continually adapting, from moment to moment, what they know from previous contexts to generate behavior, and continually learning new skills and knowledge from the current context and the outcomes of their behavior. Rapid learning, even one-shot learning, in new contexts will be a hallmark of such AIs. In contrast to the second wave, there need not always be well-defined tasks for such AIs, and so at least in part their behavior will be motivated by intrinsic goals associated with learning new knowledge and skills from the experience they themselves generate. More broadly, Unsupervised learning (UL) will play a major role. Finally, such AIs will need more elaborate cognitive architectures than those currently used in SL and RL, to flexibly manage their growing set of knowledge and skills.

Now, of course, there is considerable work that crosses the boundaries laid out above. Indeed, the boundaries themselves are perhaps not all that sharply defined. Nonetheless, I believe this view helps.

*(For full disclosure, I am a co-founder of a "continual learning company", Cogitai, Inc.)

P.S.  The name for the third wave is not quite settled among AI folks.