Every so often, a friend asks what is so exciting or different about what has (unfortunately) come to be called Deep RL (other than the obviously exciting new applications). Here are some points I made in my talk at the IJCAI 2016 Deep RL workshop.
1. Renewed focus on Learning Representations in addition to the usual focus on function approximation (this is a subtle but consequential difference).
- More of us visualize and inspect the learned representations, and if we find them inadequate we redesign elements of the neural network (a minimal sketch follows this list).
- Collectively we are building intuitions (and hopefully soon theory) about architecture and representations.
- Remarkable increase in the stability of the resulting function approximators (overcoming some of the fear from earlier negative collective experiences and worst-case theoretical results on function approximation in RL).
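To make the representation point concrete, here is a minimal sketch (plain NumPy; the network shape and all numbers are purely illustrative) of treating the hidden layer of a tiny value network as the learned representation, inspected separately from the final linear function approximator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer value network: the hidden layer is the learned representation,
# the final dot product is the familiar linear function approximator.
W1 = rng.normal(scale=0.1, size=(8, 64))   # obs_dim -> hidden
w2 = rng.normal(scale=0.1, size=64)        # hidden -> scalar value

def value_and_features(obs):
    features = np.maximum(obs @ W1, 0.0)   # ReLU representation
    return features @ w2, features

obs_batch = rng.normal(size=(128, 8))      # stand-in for a batch of observations
values, features = value_and_features(obs_batch)

# Looking at the representation itself (not just the value estimates) is the
# habit described above, e.g., checking for dead hidden units:
dead_units = int((features.max(axis=0) == 0).sum())
print(f"{dead_units} of 64 hidden units never activate on this batch")
```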
2. In addition to the standard RL candidates (e.g., actor-critic), there is exploration of many new architectures.
- The use of recurrence, episodic memory, gating, etc. (a gated-recurrence sketch follows below).
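As one concrete (and purely illustrative) instance of recurrence with gating, here is a hand-rolled GRU-style cell in NumPy; the weights are random stand-ins, and in practice one would use a deep learning library's recurrent layers:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, hidden = 8, 32

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weights of a single gated recurrent (GRU-style) cell.
Wz = rng.normal(scale=0.1, size=(obs_dim + hidden, hidden))  # update gate
Wr = rng.normal(scale=0.1, size=(obs_dim + hidden, hidden))  # reset gate
Wh = rng.normal(scale=0.1, size=(obs_dim + hidden, hidden))  # candidate state

def gru_step(obs, h):
    xh = np.concatenate([obs, h])
    z = sigmoid(xh @ Wz)                        # how much to update the state
    r = sigmoid(xh @ Wr)                        # how much history to keep
    h_tilde = np.tanh(np.concatenate([obs, r * h]) @ Wh)
    return (1 - z) * h + z * h_tilde            # gated state update

h = np.zeros(hidden)                            # the agent's internal state
for t in range(10):                             # unroll over observations
    obs = rng.normal(size=obs_dim)              # stand-in observation
    h = gru_step(obs, h)
# h now summarizes the history; a Q-head or policy head would read from it.
```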
3. (In a parallel to Supervised Learning) It turns out we may not need algorithms designed to exploit special structure in particular classes of domains. Witness the increased use of general-purpose Q-learning, TD actor-critic, policy gradient, etc. (a minimal Q-learning sketch follows this point). Of course, algorithmic innovation continues.
- This in turn makes it far easier for non-RL people to enter the field and do interesting and useful work; they don't necessarily have to catch up on a couple of decades of the more sophisticated RL algorithms.
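For reference, the Q-learning mentioned above really is this generic. Here is a minimal tabular sketch (pure NumPy; state/action counts and hyperparameters are arbitrary placeholders), driven only by (s, a, r, s') transitions with no domain-specific structure:

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # step size, discount, exploration

def q_learning_step(s, a, r, s_next, done):
    # Standard one-step TD target and update; nothing domain-specific here.
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(s):
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)   # explore
    return int(Q[s].argmax())                 # exploit

# One illustrative update on a made-up transition:
q_learning_step(s=0, a=epsilon_greedy(0), r=1.0, s_next=1, done=False)
```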
4. Control of memory read/write, as well as other forms of "attention", is a really exciting application of RL to the old AI goal of managing internal computations intelligently (a small attention sketch follows below).
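A small illustrative sketch of what such internal control can look like: a soft attention read over memory slots (NumPy; all shapes and names are made up). The soft weights are differentiable; a hard, discrete read could instead be sampled from them and trained with policy-gradient RL:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

memory = rng.normal(size=(8, 16))   # 8 memory slots, 16-dim contents
query = rng.normal(size=16)         # produced by the agent's controller

scores = memory @ query             # similarity of each slot to the query
weights = softmax(scores)           # soft "where to read" decision
read_vector = weights @ memory      # weighted read from memory

# The read (or a write, gated the same way) is an internal computation whose
# control can itself be treated as an RL action, as in the point above.
```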
5. (We can) Borrow, exploit, and build upon the really rapid progress in Deep Learning for Perception.
- This has been crucial for renewing and expanding interest in RL.
- In particular, it has freed RL from the shackles of "illustrative empirical work". Witness the successes in Visual RL, and increasingly in Spoken/Textual RL.
- Caffe, Theano, Torch, TensorFlow, etc. (Hopefully we will get a lot of RL stuff added to these.)
- Asynchronous, distributed, parallel computations, with clever use of memory and GPUs, to scale Deep RL (a minimal parallel-actors sketch follows below).
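As a toy illustration of the asynchronous/parallel flavor (Python standard library only; the "environment" is a random-number stand-in, and a real learner would apply gradient updates rather than discard transitions), several actor processes can feed experience to one learner through a queue:

```python
import multiprocessing as mp
import random

def actor(actor_id, queue, n_steps=100):
    # Each actor generates experience independently and in parallel.
    for step in range(n_steps):
        transition = (actor_id, step, random.random())  # stand-in (s, a, r)
        queue.put(transition)
    queue.put(None)                                     # signal completion

def learner(queue, n_actors):
    finished = 0
    while finished < n_actors:
        item = queue.get()
        if item is None:
            finished += 1
            continue
        # A real learner would apply a parameter update per transition here.

if __name__ == "__main__":
    n_actors = 4
    q = mp.Queue()
    procs = [mp.Process(target=actor, args=(i, q)) for i in range(n_actors)]
    for p in procs:
        p.start()
    learner(q, n_actors)
    for p in procs:
        p.join()
```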