Every so often, a friend asks what is so exciting or different about what has (unfortunately) come to be called Deep RL (other than the obviously exciting new applications). Here are some points I made in my talk at the IJCAI 2016 Deep RL workshop.
1. Renewed focus on Learning Representations in addition to the usual focus on function approximation (this is a subtle but consequential difference).
- More of us visualize and inspect the learned representations, and if we find them inadequate we redesign elements of the neural network (a minimal sketch follows this list).
- Collectively we are building intuitions (and hopefully soon theory) about architecture and representations.
- Remarkable increase in the stability of the resulting function approximators (overcoming some of the fear from earlier negative collective experiences and worst-case theoretical results on function approximation in RL).
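To make the representation point concrete, here is a minimal sketch (plain NumPy; the network shape and all numbers are purely illustrative) of treating the hidden layer of a tiny value network as the learned representation, inspected separately from the final linear function approximator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer value network: the hidden layer is the learned representation,
# the final dot product is the familiar linear function approximator.
W1 = rng.normal(scale=0.1, size=(8, 64))   # obs_dim -> hidden
w2 = rng.normal(scale=0.1, size=64)        # hidden -> scalar value

def value_and_features(obs):
    features = np.maximum(obs @ W1, 0.0)   # ReLU representation
    return features @ w2, features

obs_batch = rng.normal(size=(128, 8))      # stand-in for a batch of observations
values, features = value_and_features(obs_batch)

# Looking at the representation itself (not just the value estimates) is the
# habit described above, e.g., checking for dead hidden units:
dead_units = int((features.max(axis=0) == 0).sum())
print(f"{dead_units} of 64 hidden units never activate on this batch")
```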
2. In addition to the standard RL candidates (e.g., actor-critic), there is exploration of many new architectures.
- The use of recurrence, episodic memory, gating, etc. (a gated-recurrence sketch follows below).
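As one concrete (and purely illustrative) instance of recurrence with gating, here is a hand-rolled GRU-style cell in NumPy; the weights are random stand-ins, and in practice one would use a deep learning library's recurrent layers:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, hidden = 8, 32

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weights of a single gated recurrent (GRU-style) cell.
Wz = rng.normal(scale=0.1, size=(obs_dim + hidden, hidden))  # update gate
Wr = rng.normal(scale=0.1, size=(obs_dim + hidden, hidden))  # reset gate
Wh = rng.normal(scale=0.1, size=(obs_dim + hidden, hidden))  # candidate state

def gru_step(obs, h):
    xh = np.concatenate([obs, h])
    z = sigmoid(xh @ Wz)                        # how much to update the state
    r = sigmoid(xh @ Wr)                        # how much history to keep
    h_tilde = np.tanh(np.concatenate([obs, r * h]) @ Wh)
    return (1 - z) * h + z * h_tilde            # gated state update

h = np.zeros(hidden)                            # the agent's internal state
for t in range(10):                             # unroll over observations
    obs = rng.normal(size=obs_dim)              # stand-in observation
    h = gru_step(obs, h)
# h now summarizes the history; a Q-head or policy head would read from it.
```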
3. (In a parallel to Supervised Learning) It turns out we may not need algorithms designed to exploit special structure in particular classes of domains. Witness the increased use of general-purpose Q-learning, TD actor-critic, policy gradient, etc. (a minimal Q-learning sketch follows this point). Of course, algorithmic innovation continues.
- This in turn makes it far easier for non-RL people to enter the field and do interesting and useful work; they don't necessarily have to catch up on a couple of decades of the more sophisticated RL algorithms.
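For reference, the Q-learning mentioned above really is this generic. Here is a minimal tabular sketch (pure NumPy; state/action counts and hyperparameters are arbitrary placeholders), driven only by (s, a, r, s') transitions with no domain-specific structure:

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # step size, discount, exploration

def q_learning_step(s, a, r, s_next, done):
    # Standard one-step TD target and update; nothing domain-specific here.
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(s):
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)   # explore
    return int(Q[s].argmax())                 # exploit

# One illustrative update on a made-up transition:
q_learning_step(s=0, a=epsilon_greedy(0), r=1.0, s_next=1, done=False)
```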
4. Control of memory read/write, as well as other forms of "attention", is a really exciting application of RL to the old AI goal of managing internal computations intelligently (a small attention sketch follows below).
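A small illustrative sketch of what such internal control can look like: a soft attention read over memory slots (NumPy; all shapes and names are made up). The soft weights are differentiable; a hard, discrete read could instead be sampled from them and trained with policy-gradient RL:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

memory = rng.normal(size=(8, 16))   # 8 memory slots, 16-dim contents
query = rng.normal(size=16)         # produced by the agent's controller

scores = memory @ query             # similarity of each slot to the query
weights = softmax(scores)           # soft "where to read" decision
read_vector = weights @ memory      # weighted read from memory

# The read (or a write, gated the same way) is an internal computation whose
# control can itself be treated as an RL action, as in the point above.
```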
5. (We can) Borrow, exploit, and build upon the really rapid progress in Deep Learning for Perception.
- This has been crucial for renewing and expanding interest in RL.
- In particular, it has freed RL from the shackles of "illustrative empirical work". Witness the successes in Visual RL, and increasingly in Spoken/Textual RL.
- Caffe, Theano, Torch, TensorFlow, etc. (Hopefully we will get a lot of RL stuff added to these.)
- Asynchronous, distributed, parallel computations, with clever use of memory and GPUs, to scale Deep RL (a minimal parallel-actors sketch follows below).
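As a toy illustration of the asynchronous/parallel flavor (Python standard library only; the "environment" is a random-number stand-in, and a real learner would apply gradient updates rather than discard transitions), several actor processes can feed experience to one learner through a queue:

```python
import multiprocessing as mp
import random

def actor(actor_id, queue, n_steps=100):
    # Each actor generates experience independently and in parallel.
    for step in range(n_steps):
        transition = (actor_id, step, random.random())  # stand-in (s, a, r)
        queue.put(transition)
    queue.put(None)                                     # signal completion

def learner(queue, n_actors):
    finished = 0
    while finished < n_actors:
        item = queue.get()
        if item is None:
            finished += 1
            continue
        # A real learner would apply a parameter update per transition here.

if __name__ == "__main__":
    n_actors = 4
    q = mp.Queue()
    procs = [mp.Process(target=actor, args=(i, q)) for i in range(n_actors)]
    for p in procs:
        p.start()
    learner(q, n_actors)
    for p in procs:
        p.join()
```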