Monday, November 24, 2008

On Academic Freedom

A New York Times opinion writer, Stanley Fish, often writes about questions of interest to university professors. In this article, he takes up the notion of "academic freedom," what it means, and how it is quite different from individual First Amendment rights. He also refers to a new book on the topic. Much of his discussion makes sense to me, in particular the notion that "... academic freedom, rather than being a philosophical or moral imperative, is a piece of policy that makes practical sense in the context of the specific task academics are charged to perform." He argues that academics ought to be protected from the dictates of public opinion but ought to be subject to professional standards and norms. I agree. This is a distinction that is important for us academics to understand and to put into practice in our own conduct.

Tuesday, October 21, 2008

On Reviewing Reinforcement Learning Papers

*I had drafted this earlier this summer but never got around to finishing and posting it. I am doing so now.*

I just finished reviewing for a conference, and looking over my reviews I was reminded of a comment a senior colleague made to me at ICML this year. He has been program chair for several conferences and said that he thought RL reviewers were the hardest on their own community of any he had ever seen. I thought back to my own turns as senior PC member or area chair at ML conferences and realized that I had felt the same way in those cases.

So, assuming you agree with the supposition, why is it the case? In looking back at years of reviewing, I think the reasons are:

1. There is a large subset of the RL community that is at best skeptical of papers that don't do "real applications". Certainly there is good reason to be hard on shoddy empirical work or claims. Nevertheless, perhaps this subset goes too far?

2. Assuming that a reviewer is willing to accept simulated domains, in the absence of widely accepted benchmarks there is no agreement on a standard suite of problems, and so different reviewers set their own standards for acceptable empirical test sets. Many reviewers reject "gridworld"-type tasks, for example.

3. There is also a healthy skepticism of theoretical results in a section of the RL community. And indeed, some theory, while quite possibly true, has little significance. But again, perhaps these papers are treated too harshly?

4. Perhaps the most important reason, however, is that a significant part of RL research is focused on issues other than strict performance. Arguably, machine learning's focus on benchmarks and performance is misplaced for RL. That is, while some part of the RL community needs to focus on engineering and performance, others should be allowed, and even encouraged, to explore the more difficult-to-quantify AI issues.

Of course it is entirely possible that we are not producing enough good papers as a community and thus being harsh is the right thing. I don't believe this :)

Any comments?

(In a later post, I will make specific suggestions to address the issues above)

Tuesday, October 14, 2008

General Results versus Illuminating Examples

I recently came across this description of economists, attributed by Paul Krugman to Robert Solow: "There are two kinds of economists: those who look for general results and those who look for illuminating examples." This dichotomy struck me as rather interesting, and upon reflection it was pretty clear to me that my own AI-research methodology leans overwhelmingly towards the general-results side. I am drawn to fairly basic and general questions and issues; very rarely am I motivated by a specific example. Note that despite the superficial similarity between this dichotomy and the "theory versus empirical" one, they are quite unrelated. One can take a specific example, say a specific human capability or indeed a specific human failing, and then either build a theory of intelligence or do empirical work from it. Linguists do that. Cognitive scientists do that. Psychologists do that. Is AI research by its nature restricted to the general-results side of the divide? Maybe this is the divide that separates AI from, say, cognitive science: the former seeks general results while the latter explains illuminating examples.

To all three people who might read this: Are there examples of AI research that stemmed from illuminating examples? What role do illuminating examples play in AI?

Monday, October 13, 2008

Krugman and Foundation

On a non-RL note: Paul Krugman, one of the writers whose nytimes blog I have enjoyed reading over the past few months (and incidentally the one who just received the Nobel Memorial Prize in Economics), credits reading Isaac Asimov's Foundation series for sparking his interest in economics. His notion is that, short of inventing the field of psychohistory, economics is the field currently available that comes closest to explaining, understanding, and perhaps predicting the macro outcomes of the micro actions of billions of individuals. How true! And how cool! Makes me want to learn more economics.

Sunday, June 1, 2008

On "What it means to be Human"

A far too brief report of the World Science Festival stellar panel on What it means to be Human.

(If someone finds a better report or video, please point us to it.)

An interesting conversation about "Conversation"

Conversation with Ian McEwan & Steven Pinker (audio only) [Warning: has mature content]

(or, why building conversational systems would be so hard)

Friday, May 30, 2008

Some RL Community Statistics

In response to Satinder's post about the growth of the field, I did a little experiment. Google Scholar allows for date-restricted searches. (CiteSeer has similar information, but I couldn't figure out a way to work with it.) I searched for the phrase "reinforcement learning" and counted the number of hits in each year. Google Scholar also creates a "key author" list for each search, so I included that information as well.

Before 1963, Google Scholar lists 42 papers, but it seems nearly all of them are errors in detecting the year of publication (the page number was used instead). After that, things muddle around in the single digits until the 80s, when the UMass folks begin attracting attention to the problem. The learning automata and GA folks are pretty visible during this period.

In the 90s, the per-year paper count swells from 100 to 1000. Non-UMass pockets are clearly visible. Multiagent folks are out in force by the end of the decade. In the 2000s so far, we're seeing a spread to the students (and students of students) of the pioneers as well as paper counts up above 2000.

Plotting the paper counts, the growth doesn't look so much like an exponential as like a quadratic, with a weirdly high peak in 2005 (a quick curve-fitting sketch follows the table below). In any case, Satinder's intuition that things have grown substantially appears valid. :-)

-----------------

1964 1 P Wasserman - F Silander - M Foundation
1965 6 M Waltz - K Fu
1966 5 K Fu - Z Nikolic
1967 1
1968 4 K Fu
1969 3 J Carlyle
1970 12 K Fu - J Mendel - E Davison - G SARIDIS - R McLaren
1971 5 J Albus
1972 2
1973 4 J Justice - J Shanks - J MENDEL - J ZAPALAC
1974 3 R Monopoli - J Mendel
1975 5 J Holland
1976 1 P Verschure - T Voegtlin - R Douglas
1977 0
1978 1
1979 4 G Saridis - C Van Rijsberg...
1980 2
1981 8 A Barto - R Sutton - P Brouwer - G Saridis - W Croft
1982 2 R Sutton - A Barto - D Reilly - R Williams - L Cooper
1983 4 P Schweitzer - D Lenat - S Hampson - D Kibler
1984 12 P Young - J De Kleer - J Brown - W Croft - R Thompson
1985 8 R Korf - N Cramer - J Gould - R Levinson - A Barto
1986 13 R Sutton - P Kumar - C Anderson - P Varaiya - A Barto
1987 16 D Bertsekas - D Goldberg - J Richardson - D Ackley - R Korf
1988 38 B Widrow - R Sutton - M Hoff - P Sahoo - S Soltani
1989 63 K Narendra - T Kohonen - D Goldberg - M Thathachar - C Anderson
1990 151 R Sutton - P Maes - P Werbos - V Gullapalli - A Benveniste
1991 198 R Sutton - S Whitehead - D Ballard - D Chapman - C Lin
1992 275 C Watkins - L Lin - P Dayan - S Mahadevan - R Williams
1993 348 L Kaelbling - A Moore - C Atkeson - N Lavrac - P Dayan
1994 467 M Littman - M Puterman - S Haykin - G Rummery - J Boyan
1995 530 D Bertsekas - S Russell - A Samuel - G Tesauro - P Norvig
1996 695 L Kaelbling - M Littman - D Bertsekas - A Moore - J Tsitsiklis
1997 803 M Tan - H Kitano - M Dorigo - M Asada - Y Kuniyoshi
1998 990 J Hu - C Claus - M Wellman - R Parr - C Boutilier
1999 1100 R Sutton - T Dietterich - S Singh - D Precup - J Rennie
2000 1190 S Singh - K Doya - R Sutton - M Littman - W Smart
2001 1290 M Littman - S Dzeroski - K Driessens - L Peshkin - P Stone
2002 1490 M Kearns - S Singh - K Doya - W Smart - B Hengst
2003 1700 A Barto - S Mahadevan - R Brafman - D WOLPERT - C Guestrin
2004 1770 A Ng - Y Shoham - R Powers - T Grenager - J Si
2005 2060 A Barto - S Singh - D Ernst - S Collins - M Bowling
2006 1830 S LaValle - P Stone - J Peters - S Whiteson - Y Liu
2007 2000 P Stone - S Mabu - M Taylor - J Peters - K Hirasawa
2008 347 J Drugowitsch - A Barry - H Tizhoosh - E Courses - Y Liu
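
Out of curiosity, here is a minimal Python sketch of the quadratic-versus-exponential eyeballing above. It is just a quick aid, assuming numpy and matplotlib are available; it takes the 1990-2007 counts straight from the table and drops the partial 2008 figure.

    # Quick sketch: fit the per-year "reinforcement learning" hit counts
    # (from the table above) with a quadratic and an exponential, to see
    # which shape tracks the growth better. The partial 2008 count is dropped.
    import numpy as np
    import matplotlib.pyplot as plt

    years = np.arange(1990, 2008)
    counts = np.array([151, 198, 275, 348, 467, 530, 695, 803, 990,
                       1100, 1190, 1290, 1490, 1700, 1770, 2060, 1830, 2000])

    t = years - years[0]  # years since 1990, keeps the fits well conditioned

    quad = np.polyfit(t, counts, 2)               # counts ~ a*t^2 + b*t + c
    exp_coef = np.polyfit(t, np.log(counts), 1)   # log(counts) ~ log(a) + r*t

    plt.plot(years, counts, 'o', label='Google Scholar hits')
    plt.plot(years, np.polyval(quad, t), label='quadratic fit')
    plt.plot(years, np.exp(np.polyval(exp_coef, t)), label='exponential fit')
    plt.xlabel('year')
    plt.ylabel('papers matching "reinforcement learning"')
    plt.legend()
    plt.show()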

Another amazing BMI robot arm

An online Wired article describes a robot arm built by Dean Kamen (of Segway fame). I think it is great that folks like Dean want to innovate in this space. Great for AI and robotics, not to mention all the folks this could end up helping. (BTW, does RL have a role to play in building brain-machine interfaces?)

Interview of Dean by Walter Mossberg, and another video

Thursday, May 29, 2008

AI in Second Life (post 1)

I have had a casual interest in Second Life (SL) for a little while and periodically I am going to collect links to and briefly discuss AI work being done in SL. Here is the first such discussion.

This page has a brief description of work by RPI on building an AI avatar that can carry out rudimentary conversations in SL. The description characterizes the avatar as a "four-year-old" based on a theory-of-mind test.

Briefly, a version of the theory-of-mind (false-belief) test goes as follows. The agent sees person A put an object inside box 1 and then leave the room. The agent then sees person B come into the room, take the object from box 1, and place it inside a different box, box 2. The agent is then asked which box person A will look in for the object when she comes back. If the agent has a theory of mind, it will say that person A will look in box 1; otherwise it will say that person A will look in box 2.
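
To make the logic of the test concrete, here is a toy Python sketch of how an observer with and without belief tracking would answer. This is just my illustration of the test itself, not the RPI system, which translates the English into logical expressions and answers by theorem proving.

    # Toy version of the false-belief test described above.
    # An observer with a theory of mind answers from what person A has seen,
    # not from where the object actually is.

    def false_belief_answer(has_theory_of_mind):
        belief_of_A = "box 1"     # A puts the object in box 1 and sees it there
        true_location = "box 2"   # B moves it to box 2 after A has left

        if has_theory_of_mind:
            return belief_of_A    # answer from A's (now stale) belief
        else:
            return true_location  # answer from the actual world state

    print(false_belief_answer(True))   # box 1 -- passes the test
    print(false_belief_answer(False))  # box 2 -- fails the test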

Here is a video of the RPI agent Edd failing, and here is a link to a video of the agent Edd succeeding after having learned.

The RPI researchers convert the English conversation into logical expressions and then use theorem proving as the reasoning engine to provide behavior. The authors state, “Our aim is not to construct a computational theory that explains and predicts actual human behavior, but rather to build artificial agents made more interesting and useful by their ability to ascribe mental states to other agents, reason about such states, and have — as avatars — states that are correlates to those experienced by humans.”

I could not find a paper by the authors but did find the slides of a talk they gave at the First Conference on Artificial General Intelligence, 2008.

My comments: The specific theory-of-mind experiment described above seems relatively straightforward, and I have seen others do similar things (e.g., a project from Cynthia Breazeal's lab; someone point us to the precise paper if they know it). I would venture that the natural language processing in the RPI project is rudimentary at this point and works only in fairly narrow scripted settings. In summary, it seems to me that the authors used their previous general-purpose logic-based question-answering work to build a specific narrow system that can basically do this one task. Also, it isn't clear what SL added to this experiment, since the conversation is not with an arbitrary human player but rather with the experiment designers. At the same time, the overall goal of ascribing mental states to other agents and reasoning about such states seems laudable, at least at first glance.

(I need to get Charles Isbell to comment on this.)

Suggest Captions :)


Matthew Taylor just sent me this funny picture of me at the recent Barbados workshop on RL, along with his caption: "Satinder is ready to wring the last bit of performance out of some poor PSR algorithm."

How would you caption it? :)

Wednesday, May 28, 2008

What a totally crazy wacky fruit!

Ok, this has nothing to do with RL or AI. But what a fruit! Someone should bring a bag to the next conference for fun.

To quote the nytimes article:
"The miracle fruit, Synsepalum dulcificum, is native to West Africa and has been known to Westerners since the 18th century. The cause of the reaction is a protein called miraculin, which binds with the taste buds and acts as a sweetness inducer when it comes in contact with acids,..."
(nytimes video)



Apparently it makes Tabasco sauce taste like doughnut glaze and Guinness taste like a chocolate shake. The effect lasts for an hour. I have got to try this!

Monkeying around with robots

This is a cool project, and brain-machine interfaces like this will be a great application of machine learning that will help society. This stuff is exciting!


Tuesday, May 27, 2008

Measuring Reinforcement Learning

When I started my PhD in Andy Barto's lab in 1988, there were perhaps a handful of folks doing research in the field of modern RL. There was an outpost of ex-students from Andy's lab at GTE Laboratories, including folks like Rich Sutton, Chuck Anderson, and Judy Franklin. By 1990 or so, there were a few others, including Leslie Kaelbling and Peter Dayan. It was a pretty lonely field back then.

But how big is the field of RL now? If I had to guess, I would say that there are several hundred researchers around the world who would self-identify as RL researchers.

So, the objective of this post is to solicit ideas and, more importantly, effort in measuring the size of the field of RL. Here are some ideas for how to collect some data. Any volunteers?
  1. Collect a list of conferences that publish a substantial number of RL papers. These include (in no particular order) ICML, NIPS, AAAI, UAI, COLT, IJCAI, AAMAS, and ECML. What other major venues am I missing?

  2. Establish some simple and noisy methodology for determining when a paper is an RL paper.

    • If the paper publishes keywords, then look for phrases from some list that pretty clearly indicate an RL paper, e.g., reinforcement learning, Q-learning, TD, temporal differences. (What others am I missing?) I think it will introduce too much noise to include MDPs and POMDPs. (A rough sketch of such a filter appears after this list.)

    • Look for the same list of keywords identified above in the title and abstract.

    • Do we need to do more sophisticated things?

  3. Gather the data for each conference separately by year. An interesting use of this data would be to get a sense of the publication rate of RL papers at the different conferences. If I could get this data somehow, I would happily create graphs and put them up on this blog. But to serve the main purpose of this post, one would just create and count a list of the unique authors of such papers.
Does anyone have scripts that could easily do this? Not sure this is worth a lot of effort, but it sure would be fun to have this data.
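
As a strawman for item 2, here is a rough Python sketch of the kind of noisy keyword filter I have in mind. The keyword list and the input format (one tab-separated title and abstract per line) are placeholders; a real script would have to adapt to whatever form each conference's metadata actually takes.

    # Rough sketch of the noisy keyword heuristic from item 2 above.
    # Input format (one "title<TAB>abstract" per line) is a placeholder.
    import re
    import sys

    RL_KEYWORDS = [
        "reinforcement learning",
        "q-learning",
        "temporal difference",
        r"\btd\(",            # matches TD(0), TD(lambda), ...
    ]
    PATTERN = re.compile("|".join(RL_KEYWORDS), re.IGNORECASE)

    def looks_like_rl_paper(title, abstract):
        """Crude test: does the title or abstract mention an RL keyword?"""
        return bool(PATTERN.search(title) or PATTERN.search(abstract))

    if __name__ == "__main__":
        count = 0
        for line in sys.stdin:
            title, _, abstract = line.rstrip("\n").partition("\t")
            if looks_like_rl_paper(title, abstract):
                count += 1
        print(count, "papers matched the RL keyword list")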

A beginning

So I am starting a blog in which I (and others to be invited) will initiate conversations with those who wish to comment (or just read) on things pertaining to RL and more generally AI. The intention is to keep this related to research and only occasionally stray into other areas of life.

At this point I have no rules for posts or comments other than to keep things friendly and civil. This may be modified in the future.