Friday, May 30, 2008

Some RL Community Statistics

In response to Satinder's post about the growth of the field, I did a little experiment. Google Scholar allows date-restricted searches. (CiteSeer has similar information, but I couldn't figure out a way to work with it.) I searched for the phrase "reinforcement learning" and counted the number of hits in each year. Google Scholar also creates a "key author" list for each search, so I included that information as well.
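For the curious, here is a rough sketch of how such an experiment might be scripted. Google Scholar has no official API and discourages automated scraping, so this is illustrative only; the URL parameters (`q`, `as_ylo`, `as_yhi`) and the "About N results" string are assumptions about the page format, not a documented interface.

```python
# Sketch: build a year-restricted Google Scholar query and parse the hit
# count from the result page. The fetch itself is deliberately omitted.
import re
import urllib.parse


def build_query_url(phrase, year):
    """Build a Google Scholar URL restricted to a single publication year."""
    params = {
        "q": f'"{phrase}"',   # exact-phrase search
        "as_ylo": year,       # earliest year
        "as_yhi": year,       # latest year
    }
    return "https://scholar.google.com/scholar?" + urllib.parse.urlencode(params)


def parse_hit_count(html):
    """Extract the count from text like '... About 1,234 results ...'."""
    match = re.search(r"About ([\d,]+) results", html)
    if match is None:
        return 0
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    # A real run would need polite rate limiting (and would likely hit
    # CAPTCHAs anyway); here we just print the URLs we would fetch.
    for year in range(1964, 2009):
        print(year, build_query_url("reinforcement learning", year))
```

The per-year counts in the table below would then just be `parse_hit_count` applied to each year's result page.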

Before 1963, Google Scholar lists 42 papers, but nearly all of them appear to be errors in detecting the year of publication (the page number was used instead). After that, things muddle around in the single digits until the 80s, when the UMass folks begin attracting attention to the problem. The learning automata and GA folks are pretty visible during this period.

In the 90s, the per-year paper count swells from 100 to 1000. Non-UMass pockets are clearly visible. Multiagent folks are out in force by the end of the decade. In the 2000s so far, we're seeing a spread to the students (and students of students) of the pioneers, as well as paper counts above 2000.

Plotting the paper counts, the curve looks less like an exponential than a quadratic, with a weirdly high peak in 2005. In any case, Satinder's intuition that things have grown substantially appears valid. :-)


1964 1 P Wasserman - F Silander - M Foundation
1965 6 M Waltz - K Fu
1966 5 K Fu - Z Nikolic
1967 1
1968 4 K Fu
1969 3 J Carlyle
1970 12 K Fu - J Mendel - E Davison - G SARIDIS - R McLaren
1971 5 J Albus
1972 2
1973 4 J Justice - J Shanks - J MENDEL - J ZAPALAC
1974 3 R Monopoli - J Mendel
1975 5 J Holland
1976 1 P Verschure - T Voegtlin - R Douglas
1977 0
1978 1
1979 4 G Saridis - C Van Rijsberg...
1980 2
1981 8 A Barto - R Sutton - P Brouwer - G Saridis - W Croft
1982 2 R Sutton - A Barto - D Reilly - R Williams - L Cooper
1983 4 P Schweitzer - D Lenat - S Hampson - D Kibler
1984 12 P Young - J De Kleer - J Brown - W Croft - R Thompson
1985 8 R Korf - N Cramer - J Gould - R Levinson - A Barto
1986 13 R Sutton - P Kumar - C Anderson - P Varaiya - A Barto
1987 16 D Bertsekas - D Goldberg - J Richardson - D Ackley - R Korf
1988 38 B Widrow - R Sutton - M Hoff - P Sahoo - S Soltani
1989 63 K Narendra - T Kohonen - D Goldberg - M Thathachar - C Anderson
1990 151 R Sutton - P Maes - P Werbos - V Gullapalli - A Benveniste
1991 198 R Sutton - S Whitehead - D Ballard - D Chapman - C Lin
1992 275 C Watkins - L Lin - P Dayan - S Mahadevan - R Williams
1993 348 L Kaelbling - A Moore - C Atkeson - N Lavrac - P Dayan
1994 467 M Littman - M Puterman - S Haykin - G Rummery - J Boyan
1995 530 D Bertsekas - S Russell - A Samuel - G Tesauro - P Norvig
1996 695 L Kaelbling - M Littman - D Bertsekas - A Moore - J Tsitsiklis
1997 803 M Tan - H Kitano - M Dorigo - M Asada - Y Kuniyoshi
1998 990 J Hu - C Claus - M Wellman - R Parr - C Boutilier
1999 1100 R Sutton - T Dietterich - S Singh - D Precup - J Rennie
2000 1190 S Singh - K Doya - R Sutton - M Littman - W Smart
2001 1290 M Littman - S Dzeroski - K Driessens - L Peshkin - P Stone
2002 1490 M Kearns - S Singh - K Doya - W Smart - B Hengst
2003 1700 A Barto - S Mahadevan - R Brafman - D WOLPERT - C Guestrin
2004 1770 A Ng - Y Shoham - R Powers - T Grenager - J Si
2005 2060 A Barto - S Singh - D Ernst - S Collins - M Bowling
2006 1830 S LaValle - P Stone - J Peters - S Whiteson - Y Liu
2007 2000 P Stone - S Mabu - M Taylor - J Peters - K Hirasawa
2008 347 J Drugowitsch - A Barry - H Tizhoosh - E Courses - Y Liu
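To put a rough number on the "quadratic, not exponential" impression, here is a quick sketch that fits both curves to the 1990–2007 counts from the table above (2008 is dropped since the year was incomplete at the time of the search). The exponential is fit as a line in log space; this is a back-of-the-envelope comparison, not a careful model selection.

```python
# Compare a quadratic fit against an exponential fit on the per-year
# "reinforcement learning" hit counts from the table (1990-2007).
import numpy as np

years = np.arange(1990, 2008)
counts = np.array([151, 198, 275, 348, 467, 530, 695, 803, 990, 1100,
                   1190, 1290, 1490, 1700, 1770, 2060, 1830, 2000.0])

x = years - 1990  # re-center to keep the polynomial well conditioned

# Quadratic: counts ~ a*x^2 + b*x + c, least squares in raw space
quad_coeffs = np.polyfit(x, counts, 2)
quad_pred = np.polyval(quad_coeffs, x)
sse_quad = float(np.sum((counts - quad_pred) ** 2))

# Exponential: log(counts) ~ m*x + k, i.e. counts ~ exp(k) * exp(m*x)
m, k = np.polyfit(x, np.log(counts), 1)
exp_pred = np.exp(k + m * x)
sse_exp = float(np.sum((counts - exp_pred) ** 2))

r2_quad = 1.0 - sse_quad / float(np.sum((counts - counts.mean()) ** 2))
print(f"quadratic SSE: {sse_quad:.0f}, R^2: {r2_quad:.3f}")
print(f"exponential SSE: {sse_exp:.0f}")
```

On these numbers the quadratic fit explains the data well and leaves a much smaller residual than the exponential, consistent with the eyeball impression above.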

Another amazing BMI robot arm

An online Wired article describes a robot arm built by Dean Kamen (of Segway fame). I think it is great that folks like Dean want to innovate in this space. Great for AI and robotics, not to mention all the folks this could end up helping. (BTW, does RL have a role to play in building brain-machine interfaces?)

Interview of Dean by Walter Mossberg, and another video

Thursday, May 29, 2008

AI in Second Life (post 1)

I have had a casual interest in Second Life (SL) for a little while, and periodically I am going to collect links to, and briefly discuss, AI work being done in SL. Here is the first such discussion.

This page has a brief description of work by RPI on building an AI avatar that can carry on rudimentary conversations in SL. The description characterizes the avatar as a "four-year-old" based on a theory-of-mind test.

Briefly, a version of a theory-of-mind test goes as follows. The agent sees person A put an object inside box 1 and then leave the room. The agent then sees person B come into the room, take the object from box 1, and place it inside a different box 2. The agent is then asked in which box person A will look for the object when she comes back. If the agent has a theory of mind, it will say that person A will look in box 1; otherwise, it will say that person A will look in box 2.
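The logic of the test is simple enough to sketch in a few lines: each person's belief about the object's location is updated only by events that person actually witnesses. This is just an illustration of the test's structure, not the RPI system (which, as described below, works via theorem proving over logical expressions); the names and box labels are made up.

```python
# A minimal sketch of the false-belief test described above: beliefs are
# updated only for the people who witness each event.
def run_false_belief_test():
    beliefs = {}  # person -> where that person believes the object is

    def observe(witnesses, location):
        # Only the people present update their belief.
        for person in witnesses:
            beliefs[person] = location

    observe(["A", "B", "agent"], "box 1")  # A puts the object in box 1
    # A leaves the room; B moves the object while A is away.
    observe(["B", "agent"], "box 2")

    # An agent *with* a theory of mind reports A's (now false) belief;
    # an agent *without* one just reports the true location it last saw.
    with_tom = beliefs["A"]
    without_tom = beliefs["agent"]
    return with_tom, without_tom


print(run_false_belief_test())  # ('box 1', 'box 2')
```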

Here is a video of the RPI agent Edd failing

and here is a link to a video of the agent Edd succeeding after having learned.

The RPI researchers convert the English conversation into logical expressions and then use theorem proving as the reasoning engine to provide behavior. The authors state, “Our aim is not to construct a computational theory that explains and predicts actual human behavior, but rather to build artificial agents made more interesting and useful by their ability to ascribe mental states to other agents, reason about such states, and have — as avatars — states that are correlates to those experienced by humans.”

I could not find a paper by the authors but did find the slides of a talk they gave at the First Conference on Artificial General Intelligence, 2008.

My comments: The specific theory-of-mind experiment described above seems relatively straightforward, and I have seen others do similar things (e.g., a project from Cynthia Breazeal's lab; someone point us to the precise paper if they know it). I would venture that the natural language processing in the RPI project is rudimentary at this point and works only in fairly narrow scripted settings. In summary, it seems to me that the authors used their previous general-purpose logic-based question-answering work to build a specific narrow system that can basically do this one task. Also, it isn't clear what SL added to this experiment, since the conversation is not with an arbitrary human player but rather with the experiment designers. At the same time, the overall goal of ascribing mental states to other agents and reasoning about such states seems laudable, at least at first glance.

(I need to get Charles Isbell to comment on this.)

Suggest Captions :)

Matthew Taylor just sent me this funny picture of me at the recent Barbados workshop on RL, along with his caption, which was "Satinder is ready to wring the last bit of performance out of some poor PSR algorithm."

How would you caption it? :)

Wednesday, May 28, 2008

What a totally crazy wacky fruit!

Ok, this has nothing to do with RL or AI. But what a fruit! Someone should bring a bag to the next conference for fun.

To quote the nytimes article:
"The miracle fruit, Synsepalum dulcificum, is native to West Africa and has been known to Westerners since the 18th century. The cause of the reaction is a protein called miraculin, which binds with the taste buds and acts as a sweetness inducer when it comes in contact with acids,..."
(nytimes video)

Apparently it makes Tabasco sauce taste like doughnut glaze and Guinness taste like a chocolate shake. The effect lasts for an hour. I have got to try this!

Monkeying around with robots

This is a cool project and brain-machine interfaces like this will be a great application of machine learning that will help society. This stuff is exciting!

Tuesday, May 27, 2008

Measuring Reinforcement Learning

When I started my PhD in Andy Barto's lab in 1988, there were perhaps a handful of folks doing research in the field of modern RL. There was an outpost of ex-students from Andy's lab at GTE Laboratories, including folks like Rich Sutton, Chuck Anderson, and Judy Franklin. By 1990 or so, there were a few others, including Leslie Kaelbling and Peter Dayan. It was a pretty lonely field back then.

But how big is the field of RL now? If I had to guess, I would say there are several hundred researchers around the world who would self-identify as RL researchers.

So, the objective of this post is to solicit ideas and, more importantly, effort toward measuring the size of the field of RL. Here are some ideas for how to collect some data. Any volunteers?
  1. Collect a list of conferences that publish a substantial number of RL papers. These include (in no particular order) ICML, NIPS, AAAI, UAI, COLT, IJCAI, AAMAS, and ECML. What other major venues am I missing?

  2. Establish some simple and noisy methodology for determining when a paper is an RL paper.

    • If the paper lists keywords, then look for phrases from some list that pretty clearly indicate an RL paper, e.g., reinforcement learning, Q-learning, TD, temporal differences. (What others am I missing?) I think it will introduce too much noise to include MDPs and POMDPs.

    • Look for the same list of keywords identified above in the title and abstract.

    • Do we need to do more sophisticated things?

  3. Gather the data for each conference separately by year. An interesting use of this data would be to get a sense of the publication rate of RL papers at the different conferences. If I could get this data somehow, I would happily create graphs and put them up on this blog. But to serve the main purpose of this post, one would just create and count a list of the unique authors of such papers.
Does anyone have scripts that could easily do this? Not sure this is worth a lot of effort but it sure would be fun to have this data.
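To make step 2 concrete, here is a sketch of what the core filter of such a script might look like. The phrase list follows the one suggested above; the function's fields (title, abstract, keyword list) are assumptions about whatever bibliographic data one manages to collect, and per the caveat above, MDP/POMDP alone is deliberately not treated as an RL signal.

```python
# Sketch of the "simple and noisy" RL-paper detector from step 2: flag a
# paper if any phrase from a small keyword list appears in its title,
# abstract, or author-supplied keywords.
RL_PHRASES = [
    "reinforcement learning",
    "q-learning",
    "temporal difference",
    "temporal differences",
    "td learning",
]


def is_rl_paper(title, abstract="", keywords=()):
    """Return True if any RL phrase appears (case-insensitively) in the
    title, abstract, or keyword list. Deliberately noisy: MDP/POMDP alone
    does not count, per the caveat in the post."""
    text = " ".join([title, abstract, *keywords]).lower()
    return any(phrase in text for phrase in RL_PHRASES)


papers = [
    ("Q-Learning", "Technical note on Q-learning convergence.", ()),
    ("Planning under uncertainty", "Exact POMDP solution methods.", ("POMDP",)),
]
for title, abstract, kws in papers:
    print(title, "->", is_rl_paper(title, abstract, kws))
```

Counting unique authors of the flagged papers, per conference per year, would then give the size estimate this post is after.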

A beginning

So I am starting a blog in which I (and others, to be invited) will initiate conversations with those who wish to comment on (or just read about) things pertaining to RL and, more generally, AI. The intention is to keep this related to research and only occasionally stray into other areas of life.

At this point I have no rules for posts or comments other than to keep things friendly and civil. This may be modified in the future.