Mailbag: IF for Reinforcement Learning

Hi Emily

I’m a PhD student working with Prof. Mark Riedl at Georgia Tech and Microsoft Research Redmond. I am currently working on making AI agents (specifically using reinforcement learning) that play interactive fiction games (text-adventure games in the vein of Zork) in a non-game specific, generalizable way.

I was advised by Prof. Janet Murray that you would be the right person to help answer a question I had regarding these games, given your expertise in interactive fiction. If you have a list of such games (e.g. those given here https://github.com/microsoft/jericho#supported-games), is it possible to identify a subset of maybe ~10-15 of them that reasonably cover a majority of all interactive fiction games in terms of game structure, i.e. linearity of progression/score accumulation from the perspective of a learning agent? If it is possible, what would this set look like? Any insight at all would be great.

Nice to hear from you — I’ve been keeping an eye on this space as people have been publishing about it recently.

I’m not sure there’s a perfect answer to this, since IF is hugely varied in how it handles world model, score, pacing, etc. Also, your list here skews very much towards early interactive fiction, which means it doesn’t cover some of the formal experiments that came along later.

I also don’t remember how score works in all the games in this list — some of them I’ve not played, or played a long time ago.

However, with that in mind, here are a few categories that represent some fairly standard game structures:

Short or medium game in which score is given out rarely — Lost Pig (max 7)

Short or medium game in which score is given more frequently — Meteor etc. (max 30), Balances (max 51)

Long game in which score is distributed fairly frequently throughout — Adventure, Zork; possibly Enchanter and Sorcerer also; Anchorhead, as I recall
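One rough way to sort games into categories like these is to replay a known walkthrough and look at how the score changes from step to step. The sketch below assumes you already have the cumulative score after each command as a plain list, recorded by whatever environment wrapper you use. The function name, the field names, and the sample trace are all made up for illustration; the trace is not real data from any game.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RewardProfile:
    steps: int            # number of commands in the trace
    final_score: int      # score after the last command
    reward_events: int    # steps where the score went up
    penalty_events: int   # steps where the score went down
    mean_gap: float       # average number of steps between score changes


def reward_profile(scores: List[int], start_score: int = 0) -> RewardProfile:
    """Summarise how often the score changes along a walkthrough.

    scores[i] is assumed to be the cumulative score after command i.
    """
    deltas = []
    prev = start_score
    for s in scores:
        deltas.append(s - prev)
        prev = s

    # Indices of steps where the score changed in either direction.
    change_steps = [i for i, d in enumerate(deltas) if d != 0]
    gaps = [b - a for a, b in zip(change_steps, change_steps[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else float("inf")

    return RewardProfile(
        steps=len(scores),
        final_score=scores[-1] if scores else start_score,
        reward_events=sum(1 for d in deltas if d > 0),
        penalty_events=sum(1 for d in deltas if d < 0),
        mean_gap=mean_gap,
    )


# Hypothetical trace: a short game with rare rewards and one deduction.
trace = [0, 0, 1, 1, 1, 1, 3, 3, 2, 2, 2, 4]
profile = reward_profile(trace)
```

A large `mean_gap` relative to `steps` would suggest the "score is given out rarely" category, and a non-zero `penalty_events` flags games that sometimes take points away.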

And from your list, I recall these being ones that might pose an interesting challenge:

Curses — it’s long, it’s complicated, it has a scoring system that doles out points gradually, and (if I’m remembering right) at one point it also takes points away from the player again.

Wishbringer — this one’s interesting because there’s a scoring system that reacts to how many times you’ve used the magic stone in the game — so the more you use wishes, the easier the game becomes, but the lower your final score.

Hunter, in Darkness — doesn’t keep score. There’s also a procedurally generated maze in this, which I would expect to make it very challenging indeed.

Thinking about games not on your list, here are some other formal extremes that might be interesting to try to reason about; all of these can be found on https://ifdb.tads.org/ and in most cases they’re available for download.

ASCII and the Argonauts — an intentionally short and simple game that gives a bunch of +1 rewards for doing basic tasks; the relatively small verb set might make it easier than some of the other games.

Aisle — a game that takes one move to play, and for which many different verbs are available; there’s also no score. It’s hard to imagine how one would use reinforcement learning on this, but it represents one extreme that might be valuable for purposes of thought experiment.

Adventurer’s Consumer Guide — as I recall this one gives out a pretty steady stream of +1 point rewards, rather than only a few or only rarer rewards, so it might be a nice counterpoint to some of the others.

Savoir-Faire — a game of mine, and I suggest it just because I happen to know it well enough to know how the rewards work; there are frequent opportunities for scoring and some rewards are bigger than others.

Bronze — a game that I wrote that keeps track of how many rooms you’ve explored and triggers certain narrative events when you’ve found more of the space, so you could use the explored-rooms count as a secondary signal to score and probably get some useful reinforcement out of that aspect as well.
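As a minimal sketch of what using explored rooms as a secondary signal could look like: the class below adds a small bonus the first time each room is seen in an episode, on top of the change in game score. It assumes your environment can give you some stable identifier for the current room; how you get that is not shown here. The class name, the bonus weight, and the room labels are all hypothetical.

```python
from typing import Hashable, Set


class ExplorationBonus:
    """Adds a small bonus the first time each room is seen in an episode."""

    def __init__(self, bonus: float = 0.5) -> None:
        self.bonus = bonus                    # hypothetical weight; tune per game
        self.visited: Set[Hashable] = set()   # rooms seen so far this episode

    def reset(self) -> None:
        """Forget visited rooms; call this at the start of each episode."""
        self.visited.clear()

    def shaped_reward(self, score_delta: float, room_id: Hashable) -> float:
        """Return the score change plus a bonus if room_id is new."""
        reward = score_delta
        if room_id not in self.visited:
            self.visited.add(room_id)
            reward += self.bonus
        return reward


# Hypothetical episode: (score change, room label) for each step.
shaper = ExplorationBonus(bonus=0.5)
episode = [(0, "room_a"), (0, "room_b"), (0, "room_a"), (5, "room_c")]
rewards = [shaper.shaped_reward(delta, room) for delta, room in episode]
```

Revisiting `room_a` earns nothing extra, so the bonus rewards finding new space rather than wandering. Keeping the bonus small relative to real score changes is a design choice, so that exploration does not drown out the game's own signal.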

Captain Verdeterre’s Plunder — gives you a score based on how much loot you manage to rescue from a sinking ship before it goes under. Genuinely an interesting optimization problem; human players have competed to come up with the highest-scoring traversal possible.

Journey to Alpha Centauri in Real Time — as the name suggests, this plays out in real time, and because it represents a very long journey through space, it isn’t possible to finish.

Rematch — a game in which the challenge is to figure out a single very long command that will solve the game in a single move, and in which there is a cyclical pattern to the initial world set-up. (I think this one is not a z-machine game, so it might not work with Jericho.)

Zero Sum Game — starts with a score and counts down to zero (but this may be less interesting than the others since you could just reverse the sign of the signal and wind up with something equally valid).
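Reversing the sign amounts to a one-line change in how per-step rewards are computed from the score. The sketch below assumes a list of cumulative scores after each command; the function name and the sample numbers are invented for illustration.

```python
from typing import Iterable, List


def countdown_rewards(scores: Iterable[int], start_score: int) -> List[int]:
    """Per-step rewards for a game where lowering the score is progress."""
    rewards = []
    prev = start_score
    for s in scores:
        rewards.append(prev - s)  # sign reversed relative to the usual s - prev
        prev = s
    return rewards


# Hypothetical countdown trace starting from 10 and ending at 0.
rewards = countdown_rewards([10, 8, 8, 5, 0], start_score=10)
```

The rewards sum to the starting score minus the final score, so an agent that drives the score to zero collects the full amount.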

Hadean Lands — fiendishly hard puzzle game, in which instead of score you’re gaining access to lots of objects which could arguably be used as a proxy for progress. Also features areas where the player has to do similar things in slightly different ways.
