Mailbag: IF for Reinforcement Learning

Hi Emily

I’m a PhD student working with Prof. Mark Riedl at Georgia Tech and Microsoft Research Redmond. I am currently working on making AI agents (specifically using reinforcement learning) that play interactive fiction games (text-adventure games in the vein of Zork) in a non-game-specific, generalizable way.

I was advised by Prof. Janet Murray that you would be the right person to help answer a question I had regarding these games, given your expertise in interactive fiction. If you have a list of such games (e.g. those given here https://github.com/microsoft/jericho#supported-games), is it possible to identify a subset of maybe ~10-15 of them that reasonably cover a majority of all interactive fiction games in terms of game structure, i.e. linearity of progression/score accumulation from the perspective of a learning agent? If it is possible, what would this set look like? Any insight at all would be great.

Nice to hear from you — I’ve been keeping an eye on this space as people have been publishing about it recently.

I’m not sure there’s a perfect answer to this, since IF is hugely varied in how it handles world model, score, pacing, etc. Also, your list here skews very much towards early interactive fiction, which means it doesn’t cover some of the formal experiments that came along later.

I also don’t remember how score works in all the games in this list — some of them I’ve not played, or played a long time ago.

However, with that in mind, here are a couple of categories that represent some fairly standard game structures:

Short or medium game in which score is given out rarely — Lost Pig (max 7)

Short or medium game in which score is given more frequently — Meteor etc. (max 30), Balances (max 51)

Long game in which score is distributed fairly frequently throughout — Adventure, Zork; possibly Enchanter and Sorcerer also; Anchorhead, as I recall

And from your list, I recall these being ones that might pose an interesting challenge:

Curses — it’s long, it’s complicated, it has a scoring system that it doles out gradually, and it also does a trick (if I’m remembering right) where at one point it actually deducts score from the player again.

Wishbringer — this one’s interesting because there’s a scoring system that reacts to how many times you’ve used the magic stone in the game — so the more you use wishes, the easier the game becomes, but the lower your final score.

Hunter, in Darkness — doesn’t keep score. There’s also a procedurally generated maze in this, which I would expect to make it very challenging indeed.

Thinking about games not on your list, here are some other formal extremes that might be interesting to try to reason about; all of these can be found on https://ifdb.tads.org/ and in most cases they’re available for download.

ASCII and the Argonauts — an intentionally short and simple game that gives a bunch of +1 rewards for doing basic tasks; the relatively small verb set might make it easier than some of the other games.

Aisle — a game that takes one move to play, and for which many different verbs are available; there’s also no score. It’s hard to imagine how one would use reinforcement learning on this, but it represents one extreme that might be valuable for purposes of thought experiment.

Adventurer’s Consumer Guide — as I recall this one gives out a pretty steady stream of +1 point rewards, rather than only a few or only rarer rewards, so it might be a nice counterpoint to some of the others.

Savoir-Faire — a game of mine, and I suggest it just because I happen to know it well enough to know how the rewards work; there are frequent opportunities for scoring and some rewards are bigger than others.

Bronze — a game that I wrote that keeps track of how many rooms you’ve explored and triggers certain narrative events when you’ve found more of the space, so you could use the explored-rooms count as a secondary signal to score and probably get some useful reinforcement out of that aspect as well.

Captain Verdeterre’s Plunder — gives you a score based on how much loot you managed to rescue off a sinking ship before it goes under. Genuinely an interesting optimization problem; human players have competed to try to come up with the highest-score possible traversal.
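The optimization flavor is easy to sketch: with invented item values and turn costs (not the game’s actual loot or scoring), finding the best haul before the ship goes under is a small knapsack-style search.

```python
from itertools import combinations

# Toy model of the salvage problem: each piece of loot has a point value
# and a cost in turns, and the ship sinks after a fixed number of turns.
# The items and numbers are invented for illustration -- they are not
# the game's actual loot or scoring.
LOOT = {
    "candlestick": (10, 2),  # (points, turns to grab)
    "ledger": (25, 5),
    "coin": (5, 1),
    "painting": (40, 8),
}
TURNS_BEFORE_SINKING = 10

def best_haul(loot, turn_budget):
    """Brute-force the highest-scoring subset of loot that fits the budget."""
    best_score, best_items = 0, set()
    for r in range(len(loot) + 1):
        for subset in combinations(loot, r):
            turns = sum(loot[item][1] for item in subset)
            score = sum(loot[item][0] for item in subset)
            if turns <= turn_budget and score > best_score:
                best_score, best_items = score, set(subset)
    return best_score, best_items

score, items = best_haul(LOOT, TURNS_BEFORE_SINKING)
# With these invented numbers: 50 points, {"candlestick", "painting"}
```

Human competitors do this search by hand, of course; the point is that the game’s reward structure makes the whole playthrough one global optimization rather than a sequence of locally rewarded steps.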

Journey to Alpha Centauri in Real Time — as the name would suggest, this takes place over a certain amount of elapsing real time and therefore it’s not possible to finish, because it’s representing a very long journey in space.

Rematch — a game in which the challenge is to figure out a single very long command that will solve the game in a single move, and in which there is a cyclical pattern to the initial world set-up. (I think this one is not a z-machine game, so it might not work with Jericho.)

Zero Sum Game — starts with a score and counts down to zero (but this may be less interesting than the others since you could just reverse the sign of the signal and wind up with something equally valid).

Hadean Lands — fiendishly hard puzzle game, in which instead of score you’re gaining access to lots of objects which could arguably be used as a proxy for progress. Also features areas where the player has to do similar things in slightly different ways.

Can AI tell a good story?


Tuesday I was invited to speak at the interactive narratives summit at the London Games Festival, specifically in a debate over whether AI can create a good story.

Perhaps the original scheme was to stage a good showdown, but I have somewhat complicated views about what the question even means, and, as it happens, so did my would-be debate opponent Brenden Gibbons. So instead we had a more temperate but, I think, more interesting conversation, moderated by David Tomchak.

This is not a transcript of that conversation, because I can’t reproduce one, but it’s an attempt to recapture some key points, drawing also on notes I made before the event, and expanding some of the ideas with links or examples I didn’t have available in the room.

First, AI can definitely already create stories, by pretty much any definition that a narratologist would establish. Indeed, we can set the bar higher than just “is there a sequence of causally-linked events,” though many scholars would accept that as enough. Some of GPT-2’s output is interesting, funny, and narrative. So are the outputs of other techniques stretching back to the 70s, from generative grammars to the model-and-curate approach used by James Ryan in his recent dissertation Curating Simulated Storyworlds. If AI were an orchard, we would have already plucked many and diverse story fruits there.


Mailbag: AI Research on Dialogue and Story Generation (Part 3)

This is a continuation of an earlier mailbag answer about AI research that touches on dialogue and story generation. As before, I’m picking a few points of interest, summarizing highlights, and then linking through to the detailed research.

This one is about a couple of areas of natural language processing and generation, as well as sentiment understanding, relevant to how we might realize stories and dialogue with particular surface features and characteristics.

Transferring text style


Style transfer is familiar in image manipulation, and there are loads of consumer-facing applications and websites that let you make style changes to your own photographs. Textual style transfer is a more challenging problem. How might you express the same information, but in different wording, representing a different authorial manner? Alter the sentiment of the text to make it more positive or negative? Translate complex language to something more basic, or vice versa? Capture the distinctive prose characteristics of a well-known author or a specific era? Indeed, looked at the right way, translation from one human language into another can be regarded as a form of style transfer.


Mailbag: Research on Dialogue and Story Generation (Part 2)

This is a continuation of an earlier mailbag answer about research that touches on dialogue and story generation. As before, I’m picking a few points of interest, summarizing highlights, and then linking through to the detailed research. In this section, I’m mostly looking at authoring tools and at academic theoretical work on interactive narrative.

This will not be comprehensive.

Authoring Tools for Dynamic or Procedural Storytelling

Several academic projects focus on building authoring tools for various types of dynamic or procedural storytelling, whether or not those are heavily augmented by AI. Many of these don’t rely on machine learning per se but do explore some other aspect of the problem; in particular, several attempt to furnish the author with the means to build content for a planner-based storytelling system. But there’s a whole range of functionality here (and this is not a complete list):


Andrew Gordon has done quite a bit of work around tools designed to assist authors with story creation ideas based on large corpora. I’ve written elsewhere about DINE, his interactive story authoring tool. DINE allows authors to describe the sorts of prompts that they want to understand, but uses its own models of language to determine whether a player’s input qualifies as matching a prompt. The result is less controllable but sometimes more robust than a standard interactive fiction parser. (“Sometimes” is the key word in that sentence.)


Emma’s Journey is a project out of UCSC that combines fragments of choice-based narrative with a planner to create dynamic scenes. Individual pieces feel like they could have been done in Twine, but the selection and ordering of pieces is very dependent on current stats; and there is a distracting minigame for the player that also affects what options are available. This is built with the experimental StoryAssembler tool. There are also several associated research papers.


Mailbag: AI Research on Dialogue and Story Generation

I’m curious: do you follow much research that happens in stories and dialog these days? In the world of machine learning research, there’s much less in dialog and stories than other areas (e.g. image generation/recognition or translation), but once in a while, you come across some interesting work, e.g. Hierarchical Neural Story Generation (by some folks in Facebook AI).

For some years now I’ve followed work coming out of the UCSC Expressive Intelligence Studio; work done at Georgia Tech around crowdsourced narrative generation; game industry applications introduced or covered at the GDC AI Summit (though it is rarer to see extensive story-generation work here). I’ve also served on the program committees for ICCC and ICIDS and a few FDG workshops; and am an associate editor on IEEE Transactions on Games focused on interactive storytelling applications. Here (1, 2, 3) is my multi-part post covering the book Interactive Digital Narrative in detail.

That’s not to say I see (or could see) everything that’s happening. I tend to focus on things that look most ready to be used in games, entertainment, or chatbot applications — especially those that are designed to support a partially human-authored experience. I also divide my available “research” time between academic work and hands-on experiments in areas that interest me.

So with that perspective in mind:

  • I’m not attempting a comprehensive literature review here! That would be huge. This coverage cherrypicks items of particular interest
  • I will go pretty lightly on the technical detail, since the typical readership of this blog may not be that interested, but I’ll try to provide summary and example information that explains why a given item is interesting in my opinion, and then link back to the original research for people who want the deeper dive
  • I’ll actually start by briefly summarizing the paper the questioner linked
  • Even with cherrypicking, there is a lot to say here, and I am breaking it out over multiple posts

That Initial Paper

For other readers: the linked article in this question is about using a large dataset pulled from Reddit’s WritingPrompts board and a machine learning model that draws on multiple techniques (convolutional seq2seq, gated self-attention). After training, the system is able to take short prompts and create a paragraph or so of story that relates to the prompt. Several of the sample output sections are quite cool:

[Screenshot: sample prompt and generated story output from the paper.]

But they are generating surface text rather than plot, and the evidence suggests that they would not be able to produce a coherent long-term plot. Just within this dialogue section, we’re talking about a tablet-virus-monster object, and we’ve got a couple of random scientist characters.


TextWorld (Inform 7 & machine learning)

Inform 7 is used in a number of contexts that may be slightly surprising to its text adventure fans: in education, in prototyping game systems for commercial games, and lately even for machine learning research.

TextWorld: A Learning Environment for Text-Based Games documents how the researchers from Tilburg University, McGill University, and Microsoft Research built text adventure worlds with Inform 7 as part of an experiment in reinforcement learning.

Reinforcement learning is a machine learning strategy in which the ML agent gives inputs to a system (which might be a game that you’re training it to play well) and receives back a score on whether the input caused good or bad results. This score is the “reinforcement” part of the loop. Based on the cumulative scoring, the system readjusts its approach. Over many attempts to play the same game, the agent is trained to play better and better: it develops a policy, a mapping between current state and the action it should perform next.

With reinforcement learning, because you’re relying on the game (or other system) to provide the training feedback dynamically, you don’t need to start your machine learning process with a big stack of pre-labeled data, and you don’t need a human being to understand the system before beginning to train. Reinforcement learning has been used to good effect in training computer agents to play Atari 2600 games.
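To make that loop concrete, here’s a minimal tabular Q-learning sketch on a made-up two-room game. The rooms, commands, and reward are invented for illustration — this is not any actual IF game, and real RL systems are far more elaborate — but the structure is the same: act, observe the score, adjust the policy.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Toy deterministic "game": two rooms and a treasure. Only the final
# move yields a reward. All states, commands, and rewards here are
# invented for illustration.
TRANSITIONS = {
    ("hall", "north"): ("cave", 0.0),
    ("hall", "south"): ("hall", 0.0),
    ("cave", "take treasure"): ("done", 1.0),
    ("cave", "south"): ("hall", 0.0),
}
ACTIONS = {"hall": ["north", "south"], "cave": ["take treasure", "south"]}

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning: learn a state -> action policy from score feedback."""
    q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS[s]}
    for _ in range(episodes):
        state = "hall"
        while state != "done":
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(ACTIONS[state])
            else:
                action = max(ACTIONS[state], key=lambda a: q[(state, a)])
            next_state, reward = TRANSITIONS[(state, action)]
            future = max((q[(next_state, a)] for a in ACTIONS.get(next_state, [])),
                         default=0.0)
            # The reinforcement step: nudge the estimate toward the observed
            # reward plus the discounted value of where the action led.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    # The learned policy: the best-valued action from each state.
    return {s: max(ACTIONS[s], key=lambda a: q[(s, a)]) for s in ACTIONS}

policy = train()  # {"hall": "north", "cave": "take treasure"}
```

After a few hundred episodes the agent has learned the mapping from each situation to the move that (eventually) leads to reward — which is exactly the “policy” described above.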

Using this method with text adventures is dramatically more challenging, though, for a number of reasons:

  • there are many more types of valid input than in the typical arcade game (the “action space”), and those actions are described in language (though the authors note the value of work such as that of BYU researchers Fulda et al. in figuring out which verbs could sensibly be applied to a given noun)
  • world state is communicated back in language (the “observational space”), and may be incompletely conveyed to the player, with lots of hidden state
  • goals often need to be inferred by the player (“oh, I guess I’m trying to get that useful object from Aunt Jemima”)
  • many Atari 2600 games have frequent changes of score or frequent death, providing a constant signal of feedback, whereas not all progress in a text adventure is rewarded by a score change, and solving a puzzle may require many moves that are not individually scored
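That last point is easy to demonstrate with a toy environment. In the sketch below — an invented step interface, not the actual Jericho or TextWorld API — a three-move puzzle rewards only the final move, so two-thirds of the necessary actions produce no score signal at all:

```python
# Toy environment illustrating the sparse-reward problem: a three-move
# puzzle in which only the final move changes the score. The step()
# interface loosely mirrors a reinforcement-learning loop, but it is
# invented for illustration -- not the actual Jericho or TextWorld API.
class TinyTextGame:
    SOLUTION = ["unlock chest with key", "open chest", "take jewel"]

    def __init__(self):
        self.progress = 0

    def step(self, command):
        """Return (observation_text, reward, done) for one typed command."""
        if command != self.SOLUTION[self.progress]:
            return ("Nothing happens.", 0, False)
        self.progress += 1
        if self.progress == len(self.SOLUTION):
            return ("Taken! [Your score has just gone up by ten points.]", 10, True)
        # The move was necessary progress, but produces no score signal.
        return ("Done.", 0, False)

game = TinyTextGame()
rewards = [game.step(c)[1] for c in TinyTextGame.SOLUTION]
# rewards == [0, 0, 10]: two of the three required moves go unrewarded.
```

An agent exploring this world gets no feedback distinguishing the first two correct moves from useless ones until the very end — and real IF puzzles can stretch that gap over dozens of moves.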

TextWorld’s authors feel we’re not yet ready to train a machine agent to solve a hand-authored IF game like Zork — and they’ve documented the challenges here much more extensively than my rewording above. What they have done instead is to build a sandbox environment that does a more predictable subset of text adventure behavior. TextWorld is able to automatically generate games containing a lot of the standard puzzles:

[Screenshot: transcript from an automatically generated TextWorld game.]
