This is a continuation of an earlier mailbag answer about AI research that touches on dialogue and story generation. As before, I’m picking a few points of interest, summarizing highlights, and then linking through to the detailed research.
This one is about a couple of areas of natural language processing and generation, as well as sentiment understanding, relevant to how we might realize stories and dialogue with particular surface features and characteristics.
Transferring text style
Style transfer is familiar in image manipulation, and there are loads of consumer-facing applications and websites that let you make style changes to your own photographs. Textual style transfer is a more challenging problem. How might you express the same information, but in different wording, representing a different authorial manner? Alter the sentiment of the text to make it more positive or negative? Translate complex language to something more basic, or vice versa? Capture the distinctive prose characteristics of a well-known author or a specific era? Indeed, looked at the right way, translation from one human language into another can be regarded as a form of style transfer.
This is a continuation of an earlier mailbag answer about research that touches on dialogue and story generation. As before, I’m picking a few points of interest, summarizing highlights, and then linking through to the detailed research. In this section, I’m mostly looking at authoring tools and at academic theoretical work on interactive narrative.
This will not be comprehensive.
Authoring Tools for Dynamic or Procedural Storytelling
Several academic projects focus on building authoring tools for various types of dynamic or procedural storytelling, whether or not those are heavily augmented by AI. Many of these don’t rely on machine learning per se but do explore some other aspect of the problem; in particular, several attempt to furnish the author with the means to build content for a planner-based storytelling system. But there’s a whole range of functionality here (and this is not a complete list):
Andrew Gordon has done quite a bit of work around tools designed to assist authors with story creation ideas based on large corpora. I’ve written elsewhere about DINE, his interactive story authoring tool. DINE allows authors to describe the sorts of prompts that they want to understand, but uses its own models of language to determine whether a player’s input qualifies as matching a prompt. The result is less controllable but sometimes more robust than a standard interactive fiction parser. (“Sometimes” is the key word in that sentence.)
Emma’s Journey is a project out of UCSC that combines fragments of choice-based narrative with a planner to create dynamic scenes. Individual pieces feel like they could have been done in Twine, but the selection and ordering of pieces is very dependent on current stats; and there is a distracting minigame for the player that also affects what options are available. This is built with the experimental StoryAssembler tool. There are also several associated research papers.
I’m curious: do you follow much research that happens in stories and dialog these days? In the world of machine learning research, there’s much less in dialog and stories than other areas (e.g. image generation/recognition or translation), but once in a while, you come across some interesting work, e.g. Hierarchical Neural Story Generation (by some folks in Facebook AI).
For some years now I’ve followed work coming out of the UCSC Expressive Intelligence Studio; work done at Georgia Tech around crowdsourced narrative generation; and game industry applications introduced or covered at the GDC AI Summit (though it is rarer to see extensive story-generation work here). I’ve also served on the program committees for ICCC and ICIDS and a few FDG workshops, and I am an associate editor on IEEE Transactions on Games, focused on interactive storytelling applications. Here (1, 2, 3) is my multi-part post covering the book Interactive Digital Narrative in detail.
That’s not to say I see (or could see) everything that’s happening. I tend to focus on things that look most ready to be used in games, entertainment, or chatbot applications, especially those that are designed to support a partially human-authored experience. I also divide my available “research” time between academic work and hands-on experiments in areas that interest me.
So with that perspective in mind:
- I’m not attempting a comprehensive literature review here! That would be huge. This coverage cherrypicks items
- I will go pretty lightly on the technical detail since the typical readership of this blog may not be that interested, but I’ll try to provide summary and example information that explains why a given item is interesting in my opinion, and then link back to the original research for people who want the deeper dive
- I’ll start with a brief summary of the paper the questioner linked
- Even with cherrypicking, there is a lot to say here and I am breaking it out over multiple posts
That Initial Paper
For other readers: the linked article in this question is about using a large dataset pulled from Reddit’s WritingPrompts board and a machine learning model that draws on multiple techniques (convolutional seq2seq, gated self-attention). After training, the system is able to take short prompts and create a paragraph or so of story that relates to the prompt. Several of the sample output sections are quite cool:
But they are generating surface text rather than plot, and the evidence suggests that they would not be able to produce a coherent long-term plot. Just within this dialogue section, we’re talking about a tablet-virus-monster object, and we’ve got a couple of random scientist characters.
Inform 7 is used in a number of contexts that may be slightly surprising to its text adventure fans: in education, in prototyping game systems for commercial games, and lately even for machine learning research.
TextWorld: A Learning Environment for Text-Based Games documents how the researchers from Tilburg University, McGill University, and Microsoft Research built text adventure worlds with Inform 7 as part of an experiment in reinforcement learning.
Reinforcement learning is a machine learning strategy in which the ML agent gives inputs to a system (which might be a game that you’re training it to play well) and receives back a score on whether the input caused good or bad results. This score is the “reinforcement” part of the loop. Based on the cumulative scoring, the system readjusts its approach. Over many attempts to play the same game, the agent is trained to play better and better: it develops a policy, a mapping between current state and the action it should perform next.
With reinforcement learning, because you’re relying on the game (or other system) to provide the training feedback dynamically, you don’t need to start your machine learning process with a big stack of pre-labeled data, and you don’t need a human being to understand the system before beginning to train. Reinforcement learning has been used to good effect in training computer agents to play Atari 2600 games.
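For readers who like to see the loop spelled out, here is a minimal sketch of the act, score, adjust cycle described above. It is my own toy illustration and not anything taken from the TextWorld paper: the "game" is a hypothetical five-cell corridor where the only reward comes from reaching the last cell, and the learning rule is plain tabular Q-learning, which is just one of many reinforcement learning methods. All the names here (`step`, `train`, `policy`) are invented for the example.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (hypothetical toy game)
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate the value of each (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: occasionally try a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: take the best-known action, breaking ties randomly.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # The "reinforcement": nudge the estimate toward the score received
            # plus the discounted value of the best follow-up action.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def policy(q):
    """The learned policy: a mapping from each non-goal state to an action."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


if __name__ == "__main__":
    print(policy(train()))  # {0: 1, 1: 1, 2: 1, 3: 1}, i.e. always step right
```

Note that the agent never sees any labeled examples; everything it learns comes from the score the game hands back. Note also how small this world is: two actions, five states, and a reward that is easy to stumble into.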
Using this method with text adventures is dramatically more challenging, though, for a number of reasons:
- there are many more types of valid input (the “action space”) than in the typical arcade game, and those actions are described in language (though the authors note the value of work such as that of BYU researchers Fulda et al. in figuring out what verbs could sensibly be applied to a given noun)
- world state is communicated back in language (the “observational space”), and may be incompletely conveyed to the player, with lots of hidden state
- goals often need to be inferred by the player (“oh, I guess I’m trying to get that useful object from Aunt Jemima”)
- many Atari 2600 games have frequent changes of score or frequent death, providing a constant signal of feedback, whereas not all progress in a text adventure is rewarded by a score change, and solving a puzzle may require many moves that are not individually scored
TextWorld’s authors feel we’re not yet ready to train a machine agent to solve a hand-authored IF game like Zork — and they’ve documented the challenges here much more extensively than my rewording above. What they have done instead is to build a sandbox environment that does a more predictable subset of text adventure behavior. TextWorld is able to automatically generate games containing a lot of the standard puzzles:
This Friday I had the pleasure of speaking to the AAAI workshop Knowledge Extraction from Games, which focused on gathering information from games and putting that information to use: for instance, studying level design in a platformer in order to find standard rules about platformer design or to propose alternative level designs that the creators might not have considered.
I was invited to talk about this topic from a designer’s perspective, looking particularly at how these techniques could be valuably applied to narrative games. And the problem, as I outlined it, was as follows:
Games that aspire to offer a lot of narrative agency often face the following challenge: they need a number of distinctive, hand-authored units of content (whether those are dialogue lines for Character Engine, storylets in a quality-based narrative system, or choice nodes in a ChoiceScript game) where each individual unit may both affect and be affected by the underlying world state.
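To make "affect and be affected by the underlying world state" concrete, here is a minimal sketch of one such unit, loosely in the shape of a storylet. It assumes a world state that is nothing more than a dictionary of numeric qualities; the `Storylet` class, its fields, and the sample content are all hypothetical, and this is not the actual data model of Character Engine, ChoiceScript, or any particular quality-based narrative system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

WorldState = Dict[str, int]  # assumed: quality name -> numeric value


@dataclass
class Storylet:
    name: str
    text: str
    # Precondition: how the world state affects this unit of content.
    requires: Callable[[WorldState], bool]
    # Effects: how this unit of content changes the world state.
    effects: Dict[str, int] = field(default_factory=dict)


def available(storylets: List[Storylet], state: WorldState) -> List[Storylet]:
    """Return the storylets whose preconditions hold in the current state."""
    return [s for s in storylets if s.requires(state)]


def play(storylet: Storylet, state: WorldState) -> WorldState:
    """Return a new world state with the storylet's effects applied."""
    new_state = dict(state)
    for quality, delta in storylet.effects.items():
        new_state[quality] = new_state.get(quality, 0) + delta
    return new_state


# Invented sample content, purely for illustration.
STORYLETS = [
    Storylet("meet_stranger", "A stranger offers you a key.",
             requires=lambda s: s.get("has_key", 0) == 0,
             effects={"has_key": 1, "suspicion": 1}),
    Storylet("open_door", "The key turns in the lock.",
             requires=lambda s: s.get("has_key", 0) >= 1,
             effects={"door_open": 1}),
]

if __name__ == "__main__":
    state: WorldState = {}
    print([s.name for s in available(STORYLETS, state)])  # ['meet_stranger']
    state = play(STORYLETS[0], state)
    print([s.name for s in available(STORYLETS, state)])  # ['open_door']
```

Even with two units, the author has to keep track of which qualities each one reads and which it writes; that bookkeeping grows quickly as the number of units rises.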