GDC: Marleigh Norton, Conversation Interfaces

So, it’s the middle of IF comp, but I spent much of this week attending the game narrative summit at GDC Online.

Tuesday I went to sessions on storytelling in Rock Band; puzzle and storytelling (by Clara Fernandez-Vara, of the Boston IF crowd); exposition in games, and how to distribute it at the levels of structure and system as well as overt narrative; and a writing critique session that was largely about how to give useful, productive feedback to other writers. But inevitably the talk I was most keen to see was the one on interactive dialogue, by Marleigh Norton.

Norton’s background is in interaction design, and she had some intriguing ideas about how to use forms of interaction other than the dialogue tree. (Another write-up of that talk is here.) She started out by saying that she’s an academic and so she hopes that people will steal, use, and/or modify her ideas — so here are some of my own riffs on the things she presented.


Norton had two demonstration projects that showed off alternate ideas:

The most complete was a game called Camaquen, in which you play by manipulating the emotions of the main characters. You yourself say nothing, but you can control whether the characters get angry or not. This is an idea similar to one I once kicked around for IF, only my frame story was going to be that you were a psychically gifted, precocious, and deeply disturbed fetus-monster using the powers of your brain to control your parents-to-be during their arguments. Perhaps unsurprisingly, I got too creeped out by this idea to put much work into it. Norton and her students have a much nicer frame story.

In practice, I think Camaquen is a little too limited to demonstrate the full potential of this idea — the text of the conversation doesn’t change in response to the character moods. The animation still does give the dialogue different flavoring depending on how the protagonists are feeling, but there could be a lot more here, given more time to work it up.


Norton and her students are currently working on a “one-button audio game” in which the only thing you control is when the protagonist interrupts the speaker. The example she presented was done as a Bond villain game, where your objective is to interrupt at just the right places to keep the speaker monologuing until you’ve learned his evil plan. She and her students are working up the tool for this and expect to make it publicly available for other authors to work with in the future.

This strikes me as pretty cool, but under the surface it’s all an enormous dialogue tree still; it’s just that the controls for it have shifted from “press A, B, or C” to “press early, late, or later”. And accordingly it has all the content production challenges that dialogue trees normally have. Large amounts of text are needed; the story branches very widely very quickly; therefore the implemented story so far is quite short. I think this is especially likely to be a problem because we’re committing to all this branching in a medium (audio) where dialogue is expensive to produce and store, rather than one (text) where it’s cheap.

I have two main thoughts about this. One: it would be cool, IMO, if the tool worked in two layers. Layer one would handle all of the audio-playing and -interrupting business, keep track of which audio chunk was playing when the interruption occurred, and pass that information down to layer two, which would decide what plays next. This would mean the internal logic could be swapped out without touching the audio handling.
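To make the two-layer split concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `AudioLayer`, `Interruption`, and `branching_logic` are invented, the chunk names are placeholders, and button presses are simulated with a dictionary rather than real audio input. It says nothing about how Norton's actual tool is built.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Interruption:
    chunk_id: str   # audio chunk that was playing when the button was pressed
    offset: float   # seconds into that chunk


# Layer two is any function: (chunk just played, interruption or None) -> next chunk or None.
Logic = Callable[[str, Optional[Interruption]], Optional[str]]


class AudioLayer:
    """Layer one (hypothetical): plays chunks and reports interruptions.

    It knows nothing about the story; it only asks the logic what comes next.
    """

    def __init__(self, logic: Logic, max_chunks: int = 50):
        self.logic = logic
        self.max_chunks = max_chunks  # guard against logic that loops forever

    def run(self, first_chunk: str, presses: Dict[str, float]) -> List[str]:
        """Simulate a session; presses maps chunk_id -> offset of the button press."""
        played: List[str] = []
        chunk: Optional[str] = first_chunk
        while chunk is not None and len(played) < self.max_chunks:
            played.append(chunk)  # stand-in for real audio playback
            offset = presses.get(chunk)
            event = Interruption(chunk, offset) if offset is not None else None
            chunk = self.logic(chunk, event)
        return played


def branching_logic(chunk: str, event: Optional[Interruption]) -> Optional[str]:
    """One possible layer two: a plain lookup table, i.e. a tiny dialogue tree."""
    table = {
        ("gloat", True): "plan",     # interrupted: villain explains more
        ("gloat", False): "threat",  # not interrupted: villain moves on
    }
    return table.get((chunk, event is not None))


session = AudioLayer(branching_logic).run("gloat", {"gloat": 2.0})
```

Because `AudioLayer` only ever calls `self.logic`, the lookup table could be replaced by something rule-based without changing the playback code at all.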

Two: for the internal logic itself, I’d be inclined to abstract a bit by tagging the lines as containing different types of content: e.g., for the Bond villain, THREAT (I am going to feed you to the sharks!), NEFARIOUS PLOT EXPOSITION (this planet-orbiting laser array will bring me instant world domination!), GLOATING (nothing in the world can stop me now!), and maybe a little MANIACAL LAUGHTER. Each of these content types might further be tagged with a series number (PLOT EXPOSITION 5, to come after 4) or degree (GLOATING might become more smug as the villain feels more secure).

With the bits tagged this way, it might then be possible to handle the dialogue-shuffling procedurally, without necessarily introducing so much complexity that it gets unwritable. Interrupt GLOATING and you get more NEFARIOUS PLOT EXPOSITION followed by LAUGHTER, say. Interrupt MANIACAL LAUGHTER and you get a new THREAT. You’d also need a few strategies to keep this from becoming super-mechanical, so perhaps we also track how early the player interrupts. Interrupt too early too often, and maybe the villain’s patience wears away faster. You still need a fair amount of content, but at least it’s a comprehensible system for which the player can develop strategies, and you don’t get the enormous content load you’d need for a pure-branching game.
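A rough sketch of how such a tagged, rule-driven system might look, in Python. The two interruption rules and the three quoted lines come from the description above; everything else is an assumption made up for the example, including the names `Villain`, `ON_INTERRUPT`, and `EARLY`, the patience value of 3, the 25% "too early" threshold, and the placeholder laughter line.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Lines grouped by content tag; position in the list plays the role of the series number.
LINES: Dict[str, List[str]] = {
    "THREAT": ["I am going to feed you to the sharks!"],
    "PLOT": ["This planet-orbiting laser array will bring me instant world domination!"],
    "GLOATING": ["Nothing in the world can stop me now!"],
    "LAUGHTER": ["(maniacal laughter)"],  # placeholder text
}

# What the villain says next when a line of a given tag is interrupted.
ON_INTERRUPT: Dict[str, List[str]] = {
    "GLOATING": ["PLOT", "LAUGHTER"],
    "LAUGHTER": ["THREAT"],
}

EARLY = 0.25  # assumed threshold: interrupting before 25% of a line counts as "too early"


@dataclass
class Villain:
    patience: int = 3  # assumed starting value
    next_index: Dict[str, int] = field(default_factory=dict)

    def _next_line(self, tag: str) -> str:
        """Return the next unused line for a tag, repeating the last one when exhausted."""
        lines = LINES[tag]
        i = self.next_index.get(tag, 0)
        self.next_index[tag] = i + 1
        return lines[min(i, len(lines) - 1)]

    def interrupt(self, tag: str, fraction_heard: float) -> List[Tuple[str, str]]:
        """React to an interruption of a line with the given tag.

        Returns the (tag, line) pairs to play next, or an empty list once
        the villain has run out of patience and stops monologuing.
        """
        if fraction_heard < EARLY:
            self.patience -= 1
        if self.patience <= 0:
            return []
        return [(t, self._next_line(t)) for t in ON_INTERRUPT.get(tag, [])]


villain = Villain()
response = villain.interrupt("GLOATING", fraction_heard=0.9)
```

The number of written lines grows with the number of tags and series entries, not with the number of paths through the conversation, which is where the savings over a pure branching tree would come from.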


Norton presented a couple of less fleshed-out, on-paper designs as well.

One was an eavesdropping game that was all about managing what you and/or other characters heard, for instance when trying to manipulate a detective into coming to the wrong conclusion. This is a cool idea, but it felt like the ringer in the group to me; the other ideas were all trying to propose ways in which the player could communicate conversational choices to the game system, whereas the eavesdropping mechanic makes the protagonist more or less mute, and is about controlling environmental interactions.

Finally, Norton proposed an interface that was still basically a multiple-choice dialogue tree but where there was a beat-matching challenge associated with each line of dialogue, with varying difficulty representing how hard it was to succeed in that kind of conversation. Asking someone about the weather or to pass the napkins would be easy to win; trying to pull off a complex diplomatic negotiation would be a lot harder and more likely to fail. (She acknowledged a debt to PaRappa the Rapper here.)

I can see some appeal in this, but as a very narrative-focused player with poor reflexes, I would be frustrated to have my character always stuck doing lame things because I was too incompetent to perform the narratively interesting ones. It would be like Guitar Hero having a great storyline that you could only see if you played every song on Expert. (I felt like Heavy Rain headed in that direction, but by the time my guys started dying through my lack of QTE skills, I was mostly ready to see them go.)

So I started thinking instead about a conversational Spellcraft, where you can compose conversation acts by mixing in topic modifiers (queen? dog? pants!), tonal modifiers (polite, rude), lead-ins that let you gauge the other person’s opinion before giving your own, etc. When you’re done composing all the elements, you perform your speech. (Obviously there’d be no voiceover here, and probably no direct textual representation of the player character’s speech, just the NPC’s reaction.) Said the wrong thing? The NPC’s face turns red and you get an animation of your PC being kicked out of the palace ballroom. Again.

On the other hand, successful conversation would earn you new tokens to use in future conversation, representing the information you gained or conversational tactic you observed. Maybe there’s one guy who is too suave and coy ever to answer your questions directly, but he’s still really useful because if you talk to him you gain similar evasive tactics of your own to use.

In this system, complex conversational acts would still take more work than simple ones, but the complexity would shift to being an intellectual task rather than a physical performative one. There’s probably more flexibility as well, since in any given situation there are probably multiple conversation act compositions that would get you an acceptable answer.
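For what it's worth, the core loop of this idea is small enough to sketch in Python. This is only an illustration under invented assumptions: the `Token` and `NPC` types, the `perform` function, and the specific tokens and accepted combinations are all hypothetical, and a real game would want far richer reactions than a yes/no.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Token:
    kind: str  # "topic", "tone", or "lead_in"
    name: str


@dataclass
class NPC:
    # Each entry is one combination of token names this NPC responds well to;
    # several different compositions can be acceptable.
    accepts: List[FrozenSet[str]]
    teaches: Token  # token earned by a successful conversation

    def react(self, act: List[Token]) -> bool:
        names = frozenset(t.name for t in act)
        return any(required <= names for required in self.accepts)


def perform(inventory: Set[Token], act: List[Token], npc: NPC) -> bool:
    """Perform a composed conversation act; on success, add the NPC's token to inventory."""
    if not set(act) <= inventory:
        raise ValueError("act uses tokens the player has not earned yet")
    if npc.react(act):
        inventory.add(npc.teaches)
        return True
    return False  # cue the animation of being kicked out of the ballroom


queen, polite, rude = Token("topic", "queen"), Token("tone", "polite"), Token("tone", "rude")
inventory = {queen, polite, rude}
courtier = NPC(
    accepts=[frozenset({"queen", "polite"}), frozenset({"dog", "polite"})],
    teaches=Token("lead_in", "evasive"),
)
succeeded = perform(inventory, [queen, polite], courtier)
```

Here the suave courtier hands over an "evasive" lead-in after a polite exchange, which the player can then mix into later acts.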

Anyway. Here end the tangents. Thanks again to Marleigh for a really intriguing talk.

10 thoughts on “GDC: Marleigh Norton, Conversation Interfaces”

  1. Isn’t the Bond conversational system you describe still a conversational tree? — Or describable as a directed graph, anyway? Can you talk about the advantage of such a system over a traditional system?

    It seems to me that what you’re talking about is more or less the logic of a keyword topic system, with four or five topics active at the same time.

    1. Directed graph, probably. Traditional conversation tree, no.

      One advantage is that the amount of writing (and VO recording, and storage) you would need to do to construct the nodes of the graph would be much less than for a conversation tree handling the same number of player turns.

      Another advantage is that, because you have a specific set of rules in mind about how the conversation will flow, the player can over time develop a sense of how the system works; if he understands the effects of interrupting (say) a maniacal laugh, he can now do that systematically, rather than having to trace out each branch of a traditional tree in order to find all the content.

      1. (Your point about your proposed system being cool is well-taken.)

        “Directed graph, probably. Traditional conversation tree, no.

        …the amount of writing … you would need to do to construct the nodes of the graph would be much less than for a conversation tree …”

        Are you talking about combinatorial explosion here?

        In my understanding, what is usually done in IF is the creation of several conversational “tracks.” The simplest case — not considering cases where the NPC asks the PC a question, and so on — is that the player types a topic (ASK DR EVIL ABOUT NEFARIOUS PLANS) and gets the next item in that conversational track, with the last item simply repeating (“Yes, Mr. Bond, that is how I will flood Silicon Valley.”).

        In that system, the player switches between tracks by typing a different topic. If I understand you, you’re talking about some kind of cyclical track-switching, where EVIL PLANS will always redirect to TROUBLED CHILDHOOD, and the player’s control is INTERRUPT or WAIT, rather than selecting a new topic.

        In both systems, so far as I’m understanding, we have a few conversational tracks with a number of blurbs in each track, and track-switching.

        How are you getting a lesser number of nodes in the proposed system?

      2. In my understanding, what is usually done in IF is the creation of several conversational “tracks.” The simplest case — not considering cases where the NPC asks the PC a question, and so on — is that the player types a topic (ASK DR EVIL ABOUT NEFARIOUS PLANS) and gets the next item in that conversational track, with the last item simply repeating (“Yes, Mr. Bond, that is how I will flood Silicon Valley.”).

        This is hardly a standard, though the TADS 3 conversation system may be conducive to it. Many games are implemented to favor breadth over depth, so the trick is to try as many topics as possible, with each topic producing only one response.

        In any case, what I was comparing my suggestion against was not any IF conversation model, but the purely branching conversation model Marleigh Norton has demonstrated so far. It didn’t have nodes and tracks; it was a simple tree.

  2. Emily,

    Fascinating, thanks for this.

    With respect to the last part of your post – I agree that employing a rhythm mini-game to represent the build-up of rapport with an agent is too artificial; and I even think it would detract from the value of including discursive interaction in the game in the first place. On your suggestion for an alternative approach, I think already implicit in what you wrote is the observation that there are very different kinds of interaction, each with their own norms. Making small talk, debating, flirting, etc. all involve different expectations about what is and isn’t appropriate. What’s more, both interlocutors usually have to agree on the nature of the interaction in order for it to proceed (the exception being when coercion is involved). And so guessing which kind of interaction someone may be amenable to under the circumstances seems like half the battle when it comes to establishing a rapport with them. For example, flirting with your boss may get you into trouble, or it may help you get a promotion. And it may only work in certain contexts, such as when the boss is drunk.

    Another aspect of this which you allude to is that one gets better at a certain kind of interaction the more one practices it. I would add that a particular performance also depends on the actor’s emotional energy at the time, which itself is determined by their recent history of successful or unsuccessful interactions. One can imagine an entire game which revolves around keeping one’s emotional energy at high levels by engaging in tricky interactions with various characters, perhaps while also pursuing some other goal which involves a trade off. Anyway, just some of my thoughts which were stimulated by your very interesting discussion.

    1. “One can imagine an entire game which revolves around keeping one’s emotional energy at high levels by engaging in tricky interactions with various characters, perhaps while also pursuing some other goal which involves a trade off.”

      That would be a game you can’t win.

      1. I would argue that in real life we play just such a game, by engaging in interactions which make us happy while also doing things, like work, which may involve interactions which make us unhappy but give us the material resources necessary to participate in interactions. Does that make life a game you can’t win? I suppose you could say that.

  3. The “conversational Spellcraft” concept brought to mind an old Atari ST game called Captain Blood. The title character was travelling the galaxy to track down several clones, and to locate them he had to meet and converse with a variety of alien species.

    Conversation in the game involved selecting from a large set of icons representing different concepts. These could be mixed and matched at will to create different meanings. Each alien race would react very differently to different combinations of meaning.

    It was actually quite difficult to work out through pure deduction what you were supposed to say to get the desired results (although I was perhaps too young at the time to fully understand the game). However, the concept was a good one, and hugely impressive for the time. The sheer novelty of interacting with the aliens in this way, and how atmospheric the game seemed because of it, kept me playing for a long time.

    1. Yeah. I have heard of that, though it wasn’t in my mind when I was thinking about this.

      I think the difference between what I imagine and the Captain Blood example is that I’m thinking of the elements mostly being about conversational strategies rather than complex buildups of significance.

      So you wouldn’t so much be stringing together elements to say “your hat is on fire, your majesty” as to build “[greet] [reverently] [casually mention] [hat]”, and from context the game would know to bring up the whole flaming issue.

      If I describe the elements that way it sounds a bit like Deikto. But Deikto, again, is trying to build up sentences of meaning to convey, rather than choreograph a set of conversational steps to take.
