This Friday I had the pleasure of speaking to the AAAI workshop Knowledge Extraction from Games, which focused on gathering information from games and putting that information to use: for instance, studying level design in a platformer in order to find standard rules about platformer design or to propose alternative level designs that the creators might not have considered.
I was invited to talk about this topic from a designer’s perspective, looking particularly at how these techniques could be valuably applied to narrative games. And the problem, as I outlined it, was as follows:
Games that aspire to offer a lot of narrative agency often face the following challenge: they need a number of distinctive, hand-authored units of content (whether those are dialogue lines for Character Engine, storylets in a quality-based narrative system, or choice nodes in a ChoiceScript game) where each individual unit may both affect and be affected by the underlying world state.
That means that the author — or authors — need a way to understand the complexity of what they are building. When should I put restrictions on accessing a piece of content, and what should those restrictions be? What are the implications if we give away a lot of resources in this storylet over here? Will it make some other part of the story easier or harder? How much of my content is a player going to see in a single run-through? Are there portions of that content that are too hard or too easy to reach? Are there cases where very different player choices will lead to the same outcomes (in which case, the player may not get the sense that their actions made a difference)?
All this becomes even more overwhelming when we’re talking about ongoing narrative worlds such as an MMORPG or something like Fallen London — where dozens of authors have created content and elements of the world model over a period of years. At a certain point, one tends to start using unique qualities or stats just for the one episode one is working on — tracking progress just in this one context, sandboxing so it won’t damage the rest of the world. But in the process, you’re also giving up the possibility of really interesting causality chains between stories that could otherwise exist… if you just had the means to plan and debug them thoroughly.
Given the complexity of these systems, it quickly becomes infeasible to reason about the balance from first principles. And it’s typically also difficult to use a separate abstract design tool to represent what’s happening in the game. So much is happening, and your representation is likely to get out of sync with the actual implementation.
Sometimes it’s possible to pare away some of that complexity again. Character Engine provides abstractions so that certain types of action can be categorized by the author, so that they’ll behave uniformly. That offers a handle on part of the challenge.
Where complexity cannot be eliminated, a second approach is to build analysis tools: automated testing and visualization methods that show how the system plays out over tens or hundreds of thousands of sample runs. I’ve used that approach in the past with ChoiceScript and for the Versu project (PDF), and we’re building our own upgraded analysis tools into Character Engine. Tools like that let us determine whether some content is too often or too rarely accessible, whether particular outcomes are too difficult, and so on.
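To make that concrete, here is a minimal sketch of this kind of automated test, written in Python. Everything in it is hypothetical: the four storylets, the `favor` and `courage` stats, and the uniformly random bot are invented for illustration and do not reflect how ChoiceScript, Versu, or Character Engine actually work.

```python
import random
from collections import Counter

# Hypothetical storylets: name -> (requirement on state, effect on state).
STORYLETS = {
    "meet_vizier": (lambda s: True, {"favor": 1}),
    "explore_cave": (lambda s: True, {"courage": 1}),
    "vizier_romance": (lambda s: s["favor"] >= 3, {"favor": 1}),
    "secret_vault": (lambda s: s["favor"] >= 2 and s["courage"] >= 4, {}),
}

def play_once(rng, turns=8):
    """One run with a uniformly random bot; returns the set of storylets seen."""
    state = {"favor": 0, "courage": 0}
    seen = set()
    for _ in range(turns):
        # Only storylets whose requirement holds in the current state are on offer.
        available = [name for name, (req, _) in STORYLETS.items() if req(state)]
        choice = rng.choice(available)
        seen.add(choice)
        for stat, delta in STORYLETS[choice][1].items():
            state[stat] += delta
    return seen

def reach_rates(runs=10_000, seed=0):
    """Fraction of sampled runs in which each storylet appeared at least once."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        counts.update(play_once(rng))
    return {name: counts[name] / runs for name in STORYLETS}

if __name__ == "__main__":
    for name, rate in sorted(reach_rates().items(), key=lambda kv: kv[1]):
        print(f"{name:15s} seen in {rate:.1%} of runs")
```

Sorting by reach rate puts the hardest-to-find content at the top of the report, which is usually the first thing an author wants to inspect. A real tool would presumably swap the uniform bot for one weighted toward plausible player behavior.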
There is a further step in potential tools if we look at what is being done in the procedural content generation space.
Suppose we regard any narrative game as a procedural story generator. In practice, the player is making choices, but we could substitute in a random or probabilistic bot to choose, as we do in auto-testing scenarios. Then we can talk about the expressive range of the generator, as Gillian Smith has done with procedural level generators. This is a way of asking “what can this generator build, and how much variation is present?”
Expressive range is only an interesting concept if we know what types of expressiveness we’re looking for, so we also need metrics. For narrative, we might propose metrics such as the following (this is by no means a comprehensive list, more of a starting point):
- Number of distinctive endings (the most basic and trivial of metrics).
- Distribution of distinctive endings. This is a bit subtler: if a game has 10 endings but 9 of them can be reached only in special-case circumstances, and the average player will see the same ending over and over again, the replay appeal is lower than if the game has 10 endings each reachable in roughly 10% of playthroughs. (Yes, I realize there is something to be said for easter eggs and hard-to-reach achievements, but if most of what you’re offering is in that category, the average player’s experience of narrative agency will be much lower.)
- Juxtaposition of composited elements. Say the player can romance any or none of half a dozen NPCs, and can win or lose each of two major confrontations. (This is a pretty plausible state space for an RPG or a Choice of Games type of piece.) Are all of those outcome sets possible, or are some of them interdependent? Does each of those outcome sets produce unique narration, or are some of them narrated in the same way?
- Ease vs. difficulty. How many of the final endings could be considered positive rather than negative? How likely is it that a given play-through will reach a positive ending?
- Points of increased intensity or raised stakes. How often does the protagonist face a situation where there is a significant risk of failure? Where in the story do these points occur? Are we putting our hard-to-pass stat checks at moments when they’re going to be most compelling for the player and contribute to dramatic escalation?
- Frequency and placement of reversals. How often does the protagonist experience a setback of some kind? What happens afterwards, and how easy is it to recover?
- Duration of the whole thing, or parts of the whole: is the narrative always the same length? Are the scenes within the narrative always the same length?
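A few of these metrics are easy to compute once we have a pile of sampled playthroughs. The Python sketch below covers the ending counts, the ending distribution, the share of positive endings, and the question of which outcome combinations never occur. The function names are my own, the "evenness" score is normalized Shannon entropy (one reasonable choice among several, not an established narrative metric), and the juxtaposition check assumes at most one romance per playthrough, which is only one reading of that scenario. Both functions expect a non-empty sample.

```python
import math
from collections import Counter
from itertools import product

def ending_metrics(endings, positive=frozenset()):
    """Summarize a non-empty list of ending labels, one per sampled playthrough."""
    counts = Counter(endings)
    total = len(endings)
    shares = {ending: c / total for ending, c in counts.items()}
    entropy = -sum(p * math.log2(p) for p in shares.values())
    # Normalize by the entropy of a perfectly even split over the endings seen.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return {
        "distinct_endings": len(counts),
        "shares": shares,
        "evenness": entropy / max_entropy,  # 1.0 = all endings equally likely
        "positive_rate": sum(c for e, c in counts.items() if e in positive) / total,
    }

def missing_outcome_sets(observed, romances, confrontations=2):
    """Outcome combinations (romance, win/lose, win/lose, ...) never observed.

    `romances` lists the romance options, including None for "no romance".
    """
    every = set(product(romances, *[(True, False)] * confrontations))
    return every - set(observed)

if __name__ == "__main__":
    # Ten endings, but one of them dominates the sample.
    lopsided = ["A"] * 91 + list("BCDEFGHIJ")
    print(ending_metrics(lopsided, positive={"A"})["evenness"])
    # Ten endings, each reached equally often.
    print(ending_metrics(list("ABCDEFGHIJ") * 10)["evenness"])
```

The two sample calls mirror the 10-endings example above: both games have the same number of distinct endings, but the lopsided one scores far lower on evenness.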
So what does the analogy to procedural content generation buy us? It means that we could in theory apply some of the same techniques used to analyze and alter procedural content generators in order to change their expressive range.
Last weekend, at the Malta Global Game Jam, I had the privilege of getting a look at Mike Cook’s project Danesh. Danesh lets the user plug in different procedural content generators, specify which variables are open to change, and explore how the expressive range of that content generator would be altered if the variables were altered.
To bring that back around to narrative particulars: suppose we had a ChoiceScript piece and a way to feed it to Danesh. (This would be nontrivial, I should add, but let’s imagine it were the case.) We could mark up variables like “gating threshold for starting a romance with the Grand Vizier” and “gating threshold for escaping the cave alive”, then explore the space of possible generators until we found one that tended to offer an even distribution of endgame outcomes, or that optimized whatever other metrics we cared about. No iterative human guesswork needed.
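As a rough illustration of that search, here is a Python sketch that sweeps two gating thresholds and keeps the pair whose ending distribution is most even. It is not Danesh and does not use Danesh's interface. The toy game, the stat names, the coin-flip bot, and the entropy-based evenness score are all assumptions made for the example.

```python
import itertools
import math
import random
from collections import Counter

ENDINGS = ("dies_in_cave", "escapes_alone", "escapes_with_vizier")

def play(romance_gate, escape_gate, rng, turns=8):
    """Toy game: each turn a random bot raises either charm or courage."""
    charm = courage = 0
    for _ in range(turns):
        if rng.random() < 0.5:
            charm += 1
        else:
            courage += 1
    if courage < escape_gate:
        return "dies_in_cave"
    return "escapes_with_vizier" if charm >= romance_gate else "escapes_alone"

def evenness(romance_gate, escape_gate, runs=2000, seed=0):
    """Normalized Shannon entropy of the endings: 1.0 means a perfectly even split."""
    rng = random.Random(seed)  # same seed for every setting, so comparisons are like for like
    counts = Counter(play(romance_gate, escape_gate, rng) for _ in range(runs))
    entropy = -sum((c / runs) * math.log2(c / runs) for c in counts.values())
    # Dividing by the entropy of all three endings penalizes settings
    # that make an ending unreachable.
    return entropy / math.log2(len(ENDINGS))

def tune(gates=range(0, 9)):
    """Exhaustively try every pair of gate values and return the most even one."""
    return max(itertools.product(gates, gates), key=lambda g: evenness(*g))

if __name__ == "__main__":
    best = tune()
    print("best (romance_gate, escape_gate):", best)
    print("evenness:", round(evenness(*best), 3))
```

An exhaustive grid is workable here only because there are two variables with nine values each. With more tunable variables the space grows quickly, and some smarter search or sampling strategy would be needed.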
Similar types of tooling could identify dud stats — pieces of world state that don’t (currently) have a significant effect on gameplay, which should probably be eliminated from the design or else used more heavily.
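One simple way to look for dud stats is ablation: freeze one stat at its starting value, replay the same sampled runs, and see whether the ending distribution moves. The Python sketch below does this for a hypothetical three-stat game in which nothing ever reads `piety`. The game, the stat names, and the tolerance value are invented for illustration, and total variation distance is just one plausible way to compare the two distributions.

```python
import random
from collections import Counter

STATS = ("charm", "courage", "piety")

# Hypothetical choices: each one bumps a single stat.
CHOICES = [{"charm": 1}, {"courage": 1}, {"piety": 1}]

def ending(state):
    """The ending reads charm and courage; piety is never consulted."""
    if state["courage"] < 2:
        return "dies_in_cave"
    return "romance" if state["charm"] >= 2 else "escapes_alone"

def ending_shares(frozen=None, runs=5000, turns=6, seed=0):
    """Ending distribution under a random bot, optionally with one stat frozen."""
    rng = random.Random(seed)  # identical choice sequence for every ablation
    counts = Counter()
    for _ in range(runs):
        state = dict.fromkeys(STATS, 0)
        for _ in range(turns):
            for stat, delta in rng.choice(CHOICES).items():
                if stat != frozen:  # a frozen stat ignores all effects
                    state[stat] += delta
        counts[ending(state)] += 1
    return {name: c / runs for name, c in counts.items()}

def dud_stats(tolerance=0.01):
    """Stats whose removal shifts the ending distribution by at most `tolerance`."""
    baseline = ending_shares()
    duds = []
    for stat in STATS:
        ablated = ending_shares(frozen=stat)
        keys = set(baseline) | set(ablated)
        # Total variation distance between the two ending distributions.
        distance = 0.5 * sum(abs(baseline.get(k, 0) - ablated.get(k, 0)) for k in keys)
        if distance <= tolerance:
            duds.append(stat)
    return duds

if __name__ == "__main__":
    print("dud stats:", dud_stats())
```

This only measures a stat's effect on endings. A stat that changes narration along the way without changing the outcome would still be flagged, so the result is a prompt for the author to look, not a verdict.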
From there, we start to be able to imagine systems that are partly programmable in terms of high-level interactive narrative goals. It wouldn’t necessarily be enough for the author to say “there should be a conflict with the protagonist’s mother at roughly the 2/3 point in the story.” But one could imagine a system that could respond with either “yes, this generator can be tuned to produce that result” or “no, that result is not possible to reach with any version of this generator, and you’d need to add more content or world state in order to make it possible.” And the results would not only be faster to build, they’d also be better tuned than the majority of things we’re building today — offering the player a highly responsive narrative with loads of perceivable consequence for their actions.
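A very small version of that yes-or-no query can be sketched in Python. The story model here is entirely hypothetical: tension rises at random each turn, and a conflict scene fires when it crosses a tunable threshold. The query asks whether any threshold puts the conflict inside a target window of the story often enough, and returns either a tuning or `None`.

```python
import random

def conflict_turn(threshold, rng, turns=12):
    """Toy story: tension rises by 0 or 1 each turn; the conflict scene
    fires on the first turn that tension reaches the threshold."""
    tension = 0
    for turn in range(1, turns + 1):
        tension += rng.choice((0, 1))
        if tension >= threshold:
            return turn
    return None  # the conflict never happened in this run

def hit_rate(threshold, window, runs=2000, seed=0, turns=12):
    """Fraction of runs where the conflict lands inside `window`,
    given as (start, end) fractions of the story's length."""
    rng = random.Random(seed)
    start, end = window
    hits = 0
    for _ in range(runs):
        turn = conflict_turn(threshold, rng, turns)
        if turn is not None and start <= turn / turns <= end:
            hits += 1
    return hits / runs

def can_be_tuned(window, required_rate, thresholds=range(1, 13)):
    """Return (threshold, rate) if some tuning meets the goal, else None."""
    best = max(thresholds, key=lambda t: hit_rate(t, window))
    rate = hit_rate(best, window)
    return (best, rate) if rate >= required_rate else None

if __name__ == "__main__":
    around_two_thirds = (0.55, 0.8)
    print(can_be_tuned(around_two_thirds, required_rate=0.3))
    print(can_be_tuned(around_two_thirds, required_rate=0.9))
```

A `None` result corresponds to the "no" answer above: no value of the one available knob gets there, so the author would have to add content or world state, for example something that holds tension down early and releases it later.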