The Seven Fables project I covered a week or so ago is now successfully Kickstarted and then some. With more resources available than they initially expected, the authors are thinking about how they might add conversational characters to the project, using some chatbot technology they’ve worked with in the past.
Here Mark Stephen Meadows and I talk through some of the design and tech issues involved.
ES: Why are you looking at adding chatbot technology to this piece?
MSM: Stories are almost always about people. Narrative’s core is about personalities: people, interactions, society, desire, fear, love, weakness. These are the building blocks of narrative and without people in a story it becomes more an exploration of architecture than a drama or adventure. That’s what IF is often about. Sure, it’s fun to poke around in a dungeon and discover doors that open and close. But I find that hearts that open and close are far more interesting.
Gollum? Princess Leia? Kung Fu Panda? Brothers Karamazov? Even great adventures like that are about the people, and what drives and limits them.
ES: Tell me what excites you about the chatbot technology you’re planning to use.
MSM: The problem with most chatbots these days is not the technology. Even simple systems like AIML have enough hooks and gears to work in a piece of IF as a believable character. The problem is design.
Usually chatbots lack context. They’re like abandoned people, homeless wanderers, that awkwardly roam the streets, looking for conversation. “Hi! My Name Is Bob! How Are You Today?” a chatbot might say. I don’t want to talk with these chatbots. They’re drek, informational bums. Just like a person walking up to you on the street saying the same thing. “Hi! My Name Is Bob! How Are You Today?” I would do my best to politely brush him off and just keep walking down the street. But if there’s a design and narrative component to this then it starts to get interesting. If, for example, I see a small green man with dragonfly wings sitting on a post office box, asking me to open it because his faerie-wife is trapped inside, then I’m far more inclined to talk with him than the guy named Bob. Chat is not interesting simply because it is chat. It has to have a context. Chatbots are boring largely because they lack that context. NPCs / NPGs and chatbots should be given a context that allows them to serve a function. Give the bums a job.
This kind of design is, like writing, as much about psychology as anything else.
Once upon a time, in 2007, my company HeadCase had developed some technology that showed how a personality could be distilled from a conversation. We did it with Arnold Schwarzenegger. We were using ‘scrapers’ – an automated system that would traverse websites, search for first-person interviews, drag those back into a database, snap off chunks of the interviews that were relevant to similar topics, ideas, and categories, and then rank that stuff according to frequency. Then we asked the system a question. So, for example, we asked the Arnold Schwarzenegger system, “What do you think of gay marriage?” and it answered, “Gay marriage should be between a man and a woman, and if you ask me again I’ll make you do 500 push-ups.”
It was Arnold. Like a photo, it was his likeness. This was, really, an authoring technique for NPCs. The goal was to take interviews and be able to generate NPCs from them.
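As a rough illustration of the chunk-and-rank idea described above, here is a minimal Python sketch. It is not HeadCase's actual system: the function names, the stopword list, and the simple word-overlap scoring are all assumptions made for the example, and the scraping step is replaced by a hand-supplied list of interview snippets.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = frozenset({"what", "do", "you", "think", "of", "the", "a", "about", "is"})

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(snippets):
    """Pair each interview snippet with a count of the words it contains."""
    return [(snippet, Counter(tokenize(snippet))) for snippet in snippets]

def answer(question, index):
    """Return the snippet that mentions the question's keywords most often.

    Returns None when no snippet mentions any keyword.
    """
    keywords = [w for w in tokenize(question) if w not in STOPWORDS]
    best, best_score = None, 0
    for snippet, counts in index:
        score = sum(counts[w] for w in keywords)
        if score > best_score:
            best, best_score = snippet, score
    return best

# Invented snippets standing in for scraped first-person interview text.
index = build_index([
    "Training is everything. My training never stops.",
    "Politics is about serving the people.",
])
print(answer("What do you think about training?", index))
```

Ranking by raw keyword frequency is the crudest possible stand-in for "rank that stuff according to frequency"; it is only meant to show the shape of the pipeline.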
There are many of these systems available today. Most of them are “top-down” or “pre-scripted,” in which a series of possible questions and probable responses are written out by someone. But coupling this technique with “bottom-up” learning systems is a good method of creating a more easily-authored (and often less predictable) character.
ES: Are you planning on doing a bottom-up, learning-systems approach to creating the chat content for this piece? If so, what sorts of source material would you be able to use, given that your characters are (presumably) not Arnold Schwarzenegger but fictional characters?
MSM: The only way I know to start anything is simply. So we’ll start top-down, with pre-scripted responses. Then we will implement an increasing amount of ‘learning’ capacities. The goal is to eventually have a hybrid. As an author, you want to establish a context for the character and weight the likelihood it will reply in a manner that matches with the story. There are some things a Drow elf (for example) will say, and other things he won’t. So you want to start by authoring that by hand, really carving out that psychology, and working with style and grammar that matches the character. So here we have a reason for starting with the pre-scripted chatbot. But this is just the character’s foundation.
Next we lay, on top of that, these ‘learning’ abilities (it’s not really learning as much as it is adding). One of the main reasons for implementing ‘learning’ abilities (or taking the bottom-up approach) is an authorship problem. Nobody, not even the eccentric Mr. Richard Wallace, wants to author one of these things. So by starting simple we write the questions and answers. Then we implement the system that scrapes, filters, and analyzes. At this point we have a chatbot with the capacity to assimilate new content. Here is where things get tricky.
Our next authoring step is to write out simple interviews. You just imagine you’re the character, then write out what you think. This is a great way of authoring as it is more like improvisational acting, but it’s spewing a bunch of fertilizer that goes into the garden. Once we have those transcripts, we can start to really rely on RDF systems that are built for the semantic web, like Wikipedia, for example. Ok, this is a ways down the line, but I bring up this example because it highlights the need for reins and blinders. If the system is aware of things like Toyota, Tampax and Tripoli these items might not fit too neatly into the world of Dungeons, Dragons, and Drow elves. We want to preserve that narrative context, and not break the psychological fourth wall. The Galatea that is created needs to remain in-costume, know what I mean?
So to sum this up, we start with simple pre-scripted / top down approach. Then we implement a ‘learning’ / bottom-up approach. Then we tweak this to make sure the character stays in role. In my past work we have gotten up to building the tools, but not up to refining the role.
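The three steps summed up here (scripted foundation, added ‘learning’, then a check that keeps the character in role) could be sketched roughly as follows. This is only a toy under stated assumptions: the `HybridCharacter` class, its keyword-to-reply tables, and the blocklist filter are hypothetical names and simplifications invented for illustration, not the tools described in the interview.

```python
import re

def words_in(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class HybridCharacter:
    """Toy character: hand-scripted replies plus filtered, added replies."""

    def __init__(self, scripted, out_of_world):
        self.scripted = dict(scripted)  # top-down: authored keyword -> reply
        self.learned = {}               # bottom-up: replies added later
        self.out_of_world = {w.lower() for w in out_of_world}

    def in_role(self, text):
        """True if the text mentions nothing on the out-of-world blocklist."""
        return not (set(words_in(text)) & self.out_of_world)

    def learn(self, keyword, reply):
        """Add a reply unless it breaks costume or overrides scripted content."""
        key = keyword.lower()
        if key in self.scripted or not self.in_role(reply):
            return False
        self.learned[key] = reply
        return True

    def reply(self, user_input):
        """Prefer scripted replies, then learned ones; None if nothing matches."""
        tokens = words_in(user_input)
        for table in (self.scripted, self.learned):
            for token in tokens:
                if token in table:
                    return table[token]
        return None

# Invented example content for a Drow elf character.
drow = HybridCharacter(
    scripted={"surface": "The surface burns my eyes."},
    out_of_world={"toyota", "tripoli"},
)
drow.learn("spiders", "We do not speak ill of spiders.")   # accepted
drow.learn("travel", "I drove my Toyota to Tripoli.")      # rejected
print(drow.reply("Tell me about spiders"))
```

A word blocklist is a blunt form of the “reins and blinders”; an allowlist of in-world vocabulary would be stricter, at the cost of rejecting more material.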
Much of the technology we’ve developed needs to be retooled. Part of that is because of iOS requirements, part of it is because the app needs to work off-line, and part of it is because APIs need to be rebuilt (see, for more, Princeton’s “WordNet,” CMU’s “Link Grammar,” or University of Edinburgh’s “Open NLP”). I hope we can get started soon, but it remains to be seen. This first app we will be building will outline much of what we can and can’t do, so my hunch, today, and it’s just a hunch, is that we will implement some very small, very experimental chatbot, more as an easter-egg and proof-of-concept, than try to propel humanity into the singularity. Like I said, just a hunch.
ES: There are lots of ways to structure an interactive conversation over multiple turns. Maybe it’s meant to simulate real-life conversation dynamics and social practices as accurately as possible, with a greeting and a good-bye every time. Maybe it’s meant to move towards a narrative climax of some kind. Maybe it’s supposed to follow a stylized structure, the way interactions in fables often have three beats. What kind of model are you building to structure conversation over multiple exchanges, and what does it emphasize?
MSM: Turn-taking is definitely key, and I think we’ll do only one or two cycles that are very short blobs of text in, most likely, comic-like word balloons. I don’t know what we’ll decide to do, as it is only this week we are seriously discussing it, but I suspect it will focus on very short interchanges. I want the story to be key, and while physics simulations and personality simulations are interesting, I think we need to practice moderation with them, especially with this first rev of the fables. Avoiding multi-cycle turns should help the characters to be a very gentle framework of the fable, not a key modus operandi. In other words, because this is not a first-person adventure, I want to garnish rather than gorge.
To answer as directly as possible, I hope that we can implement the following, even if we don’t use them all, as this will allow for most natural turn-taking models we see in real life to take place:
– current speaker selects next
– next self-selects
– current continues
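One way to read the three rules above is as a priority order, which is how they are usually presented in conversation-analysis writing. The sketch below assumes that ordering; the function name `next_speaker` and its parameters are hypothetical, chosen only for this illustration.

```python
def next_speaker(current, selected=None, volunteers=()):
    """Pick who speaks next using the three turn-taking rules, in order.

    1. If the current speaker selected someone, that party speaks.
    2. Otherwise the first other party to self-select speaks.
    3. Otherwise the current speaker continues.
    """
    if selected is not None:
        return selected
    for party in volunteers:
        if party != current:
            return party
    return current

# The chatbot addresses the reader directly, so the reader goes next.
print(next_speaker("chatbot", selected="reader"))
# Nobody is addressed, but the reader jumps in.
print(next_speaker("chatbot", volunteers=["reader"]))
# Nobody is addressed and nobody jumps in, so the chatbot carries on.
print(next_speaker("chatbot"))
```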
ES: Do your chatbot characters have goals in conversation? If so, how do they pursue those?
MSM: In the past we’ve worked with manually registering emotional goals that were altered based on word frequencies. So we’d circle out a collection of words that we had hoped to register, and if those showed up on the radar then we’d move towards an emotion of satisfaction or happiness or whatever. I use the radar analogy because I think, maybe because I’m a sailor, it’s accurate. The information determines a trajectory.
We’ve worked with a relatively classic model of six emotions, with a default null or 0 emotion in the middle, but we’ve also chosen to combine and stack emotions since that is, as I understand it from Yanon Volcani, a psychologist that has worked with us in the past (and is a backer of our project now) how these things work. In humans, I mean.
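A minimal sketch of this kind of model, assuming a simple word-to-emotion lookup, might look like the following. The six emotion names used here are an assumption (the interview does not list them), as are the `EmotionState` class and the trigger words; “stacking” is modelled simply as allowing several emotions to be active at once.

```python
import re
from collections import Counter

# Assumed set of six emotions; the interview does not name them.
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")

class EmotionState:
    """Toy emotional model driven by how often registered words show up."""

    def __init__(self, triggers):
        self.triggers = triggers  # hand-registered word -> emotion
        self.levels = Counter()   # emotion -> number of trigger words heard

    def hear(self, text):
        """Raise the level of each emotion whose trigger words appear."""
        for word in re.findall(r"[a-z']+", text.lower()):
            emotion = self.triggers.get(word)
            if emotion in EMOTIONS:
                self.levels[emotion] += 1

    def current(self, threshold=1):
        """Return every active emotion, or the neutral default if none."""
        active = [e for e in EMOTIONS if self.levels[e] >= threshold]
        return active or ["neutral"]

# Invented trigger words for illustration.
state = EmotionState({"thanks": "happiness", "wow": "surprise", "liar": "anger"})
print(state.current())      # starts at the neutral default
state.hear("Wow, thanks!")
print(state.current())      # two emotions stacked together
```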
For 7Fables, it might not be necessary. The characters will probably just provide information and once that is communicated blip out of the scene. The question is whether the reader has understood the advice given, and if we can get confirmation on that without implementing an emotional model, then all the better. We’re still working on these questions and picking our implementation.
ES: Is the player speaking to the protagonist or as the protagonist? I wasn’t sure from our initial conversation whether we were understanding the player playing a part within the world of the fables or just influencing them from above, as it were. (Both could be really interesting, but it’s likely to make a big difference in design.)
MSM: Our past work, in systems like Second Life, had the player speaking as the protagonist. That was easy enough because your avatar was you. In this instance, though, having characters that can die without stopping the story means that we need to rethink the function of a chatbot.
Here we need the reader to be able to identify with the protagonist in the story, and to guide the protagonist, but the reader’s guidance is almost always environmental; you (the reader) increase wind, you let the piggies out of the stall, you blow blossoms off a tree, you create waves. We’re planning stuff like that. These actions then affect what the protagonist can or can’t do. I think of it a bit like a mouse in a maze. You, the reader, want to get the mouse to the cheese, but you, yourself, don’t know the maze, so you open and close gates which the mouse then moves through. There are usually several endings, or pieces of cheese, in the maze. Note that I am not thinking of games like Black/White, with the god PoV. Our approach is far simpler.
Ok, so where is the chatbot needed?
One option is to put the chatbot up on the wall of the maze, watching the mouse as well as you, the reader. That chatbot then serves as a kind of advisor, offering some contextual and navigational opinions, so to speak. I suspect he’ll be a little like the chorus in ancient Greek drama, advising moderation. It might end up like a Jiminy-Cricket-chatbot, a cricket-on-the-wall, that asks pointed and sometimes slightly obscure questions of the reader. Second, we’ve considered putting the chatbot down in the maze, yet still addressing the reader, more than our protagonist-mouse. In this case Jiminy Cricket is there to address a rather mute mouse, and you can speak on behalf of the team (you and the mouse).
What’s important, however, is that the turn-taking be initiated by the chatbot. Chatbot gets first pick of topic, place, and time. That should build our narrative context, keep a narrative flow, but simultaneously buckle the reader into the interaction. I think, to return to this question of context, that it is important that the chatbot do this because we want the chatbot to establish context, not the reader. The reader needs to be informed. The chatbot needs to do the informing. If the chatbot initiates it then we’re able to keep things under some modicum of narrative control, but still allow a reader chance to play within that trajectory.
In both examples we break the fourth wall, and this is a problem for us. I’m not sure how best to address it within a fable context, which is SO tight, and SO short, and with usually a very guided parameter of lesson-learning, but I hope this will be a convo-light game, rather than something that focuses on the conversation.
I mean, when we break the metaphor, or fourth-wall, is the question, right? If we throw an interactive in the middle of the story we’ve broken the metaphor, we’ve broken that sacred fourth-wall. You stop reading words and you change headspace, mental mode, to interact with moving pixels. Doing it with conversational characters only multiplies that problem, so we have to be very gentle with the application of this tech.
Chris Hecker, in his talk at GDC in 2010, addressed this a little. There’s some notes he has on it here, where he mentions the importance of “Verbal, unexpected, informational feedback, increases free-choice and self-reported intrinsic motivation.”
I hope we can work along those lines because we’re certainly not keeping scores.
ES: Most natural language-input conversation games have the problem that this article describes as “pixel-bitching”: it’s hard for the player to figure out which keywords are going to be important and effective at moving the story forward. This becomes very important if you have win/lose possibilities in conversation. How do you plan to address this?
MSM: First off, I love the fact that a social equivalent of the uncanny valley is outlined here. That’s spot-on and we authors of conversational systems need to be very alert to the hills and dales of this new world we’re exploring. Psychological, social, and cultural uncanny valleys can be found if searched for, and the interfaces we build for interacting with chatbots can influence those uncanny valleys in both directions.
Anyway, keywords are one way of doing it, but there are other ways of influencing a chatbot as well. Consider that the chatbot is watching your body language, or actions, as well as monitoring your words. Right now, in the app prototype that I developed (which crashes, and is crawling with bugs) there’s a link between the text and the image. If you move the image you can, in some instances, change the text, and vice-versa. I’m hoping that if we can effect changes between those two layers we can also link them with the chatbot input, so what you do with the image and text layers can influence the chatbot’s emotional response. This means that a reader might not even have to reply to the chatbot, but just do what the chatbot recommends. I don’t know for sure where we’ll end up as I’m waiting to see what we’re capable of doing with the resources we have.
ES: Most chatbots are designed to disguise when they don’t understand something (by giving a generic response of some kind or changing the subject); most game interfaces are designed to make clear when they don’t understand input and guide the user towards the affordances that are available. From a story-telling point of view, the disguise approach is often tempting because it contributes to the illusion of a coherent personality rather than a fake human, but it can add to the player’s confusion, especially given an NLP interface where they’re already having to guess what to type. What’s your take on this issue?
MSM: Yeah, I think many chatbots do that because many people do that, as well. Maybe it’s an assumed reaction or maybe people think out-loud. But if our goal in conversation is to exchange information, then I don’t think it’s the best thing to do, in principle. The real problem is breaking that all-sacred metaphor / fourth wall, especially when a chatbot burps up something like, “I did not understand what you said about [UNPARSED:LINE 2113:*/Fix/45%”].” Ultimately, this might be seen as two distinct layers of bugginess.
Perhaps it is best to have the chatbot, for this fables project, first influence the reader, then be won over, or not, depending on what the reader does. It might be possible to circumvent this entirely, and if the chatbot is advising the reader on things to do, the question is more “does the chatbot win the reader over?”
We’ll see where we come down on this, but I definitely hope to be upfront if our chatbot doesn’t understand something.
ES: The iPad isn’t always the most fun platform for doing extended typing on — it works, but it’s not always very accurate, and it interposes the keyboard metaphor between the player and the story world, whereas a lot of the other interaction features you’ve told me about sound like they’re aiming at creating a kind of vibrant tactile engagement with the game world. Is this something you’re concerned about? If so, how might you address that?
MSM: I’ve been very careful over the years with platforms I’ve worked with. Most of my interactive narrative work, and interactive visual work, and even the chatbot work I’ve been involved with, were for obscure systems where we were able to clearly isolate design variables such as framerate, network latency, processor speed. While I’ve got some problems with the iPad, it is the first low-cost system that is widely distributed enough to be able to accommodate a large number of readers. We can isolate our design decisions.
Plus it has some features that are really good for interactive fiction and chatbots.
One of them is a little microphone. We’ll see if we can hit this target in our first rev, but I’d like to run some speech-to-text (and possibly text-to-speech) input / output with the characters. But this is very difficult and often fails, plus the microphone is next to a speaker, and we’re planning on playing music at the same time as the fables. So our bell may trip over our whistle.
In the end, all I write here is in development and undergoing change. We’re waiting to see what happens in the next three days, then we’ll get down to really figuring these details out. If there’s enough wind we’ll sail to even further unexplored territories.
I look forward to meeting you there.