hello :) I was reading some of your stuff about conversation systems, was wondering if you’d have any links/pointers to academic literature on modelling conversations (can be more theoretical/non-game-related), or stuff relevant to people who might be trying to do it?
This one came in via Twitter. I’ve covered some adjacent topics before on mailbag, including
- Games that do complex conversational mechanics
- Dialogue and story generation techniques (Parts 1, 2, 3)
- Dialogue filtering to apply personality and emotion to existing text — this includes links to some academic research into how personality traits affect people’s utterances
- And back in 2009 I wrote this on conversational analysis and how it applied to my work at the time, including going through a number of dialogue situations recorded in literature and talking about how the conversation model I was then using would address or fail to address them
But this question is asking something a little different, specifically about how conversation is modeled in the abstract, not necessarily in games and not necessarily for AI production purposes. What academic literature is out there to help us understand how people talk to one another? What types of approaches exist for modeling conversation in general?
Unsurprisingly, this is a huge field of study, so this is not remotely a literature review; instead, it’s a tour of a few pieces of terminology and resources that might be useful in digging deeper.
Also, I am approaching it not primarily from the perspective of a trained linguist (I’ve taken a few classes, but it’s not my field) but rather from the perspective of a person trying to model things for interactive conversation purposes.
So, with those caveats:
The Cooperative Principle
This refers to the idea, explored by Grice, that conversation can happen only because the participants are cooperating towards a common goal of mutual understanding. Grice advances four further maxims that govern how people are supposed to communicate in order to achieve cooperation, though these are also highly culturally determined. A lot of specific behavior in conversation can be explained in terms of how it corresponds to Gricean maxims.
Stephen C. Levinson’s book Pragmatics is getting older now — I got my copy in the mid-90s as part of a college linguistics course — but it covers a lot of ground. Its Amazon blurb doubles as a description of the field of linguistic pragmatics in general:
Those aspects of language use that are crucial to an understanding of language as a system, and especially to an understanding of meaning, are the acknowledged concern of linguistic pragmatics. This textbook provides a lucid and integrative analysis of the central topics in pragmatics – deixis, implicature, presupposition, speech acts, and conversational structure.
A search for “pragmatics” on Amazon turns up several other, more recent sourcebooks and textbook overviews of the field; I don’t own all of these myself so can’t speak to their relative value. They tend not to be cheap, so maybe something to get from a library unless you happen to have a large budget for linguistics textbooks.
I do also have the Routledge Applied Linguistics book Pragmatics: An Advanced Resource Book for Students, which is more recent, and which takes on some of the cultural and contextual aspects of pragmatics. It’s pretty readable and accessible, with short but concept-rich chapters on topics from how to collect research data to the more sociological aspects of pragmatics study (the Introduction), then follows up with excerpted readings from key writing (the Extension) and a large references section to guide the reader into the academic literature.
Where this most excites me is where it picks up topics that were less discussed in earlier pragmatics writing in my experience. To again quote the blurb:
examines the social and cultural contexts in which pragmatics occurs, such as in cross-cultural pragmatics (silence, indirectness, forms of address, cultural scripts) and pragmatics and power (the courtroom, police interaction, political interviews and doctor-patient communication)
…which is the sort of linguistics-meets-social-circumstance topic that I find completely fascinating and could read about for weeks whether or not I had any legitimate game application.
The Routledge book does come with a supporting website as well, where you can read some of their guidance about where to find corpora of existing conversational data, and how to make use of it, along with quite a bit of other supporting material. Reading through the website here may give a sense of what the book is covering and whether it’s likely to be relevant to you.
Historical pragmatics studies how conversations have worked in the past, based on what we can recover or guess from literature or other materials. Edinburgh University Press has a series that includes some coverage of historical pragmatics, though here again I haven’t surveyed all the texts in question.
Harvey Sacks’ work on conversational analysis is considered foundational, but I have only read excerpts.
Semantics is the study of how signs relate to meaning. (Charles Morris divided semiotics up into syntax, semantics, and pragmatics, and the terms continue to be used in linguistics, though they’ve accrued additional meaning and the division between semantics and pragmatics is not always perfectly defined.)
Semantics covers a lot of territory, up to and including “what does this word mean in this context” (in NLP, the word-sense disambiguation problem). Some of this might not seem to be conversation modeling as such, though we need many of these techniques if we’re going to try to make sense of a conversation in progress.
However, semantics would also be where we would look for models of how an argument might play out in conversation, and approaches such as Montague grammar exist to try to convert utterances into formal logical propositions.
In some of the Versu experiments — more those written by Richard Evans than the ones by me, though I also dabbled in this — we played with philosophical debate models in which characters would advance statements as part of a logically rigorous argument. (Several of my experiments in this line quickly veered away from the logical again at least as a character motivation, so that characters would enter into an argument on the side of the person they had a crush on, for instance.)
Paralanguage refers to the aspects of communication aside from the words themselves. For spoken language, that might be pitch, volume, and prosody, or the presence of disfluencies such as “um”s and slurring; one fairly often sees discussions of paralinguistics that focus particularly on those areas. “Paralinguistic respiration” refers to gasps, sighs, and other breathing noises that signal extra information.
The Routledge Pragmatics book linked above gets into questions like how intonation affects the social meaning of ritualized utterances, so that “thank you” or “sorry” are spoken in many contexts, but it’s the tone of delivery that determines whether this is to be understood as sincere. (Unit A9 of that book.)
However, paralinguistic considerations also apply to text-only conversations such as chat, and to other kinds of written communication — for instance, a texter’s decision to avoid punctuation and standard caps is in some communication channels a sign of friendly informality.
In inbound speech, paralinguistic cues are often better than the words themselves at indicating the mood of the speaker: the same literal text might be friendly, angry, or sarcastic depending on the mode of delivery. Research challenges in this space may ask participants to tell the emotional affect of the speaker based on audio qualities.
Generating paralinguistically convincing output is one of the main challenges for text-to-speech synthesis. There are markup languages (SSML being the most common) for marking speech in order to instruct a text-to-speech system about how to deliver something — but adding that markup can be a lot of work unless you have a system that will do a good job of it procedurally. And generating a voice model that will convincingly shift between “angry”, “happy”, or “sad” in generated output is also challenging.
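As a rough sketch of what procedural markup might look like, the Python below wraps a line of dialogue in SSML prosody tags chosen by mood. The `speak` and `prosody` elements and the `rate`, `pitch`, and `volume` attributes come from the SSML specification; the `MOOD_PROSODY` table, its values, and the `to_ssml` function are assumptions invented for this example, and a real text-to-speech engine may expect additional attributes on `speak`.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping from a mood label to SSML prosody settings.
# The attribute names come from the SSML spec; the specific values
# are illustrative guesses, not tuned settings.
MOOD_PROSODY = {
    "angry": {"rate": "fast", "pitch": "low", "volume": "loud"},
    "happy": {"rate": "medium", "pitch": "high", "volume": "medium"},
    "sad": {"rate": "slow", "pitch": "low", "volume": "soft"},
}


def to_ssml(text: str, mood: str) -> str:
    """Wrap one line of dialogue in SSML prosody markup for the given mood."""
    settings = MOOD_PROSODY.get(mood, {})
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in settings.items()
    )
    body = escape(text)  # protect &, <, > so the markup stays well-formed
    if attrs:
        body = f"<prosody{attrs}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(to_ssml("Thank you so much.", "sad"))
```

A table like this only sets coarse, whole-line delivery. It does nothing about where emphasis falls inside the sentence, which is a large part of why hand-authored markup remains laborious.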
Computational Paralinguistics covers this field in detail; unfortunately it is ferociously expensive.
Topic modeling seeks to identify what topic(s) are present in a source text. Often this is applied to larger sources than a few lines of dialogue — processing lots of documents and figuring out in general what they seem to be about — but in conversation, we might want to identify and model what is currently being discussed. Most of my own earliest work building conversation models for games focused on working with an awareness of topic (what have we discussed before, what are we talking about now, and how do topics relate).
I was also interested in how a character might choose to move to a new topic, or how they might interpret questions to keep their answers as much as possible part of a flow with previous dialogue. Galatea contained a rudimentary tree of nested topics and would prefer to traverse that tree by small rather than large amounts when answering questions. That was all hand-authored. Nowadays, especially since the development of word2vec and related approaches, there are more computational methods for determining or expressing proximity of concepts and allowing conversational agents to prefer nearby ideas over distant ones.
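To make the "prefer small moves through a topic tree" idea concrete, here is a minimal Python sketch. It is not Galatea's actual implementation: the topic names, the `PARENT` table, and the functions `tree_distance` and `nearest_topic` are all hypothetical, and the distance measure (edges between two nodes via their nearest shared ancestor) is just one plausible choice.

```python
# Hypothetical topic tree: each topic maps to its parent (None for the root).
PARENT = {
    "everything": None,
    "art": "everything",
    "sculpture": "art",
    "marble": "sculpture",
    "painting": "art",
    "life": "everything",
    "childhood": "life",
}


def path_to_root(topic):
    """Return the list of topics from `topic` up to the root, inclusive."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def tree_distance(a, b):
    """Count the edges on the path between two topics in the tree."""
    up_from_a = path_to_root(a)
    up_from_b = path_to_root(b)
    depth_in_a = {topic: steps for steps, topic in enumerate(up_from_a)}
    for steps_b, topic in enumerate(up_from_b):
        if topic in depth_in_a:  # first shared ancestor
            return depth_in_a[topic] + steps_b
    raise ValueError("topics are not in the same tree")


def nearest_topic(current, candidates):
    """Pick the candidate topic that needs the smallest jump from `current`."""
    return min(candidates, key=lambda topic: tree_distance(current, topic))


print(nearest_topic("marble", ["childhood", "painting"]))
```

With embedding-based approaches, the hand-authored tree distance would be swapped for a similarity measure over learned vectors, but the selection step (prefer the nearest candidate) would stay much the same.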
A dialog (or dialogue) act is an action within a conversational context that serves a particular purpose: to question, to request, to give information, and so on.
The concept is arguably a subset of the concept of speech acts. The idea of speech acts goes some way back in the history of linguistics — look for the works of Austin and Searle if you want to trace this back. The concept originated with theory around “performatives” like “I now pronounce you man and wife” or “I bet you five dollars that horse will win,” where the statement does not have truth value precisely and is instead changing something in the world by being spoken. A lot of the theory proposed around speech acts is now itself considered a bit old-fashioned and open to criticism.
However, “dialogue acts” refer more tightly to the function of words or phrases in a flow of dialogue, and this idea is used in conversation modeling studies even where the broader philosophy around speech acts is not especially helpful.
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech (A. Stolcke et al., 2000) provides a computational linguistics approach to recognizing dialogue acts. The article also tabulates the dialogue acts found in the Switchboard corpus, and gives a good sense of the granularity of the types of actions under consideration here. For instance, the article talks about statements, questions, and requests, but also about back-channeling to signal comprehension while someone is talking (“Uh-huh”), hedging (“I’m not sure, but…”), appreciation (“I can imagine!”), and a number of other similar acts.
Dan Jurafsky’s chapter Pragmatics and Computational Linguistics follows on Stolcke (even replicating some of its tables), but provides more of a literature review of the surrounding field as well. It also includes additional corpus references from the Air Travel Information System dataset.
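To give a feel for what tagging at this granularity involves, here is a toy rule-based tagger in Python. It is emphatically not the statistical method from the paper: it swaps in hand-written cue patterns, and the labels, the `RULES` list, and the `tag_dialogue_act` function are assumptions made up for illustration rather than the Switchboard tag set.

```python
import re

# Hand-written cue patterns for a few dialogue acts, checked in order.
# Labels and patterns are illustrative only.
RULES = [
    ("backchannel", re.compile(r"^(uh-huh|mm-hm+|yeah|right|okay)[.!]?$")),
    ("hedge", re.compile(r"^(i'm not sure|i don't know|i guess)\b")),
    ("appreciation", re.compile(r"^(i can imagine|that's great|wow)\b")),
    ("request", re.compile(r"^(please|could you|would you|can you)\b")),
    ("question", re.compile(r"\?$")),
]


def tag_dialogue_act(utterance: str) -> str:
    """Return the first matching act label, or 'statement' as the fallback."""
    # Normalize case and curly apostrophes before matching.
    text = utterance.strip().lower().replace("’", "'")
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "statement"


for line in ["Uh-huh.", "I'm not sure, but it might rain.", "Where do you live?"]:
    print(line, "->", tag_dialogue_act(line))
```

Rules like these break down quickly on real speech, since a surface cue rarely settles the function of an utterance on its own. That is part of the motivation for approaches that learn from annotated corpora and take the surrounding dialogue into account.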
Dialogue acts are interesting from a narrative characterization perspective because they tend to reflect speaker personality, preferences, and relationship to the person listening.
Direct and Indirect Dialogue/Speech Acts
Another useful idea, especially for natural language understanding, is the distinction between direct and indirect speech acts. Direct acts mean exactly what they say; indirect acts require some level of inference from the listener. For instance, all of the following are (at least potentially) requests despite their different sentence types:
- Pass me another slice of cake. (Imperative)
- Would you please cut me some more cake? (Interrogative, focused on the doer)
- Could I please have another slice of cake? (Interrogative, permission-seeking)
- I would love another slice of cake. (Statement)
- Oh look at that, I’m already out of cake! (Statement)
- I do love chocolate. (Statement)
- wordless, melancholy stare at empty plate (Nonverbal performance)
Which of these we choose often depends on politeness and formality register, the relative social status of the people involved, and how much confidence we have that the person listening will understand us correctly. Gricean maxims offer some leverage on this problem, since (by the maxim of relevance) you would presumably not mention your love of chocolate unless it had some bearing on the current situation, and I can then guess what your point might be.
For (quite a lot) more on the reasoning behind identifying and interpreting indirect speech acts, see “Indirect Speech Acts”, Asher and Lascarides, 2006.
When speaking to a conversational partner that we know is not a fluent speaker, not a member of our own culture, or (increasingly relevant) not human, we tend to be more direct in order to avoid misunderstandings. This is one of several points that tend to distinguish corpora of recorded dialogue between humans from corpora that collect people’s interactions with chatbots.
In some cultures, being too direct is very rude; in others, being too indirect is considered annoying — which brings us on to the next area:
Politeness Theory and Facework
Politeness theory addresses how people act courteously to each other, while “facework” refers to the idea of “saving face” or preserving the other person’s face. Another aspect of politeness theory explores the challenge of avoiding imposition on others (negative politeness) and of expressing gratitude and positive affinity (positive politeness).
Stephen Levinson was again a major contributor to the academic field, along with Penelope Brown, with their book Politeness: Some universals in language usage.
Computational approaches to this area include studies where a large corpus of text has been human-annotated as “polite” or “not polite”, and then a supervised learning approach trained a classifier to determine the politeness of subsequent data.
As is often the case with social content, it turns out that the features marking something as polite tend to be fairly specific to the context of communication and to the medium. At one point I experimentally trained some classifiers with data from the Stanford NLP politeness corpus referenced in this paper, which comes from user comments on Wikipedia and StackExchange. I found that they tended to be pretty inaccurate for conversational dialogue because they were attuned to the nuances of typed punctuation. Meanwhile, the word “homework” was treated as a major rudeness marker, because in the corpus it appeared mostly in the context of irritated StackExchange users telling others off for trying to use the site to do their homework for them.
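The supervised approach, and the way a classifier absorbs the quirks of its training data, can be sketched with a very small naive Bayes model in plain Python. This swaps in a deliberately simple technique and is not the method used in the research mentioned above. The six training sentences are invented for illustration, and the names `TRAINING`, `tokenize`, `train`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny invented training set, for illustration only; a real study would
# use a large human-annotated corpus.
TRAINING = [
    ("could you please take a look when you have time", "polite"),
    ("thanks so much for your help", "polite"),
    ("would you mind explaining this please", "polite"),
    ("do your own homework", "not polite"),
    ("this is wrong and you should know better", "not polite"),
    ("stop wasting everyone's time", "not polite"),
]


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label and examples per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = math.log(count / total_examples)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a label.
            numerator = word_counts[label][word] + 1
            score += math.log(numerator / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("please help me with this", model))  # polite
print(classify("is this homework", model))          # not polite
```

The second result shows the problem in miniature: a neutral question is marked "not polite" largely because "homework" only ever appeared in a rude training sentence.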
Status play is a concept from Keith Johnstone’s work on improv, rather than from linguistics texts. The essential idea is that, in conversation, people are constantly sending cues about their relative status, and whether they consider themselves to be higher or lower status than the person they are talking to.
Some status play moves in fact might consist of dialogue acts: hedging, for instance, might suggest low confidence and low status. Other status-related actions might come from tone of voice or other markers in the domain of paralinguistics.
The idea relates to politeness theory and face-saving, but provides some categories that I find especially useful for digging into narrative conflict moments between characters.