Last Thursday I was at the PCG-meets-autotesting unconference at Falmouth, which organized itself into a number of work-groups to talk through ideas related to the conference theme. This was a really fun time, and I am grateful to the organizers and my fellow guests for making it so intriguing.
Our morning work-group started with a suggestion of mine: what if there were a casual text-generation tool like Tracery, but one that provided a similar level of help in assembling corpora for leaf-level node expansion? What would help new users learn about selecting and acquiring a corpus? What would help them refine a corpus to the point where they had something they were happy to use? (And for that matter, are there applications here that would be useful to expert users as well? What could such a tool offer that is currently difficult to do?)
This idea sprang from the discovery that I spend a lot of my own procgen development time simply selecting and revising corpora. What will productively add to the feel and experience of a particular work, and what should be excluded? How small or large does a corpus need to be? Is there behavior that I can’t enforce at the grammar level and therefore have to implement through the nature of the corpus itself? (I talk a bit about those concerns in my PROCJAM talk (video, slides), especially under the Beeswax category.)
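For anyone unfamiliar with how Tracery-style grammars consume corpora, here is a minimal sketch (the grammar, symbols, and words are invented for illustration): non-terminal symbols expand recursively, and the leaf-level symbols are just flat lists of words, i.e., the corpora this post is about.

```python
import random

# A toy Tracery-style grammar: non-terminal symbols expand recursively,
# and leaf-level symbols are simply corpora (flat lists of words).
grammar = {
    "origin": ["The #adjective# #animal# #verb#."],
    "adjective": ["weary", "gilded", "perplexed"],
    "animal": ["heron", "fox", "moth"],
    "verb": ["waits", "vanishes", "sings"],
}

def expand(symbol, grammar, rng=random):
    """Pick a rule for `symbol` and expand any #tags# found inside it."""
    rule = rng.choice(grammar[symbol])
    while "#" in rule:
        pre, tag, post = rule.split("#", 2)
        rule = pre + expand(tag, grammar, rng) + post
    return rule
```

The quality of the output here depends almost entirely on the contents of those leaf lists, which is exactly why corpus selection deserves tool support.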
We had a great conversation with Gabriella Barros, Mike Cook, Adam Summerville, and Michael Mateas. The discussion ranged over a number of additional possibilities, some of which went considerably beyond the initial “naive user” brief here.
Existing corpus resources
We talked about where one can easily find corpora already, if it turned out that there was material available that could be usefully plugged into a tool.
Mentioned in this conversation: Darius Kazemi’s github corpora resource, containing lots of user-contributed corpora in JSON; DBpedia; and ngrams as a source of common word pairings, or a way to find adjectives known to go with a particular noun or type of noun.
Scraping new corpora
What data sources are on the web that one could imagine building an auto-scraper for?
This is an area where Gabriella has a lot of experience, because much of her research is in games that make use of external data. (She spoke about those games the next day during the PROCJAM talks, so you can see her introduction to them on this Youtube video.)
Mike has an existing tool called Spritely that is designed to look for images on the web that are isolated enough to use, then convert them into a sprite-style format. We talked about whether something similar could be used for pulling in text materials with particular associations.
Mixed initiative corpus-building
A mixed initiative tool is one in which the computer and the user both contribute to the creative output, sometimes building on one another’s work. (Here’s a great chapter about different approaches to mixed initiative by Antonios Liapis, Gillian Smith, and Noor Shaker, which outlines a lot of different possibilities.)
What would a mixed initiative tool look like for corpus generation? One possibility would be something where the user typed in some words and the system came back with a list of possibly related words that the user could then choose to add to the corpus or not.
Google Sets used to provide this service, but it’s apparently now no longer available.
Adam suggested that we might look at tools based on word2vec datasets: for instance, wordgrabbag is able to find words that are proximate in the vector space to the words the user suggests, while word2vec playground completes analogies based on user input. The latter is hugely fun to play with. Some sample output, which I enjoy because it makes a sort of sense without being entirely predictable:
STAR is to SUN as OAK is to DENSE CANOPY
CABERNET is to CHOCOLATE as CHARDONNAY is to MARSHMALLOW
SOLID is to LIQUID as SOAP is to ETHANOL
and, okay, the ethanol one is a bit odd. But part of the fun of mixed initiative systems is that they offer the creator options she likely wouldn’t have thought of in the first place. Besides, we could also imagine corpora that involved groupings of words, or words plus tags, as well as individual words.
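I don’t know the internals of those particular tools, but the underlying operation (finding nearby words in a vector space) is easy to sketch. Here with tiny hand-made vectors standing in for real word2vec embeddings, which would have hundreds of dimensions learned from a large corpus:

```python
import math

# Hand-made toy vectors; a real system would load pretrained embeddings.
vectors = {
    "oak":    [0.9, 0.1, 0.0],
    "birch":  [0.8, 0.2, 0.1],
    "canopy": [0.7, 0.3, 0.2],
    "soap":   [0.0, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(word, vectors, n=2):
    """Suggest the n words closest to `word`, for the user to accept or reject."""
    others = [(w, cosine(vectors[word], v)) for w, v in vectors.items() if w != word]
    return [w for w, _ in sorted(others, key=lambda p: -p[1])[:n]]
```

In a mixed-initiative tool, the output of `nearest` would be the list of candidate additions presented to the user for approval.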
An I-Feel-Lucky corpus-scraper
We speculated about a variant where you could specify the general type of list you wanted (e.g., “a list of books”) and the corpus tool would go off to wikipedia and come back with one of several possible lists of books there, such as the List of Books Written By Teenagers, or the List of Books Related to Hippie Subculture. (Sadly, the List of Books about Japanese Drums link just led to a generic article about Taiko and didn’t feature that much of a bibliography after all.) The idea again would be to surprise and delight the creator as well as the eventual reader.
To aid this discussion, Gabriella introduced us to the wikipedia List of Lists of Lists, which is one of the most pleasingly meta things I have seen in a long time.
Filtering and editing corpora
Another idea we batted around was of pulling a corpus of words with a lot of associated tags and then letting the user turn on and off subsets of the corpus. This would apply not just to removing offensive terminology, but perhaps to other purposes as well. (How the words would be automatically tagged in the first place is also a good question; perhaps via WordNet, ConceptNet, or information derived from word2vec or some other method.)
We talked about being able to generate corpora that were prefiltered for diction level (formal? slangy?) or historical period (e.g., “a list of vehicles appropriate to 1830”). We also raised the possibility of filtering words by sentiment rating, but sentiment analysis is not always particularly reliable, so I am not sure how plausible this is. On the other hand, having that setting in the tool might teach users about the limits of sentiment analysis! So there’s that, perhaps.
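A minimal sketch of the toggle idea, with a hypothetical tagged corpus (the words and tags here are invented; how they would be assigned automatically is the open question mentioned above):

```python
# Hypothetical tagged corpus: each entry carries tags the user can
# toggle on or off to include or exclude subsets of the corpus.
corpus = [
    ("carriage",   {"vehicle", "period:1830"}),
    ("automobile", {"vehicle", "modern"}),
    ("steamboat",  {"vehicle", "period:1830"}),
]

def filter_corpus(corpus, required=frozenset(), excluded=frozenset()):
    """Keep entries carrying every required tag and no excluded tag."""
    return [word for word, tags in corpus
            if required <= tags and not (excluded & tags)]
```

For the 1830 example in the text, the user would require the `period:1830` tag, or equivalently exclude `modern`.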
Additional controls on grammars
Here we went a bit outside the lines of just talking about making a good corpus, and got into a conversation about other ways to put controls on the tool. The strategy here would be to allow the grammar to overgenerate — create more material than was needed, and create some material that wasn’t suitable — but be able to specify some constraints on that material after generation so that unsuitable things would be discarded.
Here we talked about ideas like nodes that could be marked up to produce, for instance, alliterative output. (Later in the weekend Adam showed me a project he’d put together where this was actually working on top of Tracery, to let one create Tracery projects that enforced alliteration. But I’ll let him link that project if he wants to share it with others.)
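Adam’s project is his own to share, but as a rough illustration, an alliteration constraint fits the overgenerate-and-discard strategy described above: keep regenerating until an output passes the check. A sketch (the threshold and the sample corpus are arbitrary choices of mine):

```python
import random

def alliterates(sentence, min_words=2):
    """True if at least min_words words share the same initial letter."""
    initials = [w[0].lower() for w in sentence.split()]
    if not initials:
        return False
    return max(initials.count(c) for c in set(initials)) >= min_words

def generate_alliterative(generate, tries=1000, rng=random):
    """Overgenerate: call `generate()` until an output passes the filter."""
    for _ in range(tries):
        candidate = generate()
        if alliterates(candidate, min_words=3):
            return candidate
    return None  # grammar could not satisfy the constraint in `tries` attempts
```

Note the failure case at the end: as the post says, sometimes the grammar simply cannot produce anything that meets the requirement.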
We also talked about controls that would apply to an entire sentence or paragraph of generated output, if we wanted to control for qualities or behaviors that would only manifest themselves at a macro scale.
So for instance, suppose you wanted to have a paragraph that was guaranteed to demonstrate varied sentence length. You could do this by making the grammar go sentence by sentence, remember the length of the last sentence, and try to get a different-length sentence this time. (This is what Savoir-Faire does with its sentence generation about thrown and dropped objects: it has a concept of short, medium, and long sentences, and tries not to make the same kind of sentence twice in a row.) But this can be laborious and a little clunky; and sometimes it might simply be impossible to generate something that corresponded to requirements.
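A sketch of that sentence-by-sentence approach (the word-count thresholds for short, medium, and long are invented here, not Savoir-Faire’s actual values):

```python
def length_class(sentence):
    """Bucket a sentence as short, medium, or long by word count."""
    n = len(sentence.split())
    return "short" if n <= 4 else "medium" if n <= 9 else "long"

def varied_paragraph(generate, n_sentences, tries=50):
    """Build a paragraph sentence by sentence, rejecting any sentence whose
    length class matches the previous one. Gives up on a slot after `tries`
    failures, since the grammar may be unable to satisfy the constraint."""
    sentences, last = [], None
    for _ in range(n_sentences):
        for _ in range(tries):
            s = generate()
            if length_class(s) != last:
                sentences.append(s)
                last = length_class(s)
                break
    return sentences
```

This shows the clunkiness mentioned above: the memory of the last sentence, the retry loop, and the possibility of running out of tries are all bookkeeping the author has to manage.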
Alternatively, you could have the grammar generate a lot of paragraphs without any particular memory or control of individual sentences, then select after the fact for paragraphs that qualified as sufficiently diverse; and you could do this with a machine-trained classifier able to apply fuzzier requirements to the output, making it more likely that you would get some match even with an incompletely populated grammar, and that there would be more variety from one output to the next.
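A sketch of the select-after-the-fact approach. I am using the spread of sentence lengths as a crude stand-in for the trained classifier; a real system would swap in a learned scoring function:

```python
import statistics

def diversity_score(paragraph):
    """Stand-in for a trained classifier: the spread (population standard
    deviation) of sentence lengths, in words, across the paragraph."""
    lengths = [len(s.split()) for s in paragraph.split(".") if s.strip()]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

def best_of(generate, n=20):
    """Overgenerate n paragraphs and keep the one the scorer likes best."""
    return max((generate() for _ in range(n)), key=diversity_score)
```

Because the score is fuzzy rather than a hard pass/fail test, some paragraph always wins, even when the grammar is too sparse to meet a strict constraint.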
Another idea I really liked (though I haven’t written down who initially proposed it) was the idea of a probability curve that you could apply to generation over the course of a whole sentence or paragraph. This idea arose out of some of my ideas about the distribution of surprise in generated text (for more on which, see my PROCJAM talk, and the concept of Venom in Annals of the Parrigues). But the idea was that the user might be able to specify a curve — perhaps low at the beginning, then gradually rising over the course of the paragraph; or perhaps presenting several distinct peaks — that would determine how likely the system was to choose a grammar element with a particular stylistic feature. (Being “surprising,” statistically rare, offensive, or high or low in diction would all count here.)
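A sketch of the curve idea, with an invented rising curve and a two-pool model of “plain” versus “surprising” grammar elements (real grammars would tag elements rather than keep separate pools):

```python
import random

def rising_curve(t):
    """Hypothetical curve: the probability of a surprising choice rises
    from 0.05 at the start of the paragraph (t=0) to 0.8 at the end (t=1)."""
    return 0.05 + 0.75 * t

def curved_choices(plain, surprising, n, curve=rising_curve, rng=random):
    """Pick n elements; at position i, curve(i / (n-1)) gives the
    probability of drawing from the surprising pool."""
    out = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        pool = surprising if rng.random() < curve(t) else plain
        out.append(rng.choice(pool))
    return out
```

A multi-peaked curve would just be a different `curve` function, so the user-facing control could plausibly be a drawn or sketched shape.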
Finally, Adam raised the possibility of running a grammar a number of times while keeping track of which nodes were expanded, classifying the resulting text (e.g., is this output paragraph Hemingwayesque based on a machine-taught stylistic classifier?), and then using that information to build in percentages so that the generator would know how often to use expansion X rather than expansion Y when generating a Hemingway-style paragraph. (Essentially building the results of a Monte Carlo tree search back into the generator for future reference, as I understand it.)
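As I understand the proposal, a sketch might look like this, with a trivial word-count test standing in for the machine-taught stylistic classifier, and single-string expansions standing in for full grammar runs:

```python
import random
from collections import Counter

def learn_weights(expansions, classify, runs=500, rng=random):
    """Run the generator many times, record which expansion produced each
    output, and count how often each one passed the classifier. The
    resulting per-expansion rates could be folded back into the grammar
    as selection weights for future generation."""
    used, passed = Counter(), Counter()
    for _ in range(runs):
        choice = rng.choice(expansions)
        used[choice] += 1
        if classify(choice):
            passed[choice] += 1
    return {e: passed[e] / used[e] for e in expansions if used[e]}
```

With a "Hemingwayesque means terse" classifier, the short expansion scores well and the long one poorly, which is exactly the information the generator would need when asked for a Hemingway-style paragraph.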
Transformational grammars on top of generative grammars
We talked a bit about the concept of the transformational grammar and whether it would be useful to introduce some transformational grammar tools on top of the generative ones. (Later in the weekend Joris Dormans’ dungeon-generation talk came back to the concept of transformational grammars, but in a rather different context.)
Tools and visualization
Someone floated the idea of a “rich text Tracery”: one in which you could affect the likelihood of a particular corpus element being selected, or associate tags with it, by changing the font size and color of the entry. (I proposed that in a corpus of mixed modern and archaic words, the archaic words could be rendered in a flowy handwriting script. This is probably silly.)
I also shared some of the ideas from my previous posts here about visualization of procedurally generated text and about notifying the user when added corpus features reduce rather than improve the player’s perception of output variety.
That talk about experienced variety also led someone to mention a method by Tom Francis, in which a game starts with a generative system but then over time additional generative features are unlocked. (This is done purely on the basis of time spent playing, not on whether the player has succeeded at something or unlocked a new checkpoint.) The idea is to let the player get to where they feel they fully understand the range of output possible in the grammar, and then surprise them by demonstrating that there’s still more. This in turn reminded me of Rob Daviau’s work with Legacy boardgames that introduce new mechanics over the course of repeated play.
Other resources
This PCG book has chapters online about a lot of current research and possible tools.
Tony Veale has done a lot of work with computer-generated metaphor and analogy.
The slides from Adam’s talk contain some great information about different machine learning algorithms and their uses, including when it comes to text.
This workshop revealed incidentally that wikipedia features a List of Knitters in Literature. I just needed to share that.
My notes also contain one or two lines about training an LSTM on a large corpus and then testing the output of a generator to see whether a generated sentence was probable or improbable. I can’t recall what problem this was meant to solve, but possibly someone else from the workshop will find it jogs their memory.
On a later page I also have the phrase “tagged corpora trained bidirectional LSTM sequence to number” but I’m also not certain what that was about. It’s in green ink, though.
Finally, the notes stress the importance of QA-testing a large generated space. They do not suggest a solution to this problem.