October 02, 2004
Presidential Debate Analysis
Whenever I watch a televised debate, I wonder how much of each speaker’s message is actually thinking on their feet and how much is canned material. Now that transcripts are readily available, these sorts of questions can be addressed with various computational methods.
A simple way to identify repeated statements is to count the number of times a particular noun phrase is mentioned. Noun phrases act as a proxy both for the subject matter of a given piece of text and for the way in which things are worded.
For this simple experiment, we’ll need four tools:
- The transcript (simplified from the original)
- Lingua::EN::Tagger, an English part-of-speech tagger written in Perl
- phrases.pl, a Perl script to parse the document and extract the noun phrases
- Debate Spotter, an interactive interface to visualize the results
The results are quite interesting. Looking only at noun phrases of at least two words that occur at least twice for a given speaker, we arrive at some spectacular catch phrases. For Bush my favorite is "hard work," which he said repeatedly. Apparently Bush thinks that the world is a difficult place to be. For Kerry, a salient phrase was "war as a last resort."
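For anyone who wants to try the filtering step themselves, here is a rough Python sketch of the "at least two words, at least twice per speaker" rule. It is not phrases.pl. It assumes the noun phrases have already been extracted by a tagger such as Lingua::EN::Tagger and grouped by speaker, and the function name, speaker labels and phrases are all made up for illustration.

```python
from collections import Counter

def repeated_phrases(phrases, min_words=2, min_count=2):
    """Return {phrase: count} for phrases that pass both thresholds.

    `phrases` is assumed to be one speaker's noun phrases, already
    extracted by a part-of-speech tagger (hypothetical input format).
    """
    # Normalise case and whitespace so "Red apple" and "red  apple" match.
    normalised = [" ".join(p.lower().split()) for p in phrases]
    counts = Counter(normalised)
    return {
        phrase: n
        for phrase, n in counts.items()
        if len(phrase.split()) >= min_words and n >= min_count
    }

# Toy input, NOT taken from the real transcript.
toy_phrases = {
    "SPEAKER_A": ["red apple", "Red apple", "pear", "pear", "green pear"],
    "SPEAKER_B": ["big plan", "big plan", "plan"],
}

repeated = {who: repeated_phrases(p) for who, p in toy_phrases.items()}
print(repeated)
```

In the toy data, "pear" is dropped because it is a single word, and "green pear" is dropped because it appears only once.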
The top 25 phrases for Bush and Kerry follow. The number following each phrase is a rank based on the length of the phrase and the number of times it appeared.
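One simple way to combine length and frequency into a single number is sketched below in Python. The scoring used here (number of words multiplied by number of occurrences) is only an assumption for illustration and may well differ from how the ranks in the lists were computed. The function name and the toy counts are hypothetical too.

```python
def rank_phrases(counts, top_n=25):
    """Rank phrases by an ASSUMED score: words in phrase * occurrences."""
    def score(phrase, n):
        return len(phrase.split()) * n

    scored = [(phrase, score(phrase, n)) for phrase, n in counts.items()]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]

# Toy counts, NOT real results from the debate.
toy_counts = {"red apple": 5, "big red apple pie": 2, "green pear": 3}

top = rank_phrases(toy_counts)
for phrase, rank in top:
    print(phrase, rank)
```

A score like this rewards long phrases that repeat even a few times, which seems to suit the goal of spotting canned wording rather than just common topics.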
There are so many other types of analysis that could be run on these data. If you find anything interesting, please let me know. Also, the Debate Spotter allows for any query, so post any interesting phrases that you find.