Validating the General Inquirer Dictionary

posted by Zeke Shore on Nov 15th, 2009

We have been trying to hunt down more information about the General Inquirer Dictionary, since it is currently serving as our primary reference table for emotively evaluating the words within the discussions of New York Times articles. We were able to get in contact with Roger Hurwitz, a research scientist at MIT’s Artificial Intelligence Lab, and one of the GI dictionary’s moderators, who was able to shed some light:

The General Inquirer scores sentiment in texts on the basis of surface text words whose root forms and contextually disambiguated senses mark negative or positive attitudes, per the General Inquirer dictionary. I realize that sounds circular, but there are many such words in the dictionary, so that coverage has proved adequate and results have acceptable inter-coder reliability with scoring of the same texts by human coders. The GI also scores texts in just over 200 other fields or any subset thereof per users’ desires. These fields include expressions of the eight social values that political scientist Harold Lasswell found basic to human social activity. Namenwirth and Weber, using the GI and Lasswell values dictionaries to code American political party platforms and speeches from the British throne, respectively, found long and short value cycles in American and English society (following a relative attention paradigm, as measured by frequency of mention). The book Dynamics of Culture (Boston: Allen & Unwin, 1987) may be out of print. However, an article by Namenwirth lays out the theory and is available online.
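To make that description a little more concrete, here is a minimal Python sketch of dictionary-based scoring. It is not the General Inquirer itself: the four-word lexicon, the `naive_root` suffix stripper, and the `score_text` function are all hypothetical stand-ins, and the sketch skips the contextual sense disambiguation Hurwitz mentions.

```python
import re

# Hypothetical toy lexicon. The real General Inquirer dictionary is far
# larger and tags disambiguated word senses, which this sketch ignores.
TOY_LEXICON = {
    "good": "positive",
    "hope": "positive",
    "bad": "negative",
    "fear": "negative",
}


def naive_root(word):
    """Crude suffix stripping, standing in for a real root-form lookup."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def score_text(text, lexicon=TOY_LEXICON):
    """Count positive and negative lexicon hits in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {"positive": 0, "negative": 0}
    for token in tokens:
        # Try the surface form first, then the crudely derived root form.
        label = lexicon.get(token) or lexicon.get(naive_root(token))
        if label:
            counts[label] += 1
    total = len(tokens)
    # Net score: (positive - negative) hits per token, 0.0 for empty input.
    net = (counts["positive"] - counts["negative"]) / total if total else 0.0
    return {"tokens": total, **counts, "net": net}


result = score_text("Readers feared bad outcomes but kept hopes.")
print(result)
```

Dividing by the token count keeps scores comparable between short and long comments, which matters for something as uneven as article discussions.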

So I found and reviewed the J. Zvi Namenwirth study that was published in the Journal of Interdisciplinary History (MIT Press) in 1973. Namenwirth mostly maps public values through the content of presidential campaign transcripts from 1844 to 1964.

The following two graphs show the frequency of the word ‘wealth’ over time, normalized for transcript length, and begin to reveal some interesting cyclical patterns over the 120-year stretch.

[Figure: namenwirth_plot1]

[Figure: namenwirth_plot2]
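The measure behind these plots, a word's frequency normalized for document length, is simple to sketch. The Python below is only an illustration under our own assumptions: `relative_frequency`, `frequency_series`, the per-1,000-words scaling, and the placeholder texts are hypothetical and are not taken from Namenwirth's data or method.

```python
import re


def relative_frequency(text, target, per=1000):
    """Occurrences of `target` per `per` words, so long and short texts compare."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return per * tokens.count(target.lower()) / len(tokens)


def frequency_series(documents, target):
    """Map {year: text} to a list of (year, rate) pairs sorted by year."""
    return [
        (year, relative_frequency(documents[year], target))
        for year in sorted(documents)
    ]


# Placeholder texts for illustration only, not real campaign material.
sample_documents = {
    1848: "wealth and labor and wealth",
    1844: "the wealth of the nation",
}

for year, rate in frequency_series(sample_documents, "wealth"):
    print(year, rate)
```

Plotting the resulting pairs against the year would give a curve of the same kind as the graphs above, though finding cycles in it takes considerably more than counting.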

These early natural language processing studies are interesting to look at, partly because of how much was accomplished with so few computational resources. While word count may be a relatively trivial metric by the standards of today’s NLP, it does reveal interesting patterns over longer time spans.

This seems to validate our efforts to develop a lens through which the pre-aggregated corpus of the web can be analyzed with more rigorous NLP systems, and to revisit what the General Inquirer Dictionary might be able to reveal.

The study is not openly published, so I cannot post the PDF on the site, but here is the citation and JSTOR link:

J.Z. Namenwirth, “The Wheels of Time and the Interdependence of Value Change,” J. Interdisciplinary History, 3 (1973): 649-683
Stable URL: http://www.jstor.org/stable/202687