Emotive Analysis Process V1.1


posted by Zeke Shore on Nov 11th, 2009

Here is a quick update on how our emotive analysis engine is playing out. The end-to-end process (for this initial prototype) will work as follows:

First, the user provides a search query. Using the Article Search API and the Community API, we pull (and cache) all of the NY Times articles related to that query that have comments. (This will be made more efficient in the near future… more to come on that later.)
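As a rough sketch of the pull-and-cache step, something like the following could work. The `ArticleCache` class, the `fetch` callable and the `"comments"` field are all hypothetical names invented for illustration; `fetch` stands in for whatever code actually calls the Article Search and Community APIs, which is not shown here.

```python
from typing import Callable, Dict, List

# Hypothetical record type: one article with its comments attached.
Article = Dict[str, object]


class ArticleCache:
    """In-memory cache keyed by a normalized search query (illustrative only)."""

    def __init__(self, fetch: Callable[[str], List[Article]]) -> None:
        # `fetch` is an assumed stand-in for the real API calls.
        self._fetch = fetch
        self._store: Dict[str, List[Article]] = {}

    def search(self, query: str) -> List[Article]:
        key = query.strip().lower()
        if key not in self._store:
            # Cache miss: hit the API once and keep only articles with comments.
            results = self._fetch(query.strip())
            self._store[key] = [a for a in results if a.get("comments")]
        return self._store[key]
```

With this shape, querying ‘Obama’ twice (even with different capitalization) only triggers one call to `fetch`.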

Once article and comment results for a query are returned, from either the cache or a new API call, what we initially need to deal with on the Natural Language Processing (NLP) side of the equation is the comments, in the form of text strings.

Working with NLTK in Python, we can follow an information extraction architecture that is structured as follows:

[Figure: ie-architecture]

For our purposes, one of the more difficult challenges that we have is knowing what words we care about. If we are trying to visualize the emotional or affective characteristics of the discourse surrounding a keyword, we cannot just look at the full thread of comments for an article that was returned for a given keyword, and log every word that holds emotive weight. The NY Times article “Bipartisan Spirit, at Least for a Moment” is a perfect example as to why not. The article is about a meeting between President Obama and George Bush Sr. So as one may guess, that article would have been returned when querying either ‘Bush’ or ‘Obama,’ and the 38-comment discussion that follows the article contains references to both.

So before any sort of emotive analysis can occur, we must parse the text down to the words that we care about. This first involves identifying instances of our keyword within each comment, and extracting the sentences that contain the keyword.
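A minimal sketch of that first filtering pass might look like this. The sentence splitter here is a deliberately naive regular expression (it will mis-split abbreviations such as “Sr.”), used only to keep the example self-contained; a real tokenizer such as the one in NLTK would be a better fit. The function names are made up for illustration.

```python
import re
from typing import List

# Naive boundary: split on whitespace that follows '.', '!' or '?'.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(comment: str) -> List[str]:
    """Break a comment into rough sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_BOUNDARY.split(comment.strip()) if s]


def sentences_with_keyword(comment: str, keyword: str) -> List[str]:
    """Return only the sentences that mention the keyword as a whole word."""
    # Case-insensitive whole-word match, so 'Bush' does not match 'bushel'.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [s for s in split_sentences(comment) if pattern.search(s)]


comment = "I respect Obama for this. Bush was gracious too. Will it last?"
print(sentences_with_keyword(comment, "obama"))
```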

For further coverage, and also to account for the fact that web-based comments are often less verbose and less refined than other forms of discourse, if our keyword is a proper noun, we might also look at sentences with pronouns that immediately precede or follow sentences with our keyword.
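One way the pronoun heuristic could be sketched, assuming the comment has already been split into sentences and the keyword is a single word. The `PRONOUNS` set is a small illustrative sample, not a complete list, and the function name is hypothetical.

```python
import re
from typing import List, Set

# Small illustrative set of third-person pronouns (an assumption, not exhaustive).
PRONOUNS = {"he", "him", "his", "she", "her", "hers", "they", "them", "their"}

_WORD = re.compile(r"[A-Za-z]+")


def _words(sentence: str) -> Set[str]:
    """Lowercased alphabetic tokens; "Obama's" yields 'obama' and 's'."""
    return {w.lower() for w in _WORD.findall(sentence)}


def expand_with_pronoun_neighbors(sentences: List[str], keyword: str) -> List[str]:
    """Keep keyword sentences plus adjacent sentences that contain a pronoun."""
    key = keyword.lower()
    has_keyword = [key in _words(s) for s in sentences]
    has_pronoun = [bool(_words(s) & PRONOUNS) for s in sentences]

    keep = set()
    for i, hit in enumerate(has_keyword):
        if not hit:
            continue
        keep.add(i)
        # Look only at the sentence immediately before and immediately after.
        for j in (i - 1, i + 1):
            if 0 <= j < len(sentences) and has_pronoun[j]:
                keep.add(j)
    return [sentences[i] for i in sorted(keep)]
```

This heuristic does no real coreference resolution: a neighboring “he” is assumed to refer to the keyword, which will sometimes be wrong.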

Ultimately, we will need to develop a comprehensive weighted dependency grammar, so that we can efficiently parse the sentences that we care about into relatively accurate dependency structures. This will allow us to know (with far more precision) what words are referring to or modifying our keyword, and should therefore be emotively classified.
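To illustrate what we would do with a dependency structure once we have one, here is a small sketch. It assumes the parse is already available as a list of (head, relation, dependent) arcs; that encoding, the relation labels and the function name are all assumptions made for this example, and the parser itself is not shown.

```python
from typing import List, Set, Tuple

# (head, relation, dependent): an assumed, simplified encoding of one arc.
Arc = Tuple[str, str, str]


def words_linked_to(keyword: str, arcs: List[Arc]) -> Set[str]:
    """Collect words directly attached to the keyword in the dependency arcs."""
    key = keyword.lower()
    linked: Set[str] = set()
    for head, _relation, dependent in arcs:
        if head.lower() == key:
            linked.add(dependent)  # the dependent modifies the keyword
        elif dependent.lower() == key:
            linked.add(head)       # the keyword is an argument of this word
    return linked


# Hand-written arcs for "I respect the gracious Obama" (labels are illustrative).
arcs = [
    ("respect", "nsubj", "I"),
    ("respect", "dobj", "Obama"),
    ("Obama", "amod", "gracious"),
    ("Obama", "det", "the"),
]
print(words_linked_to("Obama", arcs))
```

Only direct links are followed here; “I” is left out because it attaches to “respect” rather than to the keyword.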

[Figure: depgraph0]

So now the fun part. Once we know what words we care about in relation to our keyword, we will go back to Charles Osgood’s Semantic Differential Theory (which I have discussed in a previous post), which maps words along three main axes: the Evaluative (good/bad), the Potency (strong/weak) and the Activity (active/passive). We can do this using the General Inquirer Dictionary, including the Lasswell Value Dictionary and the Harvard IV-4 dictionary, which maps about 12,000 words across Osgood’s semantic differential axes (among other classifications).

To make the process more efficient, since we have tagged the part of speech of every word, we can throw out words that we know should have neutral affective values, such as any determiners (‘the,’ ‘a,’ etc.) or any proper nouns, and map every other word against our three axes. For each axis, we will give a word a value of 1, 0, or -1. On the evaluative (EVA) axis, for example, any word living at the ‘positive’ or ‘good’ end of the axis would hold a value of 1, whereas a word living at the ‘negative’ or ‘bad’ end of the axis would hold a value of -1, and of course words that are neutral on the evaluative scale would hold a value of 0. This system would carry across the activity (ACT) and potency (POT) axes as well, in the form of

affectiveValue(word) = [EVA, ACT, POT]

affectiveValue(respect) = [1, -1, 0]

Here the word “respect” holds an evaluative value of ‘positive’ or ‘good,’ an activity value of ‘passive’ and a potency value of ‘neutral’ (neither ‘strong’ nor ‘weak’).
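The scoring scheme above could be sketched roughly like this. The `LEXICON` dictionary is a toy stand-in for the General Inquirer: only the entry for “respect” comes from this post, and the other entry is made up for illustration. The part-of-speech filter assumes Penn Treebank-style tags (`DT` for determiners, `NNP`/`NNPS` for proper nouns).

```python
from typing import Dict, List, Tuple

# Toy lexicon; values are [EVA, ACT, POT]. Only "respect" is taken from the post,
# "attack" is an invented example entry.
LEXICON: Dict[str, List[int]] = {
    "respect": [1, -1, 0],
    "attack": [-1, 1, 1],
}

# Tags assumed to carry neutral affective value (Penn Treebank-style).
NEUTRAL_TAGS = {"DT", "NNP", "NNPS"}


def affective_value(word: str) -> List[int]:
    """Return [EVA, ACT, POT] for a word; unknown words are treated as neutral."""
    return list(LEXICON.get(word.lower(), [0, 0, 0]))


def score_tagged_words(tagged: List[Tuple[str, str]]) -> Dict[str, List[int]]:
    """Score (word, tag) pairs, skipping determiners and proper nouns."""
    return {
        word.lower(): affective_value(word)
        for word, tag in tagged
        if tag not in NEUTRAL_TAGS
    }


tagged = [("I", "PRP"), ("respect", "VBP"), ("the", "DT"), ("Obama", "NNP")]
print(score_tagged_words(tagged))
```

Treating unknown words as `[0, 0, 0]` is a design choice for this sketch; it may be preferable to track them separately so that gaps in the dictionary are visible.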

So ultimately this will leave us with six lists of words for each article in relation to a given keyword (a positive and a negative list for each of the three axes), which we can then use as metrics for our data visualization.
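As a sketch of that last step, the following groups scored words into six lists, reading “six lists” as one list per pole of each axis; that reading, along with the bucket names such as `EVA+`, is an assumption made for this example.

```python
from typing import Dict, List

AXES = ("EVA", "ACT", "POT")


def six_lists(scores: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Bucket words by the sign of their value on each axis; zeros are skipped."""
    buckets: Dict[str, List[str]] = {
        f"{axis}{sign}": [] for axis in AXES for sign in ("+", "-")
    }
    for word, values in scores.items():
        for axis, value in zip(AXES, values):
            if value > 0:
                buckets[axis + "+"].append(word)
            elif value < 0:
                buckets[axis + "-"].append(word)
    return buckets


lists = six_lists({"respect": [1, -1, 0], "the": [0, 0, 0]})
print(lists)
```

The length of each list (or its share of all scored words) could then feed directly into the visualization.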

