OpinionFinder: Open Source Sentiment Analysis Toolkit

posted by Zeke Shore on Feb 17th, 2010

While exploring existing sentiment analysis processes, we stumbled across what looks like a fully integrated open source solution to several issues identified in our recent round of research.

OpinionFinder appears to be hosted and primarily developed at the University of Pittsburgh with contributions from Cornell University and University of Utah. While the OpinionFinder system was only mentioned off hand in Bo Pang’s article Opinion Mining and Sentiment Analysis, it appears to include some of the best solutions available for a lot of the common challenges that accompany effective sentiment analysis.

OpinionFinder, which was initially released in 2006, employs a multi-stage NLP process. As stated in the project’s extended abstract,

“OpinionFinder aims to identify subjective sentences and to mark various aspects of subjectivity in these sentences, including the source (holder) of the subjectivity and words that are included in phrases expressing positive or negative sentiments.”

Working in “batch” mode as more of a back-end pipe, OpinionFinder works as follows:

Document Processing

Taking any incoming text source, HTML or XML meta info is removed, and sentences are split and POS tagged using OpenNLP. Next, stemming is accomplished using Steven Abney’s SCOL v1K stemmer program. SUNDANCE (Sentence UNDerstanding And Concept Extraction), a partial parser from the NLP laboratory at the University of Utah, is used by Autoslog-TS to identify extraction patterns needed by the sentence classifiers and the SourceFinder (which identifies the source of subjective content, distinguishing author statements from related or quoted statements). A final parse in batch mode establishes constituency parse trees which are converted to dependency parse trees for Named Entity and subject detection.
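As a rough illustration of the very first step only (removing markup and splitting sentences), here is a minimal sketch using the Python standard library. It is not OpinionFinder's code, which relies on OpenNLP, SCOL and SUNDANCE; the function names strip_markup and split_sentences and the regular expressions are assumptions made for this example.

```python
import html
import re

def strip_markup(raw):
    """Remove HTML/XML tags and decode entities (naive, regex-based)."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()

def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace (a crude heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in parts if p]

raw = "<p>The plan is bold.</p><p>Critics &amp; allies say it will fail!</p>"
sentences = split_sentences(strip_markup(raw))
print(sentences)
```

A real pipeline would hand each sentence on to a POS tagger and stemmer; a regex splitter like this one will stumble on abbreviations such as "Sr." and is only meant to show the shape of the step.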

Subjectivity and Sentiment Analysis

At this point a Naive Bayes classifier identifies subjective sentences. The specs seem to indicate that the classifier is trained against subjective and objective sentences generated by two additional “rule-based” (unsupervised?) classifiers drawing from “a large corpus.” This point in the process will require some exploration and validation.
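To make the bootstrapping idea concrete, here is a small sketch of how we read it: a high-precision rule labels only the sentences it is sure about, and a Naive Bayes classifier is then trained on those automatically labeled sentences. The clue list, the thresholds, and all function names are assumptions for illustration; OpinionFinder's actual rules and features are certainly richer.

```python
import math
from collections import Counter

# Hypothetical clue lexicon; the real clue lists are much larger.
STRONG_CLUES = {"hate", "love", "terrible", "wonderful", "outrageous"}

def rule_label(tokens):
    """High-precision rule: >= 2 clues is subjective, 0 clues is objective."""
    hits = sum(1 for t in tokens if t in STRONG_CLUES)
    if hits >= 2:
        return "subj"
    if hits == 0:
        return "obj"
    return None  # ambiguous sentences are left unlabeled

def train_nb(labeled):
    """Collect word and document counts from (tokens, label) pairs."""
    word_counts = {"subj": Counter(), "obj": Counter()}
    doc_counts = Counter()
    for tokens, label in labeled:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = set(word_counts["subj"]) | set(word_counts["obj"])
    return word_counts, doc_counts, vocab

def classify(model, tokens):
    """Multinomial Naive Bayes with add-one smoothing; unseen words are ignored."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("subj", "obj"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for t in tokens:
            if t in vocab:
                score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

corpus = [
    "i love this wonderful plan".split(),
    "what a terrible outrageous idea".split(),
    "the senate voted on tuesday".split(),
    "the bill has four sections".split(),
    "i love the bill".split(),  # one clue only, so the rule abstains
]
labeled = [(s, rule_label(s)) for s in corpus if rule_label(s) is not None]
model = train_nb(labeled)
print(classify(model, "this idea is wonderful".split()))
```

The point of the design is that the trained classifier can generalize to sentences the rule would have skipped, which is the part we still need to validate.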

Next a direct subjective expression and speech event classifier, built by Eric Breck, tags the direct subjective expressions and speech events found within the document using WordNet.

The final step applies actual sentiment analysis to sentences that have been identified as subjective. This is accomplished with two classifiers that were developed using the BoosTexter machine learning program and trained on the MPQA Corpus.

Evaluation

While we still need to rigorously explore the source code, this system appears to be a gold mine of solutions to both previously unresolved and newly discovered issues in our sentiment analysis process. Named Entity detection along with dependency parse trees will help us filter content to only include sentiment regarding the actual topic being explored (rather than visualizing all subjective content in a comment) as well as helping to reveal popular related topics that exist within any given topic of discussion.

Subjectivity detection and Speech Event Classification are challenges that are acknowledged in a lot of research on the topic of sentiment analysis, but comprehensive solutions have been much more difficult to come by. This system seems to combine a few processes towards those goals (including leveraging WordNet in a new way), and again could really help us filter down our corpus to relevant statements of sentiment for a given topic.

Finally the actual positive/negative sentiment analysis that is applied to subjective sentences is different than any other process I have read about (most including WordNet and trained classifiers, or our original ad hoc method of matching against the General Inquirer Dictionary). We might want to experiment a bit with this phase to see how more or less effective different methods are.

One process that is surprisingly absent from the OpinionFinder system is any sort of negation detection. We may want to explore possibly integrating the algorithm Bruno Ohana experimented with in his dissertation on sentiment analysis, or investigate other solutions.

It may also be interesting to see how things change if we begin to stack some of the processes used by OpinionFinder with systems that we already have in place, such as our GI Osgood Emotive Assignments.

You can download OpinionFinder for free from the project’s website under an open academic license, or download a PDF of the extended abstract/description of the project here:

OpinionFinder-Extended Abstract

Negation Detection Processes

posted by Zeke Shore on Feb 16th, 2010

One issue in accurate sentiment analysis identified in a recent round of research is the problem of negation detection. Negation occurs when a negating word (such as ‘not’) inverts the evaluative value of an affective word (for example, “not good” is similar to saying “bad”). This can be handled in natural language processing by identifying negating words, and then inverting the value of any positive or negative word within n words of the negating word, where n is the window of potential negation.
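Before getting to a fuller algorithm, the basic window idea can be sketched in a few lines. The tiny polarity lexicon, the negator list, and the function name are assumptions for illustration, and this version only looks for negators in the n tokens before an affective word.

```python
# Toy polarity lexicon and negator list; both are invented for this example.
POLARITY = {"good": 1, "great": 1, "bad": -1, "awful": -1}
NEGATORS = {"not", "no", "never"}

def score_with_negation(tokens, n=2):
    """Sum word polarities, flipping any that follow a negator within n tokens."""
    total = 0
    for i, tok in enumerate(tokens):
        value = POLARITY.get(tok, 0)
        if value == 0:
            continue
        window = tokens[max(0, i - n):i]  # the n tokens before this word
        if any(w in NEGATORS for w in window):
            value = -value
        total += value
    return total

print(score_with_negation("the policies are good".split()))      # 1
print(score_with_negation("the policies are not good".split()))  # -1
```

A larger n catches more distant negations but also flips words the negator was never meant to touch, so the window size is something to tune rather than fix in advance.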

In Bruno Ohana’s 2009 dissertation “Opinion Mining with the SentiWordNet Lexical Resource” (Dublin Institute of Technology), a Python algorithm is presented to perform this task. While Ohana’s tests with this negation detection algorithm only yielded accuracy improvements of about 0.5%, this might be a good starting point for further exploration.

#
# populates array of negated terms based on document terms
# negation[i] indicates if term in doc[i] is negated
#
def getNegationArray(doc, windowsize):
    PSEUDO = ('no increase', 'no wonder', 'no change', 'not cause',
              'not only', 'not necessarily')
    PRENEGATION = ('not', 'no', 'n\'t', 'cannot', 'declined',
                   'denied', 'denies', 'free of', 'fails to', 'no evidence',
                   'no new', 'no sign', 'no suspicious', 'no suggestion',
                   'rather than', 'with no', 'unremarkable', 'without',
                   'rules out', 'ruled out', 'rule out')
    POSNEGATION = ('unlikely', 'free', 'ruled out')
    ENDOFWINDOW = ('.', ':', ',', 'but', 'however', 'nevertheless',
                   'yet', 'though', 'although', 'still', 'aside from', 'except',
                   'apart from')
    # Initialise array
    vNEG = [0 for t in range(len(doc))]
    # Initialise window counters
    winstart = 0
    winend = min(windowsize, len(doc) - 1)
    docsize = len(doc)
    i = 0
    found_pseudo = 0
    found_neg_fwd = 0
    found_neg_bck = 0
    inwindow = 0
    for i in range(docsize):
        #
        # build 1-term and 2-term strings
        # (doc entries look like 'word/TAG'; keep the part before the slash)
        #
        unigram = doc[i].split('/')[0]
        if i < (docsize - 1):
            bigram = unigram + ' ' + doc[i+1].split('/')[0]
        else:
            bigram = unigram
        #
        # Search for pseudo negations
        # (note: found_pseudo is never reset in this listing, so one pseudo
        # negation disables detection for the rest of the document)
        #
        for negterm in PSEUDO:
            if bigram == negterm:
                found_pseudo = 1
        if (found_pseudo == 0):
            #
            # Look for pre negations
            #
            for negterm in PRENEGATION:
                if unigram == negterm or bigram == negterm:
                    found_neg_fwd = 1
            #
            # Look for post negations
            #
            for negterm in POSNEGATION:
                if unigram == negterm or bigram == negterm:
                    found_neg_bck = 1
        #
        # If found fwd/backw negation, then negate window
        #
        if (found_neg_fwd == 1):
            #
            # negate terms forward up to window
            #
            if inwindow < windowsize:
                vNEG[i] = 1
                inwindow += 1
            else:
                # out of window space
                found_neg_fwd = 0
                inwindow = 0
        #
        # backward negation
        #
        if (found_neg_bck == 1):
            #
            # negate back until window start
            #
            for counter in range(max(winstart, i - windowsize), i):
                vNEG[counter] = 1
            #
            # done with backwards negation
            #
            found_neg_bck = 0
        #
        # now move window
        #
        for negterm in ENDOFWINDOW:
            if unigram == negterm or bigram == negterm:
                #
                # found end of negation, must reset windows
                #
                inwindow = 0
                found_neg_fwd = 0
                winstart = i
                winend = min(windowsize + i, len(doc) - 1)
    return vNEG

New Sentiment Analysis Research

posted by Zeke Shore on Feb 16th, 2010

I have come across some fantastic Sentiment Analysis research over the past few days, and was able to tap into several research papers and dissertations exploring computational Sentiment Analysis or Opinion Mining (OM). Two that provided significant insight were “Opinion Mining and Sentiment Analysis” (Pang and Lee, 2008) and “Opinion Mining with the SentiWordNet Lexical Resource” (Ohana, 2009).

Recent progress in Opinion Mining techniques within natural language processing identifies a handful of challenges and potential solutions for accurate sentiment analysis of text-based content.

Subjectivity

If our goal is to extract the sentiment, opinions or emotions of users, then we should really only be looking at subjective statements within a user’s comment. This will prevent positively or negatively charged words that are present in objective statements from affecting the comment’s overall sentiment score. Subjectivity could be assessed through a trained classifier algorithm like Naive Bayes or Maximum Entropy.

On Topic

Topic relevance is an issue that we were already aware of, and we had been searching (with much difficulty) for solutions using dependency grammars. This new round of research seems to dismiss that approach as unrealistically difficult (I’m thinking that could be a project on its own). Unfortunately no good solution strategies were explored for this issue.

Polarity

This is our root goal of applying a negative or positive sentiment score at various text-unit levels, such as word, sentence, or comment. While VoxPop has thus far been using the General Inquirer Dictionary’s evaluative definitions, it appears a few recent projects have been utilizing the WordNet (which we explored earlier in our research) and newer SentiWordNet lexicons for evaluative sentiment assignments.

Negation Detection

An issue that was just now revealed to us is the problem of Negation Detection. Consider the following two sentences:

Obama’s policies are good.

Obama’s policies are not good.

A normal polarity tagger would give these two sentences the same sentiment score, since both contain one positive word (good). Of course our second sentence expresses the opposite of positive sentiment, with the adverb ‘not’ inverting the value of “good.” A negation detection process aims to identify these negating words, and then invert the value of any positive or negative words that appear within n words before or after the negating term.

Here are PDFs of two of the more informative articles:

Opinion Mining and Sentiment Analysis
Bo Pang, Lillian Lee

Opinion mining and sentiment analysis

Opinion Mining with the SentiWordNet Lexical Resource
Bruno Ohana

Opinion mining with the SentiWordNet lexical resource

Validating the General Inquirer Dictionary

posted by Zeke Shore on Nov 15th, 2009

We have been trying to hunt down more information about the General Inquirer Dictionary, since it is currently serving as our primary reference table for emotively evaluating the words within the discussions of New York Times articles. We were able to get in contact with Roger Hurwitz, a research scientist at MIT’s Artificial Intelligence Lab, and one of the GI dictionary’s moderators, who was able to shed some light:

The General Inquirer scores sentiment in texts on the basis of surface text words whose root forms and contextually disambiguated senses mark negative or positive attitudes, per the General Inquirer dictionary.  I realize that sounds circular, but there are many such words in the dictionary, so that coverage has proved adequate and results have acceptable inter-coder reliability with scoring of the same texts by human coders.  The GI also scores texts in just over 200 other fields or any subset thereof per users’ desires.  these fields include expressions of the eight social values that political scientist Harold Lasswell found basic to human social activity.  Namenwirth and Weber using the GI and Lasswell values dictionaries to code American political party platforms and speeches from the British throne, respectively, found long and short value cycles in American and English society (following a relative attention paradigm, as measured by frequency of mention.)  The book Dynamics of Culture (Boston: Allen & Unwin, 1987) may be out of print.  However, an article by Namenwirth lays out the theory and is available online.

So I found and reviewed the J. Zvi Namenwirth study that was published in the Journal of Interdisciplinary History (MIT Press) in 1973. Namenwirth is mostly mapping public values through the content of presidential campaign transcriptions from 1844 to 1964.

The following two graphs show the frequency of the word ‘wealth’ over time, normalizing for transcript lengths, and begin to reveal some interesting cyclical patterns over the 120 year stretch.

namenwirth_plot1

namenwirth_plot2

These early natural language processing studies are interesting to look at, partially because of how much was accomplished with so few computational resources available. While word count may be a relatively trivial metric by today’s NLP capabilities, it does reveal interesting patterns over longer timelines.
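For reference, a length-normalized word count of the kind described above takes only a few lines. The sketch below reports occurrences per 1,000 tokens; the function name, the tokenizing regex, and the per-1,000 scaling are our assumptions, and the two "transcripts" are invented placeholder strings, not Namenwirth's data.

```python
import re

def relative_frequency(text, word):
    """Occurrences of `word` per 1,000 tokens of `text` (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return 1000.0 * tokens.count(word.lower()) / len(tokens)

# Placeholder strings standing in for transcripts, keyed by year.
transcripts = {
    1844: "wealth and trade and the wealth of the nation",
    1848: "the union and the nation shall endure in peace",
}
series = {year: relative_frequency(text, "wealth")
          for year, text in transcripts.items()}
print(series)
```

Dividing by the token count is what makes a long transcript comparable with a short one; plotting such a series by year is what would expose any cycles.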

This seems to validate our efforts to develop a lens through which the pre-aggregated corpus of the web can be analyzed through more rigorous NLP systems, revisiting what the General Inquirer Dictionary might be able to reveal.

The study is not openly published, so I cannot post the PDF on the site, but here is the citation and JSTOR link:

J.Z. Namenwirth, “The Wheels of Time and the Interdependence of Value Change,” J. Interdisciplinary History, 3 (1973): 649-683
Stable URL: http://www.jstor.org/stable/202687

Emotive Analysis Process V1.1

posted by Zeke Shore on Nov 11th, 2009

Here is a quick update on how our emotive analysis engine is playing out. The end to end process (for this initial prototype) will work as follows:

First, the user provides a search query, and we pull (and cache) all of the NY Times articles that are related to that query that have comments using the Article Search API and the Community API (this will be made more efficient in the near future… more to come on that later).

After article or comment results to a query are returned from either the cache or a new API call, what we will need to deal with initially on the Natural Language Processing (NLP) side of the equation will be comments, in the form of text strings.

Using NLTK in python, there is an information extraction architecture that is structured as follows:

ie-architecture

For our purposes, one of the more difficult challenges that we have is knowing what words we care about. If we are trying to visualize the emotional or affective characteristics of the discourse surrounding a keyword, we cannot just look at the full thread of comments for an article that was returned for a given keyword, and log every word that holds emotive weight. The NY Times article Bipartisan Spirit, at Least for a Moment is a perfect example as to why not. The article is about a meeting between President Obama and George Bush Sr. So as one may guess, that article would have been returned when querying either ‘Bush’ or ‘Obama,’ and the 38-comment discussion that follows the article contains references to both.

So before any sort of emotive analysis can occur, we must parse the text down to the words that we care about. This first involves identifying instances of our keyword within each comment, and extracting the sentences that contain the keyword.

For further coverage, and also to account for the fact that web-based comments are often less verbose and less refined than other forms of discourse, if our keyword is a proper noun, we might also look at sentences with pronouns that immediately precede or follow sentences with our keyword.
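A first pass at this filtering step might look like the sketch below: keep every sentence that names the keyword, plus any directly adjacent sentence that contains a pronoun. The pronoun list, the naive sentence splitter, and the function names are all assumptions for illustration, not our production code.

```python
import re

# Small, hypothetical pronoun list; a POS tagger would do this properly.
PRONOUNS = {"he", "she", "they", "him", "her", "them", "his", "their"}

def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace (a crude heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words(sentence):
    """Lowercased alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())

def relevant_sentences(comment, keyword):
    """Sentences naming `keyword`, plus adjacent sentences containing a pronoun."""
    sentences = split_sentences(comment)
    has_kw = [keyword.lower() in words(s) for s in sentences]
    keep = []
    for i, s in enumerate(sentences):
        if has_kw[i]:
            keep.append(s)
            continue
        neighbor = (i > 0 and has_kw[i - 1]) or \
                   (i + 1 < len(sentences) and has_kw[i + 1])
        if neighbor and PRONOUNS & set(words(s)):
            keep.append(s)
    return keep

comment = ("Obama spoke well. He seemed sincere. "
           "The weather was awful. Bush was there too.")
print(relevant_sentences(comment, "Obama"))
```

This heuristic does no real coreference resolution, so a "he" next to an Obama sentence may still refer to someone else; it trades precision for coverage.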

Ultimately, we will need to develop a comprehensive weighted dependency grammar, so that we can efficiently parse the sentences that we care about into relatively accurate dependency structures. This will allow us to know (with far more precision) what words are referring to or modifying our keyword, and should therefore be emotively classified.

depgraph0

So now the fun part. Once we know what words we care about in relation to our keyword, we will go back to Charles Osgood’s Semantic Differential Theory, which I have discussed in a previous post and which maps words along three main axes: the Evaluative (good/bad), the Potency (strong/weak) and the Activity (active/passive). We can do this using the General Inquirer Dictionary, including the Lasswell Value Dictionary and the Harvard IV-4 dictionary, which maps about 12,000 words across Osgood’s semantic differential axes (among other classifications).

To make the process more efficient, since we have tagged the part of speech of every word, we can throw out words that we know should have neutral affective values, such as any determiners (‘the,’ ‘a,’ etc.) or any proper nouns, and map every other word against our three axes. For each axis, we will give a word a value of 1, 0, or -1, so on the evaluative (EVA) axis, for example, any word living at the ‘positive’ or ‘good’ end of the axis would hold a value of 1, whereas a word living at the ‘negative’ or ‘bad’ end of the axis would hold a value of -1, and of course words that are neutral on the evaluative scale would hold a value of 0. This system would carry out across the activity (ACT) and potency (POT) axes as well in the form of

affectiveValue(word) = [EVA, ACT, POT]

affectiveValue(respect) = [1, -1, 0]

Where the word “respect” holds an evaluative value of ‘positive’ or ‘good,’ an activity value of ‘passive’ and a potency value of ‘neutral’ (neither ‘strong’ nor ‘weak’).

So ultimately this will leave us with six lists of words for each article in relationship to a given keyword, which we can then use as metrics for our data visualization.
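Here is a minimal sketch of that mapping, assuming the six lists are the positive and negative ends of each of the three axes. Apart from “respect” (taken from the example above), the lexicon entries are invented for illustration and are not General Inquirer data; the Penn Treebank tag names and the function name are likewise assumptions.

```python
# Toy lexicon in [EVA, ACT, POT] order; only "respect" comes from the post.
AFFECT = {
    "respect": [1, -1, 0],
    "attack": [-1, 1, 1],
    "gentle": [1, -1, -1],
}
# Determiners and proper nouns are skipped (Penn Treebank tag names).
SKIP_TAGS = {"DT", "NNP", "NNPS"}

def six_lists(tagged_words):
    """Group (word, POS tag) pairs into positive/negative lists per axis."""
    lists = {name: [] for name in
             ("EVA+", "EVA-", "ACT+", "ACT-", "POT+", "POT-")}
    for word, tag in tagged_words:
        if tag in SKIP_TAGS:
            continue
        values = AFFECT.get(word.lower())
        if values is None:
            continue  # word is not in the lexicon
        for axis, value in zip(("EVA", "ACT", "POT"), values):
            if value > 0:
                lists[axis + "+"].append(word)
            elif value < 0:
                lists[axis + "-"].append(word)
    return lists

tagged = [("The", "DT"), ("respect", "NN"), ("Obama", "NNP"), ("attack", "NN")]
result = six_lists(tagged)
print(result)
```

Words scored 0 on an axis land in neither list for that axis, so the list lengths can be used directly as counts for the visualization.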

Research Progress Presentation

posted by Zeke Shore on Oct 24th, 2009

We recently presented our research progress, mostly focusing on proof of concept results of using the NYT API as an effective corpus, and exploring the work of Charles Osgood, Shortest Path Distance mapping with WordNet in the NLTK, and mapping words against the Lasswell Value Dictionary. We show some initial emotive analysis on a NYT article comment (which was part of a 38-comment discourse) to show what our WordNet + Lasswell engine might reveal.

nyt_lasswel_analysis_2

Download the full presentation below:

research_presentation

Coupling Niche Browsers and Affect Analysis for an Opinion Mining Application

posted by Zeke Shore on Oct 13th, 2009

Abstract

Newspapers generally attempt to present the news objectively. But textual affect analysis shows that many words carry positive or negative emotional charge. In this article, we show that coupling niche browsing technology and affect analysis technology allows us to create a new application that measures the slant in opinion given to public figures in the popular press.

by Gregory Grefenstette, Yan Qu, James G. Shanahan, David A. Evans

Download the full PDF:

Coupling niche browsers and affect analysis for an opinion mining application

Notes and Further Exploration

Interesting research on computationally evaluating emotional charge in news reports from various sources surrounding various topics. The research utilized the Lasswell Value Dictionary, which was a project started in the 1960s as a lexicon of over 2,000 words emotively classified and rated. The formatted text version of the General Inquirer dictionary could serve as an invaluable resource for the VoxPop project, providing a source of quantitatively mapped linguistic information that web-based discourse could be mapped against.

Some interesting findings from the research include the fact that positively charged words are often grouped with other positively charged words, and likewise negatively charged words are often grouped with other negatively charged words, allowing word proximity between classified and unclassified words to serve as a machine-learning tactic.

This article strongly demonstrates the complexity of attempting to accurately quantify the emotional tone of a body of text across even just two dimensions: positive and negative. While the process involved in this project was not simple, it was successful, and could definitely be tailored to the emotive qualities of a heated op-ed debate surrounding a topic of interest.

Due to the complex classification network employed by the Lasswell General Inquirer Dictionary and the Harvard IV-4 dictionary, I would imagine more layers of emotive understanding could be quantitatively extracted from linear textual exchanges (i.e., emotions exchanged between users vs. emotions towards the seeding article).

Words With Attitude

posted by Zeke Shore on Oct 9th, 2009

Abstract

The traditional notion of word meaning used in natural language processing is literal or lexical meaning as used in dictionaries and lexicons. This relatively objective notion of lexical meaning is different from more subjective notions of emotive or affective meaning. Our aim is to come to grips with subjective aspects of meaning expressed in written texts, such as the attitude or value expressed in them. This paper explores how the structure of the WordNet lexical database might be used to assess affective or emotive meaning. In particular, we construct measures based on Osgood’s semantic differential technique.

By Jaap Kamps and Maarten Marx

Download the full PDF below

Words with attitude

Notes and Further Exploration

Kamps and Marx present some interesting research that is very applicable to emotively analyzing online discourse. In this paper, subjective understanding is computationally extracted from text using Charles Osgood’s Theory of Semantic Differentiation as a guide for mapping word relationships in Princeton University’s WordNet Lexical Database. Osgood’s work in the late 1950’s established

“semantic differential technique is using several pairs of bipolar adjectives to scale the responses of subjects to words, short phrases, or texts. That is, subjects are asked to rate their meaning on scales like active–passive; good–bad; optimistic–pessimistic; positive–negative; strong–weak; serious–humorous; and ugly–beautifully”

Osgood’s research further revealed that most variance in affective meaning could be attributed to three major factors:

“These three factors of the affective or emotive meaning are the evaluative factor (e.g., good–bad); the potency factor (e.g., strong-weak); and the activity factor (e.g., active–passive). Among these three factors, the evaluative factor has the strongest relative weight.”

Using these three emotive axes, Kamps and Marx mapped words as they related to “good” and “bad” within the context of the WordNet lexical database through finding minimal path lengths:

The minimal path-length is a straightforward generalization of the synonymy relation. The synonymy relation connects words with similar meaning, so the minimal distance between words says something on the similarity of their meaning.

Words can now be scaled via this process as either negative, neutral, or positive. This same process can be followed to assign activity and potency ratings to words as well, revealing clear emotive qualities of the content.
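As we understand the paper, the evaluative score compares a word's distance to “bad” with its distance to “good” and divides by the distance between “good” and “bad.” The sketch below applies that idea to a toy synonymy graph using breadth-first search; the graph edges are invented for illustration and are not WordNet data, and the function names are our own.

```python
from collections import deque

# Toy undirected synonymy graph; the edges are invented, not taken from WordNet.
SYNONYMS = {
    "good": ["fine", "decent"],
    "fine": ["good", "okay"],
    "decent": ["good"],
    "okay": ["fine", "mediocre"],
    "mediocre": ["okay", "poor"],
    "poor": ["mediocre", "bad"],
    "bad": ["poor"],
}

def distance(graph, start, goal):
    """Minimal path length between two words via breadth-first search."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for nxt in graph.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two words are not connected

def eva(graph, word):
    """Score in [-1, 1]; positive means closer to 'good' than to 'bad'."""
    d_good = distance(graph, word, "good")
    d_bad = distance(graph, word, "bad")
    span = distance(graph, "good", "bad")
    if None in (d_good, d_bad, span) or span == 0:
        return None
    return (d_bad - d_good) / span

for w in ("fine", "okay", "poor"):
    print(w, eva(SYNONYMS, w))
```

Swapping the reference pair for “strong”/“weak” or “active”/“passive” would give the potency and activity scores in the same way, provided the word is connected to both reference words in the graph.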