OpinionFinder: Open Source Sentiment Analysis Toolkit
tags: NLP OpinionFinder python sentiment analysis WordNet
posted by Zeke Shore on Feb 17th, 2010
While exploring existing sentiment analysis processes, we stumbled across what looks like a fully integrated open source solution to several issues identified in our recent round of research.
OpinionFinder appears to be hosted and primarily developed at the University of Pittsburgh, with contributions from Cornell University and the University of Utah. Although OpinionFinder is mentioned only in passing in Bo Pang and Lillian Lee’s survey Opinion Mining and Sentiment Analysis, it appears to include some of the best available solutions to many of the common challenges of effective sentiment analysis.
OpinionFinder, which was initially released in 2006, employs a multi-stage NLP process. As stated in the project’s extended abstract,
“OpinionFinder aims to identify subjective sentences and to mark various aspects of subjectivity in these sentences, including the source (holder) of the subjectivity and words that are included in phrases expressing positive or negative sentiments.”
Running in “batch” mode, more as a back-end pipeline than an interactive tool, OpinionFinder processes text as follows:
Document Processing
Taking any incoming text source, OpinionFinder first removes HTML or XML meta info, then splits the text into sentences and part-of-speech (POS) tags them using OpenNLP. Next, stemming is performed with Steven Abney’s SCOL v1K stemmer program. SUNDANCE (Sentence UNDerstanding And Concept Extraction), a partial parser from the NLP laboratory at the University of Utah, is used by AutoSlog-TS to identify the extraction patterns needed by the sentence classifiers and by SourceFinder (which identifies the source of subjective content, distinguishing the author’s own statements from related or quoted ones). A final parse in batch mode builds constituency parse trees, which are converted to dependency parse trees for named entity and subject detection.
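To get a feel for this stage before digging into the source, here is a rough Python sketch of the first few steps. It is only a stand-in built on the standard library: naive regular expressions replace OpenNLP’s trained sentence splitter and tokenizer, and POS tagging, stemming, SUNDANCE/AutoSlog-TS and parsing are left out entirely. The function names are our own, not OpinionFinder’s.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML/XML document, dropping the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(raw):
    """Remove tags, joining text chunks with a space and normalizing whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def split_sentences(text):
    """Naive splitter (stand-in for OpenNLP): break after . ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    """Separate word characters from punctuation."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def preprocess(raw):
    """Return one token list per sentence of the marked-up input."""
    return [tokenize(s) for s in split_sentences(strip_markup(raw))]
```

Calling `preprocess("<p>The movie was great.</p><p>I loved it!</p>")` returns one token list per sentence, with punctuation split off as separate tokens.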
Subjectivity and Sentiment Analysis
At this point, a Naive Bayes classifier identifies subjective sentences. The specs seem to indicate that the classifier is trained on subjective and objective sentences identified by two additional “rule-based” (unsupervised?) classifiers drawing from “a large corpus.” This part of the process will require some exploration and validation.
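Our reading of this bootstrapping setup, sketched in Python under heavy assumptions: a high-precision rule labels only the sentences it is confident about (here, two or more strong subjectivity clues means subjective, none means objective), and a multinomial Naive Bayes classifier is then trained on those automatic labels so it can generalize to sentences the rule leaves unlabeled. The clue list, thresholds and names are hypothetical; OpinionFinder’s actual rule-based classifiers and features still need to be checked against the source.

```python
import math
from collections import Counter

# Hypothetical clue list; real rule-based subjectivity classifiers rely on a
# much larger lexicon of clues (and, per the specs, extraction patterns).
STRONG_CLUES = {"love", "hate", "terrible", "wonderful", "awful", "great"}


def rule_label(tokens):
    """High-precision rule: 2+ strong clues -> 'subj', none -> 'obj',
    otherwise abstain (None)."""
    hits = sum(1 for t in tokens if t in STRONG_CLUES)
    if hits >= 2:
        return "subj"
    if hits == 0:
        return "obj"
    return None


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.n = len(labels)
        return self

    def predict(self, tokens):
        def log_score(c):
            score = math.log(self.priors[c] / self.n)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    num = self.counts[c][t] + 1
                    score += math.log(num / (self.totals[c] + len(self.vocab)))
            return score
        return max(self.classes, key=log_score)


def bootstrap(unlabeled):
    """Label what the rule can, then train Naive Bayes on those automatic labels."""
    labeled = [(s, rule_label(s)) for s in unlabeled]
    labeled = [(s, y) for s, y in labeled if y is not None]
    docs, labels = zip(*labeled)
    return NaiveBayes().fit(list(docs), list(labels))
```

The point of this design is coverage: the rule abstains on ambiguous sentences such as "it is great", but the trained classifier can still assign them a label.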
Next, a direct subjective expression and speech event classifier, built by Eric Breck, uses WordNet to tag the direct subjective expressions and speech events found within the document.
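Breck’s component is a classifier, and we have not yet looked at exactly how it uses WordNet. The toy Python sketch below only illustrates why a lexical resource like WordNet helps here: a few seed words (say, fear) can be generalized through synonym sets so that other word forms and synonyms (stated, dreaded) are still recognized. The seed sets and the `TOY_SYNSETS` table are hypothetical stand-ins for real WordNet lookups, and plain lookup replaces whatever model Breck actually trained.

```python
# Hypothetical seed lists; a trained tagger would learn these cues from data.
SPEECH_EVENT_SEEDS = {"say", "claim", "announce"}
SUBJECTIVE_SEEDS = {"fear", "believe", "criticize"}

# Toy stand-in for WordNet: maps a word form to the lemmas of its synset(s).
# A real implementation would query WordNet instead of this hand-made table.
TOY_SYNSETS = {
    "said": {"say", "state", "tell"},
    "stated": {"say", "state", "tell"},
    "claimed": {"claim", "assert"},
    "feared": {"fear", "dread"},
    "dreaded": {"fear", "dread"},
    "thinks": {"believe", "think"},
}


def lemmas(word):
    """Return the word itself plus any synonyms the toy synset table knows."""
    return {word} | TOY_SYNSETS.get(word, set())


def tag_expressions(tokens):
    """Tag each token as SPEECH (speech event), DSE (direct subjective
    expression) or O (neither) by matching its synonyms against the seeds."""
    tags = []
    for tok in tokens:
        forms = lemmas(tok.lower())
        if forms & SPEECH_EVENT_SEEDS:
            tags.append("SPEECH")
        elif forms & SUBJECTIVE_SEEDS:
            tags.append("DSE")
        else:
            tags.append("O")
    return tags
```

Neither "stated" nor "dreaded" is a seed word, yet both are tagged because their synonym sets overlap the seeds.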
The final step applies actual sentiment analysis to sentences that have been identified as subjective. This is accomplished with two classifiers that were developed using the BoosTexter machine learning program and trained on the MPQA Corpus.
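BoosTexter applies boosting (AdaBoost and its multi-class variants) to text. As a minimal, hypothetical illustration of the idea, not BoosTexter itself or OpinionFinder’s trained models, here is binary AdaBoost over one-word decision stumps in Python: each round picks the word-presence test with the lowest weighted error, then up-weights the sentences it got wrong so the next stump focuses on them.

```python
import math


def stump_predict(word, polarity, tokens):
    """Decision stump: predict `polarity` if the word is present, else the opposite."""
    return polarity if word in tokens else -polarity


def train_adaboost(docs, labels, rounds=10):
    """Binary AdaBoost over word-presence stumps.

    docs: list of token collections; labels: +1 (positive) or -1 (negative).
    Returns a list of (alpha, word, polarity) weak hypotheses.
    """
    n = len(docs)
    weights = [1.0 / n] * n
    vocab = sorted(set().union(*docs))
    model = []
    for _ in range(rounds):
        # Choose the stump with the lowest weighted training error.
        best = None
        for word in vocab:
            for polarity in (1, -1):
                err = sum(w for w, d, y in zip(weights, docs, labels)
                          if stump_predict(word, polarity, d) != y)
                if best is None or err < best[0]:
                    best = (err, word, polarity)
        err, word, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, word, polarity))
        # Re-weight: misclassified examples gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(word, polarity, d))
                   for w, d, y in zip(weights, docs, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def classify(model, tokens):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(word, pol, tokens) for alpha, word, pol in model)
    return 1 if score >= 0 else -1
```

With four toy sentences ({good, film} and {great, plot} positive, {bad, film} and {awful, plot} negative), no single stump separates them, but three boosting rounds already classify all four correctly.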
Evaluation
While we still need to explore the source code rigorously, this system appears to be a gold mine of solutions to issues in our sentiment analysis process, both previously unresolved and newly discovered. Named entity detection combined with dependency parse trees will help us filter content so that it includes only sentiment about the topic actually being explored (rather than visualizing all subjective content in a comment), and will also help reveal popular related topics within any given topic of discussion.
Subjectivity detection and speech event classification are challenges acknowledged in much of the research on sentiment analysis, but comprehensive solutions have been much harder to come by. This system seems to combine several processes toward those goals (including a new use of WordNet), and again could really help us narrow our corpus down to relevant statements of sentiment on a given topic.
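A rough Python sketch of the filtering we have in mind, assuming OpinionFinder’s output has already been converted into a hypothetical `ParsedSentence` structure holding the sentence’s named entities and its dependency triples. The relation labels `nsubj` and `dobj` are Stanford-style labels chosen for illustration; we have not checked which labels OpinionFinder actually emits.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ParsedSentence:
    """Hypothetical container for one analyzed sentence."""
    text: str
    entities: list  # named entities found in the sentence
    deps: list = field(default_factory=list)  # (head, relation, dependent) triples


def about_topic(sent, topic):
    """True if the topic is the subject or direct object of some relation."""
    return any(dep == topic and rel in ("nsubj", "dobj")
               for _head, rel, dep in sent.deps)


def filter_and_relate(sentences, topic):
    """Keep sentences about `topic` and count the other entities they mention."""
    kept = [s for s in sentences if about_topic(s, topic)]
    related = Counter(e for s in kept for e in s.entities if e != topic)
    return kept, related
```

A sentence that only mentions the topic as a modifier ("The Apple store near my house is loud") is dropped, while the entities co-occurring in the kept sentences become candidate related topics.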
Finally, the actual positive/negative sentiment analysis applied to subjective sentences differs from the other processes I have read about (most of which rely on WordNet and trained classifiers, or on our original ad hoc method of matching against the General Inquirer Dictionary). We might want to experiment a bit with this phase to compare how effective the different methods are.
One process that is surprisingly absent from the OpinionFinder system is any form of negation detection. We may want to explore integrating the algorithm Bruno Ohana experimented with in his dissertation on sentiment analysis, or investigate other solutions.
It may also be interesting to see how things change if we begin to stack some of the processes used by OpinionFinder with systems we already have in place, such as our GI Osgood Emotive Assignments.
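In spirit, a dictionary-matching approach like our original ad hoc method looks something like the minimal Python sketch below. The tiny `POSITIVE`/`NEGATIVE` sets are hypothetical placeholders for the General Inquirer’s Positiv and Negativ categories, and normalizing by the number of matched words is just one possible scoring choice.

```python
# Hypothetical miniature lexicon standing in for the General Inquirer's
# Positiv and Negativ categories; the real dictionary is far larger.
POSITIVE = {"good", "excellent", "happy", "love"}
NEGATIVE = {"bad", "poor", "sad", "hate"}


def lexicon_score(tokens):
    """Return (positive hits - negative hits) / total hits, or 0.0 with no hits."""
    pos = sum(1 for t in tokens if t.lower() in POSITIVE)
    neg = sum(1 for t in tokens if t.lower() in NEGATIVE)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched
```

For example, `lexicon_score("I love this excellent but sad film".split())` gives 1/3: two positive hits and one negative. Every hit counts the same regardless of context, which is what trained classifiers try to improve on.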
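Until we look closely at Bruno Ohana’s approach, a common baseline is to mark every token between a negation word and the next punctuation mark, so that a downstream lexicon or classifier sees NOT_good instead of good. The Python sketch below implements that heuristic; it is not Ohana’s algorithm, and the negator list is an illustrative assumption.

```python
import re

# Illustrative, not exhaustive; "n't" assumes Penn Treebank-style tokenization.
NEGATORS = {"not", "no", "never", "n't", "cannot"}
CLAUSE_END = re.compile(r"^[.,;:!?]$")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation cue and the next punctuation."""
    out, negated = [], False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            negated = False  # punctuation closes the negation scope
            out.append(tok)
        elif tok.lower() in NEGATORS:
            negated = True  # open a negation scope; the cue itself is kept as-is
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out
```

For example, `mark_negation("this is not good , but fine".split())` yields `["this", "is", "not", "NOT_good", ",", "but", "fine"]`, so "good" no longer counts as a positive hit while "fine" is unaffected.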
You can download OpinionFinder for free from the project’s website under an open academic license, or download a PDF of the extended abstract/description of the project here: