VoxPop aims to semantically analyze online discussions in relation to their seeding ideas, and temporally and emotively visualize the discourse surrounding online content.

By leveraging Natural Language Processing (NLP) theory with the Natural Language Toolkit (NLTK), an open source set of Python-based modules developed to aid in computational linguistic analysis, we aim to develop a semantic language processing engine that can group online responses into emotive classifications within both the context of the content that seeded them and the context of the discussion within which each response lives.
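As a rough illustration of what grouping responses into emotive classifications could look like, the following is a minimal sketch that uses only the Python standard library. The lexicon, the category names, and the functions `tokenize` and `classify_emotion` are hypothetical placeholders, not part of the project's design; an actual engine would presumably rely on the Natural Language Toolkit for tokenization and on a trained or curated model, and would also take the seeding article and surrounding discussion into account, which this sketch does not.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only); a real
# engine would use a trained classifier or a curated emotion lexicon.
EMOTION_LEXICON = {
    "anger": {"outrageous", "disgusting", "furious", "shameful"},
    "joy": {"wonderful", "great", "love", "delighted"},
    "sadness": {"tragic", "heartbreaking", "sad", "grief"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify_emotion(comment):
    """Return the emotion with the most lexicon hits, or 'neutral' if none."""
    tokens = tokenize(comment)
    scores = Counter()
    for emotion, words in EMOTION_LEXICON.items():
        scores[emotion] = sum(1 for token in tokens if token in words)
    label, hits = scores.most_common(1)[0]
    return label if hits > 0 else "neutral"


if __name__ == "__main__":
    for comment in [
        "This policy is outrageous and shameful!",
        "What a wonderful piece, I love it.",
        "The meeting is on Tuesday.",
    ]:
        print(classify_emotion(comment), "<-", comment)
```

Counting lexicon hits ignores negation, sarcasm, and context, so it stands in only for the shape of the input and output, not for the semantic analysis the project describes.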

We will initially use the New York Times’ open APIs, ‘Article Search’ and ‘Community,’ to access every New York Times article published since online commenting on articles has been allowed, as well as the associated discourses that surround these articles. Access to well-formatted discussions and their seeding content will allow us to focus on developing our language processing engine to map emotive response to these previously published New York Times articles. An additional New York Times API, ‘Newswire,’ will give us access to new content as it is being released, so that new content and discourse can be mapped in real time.

The project will manifest itself as a web application that allows users to search keyword strings and returns a dynamic, explorable information graphic. The graphic will map the emotive responses to any topic of interest within a limited historical context (as online commenting is a relatively young form of discourse), and will map the evolving emotive response to these topics as the discussions unfold in real time.
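To make the idea of mapping emotive responses over time more concrete, here is a minimal sketch of how classified comments might be aggregated into a per-day timeline that a front end could draw. The function name `emotion_timeline`, the input shape (pairs of ISO 8601 timestamp and emotion label), and the daily granularity are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def emotion_timeline(comments):
    """Group (ISO timestamp, emotion label) pairs into per-day emotion counts."""
    timeline = defaultdict(Counter)
    for timestamp, label in comments:
        # Reduce each timestamp to its calendar day (assumed granularity).
        day = datetime.fromisoformat(timestamp).date().isoformat()
        timeline[day][label] += 1
    # Return plain dicts, ordered by day, so the result is easy to serialize.
    return {day: dict(counts) for day, counts in sorted(timeline.items())}


if __name__ == "__main__":
    # Hypothetical sample data, not drawn from any real discussion.
    sample = [
        ("2010-03-01T09:15:00", "anger"),
        ("2010-03-01T17:40:00", "joy"),
        ("2010-03-01T18:00:00", "anger"),
        ("2010-03-02T08:05:00", "sadness"),
    ]
    print(emotion_timeline(sample))
```

A structure like this could be serialized to JSON and re-sent as new comments arrive, which is one plausible way to support both the historical and the real-time views.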

Technically, the back end will be built with Python and Google App Engine, and the front end will aim to push the boundaries of what non-proprietary web technologies can accomplish in user interface design, specifically leveraging Canvas and HTML5 in conjunction with innovative use of JavaScript.

After successfully parsing and visualizing the well-structured and easily accessible data sets that are available through the New York Times APIs, the project could continue to evolve with the development of a web scraping system to find and parse any content and discourse set that is online, expanding the scope of what is being examined to the entire web.

By harnessing intelligent semantic language processing across the conversations and associated content within everything from news sites and blogs to social networks and Twitter, we could ultimately achieve a more universal perspective and an active, real-time exploration of the emotive responses surrounding any topic as they unfold. Theoretically, a fully expanded scope of content, along with a continually evolving semantic language processing engine, holds the potential to provide both anthropological and sociological perspectives on the under-examined phenomenon of cloud-based discourse.