Design Iteration 1
tags: color NYT API process prototype visualization
posted by Zeke Shore on Jan 26th, 2010
While we have been good about posting research progress as it comes, progress on the design front has been a bit too quiet. Here are some early iterations of the User Interface design process, and what we are learning as we go.
Working off of the data that we were beginning to generate, our starting point was a collection of user comments for all of the New York Times articles that would be returned for any given query. By parsing through the comments, we could match words against the General Inquirer Dictionary across Charles Osgood’s three-axis theory of the Semantic Differential. You can read more about the first version of our emotive analysis process in my previous post on the subject.

So the initial output we decided to shoot for was essentially six lists of words from the comments for each article that is retrieved for a given query. Along the evaluative axis we would have a list of ‘positive’ words (shown above in green) and ‘negative’ words (in red); along the activity axis, a list of ‘active’ words (in orange) and ‘passive’ words (in brown); and along the potency axis, a list of ‘strong’ words (in blue) and ‘weak’ words (in gray).
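As a rough sketch of what producing those six lists could look like, here is a minimal Python example. It is an illustration under stated assumptions, not our actual pipeline: the tiny LEXICON below is a made-up stand-in for the General Inquirer Dictionary, and the function name classify_words is hypothetical.

```python
import re

# ASSUMPTION: a hypothetical mini-lexicon standing in for the General Inquirer
# Dictionary. The real dictionary is far larger; these entries are illustrative.
LEXICON = {
    "good": {"positive"},
    "terrible": {"negative", "strong"},
    "run": {"active"},
    "wait": {"passive"},
    "power": {"strong"},
    "small": {"weak"},
}

# Two poles for each of the three axes (evaluative, activity, potency).
CATEGORIES = ("positive", "negative", "active", "passive", "strong", "weak")


def classify_words(comments):
    """Return a dict mapping each of the six categories to its matched words."""
    lists = {category: [] for category in CATEGORIES}
    for comment in comments:
        # Lowercase and split into simple word tokens.
        for word in re.findall(r"[a-z']+", comment.lower()):
            # A word may belong to more than one category.
            for category in LEXICON.get(word, ()):
                lists[category].append(word)
    return lists


if __name__ == "__main__":
    print(classify_words(["Good power, terrible wait."]))
```

Note that a single word can land in more than one list, since one dictionary entry can carry several category tags.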

Fleshing out the design of this model involved four “states.” Collapsing the emotive word lists for each article would yield colored bars extending above and below a baseline of articles. Theoretically, this would reveal trends in the quantities of these emotively charged words over time for discussions surrounding any keyword. Clicking on a specific article would reveal the actual list of words that are being described by the colored bars of the collapsed view. Extending the idea, hovering over any word could show the sentence from which that word was retrieved, and hovering over the article title could reveal the abstract of the article. Clicking either would bring the user through to the article or the specific comment on the New York Times website, all in an effort to provide easy contextual access as a validation tool.
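To make the collapsed state concrete, here is a small Python sketch that reduces an article's six word lists to bar heights. It assumes, purely for illustration, that the first pole of each axis (positive, active, strong) is drawn above the baseline and the second pole below it; the names AXES and collapse are hypothetical.

```python
# ASSUMPTION: which pole goes above or below the baseline is an illustrative
# choice for this sketch, not something fixed by the mockups.
AXES = {
    "evaluative": ("positive", "negative"),
    "activity": ("active", "passive"),
    "potency": ("strong", "weak"),
}


def collapse(word_lists):
    """Turn six word lists into (above, below) bar heights for each axis.

    Heights above the baseline are positive counts; heights below are
    negated counts, so they can be plotted directly against a zero line.
    """
    bars = {}
    for axis, (upper, lower) in AXES.items():
        above = len(word_lists.get(upper, []))
        below = -len(word_lists.get(lower, []))
        bars[axis] = (above, below)
    return bars


if __name__ == "__main__":
    article = {
        "positive": ["good", "nice"],
        "negative": ["bad"],
        "strong": ["power"],
    }
    print(collapse(article))
```

Running this for every article returned by a query, ordered by date, would give the series of bars that the collapsed view is meant to show.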
So we built a prototype of this visualization. We did not build out all of the interaction levels spec’ed in the initial mockups, but even just getting a list of articles with their corresponding lists of emotively classified words from the discussions surrounding them seemed like a good starting point for exploring the data.

This prototype revealed a lot. The first obvious conclusion is that we are dealing with way more data than could be meaningfully expressed as ‘lists of words’. Even scaling the text size down below legibility did not allow most lists to be viewed in their entirety in a normal web browser window.
Another problem is that the data is really hard to read if you don’t already have a strong understanding of what is going on behind the scenes. This organization does not show the three clear axes that the discussions are being mapped against. Furthermore, this model gives equal weight to all of our emotive axes, despite Osgood’s conclusion that evaluative distinction (positive/negative) carries the most emotive weight, which is then supported by activity and potency as the next two most significant factors.
One more problem that this prototype revealed is the homogenizing effect that results from extracting lists of words at the level of the entire conversation rather than specific comments. One really long nasty comment could skew the negative word count for an entire conversation when looking at the data at this level of abstraction, and that sort of misrepresentation could be a serious cause for concern. The project is called VoxPop, from the Latin term Vox Populi, meaning “voice of the people.” This visualization attempt was not yet showing the voices of any ‘people’; rather, it was averaging out the ebbs and flows of entire conversations.
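The skew is easy to see with made-up numbers. The Python sketch below compares a conversation-level negative word count with two per-comment views; the counts are invented for illustration, and the function names are hypothetical rather than part of our prototype.

```python
def conversation_total(negative_counts):
    """Negative words summed over the whole conversation (the prototype's view)."""
    return sum(negative_counts)


def share_of_negative_comments(negative_counts):
    """Fraction of comments that contain at least one negative word."""
    if not negative_counts:
        return 0.0
    flagged = sum(1 for count in negative_counts if count > 0)
    return flagged / len(negative_counts)


def largest_comment_share(negative_counts):
    """Fraction of all negative words contributed by the single worst comment."""
    total = sum(negative_counts)
    if total == 0:
        return 0.0
    return max(negative_counts) / total


if __name__ == "__main__":
    # ASSUMPTION: invented per-comment negative word counts for one conversation.
    # Four short comments and one very long, very negative one.
    counts = [0, 0, 1, 0, 25]
    print(conversation_total(counts))          # 26
    print(share_of_negative_comments(counts))  # 0.4
    print(largest_comment_share(counts))
```

In this invented case the conversation looks heavily negative by total word count, even though most commenters used no negative words at all and nearly all of the count comes from one person.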
More to come on our newer design iterations soon.