We are excited to finally be wrapping up an alpha version of VoxPop for the Parsons thesis exhibition. Come by the Sheila C. Johnson Design Center, 66 5th Ave (right at 13th St.) between 6 and 8 PM on Tuesday, June 1st, to see our installation and enjoy the show’s opening reception. The gallery will also be open to the public through Saturday from noon to 6 PM every day. More information can be found on the exhibition website.
While we aren’t quite ready to open up VoxPop publicly on the web (we still have a bit of QA and scaling to work out), here is a sneak peek of what we have cooking. Be sure to swing by Parsons School of Design over the next few days to play with it a bit.
We have a first pass at our thesis show installation proposal. VoxPop would be displayed on a Samsung 47″ HD Screen through a Mac Mini housed within a pedestal supporting a Wacom Bamboo Touch Tablet, allowing viewers to interact with the web application.
The New York Times online consistently delivers interesting data visualizations to help enrich the stories surrounding popular news topics. The New York Times Innovation Portfolio provides a beautiful overview of all of these interactive explorations, organized by topic with project overviews, documentation, and links to the actual interactive pieces.
Since VoxPop is working with New York Times data, this collection of existing data visualizations is a treasure trove of strong precedents, several of which relate very closely to our project. Here are a handful that live within the realm of reader sentiment.
Health Care Debate is a conversation platform that allows users to discuss various issues within the health care debate. The most interesting aspect of this tool is how the relevance of specific sub-topics within the debate can be comprehended at first glance, with the surface of the tool depicting multiple “rooms” that are scaled relative to the number of comments relating to that sub-topic.
This interactive video of Obama’s speech to the Muslim world allows users to provide comments along the timeline of the speech, allowing a global discussion to unfold in the context of the time-based content that is seeding the discussion.
Election Word Train asked New York Times readers to share one word that describes their current state of mind on the day of the 2008 presidential election. Much like a tag cloud, words are scaled relative to the number of people sharing the sentiment, and can be filtered to show words shared by Obama or McCain supporters. Leveraging scale and letting these words ‘speak for themselves’ does effectively provide a general glimpse of reader sentiment, even if the forum is somewhat contrived: the explicit goal of reducing group sentiment into a few dozen words possibly hinders truly organic sentiment visualization.
Inaugural Words ranks the frequency of words used by presidents in Inaugural Addresses, showing what words each president used the most. While it does not really reflect reader sentiment, it does show an interesting breakdown of word frequency across time and political position.
The Twitter Bowl interactive visualization maps Twitter chatter over the course of the 2009 Super Bowl, according to key topic mentions. This hits an interesting cross section of communicating time, space, and group sentiment, even if it is somewhat cryptic in what is actually being communicated. There is something very satisfying about seeing topics grow and shrink geographically over time, although it does not reveal what specifically about “steelers” or “ads” or “springsteen” people are sharing.
These projects all have several aspects that are worth analyzing and building upon. As we begin to re-think how people engage with the news, it’s exciting to see major players like the New York Times continuing to push the envelope, and continuing to keep their data open so that others can do the same.
The last design iteration I wrote about a couple weeks ago started to take a departure from earlier iterations by exploring the idea of representing every comment on the New York Times website (that relates to any given topic) as its own entity, and visually describing its sentiment or personality.
Reflecting back on our original reasons for wanting to visualize online discussions, we realized that our thesis question really centers around how the ‘Vox Populi’ can still be heard as reader participation in the journalistic process scales to hundreds of thousands of comments spread across hundreds of articles and blog posts for even just one news source.
So this resulted in a design prototype that involved rendering comments for a given topic as balls swarming around the article that seeded the conversation, representing sentiment with color, opacity, and speed of movement, describing each comment’s polarity (how positive or negative), strength (strong or weak) and activity (active or passive) respectively.
While this iteration was both readable and interesting to look at, it suffered in terms of scalability. We could realistically only look at a couple of conversations at a time for any given topic. So the next phase of the design process involved trying to pull some of the more successful aspects of this iteration into a more real-estate-friendly composition. The logical progression of this involved breaking each conversation into a linear organization (none of the following mockups visualize real data; they serve as design explorations).
Of course, horizontal flows of information are rarely web-friendly, even though they are a logical way to organize content chronologically. So this quickly evolved into a vertical orientation, and opened the door for exploring the concept of showing when commenters reference each other within a conversation.
After evaluating some of the issues we observed in the first design iteration for the VoxPop visualization model, we were able to establish some more criteria as the design evolved.
One idea that I explored early on was the concept of these emotive forces “pulling” the conversation in various directions. While the metaphor seemed interesting, data quantity vs. real estate would be even more of an issue than with the first design iteration. Another big takeaway from the first prototype was the non-representational and homogenizing side effect of grouping words at a full conversation level, as one abnormally “colorful” comment could dramatically swing the visual representation of the entire conversation.
Additionally, breaking words into these six groups unnecessarily abstracts the way in which these semantic classifications actually describe attitudes. Going back to Charles Osgood’s Semantic Differential theory, his studies revealed that the Evaluative scale (good to bad) is the primary axis by which study participants classified affective meaning, followed by Potency and Activity as the two other universal characteristics of affective classification. Rather than thinking of these three axes as separate scales, when trying to visualize the emotional qualities of these comments it makes more sense to have these metrics somehow layer on top of each other so that, in conjunction, they draw the full “personality” of each comment.
So the idea of representing each comment as its own entity developed. In order for these three emotive characteristics to be able to layer on top of each other, representing each of the six poles as a different color (as done in design iteration 1) would not work.
In this next design iteration, the Evaluative scale is the only metric represented across a color spectrum, ranging from blue (negative) to yellow (positive), where the number of positive words minus the number of negative words dictates the comment’s place along the color spectrum. Comments with mostly positive words would be closer to pure yellow, and comments with mostly negative words would be closer to pure blue, with more neutral and balanced comments being various shades of green.
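As a rough illustration of the blue-to-yellow mapping described above, here is a minimal Python sketch. It is not the code from our prototype. The function name `evaluative_color` is made up for this example, and two things are assumptions: that the positive-minus-negative count is normalized by the total number of evaluative words, and that the spectrum is swept through hue (rather than blended in RGB) so that balanced comments land on green.

```python
import colorsys


def evaluative_color(positive: int, negative: int) -> tuple[int, int, int]:
    """Map a comment's positive/negative word counts to an (R, G, B) color."""
    total = positive + negative
    # Assumption: normalize the net count into [-1, 1]; a comment with no
    # evaluative words is treated as neutral.
    score = (positive - negative) / total if total else 0.0
    # Sweep the hue from blue (240 degrees) at -1 to yellow (60 degrees) at +1,
    # which passes through green around 0.
    hue_degrees = 150.0 - 90.0 * score
    r, g, b = colorsys.hsv_to_rgb(hue_degrees / 360.0, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


print(evaluative_color(5, 0))  # all positive words: pure yellow
print(evaluative_color(0, 5))  # all negative words: pure blue
print(evaluative_color(3, 3))  # balanced: a shade of green
```

Blending blue and yellow directly in RGB would put balanced comments on gray rather than green, which is why this sketch interpolates the hue instead.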
The Potency scale (strong to weak) could then be represented with opacity, where the number of “strong” words minus the number of “weak” words would dictate where the comment lives on the opacity spectrum between fully opaque and almost completely transparent.
Our last axis is Activity (active to passive). While the evaluative scale requires a somewhat subjective color scheme choice, which of course will involve cultural and personal influences, opacity actually serves as a conveniently clear metaphor for our ‘strong to weak’ continuum. The ‘Activity’ axis presented a similarly convenient direct metaphor that could be exploited. The more active a comment is, the faster and farther it could move, and likewise the more passive a comment is, the more stagnant it would be.
The final characteristic of this design iteration was that the scale of each comment could be determined by the number of affective words it has. This could arguably be thought of as how “loud” the comment was (although this is open to some debate: a really long comment with lots of weak and passive words isn’t really “loud”… more “verbose”).
So we created a new prototype exploring how this design iteration might look with real data. We started with just three articles that the comments would “swarm” around. To help keep the composition balanced and as easy to read as possible, we had comments on the ‘positive’ side of the evaluative scale swarm on top of the article’s title, and comments on the ‘negative’ side of the evaluative scale swarm below the article’s title. (The blue and yellow evaluative colors were accidentally reversed in this version of the prototype, with blue at the ‘positive’ end and yellow at the ‘negative’ end.)
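To make the opacity, movement, and scale ideas a bit more concrete, here is a small Python sketch of how a single comment’s word counts might be turned into drawing parameters. This is an illustration under assumptions, not our prototype code: the `CommentCounts` class, the `visual_attributes` function, and the constants (`min_opacity`, `max_speed`, `base_radius`) are all hypothetical, and normalizing each axis by its own word total is one choice among many.

```python
import math
from dataclasses import dataclass


@dataclass
class CommentCounts:
    """Affective word counts for one comment (hypothetical structure)."""
    positive: int
    negative: int
    strong: int
    weak: int
    active: int
    passive: int


def balance(high: int, low: int) -> float:
    """Net score in [-1, 1]; 0 when the comment has no words on this axis."""
    total = high + low
    return (high - low) / total if total else 0.0


def visual_attributes(c: CommentCounts, min_opacity: float = 0.1,
                      max_speed: float = 4.0, base_radius: float = 4.0) -> dict:
    potency = balance(c.strong, c.weak)
    activity = balance(c.active, c.passive)
    affective_words = (c.positive + c.negative + c.strong
                       + c.weak + c.active + c.passive)
    return {
        # Strong comments are fully opaque, weak ones almost transparent.
        "opacity": min_opacity + (1.0 - min_opacity) * (potency + 1.0) / 2.0,
        # Active comments move quickly, passive ones barely move.
        "speed": max_speed * (activity + 1.0) / 2.0,
        # Assumption: radius grows with the square root of the word count,
        # so the circle's area tracks the number of affective words.
        "radius": base_radius * math.sqrt(max(affective_words, 1)),
    }


comment = CommentCounts(positive=1, negative=0, strong=4, weak=0,
                        active=0, passive=4)
print(visual_attributes(comment))
```

The example comment is strong but passive, so it comes out fully opaque and stationary, which is exactly the kind of “verbose but not loud” case that a size-only reading would miss.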
Evaluation
This design iteration was beginning to show a lot of promise. With a quick glance, a viewer might more successfully read the tone of the entire conversation surrounding the article, while still preserving the ‘personality’ of each individual comment. This was also starting to become more interesting to look at. Active comments would quickly nestle their way in towards the article title while passive comments float aimlessly at the perimeter.
A major flaw with this design iteration was that we were moving in the wrong direction with maximizing our screen real estate, with only three articles being viewable at a time. While the readability of each article’s conversation had improved, we were even further from being able to see any sort of trends or evolution in discourse surrounding a topic over time.
While we have been good about posting research progress as it comes, progress on the design front has been a bit too quiet. Here are some early iterations of the User Interface design process, and what we are learning as we go.
Working off of the data that we were beginning to generate, our starting point was a collection of user comments for all of the New York Times articles that would be returned for any given query. By parsing through the comments, we could match words against the General Inquirer Dictionary across the three axes of Charles Osgood’s Semantic Differential theory. You can read more about the first version of our emotive analysis process in my previous post on the subject.
So the initial output we decided to shoot for was essentially six lists of words from the comments for each article that is retrieved for a given query. Along the evaluative axis we would have a list of ‘positive’ words (shown above in green) and ‘negative’ words (in red), along the activity axis we would have a list of ‘active’ words (in orange) and ‘passive’ words (in brown), and along the potency axis we would have a list of ‘strong’ words (in blue) and ‘weak’ words (in gray).
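The six-list output described above can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual analysis code: `TOY_LEXICON` is an invented, six-word stand-in for the General Inquirer Dictionary (whose real format and categories are not reproduced here), and `classify_words` is a hypothetical helper name.

```python
import re

CATEGORIES = ("positive", "negative", "active", "passive", "strong", "weak")

# Invented toy lexicon for illustration only. The real dictionary is far
# larger, and a single word can carry several tags at once.
TOY_LEXICON = {
    "great": {"positive", "strong"},
    "awful": {"negative", "strong"},
    "fight": {"negative", "active", "strong"},
    "help": {"positive", "active"},
    "wait": {"passive"},
    "small": {"weak"},
}


def classify_words(comments: list[str]) -> dict[str, list[str]]:
    """Collect the six word lists for all comments on one article."""
    lists = {category: [] for category in CATEGORIES}
    for comment in comments:
        # Very rough tokenization: lowercase runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", comment.lower()):
            for category in TOY_LEXICON.get(word, ()):
                lists[category].append(word)
    return lists


article_comments = ["Great plan, it will help.", "Awful. We wait and wait."]
for category, words in classify_words(article_comments).items():
    print(category, words)
```

Note that repeated words are kept rather than de-duplicated, since the length of each list is what would later drive the size of the colored bars.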
Fleshing out the design of this model included four “states.” Collapsing the emotive word lists for each article would yield colored bars extending above and below a baseline of articles. Theoretically, this would reveal trends in the quantities of these emotively charged words over time for discussions surrounding any keyword. Clicking on a specific article would reveal the actual list of words that are being described by the colored bars of the collapsed view. Extending the idea, hovering over any word could potentially show the sentence from which that word was retrieved, and hovering over the article title could reveal the abstract of the article. Clicking either would bring the user through to the article or the specific comment on the New York Times website, all in an effort to provide easy contextual access as a validation tool.
So we built a prototype of this visualization. We did not build out all of the interaction levels spec’ed in the initial mockups, but even just getting a list of articles with their corresponding lists of emotively classified words from the discussions surrounding them seemed like a good starting point for exploring the data.
This prototype revealed a lot. The first obvious conclusion is that we are dealing with way more data than could be meaningfully expressed as ‘lists of words’. Even scaling the text size down below legibility did not allow most lists to be viewed in their entirety in a normal web browser window.
Another problem is that the data is really hard to read if you don’t already have a strong understanding of what is going on behind the scenes. This organization does not show the three clear axes that the discussions are being mapped against. Furthermore, this model gives equal weight to all of our emotive axes, despite Osgood’s conclusion that evaluative distinction (positive/negative) carries the most emotive weight, which is then supported by activity and potency as the second two most significant factors.
One more problem that this prototype revealed is the homogenizing effect that results from extracting lists of words at the level of the entire conversation rather than specific comments. One really long nasty comment could skew the negative word count for an entire conversation when looking at the data at this level of abstraction, and that sort of misrepresentation could be a serious cause for concern. The project is called VoxPop, stemming from the Latin term Vox Populi, meaning “voice of the people.” This visualization attempt was not yet showing the voices of any ‘people’… rather averaging out the ebbs and flows of entire conversations.
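A quick worked example shows how easily this skew happens. The numbers below are invented purely for illustration (they are not from our data), and `summarize` is a hypothetical helper: six comments, five of them mild and one very long and very negative.

```python
from statistics import median


def summarize(negative_counts: list[int]) -> dict:
    """Compare the conversation-level total with a per-comment view."""
    total = sum(negative_counts)
    return {
        "conversation_total": total,
        # Share of all negative words contributed by the single longest rant.
        "share_from_one_comment": max(negative_counts) / total if total else 0.0,
        # What a typical commenter actually sounded like.
        "typical_comment": median(negative_counts),
    }


# Invented negative-word counts, one entry per comment in a conversation.
counts = [1, 0, 2, 1, 0, 40]
print(summarize(counts))
```

At the conversation level this thread looks heavily negative with 44 negative words, yet roughly nine tenths of them come from one person, and the typical comment contains just one.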
I wrote previously about Jonathan Harris’ project We Feel Fine from 2006, but it appears that the project has not grown dormant since its initial buzz. Harris has recently completed a book documenting his process of emotive exploration, and has compiled some interesting data over the three years that the project has existed. It’s exciting to see Harris return to printed work, since so many of his projects have lived within the digital realm. The sample pages that he has up on the book’s website are very interesting.
Harris’ playful aesthetic appears to carry through to print form elegantly (I’m excited to see these spreads in the actual context of the book), while still managing to take a refreshingly academic departure from the original project. Reflecting back on the project after three years also adds the notion of time to the project that was frustratingly absent from its original manifestation.
The book will be available December 1st, published by Simon and Schuster, and it will be finding its way to my bookshelf shortly thereafter.
Christian Swinehart, an MFA student at RISD, recently completed a project that explores the narrative paths of those Choose Your Own Adventure books that were popular in the 1980s. While the topic of exploration might be a bit trivial, the visualization solutions and execution are definitely noteworthy.
Specifically, Swinehart achieves a surprisingly sophisticated aesthetic utilizing a dark background, which can be difficult to pull off successfully in web-based contexts. The color palette is both diverse and cohesive, with points of saturation used sparingly within primarily light gray structural forms.
The Flash-based animations that Swinehart uses to demonstrate narrative flows are also quite beautiful, unfortunately at the expense of removing themselves from any sort of informative context. However, this does serve as an intriguing precedent for visualizing flows of connection that exist within a parent organization system (in this case, the timeline of the story). This is an idea we may explore if we end up trying to visualize how users react to each other’s comments within a discourse.
Canadian artist Jer Thorp over at blog.blprnt does some pretty interesting computational information design pieces (primarily with Processing). Recently he has been doing some projects using the NYT API. One of his first experiments with the API maps the frequency of the words ‘internet’, ‘web’ and ‘twitter’ in the New York Times from 1990 to 2008:
In addition to interesting work, Thorp also provides several comprehensive data processing development tutorials, and releases many of his projects as open Processing libraries, allowing the information design community to evolve his concepts and push development efforts forward.
Digital artist Jonathan Harris created an interesting data visualization piece in 2006 called We Feel Fine. The project crawls the web for blog posts that contain the words “I” and “feel” in the same sentence, then extracts that sentence from its original context. This content is visualized through a swarm of “feelings” as bouncing colored balls, interacting on a dark canvas. These balls swarm around the user’s cursor, prompting them to be clicked open and explored, allowing the feelings to be read in their entirety.
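To get a feel for the extraction step described above, here is a tiny Python sketch that pulls out sentences containing both “I” and “feel”. This is my own guess at the general idea, not Harris’ implementation: the `feeling_sentences` name is made up, and splitting sentences on end punctuation is a crude assumption that real blog text would quickly break.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
WORD_I = re.compile(r"\bI\b")
WORD_FEEL = re.compile(r"\bfeel\b", re.IGNORECASE)


def feeling_sentences(text: str) -> list[str]:
    """Return the sentences that contain both the words 'I' and 'feel'."""
    sentences = SENTENCE_SPLIT.split(text.strip())
    return [s for s in sentences if WORD_I.search(s) and WORD_FEEL.search(s)]


post = ("Today was long. I feel fine about it! "
        "They feel tired. Do I really feel that way?")
print(feeling_sentences(post))
```

Only the second and fourth sentences survive here: the first has neither word, and the third has “feel” without “I”.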
We Feel Fine functions strongly as a poetic exploration of feelings as they surface on the web, and the style of the visualization supports the idea of the web as a living breathing repository of human thought. The project does begin to explore this data through a quantitative lens, allowing users to sort feelings by year, gender, age, weather, location, and even the associated adjectives that accompanied these “feelings.” Alternate views of the data also begin to reveal more meaningful trends when examined.
Also of interest, We Feel Fine has an open API allowing other developers to harness the data that they crawl and format so that it can be used by other projects. More information on the API is available at http://wefeelfine.org/api.