Saturday, November 6, 2010

How do we reveal the hidden patterns in data?

There they are: hundreds of digits nestled in their little cells, staring back at you from within the Excel file they call home. Like a swarm of bees, the numbers assault your mind with a collective buzz signifying nothing. But there is a language to learn. You need to pull the melody out of the static, to give these pixelated numbers color, texture, flavor and symbolic meaning. You've got to visualize this data.

At the Data Visualization for Reporting and Storytelling session, three speakers discussed how to make information beautiful, how to mine data sets for hidden narratives and where to find free data imaging tools anyone can use. The session featured Peter Aldhous, San Francisco bureau chief of New Scientist; David Harris, editor of Symmetry magazine; and Eric Hand, who works for Nature and is an MIT Knight Fellow.

The take home message: Symbols, shapes and colors reveal patterns and distinctions that raw numbers often conceal.

"Columns of numbers are not very good ways to find patterns," said David Harris. It's simply too difficult to see relationships between data points when all you look at are the numbers themselves. Harris further explained that visualizing data can provide some unexpected benefits: (1) you can find errors in data sets and (2) you can uncover manipulated data.

When it comes to representing data, scientists are often too detail-oriented and have difficulty relating to the lay reader's perspective, Harris argued. In contrast, journalists can provide a more useful perspective infused with context. But, Harris cautioned, one should use data visualization only when it's genuinely the best way to tell the story; it's not a gimmick.

In one visualization, Peter Aldhous compared healthcare spending and life expectancy for various countries, including the U.S., Japan, Australia, France, Canada and the United Kingdom. Life expectancy in years was displayed on the X axis and spending on the Y axis. As the dynamic Google Motion Chart made clear, a shift occurred in the 1980s: the amount the U.S. spent on healthcare suddenly became incommensurate with the gains in life expectancy. Whereas the other countries continued to improve their life expectancy without a sharp increase in healthcare spending, the U.S. trajectory swung upward at a steep angle. The U.S. was suddenly spending far more than everyone else, but the extra investment did little to improve life expectancy, and that trend continues to the present day.

Eric Hand emphasized sifting through databases, like the U.S. Census, to find stories that most people don't bother to look for or don't even know exist. Consider, for example, how many people in America still lack indoor plumbing. Many of them must rely on outhouses, even today. Hand found one 80-year-old woman who carried buckets of water every day between her home and the nearest outhouse, quite a distance away.

The speakers also suggested a couple of free tools:

--Harris recommended MIT Media Lab's program Processing, "an open source programming language and environment for people who want to create images, animations, and interactions."

--Aldhous pointed to Google Motion Charts, "a dynamic chart to explore several indicators over time."

"Use these tools to explore data, poke it, see what's there," Aldhous said. Data visualization does not need to be scary or overly complex. Anyone can do it.


  1. Did any speaker mention Tableau Public?

    It's free data viz software from a Seattle company.

  2. Peter showed some examples of visualizations he had made in Tableau Public and recommended it highly. He even walked the audience through creating a viz starting with the raw data.