Visualization Comes of Age

Jerry Finn Vision&Values

A fair part of our work transforming data into knowledge involves visualizing large amounts of data and communicating the results through interfaces and stories. At BBVA we also advocate best practices in data visualization and train data scientists to improve their skills (see, for instance, our Introduction to Data Visualization and to Human Perception in Data Visualization). Here is a description of where our heart is with this practice.

Let’s Start With A Little History

In 1854, in the Soho district of London, a cholera outbreak was raging that would eventually kill 616 people. John Snow knew something that went against the conventional wisdom of the time, so to convince others he used not only data in its raw format but also a convincing visualization of that data. At that time it was accepted that cholera was caused by bad air, or miasma, but Dr. Snow suspected the infection spread through the water. Unfortunately for him, he could not prove causation, since germ theory would not be proposed by Louis Pasteur for some years to come. But he could establish a correlation and clearly demonstrate it.

Dr. Snow used a dot map, often cited as a forerunner of what would come to be known as the Voronoi diagram, to trace the source of the infection to a pump on Broad Street. He placed dots over a street map of the infected area and saw a clear pattern of clusters around the pump in question. He then carefully documented that the exceptions around the cluster had other sources of water. The city did disable the pump, but still would not accept his theory. One critic, Reverend Henry Whitehead, tried to disprove Snow’s theory but was persuaded by his maps, and helped establish the root cause of the illness. John Snow’s maps are now a rather famous example of how visualization contributed to the field of epidemiology.

Original map by John Snow showing the clusters of cholera cases in the London epidemic of 1854, drawn and lithographed by Charles Cheffins.

Today visualizations, both good and bad, are ubiquitous. Companies like Tableau, Spotfire, Pentaho, Qlik, Quadrigram and CartoDB are offering tools or services that go beyond traditional business intelligence. It would be impossible to say whether we are in a bubble with regard to visualization companies, but it is certain that some sort of Big Visualization is a necessary consequence of Big Data.

Visualization Becomes a Discipline

The practice of visualization became much more systematized in 1983, when Edward Tufte published his landmark book The Visual Display of Quantitative Information, in which he expanded on the objectives of useful graphics. The onset of Big Data gave more importance to two of those objectives. One was to make large data sets coherent. It’s doubtful that in 1983 Professor Tufte could have imagined the size of the datasets we are working with now that smartphones and the internet leave a digital footprint of our every move. One consequence is that it is harder than ever to separate the signal from the noise.

The other objective with increased importance is to “reveal the data at several levels of detail”. In 1983, if you were alive, you were probably consuming visualization in the form of static charts in newspapers and magazines. Today it’s a whole new ballgame. On web sites or smartphones, graphics are expected to be interactive, if not animated. Tufte’s principle of “data at several levels” takes on a new meaning as users demand a drill-down capability to explore data in more detail with a click of the mouse or a swipe on the phone. For this to happen, the old static HTML documents had to be augmented with newer technologies such as D3.js (Data-Driven Documents), a JavaScript library that can dynamically change the document as the data changes.

The topic even sparks debate on what deserves to be called visualization. Some would argue that the charts of the previous generation are not visualization and that only the newer, more interactive tools deserve the name. They hold that if a chart is a word, then a visualization is a phrase.

Regardless of what we call it, some things remain constant. Professor Tufte warned against chartjunk: excessive and meaningless adornments that don’t convey information. Its modern counterpart, which we now refer to as “Datatainment”, is gratuitous animation on interactive websites.

Leading expert Alberto Cairo, Professor of Visual Journalism at the University of Miami and author of “The Functional Art”, asserts that visualization is one of the hardest aspects of Big Data to get right. He offers a more concise list of objectives for graphics than Professor Tufte, recommending we pay attention to five principles. A visualization should be:

  1. Truthful
  2. Functional
  3. Beautiful
  4. Insightful
  5. Enlightening

Professor Cairo stresses that “Beautiful” does not mean flashy but elegant: conveying the maximum information with the minimum drawing. He also points out that he deliberately chose the word “Truthful” instead of the more common “Clarity”. He has inveighed against visualizations that are clear but not truthful. Again, this is not a new problem. The best-selling book of all time on statistics is Darrell Huff’s 1954 book “How to Lie with Statistics”, and its most cited chapters are about lying with charts. His classic examples, taken from real publications, included truncating charts and changing scales to exaggerate changes, and using pictures scaled by height to show a change when the reader’s eye will really register the change in the area or volume of the drawing, which is far greater than the change in height alone.
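Huff’s point about scaled pictures comes down to simple arithmetic, and a minimal sketch makes it concrete. The function name and the figures below are hypothetical, chosen only for illustration: a value doubles, the picture is drawn twice as tall with its proportions preserved, and we compute how much bigger it looks.

```python
def perceived_ratios(value_old: float, value_new: float) -> dict:
    """Return how much larger the new pictogram looks by height, area and volume."""
    # Linear scale factor applied to the whole drawing so it keeps its proportions.
    scale = value_new / value_old
    return {
        "height": scale,       # the change the chart claims to show
        "area": scale ** 2,    # how much more ink a flat drawing uses
        "volume": scale ** 3,  # what a drawing of a solid object suggests
    }


# Hypothetical figures: the underlying value doubles from 30 to 60.
ratios = perceived_ratios(value_old=30.0, value_new=60.0)
for dimension, ratio in ratios.items():
    print(f"{dimension}: {ratio:.0f}x")
# height: 2x
# area: 4x
# volume: 8x
```

A doubling in the data is therefore drawn as a fourfold, or apparently eightfold, increase, which is exactly the exaggeration Huff described.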

In his blog, The Functional Art, Professor Cairo criticized liberal economist Robert Reich for a distortion that is common nowadays and that Mr. Huff did not mention. Professor Cairo disparages the use of dual y-axes, which plot two variables on different scales, as used by Reich to correlate union membership with declining middle-class incomes. Here is Dr. Reich’s work and Cairo’s correction.

Alberto Cairo's reinterpretation of Dr. Reich's graph. Taken from Alberto Cairo's blog

For more examples of the misleading double axes, see the quite enjoyable site “Spurious Correlations”. There you can see how well US spending on science and space correlates with suicides by hanging or strangulation, or how the per capita consumption of cheese correlates with deaths by becoming tangled in bedsheets. Such is the power of visualization to enlighten or mislead.
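To see why such pairings look so convincing, here is a minimal sketch using made-up numbers (the two series and the helper names are assumptions, not data from any of the sources mentioned). Any two quantities that merely drift in the same direction over time will show a high Pearson correlation, and a chart with two y-axes, each fitted to its own series, effectively rescales both to the same visual range so the lines overlap.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rescale(values):
    """Min-max rescale to 0..1, roughly what fitting an axis to one series does."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Made-up yearly figures for two unrelated quantities that both trend upward.
series_a = [13.0, 13.2, 13.5, 13.9, 14.1, 14.6, 14.8, 15.2]
series_b = [2.1, 2.4, 2.3, 2.9, 3.0, 3.4, 3.3, 3.9]

r = pearson(series_a, series_b)
print(f"correlation: {r:.2f}")

# On separate axes both series span the full chart height, so they appear to track.
for a, b in zip(rescale(series_a), rescale(series_b)):
    print(f"{a:.2f}  {b:.2f}")
```

The correlation is high even though nothing connects the two series except a shared upward trend, which is why a tidy dual-axis chart proves very little on its own.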

Cairo’s new book will be called The Truthful Art and will address bad practices. All this criticism may not get offenders to change their ways. Mr. Huff, whose examples also came from sources interested in political lobbying, cast doubt on these being honest mistakes when he noted “it is rather like being short changed: When all the mistakes are in the cashier’s favor, you can’t help wondering.” But while distorting information might not hurt your career if you are a political pundit, this type of self-deception can be fatal if you are a businessman. Every businessman wants to believe all his investments will be profitable, but if he acts accordingly without understanding the risks, he’ll go bankrupt.

The Business Value of Visualization

We have constructed a dataset from bank card transactions that contains the information a businessman would need to accurately assess risks and spot trends. It’s the discipline of visualization that helps us communicate that information. In 2011, before we were spun off from the BBVA Innovation Center, we took a cue from our friend John Snow and used maps to understand interesting data. But this time we weren’t looking for clusters of disease but for the intensity of business activity. We used our data and others’ to display business activity changing through the day and to understand related demographics. Here is an example of a map showing the intensity of commercial activity at street level through the day in Barcelona.

We have built on this work for a new product we are currently launching, Commerce360. Commerce360’s dashboard provides easy-to-understand text along with various visualizations that have an interactive drill-down capability. Clients can display line charts to see trends, such as average transaction amount or activity by hour of day or day of week, in their own business and against other businesses in a given geographic area. Using CartoDB, Commerce360 can show maps of the zip codes where customers live or where they are shopping, with the intensity of commercial activity indicated by the intensity of the color on the map. And if the client wants, they can click to see this information in a bar chart that gives the breakdown by exact percentages. Commerce360 can also reveal insightful information about a business’s customers and the average customer of the competitors in a given area. The business’s dashboard will show a bar chart of the demographics of customers by gender and age, with information on how these numbers have changed over previous periods as well as the average for its competitors.
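As a rough illustration of the kind of aggregation that sits behind a chart such as “average transaction amount by hour of day”, here is a minimal sketch. The record layout, the sample values and the function name are all hypothetical; the article does not describe how Commerce360 actually stores or processes its data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical card transactions: (ISO timestamp, amount in euros).
transactions = [
    ("2015-03-30T09:15:00", 12.50),
    ("2015-03-30T09:40:00", 7.50),
    ("2015-03-30T13:05:00", 30.00),
    ("2015-03-30T13:20:00", 50.00),
    ("2015-03-30T20:45:00", 22.00),
]


def average_amount_by_hour(records):
    """Group transactions by hour of day and return the mean amount per hour."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, amount in records:
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour] += amount
        counts[hour] += 1
    # Sort by hour so the result can feed a line chart directly.
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


hourly_average = average_amount_by_hour(transactions)
print(hourly_average)  # {9: 10.0, 13: 40.0, 20: 22.0}
```

Each key would become a point on the x-axis of the line chart, and the same grouping applied to a competitor set would give the comparison line.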

The dashboard visualizations allow our clients to quickly comprehend large amounts of data and spot trends that would be impossible to see by merely looking at spreadsheets or tables. With this information our clients can identify areas that need improvement in comparison to the competition, understand what factors are driving customer behavior, and plan marketing programs to offer specials or launch new products.

Visualization is a quickly evolving discipline, which makes it difficult to say where it will end up in the next few years. Business people need to comprehend data in more dimensions than ever before, seeing not just snapshots but how customers move through space and time. To illustrate this, we can view some of the more cutting-edge visualizations we have done in collaboration with the MIT Senseable City Lab.

Economic Activity in Easter in Spain

The following visualization not only illustrates Spain’s spending habits in the week before Easter, but animates them. Several types of animation run at the same time, letting the viewer witness commercial transactions evolve minute by minute. Different colors represent the types of businesses. The viewer can see how the aggregate amounts rise and fall through the day on the charts on the left, and where the transactions are occurring on the map to the right. The viewer can even gauge the intensity of the commercial activity in each region, as the circles on the map representing spending get larger as the aggregate amount of the transactions rises. A number of patterns are quickly comprehensible: larger and smaller cities start their activity on different schedules, peak hours differ depending on the type of business, and because different regions celebrate the holidays on different days, their spending patterns differ accordingly.

Since tourism is such an important part of Spain’s economy, we created a similar animated visualization with Vizzuality, a visualization consulting firm, for the July and August tourist season, with a drill-down capability by 17 different business categories, by country of origin of the foreign tourists, and by region of Spain.

 

Tourism weight

Tourist trajectories

In high-technology businesses the importance of communication is habitually underestimated, and at the same time there is a growing body of evidence that visualization is the most effective way to carry out that communication. Although estimates vary, there is a consensus that a huge amount of our brains’ processing time is dedicated to understanding visual imagery. Our brains can process visual information an order of magnitude faster than text. In fact, people are likely to feel engaged with visualizations and understand the information, whereas pure text would disengage them through information overload. A Wharton School of Business study showed groups the same information either purely with text and numbers or with visualizations, and found that the presentations with visualizations persuaded on average 67% of the audience, compared with 50% for the presentations without them. The blog WebDAM estimates that 84% of communications on the web will be visual by 2018 and that posts with good visualizations get 650% higher engagement than text-only posts. We might take those numbers with a grain of salt, but the trend is clear. To engage your customers and understand massive amounts of business data, you have to take your visualizations up to a whole new level.