Size matters, but only as a ratio

The other day I received yet another breathless email demanding my urgent attention. This one trumpeted the outcome of a recent poll for “the generic Congressional ballot.” Oh, yeah, that ballot. I hope you have mailed yours in by now.

The poll outcome was illustrated with the graphic below.

In any case, the Democratic advantage looked pretty impressive until my eye drifted over to the left-hand edge, where I noticed that the bars began at 30%. Why start at 30%, rather than zero? TO MAKE THE DIFFERENCE LOOK BIGGER! Forgive me for shouting, but this is such an elementary error, or transparent subterfuge, that I can’t help but be exasperated.

The principle here is that you cannot appreciate the size of the difference between the two bars without knowing the absolute size of each bar. The difference between them is meaningful only as a fraction of the total. This is why a scale extending to zero is called a “ratio scale.”

But lest your eyes glaze over in anticipation of a boring lecture, let me illustrate the idea with a few more graphs. We take the same data shown above, and plot it several times, in each case changing only the starting point of the bars.

Which is “correct”? They all show the same data. The first one reproduces the original figure, starting at 30. But why not start at 40 (second graph) or even 42 (third graph)? That appears to show a gargantuan advantage for the blue party, but only because we can’t see the total lengths of the bars. The correct depiction is the last, starting at zero, which visually presents a much more accurate, and less impressive, picture.
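For readers who like to check the arithmetic, here is a small Python sketch of why the starting point matters. The poll figures used (47% and 43%) are hypothetical placeholders, not the numbers from the email, and `apparent_ratio` is a name invented for this illustration. It computes how many times longer the taller bar looks than the shorter one when both bars begin at a given baseline.

```python
def apparent_ratio(larger, smaller, baseline):
    """Ratio of the drawn bar lengths when both bars start at `baseline`."""
    if baseline >= smaller:
        raise ValueError("baseline must be below the smaller value")
    return (larger - baseline) / (smaller - baseline)

# Hypothetical poll results, in percent (assumed for illustration only).
blue, red = 47.0, 43.0

for baseline in (30, 40, 42, 0):
    ratio = apparent_ratio(blue, red, baseline)
    print(f"bars start at {baseline:>2}: blue looks {ratio:.2f}x as long as red")
```

With these assumed figures the true ratio, visible only when the bars start at zero, is about 1.09. Starting the bars at 42 makes the blue bar look five times as long as the red one.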

When should you use a ratio scale? The question has some depth to it, which we will not fathom today, but it is always the case that percentages should be plotted on a ratio scale.


Email from Democratic Congressional Campaign Committee

Received: June 22, 2012


A dimension is a terrible thing to waste

Graphs consist of ink spread out over two spatial dimensions. Sometimes the ink is colored, sometimes it is electronic ink, but the point is the same: to use the innate two-dimensional pattern-seeking machinery of human vision to expose some pattern in the data. Because we have only two spatial dimensions, each is highly valuable, and should not be squandered.

Here is a recent graph showing salaries in various life science specializations in the years 2009 and 2010. The graph consists of various vertical bars, each for one specialization in one year, with a height proportional to salary.

The various bars are arrayed along the horizontal dimension. First question: what is the meaning of the horizontal dimension? Take your time. Think carefully. OK, give up? The answer is…nothing! The data are not sorted alphabetically by specialization, by rank in 2009, by rank in 2010, or by any other discernible criterion. This is an example of wasting a dimension. As noted above, spatial dimensions of a graph are extremely valuable, and should never be squandered. They are the levers that we use to lift your consciousness of trends or singularities in the data.

Here we could at least use the horizontal dimension to rank the various specializations, for example by 2009 income. This is illustrated in our cleaned up example shown below. Now we can quickly see the best compensated specializations in 2009. We have indicated the two years with connected colored lines rather than discrete bars. We have labeled the specializations, but in a muted gray so that they can be read but do not distract from the data.
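As a rough sketch of the reordering step, assuming made-up specialization names and salary figures (none of them taken from the survey), the following Python snippet sorts the categories by 2009 salary so that horizontal position carries meaning, and notes which way each one moved in 2010.

```python
# Hypothetical salaries in dollars, keyed by specialization (illustration only).
salaries = {
    "Specialization A": {2009: 82000, 2010: 85000},
    "Specialization B": {2009: 95000, 2010: 91000},
    "Specialization C": {2009: 74000, 2010: 78000},
}

# Rank the specializations by 2009 salary, highest first.
ranked = sorted(salaries, key=lambda name: salaries[name][2009], reverse=True)

for name in ranked:
    change = salaries[name][2010] - salaries[name][2009]
    direction = "up" if change > 0 else "down" if change < 0 else "flat"
    print(f"{name}: 2009 = {salaries[name][2009]}, 2010 {direction} by {abs(change)}")
```

Plotting the categories in the order given by `ranked` puts the horizontal dimension to work: position now encodes rank in 2009.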

The original graph also has a number of other features worthy of ridicule. What are those little blue flags flying everywhere? Tibetan prayer flags? No, they are labels that tell us the actual numerical value of each salary. Excuse me, if we wanted a table we would have asked for a table. The point of a graph is to let the story be told by shape and position, not by a bunch of numbers. Apart from being unnecessary, the blue tags add clutter, one of the great villains of graph design. Here clutter masks the vital information in the graph, and obscures the important contour: the tops of the bars.

Another odd feature of the graph is the set of arrowheads at the top or bottom of the bars for 2010. What are they for? Some study will reveal that they are telling us whether that specialization increased or decreased in 2010. Did that jump out at you? I thought not. And of course half the arrowheads are at the top, and half at the bottom, so the eye can never move smoothly across the set and perceive some pattern. In our replacement graph, whether salaries increased or decreased in 2010 is immediately obvious from the position of the two curves.

Other minor quibbles with this graph would be: 1) Why is 2009 to the right of 2010? By convention, dates usually increase to the right. 2) Why are the labels in all caps? Is there some reason to shout? 3) The number 2010 in yellow is barely legible. Yellow text on a white background is always a bad idea.


The Scientist Life Sciences Salary Survey, 2010

Smile, and the world smiles with you

Perhaps the most potent weapon in the graphing arsenal is the contour. Our eye and brain are designed to appreciate edges between light and dark, or lines that mark a border. We instantly appreciate the connectedness of points along a contour, and we also appreciate the shape of the contour, and its trajectory. This is why it is so easy to display trends by plotting a collection of points and connecting them with a contour.

But like any powerful weapon, it can be misused. For example, by displaying a contour where there are no data! This seems pretty obvious, but it has escaped the chart designers at CNBC-TV. For those of you unfamiliar with this outlet, it is the leading cable business news channel worldwide. To their credit, they rely heavily on graphs. I recently clocked a random one-minute segment and counted six graphs. Most of their charts, naturally enough, show trends over time in financial data such as stock prices and corporate earnings.

Here is a chart from a recent edition of “The Call,” a morning business program on CNBC-TV. In many respects it is an unremarkable, even commendable graph. But wait: what is that curious smile-shaped shadow that extends across the entire graph? Is it some theoretical curve? Some baseline curve against which to compare the lofty performance of gold? Perhaps a projection of some sort?

To investigate further, we look at a second graph, showing the less impressive performance of Microsoft. Hmmmm…same smile. In fact, it turns out, this smile is superimposed on every single graph shown on “The Call.”

Of course, what we are looking at is an egregious example of what is politely called “decoration,” and less politely called “background crap.” Background crap is always bad, because it masks the visibility of the important information. In this case, the background is particularly pernicious because the trend of the pointless contour competes with the trend of the contour we should be perceiving: the trend of the data.

Indeed one wonders whether a more nefarious scheme is involved. Perhaps the chart designers are not clueless, but devious, and actually trying to bias our perception. Note that the curve declines (so sad…) but then recovers (yes!) Perhaps this smile is just that: a smiley face, meant to lift our spirits whatever the trajectory of the actual investment considered. All will be well; things go down, but then they come back up. Smile, and the world smiles with you. This is consistent with the generic optimism about all things financial that is part of the CNBC-TV culture.

More likely, a “graphic designer” was called upon to juice up the visual appeal of the graphs on the show. What was really needed here was a “graph designer.” But that job, however essential in our data-driven world, has yet to be invented.

Lesson: Don’t put meaningless contours in the background of your graph.

Reference: The Call, CNBC TV, 10/27/10 “Barrick Gold CEO on Earnings,” “Driving MSFT’s Growth.”
