Less ink, more think

Occasionally I am asked to give a lecture on how to draw good graphs. While I am always tempted to drone on interminably about abstract principles such as minimalism, balance, and consistency, I have discovered that it is much more fun to criticize bad graphs, and to show how they can be improved. But how to acquire a truly bad graph? Easy! Just use the defaults in Microsoft Excel!

Here is an actual example of a graph drawn with those defaults. The data are fictitious, but the ugliness is breathtakingly real. It is sometimes said (unfairly) that engineers lack all sense of graphical design, but I think they must have hired specialists to create something so painfully wrong.

But what specifically is wrong, and how can we make it better? Michelangelo once said “I saw the angel in the marble and carved until I set him free.” So here too we will chip away at the obscuring excess, to reveal the beauty that Microsoft tried to hide.

First of all, what purpose is served by the heavy black rectangle that surrounds the graph? It serves two purposes: 1) to obscure useful information, and 2) to waste ink. Let’s remove it.


Better, but still bad. Next we note that the quantity being plotted is identified in three separate places: the vertical axis label, a title above the plot, and a key to the left. Is this really necessary? I think not. Let’s get rid of two of them. Of course a key can be useful when several quantities are plotted together, but not when there is only one. Likewise labels above a plot have their uses, but should be avoided when they are redundant with other information, such as the axis label. We remove the key and the title. Apart from reducing clutter, this substantially increases the area available for the useful parts of the graph.


Now we ask the question: what purpose is served by the gray background? It serves two purposes: 1) to reduce the contrast and thus visibility of the data points, and 2) to waste ink. Get rid of it!


Aaah…so much more cheerful and relaxing to look at! But a few troubling questions remain. For example, what purpose is served by those shadows behind each data point? Do they indicate some exciting three-dimensional aspect to the data? Of course not. But they do serve two purposes: 1) to render ambiguous the actual locations of the data points, and 2) to waste ink! Please people, can we all just agree to never, never, use little shadows to suggest that our data are floating above the page? Thank you. The corrected graph is below. We have removed the shadows and also changed the diamonds to discs for the very important reasons that 1) they are simpler, and 2) I like them better.


Next we note that graphs are usually employed to show a pattern or trend. This pattern is not communicated well by a set of individual points floating out there, each an island, entire of itself. Only connect! A line drawn between the points aids enormously in conveying the visual sense of the data.


Next we correct an obvious (except to the Microsoft designers) flaw: the axis number labels running through the middle of the graph. We move them where they belong: to the axis, outside the graph.


Now we are getting somewhere. It almost looks ok. But we can do better. Gridlines can serve a purpose – for example, to let the reader easily judge approximate values – but there is never a reason for them to be dark and heavy, and to mask the useful information in the figure. Lighten up! In fact, the gridlines should generally be as light as possible while still being visible. In this example, we make one gridline a bit darker than the others, to identify the y = 0 line.

Now we see that the data really stand out. But we can do better still. What remains to distract the eye from the data? Well we could try removing the gridlines altogether, and then there is no need for the top and right borders of the frame.


Next we ask: what is the purpose of the bold font on the axis labels? Of course, it is to waste ink. Using a bold font for your labels is like writing your emails in all upper case. It is the digital equivalent of shouting. Don’t do it. Use your indoor voice.


And finally (yes, finally) we can reduce the line weight of the remaining axes. All we really need is enough weight to see them, and note their positions.


Thus we arrive at our final graph. It is not particularly exciting, but the data are clear, the trends are evident, and there is little to distract the eye from the essential information. Clearly, not all graphs are this simple, and there are often reasonable justifications for more elaborate presentations. But it is often a good idea to start with the simplest possible presentation, and elaborate from there. 
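For readers who draw their graphs in code rather than in Excel, the end state of this walkthrough can be sketched as a handful of style settings. The example below is only an illustration, assuming Python with matplotlib; the names MINIMAL_STYLE and draw_minimal, and the specific line weights, are my own assumptions and not part of the original presentation.

```python
# A minimal sketch of the "final graph" style as matplotlib rcParams.
# The numeric weights below are illustrative guesses, not prescribed values.
MINIMAL_STYLE = {
    "figure.facecolor": "white",
    "axes.facecolor": "white",     # no gray background
    "axes.grid": False,            # gridlines removed altogether
    "axes.spines.top": False,      # no top border
    "axes.spines.right": False,    # no right border
    "axes.linewidth": 0.5,         # just enough weight to see the axes
    "axes.labelweight": "normal",  # indoor voice: no bold axis labels
    "lines.linewidth": 1.0,        # a line connecting the points
    "lines.marker": "o",           # plain discs, no shadows
    "lines.markersize": 5,
}


def draw_minimal(x, y, xlabel, ylabel):
    """Plot one series with the minimal style; no title, no key."""
    # Imported here so the style dictionary can be inspected
    # even where matplotlib is not installed.
    import matplotlib
    matplotlib.use("Agg")  # off-screen rendering, no window needed
    import matplotlib.pyplot as plt

    with plt.rc_context(MINIMAL_STYLE):
        fig, ax = plt.subplots()
        ax.plot(x, y, color="black")
        ax.set_xlabel(xlabel)  # the quantity is named once, on the axis
        ax.set_ylabel(ylabel)
    return fig
```

A call such as draw_minimal([1, 2, 3], [2, 5, 3], "Trial", "Response").savefig("final.png") would write the figure to a file; the numbers and labels there are placeholders.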

We conclude with the motto of this presentation, and indeed of this entire blog:

“Less ink, more think.”

A dimension is a terrible thing to waste

Graphs consist of ink spread out over two spatial dimensions. Sometimes the ink is colored, sometimes it is electronic ink, but the point is the same: to use the innate two-dimensional pattern-seeking machinery of human vision to expose some pattern in the data. Because we have only two spatial dimensions, each is highly valuable, and should not be squandered.

Here is a recent graph showing salaries in various life science specializations in the years 2009 and 2010. The graph consists of various vertical bars, each for one specialization in one year, with a height proportional to salary.

The various bars are arrayed along the horizontal dimension. First question: what is the meaning of the horizontal dimension? Take your time. Think carefully. OK, give up? The answer is…nothing! The data are not sorted alphabetically by specialization, by rank in 2009, by rank in 2010, or by any other discernible criterion. This is an example of wasting a dimension. As noted above, spatial dimensions of a graph are extremely valuable, and should never be squandered. They are the levers that we use to lift your consciousness of trends or singularities in the data.

Here we could at least use the horizontal dimension to rank the various specializations, for example by 2009 income. This is illustrated in our cleaned up example shown below. Now we can quickly see the best compensated specializations in 2009. We have indicated the two years with connected colored lines rather than discrete bars. We have labeled the specializations, but in a muted gray so that they can be read but do not distract from the data.
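The ranking step described above can be sketched in a few lines. This is a rough illustration, assuming Python with matplotlib; the field names and salary figures are made-up placeholders (not values from the survey), and the function names rank_by_year and plot_ranked are hypothetical.

```python
# Hypothetical placeholder data, NOT from the survey:
# field name -> (2009 salary, 2010 salary), in arbitrary units.
salaries = {
    "Field A": (60, 64),
    "Field B": (85, 70),
    "Field C": (72, 75),
    "Field D": (95, 97),
}


def rank_by_year(data, year_index=0):
    """Return (field, values) pairs sorted from highest to lowest
    salary in the chosen year (0 for 2009, 1 for 2010)."""
    return sorted(data.items(),
                  key=lambda item: item[1][year_index],
                  reverse=True)


def plot_ranked(data):
    """Draw both years as connected lines, ordered by 2009 rank."""
    import matplotlib
    matplotlib.use("Agg")  # off-screen rendering
    import matplotlib.pyplot as plt

    ranked = rank_by_year(data, year_index=0)
    positions = list(range(len(ranked)))
    fig, ax = plt.subplots()
    ax.plot(positions, [v[0] for _, v in ranked], marker="o", label="2009")
    ax.plot(positions, [v[1] for _, v in ranked], marker="o", label="2010")
    # Muted gray labels: readable, but not competing with the data.
    ax.set_xticks(positions)
    ax.set_xticklabels([name for name, _ in ranked],
                       color="gray", rotation=45, ha="right")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.set_ylabel("Salary")
    ax.legend(frameon=False)
    return fig
```

With the horizontal axis carrying the 2009 rank, the 2009 line falls steadily from left to right, and any place where the 2010 line crosses above or below it shows a raise or a cut at a glance.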

The original graph also has a number of other features worthy of ridicule. What are those little blue flags flying everywhere? Tibetan prayer flags? No, they are labels that tell us the actual numerical value of each salary. Excuse me, if we wanted a table we would have asked for a table. The point of a graph is to let the story be told by shape and position, not by a bunch of numbers. Apart from being unnecessary, the blue tags add clutter, one of the great villains of graph design. Here clutter masks the vital information in the graph, and obscures the important contour: the tops of the bars.

Another odd feature of the graph is the arrowheads at the top or bottom of the bars for 2010. What are they for? Some study will reveal that they are telling us whether that specialization increased or decreased in 2010. Did that jump out at you? I thought not. And of course half the arrowheads are at the top, and half at the bottom, so the eye can never move smoothly across the set and perceive some pattern. In our replacement graph, whether salaries increased or decreased in 2010 is immediately obvious from the position of the two curves.

Other minor quibbles with this graph would be: 1) Why is 2009 to the right of 2010? By convention, dates usually increase to the right. 2) Why are the labels in all caps? Is there some reason to shout? 3) The number 2010 in yellow is barely legible. Yellow text on a white background is always a bad idea.


The Scientist, Life Sciences Salary Survey, 2010.

