Less ink, more think

Occasionally I am asked to give a lecture on how to draw good graphs. While I am always tempted to drone on interminably about abstract principles such as minimalism, balance, and consistency, I have discovered that it is much more fun to criticize bad graphs, and to show how they can be improved. But how to acquire a truly bad graph? Easy! Just use the defaults in Microsoft Excel!

Here is an actual example of a graph drawn with those defaults. The data are fictitious, but the ugliness is breathtakingly real. It is sometimes said (unfairly) that engineers lack all sense of graphical design, but I think they must have hired specialists to create something so painfully wrong.

vss2011workshop.010

But what specifically is wrong, and how can we make it better? Michelangelo once said “I saw the angel in the marble and carved until I set him free.” So here too we will chip away at the obscuring excess, to reveal the beauty that Microsoft tried to hide.

First of all, what purpose is served by the heavy black rectangle that surrounds the graph? It serves two purposes: 1) to obscure useful information, and 2) to waste ink. Let’s remove it.

vss2011workshop.011

Better, but still bad. Next we note that the quantity being plotted is identified in three separate places: the vertical axis label, a title above the plot, and a key to the left. Is this really necessary? I think not. Let’s get rid of two of them. Of course a key can be useful when several quantities are plotted together, but not when there is only one. Likewise labels above a plot have their uses, but should be avoided when they are redundant with other information, such as the axis label. We remove the key and the title. Apart from reducing clutter, this substantially increases the area available for the useful parts of the graph.

vss2011workshop.012

Now we ask the question: what purpose is served by the gray background? It serves two purposes: 1) to reduce the contrast and thus visibility of the data points, and 2) to waste ink. Get rid of it!

vss2011workshop.014

Aaah…so much more cheerful and relaxing to look at! But a few troubling questions remain. For example, what purpose is served by those shadows behind each data point? Do they indicate some exciting three-dimensional aspect to the data? Of course not. But they do serve two purposes: 1) to render ambiguous the actual locations of the data points, and 2) to waste ink! Please people, can we all just agree to never, never, use little shadows to suggest that our data are floating above the page? Thank you. The corrected graph is below. We have removed the shadows and also changed the diamonds to discs for the very important reasons that 1) they are simpler, and 2) I like them better.

vss2011workshop.015

Next we note that graphs are usually employed to show a pattern or trend. This pattern is not communicated well by a set of individual points floating out there, each an island, entire of itself. Only connect! A line drawn between the points aids enormously in conveying the visual sense of the data.

vss2011workshop.016

Next we correct an obvious (except to the Microsoft designers) flaw: the axis number labels running through the middle of the graph. We move them where they belong: to the axis, outside the graph.

vss2011workshop.017

Now we are getting somewhere. It almost looks ok. But we can do better. Gridlines can serve a purpose – for example, to let the reader easily judge approximate values – but there is never a reason for them to be dark and heavy, and to mask the useful information in the figure. Lighten up! In fact, the gridlines should generally be as light as possible, and still be visible. In this example, we make one gridline a bit darker than the others, to identify the y = 0 line.

vss2011workshop.018

Now we see that the data really stand out. But we can do better still. What remains to distract the eye from the data? Well, we could try removing the gridlines altogether, and then there is no need for the top and right borders of the frame.

vss2011workshop.019

Next we ask: what is the purpose of the bold font on the axis labels? Of course, it is to waste ink. Using a bold font for your labels is like writing your emails in all upper case. It is the digital equivalent of shouting. Don’t do it. Use your indoor voice.

vss2011workshop.020

And finally (yes, finally) we can reduce the line weight of the remaining axes. All we really need is enough weight to see them, and note their positions.

vss2011workshop021

Thus we arrive at our final graph. It is not particularly exciting, but the data are clear, the trends are evident, and there is little to distract the eye from the essential information. Clearly, not all graphs are this simple, and there are often reasonable justifications for more elaborate presentations. But it is often a good idea to start with the simplest possible presentation, and elaborate from there. 
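For readers who would rather skip Excel altogether, here is a minimal sketch of the same chipping-away in Python with matplotlib. The choice of tool is an assumption of this sketch, not something the workshop used, and the data, axis labels, and the function name `minimal_plot` are all invented for illustration. It applies the steps above in one pass: discs joined by a thin line, no title or key, a white background, no gridlines, no top or right border, plain-weight labels, and light axes.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen, so no display is needed
import matplotlib.pyplot as plt


def minimal_plot(x, y, xlabel, ylabel):
    """Draw one data series with as little non-data ink as possible."""
    fig, ax = plt.subplots(figsize=(5, 3.5))

    # Plain discs joined by a thin line: no shadows, no diamonds.
    ax.plot(x, y, marker="o", markersize=5, linewidth=1, color="black")

    # No title and no key: the axis labels name the quantity once.
    ax.set_xlabel(xlabel, fontweight="normal")
    ax.set_ylabel(ylabel, fontweight="normal")

    # White background, no gridlines, no top or right border.
    fig.patch.set_facecolor("white")
    ax.set_facecolor("white")
    ax.grid(False)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

    # Just enough weight on the remaining axes to see where they are.
    for side in ("left", "bottom"):
        ax.spines[side].set_linewidth(0.5)
    ax.tick_params(direction="out", width=0.5)

    return fig, ax


# Placeholder values, made up for this sketch (hypothetical data and labels).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [-0.4, 0.1, 0.9, 1.6, 1.2, 0.7, 0.2, -0.1]
fig, ax = minimal_plot(x, y, "Time (s)", "Response (arbitrary units)")
```

Calling `fig.savefig("less_ink.png", dpi=150, bbox_inches="tight")` afterwards writes the figure to a file (the filename is arbitrary). Starting from a function like this and adding elements only when a particular graph needs them follows the "simplest first, elaborate later" advice.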

We conclude with the motto of this presentation, and indeed of this entire blog:

“Less ink, more think.”

Size matters, but only as a ratio

The other day I received yet another breathless email demanding my urgent attention. This one trumpeted the outcome of a recent poll for “the generic Congressional ballot.” Oh, yeah, that ballot. I hope you have mailed yours in by now.

The poll outcome was illustrated with the graphic below.

In any case, the Democratic advantage looked pretty impressive until my eye drifted over to the left-hand edge, where I noticed that the bars began at 30%. Why start at 30%, rather than zero? TO MAKE THE DIFFERENCE LOOK BIGGER! Forgive me for shouting, but this is such an elementary error, or transparent subterfuge, that I can’t help but be exasperated.

The principle here is that you cannot appreciate the size of the difference between the two bars without knowing the absolute size of each bar. The difference between them is meaningful only as a fraction of the total. This is why a scale extending to zero is called a “ratio scale.”

But lest your eyes glaze over in anticipation of a boring lecture, let me illustrate the idea with a few more graphs. We take the same data shown above, and plot it several times, in each case changing only the starting point of the bars.

Which is “correct?” They all show the same data. The first one reproduces the original figure, starting at 30. But why not start at 40 (second graph) or even 42 (third graph)? That appears to show a gargantuan advantage for the blue party, but only because we can’t see the total lengths of the bars. The correct depiction is the last, starting at zero, which visually presents a much more accurate, and less impressive, picture.
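The distortion can be put into numbers. The short Python sketch below computes how much longer one bar is drawn than the other for each starting point. The poll shares are hypothetical values chosen for illustration, not the figures from the email, and `apparent_ratio` is a name made up for this example.

```python
def apparent_ratio(a, b, baseline):
    """Ratio of the drawn bar lengths when both bars start at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must lie below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical poll shares in percent; NOT the numbers from the email.
blue, red = 48.0, 44.0

for baseline in (30, 40, 42, 0):
    ratio = apparent_ratio(blue, red, baseline)
    print(f"bars start at {baseline:>2}%: blue is drawn {ratio:.2f}x as long as red")
```

With these made-up shares, the blue bar is drawn three times as long as the red one when the bars start at 42, twice as long at 40, and only about 1.09 times as long at zero. The data never changed; only the starting point did.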

When should you use a ratio scale? The question has some depth to it, which we will not fathom today, but it is always the case that percentages should be plotted on a ratio scale.

Reference:

Email from Democratic Congressional Campaign Committee

Received: June 22, 2012

All that glitters is not Silver

“Love is blind.”

So begins a teasing article in the New York Times Sunday Magazine, by Nate Silver, the current wunderkind of popular statistics. “Popular statistics,” now that I think about it, is almost the definition of an oxymoron, and it is to Nate’s credit that he has made it possible to utter such a phrase without puzzlement. The gist of the article is that in the dating game you are more likely to get lucky on a Wednesday night than on any other night of the week. The article is accompanied by a massive “infographic” that occupies more than half of a page.

Debate has raged over the years about “decoration” of graphs, and while I am obviously firmly in the minimalist camp, I am not a wild-eyed fundamentalist. A little furbelow here and there is harmless, provided that it does not obscure or distort the data.

Regrettably, young Nate has been kidnapped by the graphic artistes at the Times, who have never met a graph that could not be obscured or distorted. Witness below their artsy creation.

Note that there is an overall graph, for the days of the week, and within each day, a graph for hours of the evening. From a visual point of view, the most prominent effect is the trend over days. What exactly is plotted by this larger graph, for days of the week? A little scrutiny will reveal that it plots: nothing! The top of each bar is offset from the actual data, for any hour, by bizarrely random amounts. This is not decoration, it is desecration.

But suppose we extract the data, and plot them correctly. For days of the week, which is the primary focus of the article, it might be sensible to take the average “score” over the evening hours, and plot that. If we do so, we get the graph below.

Wow! No wonder they call it hump-day! Look at that massive effect! Except of course, that a glance at the scale reveals that the needle, so to speak, has barely budged. A more correct rendition of the data, showing the variation as a fraction of the total score (a ratio scale), is shown below.

Umm…never mind. For all practical purposes, every night is the same. The main point of the article is, how shall we put it, nonsense.
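The same check can be run without drawing anything: average the scores over the evening hours for each day, then express the day-to-day spread as a fraction of the overall level. The sketch below does that in Python. The scores are invented for illustration and are not the values extracted from the Times infographic; the function name `relative_spread` is likewise an assumption of this example.

```python
from statistics import mean


def relative_spread(scores_by_day):
    """Return (daily means, spread of those means as a fraction of their average)."""
    daily_mean = {day: mean(vals) for day, vals in scores_by_day.items()}
    overall = mean(daily_mean.values())
    spread = max(daily_mean.values()) - min(daily_mean.values())
    return daily_mean, spread / overall


# Hypothetical scores for three evening hours on four days;
# made up for this sketch, NOT taken from the article.
scores = {
    "Mon": [70.0, 72.0, 74.0],
    "Tue": [71.0, 73.0, 75.0],
    "Wed": [73.0, 75.0, 77.0],
    "Thu": [72.0, 74.0, 76.0],
}

daily_mean, fraction = relative_spread(scores)
print(f"spread across days is {fraction:.1%} of the average score")
```

For these invented numbers the best and worst days differ by about 4 percent of the average score. A gap that small vanishes on a ratio scale, however tall it looks when the axis is zoomed in on it.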

And what about the numbers for the different hours of the evening? Even though they are hard to see, at least they are big effects, right? Of course not. Here is the average score for the various hours of the evening, plotted on a ratio scale.

I don’t want to be Miss Grundy, and I know even serious statistics wonks need a night out every once in a while, but even if “love is blind,” Nate really ought to reconsider the artsy types he hangs out with. Whichever night it was, he didn’t get lucky.

New York Times MAGAZINE

Wednesday Night Is All Right for Loving

By NATE SILVER

Published: June 3, 2011

Approaching the singles scene statistically.

http://www.nytimes.com/2011/06/05/magazine/nate-silver-wednesday-night-is-right-for-loving.html

Warning: Contains graphic violence

Welcome to GraphicViolence. I hope you will enjoy or learn from my little project, though that is not its main purpose. This is really about relaxation therapy, for myself. (Breathe deeply.)

These days, with alarming frequency, in all media, I am confronted with graphs that do violence to the most elementary principles of how to represent data with pictures. What are those principles? Follow this blog, and you will find out. It is much more fun to see examples of those principles being violated than to sit through a pedantic lecture.

The plan for this blog is to point at particularly egregious insults to our graphic intelligence. These will be found in public sources such as newspapers, TV, and internet sites. In each case, I will show how the data could be presented in a simple, efficient manner. Along the way, a few fundamental principles of effective graph design may be mentioned. I hope this will prove much better, for my own mental balance, than yelling at the newspaper.

I have several day jobs, so don’t expect very frequent postings. But given the torrent of bad graphs out there, I should be able to keep this little pipeline filled to bursting.
