When a bad graph tells a good story

One of my fundamental principles of graph design is “visibility.” This is the rather obvious idea that the information in the graph should be visible to the reader. This leads to such simple rules as not hiding one data point behind another, using legible fonts, and so on.

But sometimes the lack of visibility in the graph tells a story. Let me show you an example. As a student of tax policy, I recently had reason to compute a table of numbers describing the distribution of income in the US in 2005. Each row in the table described a group of taxpayers with a particular range of income (these ranges were defined by the IRS). In the first column of the table, I put the fraction that the group constituted of the total of US taxpayers. In the second column I computed the relative income of that group. Relative income is just a particular income divided by the average US income. So if your relative income is 2, it means you make twice the average US income. For reference, here is the table:

Now let us plot these data in a sensible way. We will use a rectangle chart, in which the width of each rectangle represents the fraction of the population, and the height represents the relative income. The rectangles are arranged next to each other proceeding from lowest income on the left, to highest on the right. A nice feature of this representation is that you can think of the total ink area as representative of the total income, and the shape of the distribution as showing how that income is spread out among the population. Here is the graph.

Wait, you say, there is nothing there!

But on closer inspection you can see a minuscule thread of red along the lower edge of the graph, a slight clot in the lower right corner, and a needle-like spike against the far right. Yes, folks, that is the distribution of income in this advanced democracy. It is so bizarrely skewed that it cannot be plotted in a sensible way. The tools of graphing are brought to their knees in the face of such an economic monstrosity.
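For readers who want to see how the rectangles are laid out, here is a minimal sketch of the geometry. The group counts and incomes are made-up numbers chosen for illustration, not the IRS table, and the function name `rectangle_chart` is hypothetical. It shows one useful consequence of the definition: because relative income is income divided by the average, the total ink area always comes out to 1.

```python
# Hypothetical groups: (number of taxpayers, mean income in dollars).
# These figures are invented for illustration; they are NOT the IRS data.
groups = [
    (50, 20_000),
    (40, 60_000),
    (9, 200_000),
    (1, 5_000_000),
]

def rectangle_chart(groups):
    """Return (left_edge, width, height) for each group, poorest first."""
    total_people = sum(n for n, _ in groups)
    total_income = sum(n * income for n, income in groups)
    average = total_income / total_people

    rects = []
    left = 0.0
    for n, income in sorted(groups, key=lambda g: g[1]):
        width = n / total_people   # fraction of the population
        height = income / average  # relative income
        rects.append((left, width, height))
        left += width              # next rectangle starts where this one ends
    return rects

rects = rectangle_chart(groups)
total_area = sum(width * height for _, width, height in rects)

for left, width, height in rects:
    print(f"left={left:.2f}  width={width:.2f}  height={height:.2f}")
print(f"total ink area = {total_area:.3f}")
```

Even in this toy example the last rectangle is one hundredth of the chart wide and dozens of times taller than the rest, which is the same shape problem the real data produces in a more extreme form.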

Before going on, it should be noted that the situation is even worse (or better, depending on where you lie in the graph) than depicted here. This is because the IRS, presumably to sedate the pitchfork-carrying masses, does not break out income categories larger than $10,000,000. So the merely super-rich get lumped with the hyper-rich. If not for that, the needle on the right would be much taller, and narrower, than it already is.

We could, of course, truncate the vertical scale, and only plot the data over a more modest range of relative incomes, say up to 20. Here it is. But the graph is still largely empty, and is in any case inaccurate, since it fails to show the true range of relative incomes, and the ink no longer encompasses the total income.
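To make the point about lost ink concrete, here is a small sketch that clips rectangle heights at a cap of 20 and measures how much area disappears. The widths and heights are assumed values for illustration only (chosen so the unclipped area is 1), not the IRS figures.

```python
# Hypothetical (width, height) pairs: fraction of population, relative income.
# Invented so that the unclipped areas sum to 1; NOT the IRS data.
rects = [(0.50, 0.2), (0.40, 0.6), (0.09, 2.0), (0.01, 48.0)]

CAP = 20.0  # upper limit of the truncated vertical scale

full_area = sum(w * h for w, h in rects)
clipped_area = sum(w * min(h, CAP) for w, h in rects)
lost_fraction = (full_area - clipped_area) / full_area

print(f"full area    = {full_area:.2f}")
print(f"clipped area = {clipped_area:.2f}")
print(f"share of total income no longer drawn = {lost_fraction:.0%}")
```

With these made-up numbers, more than a quarter of the total income falls off the top of the chart, all of it belonging to the narrowest rectangle.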

When the dynamic range of data is very large, we often take refuge in the log transform. Here we could plot relative income in terms of “factors of 10.” Here is the result. For example, the relative income of the richest category is 330,729 times that of the poorest category with a non-zero relative income, which is about 5.5 factors of 10. Now we can see all the data, but ink area no longer has a simple meaning.
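The "factors of 10" figure is just a base-10 logarithm of the ratio. As a quick check, using the ratio quoted above (the variable names are my own):

```python
import math

# Ratio of the richest category's relative income to the poorest
# non-zero relative income, as quoted in the text.
ratio = 330_729

# "Factors of 10" is the base-10 logarithm of the ratio.
factors_of_ten = math.log10(ratio)

print(f"{factors_of_ten:.1f} factors of 10")
```

On a log axis, equal vertical distances mean equal ratios rather than equal differences, which is why the area of a rectangle stops corresponding to a share of total income.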

To conclude, sometimes a graph may serve as a rhetorical device, in which case the normal rules may not apply. But this device, like sarcasm, should be used sparingly.

Reference: http://www.irs.gov/pub/irs-soi/05in02ar.xls.

