# Show me the correlation!

April 16, 2011

Suppose that we have two quantities that vary over time. We want to know whether their variations are correlated: do they dance to the same tune, or does each march to a different drummer? Of course, we often ask this question because we want to know whether one of the two causes, or at least influences, the other. The most effective way to compare the variations of two quantities is to plot them against each other.

But sometimes the simplest solution is not good enough for the pop graphologist (a new term I just invented). Take Charles M. Blow, the top pop graphologist at the New York Times. (I have commented on his work previously: http://wp.me/p19RFk-a.) Mr. Blow writes excellent columns on important issues, but he decorates them with grievously flawed graphs. Consider the following graph, from the New York Times of April 16, 2011. Please forgive the gargantuan vertical extent of the graph (you will probably have to click on it to see the whole thing); we will address this below.

The red line shows the variation over time of the top marginal tax rate in the US. The tiny green bars at the bottom show the % change in GDP from the previous year. The first question we always ask of any graph is: what pops out at you? I thought so: nothing. Now maybe that is Mr. Blow’s point – that there is no relation between the two quantities – but if so, this is hardly the way to show it. Partly because it is the wrong type of graph, and partly because the two quantities sit so far apart on the page, it is difficult to discern any relationship between them.

As we stated at the outset, the simplest way to graphically display a synchronicity, or in technical terms a correlation, between the two quantities is to plot them against each other. I have done this in the graph below. Using the same data as Mr. Blow, we plot the % change in GDP against the top marginal tax rate.

Now we can immediately discern whether there is a close relationship between the top rate and the changes in GDP. If there were, the points would cluster tightly together, tracing out some curve that describes the relationship. Instead, the points are widely scattered, and so the main point of the article, that there is no strong relationship, is confirmed.

However! However! Plotting the data in this way reveals an additional surprise, completely obscured in Mr. Blow’s graph: there is actually a small POSITIVE relationship between the two quantities! Yes, you heard me right. In yet another death knell for supply-side economics, we see that HIGHER tax rates lead to LARGER increases in GDP. (Forgive my upper-case outburst; I got a bit excited.) Note that the two lowest marginal rates are associated with some of the largest declines in GDP, and two of the largest increases in GDP are associated with marginal rates near 90%! The red line shows the best-fitting linear relationship between the two quantities, and it climbs slightly as you go from low tax rates to high. True, the effect is weak. The slope of the line is slight: it takes a 17-percentage-point rise in the top marginal rate to get a 1-percentage-point rise in GDP growth. The points also cluster only loosely around this trend. We measure the tightness of that clustering with the correlation statistic (http://en.wikipedia.org/wiki/Correlation_and_dependence), which must lie between -1 and 1; here it is a meager 0.26 (0 would mean no linear relationship at all).

But still, Mr. Blow could have made a much stronger case if he had used the right kind of graph.
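For readers who want to compute numbers like the slope and correlation quoted above, here is a minimal Python sketch of the two standard formulas: the Pearson correlation coefficient and the ordinary least-squares line. It is not the code used to make the graph. The function names are hypothetical, and the sample values are made-up toy numbers, not the BEA or Tax Policy Center data.

```python
import math


def _centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of centered cross-products and squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient; always lies between -1 and 1."""
    sxy, sxx, syy = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Best-fitting line y = intercept + slope * x (ordinary least squares)."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    slope = sxy / sxx
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy numbers for illustration only (NOT the article's data).
x = [1, 2, 3, 4]
y = [1, 3, 2, 5]
print(pearson_r(x, y))
print(least_squares_line(x, y))

# Reading a slope: with x = top rate (%) and y = GDP change (%), a slope of
# 1/17 (about 0.059) means a 17-point rise in the rate per 1 point of growth.
```

The correlation is dimensionless, while the slope carries units (GDP-growth points per tax-rate point); that is why the two numbers in the paragraph above measure different things.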

Now we are going to make a few nerdy points about graphs of correlation, and those of you who are only here for the entertainment portion of the show can go back to your other amusements.

In graphs of this sort, it is traditional to put the so-called “independent” variable on the horizontal axis and the so-called “dependent” variable on the vertical axis. Assigning variables this way amounts to an assumption about what causes what. That assumption may or may not be reasonable, but it is good to adhere to the convention. Here the question addressed in the column is whether tax rates affect growth in GDP, which is why we put the tax rate on the horizontal axis and the change in GDP on the vertical axis.
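One concrete consequence of the axis assignment: the correlation coefficient is the same whichever variable goes on which axis, but the best-fitting line is not. The least-squares slope of y on x is generally not the reciprocal of the slope of x on y; the product of the two slopes equals r squared. The sketch below checks this numerically, using hypothetical helper names and toy numbers.

```python
import math


def centered_sums(xs, ys):
    """Sums of centered cross-products and squares: (Sxy, Sxx, Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def correlation(xs, ys):
    """Pearson r: symmetric in its two arguments."""
    sxy, sxx, syy = centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (xs on the horizontal axis)."""
    sxy, sxx, _ = centered_sums(xs, ys)
    return sxy / sxx


# Toy numbers for illustration only.
x = [1, 2, 3, 4]
y = [1, 3, 2, 5]
print(correlation(x, y), correlation(y, x))               # identical
print(slope(x, y), 1 / slope(y, x))                       # not equal
print(slope(x, y) * slope(y, x), correlation(x, y) ** 2)  # equal up to rounding
```

So swapping the axes leaves r alone but changes the fitted line, which is one more reason to decide up front which variable is treated as the independent one.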

Another feature of a graph like this is its aspect ratio. One failing of Mr. Blow’s graph was the large vertical distance between the separate plots of the two quantities. The distance was so large because Mr. Blow chose to plot both on the same scale (percent). This may have seemed reasonable, since a percent is a percent, but in fact it is a mistake. When we are exploring the relationship between variations in two quantities, we should not presume that we know the ratio between their scales. It is better to let the data tell us what that ratio might be. The correlation graph does this by plotting the full range of one quantity against the full range of the other. Because there is no reason to do otherwise, we make the two ranges the same size on the page. In other words, the graph has an aspect ratio of one.
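In code, plotting the full range of one quantity against the full range of the other amounts to rescaling each variable onto the interval [0, 1] and drawing the result in a square box. The sketch below uses a hypothetical helper, `rescale_to_unit`, and toy numbers on very different scales. It also checks that this rescaling leaves the correlation unchanged, so choosing equal ranges hides nothing about the relationship. With a plotting library such as Matplotlib, one would then draw the points on square axes (for example with `Axes.set_box_aspect(1)`); that step is not shown.

```python
import math


def rescale_to_unit(values):
    """Map a series linearly onto [0, 1]: its minimum -> 0, its maximum -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series has no range to rescale")
    return [(v - lo) / (hi - lo) for v in values]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy numbers on very different scales (illustrative only).
xs = [10, 20, 40, 70]      # spans 60 units
ys = [0.5, 1.5, 1.0, 2.0]  # spans 1.5 units
u, v = rescale_to_unit(xs), rescale_to_unit(ys)
print(min(u), max(u), min(v), max(v))               # both now span exactly [0, 1]
print(correlation(xs, ys), correlation(u, v))       # equal up to rounding
```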

As I noted above, Mr. Blow’s graph has an enormous vertical extent. It is so tall that at full size it will not fit on a typical laptop screen without scrolling. OK, now, for extra credit: what is the reason for such a large vertical expansion of the graph? Take your time…no one is timing you…plenty of time…all the time in the world. Not quite done thinking? Take a few more minutes. Ready? And the answer is…*there is no reason whatsoever*! Mr. Blow blows up his graphs to ridiculous proportions (usually in the vertical dimension) because he *can*. He is the big graph honcho at the *New York Times*!

Now, I hesitate to make Mr. Blow the poster child for bad graphs, since the intellectual points he makes are always good ones. But an intervention is required: for his sake, for the New York Times’ sake, and for the sake of the reading public.

Reference:

Charles M. Blow, “The Pirates of Capitol Hill,” New York Times, published April 15, 2011.

Data at:

http://www.bea.gov/national/xls/gdpchg.xls

http://www.taxpolicycenter.org/taxfacts/displayafact.cfm?Docid=213