Size matters, but only as a ratio

The other day I received yet another breathless email demanding my urgent attention. This one trumpeted the outcome of a recent poll for “the generic Congressional ballot.” Oh, yeah, that ballot. I hope you have mailed yours in by now.

The poll outcome was illustrated with the graphic below.

In any case, the Democratic advantage looked pretty impressive until my eye drifted over to the left-hand edge, where I noticed that the bars began at 30%. Why start at 30%, rather than zero? TO MAKE THE DIFFERENCE LOOK BIGGER! Forgive me for shouting, but this is such an elementary error, or transparent subterfuge, that I can’t help but be exasperated.

The principle here is that you cannot appreciate the size of the difference between the two bars without knowing the absolute size of each bar. The difference between them is meaningful only as a fraction of the total. This is why a scale extending to zero is called a “ratio scale.”

But lest your eyes glaze over in anticipation of a boring lecture, let me illustrate the idea with a few more graphs. We take the same data shown above, and plot it several times, in each case changing only the starting point of the bars.

Which is “correct?” They all show the same data. The first one reproduces the original figure, starting at 30. But why not start at 40 (second graph) or even 42 (third graph)? That appears to show a gargantuan advantage for the blue party, but only because we can’t see the total lengths of the bars. The correct depiction is the last, starting at zero, which visually presents a much more accurate, and less impressive, picture.
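To put rough numbers on the effect, here is a minimal Python sketch. The poll values of 46% and 43% are hypothetical stand-ins of my own (the real figures live only in the original graphic), and the function name is likewise invented for illustration. It computes how many times longer the blue bar is drawn than the red one for each starting point.

```python
def apparent_ratio(a, b, baseline):
    """Ratio of the drawn bar lengths when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (a - baseline) / (b - baseline)

# Hypothetical poll percentages, NOT the figures from the email.
blue, red = 46.0, 43.0

for start in (30, 40, 42, 0):
    ratio = apparent_ratio(blue, red, start)
    print(f"bars start at {start:>2}: blue is drawn {ratio:.2f}x as long as red")
```

With these made-up values, the same three-point gap is drawn as anything from a sliver to a fourfold lead, depending only on where the bars begin. Only the zero baseline draws the bars in the same ratio as the numbers themselves.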

When should you use a ratio scale? The question has some depth to it, which we will not fathom today, but it is always the case that percentages should be plotted on a ratio scale.

Reference:

Email from Democratic Congressional Campaign Committee

Received: June 22, 2012

Pump up the volume!

One of the most egregiously deceptive practices in graphology is what we might call “dimension boosting.” Like the use of a performance-enhancing drug in sports, it is an effort to gain an unfair advantage by playing outside the rules. Usually this crime consists of using the width of a two-dimensional figure, such as a circle or a square, to depict a one-dimensional quantity. But as the width increases, the area, which is what we perceive, grows as the square of the width. With this device, a small difference can be made to look much larger. If the plotted quantities differ by only a factor of two, their areas will differ by a factor of four.
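The arithmetic is simple enough to sketch in a few lines of Python. The function name and the sample ratios are my own choices for illustration. It assumes the figure is scaled uniformly, so that every one of its dimensions grows by the same factor as the quantity being plotted.

```python
def perceived_ratio(value_ratio, dimensions=2):
    """Size ratio the eye sees when each dimension is scaled by value_ratio."""
    return value_ratio ** dimensions

# A few honest ratios and the area ratios a two-dimensional figure turns them into.
for value_ratio in (1.2, 1.5, 2.0):
    print(f"true ratio {value_ratio:.1f}x -> area ratio {perceived_ratio(value_ratio):.2f}x")
```

Even a modest 20% difference becomes a 44% difference in area, which is the whole point of the trick.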

That is bad enough, but sometimes the criminal decides to go all the way, and throw in not one but two extra dimensions! In other words, they depict a one-dimensional quantity with a three-dimensional object. Below is an example from a recent edition of the Sunday New York Times Magazine. It illustrates the decline of drinking among American teenagers over the last three decades.

Now we will perform a little test. Quickly, without looking at the axes, look at the two images at the beginning and end of the time interval and tell me by what factor drinking declined over that period. Got your answer? OK. Let’s review. Well… in 1980 it looks like a 1.5 liter jug, while in 2010 they evidently had one shot glass (3 ounces?). You can fit about 17 shot glass servings in a 1.5 liter bottle. A 17x decline! Wow! Those kids sure have cut back!

But suspecting that today’s teens are not quite so abstemious, and having been burned by criminal graphologists before, we examine the plot more carefully. First, we notice that even though the little bottles and glasses vary in not one, not two, but three dimensions, the axis on the left is a simple linear scale. Presumably the top of each vessel is the relevant aspect. Also, the axis is labeled in %. On that basis we realize that the incidence of drinking has only declined from 70% of teens to 40%, a decline of only 1.75x. An impressive decline, but not 17x.
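For anyone who wants to check my arithmetic, here is a small Python sketch. The container sizes are my own eyeball guesses from above (a 1.5 liter jug and a 3 ounce shot glass), not measurements from the Times, and I am assuming US fluid ounces.

```python
ML_PER_US_FL_OZ = 29.5735  # milliliters in one US fluid ounce

jug_ml = 1.5 * 1000            # eyeballed 1980 container: a 1.5 liter jug
shot_ml = 3 * ML_PER_US_FL_OZ  # eyeballed 2010 container: a 3 ounce shot glass

volume_ratio = jug_ml / shot_ml  # the decline the pictures suggest
actual_ratio = 70 / 40           # the decline the axis reports (% of teens)

print(f"decline suggested by the containers: {volume_ratio:.1f}x")
print(f"decline shown on the axis:           {actual_ratio:.2f}x")
print(f"exaggeration factor:                 {volume_ratio / actual_ratio:.1f}x")
```

The last line is the ratio of the two declines: roughly how much bigger the pictures make the change look than the numbers warrant, under my guessed container sizes.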

Now that this graph has been caught red-handed, and we have it in a holding cell while it calls its lawyer, we can investigate further. Notice that the vertical axis only goes down to 40%? That is another devious trick to exaggerate the magnitude of a difference. If the axis had extended all the way to zero, the difference between 1980 and 2010 would not seem quite so impressive. (That would provide what we call a “ratio scale,” for the technically inclined.) And since we are plotting a fraction of teenagers, maybe it would be fair to extend that axis all the way from 0 to 100%, further reducing the apparent magnitude of the change.

And another thing: why are the bottoms of the bottles and glasses jumping all over the place? If the top is meant to indicate the value, it would only be fair to keep the bottom stationary.

And while it feels like piling on, what is going on with the horizontal position of the containers? Their positions seem to jump around a bit, and there are different numbers in each decade. Did they forget to make the measurement in certain years? Or is the artist just exploiting their “artistic license?”

This graph is an instance of what is often called an “infographic.” An infographic is to a graph what an infomercial is to information. A bastard form in which information takes second place to entertainment or marketing. Look! Little bottles! What fun! One could imagine a form in which entertainment was provided and truth was retained, but regrettably that is rarely to be seen.

In the printed version of the magazine, this graph is attributed to O.o.p.s. They should be ashamed. But the Times cannot escape the blame for this many-count indictment of graphical crime.

For completeness, we show a less entertaining but more accurate plot of the same data. It shows the full range of fractions from zero to one, and does not introduce extraneous dimensions. The change in teenage behavior is significant, but not exaggerated by multidimensional trickery.

Reference:

New York Times

Well: The Kids Are More Than All Right

By TARA PARKER-POPE

Published: February 2, 2012

http://well.blogs.nytimes.com/2012/02/02/the-kids-are-more-than-all-right/

All that glitters is not Silver

“Love is blind.”

So begins a teasing article in the New York Times Sunday Magazine, by Nate Silver, the current wunderkind of popular statistics. “Popular statistics,” now that I think about it, is almost the definition of an oxymoron, and it is to Nate’s credit that he has made it possible to utter such a phrase without puzzlement. The gist of the article is that in the dating game you are more likely to get lucky on a Wednesday night than on any other night of the week. The article is accompanied by a massive “infographic” that occupies more than half of a page.

Debate has raged over the years about “decoration” of graphs, and while I am obviously firmly in the minimalist camp, I am not a wild-eyed fundamentalist. A little furbelow here and there is harmless, provided that it does not obscure or distort the data.

Regrettably, young Nate has been kidnapped by the graphic artistes at the Times, who have never met a graph that could not be obscured or distorted. Witness below their artsy creation.

Note that there is an overall graph, for the days of the week, and within each day, a graph for hours of the evening. From a visual point of view, the most prominent effect is the trend over days. What exactly is plotted by this larger graph, for days of the week? A little scrutiny will reveal that it plots: nothing! The top of each bar is offset from the actual data, for any hour, by bizarrely random amounts. This is not decoration, it is desecration.

But suppose we extract the data, and plot them correctly. For days of the week, which is the primary focus of the article, it might be sensible to take the average “score” over the evening hours, and plot that. If we do so, we get the graph below.

Wow! No wonder they call it hump-day! Look at that massive effect! Except of course, that a glance at the scale reveals that the needle, so to speak, has barely budged. A more correct rendition of the data, showing the variation as a fraction of the total score (a ratio scale), is shown below.

Umm…never mind. For all practical purposes, every night is the same. The main point of the article is, how shall we put it, nonsense.

And what about the numbers for the different hours of the evening? Even though they are hard to see, at least they are big effects, right? Of course not. Here is the average score for the various hours of the evening, plotted on a ratio scale.

I don’t want to be Miss Grundy, and I know even serious statistics wonks need a night out every once in a while, but even if “love is blind,” Nate really ought to reconsider the artsy types he hangs out with. Whichever night it was, he didn’t get lucky.

Reference:

New York Times Magazine

Wednesday Night Is All Right for Loving

By NATE SILVER

Published: June 3, 2011

Approaching the singles scene statistically.

http://www.nytimes.com/2011/06/05/magazine/nate-silver-wednesday-night-is-right-for-loving.html

We did it! (again)

Anyone living in the current millennium who is even moderately engaged in the politics of our time receives a daily onslaught of political email. Most of this is designed to stoke our outrage at the latest unspeakable act by the other side, and beseech us to send just a few dollars to answer this assault on all that is good. And sometimes the messages exhort us to send our money so that we can show that our side is collecting the most money, and must therefore be most in tune with the public. And if we have done as we are told, at the end of the fundraising cycle, we may get a message with the exultant cry: “We won!” As if the real battle was over fundraising, rather than the deficit, health care, two wars, or financial regulation.

These messages annoy me somewhat, but nothing sends me into apoplexy like a bad graph. And the latest missive had a doozy: a graph that should shame the artist into permanent retirement from the field of graphology.

Behold Exhibit A: a graph that purports to show the fundraising results for Democratic and Republican groups in the last fundraising cycle. And as the DCCC says: “We did it!”

The numbers tell one tale: evidently $4.4M for the dems and a measly $3.0M for their opponents.

But there is something a bit weird about this picture. The blue bit seems out of proportion. As a professional graphologist (don’t try this at home) I quickly noted one obvious possible source of distortion: the width of the bars. Why is the blue bar wider? Are they using area to represent the number of dollars? That would be ok, though I have my doubts about our ability to judge relative areas. But when I took out my digital ruler, I discovered to my amazement (not!) that the areas were not in the ratio of 4.4/3 = 1.47, but instead in the ratio 2.54! Yes that’s right, the blue rectangle has 2.54 times the area of the red rectangle. If area were accurately representing dollars, then the NRCC would have raised only $1.73M, not $3.0M. The red area is off by 73%!

Ok, I thought, so the increased width of the blue rectangle is just some sort of rhetorical flourish. It must be the height that represents the two quantities. Again, amazement overcame me. The heights are wrong too! The blue is 1.72 times higher than the red, rather than 1.47 as it should be.

In other words, the only thing the artist got right was that the DCCC got more dollars than the NRCC. For this, we need a graph?
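The measurements above can be checked with a short Python sketch. The dollar figures are the ones printed on the graph, the area and height ratios are my own digital-ruler readings, and the variable names are invented for illustration.

```python
dccc, nrcc = 4.4, 3.0  # millions of dollars, as printed on the graph

true_ratio = dccc / nrcc  # the ratio an honest picture would show
area_ratio = 2.54         # measured blue area / red area
height_ratio = 1.72       # measured blue height / red height

# Dollars the red rectangle would stand for if area were honest.
implied_nrcc_by_area = dccc / area_ratio

# Factor by which each measured ratio overstates the true one.
area_overstatement = area_ratio / true_ratio
height_overstatement = height_ratio / true_ratio

print(f"true ratio:              {true_ratio:.2f}")
print(f"NRCC implied by area:    ${implied_nrcc_by_area:.2f}M")
print(f"area overstates by:      {(area_overstatement - 1) * 100:.0f}%")
print(f"height overstates by:    {(height_overstatement - 1) * 100:.0f}%")
```

Neither reading of the picture, by area or by height, recovers the ratio of the two dollar amounts.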

Ok, so what would the correct graph look like? Assuming we use height to represent dollars, here it is. We put some numbers on the axis to make it clear that zero is included.

So, we come to the end of our sad tale. What is the lesson? Simple: if you are going to use a graph to represent numbers, make sure it actually represents the numbers.

Reference:

Email of March 8, 2011, from Rep. Steve Israel, DCCC Chairman <dccc@dccc.org>.
