Attack of the little people

Where did they come from, the little people? Like a horde of replicants they have streamed forth to cover the world of infographics. No trendy depiction of any statistic related to humans is complete without the little people. Consider today's freshly populated example, from our favorite whipping boy, the New York Times.

The graphic is an attempt to put “into perspective” the numbers of people in poverty in the US. It does this by rounding up a bunch of little people, and penning them in various corrals that seem to have something to do with states or demographic groups. Hard to tell, since it is an expository jumble.

Let us ask a few questions of this graphic. First, the question that we ask of every such graphic: does the point leap out at you, in a flash of effortless cognition? Uh…let's see, half the people in poverty live in New York, and half in Texas? Fail!

Some more questions. If the orange little people are women and girls, why are they all wearing men’s business suits, albeit in a saucy feminine color? And do all the impoverished women and girls live in Texas? Rick Perry, are you aware of this? The state could at least provide more appropriate apparel for those in need. If you are a woman or girl, going to a job interview in an orange men’s business suit is not advisable, especially in Texas.

There seem to be a lot of impoverished white people (31.7 million), but amazingly, none of them live in Texas or New York. And if you think that is amazing…wait for it…none of them are men, boys, women, or girls. Maybe they are little people.

OK, but here is where it really gets crazy. There are 16.4 million aged 17 or younger in poverty. But evidently none of them are girls or boys!

What is the lesson? The little people are no substitute for clarity of expression. The artist is to be commended for attempting to make the numbers more meaningful, but the exercise is doomed from the start. First of all, there is a fundamental difficulty in trying to carve up a total population (those in poverty) into a large number of overlapping sets. To be an accurate depiction, the corrals (technically, we call these Venn diagrams) should contain the correct number of little people, but so also should the intersections between two or more corrals (e.g., Asian and male and living in Texas). Easier said than done (and it wasn’t that easy to say). Second, comparisons with state populations are problematic, since most Americans have only a dim sense of the population of any state, even their own.
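To get a feel for how quickly the corrals multiply, here is a minimal sketch that lists every non-empty combination of a handful of categories. The category names are my own illustrative picks, not the Times's exact groupings, and the function name `venn_regions` is hypothetical.

```python
from itertools import combinations

def venn_regions(categories):
    """Return every non-empty combination of the given categories.

    Each combination stands for one region that an accurate Venn diagram
    would have to fill with the right number of little people.
    """
    return [
        combo
        for size in range(1, len(categories) + 1)
        for combo in combinations(categories, size)
    ]

# Illustrative categories only (an assumption, not taken from the graphic).
categories = ["female", "aged 17 or younger", "Asian", "lives in Texas"]
regions = venn_regions(categories)

# With n categories there are 2**n - 1 non-empty combinations.
print(len(regions))  # 15
```

Four categories already demand 15 correctly sized regions, and each added category roughly doubles the count.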

As is so often the case, traditional methods of data representation are perfectly adequate, and much clearer than the sad corrals of little people. Below is my quick draft of a bar chart of the same data. I have used different colors to group the different sorts of comparisons (gender, age, ethnicity), and as a sop to the New York Times, included horizontal lines indicating populations of a few states (source

I hope you will agree that though my chart may be conventional, it is clear, and allows the viewer to make the comparisons that the Times felt were important.

The lesson? Beware the invasion of the little people. They look cute, and you figure they are so small they can’t do any harm. But invite them into your graphic, and they can create havoc. Advanced lesson: Venn diagrams are tricky to depict when many categories are involved.


New York Times, “The Impoverished States of America,” published September 17, 2011

State populations in my chart:

A dimension is a terrible thing to waste

Graphs consist of ink spread out over two spatial dimensions. Sometimes the ink is colored, sometimes it is electronic ink, but the point is the same: to use the innate two-dimensional pattern-seeking machinery of human vision to expose some pattern in the data. Because we have only two spatial dimensions, each is highly valuable, and should not be squandered.

Here is a recent graph showing salaries in various life science specializations in the years 2009 and 2010. The graph consists of various vertical bars, each for one specialization in one year, with a height proportional to salary.

The various bars are arrayed along the horizontal dimension. First question: what is the meaning of the horizontal dimension? Take your time. Think carefully. OK, give up? The answer is…nothing! The data are not sorted alphabetically by specialization, by rank in 2009, by rank in 2010, or by any other discernible criterion. This is an example of wasting a dimension. As noted above, spatial dimensions of a graph are extremely valuable, and should never be squandered. They are the levers that we use to lift your consciousness of trends or singularities in the data.

Here we could at least use the horizontal dimension to rank the various specializations, for example by 2009 income. This is illustrated in our cleaned up example shown below. Now we can quickly see the best compensated specializations in 2009. We have indicated the two years with connected colored lines rather than discrete bars. We have labeled the specializations, but in a muted gray so that they can be read but do not distract from the data.
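For anyone who wants to try this at home, here is a rough sketch of the idea: order the specializations by 2009 salary, then draw each year as a connected line over that ordering. The salary figures and specialization names are made-up placeholders, not the survey's data, and the helper names `rank_by_year` and `plot_ranked` are hypothetical. The plotting part assumes matplotlib is installed; it is not the code used for my own chart.

```python
# Placeholder salaries in USD: illustrative only, NOT the survey's figures.
salaries = {
    "Specialization A": {2009: 70000, 2010: 72000},
    "Specialization B": {2009: 95000, 2010: 91000},
    "Specialization C": {2009: 82000, 2010: 88000},
}

def rank_by_year(data, year):
    """Return specialization names ordered from highest to lowest salary in `year`."""
    return sorted(data, key=lambda name: data[name][year], reverse=True)

def plot_ranked(data, order):
    """Draw one connected line per year, with specializations in the given order."""
    import matplotlib.pyplot as plt  # imported here so the ranking works without it

    fig, ax = plt.subplots()
    positions = list(range(len(order)))
    for year in (2009, 2010):
        values = [data[name][year] for name in order]
        ax.plot(positions, values, marker="o", label=str(year))
    # Muted gray labels: readable, but they do not compete with the data.
    ax.set_xticks(positions)
    ax.set_xticklabels(order, rotation=90, color="gray")
    ax.set_ylabel("Salary (USD)")
    ax.legend()
    return fig

# The horizontal dimension now carries meaning: rank by 2009 salary.
order = rank_by_year(salaries, 2009)
print(order)
```

Calling `plot_ranked(salaries, order)` returns a figure in which any specialization whose 2010 point sits above its 2009 point went up, with no arrowheads required.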

The original graph also has a number of other features worthy of ridicule. What are those little blue flags flying everywhere? Tibetan prayer flags? No, they are labels that tell us the actual numerical value of each salary. Excuse me, if we wanted a table we would have asked for a table. The point of a graph is to let the story be told by shape and position, not by a bunch of numbers. Apart from being unnecessary, the blue tags add clutter, one of the great villains of graph design. Here clutter masks the vital information in the graph, and obscures the important contour: the tops of the bars.

Another odd feature of the graph is the arrowheads at the top or bottom of the bars for 2010. What are they for? Some study will reveal that they are telling us whether that specialization increased or decreased in 2010. Did that jump out at you? I thought not. And of course half the arrowheads are at the top, and half at the bottom, so the eye can never move smoothly across the set and perceive some pattern. In our replacement graph, whether salaries increased or decreased in 2010 is immediately obvious from the position of the two curves.

Other minor quibbles with this graph: 1) Why is 2009 to the right of 2010? By convention, dates usually increase to the right. 2) Why are the labels in all caps? Is there some reason to shout? 3) The number 2010 in yellow is barely legible. Yellow text on a white background is always a bad idea.


The Scientist, Life Sciences Salary Survey, 2010
