Thursday, November 11, 2010

Percentages can mask underlying data

Yesterday, SocImages posted a graph from Yglesias' Think Progress blog:

One of the comments on the SocImages post was by Kemlyn:
So, the 31-64 crowd is not particularly interesting?
To which I responded:

Kemlyn's question ("So, the 31-64 crowd is not particularly interesting?") is crucial. Why? If you add the "under 30" and "over 65" percentages in 2008, you get ~34% (~18%+~16%=~34%). If you add the same two percentages in 2010, you again get ~34% (~11%+~23%=~34%). This implies that the share of the "31-64 crowd" (the remaining ~66%) didn't significantly change.

This relative lack of change, then, magnifies the underlying trend in the data, since any change in the percent of the "under 30" group is effectively added to the "over 65" group. In other words, there is effectively double-counting in the changing percentages. What is needed is to look at the actual number of voters in each age group to determine how the overall voter demographics shifted.

Doing a back-of-the-envelope calculation using these presented numbers as well as the total voter estimates, I find this:

2008: 130 million voters * 18% = 23.4 million "under 30"
2008: 130 million voters * 16% = 20.8 million "over 65"

2010: 90 million voters * 11% = 9.9 million "under 30"
2010: 90 million voters * 23% = 20.7 million "over 65"
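The same back-of-the-envelope arithmetic can be written as a short Python sketch. The dictionary and function names here are my own illustrative choices, and the inputs are just the approximate shares and turnout totals quoted above, so the outputs are estimates and not official counts.

```python
# Approximate total turnout and exit-poll shares, as quoted in the post.
TOTAL_VOTERS = {2008: 130_000_000, 2010: 90_000_000}
SHARE = {
    2008: {"under 30": 0.18, "over 65": 0.16},
    2010: {"under 30": 0.11, "over 65": 0.23},
}


def estimated_voters(year, group):
    """Estimate an absolute voter count: total turnout times the group's share."""
    return TOTAL_VOTERS[year] * SHARE[year][group]


for year in (2008, 2010):
    for group in ("under 30", "over 65"):
        millions = estimated_voters(year, group) / 1e6
        print(f'{year}: {millions:.1f} million "{group}"')
```

Multiplying each share by its own year's total is the whole point: the two years have very different denominators, so the percentages alone cannot be compared directly.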

So, yes, the demographics did change between 2008 and 2010, but there was no significant jump in the total number of over-65 voters, which is what the graph can easily be read to imply. Instead, the number of voters in the "under 30" category declined by ~58%, while the "over 65" group effectively didn't decline at all (<1%). Yglesias' blog, for example, describes the data this way:
According to exit polls, for example, the relative proportion of youth voters and senior voters shifted quite dramatically

While this is true, the underlying message of "voters over 65 turned out in roughly the same numbers in both 2008 and 2010" is hidden by the presented graph.
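The ~58% and <1% declines mentioned above follow from the estimated voter counts. Here is a minimal sketch of that check; `percent_change` is a helper name I made up for illustration, and the inputs are the rounded estimates in millions.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Estimated voters in millions, as (2008, 2010) pairs.
estimates = {
    "under 30": (23.4, 9.9),
    "over 65": (20.8, 20.7),
}

for group, (v2008, v2010) in estimates.items():
    print(f"{group}: {percent_change(v2008, v2010):+.1f}%")
# under 30: -57.7%
# over 65: -0.5%
```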

I dislike this kind of graph because it obfuscates the underlying numbers by comparing relative values, instead of absolute values. True, it is important to know how the voters represent the population as a whole, but the manner in which the above graph does this is (at best) poorly rendered.

UPDATE 1 (11/11/10 12:07 PM): Here is a chart showing the estimated voter numbers:

This graph uses the numbers I calculated above. As you can tell, the estimated number of "Over 65" voters remained roughly the same between 2008 and 2010, while the number of "Under 30" voters dropped dramatically. If Yglesias had used this chart, the message of "the relative proportion of youth voters and senior voters shifted quite dramatically" would have become less important: we wouldn't have to worry about what the relative numbers mean, because we could see the absolute condition directly. Fewer "Under 30" voters showed up at the polls, while the number of "Over 65" voters remained roughly constant.

As a result, the conclusions a casual viewer might draw from this graph would be different from those drawn from the one presented by Yglesias.

UPDATE 2 (11/11/10 1:22PM): Upon further consideration, I made a graph showing the change in voter turnout between the 2008 and 2010 elections, including the 31-64 crowd.

Here, the narrative becomes relative again, but only between years for each voter group, and not between voter groups.
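A rough sketch of the year-over-year comparison for all three groups might look like the following. Note the assumptions: the "31-64" count is inferred here as whatever remains after subtracting the other two groups from the total (the exit-poll figures quoted above do not give it directly), and the names `group_counts` and `turnout_change` are hypothetical helpers.

```python
# Approximate totals (in millions) and shares, as quoted earlier in the post.
TOTAL = {2008: 130.0, 2010: 90.0}
SHARE = {
    2008: {"under 30": 0.18, "over 65": 0.16},
    2010: {"under 30": 0.11, "over 65": 0.23},
}


def group_counts(year):
    """Estimated voters per group (millions); '31-64' is the inferred remainder."""
    counts = {group: TOTAL[year] * share for group, share in SHARE[year].items()}
    counts["31-64"] = TOTAL[year] - sum(counts.values())
    return counts


def turnout_change(group):
    """Percent change in a group's estimated voter count from 2008 to 2010."""
    before = group_counts(2008)[group]
    after = group_counts(2010)[group]
    return (after - before) / before * 100.0


for group in ("under 30", "31-64", "over 65"):
    print(f"{group}: {turnout_change(group):+.1f}%")
```

Under these assumptions the "31-64" group falls by about the same fraction as total turnout (130 million to 90 million), which is what you would expect from a group whose share of the electorate stayed at roughly 66% in both years.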
