Thursday, October 29, 2009

The problem with averages

For any set of data, there are numerous ways to measure central tendency, which is the typical or "middle" value of the data. The most common measure is the "arithmetic mean", better known as the "average". Everyone knows how to calculate an average: just add up all the values and divide by the number of values. This is how your bowling average is calculated. There are other ways to measure central tendency, such as the median, but I won't get into that. I would just like to make the point that the average is not always the best measure of central tendency.

In bowling, your average is used to establish your handicap. It is intended to represent the central tendency of your level of skill. Someone with a high average is "better" than someone with a low average, right? This idea pervades sports. We apply it to batting averages, field goal or free throw percentages, earned run averages, and the like (none of which are technically averages). Well, this idea that an average reflects skill is not necessarily true. Averages can be wildly off, particularly with small sample sizes, because a single unusual value, or outlier, can skew the result badly.

Let's say that over 100 games of bowling, you would average around 140. After your first three games, however, you get 139, 210, and 143. You are now sitting on an average of 164. This is not a good reflection of your skill. With time and more games, your average will come down to reality, but it will take a while. The average could also be skewed the other way, say if you rolled 120, 85, and 143.
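To put numbers on "it will take a while", here is a quick sketch in Python. It checks the two averages above and then counts how many more games it takes for the inflated average to drift back near 140. The function names, the 5-pin tolerance, and the idea that every later game comes in at exactly 140 are my own simplifying assumptions, not anything the league uses.

```python
def average(scores):
    """Plain arithmetic mean: add up the scores, divide by the count."""
    return sum(scores) / len(scores)


def games_until_within(scores, true_average, tolerance):
    """Count the extra games needed before the average is within
    `tolerance` pins of `true_average`.

    Assumption (for illustration only): every additional game is
    bowled at exactly `true_average`.
    """
    scores = list(scores)
    extra_games = 0
    while abs(average(scores) - true_average) > tolerance:
        scores.append(true_average)
        extra_games += 1
    return extra_games


hot_start = [139, 210, 143]
cold_start = [120, 85, 143]

print(average(hot_start))    # 164.0
print(average(cold_start))   # 116.0
print(games_until_within(hot_start, 140, 5))  # 12
```

Under those assumptions, one 210 game takes 12 more games (four nights of bowling) to wash out to within 5 pins of a 140 bowler's real level.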

Why do I bring this up? Well, we have lost 15 of 16 games, and in part this can be attributed to this unfortunate property of averages. During our first week of bowling this season, we (and especially I) had an outlier of a day. We bowled very, very well. Except for Johnebob, we all started the season with somewhat inflated averages. This has put us at a major disadvantage. It was not difficult to see this coming (see here and here).

If you look at our team handicap over this time period, it shows the classic signs of being skewed by outliers. For the first three weeks of the season, it was constant because it was based on our averages at the end of last year (very long-term averages). In Week 4, it was adjusted based on our bowling over the first three weeks. We started very strongly; I would say too strongly. In all, our handicap dropped 43 pins per game. This was a big deal. At three games a night, it took 129 pins off our final total every night.

If that handicap were a good reflection of our level of skill, it would have just stayed there, but instead it has been slowly climbing back up to where it should be. In part, our recent losing streak can be attributed to this simple fact. Over this time period, we have lost some close ones. If our handicap had been where it should have been, we would have won a few of those games. Instead, we were screwed over by bowling too well to start the season.

That bothers me. I do not like a system that punishes bowling well because it encourages sandbagging. The ideal strategy for a team in the current system would be to intentionally bowl very poorly for the first three weeks, and ride the falsely high handicap to victory for many weeks to follow. Maybe this is what other teams do. I don't know. One way around this problem would be to use a running average that carries over from the previous year. For example, if your average was based on the last 20 times you bowled, it should eliminate this problem.
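Here is a rough sketch of what that running average could look like in Python. The function name, the 20-game window, and the made-up score history (20 games of exactly 140 last season, then this season's 139, 210, and 143) are all my assumptions for illustration. I am also counting each game as one entry, where "times you bowled" could just as easily mean nights.

```python
from collections import deque


def running_average(scores, window=20):
    """Average of only the most recent `window` scores.

    `scores` is ordered oldest to newest and may span seasons.
    A deque with maxlen silently drops the oldest entries.
    """
    recent = deque(scores, maxlen=window)
    return sum(recent) / len(recent)


# Hypothetical history: a steady 140 bowler last season...
last_season = [140] * 20
# ...who opens the new season with one outlier game.
this_season = [139, 210, 143]

season_only = sum(this_season) / len(this_season)
carried_over = running_average(last_season + this_season)

print(season_only)   # 164.0
print(carried_over)  # 143.6
```

With the carry-over, the hot start moves the average by 3.6 pins instead of 24, so there is much less to gain from tanking the first three weeks.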
