Showing posts with label statistics. Show all posts

Sunday, February 20, 2011

Sunday Numbers 2.0, Vol. 1: Return to football and mortality.

Over the past few weeks, I've received e-mails from readers saying they missed the Wednesday Math posts. I was a little surprised, but I did do 130 of them, so I could see how folks felt like it was part of the routine.

Right now, prepping classes is cutting into my precious blogging time during the week, so I am going to resurrect the Wednesday tradition on Sundays instead. I'm calling it Sunday Numbers 2.0 to distinguish these new posts from the first set of Sunday Numbers in 2008, which used my system called Confidence of Victory to predict the result of the presidential election. Those predictions were remarkably close to the actual landslide electoral victory of Obama over McCain, thank Odin, Krishna and the li'l baby Jesus.


About a year ago, I did a couple posts on professional sports and mortality. This week, a fellow named Jim Zimmerman who runs oldestlivingprofootball.com added a comment to the thread from last year, so I gave his site a visit.

A website full of numerical data sorted in an easy to understand way.

Honestly, I couldn't be happier if you sent me beer.

I took a couple hours to get the data, changed the dates of birth and death to years instead of specific days, and sorted it both by year of death and by year of birth. Here are some of my early findings.



How has the Age of Steroids affected mortality of NFL players? This chart shows the average age of players who died in each year from 1980 to 2010. The general public started noticing steroid use in the late 1990s in baseball, but the premature deaths of John Matuszak and Lyle Alzado a decade earlier made steroid use in football a topic of conversation then. As we can see, the general trend is upwards, as it is for the public in general.

There is a fact that skews the data in favor of longer life expectancy in more recent years that has nothing to do with improved health. More football players are living to be 90 or more because there were more professionals as time went on. If a man died in 1980 and he was more than 90, he had to be born in 1890 or earlier. If a man in his nineties died last year, he was born between 1910 and 1920, and probably played football in the 1940s or 1950s. There were more football players in that era than in the earlier era simply because the league started in the early 1920s.


Are more football players dying young as we move forward in time? Again, I looked at the years of death 1980 to 2010. If someone died in 1980, they likely played the game in the era from 1950 to 1980, which was not the age of steroids. For those who died young in 2010, their playing days would have been in the era of 1980 to 2010, when steroid use is assumed to be more prevalent. Looking at the graph to the left, we see the percentage of NFL players dying before the age of 50 fluctuates quite a bit year by year and the trendline (or line of regression) is almost flat. More than that, the correlation coefficient is incredibly weak, so the data does NOT let us state that steroid use has been a significant factor in premature death of the population of NFL players.

How do the ages at death of NFL players compare to the general population? For this question, I needed some sample from the general population that would be fair to compare to the list of NFL players who died in 2010. My method was to look at recent obituaries from the Associated Press. Both these data sets would be very different from a list of the deceased at a hospital because neither list of celebrated people is going to have any infant deaths or deaths of teenagers. In the A.P. obituaries, I excluded anyone whose celebrity was being the Oldest Living Person, and I only took the obits that mentioned the age at death in the first paragraph. Here are the statistics I used for my tests.

2010 deaths of former NFL players
n = 134, average = 73.28, standard deviation = 16.82
% under 50 = 10.4%
% between 50 and 59 = 9.0%
% between 60 and 69 = 19.4%
% between 70 and 79 = 16.4%
% between 80 and 89 = 32.8%
% 90 and over = 11.9%

100 deaths from A.P. obituaries, late 2010 to early 2011
n = 100, average = 77.65, standard deviation = 14.99
% under 50 = 5%
% between 50 and 59 = 7%
% between 60 and 69 = 13%
% between 70 and 79 = 23%
% between 80 and 89 = 28%
% 90 and over = 24%

5% of celebrities who died were under 50 compared to 10.4% of football players. Is that significant? Good question, hypothetical question asker. With sample sizes this small and splitting into two groups for each set, under 50 and 50 or over, the chi-square test does not give us a test statistic that reaches even the 90% significance level. (test stat = 2.278, 90% threshold = 2.706.)

If instead we do a chi-square test and split both data sets into six categories (under 50, 50-59, 60-69, 70-79, 80-89, 90 and over), we get a test stat that does cross the 90% threshold, but not the 95%. (test stat = 10.368, 90% threshold = 9.236, 95% threshold = 11.071). The categories that add the most to the test stat are the 90 and over numbers, and this can be at least in part attributed to league expansion. In 1960, the American Football League began, effectively doubling the number of professional football teams. When the leagues merged in 1970, there were 26 teams. There are now 32, but this increase is not as significant as the big jump ten years earlier.
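
For readers who want to check the two chi-square numbers, here is a minimal sketch in Python. It rests on two assumptions of mine that the post does not spell out: the counts are recovered by multiplying each listed percentage by n and rounding (14 of 134 under 50, and so on), and the statistic is the plain Pearson chi-square with no continuity correction. The function name `chi_square_stat` is just for illustration.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if age group were independent of which list a death came from.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# ASSUMPTION: counts rebuilt from the listed percentages, rounded to whole people.
# Order: under 50, 50-59, 60-69, 70-79, 80-89, 90 and over.
nfl = [14, 12, 26, 22, 44, 16]   # sums to 134
ap = [5, 7, 13, 23, 28, 24]      # sums to 100

# Two groups per list: under 50 versus 50 or over (1 degree of freedom).
stat_two_groups = chi_square_stat([[nfl[0], sum(nfl[1:])],
                                   [ap[0], sum(ap[1:])]])

# All six age groups (5 degrees of freedom).
stat_six_groups = chi_square_stat([nfl, ap])

print(round(stat_two_groups, 3), round(stat_six_groups, 3))
```

Under those assumptions the two results come out at roughly 2.28 and 10.37, in line with the test stats quoted above, and they get compared to the thresholds for 1 and 5 degrees of freedom respectively.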

The general celebrity list had an average age at death four years higher than the NFL list from 2010. Is that difference significant? Yes, it is. The test statistic t = 2.093 does get above the 95% significance threshold. Part of this is because a greater percentage of the celebrities than of the football players died at 90 or older, and again that can be partly attributed to league expansion. If instead we try to factor this out by looking at only the deaths at ages of 89 or less, of course the average ages of both groups go down dramatically. Here are the new numbers for those two data sets.

2010 deaths of former NFL players, 89 and younger
n = 119, average = 70.52, standard deviation = 16.0

deaths from A.P. obituaries, late 2010 to early 2011, 89 and younger
n = 76, average = 72.45, standard deviation = 13.32

Besides the averages going down, the difference goes from 4 to 2, the data sets get smaller and the standard deviations shrink slightly. The shrinking standard deviations would tend to increase the test statistic, but that small pressure to go up is overwhelmed by the smaller difference and sample sizes. The test statistic t = 0.911, which is not statistically significant at all.

What if we leave the NFL players alone and remove half the over 90 deaths from the A.P. list? Hypothetical question asker, that's not a bad idea for adjusting the data to compensate for league expansion. Let's give that a shot.

2010 deaths of former NFL players
n = 134, average = 73.28, standard deviation = 16.82

100 deaths from A.P. obituaries, late 2010 to early 2011, half the over 90s removed
n = 88, average = 75.65, standard deviation = 14.66

The new test stat is t = 1.11, not statistically significant at these data set sizes.
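
The three t statistics in this post can be rebuilt from the summary numbers alone. The sketch below assumes the two-sample statistic with an unpooled standard error (the form often called Welch's t); the post doesn't name the formula, so treat that as my assumption, and `unpooled_t` as a made-up helper name.

```python
import math

def unpooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic using an unpooled (Welch-style) standard error."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# A.P. obituaries versus NFL players, all ages.
t_all = unpooled_t(77.65, 14.99, 100, 73.28, 16.82, 134)

# Both lists restricted to deaths at 89 and younger.
t_under_90 = unpooled_t(72.45, 13.32, 76, 70.52, 16.0, 119)

# Half the 90 and over deaths removed from the A.P. list, NFL list untouched.
t_half_removed = unpooled_t(75.65, 14.66, 88, 73.28, 16.82, 134)

print(round(t_all, 3), round(t_under_90, 3), round(t_half_removed, 2))
# prints 2.093 0.911 1.11
```

The first value clears the usual two-tailed 95% cutoff of about 1.97 for samples this size, and the other two fall well short of it.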

==

I want to thank Jim Zimmerman once again for maintaining this very nice website for mortality statistics of former professional football players. I still have all the info in an Excel file, so there may be more data mining in the future for my Sunday Numbers 2.0 posts.

Whew, that's lotsa 'splainin'. Glad it's a Sunday.

Next week: Perpendicular.

Saturday, March 13, 2010

Do football players die younger than the rest of us?


I was wandering around the 'Net this week when I ran into a report that pro football players die significantly younger than the general public, and that each year a pro stays in the game takes several years off his life. You can read such reports here, here and here, some written by doctors and others quoting studies by doctors. The article from Newsmax starts with this paragraph.

It is not a widely disseminated, downloaded or discussed fact that the average life expectancy for all pro football players, including all positions and backgrounds, is 55 years. Several insurance carriers say it is 51 years.

Wow, that's written by doctors and backed up by insurance companies. It must be true. Except that other doctors say it's 59.

I know a little about life expectancy from teaching statistics and this set my spider sense tingling. For one thing, people kept quoting different numbers. For another thing, life expectancy in the 50s is what you get in the absolute worst parts of sub-Saharan Africa, and a large part of those numbers being so low is high rates of infant mortality skewing the numbers down. The data for pro sports athletes won't be influenced at all by infant mortality. You have to live past your first birthday to play pro sports.

I don't have the time to compile massive sets of data, so I designed an experiment to see if these numbers had any validity. I took the 1960 rosters from professional football and professional baseball, selected 100 guys from each sport who were born in 1934 or 1935, then checked on Wikipedia to see if they were dead or alive. If a name didn't come up, I went to the Baseball Almanac or nfl.com. I used Wiki first because its search engine is way better. Also, nfl.com just says "deceased" instead of giving a date of death, and I only had to go there for the most obscure players, a total of three guys.

Since these athletes were between the ages of 24 and 26 half a century ago, we would expect by actuarial probability that some would be dead by now, and of course this is correct. 29 of the 100 NFL veterans born in 1934 or 1935 are listed as dead as of March 13, 2010, compared to 30 of 100 MLB players, a completely insignificant difference. The ages at death are slightly different but not that much.

age at death__NFL__MLB
20-29__1__0
30-39__0__1
40-49__1__2
50-59__5__4
60-69__15__10
70-76__8__13

Baseball players who died in this study are more likely to have died after the age of 70, while football players are somewhat more likely to have died while in their 60s. Even so, the average age at death for football players was 63 and for baseball players was 64. While these numbers seem low, recall that all the guys on this list will eventually die, and 70% on both lists will be over the age of 74 when they go, so the averages will be over 71 at the absolute minimum, probably several years higher.
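
Here is the arithmetic behind that "over 71" floor, as a small Python sketch. The one assumption I'm adding is that every survivor dies at exactly 75, the smallest whole-number age that counts as "over the age of 74"; real survivors will mostly last longer, which only pushes the average up. The helper name `minimum_final_average` is invented for this example.

```python
def minimum_final_average(dead_fraction, avg_age_dead, min_age_survivors):
    """Lowest possible average age at death once the whole cohort has died."""
    alive_fraction = 1 - dead_fraction
    # Weighted average: those already dead keep their average,
    # and everyone still alive is counted at the youngest age they could die.
    return dead_fraction * avg_age_dead + alive_fraction * min_age_survivors

# ASSUMPTION: survivors die at 75 at the earliest.
nfl_floor = minimum_final_average(0.29, 63, 75)   # 29 of 100 dead, average age 63
mlb_floor = minimum_final_average(0.30, 64, 75)   # 30 of 100 dead, average age 64

print(round(nfl_floor, 2), round(mlb_floor, 2))
```

Both floors land between 71 and 72, so even this worst case is nowhere near a life expectancy in the 50s.
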

If I may critique my own study, it makes sense to do a similar data set for guys born in the mid 1920s and mid 1940s to see if baseball vs. football mortality rates show a greater disparity for those different demographic groups. But this first data set makes the idea that football players have an average life expectancy under 60 look very far-fetched indeed.

To the doctors who put their names on these studies, I'll make you a deal. I won't perform any surgeries or prescribe any drugs, and you should stick to addition, subtraction, multiplication and division, because it turns out math is hard for some of us.

But not me.