Once upon a time I hated statistics and avoided related courses as an undergrad. But I soon realized that it is the most useful field of science if you want to learn the facts about real life. It is also what people use to deceive you. Usually we have some checks and balances, and we can use a bit of common sense to see through lies. Here is a lie of this kind: “Recent research shows that 100% of the people who died in the last three months had been drinking water daily”. Water is killing people! Does that lie seem too obvious? Well, there are similar statistical lies that you see every day but cannot identify, because your defense mechanisms quickly dissolve when the subject you are being lied to about is beyond your reach and you cannot check the facts.
It has been a long time since I hated statistics, and nowadays I think that every kid should take a course in statistics before growing up and facing information pollution. Books such as How to Lie with Statistics (http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728) should be essential reading.
From that book, you can learn about statisticulation (statistical manipulation). I will explain it with an example.
Take this infographic from CNN. It shows the change in the population living with HIV between 2001 and 2009.
Isn't Turkey just terrible? It seems that Turkey is worse than many African countries, and all European ones. Poor people in Turkey must be dying from AIDS all over the country, right? The red color just makes everything look more dramatic. There is an increase of up to 200% in the number of HIV-infected people in Turkey.
At this point, you would like to know how many people are infected. After all, Turkey has a population of 75 million, so judging from the graph there could be millions of deaths. The actual numbers would help you see through the hoax, so they do not show you any.
The graph above is the definition of statisticulation. I cannot think of a better example. If you are from a country around Turkey, or better, if you are from Turkey, you would get this eerie feeling that the graph is somehow misleading, because AIDS is not really a problem in Turkey.
Let’s turn our attention to facts. There is a useful site, indexmundi, which gives AIDS figures as a percentage of the population. In Turkey, one in every thousand people between the ages of 15 and 49 has AIDS. The rates in some African countries are 200 times higher. Western countries are really in big trouble with their AIDS rates. So how did CNN give a higher risk score to Turkey?
CNN shows the increase as a percentage. So if you had 1 person infected in 2001 and 2 infected people in 2009, you have a 100% increase. If another country had 1 million infected people in 2001 (some really do, unfortunately) and 0.95 million infected people in 2009, it has a decrease, so its color will be green. There goes the rich red color. Turkey is very risky!
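The arithmetic behind this trick is easy to check for yourself. Here is a minimal sketch in Python. The two "countries" and their counts are just the made-up numbers from the paragraph above, not real data, and `percent_change` is a helper name I picked for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) * 100 / before


# Hypothetical counts taken from the example in the text, not real data.
countries = {
    "Country A": (1, 2),                # 1 infected person becomes 2
    "Country B": (1_000_000, 950_000),  # 1 million becomes 0.95 million
}

for name, (in_2001, in_2009) in countries.items():
    change = percent_change(in_2001, in_2009)
    absolute = in_2009 - in_2001
    # Print the relative change next to the absolute change in head count.
    print(f"{name}: {change:+.0f}% change, {absolute:+,} people")
```

Country A comes out at +100% on the strength of one extra case, while Country B comes out at -5% even though it still has 950,000 infected people. A map colored only by the first number would paint Country A red and Country B green.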
How do you like this lie? Do you think this CNN infographic just came from an incompetent staff member, or are they really sugarcoating the AIDS problem? You are presented with such graphs every day, so make sure that you read the description.
Either way, you would expect CNN to hire a good data scientist and present clear graphics.