View Big Data Rationally

Big data is a hot topic, but we should approach it rationally. Big data is not perfect.

Sampling bias always exists, and big data cannot escape the limits of statistics. First, what is sampling bias? A good example comes from the Second World War. The British Royal Air Force wanted to strengthen its aircraft armor against German anti-aircraft fire. Because carrying capacity was limited, only part of each aircraft could be reinforced, so they asked a statistician for help. After carefully studying the bullet holes on planes that had returned to base, he reached a surprising conclusion: reinforce the areas that showed no bullet holes. He explained that planes hit in those areas had crashed and never made it back, so they were missing from the sample. Statistics uses a part to infer the whole, or the past to predict the future.
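The effect in this story can be reproduced with a small simulation. The sketch below is only an illustration: the section names and the loss probabilities per hit are invented, and each plane is assumed to take exactly one hit in a uniformly random section.

```python
import random

# Hypothetical probability that a single hit in each section brings the plane down.
# These numbers are invented for illustration, not historical data.
LOSS_PROB = {"engine": 0.8, "cockpit": 0.6, "fuselage": 0.1, "wings": 0.1}


def simulate(n_planes=10_000, seed=0):
    """Count hits per section for all planes and for the planes that return."""
    rng = random.Random(seed)
    sections = list(LOSS_PROB)
    all_hits = dict.fromkeys(sections, 0)
    returned_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        section = rng.choice(sections)  # one hit, uniformly at random
        all_hits[section] += 1
        if rng.random() >= LOSS_PROB[section]:  # the plane survives and returns
            returned_hits[section] += 1
    return all_hits, returned_hits


all_hits, returned_hits = simulate()
for section in LOSS_PROB:
    print(f"{section:9s} all={all_hits[section]:5d} returned={returned_hits[section]:5d}")
```

Under these assumptions every section is hit about equally often, yet the returning planes show far fewer engine and cockpit hits. An analyst who sees only the returned planes would wrongly conclude that those sections are rarely hit.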

Its biggest weakness, however, is also sampling bias, which can invalidate a conclusion when a part is used to infer the whole. In the era of big data, sampling bias still affects the accuracy of conclusions. For technical and commercial reasons, the data that is collected cannot cover every scenario. Besides, even when we know the past data well, the world keeps changing, so the past is an imperfect guide to the future.

A conclusion drawn from big data holds for the population as a whole, not for each individual. Even if the accuracy is 99%, there are still one million mismatches when the number of samples is 100 million.
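The arithmetic behind this figure is easy to check. The short sketch below uses integer math to avoid rounding; the variable names are chosen here for illustration.

```python
samples = 100_000_000   # number of samples from the example above
accuracy_percent = 99   # stated accuracy

# Mismatches are the share of samples the conclusion gets wrong.
mismatches = samples * (100 - accuracy_percent) // 100
print(mismatches)  # 1000000
```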

Big data can reveal correlation, but not causation. For example, as ice cream sales rise, the number of drownings also rises; the two are positively correlated. So do ice cream sales cause drowning? Of course not. Hot weather increases both ice cream sales and the number of people playing in the water.
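The ice cream example can be sketched with synthetic data. In the simulation below, temperature drives both ice cream sales and drownings, and the two never influence each other. All coefficients and noise levels are invented for illustration. Removing the linear effect of temperature from both series (a simple partial correlation) makes the apparent relationship vanish.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)
# Hypothetical daily data: temperature is the common cause of both series.
temperature = [rng.uniform(10, 35) for _ in range(5000)]
ice_cream = [20 * t + rng.gauss(0, 50) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

raw_corr = pearson(ice_cream, drownings)
partial_corr = pearson(
    residuals(ice_cream, temperature), residuals(drownings, temperature)
)
print(f"raw correlation:     {raw_corr:.2f}")
print(f"after removing temp: {partial_corr:.2f}")
```

The raw correlation is strongly positive, while the correlation after controlling for temperature is close to zero. This is what we would expect when a third factor explains both trends.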

In conclusion, big data has its own limitations. Data analysis alone cannot guarantee a correct conclusion. In most cases, traditional analytical approaches and experience should be considered together with big data.


