Traditional Data Protection Principles

To prevent private data from being misused, several data protection principles have already been established.

The consent principle requires that the data subject (the identified or identifiable natural person to whom the data relates) has given his or her unambiguous consent. However, as discussed in the last post, it does not work as well in practice as expected.

The fair and lawful processing principle means, in general, that processing shall not be contrary to any law. (Legislation) This is a broad principle and is largely uncontroversial.

The purpose (specification and use) limitation principle states that personal data may be collected for specified, explicit and legitimate purposes only, and may only be processed (used, shared, re-used) in ways compatible with those purposes. (OECD) The challenge is how to apply this principle when many uses of data are not known at the time of collection.

The data minimization and data quality principles require that the personal data processed be adequate, relevant and not excessive in relation to the purposes for which it is collected and processed. (OECD) The collected data, the selection of data sources, and the processing itself must fit, and not exceed, those purposes. Personal data must also be accurate, kept up to date, and retained no longer than necessary for the purpose for which it was collected and processed. However, the challenges are how to limit data collection when technology relies on inference, and thus on the potential of massive databases, and that "adequate data for the purpose of the processing" cannot be accurately defined in such a context.

The transparency principle obliges the controller to inform the data subject and to notify the national Data Protection Authority prior to any processing activity. (Fromholz, 2000) For example, many social media websites use a profile settings dashboard to inform users how their data is used.

The confidentiality and security of processing principle requires the implementation of appropriate technical and organizational measures, such as access control, logging, and encryption, to protect personal data from accidental or unlawful destruction, loss, alteration, unauthorized disclosure or access, or other forms of unlawful processing. (OECD)
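As one illustration of a technical measure, personal identifiers can be pseudonymized before storage so that raw values never sit in the database. This is only a minimal sketch using keyed hashing from Python's standard library; the salt handling and identifier names are invented for illustration, and real deployments would manage keys outside the code:

```python
import hashlib
import hmac

SECRET_SALT = b"replace-with-a-secret-key"  # illustrative only; keep real keys out of code

def pseudonymize(identifier: str) -> str:
    """Replace a personal identifier with a keyed hash so the raw value is not stored."""
    return hmac.new(SECRET_SALT, identifier.encode(), hashlib.sha256).hexdigest()

# The same input always maps to the same pseudonym, so records stay linkable
print(pseudonymize("alice@example.com") == pseudonymize("alice@example.com"))  # True
print(pseudonymize("alice@example.com") != pseudonymize("bob@example.com"))    # True
```

Keyed hashing (rather than a plain hash) means an attacker who obtains the stored pseudonyms cannot simply hash guessed e-mail addresses to re-identify people without also obtaining the secret key.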

As explained above, traditional data protection principles work well in many situations, but some challenges remain. Frequent communication between the data subject and the data controller is needed so that the data subject's privacy is not violated while the data controller still has adequate data to improve its service.



OECD. OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data. From:

Fromholz, J. M. (2000) The European Union Data Privacy Directive, 15 Berkeley Tech. L.J. 471, 472.

Legislation, USA (1992). Cable Television Consumer Protection and Competition Act of 1992. Retrieved 18 March 2010.


The Fact of Privacy Protection

To understand privacy in big data, we first have to know what personal data is. "'Personal data' means data relating to a living individual who is or can be identified either from the data or from the data in conjunction with other information that is in, or is likely to come into, the possession of the data controller." (Data Protection Commissioner) The basic principle is that any processing of data must be based on, and covered by, one of the legal grounds.




What are the legal grounds? The primary one is that "the data subject has given his or her unambiguous consent" to the processing. Consent should be given before processing starts and must contain an active indication of the subject's wishes; it should be specific and informed.

The question is: does everything work fine? Have we been informed? Yes, we all receive a document before we download an app or use a service from Google or Microsoft. But the fact is that "it would take the average person about 250 working hours every year, or about 30 full working days, to actually read the privacy policies of the websites they visit in a year" (World Economic Forum, 2013). Has everyone been reached? Think about Google Street View. There have been many news reports of lawsuits from people who were photographed by a Google Street View car without permission. Imagine a photo of your naked body being accessible to anybody who uses the Google Street View service. Do we really understand the meaning of a privacy agreement? In fact, people don't always make rational decisions in their own best interests.

In the next post I will discuss some principles for dealing with this situation.



Data Protection Commissioner. From:

World Economic Forum. (2013) Unlocking the Value of Personal Data: From Collection to Usage. From:

Privacy issues of Big Data

Before reading this post, it is worth watching the video below first.


Believe it or not, large amounts of our private information are already online, and the amount will only grow with the fast development of technology. If someone can access all your data online, you are effectively naked to that person. He or she can exploit your data for personal benefit, and most people are not aware of this situation.

Sometimes it’s even worse. We all know that recently more than one hundred celebrities of Hollywood who use iOS operating system, fell victim to photo leaks. Don’t think that it’s none of our business. Hackers can hack the accounts of Hollywood celebrities, but they can hacker you and me as well. Can you tolerant the privacy violation? Of course not.

To understand the importance of data privacy, check the following video.


The question is "would you like to sacrifice privacy to get more convenience", or "to what extent can you sacrifice your privacy". It is predictable that more of your private data will become accessible in the future, and this horrible situation could come true some day if there are no laws or regulations restricting data holders from abusing our private information.



Possible iCloud accounts hacking: Nude Photos of 98 Hollywood Celebrities leaked online

Is Big Data Useful?

Needless to say, big data is a buzzword in the IT and internet industries; however, not many people really know what benefits big data can offer them, and many think it has no relationship to them.

No company can live without data. Every company generates data, and the speed of data generation keeps increasing. Whether a company can extract insight from the data it collects, in order to make fast and correct decisions, can determine success or failure in competition.

T-Mobile USA integrated big data to predict customer defections by combining customer transaction and interaction data, and claims it was able to cut customer defections in half in a single quarter. US Xpress saves millions of dollars in operating costs by collecting and analyzing a thousand data elements for optimal fleet management and to drive productivity. (Kotadia, 2012)

How can data influence our lives? The famous "beer and diapers" story reveals the close relationship. It happened in Wal-Mart supermarkets in the 1990s. A manager analyzing sales data found a curious phenomenon: beer and diapers, which seem to have no relation to each other, were often bought together, especially by young men. The reason behind the story is that mothers often take care of the kids at home while fathers go to the supermarket to buy diapers, and when fathers buy diapers, they usually pick up a few bottles of beer as well. After Wal-Mart found this pattern, it placed beer and diapers in the same zone, and as a result beer sales increased. (Whitehorn, 2006)
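The kind of pattern behind the story can be surfaced with a simple co-occurrence count over transactions, the most basic step of market-basket analysis. This is a toy sketch with invented baskets, not the analysis Wal-Mart actually ran:

```python
from collections import Counter
from itertools import combinations

def pair_counts(transactions):
    """Count how often each pair of items appears in the same basket."""
    counts = Counter()
    for basket in transactions:
        # sort so ("beer", "diapers") and ("diapers", "beer") count as one pair
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"bread", "milk"},
    {"diapers", "beer", "bread"},
]
counts = pair_counts(transactions)
print(counts[("beer", "diapers")])  # 3 of 4 baskets contain both
```

Real association-rule mining (e.g. the Apriori algorithm) builds on exactly these counts, adding support and confidence thresholds to filter out coincidental pairs.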

The nice video below shows how big data changes our lives.



Kotadia, H. (2012) 4 Excellent Big Data Case Studies. From:

Whitehorn, M. (2006) The parable of the beer and diapers. From:




Big data properties

"Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." (Wikipedia)




To pick some examples from the picture above: Google processes 2,000,000 search queries per minute, Facebook users share 684,478 pieces of content per minute, and 3,600 new photos are shared on Instagram per minute. Data volume is projected to increase 44-fold from 2009 to 2020 (0.8 zettabytes to 35 zettabytes; gigabytes, terabytes, petabytes, exabytes, zettabytes). The sheer volume of data is the literal explanation of the "Big" in Big Data. (Spencer, 2012)
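The 44-fold figure follows directly from the two volume estimates:

```python
start_zb = 0.8  # estimated global data volume in 2009, zettabytes
end_zb = 35.0   # projected global data volume in 2020, zettabytes

growth = end_zb / start_zb
print(round(growth))  # 44 (exactly 43.75, rounded up in the report)
```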

The information explosion continues. The capabilities of digital devices are getting more and more advanced while their prices go down. Between 1990 and 2005, more than one billion people worldwide entered the middle class; as they get richer, they interact more with the digital world. There are 4.6 billion mobile phone subscriptions worldwide, and many people have more than one mobile phone. This is the source of the data explosion. (The Economist, 2010)

The types of data are varied: relational data such as transactions and electronic health records, text data such as documents, semi-structured data such as XML and content from Wikipedia and Amazon, graph data from social networks and disease networks, and so on.

Because of the volume and variety of big data, traditional database management systems cannot cope well. This is where NoSQL arises.

Data is also generated very quickly and needs to be processed quickly; opportunities can be missed because of late decisions. Real-time analysis is therefore necessary. For example, the effectiveness of a marketing promotion can be improved while it is still in play, and analyzing user feedback on a product in real time lets companies improve the product and seize advantages over competitors.
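As a toy illustration of real-time analysis, the sketch below counts how many feedback events arrived in the last sixty seconds using a sliding window. The class name, window size and timestamps are all invented for illustration; production streaming systems add distribution and fault tolerance on top of this idea:

```python
from collections import deque

class SlidingWindowCounter:
    """Count events that occurred within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest first

    def add(self, timestamp):
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        # drop timestamps that have fallen out of the window
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

# Feedback events arriving at t = 1, 2 and 61 seconds, with a 60-second window
c = SlidingWindowCounter(60)
for t in (1, 2, 61):
    c.add(t)
print(c.count(61))  # 2: the events at t=2 and t=61 are still inside the window
```

The point of the window is that the answer is always about the recent past, so a decision (pause the promotion, push a fix) can be made while the campaign is still running rather than after a nightly batch job.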



Spencer, N. (2012) How Much Data is Created Every Minute? From:

The Economist. (2010) Data, data everywhere. From:

Jacobs, A. (2009) The Pathologies of Big Data. From:

New Database Models

As we all know, a database is an organized collection of data. To support operations such as definition, querying, update, and administration of a database, specially designed software called a Database Management System (DBMS) arose. A DBMS helps users capture and analyze data. DBMSs are classified by database model; the most famous is the relational model, represented by the SQL language. The database model determines the logical structure of the database and the manner in which data can be stored, analyzed and manipulated.

The relational database model is based on first-order predicate logic, in which data is represented as tuples and grouped into relations.
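A minimal sketch of tuples and relations using Python's built-in sqlite3 module (the table and data are invented for illustration): each row is a tuple, the table is a relation, and SQL queries operate over the relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 34), (2, "Bob", 28)],
)

# Each result row comes back as a tuple, matching the relational model
rows = conn.execute("SELECT name, age FROM users WHERE age > 30").fetchall()
print(rows)  # [('Alice', 34)]
conn.close()
```

Note how the schema is fixed up front: every row must fit the `(id, name, age)` shape, which is exactly the rigidity the next paragraph criticizes.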

However, when data is not structured and relational, the relational model is not capable of managing it. The relational model is also not good at adapting to change, which makes it a poor fit for agile development. Because of varied data formats such as hierarchies, cubes, linked lists and unstructured data, it is not always possible to organize data into tables. Relevance, which requires text and data to be stored in document context with links to and from other documents, is another drawback of the relational model.

Therefore, NoSQL ("not only SQL") came up. NoSQL database management systems let data be stored in a variety of formats, such as key-value stores, graph stores and document stores. The name "not only SQL" emphasizes that SQL-like query languages may also be supported. NoSQL does not guarantee full ACID (atomicity, consistency, isolation, and durability) semantics. NoSQL systems remove hard constraints, such as the tabular row store and strict data definitions, and use distributed architectures to support high throughput. NoSQL databases are widely used in big data and real-time web applications.
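To make the storage models concrete, a toy key-value store and document store can be sketched with plain Python dictionaries. This is a deliberate simplification: real systems such as Redis or MongoDB add persistence, distribution and query languages, and all keys and values here are invented.

```python
# Key-value store: opaque values looked up by a single key
kv_store = {}
kv_store["session:42"] = "user=alice;expires=3600"
print(kv_store["session:42"])

# Document store: each value is a schema-free nested document
doc_store = {}
doc_store["user:alice"] = {
    "name": "Alice",
    "interests": ["privacy", "databases"],
    "address": {"city": "Dublin"},  # nesting a new field needs no table redesign
}
print(doc_store["user:alice"]["address"]["city"])  # Dublin
```

The contrast with the relational sketch above is the point: adding `interests` or a nested `address` to one document requires no schema change, which is why these models adapt better to varied and evolving data.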

The latest class of database management system is called NewSQL. It retains both SQL and ACID guarantees while achieving the scalability of NoSQL for online transaction processing (OLTP).