Please check github: https://github.com/yafei002/ar_head_detection
Please check github: https://github.com/yafei002/webvr_emotion_detection
Point data: discrete points in geographic space, described by longitude and latitude coordinates, with no size property.
Line data: data with a length property that connects two or more points.
Regional data: contains more information than point data or line data. It can express regional properties such as population density and per-capita figures.
Maps can carry many types of complex information, which can be combined into multi-semantic maps.
Chen, W., Shen, Z., & Tao, Y. (2013). Data Visualization (数据可视化). Publishing House of Electronics Industry. (Chapter 6)
The CAP theorem was first presented by Eric Brewer in 2000 and is a fundamental theorem describing distributed systems (Brewer, 2000). Figure 1 explains what CAP stands for.
Figure 1. CAP stands for Consistency, Availability and Tolerance to network Partitions.
In fact, it is impossible to meet all three guarantees at once. If consistency is prioritized, write operations may fail when part of the system is unavailable. If availability is prioritized, a read operation may not return the value of the latest write. The priorities of the system determine the strategy, and normally a combination of two guarantees is chosen.
Figure 2. Databases pick two guarantees from CAP
Figure 2 shows how the databases discussed in this paper make their choice. Note that although each database picks two guarantees, this does not mean it gives up the third guarantee entirely.
The possible selections are explained as follows.
To guarantee reliable database transactions, a set of properties known as ACID is applied.
ACID stands for Atomicity, Consistency, Isolation and Durability.
To handle transactions and keep data safe, a traditional RDBMS follows the ACID principle. NoSQL databases, on the other hand, need to guarantee availability and scalability in order to store much more data across a distributed set of servers working together. ACID cannot guarantee these properties in that setting, which is where BASE comes in.
BASE stands for Basically Available, Soft State and Eventual Consistency (Mapanga & Kadebu, 2013).
Figure 3. Eventual consistency model
Figure 4. Strong consistency model
In contrast to eventual consistency, strong consistency guarantees that every read returns the value of the most recent write.
The BASE model is the opposite of the ACID model: it sacrifices strong consistency to achieve availability and reliability. Relational databases are designed around a strong consistency guarantee and, to some extent, lose scalability and performance as a result. Companies such as Google, Amazon and Twitter care more about scalability, availability and performance, so most NoSQL databases follow the BASE model.
Assume a data record is accessed by two users. There is no problem when they read the record at the same time, but what happens if they update it simultaneously? A collision occurs. Concurrency control deals with this issue and allows multiple users to access the same data simultaneously.
Concurrency control in databases normally works with transactions. A transaction is a series of data operations that must follow the ACID guarantees. Most relational database management systems, such as MySQL, support transactions, because RDBMS users give high priority to the consistency and integrity of data. However, not all newer databases support transactions; MongoDB is one example. Operations on a single MongoDB document are always atomic, but operations on multiple documents are not, so multi-document transactions cannot be executed. Fortunately, MongoDB can apply two-phase commits to offer transaction-like semantics (MongoDB online manual – Perform Two Phase Commits). The other NoSQL databases discussed in this paper, Redis, HBase and Neo4j, do support transactions.
If interleaved operations are allowed without proper concurrency control, problems such as the lost update problem, the dirty read problem and the incorrect summary problem can occur.
Figure 5. The lost update problem
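The lost update problem can be reproduced without any database. The following is a minimal Python sketch, not taken from any of the systems discussed here; the record, the amounts and the function name `lost_update_demo` are assumptions made up for illustration. It replays one fixed interleaving of two read-modify-write sequences.

```python
def lost_update_demo():
    """Replay two interleaved read-modify-write sequences on one record."""
    record = {"balance": 100}

    # Both users read before either one writes.
    read_by_a = record["balance"]
    read_by_b = record["balance"]

    # User A adds 50 and writes the result back.
    record["balance"] = read_by_a + 50
    # User B subtracts 30 from the stale value and overwrites A's write.
    record["balance"] = read_by_b - 30

    return record["balance"]


final_balance = lost_update_demo()
# A serial execution would give 100 + 50 - 30 = 120; this interleaving gives 70.
```

User A's update disappears because user B computed the new value from data read before A wrote.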
Concurrency control mechanisms fall into two main lock categories, pessimistic locking and optimistic locking (Bernstein, Hadzilacos & Goodman, 1987).
The choice between the two lock categories depends on the application requirements. If concurrent access is rare and dirty reads are not allowed, pessimistic locking is preferred. If there is a lot of concurrent access, pessimistic locking can degrade performance significantly, so optimistic locking is preferred.
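One common way to realize optimistic locking is a version counter that is checked at write time. The sketch below assumes that approach; the class `VersionedRecord` and its methods are hypothetical names, not the API of any database mentioned in this paper.

```python
class VersionedRecord:
    """A record guarded by a version counter (optimistic locking)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # Return the value together with the version it was read at.
        return self.value, self.version

    def write(self, new_value, expected_version):
        # Reject the write if someone else committed since our read.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


record = VersionedRecord(100)
value_a, version_a = record.read()
value_b, version_b = record.read()

ok_a = record.write(value_a + 50, version_a)  # accepted, version becomes 1
ok_b = record.write(value_b - 30, version_b)  # rejected, version_b is stale

# User B retries with a fresh read, so no update is lost.
value_b, version_b = record.read()
ok_retry = record.write(value_b - 30, version_b)
```

Nobody waits for a lock here; the cost of a conflict is a rejected write and a retry, which is cheap as long as conflicts are rare.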
There are many specific concurrency control methods (Bernstein, Hadzilacos & Goodman, 1987).
Read lock – If one user applies a read lock to an object, other users can still read the object, but any write operation on it is blocked. Figure 6 illustrates how a read lock works.
Figure 6. A flow chart to explain how read lock is performed.
Write lock – If one user applies a write lock, other users can neither read nor write the object. Figure 7 illustrates how a write lock works.
Figure 7. A flow chart to explain how write lock is performed.
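The compatibility rules of read and write locks can be sketched in a few lines of Python. This is a simplified, non-blocking illustration; the class `ObjectLock` is a hypothetical name, and a real lock manager would queue waiting users instead of returning False.

```python
class ObjectLock:
    """Non-blocking read/write lock state for a single object."""

    def __init__(self):
        self.readers = 0     # number of read locks currently held
        self.writer = False  # True while a write lock is held

    def try_read_lock(self):
        # Readers are compatible with each other, but not with a writer.
        if self.writer:
            return False
        self.readers += 1
        return True

    def try_write_lock(self):
        # A writer needs exclusive access: no readers and no other writer.
        if self.writer or self.readers > 0:
            return False
        self.writer = True
        return True

    def release_read(self):
        self.readers -= 1

    def release_write(self):
        self.writer = False


lock = ObjectLock()
first_read = lock.try_read_lock()           # granted
second_read = lock.try_read_lock()          # granted, reads can be shared
write_during_reads = lock.try_write_lock()  # refused while readers exist
lock.release_read()
lock.release_read()
write_after_reads = lock.try_write_lock()   # granted once readers are gone
read_during_write = lock.try_read_lock()    # refused while the writer holds the lock
```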
Consider the two concurrent writes shown in Figure 8.
Figure 8. Two writes to the same row
How MVCC (multi-version concurrency control) operates on top of row locks is illustrated in Figure 9.
Figure 9. A flow chart to explain how MVCC is performed.
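The core idea of MVCC, that writers add new versions while readers keep seeing the version that matches their snapshot, can be sketched as follows. This is a deliberately reduced model under stated assumptions: the class `MVCCRow` is a hypothetical name, commit timestamps are assumed to increase, and row locks and garbage collection of old versions are left out.

```python
class MVCCRow:
    """A single row that keeps every committed version with its commit timestamp."""

    def __init__(self, initial_value):
        # (commit_ts, value) pairs in ascending timestamp order.
        self.versions = [(0, initial_value)]

    def read(self, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = [value for ts, value in self.versions if ts <= snapshot_ts]
        return visible[-1]

    def write(self, commit_ts, value):
        # Writers append a new version instead of overwriting the old one.
        self.versions.append((commit_ts, value))


row = MVCCRow("v0")
reader_snapshot = 1                   # a reader starts at timestamp 1
row.write(2, "v1")                    # a writer commits at timestamp 2
old_view = row.read(reader_snapshot)  # the earlier reader still sees "v0"
new_view = row.read(3)                # a later reader sees "v1"
```

Because the reader never touches the new version, it does not have to wait for the writer, and the writer does not have to wait for the reader.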
Because the databases have different features, their methods of concurrency control vary.
Bernstein, P. A., Hadzilacos, V., & Goodman, N. (1987). Concurrency control and recovery in database systems (Vol. 370). New York: Addison-Wesley.
Brewer, E. A. (2000, July). Towards robust distributed systems. In PODC (p. 7).
Gray, J., & Reuter, A. (1993). Transaction processing: Concepts and techniques. Morgan Kaufmann.
Mapanga, I., & Kadebu, P. (2013). Database Management Systems: A NoSQL Analysis. International Journal of Modern Communication Technologies and Research, 1(7).
A database is an organized collection of data. Operations such as definition, querying, update and administration of a database require a specially designed software application called a database management system (DBMS). A DBMS helps the user capture and analyze data. DBMSs are classified by database model; the best-known example is the relational model. The database model determines the logical structure of a database and the manner in which data can be stored, organized and manipulated.
The relational database model is based on first-order predicate logic, in which data is represented by tuples and grouped into relations.
However, the relational database model is not good at adapting to change. When data is not structured and relational, for example hierarchies, cubes, linked lists and other unstructured data, a relational DBMS cannot easily organize it into tables.
As a solution, NoSQL (not only SQL) emerged. NoSQL database management systems enable data to be stored in a variety of formats, such as key-value stores, column stores, graph stores and document stores. The name “not only SQL” emphasizes that SQL-like query languages may also be supported. However, NoSQL does not guarantee the true ACID (atomicity, consistency, isolation and durability) properties. NoSQL database management systems remove hard constraints, such as tabular row storage and strict data definitions, and have distributed architectures to support high throughput. NoSQL databases are widely used in big data and real-time web applications.
Relational databases are the most popular and widely used databases. The data model organizes data as tables, or relations. Each table consists of rows and columns, as illustrated in Figure 1. Each row has a unique key.
Figure 1. An example of relational database model
Unlike NoSQL databases, relational databases have a fixed data model and structured data. They support transaction management and guarantee the true ACID properties. The relational database management system (RDBMS), which is based on the relational model, has been developed for several decades and still dominates the current database market. Because of these properties, it is widely deployed in banks, schools, hospitals, governments and so on.
The key-value store is one of the simplest database management systems. Data is stored as keys and values, as illustrated in Figure 2, and can be retrieved when the key is known, so the complex querying and management functionality of an RDBMS is not needed.
The key can be represented by a string and the actual data by the value. The value can be any data type of a programming language, such as a string, integer or array, or an abstract object that is bound to the key. The data model is flexible, so the requirements on data format are less strict.
Figure 2. An example of Key-Value database model
Compared with common SQL databases, a key-value store has the advantage of storing and retrieving data quickly. This holds when relations, correlations or collations of data are not necessary. The equivalent SQL table would have just two columns, a key and a value. To answer a query, the store just finds the value for the key and returns it, which is very fast.
Specifically, SQL has great advantages in dealing with structured data and allows highly dynamic queries. Current web applications are a different case. They follow an object-oriented way of thinking, such as the back-end database of the MVC (Model-View-Controller) pattern, rather than a highly dynamic range of queries full of outer and inner joins, unions and complex calculations over large tables. Moreover, translating between object-oriented models and relational data models results in complex hierarchies of tables because of heavy normalization. In key-value store databases the data model is schema-less, and an object can simply be represented by a value with a key that identifies it. Arbitrary data can therefore be stored and retrieved through a single key. That is why the key-value store is also called a simple store.
The code tends to look clean and simple compared with SQL strings embedded in the programming language. Object-relational mapping frameworks, by contrast, add a lot of complex code between an SQL database and an object-oriented programming language.
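How little code a key-value interface needs can be shown with a small in-memory sketch in Python. The class `KeyValueStore` and the keys used below are made up for illustration; a real store such as Redis adds persistence, networking and more data types.

```python
import json


class KeyValueStore:
    """In-memory key-value store: string keys mapped to opaque values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Retrieval needs only the key; there is no query language.
        return self._data.get(key, default)


store = KeyValueStore()
store.put("counter", 42)
store.put("tags", ["nosql", "redis"])
# A whole object is serialized and stored as one value under one key.
store.put("user:1", json.dumps({"name": "John Smith", "age": 19}))

user = json.loads(store.get("user:1"))
```

The object is written and read back in one step each, without table definitions, joins or mapping code.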
Document store databases store data in document formats such as XML, PDF and JSON. Each document contains a unique key, “ID”, that identifies it explicitly, and documents are grouped into collections. An example of the data model is illustrated in Figure 3. Documents in document store databases are similar to records in relational databases. The difference is that the data model of document-oriented databases is more flexible because it is schema-less. New documents can be stored regardless of which attributes they contain, and new attributes can be added to existing documents at runtime.
Figure 3. The left figure is an example of document format of JSON and the right figure is an example of document format of XML.
Unlike relational databases, where all records in the same table have the same data fields, document databases allow documents to contain similar as well as dissimilar data. The data model of document databases is slightly more complex than that of key-value databases: instead of key-value pairs, the data is represented as key-document pairs. If the data has many relations and is heavily normalized, a document database is not appropriate. Document stores are instead used for content management systems, blog software and similar applications.
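The schema-less property can be illustrated with plain Python dictionaries standing in for JSON documents. This is only a sketch; the `collection` mapping, the `insert` helper and the sample documents are assumptions for illustration and do not reflect the API of any particular document database.

```python
import json

# A "collection" is modeled as a mapping from document ID to document.
collection = {}


def insert(doc):
    """Store a document under its unique "ID" key."""
    collection[doc["ID"]] = doc


# Two documents in the same collection with different attributes.
insert({"ID": 1, "name": "John Smith", "age": 19})
insert({"ID": 2, "title": "Hello", "tags": ["blog", "intro"]})

# Add a new attribute to an existing document at runtime; no schema change is needed.
collection[1]["email"] = "john@example.com"

fields_doc1 = sorted(collection[1])
fields_doc2 = sorted(collection[2])
as_json = json.dumps(collection[2])
```

The two documents share only the “ID” attribute, which a relational table with fixed columns could not represent without schema changes or empty fields.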
Column store databases support the standard relational logical data model. A database consists of a collection of tables, and each table has a named collection of attributes. Physically, however, the data is organized by columns instead of by rows as in the relational data model. Attributes can form a unique primary key, or a foreign key that refers to a primary key in another table.
The main difference between the two kinds of databases is that the data model of relational databases is row-oriented, whereas the data model of column store databases is column-oriented. Figure 4 illustrates the difference simply and clearly.
Figure 4. The upper figure illustrates the row-oriented data model and the lower figure illustrates the column-oriented data model.
As Figure 4 shows, in a row-oriented database management system the data would be stored as “1, John Smith, 19; 2, Jim Green, 18; 3, Lucy King, 16; 4, Freda Ford, 15”, whereas in a column-oriented database management system the data would be stored as “1, 2, 3, 4; John Smith, Jim Green, Lucy King, Freda Ford; 19, 18, 16, 15”.
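The two layouts can be reproduced from the same four records with a short Python sketch. The helper `serialize` is a made-up name, and the on-disk format of a real column store is of course more involved than a comma-separated string.

```python
rows = [
    (1, "John Smith", 19),
    (2, "Jim Green", 18),
    (3, "Lucy King", 16),
    (4, "Freda Ford", 15),
]


def serialize(groups):
    # Join values inside a group with ", " and groups with "; ".
    return "; ".join(", ".join(str(v) for v in group) for group in groups)


row_layout = serialize(rows)           # one group per record
column_layout = serialize(zip(*rows))  # one group per attribute

# Aggregating one attribute only needs a single column.
ages = list(zip(*rows))[2]
average_age = sum(ages) / len(ages)
```

Computing the average age reads only the third column, which is the kind of aggregation that benefits from the column-oriented layout.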
Column store databases store data in a way that allows rapid aggregation with less I/O activity, and they offer high scalability in data storage. They are efficient in applications such as customer relationship management (CRM) systems, electronic library card catalogs, data warehousing and other ad hoc query systems.
Graph store databases are commonly used to handle relationships because they manage heavily linked data efficiently (Neo4j). The graph data model contains nodes that represent entities, each holding any number of properties of suitable types, much like key-value pairs. Figure 5 is an example of the graph data model. The connection between two nodes is expressed by a directed, named, semantic relationship such as “knows”, “owns” or “likes”, and relationships can also have properties. Two nodes can be connected by more than one relationship and by relationships of more than one type. Because relationships are stored efficiently, this does not sacrifice performance.
Figure 5. An example of graph data model. The ellipses represent nodes. Each node is a data entity with types and values. The arrows represent connections with relationships and their properties.
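A property graph of this kind can be modeled with two small Python data classes. The names `Node` and `Relationship`, the relationship types and the sample entities below are assumptions chosen for illustration; they are not Neo4j's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node    # relationships are directed: start -> end
    end: Node
    rel_type: str  # the relationship name, for example "KNOWS"
    properties: dict = field(default_factory=dict)


alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
car = Node("Car", {"brand": "Volvo"})

relationships = [
    Relationship(alice, bob, "KNOWS", {"since": 2010}),
    Relationship(alice, bob, "LIKES"),  # a second relationship between the same nodes
    Relationship(alice, car, "OWNS"),
]

types_alice_to_bob = [r.rel_type for r in relationships
                      if r.start is alice and r.end is bob]
```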
MySQL is the second most widely used relational database management system. Because it is open source, it can be used free of charge. It uses a standard form of the well-known SQL data language, runs on many operating systems and can be embedded in many programming languages (MySQL online manual).
Redis is one of the most popular key-value store databases. Values are not limited to simple strings; they can also be hashes, lists, sets, sorted sets and bitmaps, and a number of server-side atomic operations are associated with these data types. An atomic operation is a read or write operation whose processing is not disturbed by other read or write operations until it completes.
Redis keeps its data sets in memory, so it achieves very high read and write speeds, with the limitation that a data set cannot be larger than memory. In addition, the in-memory representation of complex data structures is much simpler to manipulate than the same data structures on disk, which reduces internal complexity. Complex data sets are simply stored as a value when they do not need to be mapped onto a relational data structure (Redis FAQ).
Redis supports master-slave replication, which is simple to use and configure. Slave servers are exact copies of the master server, which helps applications that cannot accept even a few lost records. Data is written to disk asynchronously, so if the system crashes the last few queries can be lost; this is acceptable in many applications (Redis Manual). Redis is recommended to run on Linux.
MongoDB is a document-oriented database. It stores document data in the BSON (binary JSON) format. The JSON data model maps seamlessly to native programming language types. Because the schema is dynamic, the data model can evolve flexibly.
For simple queries, MongoDB, where related data is kept in a single document, is faster than a relational database, where related data is separated into multiple tables. The join operations needed in relational databases are eliminated in MongoDB.
MongoDB supports indexes, which make query execution more efficient. Without an index, MongoDB has to scan every document in a collection to find those that match the query statement. All indexes in MongoDB are B-tree (Comer, 1979) indexes. With an index, only the smallest possible number of documents is scanned, which optimizes query performance. Figure 1 illustrates how indexes work in MongoDB.
Figure 1. Illustration of how MongoDB indexes work
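The difference between a full collection scan and an index lookup can be sketched in Python. Note the assumptions: a sorted list searched with `bisect` stands in for the B-tree, and the documents, the field names and the helper functions `find_by_scan` and `find_by_index` are invented for illustration; none of this is MongoDB code.

```python
import bisect

documents = [
    {"_id": 1, "name": "John Smith", "age": 19},
    {"_id": 2, "name": "Jim Green", "age": 18},
    {"_id": 3, "name": "Lucy King", "age": 16},
    {"_id": 4, "name": "Freda Ford", "age": 15},
]


def find_by_scan(age):
    """Without an index: examine every document."""
    examined = 0
    matches = []
    for doc in documents:
        examined += 1
        if doc["age"] == age:
            matches.append(doc["_id"])
    return matches, examined


# Sorted (age, position) pairs stand in for the B-tree index on "age".
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(documents))


def find_by_index(age):
    """With an index: binary-search the keys, then fetch only matching documents."""
    examined = 0
    matches = []
    i = bisect.bisect_left(age_index, (age, -1))
    while i < len(age_index) and age_index[i][0] == age:
        matches.append(documents[age_index[i][1]]["_id"])
        examined += 1
        i += 1
    return matches, examined
```

Both functions return the same matches, but the scan examines all four documents for every query while the index lookup examines only the documents that actually match.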
MongoDB is also easy to use, with simple installation, configuration and maintenance. It runs on Linux, Windows and OS X (MongoDB online manual).
HBase, known as the Hadoop database, is a column-oriented, scalable, distributed storage system with high reliability and high performance. HBase can be used to set up large-scale structured storage clusters on cheap PC servers.
HBase is an open-source system modeled after Google’s BigTable. It is called the Hadoop database because it relies on Hadoop HDFS as its file storage system, Hadoop MapReduce for massive data processing and ZooKeeper for coordination. Figure 2 illustrates the role of HBase in the Hadoop ecosystem (Taylor, 2010).
Figure 2. Diagram of Hadoop ecosystem
As shown in Table 1, the data model of HBase contains the “table” and the “column family”.
Table 1. Data model of HBase
Table – A collection of rows. As the table grows with an increasing number of records, it splits into multiple parts called regions. The RowKey is the unique primary key of the table.
Column family – A collection of columns. A table can have one or more column families, and a column family can consist of any number of columns. Column families support dynamic scaling without a predefined number or type of columns.
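The nesting of RowKey, column family and column can be pictured with nested dictionaries. The sketch below is not the HBase client API; the class `HBaseLikeTable`, the family names and the rule that families are declared when the table is created are assumptions made for this illustration.

```python
from collections import defaultdict


class HBaseLikeTable:
    """Rows keyed by RowKey; each row maps column family -> column -> value."""

    def __init__(self, column_families):
        # Assumption for this sketch: families are declared up front.
        self.column_families = set(column_families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a family are created on the fly.
        self.rows[row_key][family][column] = value

    def get(self, row_key, family, column):
        # Plain .get calls avoid creating empty rows on lookup.
        return self.rows.get(row_key, {}).get(family, {}).get(column)


table = HBaseLikeTable(["info", "grades"])
table.put("student-1", "info", "name", "John Smith")
table.put("student-1", "grades", "math", 90)
table.put("student-2", "info", "name", "Jim Green")
table.put("student-2", "info", "nickname", "Jimmy")  # a column only this row has
```

The second row carries a column that the first row lacks, which mirrors the statement that columns inside a family need not be predefined.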
Neo4j is a graph store database that exposes the relationships within data sets. The data model of Neo4j is also schema-less.
Neo4j expresses queries as traversals. A graph is traversed by visiting nodes and following relationships according to some rules. In practice not the whole graph is visited, because not all nodes and relationships are of interest. Neo4j supports fast, deep traversals in place of slow SQL queries. Figure 3 illustrates how a traversal works.
Figure 3. Illustration of how traversal works.
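A traversal that follows only one relationship type up to a fixed depth can be sketched as a breadth-first search. This is a generic illustration in Python, not Neo4j's traversal framework; the sample graph, the relationship names and the function `traverse` are assumptions.

```python
from collections import deque

# Adjacency list: node -> list of (relationship type, target node).
graph = {
    "Alice": [("KNOWS", "Bob"), ("OWNS", "Car")],
    "Bob": [("KNOWS", "Carol")],
    "Carol": [("KNOWS", "Dave")],
    "Dave": [],
    "Car": [],
}


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal that follows only `rel_type` up to `max_depth` hops."""
    visited = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # rule: do not expand nodes at the depth limit
        for rtype, target in graph[node]:
            if rtype == rel_type and target not in visited:
                visited.append(target)
                queue.append((target, depth + 1))
    return visited


friends_within_two_hops = traverse("Alice", "KNOWS", 2)
```

The nodes "Car" and "Dave" are never visited: the first is reached only through a relationship type that the rule excludes, the second lies beyond the depth limit.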
Neo4j also supports indexes. They are useful when a specific node or relationship needs to be found by one of its properties, and an index lookup is faster than a traversal that has to walk the entire graph. Figure 4 illustrates how an index lookup works in Neo4j.
Figure 4. Illustration of how index look up works in Neo4j.
Reflection on and doubts about big data have never stopped. They mainly concern two sides: big data itself and the ethical problems it creates.
In the last post, I discussed the problems caused by big data itself. Big data has some inherent defects, because it is used to predict the whole from a part, or the future from the past, and such predictions always contain some deviation.
Besides, a new digital divide may emerge. Big data can indeed improve the efficiency of decision making. At the same time, challenges such as privacy, interoperability between systems and imperfect algorithms can accumulate in developing countries. Big data needs matching infrastructure, such as facilities designed for large-scale, distributed, data-intensive work, highly efficient storage facilities and network facilities for importing large data sets quickly.
(Google’s secret data center)
Another problem that cannot be ignored is ethics, mainly the privacy problems that I have discussed in several previous posts. Consider the most classic case of the big data era. In early 2012, an American man burst into the Target store near his home and angrily asked the manager: “How could you send coupons for baby diapers and bassinets to my daughter? She’s only 17!” The manager apologized immediately. A month later, however, the angry father called back and apologized, because his daughter really was pregnant. Not only Target but also Google, Yahoo, Apple, Twitter, advertising companies, data analysis companies, software companies and others are collecting users’ private data. How to protect the public’s privacy from violation is a big challenge for the future.
Big data is a product of its age and will have a significant impact on society. Improving the accuracy of decision making and facing the problems caused by big data will require efforts from all sectors of society.
Big data is a hot topic, but we should stay rational in order to understand it. Big data is not perfect.
Sampling bias always exists, and big data does not go beyond statistics. First, what is sampling bias? A good example comes from the Second World War. The British Royal Air Force wanted to strengthen aircraft armor to defend against the anti-aircraft fire of the German army. Because of limited carrying capacity, only part of the armor could be strengthened, so they asked a statistician for help. After carefully observing the bullet marks on planes that returned to the airfield, he reached a surprising conclusion: strengthen the parts without bullet marks. He explained that the planes that had been hit in those parts had crashed and never returned. Statistics uses a part to infer the whole, or the past to predict the future.
However, its biggest weakness is also sampling bias, which can lead to wrong conclusions when a part is used to infer the whole. In the era of big data, sampling bias still affects the accuracy of conclusions. For technical and commercial reasons, the data collected cannot cover every scenario. Besides, even if we know the past data, the world is always changing.
The conclusions of big data hold for the whole, not for individuals. Even if the accuracy is 99%, there are still one million mismatches when the number of samples is 100 million.
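The arithmetic behind that figure is easy to check. Here is a tiny Python sketch, using exact fractions so that no floating-point rounding gets in the way; the variable names are just for illustration.

```python
from fractions import Fraction

samples = 100_000_000
accuracy = Fraction(99, 100)  # 99% of the samples are matched correctly

# The remaining 1% of 100 million samples are mismatches.
mismatches = samples * (1 - accuracy)
```

A 1% error rate sounds small, but at this scale it still means a million individual cases where the conclusion is wrong.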
Big data yields correlations rather than causal relationships. For example, as ice cream sales rise, the number of drownings rises too. The two are positively correlated. So do ice cream sales cause drowning? Of course not. Hot weather increases both ice cream sales and the likelihood that people play in the water.
In conclusion, big data has its own limitations. Pure data analysis cannot guarantee correct conclusions. Most of the time, traditional analysis approaches and experience should be considered together with big data.
In the previous post, I described some future possibilities of big data: uncovering the reasons behind the data, being applied in traditional areas and reshaping labor relations according to market demand. But that is not all big data can do. It can change the world in more industries and areas.
The model of love and marriage may be transformed. Individuals can be matched accurately on the basis of big data analysis. A couple’s hobbies, special talents, financial situation, profession and so on can be mined deeply and matched accurately.
The traditional family model may be reshaped. People will be grouped by data instead of by region. People with similar data traits can live together to integrate resources and achieve a highly efficient lifestyle.
Big data may even create the next Lady Gaga. Social media has a great impact on the sales of songs and albums. People comment on and share their favorite music on Twitter, Facebook and YouTube. By tracking this data online, we can learn what people care about, what is popular at present and which singers are gradually gaining recognition. By considering all these characteristics, the next Lady Gaga can be predicted.
If you can identify how much energy a person or a building uses, you can reduce its consumption. A massive amount of data is suddenly emerging from sensors, devices and the web; tapping into this energy data gives it a whole new meaning. As a result, the tools of big data may some day be a fundamental way to help the world curb energy consumption.
Big data has been a new and popular term in recent years. Given the trend toward global digitization and networking, some people even call it “the fifth wave of science and technology”. However, big data is still young and has strong potential for development.
The way data is analyzed will become smarter and deeper. The result will be not only the relationships between targets but also the reasons behind them. Think of the market for diapers. To know what sells best and what ranks best or worst, thousands of reviews are collected and analyzed. With text analytics, three questions can be answered: why did a product sell so well, why did people not like it, and what do they want? For example, if words like “price”, “special” and “value” are mentioned most frequently and are then analyzed further, this may tell managers that customers buy the diapers not because of quality or features but because of price. Managers can then devise a new sales strategy to increase sales volume and profit.
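To make the keyword idea concrete, here is a very small Python sketch that counts how often a few words of interest appear in reviews. Everything in it is made up for illustration: the three review snippets, the keyword set and the variable names. Real text analytics would work on thousands of reviews and go well beyond simple counting.

```python
from collections import Counter
import re

# Hypothetical review snippets, invented for this example.
reviews = [
    "Great price and good value",
    "Bought it for the price, a special offer",
    "Good value, special discount, low price",
]

KEYWORDS = {"price", "special", "value", "quality"}

# Lowercase the text, split it into words and count only the keywords.
words = re.findall(r"[a-z]+", " ".join(reviews).lower())
keyword_counts = Counter(w for w in words if w in KEYWORDS)
top_keyword, top_count = keyword_counts.most_common(1)[0]
```

In this toy data “price” comes out on top and “quality” is never mentioned, which is the kind of signal the diaper example describes.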
Sensors may exist everywhere. With technological breakthroughs, sensors are becoming more and more miniaturized. They can even be put into the human body to detect the chemical environment and subtle changes in organs. As a result, the sources of data become more diverse and big data can be applied more widely.
Big data will be used not only in new areas; in the future it will also be applied to more traditional areas. It can analyze soil conditions to help improve farming, keep supply and demand coordinated, uncover new growth points, and make traffic systems smarter to avoid traffic jams and reduce accidents.
Traditional labor relations may be reshaped. Through big data platforms, human resources and customer demand can be matched more accurately, so that individual potential can be put to better use and the barriers of region, language and culture can be broken.
Thanks to big data, I believe the efficiency of society will improve greatly and people’s lives will become much easier and more comfortable.