MySQL vs Redis vs MongoDB vs HBase vs Neo4j

1.1 MySQL

MySQL is the second most widely used relational database management system. Because it is open source, it can be used without paying license fees. It uses a standard form of the well-known SQL data language, runs on many operating systems, and can be used from many programming languages. (MySQL online manual)

1.2 Redis

Redis is one of the most popular key-value databases. Values are not limited to simple strings: they can also be hashes, lists, sets, sorted sets and bitmaps, and each of these data types comes with a number of server-side atomic operations. An operation is atomic when a read or write runs to completion without being interleaved with any other read or write.
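To make "typed values with atomic operations" concrete, here is a minimal in-process sketch. TinyKV is a hypothetical class, not the Redis client API; its incr, lpush and hset methods only loosely mirror the Redis INCR, LPUSH and HSET commands. Each method holds a single lock for the whole read-modify-write, which is what makes it atomic here.

```python
import threading


class TinyKV:
    """Hypothetical in-process key-value store (not Redis). Values can be
    integers, lists or hashes (dicts). Every operation holds one lock, so it
    runs to completion before any other read or write can start."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key):
        # Read-modify-write as one atomic step (loosely like Redis INCR).
        with self._lock:
            value = int(self._data.get(key, 0)) + 1
            self._data[key] = value
            return value

    def lpush(self, key, *items):
        # Prepend items to a list value (loosely like Redis LPUSH).
        with self._lock:
            lst = self._data.setdefault(key, [])
            for item in items:
                lst.insert(0, item)
            return len(lst)

    def hset(self, key, field, value):
        # Set one field of a hash value (loosely like Redis HSET).
        with self._lock:
            self._data.setdefault(key, {})[field] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


store = TinyKV()


def worker():
    for _ in range(1000):
        store.incr("page:views")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

store.lpush("queue", "a", "b")
store.hset("user:1", "name", "Alice")
print(store.get("page:views"))  # 4000: no increment was lost
print(store.get("queue"))       # ['b', 'a']
print(store.get("user:1"))      # {'name': 'Alice'}
```

Because the increment happens under the lock, four threads doing 1000 increments each always end at exactly 4000. In Redis the same guarantee comes from the server executing each command on its own, without interleaving.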

Redis keeps its data sets in memory, which gives it very high read and write speed, with the limitation that a data set cannot be larger than the available memory. In addition, complex data structures are much simpler to manipulate in memory than the same structures on disk, so internal complexity is reduced. Complex data sets that do not need to be mapped onto a relational structure can simply be stored as a single value. (Redis FAQ)
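The memory limit can be pictured with a store that has a fixed byte budget and refuses writes that would exceed it. This is only a sketch: BoundedStore and its size estimate are hypothetical. Real Redis is configured with the maxmemory and maxmemory-policy directives, and under the "noeviction" policy it rejects writes once the limit is reached. The toy below imitates only that refusal, not Redis's actual memory accounting.

```python
class BoundedStore:
    """Hypothetical in-memory store with a fixed byte budget (illustration only)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self._data = {}

    @staticmethod
    def _size(key, value):
        # Rough size estimate: UTF-8 bytes of key plus value only.
        return len(key.encode()) + len(value.encode())

    def set(self, key, value):
        old = self._data.get(key)
        freed = self._size(key, old) if old is not None else 0
        needed = self._size(key, value)
        if self.used - freed + needed > self.max_bytes:
            # Refuse the write instead of evicting anything,
            # similar in spirit to a "noeviction" policy.
            raise MemoryError(f"write of {key!r} would exceed {self.max_bytes} bytes")
        self._data[key] = value
        self.used += needed - freed

    def get(self, key):
        return self._data.get(key)


store = BoundedStore(max_bytes=20)
store.set("k1", "hello")          # 7 bytes used
store.set("k2", "world")          # 14 bytes used
rejected = False
try:
    store.set("k3", "too long!")  # needs 11 more bytes: 25 > 20
except MemoryError as err:
    rejected = True
    print("rejected:", err)
```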

Redis supports master-slave replication, which is simple to use and configure and allows slave servers to be exact copies of the master. Data is written asynchronously, so if the system crashes, the last few writes can be lost. This is acceptable in many applications, but not in those where even a few lost records cannot be tolerated. (Redis Manual) Linux is the recommended platform for running Redis.
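Why asynchronous writes can lose the last few queries is easiest to see in a toy model. This is a minimal sketch, assuming one master and one replica inside a single Python process. The hypothetical Master class acknowledges each write immediately and only ships it to the Replica on a later flush, so a crash between the acknowledgement and the next flush loses the newest writes. Real Redis replication runs over the network and is more involved; only the acknowledge-then-propagate pattern is modeled.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Master:
    """Hypothetical master that acknowledges writes before replicas see them."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.backlog = deque()  # writes not yet shipped to the replica

    def set(self, key, value):
        self.data[key] = value
        self.backlog.append((key, value))
        return "OK"  # the client is acknowledged immediately

    def flush(self):
        # In a real system this would happen in the background, some time later.
        while self.backlog:
            self.replica.apply(*self.backlog.popleft())


replica = Replica()
master = Master(replica)
master.set("a", 1)
master.set("b", 2)
master.flush()       # "a" and "b" reach the replica
master.set("c", 3)   # acknowledged, but still only in the backlog
master = None        # the master crashes before its next flush

print(replica.data)  # {'a': 1, 'b': 2}: the acknowledged write "c" is lost
```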

1.3 MongoDB

MongoDB is a document-oriented database that stores documents in the BSON (binary JSON) format. The JSON-style data model maps naturally onto native programming-language types such as objects, arrays, strings and numbers. Because the schema is dynamic, the data model can evolve easily: documents in the same collection do not need to have the same fields.
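A dynamic schema means two documents in one collection can carry different fields. The sketch below models a collection as a plain Python list of dicts. It does not use the MongoDB driver, and JSON text from the standard library stands in for BSON, which is a binary encoding.

```python
import json

# A "collection" modeled as a plain list of dicts (illustration only).
users = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    # A later document adds new fields without any schema migration.
    {"_id": 2, "name": "Bob", "tags": ["admin", "ops"], "address": {"city": "Oslo"}},
]

# Documents map directly onto native types: dict, list, str, int.
bob = next(u for u in users if u["_id"] == 2)
print(bob["address"]["city"])  # Oslo

# Query on a field that only some documents have.
with_tags = [u["name"] for u in users if "admin" in u.get("tags", [])]
print(with_tags)  # ['Bob']

# Round-trip through JSON text (a stand-in for BSON).
restored = json.loads(json.dumps(users))
print(restored == users)  # True
```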

For simple queries, MongoDB can be faster than a relational database, because related data is kept together in a single document instead of being split across multiple tables. The join operations that a relational database needs to reassemble that data are therefore largely avoided in MongoDB.
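The contrast between a join and an embedded document can be shown side by side. In this sketch, Python's built-in sqlite3 module is used only as a convenient stand-in for a relational database such as MySQL, and a plain dict stands in for a MongoDB document. The table and field names are hypothetical. Both layouts return the same data; the relational one needs a join across two tables, the document one a single lookup.

```python
import sqlite3

# Relational layout: an order and its line items live in two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items  (order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'Alice');
    INSERT INTO items  VALUES (1, 'pen', 2), (1, 'ink', 1);
""")
rows = conn.execute(
    "SELECT o.customer, i.sku, i.qty FROM orders o "
    "JOIN items i ON i.order_id = o.id WHERE o.id = ? ORDER BY i.sku",
    (1,),
).fetchall()
conn.close()

# Document layout: the same data embedded in one document, read in one lookup.
orders = {
    1: {"_id": 1, "customer": "Alice",
        "items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]},
}
doc = orders[1]
embedded = sorted((doc["customer"], it["sku"], it["qty"]) for it in doc["items"])

print(rows)              # [('Alice', 'ink', 1), ('Alice', 'pen', 2)]
print(rows == embedded)  # True
```

The trade-off is duplication: if the same item data were shared by many orders, embedding would copy it into each document, which is where the relational layout is stronger.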

MongoDB supports indexes, which make query execution more efficient. Without an index, MongoDB has to scan every document in a collection to find the ones that match the query. MongoDB indexes are B-tree (Comer, 1979) indexes, so with an index only the smallest possible number of documents is scanned, which improves query performance. Figure 1 illustrates how indexes work in MongoDB.


Figure 1. Illustration of how MongoDB indexes work
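The effect of an index on the number of documents examined can be counted directly. In this minimal sketch a sorted list of (key, _id) pairs searched with the standard bisect module stands in for a B-tree. It is not a B-tree, but it shares the properties that matter here: keys are kept in order, and a key is located in O(log n) comparisons. The documents and the "age" field are hypothetical.

```python
import bisect

# A toy collection: 1000 documents, each "age" value appears 20 times.
docs = [{"_id": i, "age": (i * 7) % 50} for i in range(1000)]


def collection_scan(collection, age):
    """Without an index: every document is examined."""
    examined, hits = 0, []
    for doc in collection:
        examined += 1
        if doc["age"] == age:
            hits.append(doc["_id"])
    return hits, examined


# Index on "age": (key, _id) pairs kept in sorted order.
index = sorted((doc["age"], doc["_id"]) for doc in docs)


def index_lookup(idx, age):
    """With the index: jump to the first matching key, read only matches."""
    pos = bisect.bisect_left(idx, (age,))  # (age,) sorts before (age, any_id)
    examined, hits = 0, []
    while pos < len(idx) and idx[pos][0] == age:
        hits.append(idx[pos][1])
        examined += 1
        pos += 1
    return hits, examined


scan_hits, scan_examined = collection_scan(docs, 21)
idx_hits, idx_examined = index_lookup(index, 21)
print(scan_examined, idx_examined)    # 1000 20
print(sorted(scan_hits) == idx_hits)  # True
```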

MongoDB is also easy to use: it is easy to install, configure and maintain, and it runs on Linux, Windows and OS X. (MongoDB online manual)

1.4 HBase

HBase, known as the Hadoop database, is a highly reliable, high-performance, scalable, column-oriented distributed storage system. It can be used to build large-scale structured storage clusters on inexpensive commodity PC servers.

HBase is an open-source system modeled after Google’s BigTable. It is called the Hadoop database because it relies on Hadoop HDFS as its file storage system, on Hadoop MapReduce for massive data processing, and on ZooKeeper for coordination. Figure 2 illustrates the role of HBase in the Hadoop ecosystem (Taylor, 2010).


Figure 2. Diagram of Hadoop ecosystem

As shown in Table 1, the HBase data model is built from two main concepts: “table” and “column family”.


Table 1. Data model of HBase

Table – A collection of rows. As a table grows with the number of records, it is split into multiple parts called regions. The RowKey is the unique primary key of the table.

Column family – A collection of columns. A table has one or more column families, and each column family can contain any number of columns. Columns can be added to a column family dynamically, without predefining their number or types.
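The two definitions above can be combined into a small model. ToyTable is a hypothetical class, not the HBase client API. Each row is addressed by a unique row key and maps column family to {column qualifier: value}. The set of column families is fixed when the table is created, while columns inside a family are added freely. Rows are grouped into regions covering sorted row-key ranges, and a region is split at its middle key once it grows too large. Real HBase decides splits by region size on disk, so the row-count threshold here is an assumption made for brevity.

```python
import bisect


class ToyTable:
    """Hypothetical sketch of the HBase data model (not the HBase API)."""

    def __init__(self, families, max_rows_per_region=3):
        self.families = set(families)      # fixed at creation time
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]              # region i covers [start_keys[i], start_keys[i+1])
        self.regions = [{}]                 # each region: row key -> row

    def _region_index(self, row_key):
        # The last region whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        i = self._region_index(row_key)
        region = self.regions[i]
        # Any qualifier is accepted: columns need no predefinition.
        region.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value
        if len(region) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Split at the middle row key: the upper half becomes a new region.
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]
        upper = {k: self.regions[i].pop(k) for k in keys if k >= mid}
        self.start_keys.insert(i + 1, mid)
        self.regions.insert(i + 1, upper)

    def get(self, row_key):
        return self.regions[self._region_index(row_key)].get(row_key)


users = ToyTable(families=["info", "stats"])
users.put("user1", "info", "name", "Alice")
users.put("user1", "info", "email", "alice@example.com")  # new column, no schema change
users.put("user2", "info", "name", "Bob")
users.put("user3", "stats", "logins", 5)
users.put("user4", "info", "name", "Dan")  # 4 rows > 3: the region splits

print(users.get("user1"))  # {'info': {'name': 'Alice', 'email': 'alice@example.com'}}
print(users.start_keys)    # ['', 'user3']
```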

1.5 Neo4j

Neo4j is a graph database. It stores data as nodes connected by relationships, which makes the relationships between data items explicit. The data model of Neo4j is also schema-less.

Neo4j expresses queries as traversals. A graph is traversed by visiting nodes and following relationships according to some rules. In practice, only part of the graph is visited, since not all nodes and relationships are of interest to a given query. Neo4j supports fast deep traversals in place of slow SQL queries, for example queries that would need many joins. Figure 3 illustrates how traversal works.


Figure 3. Illustration of how traversal works.
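A traversal "under some rules" can be sketched as a breadth-first search that follows only one relationship type and stops at a maximum depth. The graph, relationship names and the traverse function below are hypothetical and written in plain Python, not the Neo4j API. Note that only part of the graph is visited: "Erin" is never reached because of the depth limit, and "Acme" and "Jazz" are skipped because their relationships do not match.

```python
from collections import deque

# Hypothetical graph: node -> list of (relationship type, target node).
graph = {
    "Alice": [("FRIEND", "Bob"), ("WORKS_AT", "Acme")],
    "Bob":   [("FRIEND", "Carol"), ("LIKES", "Jazz")],
    "Carol": [("FRIEND", "Dave")],
    "Dave":  [("FRIEND", "Erin")],
    "Erin":  [],
    "Acme":  [],
    "Jazz":  [],
}


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal that only follows `rel_type` relationships
    and stops after `max_depth` hops."""
    visited = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # rule: do not expand beyond the depth limit
        for rel, target in graph[node]:
            if rel == rel_type and target not in visited:
                visited.add(target)
                found.append((target, depth + 1))
                queue.append((target, depth + 1))
    return found


friends = traverse("Alice", "FRIEND", max_depth=3)
print(friends)  # [('Bob', 1), ('Carol', 2), ('Dave', 3)]
```

In Neo4j's query language Cypher, a roughly similar query could be written as MATCH (:Person {name: 'Alice'})-[:FRIEND*1..3]->(f) RETURN f, assuming the nodes carry a Person label and a name property.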

Neo4j also supports indexes. They are useful for finding a specific node or relationship by one of its properties, and an index lookup is faster than a traversal that has to look through the entire graph. Figure 4 illustrates how an index lookup works in Neo4j.


Figure 4. Illustration of how index lookup works in Neo4j.
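The difference between scanning every node and using a property index can be sketched as follows. The nodes, the "city" property and the helper functions are hypothetical, and a Python dict mapping property values to node ids stands in for Neo4j's index structures, which are implemented differently. The nodes an index returns are typically used as starting points for a traversal.

```python
# Hypothetical nodes with properties; ids are arbitrary.
nodes = {
    1: {"name": "Alice", "city": "Oslo"},
    2: {"name": "Bob", "city": "Paris"},
    3: {"name": "Carol", "city": "Oslo"},
}


def scan_by_property(key, value):
    """No index: look at every node in the graph."""
    return [nid for nid, props in nodes.items() if props.get(key) == value]


def build_index(key):
    """Index on one property: value -> set of node ids."""
    index = {}
    for nid, props in nodes.items():
        if key in props:
            index.setdefault(props[key], set()).add(nid)
    return index


city_index = build_index("city")

# Index lookup: one dictionary access instead of a pass over all nodes.
print(sorted(city_index.get("Oslo", set())))  # [1, 3]
print(scan_by_property("city", "Oslo"))       # [1, 3]
```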
