The Hadoop Tutorial: Comparison of Hadoop with other systems

In my previous post I explained what Hadoop is and why it is needed. In this post I'll compare Hadoop with other existing systems.

Hadoop vs RDBMS:

Disk latency has not improved in proportion to disk bandwidth; that is, seek time has improved far more slowly than transfer rate. An RDBMS accesses data through B-trees, whose performance is limited by seek time (disk latency), so reading or updating a large fraction of the data takes a long time. Hadoop's MapReduce model instead streams through the data, so it is limited by transfer rate (disk bandwidth). Hence, for queries that touch most of the database, a B-tree is less efficient than MapReduce.
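To make the seek-versus-transfer argument concrete, here is a back-of-the-envelope model in Python. The disk figures (10 ms seek, 100 MB/s transfer), the 1 TB dataset and the 100-byte records are assumptions chosen for round numbers, not measurements of any real system, and charging one seek per record is a simplification of how B-tree access behaves.

```python
# Back-of-the-envelope model of seek-bound vs. transfer-bound access.
# Every constant below is an illustrative assumption, not a measurement.

SEEK_TIME_S = 0.010          # assumed average seek time: 10 ms
TRANSFER_RATE_BPS = 100e6    # assumed sequential transfer rate: 100 MB/s
DATASET_BYTES = 1e12         # assumed dataset size: 1 TB
RECORD_BYTES = 100           # assumed record size: 100 bytes


def seek_bound_time(fraction: float) -> float:
    """Seconds to touch `fraction` of all records, paying one random seek per
    record (the access pattern of B-tree lookups scattered across the disk)."""
    records = DATASET_BYTES / RECORD_BYTES * fraction
    return records * (SEEK_TIME_S + RECORD_BYTES / TRANSFER_RATE_BPS)


def transfer_bound_time() -> float:
    """Seconds to stream the whole dataset sequentially (a MapReduce-style full scan)."""
    return DATASET_BYTES / TRANSFER_RATE_BPS


if __name__ == "__main__":
    scan = transfer_bound_time()
    for fraction in (1e-7, 1e-4, 1e-2, 0.5):
        print(f"touch {fraction:.0e} of records: "
              f"seeks ~{seek_bound_time(fraction):,.0f} s, full scan ~{scan:,.0f} s")
```

With these assumed numbers, a full sequential scan takes about 10,000 s, and touching roughly 0.01% of the records (about a million random seeks) already costs about the same. Above that fraction, streaming through everything is the cheaper plan, which is the regime MapReduce is built for.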

An RDBMS is more efficient for point queries, where the data is indexed so that a small amount of data can be retrieved with low latency. Hadoop's MapReduce, on the other hand, is more efficient for queries that process the entire dataset. Moreover, MapReduce suits applications in which data is written once and read many times, whereas an RDBMS suits datasets that are continuously updated.
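The difference between a point query and a whole-dataset query can be sketched in a few lines. This is a minimal illustration, assuming an in-memory table of hypothetical rows; a sorted key list searched with Python's `bisect` module stands in for an RDBMS B-tree index.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical table of (key, value) rows, generated in key order.
ROWS: List[Tuple[int, int]] = [(k, k * k) for k in range(0, 1_000_000, 3)]
# A sorted key list stands in for the index; a real RDBMS keeps a disk-resident B-tree.
INDEX_KEYS: List[int] = [k for k, _ in ROWS]


def point_query(key: int) -> Optional[int]:
    """Indexed lookup: binary search touches O(log n) keys and returns one row."""
    pos = bisect.bisect_left(INDEX_KEYS, key)
    if pos < len(INDEX_KEYS) and INDEX_KEYS[pos] == key:
        return ROWS[pos][1]
    return None


def full_scan_total() -> int:
    """Whole-dataset query: every row must be read, so the index cannot save any work."""
    return sum(value for _, value in ROWS)


print(point_query(9))     # 81
print(point_query(10))    # None: 10 is not a stored key
```

The index lets `point_query` ignore almost all of the table, but `full_scan_total` has to read every row regardless, which is why whole-dataset work gains nothing from the index and is better served by a system optimized for sequential scans.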

| Property | MapReduce | RDBMS |
|---|---|---|
| Size of data | Petabytes | Gigabytes |
| Integrity of data | Low | High |
| Data schema | Dynamic | Static |
| Access method | Batch | Interactive and batch |
| Scaling | Linear | Nonlinear |
| Data structure | Unstructured | Structured |
| Normalization of data | Not required | Required |
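The Data schema and Data structure rows of the table are often described as schema on write versus schema on read. The toy sketch below, using hypothetical web-log lines and hypothetical function names, contrasts loading data into a fixed schema with keeping it raw and interpreting it only when a job runs; it is plain Python, not Hadoop or RDBMS code.

```python
from typing import Dict, List, Tuple

# Hypothetical raw web-log lines, stored exactly as they arrived.
RAW_LINES = [
    "2013-07-07 GET /index.html 200",
    "2013-07-07 POST /login 302 user=alice",  # carries an extra field
    "malformed line",
]

Row = Tuple[str, str, str, int]


def load_with_static_schema(lines: List[str]) -> Tuple[List[Row], int]:
    """RDBMS-style schema on write: each line must fit a fixed
    (date, method, path, status) schema at load time, otherwise it is rejected."""
    table: List[Row] = []
    rejected = 0
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and parts[3].isdigit():
            table.append((parts[0], parts[1], parts[2], int(parts[3])))
        else:
            rejected += 1
    return table, rejected


def count_methods(lines: List[str]) -> Dict[str, int]:
    """MapReduce-style schema on read: the raw lines are kept, and this job
    interprets them only when it runs, using just the fields it needs."""
    counts: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 4:  # this job's own notion of a usable record
            counts[parts[1]] = counts.get(parts[1], 0) + 1
    return counts
```

The POST line with an extra field is rejected by the fixed schema but is still usable by the read-time job, and a different job could parse the same raw lines differently without reloading anything.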

These differences are likely to blur in the near future.

Hadoop vs Grid Computing:

Grid Computing has long performed large-scale processing by dividing a job across a cluster of machines. However, it is efficient only for compute-intensive jobs. For data-intensive jobs, large volumes of data have to be transferred over the network, and network bandwidth becomes the bottleneck. This is where Hadoop outperforms Grid Computing: MapReduce tries to run each computation on the node where its data resides, thus saving network bandwidth. This principle, known as data locality, lies at the heart of MapReduce.
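Data locality can be illustrated with a simplified, greedy task scheduler. This sketch assumes a hypothetical cluster in which each input block is replicated on a few nodes and each node has a fixed number of free task slots; it is not Hadoop's actual scheduler, which also weighs factors such as rack locality.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cluster state: the nodes holding a replica of each input block,
# and the number of free task slots on each node.
BLOCK_REPLICAS: Dict[str, Set[str]] = {
    "blk_1": {"node-a", "node-b", "node-c"},
    "blk_2": {"node-b", "node-c", "node-d"},
    "blk_3": {"node-a", "node-d", "node-e"},
    "blk_4": {"node-c", "node-e"},
}
FREE_SLOTS: Dict[str, int] = {
    "node-a": 1, "node-b": 1, "node-c": 0, "node-d": 1, "node-e": 0, "node-f": 1,
}


def schedule(blocks: List[str],
             replicas: Dict[str, Set[str]],
             slots: Dict[str, int]) -> Dict[str, Tuple[str, bool]]:
    """Greedy locality-aware assignment. Returns block -> (node, is_local).
    Prefers a free node that already stores the block (no network copy);
    otherwise falls back to any free node, which must fetch the block remotely."""
    slots = dict(slots)  # work on a copy so the caller's state is untouched
    plan: Dict[str, Tuple[str, bool]] = {}
    for blk in blocks:
        local = sorted(n for n in replicas[blk] if slots.get(n, 0) > 0)
        if local:
            node, is_local = local[0], True
        else:
            remote = sorted(n for n, free in slots.items() if free > 0)
            if not remote:
                raise RuntimeError(f"no free slot for {blk}")
            node, is_local = remote[0], False
        slots[node] -= 1
        plan[blk] = (node, is_local)
    return plan


plan = schedule(["blk_1", "blk_2", "blk_3", "blk_4"], BLOCK_REPLICAS, FREE_SLOTS)
for blk, (node, is_local) in plan.items():
    print(blk, "->", node, "(data-local)" if is_local else "(remote read)")
```

Only `blk_4`, whose replicas sit on nodes with no free slots, ends up on a node that must pull its data over the network; every other task reads its block from local disk.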

Moreover, MapReduce spares programmers from writing code to handle node failures and data flow, because MapReduce handles these implicitly. Grid Computing, in contrast, gives programmers great control over data flow and failure handling, but they must write that logic themselves.
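This division of labour can be sketched as a toy, single-process word count. The user supplies only `word_count_map` and `word_count_reduce`; the hypothetical `run_job` framework simulates node failures by skipping listed attempts and transparently re-executes the task. The retry limit of 4 and the failure-injection parameter are assumptions of this sketch, not Hadoop's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

KV = Tuple[str, int]


def word_count_map(line: str) -> List[KV]:
    """User code: emit (word, 1) for every word. It contains no failure handling."""
    return [(word, 1) for word in line.split()]


def word_count_reduce(key: str, values: Iterable[int]) -> int:
    """User code: add up the counts emitted for one word."""
    return sum(values)


def run_job(splits: List[str],
            map_fn: Callable[[str], List[KV]],
            reduce_fn: Callable[[str, Iterable[int]], int],
            failing_attempts: Set[Tuple[int, int]],
            max_attempts: int = 4) -> Dict[str, int]:
    """Toy framework. Runs one map task per split, re-executing a task whose
    attempt is listed in `failing_attempts` (a simulated node failure),
    then groups intermediate pairs by key and applies reduce_fn."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for task_id, split in enumerate(splits):
        for attempt in range(max_attempts):
            if (task_id, attempt) in failing_attempts:
                continue  # node "died" before committing output; try again
            for key, value in map_fn(split):
                intermediate[key].append(value)
            break  # attempt succeeded
        else:
            raise RuntimeError(f"map task {task_id} failed {max_attempts} times")
    return {key: reduce_fn(key, vals) for key, vals in sorted(intermediate.items())}


# Task 0 fails once and task 1 fails twice; the user code never notices.
print(run_job(["a b a", "b c"], word_count_map, word_count_reduce,
              failing_attempts={(0, 0), (1, 0), (1, 1)}))
# {'a': 2, 'b': 2, 'c': 1}
```

Because failed attempts commit no output, retrying a task cannot double-count words; in a Grid-style program this bookkeeping would be the programmer's job.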

Thus, Hadoop is not a replacement for an RDBMS; the two systems can coexist.

Sunday, July 7, 2013
