Hadoop is a cost-effective solution for Big Data. We keep hearing this. But what is the real cost of Hadoop for Big Data analytics? How economical is it compared to a traditional RDBMS?

A typical Hadoop cluster is a collection of machines, each acting as a master node, a slave node, or a client machine. Unlike RDBMS deployments, the machines in a Hadoop cluster can be commodity hardware rather than enterprise-class servers. The Hadoop framework implements enough fault tolerance to handle failures in commodity hardware.

Scaling an RDBMS might require upgrading the existing hardware or buying more enterprise-class servers. RDBMS software licenses are also usually expensive.

Scaling a Hadoop system, however, only requires adding more commodity hardware, whose overall cost is much lower than that of RDBMS systems.

We keep saying that Hadoop infrastructure is far more economical than RDBMS systems, but how much cheaper is it exactly? Let’s get down to the numbers:

Cost of an RDBMS system for 1 TB of Data – $10,000 to $15,000

Cost of hardware for one Hadoop node (a processor, a network card and a few hard drives) – $4,000

Clearly the difference is massive.

However, this comparison leaves out software, maintenance, installation, staff salaries and so on. These costs are not negligible. Let’s make another estimate that includes them to get a more realistic comparison.

Hadoop Systems

Assume the cluster has 100 nodes and each node costs $4,000.

Hadoop-qualified engineers are well paid. Let’s assume an average annual salary of $150,000 per engineer.

Let’s also assume that the free, open source Apache version of Hadoop is deployed.

Based on these assumptions, and amortizing the cost over a period of 3 years, we get the following estimate per hour.

  • Hourly hardware cost (over three years): $15.21
  • Hourly maintenance cost: $17.11

That comes out to an operational cost of about $32 per hour for the entire system.
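The hourly figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that a single engineer maintains the cluster and that a year has 365.25 days; neither is stated in the estimate, but both are consistent with the quoted numbers. The function and variable names are purely illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length (8,766 hours)


def hourly_cost(total_cost, years):
    """Spread a total cost evenly over every hour of the given period."""
    return total_cost / (years * HOURS_PER_YEAR)


nodes = 100
cost_per_node = 4_000      # USD, from the estimate above
engineer_salary = 150_000  # USD per year, from the estimate above
engineers = 1              # assumption: one engineer for the whole cluster
years = 3

hardware_hourly = hourly_cost(nodes * cost_per_node, years)
maintenance_hourly = hourly_cost(engineers * engineer_salary * years, years)
total_hourly = hardware_hourly + maintenance_hourly

print(f"Hardware:    ${hardware_hourly:.2f}/hour")
print(f"Maintenance: ${maintenance_hourly:.2f}/hour")
print(f"Total:       ${total_hourly:.2f}/hour")
```

Changing `engineers` or `years` shows how sensitive the total is to staffing and to the amortization period.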

RDBMS Systems

Assume an RDBMS system of similar size.

An Oracle database machine with 168 TB of storage costs $650,000.

Its software costs $1.68 million, which brings the total to $2.33 million, or roughly $14,000 per TB.

Assume the annual salary of an Oracle database administrator is $95,000.

Based on these assumptions, and amortizing the cost over a period of 3 years, we get the following estimate per hour.

  • Hourly hardware and software cost (over three years): $88.60
  • Hourly maintenance cost: $10.27

That comes out to an operational cost of about $99 per hour for the entire system.
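The same arithmetic can be applied to the RDBMS estimate. This is a rough sketch under the same assumptions as before (one administrator, a 365.25-day year), with illustrative variable names. Note that the first hourly figure only works out to $88.60 when hardware and software are added together, and that a $95,000 salary gives a maintenance figure slightly higher than the quoted $10.27; the total still rounds to about $99 per hour.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length (8,766 hours)

hardware_cost = 650_000    # USD, Oracle database machine
software_cost = 1_680_000  # USD, software
storage_tb = 168
dba_salary = 95_000        # USD per year
admins = 1                 # assumption: one administrator
years = 3

hours = years * HOURS_PER_YEAR
platform_cost = hardware_cost + software_cost

cost_per_tb = platform_cost / storage_tb
platform_hourly = platform_cost / hours
maintenance_hourly = admins * dba_salary * years / hours
total_hourly = platform_hourly + maintenance_hourly

print(f"Cost per TB:         ${cost_per_tb:,.0f}")
print(f"Hardware + software: ${platform_hourly:.2f}/hour")
print(f"Maintenance:         ${maintenance_hourly:.2f}/hour")
print(f"Total:               ${total_hourly:.2f}/hour")
```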

We see that the RDBMS system is roughly 3 times costlier than a Hadoop system of similar size.

We have talked about how Big Data solutions using Hadoop can save you big bucks. However, Hadoop is not necessarily a replacement for RDBMS systems. RDBMS systems still have a strong place in transactional data management. The recommendation is to consider deploying Hadoop infrastructure alongside existing database management systems to get the best of both worlds.
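As a quick check on that ratio, the snippet below takes the hourly figures quoted in the two estimates and compares them directly. The three-year horizon matches the amortization period used above; the 365.25-day year is an assumption, and the variable names are illustrative.

```python
hadoop_hourly = 15.21 + 17.11  # USD/hour, Hadoop figures as quoted above
rdbms_hourly = 88.60 + 10.27   # USD/hour, RDBMS figures as quoted above

ratio = rdbms_hourly / hadoop_hourly

hours = 3 * 365.25 * 24  # assumption: three years of 365.25 days
savings = (rdbms_hourly - hadoop_hourly) * hours

print(f"RDBMS / Hadoop hourly cost ratio: {ratio:.2f}")
print(f"Difference over three years: ${savings:,.0f}")
```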
