I have often come across this question, sometimes asked directly by a few of my colleagues and sometimes raised as a point of discussion while designing business intelligence systems for clients.

Data warehousing has been a buzzword for the past two decades, and big data has been a hot trend in the last decade. Let’s find out what the answer to this question could be.

The first thought for anyone who is not deeply familiar with these technologies is that the newer big data will replace the older data warehousing. This simple view is reinforced by the similarities between the two:

  • Both hold a lot of data
  • Both can be used for reporting
  • Both are managed by electronic storage devices

Even so, big data and a data warehouse are not interchangeable. Why?

What is a Data Warehouse?

Data warehousing is the process of extracting data from one or more homogeneous or heterogeneous data sources, transforming it, and loading it into a data repository. That repository is then used for reporting and for data analysis that supports better decisions and improved performance.

The data repository produced by this process is the data warehouse.
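
The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not how a production warehouse is built: the two sources (ORDERS_CSV and WEB_ORDERS), their field names, and the orders table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from an order system (all values are text).
ORDERS_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,80.00
3,alice,19.50
"""

# Hypothetical source 2: records from another system with different field names and units.
WEB_ORDERS = [
    {"id": 4, "user": "Bob", "total_cents": 4500},
]


def extract():
    """Read raw rows from both sources without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
    return csv_rows, WEB_ORDERS


def transform(csv_rows, web_rows):
    """Map both sources onto one schema: (order_id, customer, amount)."""
    unified = []
    for row in csv_rows:
        unified.append((int(row["order_id"]), row["customer"].lower(), float(row["amount"])))
    for row in web_rows:
        # Convert cents to the same currency unit used by the CSV source.
        unified.append((row["id"], row["user"].lower(), row["total_cents"] / 100))
    return unified


def load(conn, rows):
    """Write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def revenue_by_customer(conn):
    """A reporting query that runs against the loaded data."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


def run_pipeline():
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
    load(conn, transform(*extract()))
    return revenue_by_customer(conn)


if __name__ == "__main__":
    print(run_pipeline())  # [('alice', 140.0), ('bob', 125.0)]
```

The point of the sketch is the shape of the process: the sources disagree on field names, letter case, and units, and the transform step reconciles them so that a single reporting query can run against the loaded table.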

What is Big Data?

Big data refers to the volume, velocity, and variety of data. How big the data is, how fast it arrives, and how varied it is together determine what is called “big data”. These 3 V’s of big data were articulated by industry analyst Doug Laney in the early 2000s.

  • Volume. Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would’ve been a problem, but new technologies (such as Hadoop) have eased the burden.
  • Velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors, and smart metering are driving the need to deal with torrents of data in near-real-time.
  • Variety. Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data, and financial transactions.

Why does any organization want Big Data or Data Warehouses?

  • Big Data: Organizations want a big data solution because many corporations hold a lot of data. That data, if unlocked properly, can yield valuable information that leads to better decisions, which in turn can lead to more revenue, more profitability, and more customers. That is what most corporations want.
  • Data Warehouse: Organizations need a data warehouse in order to make informed decisions. In order to really know what is going on in your corporation, you need data that is reliable, believable and accessible to everyone.

The two motivations look similar, but there is a clear difference. A big data solution is a repository that holds lots of data without a settled idea of what will be done with it, whereas a data warehouse is designed with the clear intention of supporting informed decisions. Further, a big data solution can be used for data warehousing purposes.

Why is it like comparing apples to oranges?

Big data and data warehousing are two different things; comparing them is like comparing an apple to an orange.

  • A big data solution is a technology, whereas
  • data warehousing is an architecture.

A technology, such as big data, is a means to store and manage large amounts of data. Organizations make use of various big data solutions to store a large volume of data at a lower cost.