Introduction
Data Warehouse (DW or DWH) is a central repository of organizational data, which stores integrated data from multiple sources. To develop and manage a centralized system requires lots of development effort and time. Also, there will always be some latency for the latest data availability for reporting. The traditional integration process translates to small delays in data being available for any kind of business analysis and reporting.
These days, we are observing changes in data behavior, which is driving changes in business needs. Data is generated in high volumes, with high velocities and in many varieties, for example, structured, unstructured, semi-structured. It is becoming challenging to support the new data behavior and business growth using traditional methods of DWH design and development. New, modern Data Warehouse design patterns are required to develop and leverage the latest technology components.
The modern DWH design helps in building a hub for all kinds of data (for example, structured, unstructured, semi-structured, or data streaming) to initiate integrated and transformative solutions like Business Intelligence (BI) and reporting, real-time analytics and predictive analytics. To achieve all these goals and to support modern designs, Microsoft has introduced a set of fully managed, cloud-based services such as Azure Data Factory, Azure SQL Data Warehouse, Azure SQL Database and Azure Databricks, etc. All these fully managed services not only support modern DWH design patterns but also provide the advantages of inbuilt scalability, high availability, good performance, and flexibility.
Pattern of Modern Data Warehouse
The traditional DWH and BI system design used to be straight forward. It primarily has a standard set of design layers like Data Intake, Data Transformation and Storage, and Data Consumption and Presentation layer. Any standard and traditional DW design is represented in the image below:
Figure 1- Traditional DWH + BI System Design
The modern DWH brings together all kinds of data, at any scale, without much effort and time, to get insight through operational reports, analytical dashboards, and advanced analytics for all users.
Microsoft Azure provides a set of fully managed services, which allow you to build modern DWH in a few minutes. All Azure services support a fully cloud based solution, or a mix of cloud and on-premise based solutions, to meet the business need.
A modern Data Warehouse can be designed to meet business need and accommodate change in data behavior using the latest technology components such as cloud based scalable data storage for big data, real time analytics, predictive analysis and machine learning, global distribution of data, high availability, etc. Some of the Modern Data Warehouse design patterns are as follows:
Modern Data Warehouse: This is the most common design pattern in the modern data warehouse world, allowing you to build a hub to store all kinds of data using fully managed Azure services at any scale.
Advanced analytics on big data: This modern design pattern consists of actionable insights, using machine learning tools along with other characteristics of the Modern Data Warehouse design pattern. This design pattern helps in building and deploying custom machine learning models at scale.
Real-time analytics: This modern design pattern helps in getting insight from live stream data. This design allows you to capture data continuously from IoT devices or any web log and process it in near-real time.
Design and Components of Modern Data Warehouse
The Modern Data Warehouse combines all types of data, like structured, unstructured and semi-structured data (sensor logs, IoT, and media streaming) using Microsoft Azure Data Factory to Microsoft Azure Data Lake or Azure Blob Storage. Once data is stored in Data Lake or Blob Storage, data can be cleansed and transformed and perform scalable analytics with Azure Databricks. These analytics can help users and businesses to understand the behavior and then cleansed and transformed data can be moved to Azure SQL Data Warehouse to merge with other existing data and build an integrated data source.
Figure 2 Modern Data Warehouse Design
Once integrated data is available the data can be accessed and moved using Azure connectors. Also, operational reports and other analytical dashboards can be built on top of Azure Data Warehouse. These reports and dashboards derive insights from the stored data and use Azure Analysis Services to understand the data trends. Even, ad hoc queries can be executed directly on data within Azure Databricks and publish dashboards using Power BI.
Some of the key Azure technology components that help to design Modern Data Warehouse:
Azure Data Factory, is a hybrid data integration service that can create, schedule and orchestrate ELT workflows; workflow is also known as a pipeline. A pipeline consists of three steps – Connect & Collect, Transform & Enrich, and Publish.
Azure Data Lake Store or Azure Blob Storage, is the most cost effective and easy way to store any type of unstructured data.
Azure Databricks, an Apache Spark-based analytics platform.
Azure SQL Data Warehouse, is a fast and flexible cloud data warehouse. A massive parallel architecture with compute and store elastically.
Azure Analysis Services, Azure based analytics as a service that govern, deploy, test, and deliver a BI solution
Power BI, a suite of business analytics tools, which connect to hundreds of data sources, simplify data prep, and provide ad hoc analysis.
Summary
The modern DWH is needed to support the growing business needs and changes in data behavior. Also, there are several other factors that make today’s DWH as “Modern DWH”. The other factors are the use of Hadoop with Machine Learning, Near Real Time Data processing using Lambda architecture, a Hybrid solution (cloud integration with on-premise solution), Global Distribution of solution, and Self-Support Deployment, etc.
In this article, we discussed the design of Modern Data Warehouse. In the next article, we will discuss advanced analytics and the real time analytic design of Modern Data Warehouse.