In this era of big data and data lakes, has the enterprise data warehouse become irrelevant?
The Data Warehouse Under Pressure
Maintaining an enterprise data warehouse is an expensive affair, between software licenses, maintenance costs, consultants and the full-time employees required to keep it going. Several unfolding trends are putting this expense under scrutiny:
- The proliferation of business-friendly data discovery tools that can be used against any data source.
- The increasing flow of data into the enterprise, through digital interactions with customers and suppliers (email, social media, CRM systems, etc.).
- Ubiquitous data capture via the emerging Internet of Things (IoT).
- The availability of enterprise-grade, Hadoop-based systems, which permit the cheap storage and retrieval of massive amounts of data.
- Self-service reporting and data visualization software that enables business users to create and publish reports without IT involvement.
Added to this list are the long-standing complaints the business has always had about the enterprise data warehouse: changes take too long, it is not flexible enough for a fast-changing business environment, reports are slow to run, and the data is not current enough or not exactly what is needed. Lines of business, which now control more of the IT budget, are looking at alternatives for keeping costs low and, more importantly, for obtaining control and flexibility over the data that they need to drive their business analytics. Cloud software vendors are all too happy to serve up SaaS solutions that do not require any involvement from the IT department.
Benefits of a Data Warehouse
Nonetheless, the data warehouse still has an important role to play. It is a proven technology for accomplishing the following tasks:
- Consolidating, managing and distributing structured data
- Providing data lineage and auditability
- Maintaining data history
- Enforcing data security policies
- Ensuring data integrity and cleanliness
- Being a source for standardized metadata
- Handling the long-running SQL queries generated by reporting tools
- Validating data and applying business rules
- Serving as the system of record, or a “single version of the truth”
- Accelerating traditional BI report and OLAP development
- Enabling self-service business intelligence and data visualization for non-technical business users
Reconciling the Data Warehouse and Big Data
While the benefits remain compelling, the data warehouse and the team responsible for it must evolve. The data warehouse strategy should include big data rather than treat it as a parallel effort. Areas of the business that do not change frequently, such as financial reporting or any reporting that must demonstrate compliance with legislation or industry regulations, will continue to source data exclusively from the data warehouse. Other areas of the business, such as marketing, are exploring new technologies and have started using massive amounts of data that previously did not exist. They can leverage the structured data from the data warehouse by combining it with fresh and messy unstructured big data. Not all business analytics have to be based on the single version of the truth, especially in areas of innovation, where the “truth” is impossible to define.
Newer data sources are for experimentation. A new source might become part of the data warehouse later, but first the data scientists and analysts must work with it. Having cleansed data is less important than being able to explore, experiment and test theories. To use an example from marketing, new advertising venues on social media websites provide vast amounts of data, but the rules of the game change frequently. For example, Google+ and Twitter were once fast-growing rivals to Facebook for advertising dollars, but that is not the case now.
The data warehouse team must take on the role of providing data for exploratory purposes, perhaps even to parts of the organization that historically did not require data or had always procured data from external sources. It is also key for the data warehouse team to revisit its development practices so that it can keep costs low while delivering blended data to reporting and analytics tools managed by the business. This means looking at modern technologies to get the most out of mature ETL tools. Virtual cloud infrastructure can be used to ramp up processing power for busy times (usually the end of fiscal periods). An agile development methodology should be adopted to deliver more frequent code changes and to increase the engagement of business users during the development lifecycle.
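To make the idea of "blended data" a little more concrete, here is a minimal sketch in Python of joining curated warehouse rows with messy raw events. Everything in it is an assumption made for illustration: the customer rows, the event lines, the field names and the helper functions `parse_events` and `blend` are all hypothetical, and a real pipeline would read from the warehouse and a big data store rather than from in-memory lists.

```python
import json
from collections import Counter

# Hypothetical curated rows, as they might come from a warehouse customer table.
warehouse_customers = [
    {"customer_id": 1, "name": "Acme Corp", "segment": "Enterprise"},
    {"customer_id": 2, "name": "Globex", "segment": "Mid-Market"},
]

# Hypothetical raw event lines, e.g. social media mentions landed as-is.
raw_events = [
    '{"customer_id": 1, "channel": "twitter", "text": "Love the new release"}',
    '{"customer_id": "2", "channel": "facebook"}',
    'not valid json',
    '{"channel": "twitter", "text": "no customer id"}',
    '{"customer_id": 1, "channel": "facebook", "text": "Support was slow"}',
]


def parse_events(lines):
    """Yield (customer_id, event) pairs, skipping lines that cannot be used."""
    for line in lines:
        try:
            event = json.loads(line)
            customer_id = int(event["customer_id"])
        except (ValueError, KeyError, TypeError):
            # Messy data is expected here; unusable lines are dropped, not fixed.
            continue
        yield customer_id, event


def blend(customers, lines):
    """Attach a mention count from the raw events to each curated customer row."""
    counts = Counter(customer_id for customer_id, _ in parse_events(lines))
    return [
        dict(customer, mention_count=counts.get(customer["customer_id"], 0))
        for customer in customers
    ]


if __name__ == "__main__":
    for row in blend(warehouse_customers, raw_events):
        print(row)
```

The curated rows are left untouched and the raw events are only loosely validated, which mirrors the point above: exploratory data does not need to meet the same standard as the system of record before it becomes useful.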
- Big data tools are still maturing, so it is not clear what their ultimate role will be alongside the enterprise data warehouse. However, it looks as though big data will be more complementary than overlapping.
- A mature enterprise data warehouse still has an important role in the enterprise but it must evolve to stay relevant.
- The data warehouse team should focus on becoming a data provider, delivering data with varying levels of quality and structure for different purposes, to be consumed by many different groups using an array of reporting, visualization and analytics tools.
David Currie has been helping businesses get the most out of Cognos Business Intelligence software since 1999, first as a Cognos employee and since 2008 as an independent consultant. He develops the solution architecture to satisfy complex business reporting and analytics requirements, sourcing data from operational databases, data warehouses and now big data repositories. He blogs about business intelligence and big data at davidpcurrie.com. Connect with him through the blog, LinkedIn or Twitter.