How to reap the benefits of data integration, step by step
In a new book, just out from Technics Publications, data experts Bill Inmon, Patty Haines and David Rapien tackle the essential (but too often avoided) task of data integration. They delve into the benefits of data integration and explore specific categories of integration to create an essential guide for anyone looking to get the most from their organization’s data.
Early in this selection, the authors address the critical question of why data integration is so important to organizations of all types.
What are the main processes that take place in a data warehouse? Perhaps the most important is the integration of data. Integration gives the organization a single, unified view of its data. Unfortunately, despite these benefits, no one wants to do the integration. Everyone hates it, including vendors, data analysts, data scientists, and consultants. So why do people hate data integration so much?
People hate integration because integration requires thought and work—a lot of thought and a lot of work. There are no shortcuts.
A typical vendor tactic is to provide users with a platform and then hand those users responsibility for the integration. This approach is often referred to as Extract, Load, and Transform (ELT). And what happens when ELT is put into practice? Because the transformation is complex, the integration is often never performed.
E and L get done, but we forget T.
The problem is that to get a truly unified view of data across the enterprise, you need to integrate the data. There are no shortcuts. There are no easy ways out.
So what happens if you don’t integrate data? You have a world full of information silos. A silo cannot communicate or cooperate with another silo. Data exists solely in its silo and cannot be used anywhere else. People in the organization have no way of seeing information across the company.
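To make the silo problem concrete, here is a minimal sketch of the "T" step that unifies two silos. Everything in it is an assumption made for illustration: the silo contents, the field names, and the shared `customer_id` key are hypothetical and do not come from the book.

```python
# Hypothetical records from two silos. The field names, key formats, and
# units are invented for illustration; each silo uses its own conventions.
billing_silo = [
    {"cust_id": "0042", "name": "ACME CORP", "balance_cents": 125000},
    {"cust_id": "0077", "name": "GLOBEX", "balance_cents": 9900},
]
support_silo = [
    {"customer": 42, "company": "Acme Corp.", "open_tickets": 3},
    {"customer": 99, "company": "Initech", "open_tickets": 1},
]


def transform_billing(row):
    """Map a billing row onto the shared (assumed) key and currency unit."""
    return {
        "customer_id": int(row["cust_id"]),       # "0042" -> 42
        "balance": row["balance_cents"] / 100,    # cents -> currency units
    }


def transform_support(row):
    """Map a support row onto the same shared key."""
    return {
        "customer_id": int(row["customer"]),
        "open_tickets": row["open_tickets"],
    }


def integrate(billing, support):
    """Merge transformed rows from both silos into one record per customer."""
    unified = {}
    for rec in map(transform_billing, billing):
        unified.setdefault(rec["customer_id"], {}).update(rec)
    for rec in map(transform_support, support):
        unified.setdefault(rec["customer_id"], {}).update(rec)
    return [unified[key] for key in sorted(unified)]


unified_view = integrate(billing_silo, support_silo)
for record in unified_view:
    print(record)
```

Without the two transform functions, the key `"0042"` in one silo and `42` in the other would never match, which is the kind of work that gets skipped when only the extract and load steps are done.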
We need to integrate all types of data: structured, transactional and text data. There is a lot of important data within the organization that we overlook today. And that’s a shame because organizations are missing out on a great opportunity. Businesses need to look at all of their data, not just the data that’s convenient to use.
There have been many unsuccessful attempts to avoid data integration:
- Let’s just build a data mart directly from the applications using the dimensional model. Who needs all the work of building a data warehouse? Go straight from the application to the data mart and skip all the data integration work.
- Let’s change the definition of a data warehouse. And let’s not do any integration, because it’s difficult and complex.
- Let’s do ELT instead of ETL. Let’s just skip the T part of the equation.
- Let’s copy operational data to a separate platform and call it “integrated data”. It’s a lot easier than getting in there and unifying the data.
- Let’s bring in big data. I’ve heard that with big data, we don’t really need to integrate our data. The vendor told us that if we just put all our data in big data, we won’t need a data warehouse.
- Let’s create a data mesh. Who needs all the complications of integration?
- Let’s just put everything in one data lake. Then people can go to the data lake and just find what they want. That’s all there is to it.
Every day vendors create more excuses not to integrate. And every day the problems with isolated systems are getting worse.
If you want to create credible, enterprise-wide data for your business, you need to integrate the data from your silos.
We integrate data on three levels:
Data integration techniques are very different for classic structured data versus text data. Integration of structured data involves a data model, while integration of text involves taxonomies, ontologies, other mappings, and inline contextualization. The tools that support each kind of integration are very different as well.
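As a rough illustration of how a taxonomy can add context to text, the toy sketch below tags words in a sentence with a broader category. The taxonomy entries, the `contextualize` function, and the sample note are all hypothetical. This is not the authors' tooling, and real text integration involves far more than a word lookup.

```python
import re

# Hypothetical taxonomy: maps a specific term to a broader category.
TAXONOMY = {
    "sedan": "car",
    "suv": "car",
    "pickup": "truck",
    "oak": "tree",
}


def contextualize(text, taxonomy):
    """Return (word, category) pairs for every word found in the taxonomy."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(word, taxonomy[word]) for word in words if word in taxonomy]


note = "Customer traded in a Sedan for an SUV."
tagged = contextualize(note, TAXONOMY)
print(tagged)
```

Once "Sedan" and "SUV" are both tagged as "car", notes written by different people in different words can be queried under one category, which is roughly the role a data model plays for structured data.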
There are (at least) two important aspects of integration. The first aspect is the mechanics of the integration and the second aspect is the project management of the integration. This book covers both aspects.
Vendors and consultants fear integration. The first step toward not fearing integration is to understand it. Once you understand integration, you can start planning it rationally.
Would you like to find out more? Read the rest of Chapter 1 by downloading this PDF.