When it comes to business information, Chief Information Officers (CIOs) and Chief Data Officers (CDOs) are tasked with bringing order to the chaos.
As organizations collect more and more data, they face both commercial pressures to do more with the information they store and increasing regulatory burdens to manage data, particularly in relation to customers.
The situation is further complicated by the range of tools available to store and manipulate data, from data lakes and data hubs to object storage, machine learning (ML) and artificial intelligence (AI).
According to a survey by storage manufacturer Seagate, as much as 68% of business data goes unused. As a result, organizations miss out on the benefits that data should provide. At the same time, companies are exposed to regulatory and compliance risk if they are not clear about what data they store and where.
To address this complexity and make data “work” for the business, companies need to review their data architecture. At its most basic level, data architecture is about knowing where the organization’s data resides and mapping how data flows through it. However, given the wide variety of data sources and ways in which data can be manipulated and used, there is no single blueprint. Each organization must build a data architecture that is appropriate for its own needs.
“Data architecture means many things to many people, and it’s easy to drown in an ocean of ideas, processes, and initiatives,” says Tim Garrood, a data architecture expert at PA Consulting. Businesses need to ensure data architecture projects deliver value, he adds, and that requires knowledge and skills as well as technology.
However, part of the challenge for CIOs and CDOs is that technology is driving complexity in both how data is managed and how it is used. As management consultancy McKinsey put it in a 2020 paper, “Technical advancements — from data lakes to customer analytics platforms to stream processing — have tremendously increased the complexity of data architectures.” This makes it difficult for organizations to manage their existing data architectures, and to build new ones that deliver fresh capabilities.
The move away from traditional relational database systems toward much more flexible data structures — and the ability to capture and process unstructured data — gives organizations the potential to do far more with data than ever before.
The challenge for CIOs and CDOs is to tie this opportunity to the needs of the business. Building a data architecture should be more than just a housekeeping or compliance exercise.
“I like to ask the question: what can we achieve with better data, what could be different?” says Garrood of PA Consulting. “Unless it’s anchored in an articulated business issue, the work risks going nowhere.” Only then should teams turn to the physical data architecture, data flows, and the integration of data sources and applications.
What is data architecture?
Data architecture is often described as a blueprint for data management. At a minimum, an effective data architecture must map the flow of information through the organization.
This, in turn, requires a good understanding of the data being collected and stored, the systems that store it, and the regulatory, compliance, and security rules that apply to the data.
Businesses also need to understand which data is critical to operations and which delivers the most value. As companies store and process more and more information, this becomes increasingly important. Sometimes it’s more art than science.
“There is an art to understanding the few principles to truly adhere to, and to knowing which data is key to the business,” says Tim Bowes, associate director for data engineering at data consultancy Dufrain. “Enterprises have vast amounts of data floating around, but not all of it is absolutely critical to a successful operation. Knowing what data is important is fundamental.”
Data architecture must be linked to the organization’s data strategy and its data lifecycle – but it also relies on sound data management.
Often, organizations divide their data architecture into two parts: the supply of data and the consumption, or use, of data.
On the supply side, CIOs and CDOs must deal with data sources including transactions, business applications, customer activity, and even sensor data. On the consumption side, companies look at their reporting, business intelligence, advanced analytics, and even ML and AI capabilities. Some will also try to exploit data further by reselling it or using it to develop new products.
The relative importance of these parts will shape the data architecture.
Consulting firm KPMG, for example, applies what it calls a “four Cs” framework to data architecture — create, curate, consume, and commercialize.
According to Nick Whitfield, the firm’s UK head of data and analytics, creation and curation sit on the supply side, separate from consumption and commercialization. Each side may need its own data architecture.
“I don’t think a company can have a unified, homogeneous data architecture,” he says. “I think there are different types of data architectures for different purposes.
“It’s more than just a data model. It is the collection of processes and the governance framework, the underlying technology and the data standards. Together these ensure that data is well organized and controlled so that it flows precisely through your business processes.”
Why and how to implement a data architecture
The drive to create or update a data architecture can come from either changes in technology or changes in the business.
Changing a core component of an organization’s IT or analytics systems presents an opportunity to take another look at data flows. And the move to cloud technology offers a way to update data flows without requiring a “lift-and-shift” replacement of systems. Instead, changes can be made from application to application or from project to project.
“Part of the data architect’s role is to paint a picture of what the benefits can look like,” says Garrood of PA Consulting. “But it’s also about identifying what needs to change and what new flows need to be added to the pipeline.”
The shift from data warehouses to data lakes also supports this approach, since data is no longer tied to specific applications.
“Companies have a lot of new sources and data at their disposal,” said Roman Golod, CTO and co-founder of data ops company Accelario. “They not only need to move to continuous integration between different sources, but also to new technologies, including web services and the cloud.”
Golod notes that most of Accelario’s customers, perhaps 80%, still run on-premise systems. But new capabilities are increasingly coming from cloud or hybrid technology.
This allows organizations to revisit that all-important design or data flow, identify new data sources, and perform more advanced analytics, ML, and AI.
But before they do that, companies need to get their data house in order.
Data quality and master data management are not necessarily part of data architecture, but good quality data is still essential to deliver the business outcomes of an architecture project. Experts who have worked on large-scale data architecture projects say that connecting disparate systems can often uncover data quality issues that previously went unnoticed. And a clear understanding of which datasets are the master or “golden” data is essential if the organization is to trust the decisions that come from advanced analytics or ML/AI tools.
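The idea of a master or “golden” record can be sketched in a few lines of code. The example below is purely illustrative (the field names, the two hypothetical source systems, and the rule of preferring the CRM as master are all assumptions, not anything described by the interviewees): it merges a customer record from two systems, fills gaps, and flags conflicting fields for review.

```python
# Illustrative sketch only: the source systems (CRM, billing), the field
# names, and the CRM-as-master rule are hypothetical assumptions.

def merge_golden_record(crm_record, billing_record):
    """Build a 'golden' customer record, flagging conflicts for review."""
    golden, conflicts = {}, []
    for field in set(crm_record) | set(billing_record):
        a, b = crm_record.get(field), billing_record.get(field)
        if a == b:
            golden[field] = a
        elif a is None or b is None:
            golden[field] = a if a is not None else b  # fill the gap
        else:
            golden[field] = a          # prefer the designated master (CRM here)
            conflicts.append(field)    # ...but record the disagreement
    return golden, conflicts

crm = {"id": 17, "email": "j.smith@example.com", "phone": None}
billing = {"id": 17, "email": "jsmith@example.com", "city": "Leeds"}
golden, conflicts = merge_golden_record(crm, billing)
# conflicts == ["email"]; "city" comes from billing, "phone" stays None
```

In practice this kind of reconciliation is what surfaces the hidden quality issues mentioned above: the conflict list makes disagreements between systems visible instead of silently picking a value.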
This is even more true when organizations have large numbers of systems, including legacy architectures and systems that have accumulated technical debt. As KPMG’s Whitfield points out, one of his clients in the oil and gas industry had more than 1,500 data integrations. Bringing those integrations into a data lake, for example, raises practical, compliance, and data standards issues.
“This range of information needs to be managed based on the information type, and therefore the underlying datasets need to be managed as well,” he says. “On the one hand, you have data that needs to be heavily controlled, heavily governed, very, very consistent, and largely untouched. On the other hand, you give data scientists access to large pools of data and let them go and explore whatever they want. The fact of the matter is that the data architecture needs to cover both ends of this spectrum, which is no easy task.”
Data experts recommend an iterative approach, looking at the data architecture on a project or business case basis. Otherwise, there is a risk the work becomes unmanageable and delivers no business value. That work, however, still has to feed into the overall data model the company is working towards. Striking this balance will always be a challenge: too many small projects bring their own risks, including divergent data standards and isolated information silos.
Data architecture and business case
Nonetheless, investing in a data architecture can provide a significant and sometimes rapid return on investment.
Organizations will make better use of the data they have and will be better able to take advantage of new and emerging platforms and applications, including AI and the cloud. And, as Dufrain’s Bowes points out, an updated data architecture gives organizations a better view of their customers, not least by enabling the connection of data from cloud and software-as-a-service (SaaS) systems to existing data stores.
Organizations can also leverage data architecture to manage technical debt and ensure data collection and retention policies are compliant. But ultimately, it’s about unlocking the value of the data the company has already spent money on collecting.
“Fundamentally, it serves to model the world that we see and to represent that world in data,” says Garrood of PA. “It’s still about modeling entities and the relationships between them. It boils down to the same basics of being clear about what you are trying to achieve.”
However, this also requires data governance and ongoing management, or even curation, as well as a willingness to act on the new data and insights.
“There’s no point in having data architecture, a nicely documented thing, without proper data leadership,” says KPMG’s Whitfield. “It’s clear that better insights are valuable, and there’s a lot of business opportunity there. But how we physically organize our data is a relatively small part of it. It’s about the business case leadership, the right operating model, the right governance framework, the right tools, and then the right culture.”