If data is the new oil, then organizations must establish their data refineries. High-octane data is the fuel data-driven organizations need to create strategic advantages, power analytics, and develop machine-learning models. Leaders must consider a proactive data governance model to validate data quality, establish master data sources, and improve data catalogs. Instituting data governance requires defining organizational responsibilities, standardizing data practices, forming partnerships, and maturing the use of multiple data ops platforms.
The problem for many leaders is understanding where to start and how to make prudent investments that address both legacy data-processing issues and longer-term objectives. Large organizations have a mix of business needs and opportunities to leverage analytics and machine learning, but they face an oversupply of platforms sporting disparate features and jargon-rich marketing.
Leaders often have many questions on improving data integration and quality, including:
- Can you improve data accuracy and reliability with a data quality tool or a master data management (MDM) platform?
- Do you need a data integration, ETL, or data streaming platform?
- How should leaders evaluate organizational capabilities, business needs, and platform options when needs differ across the business?
The most challenging data refinery capabilities center on data accuracy and reliability, so here are three things you should know about data quality and master data management platforms and practices.
[ For more on MDM, read Why master data management is key to digital transformation. ]
1. Data quality vs. mastering data
Data quality tools provide a mix of data profiling, automation tools, and exception-handling workflows to address different data quality issues. Some common data quality issues include physical address cleansing, deduping customer records, and normalizing fields used to categorize data.
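To make that concrete, here is a minimal sketch of field normalization and deduplication using pandas. The records and column names are hypothetical, and real data quality tools wrap this kind of logic in profiling, automation, and workflow features.

```python
import pandas as pd

# Hypothetical customer records with common quality issues:
# inconsistent casing, stray whitespace, and duplicate entries.
records = pd.DataFrame({
    "email": ["ann@example.com ", "ANN@EXAMPLE.COM", "bob@example.com"],
    "state": ["ca", "CA", "New York"],
})

# Normalize fields so equivalent values compare as equal.
records["email"] = records["email"].str.strip().str.lower()
records["state"] = records["state"].str.strip().str.upper().replace({"NEW YORK": "NY"})

# Dedupe on the normalized key, keeping the first record seen.
cleansed = records.drop_duplicates(subset="email", keep="first")
print(cleansed)
```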
MDM tools help organizations create and maintain golden records of primary entities such as accounts, contacts, products, and other reference data. These tools enable data stewards to define rules on merging conflicting data from multiple sources into cleansed records. Data scientists and application developers can then use the mastered data in machine-learning models, analytics, and integrated applications via APIs.
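As a simplified illustration of how such merge rules might work, here is a sketch of field-level survivorship based on source precedence. The sources, precedence values, and merge_golden_record helper are all hypothetical; commercial MDM platforms offer far richer matching and survivorship configuration.

```python
# Hypothetical survivorship rule: when sources conflict, trust the
# source with the highest precedence for each field.
SOURCE_PRECEDENCE = {"crm": 3, "erp": 2, "web_form": 1}

def merge_golden_record(candidates: list[dict]) -> dict:
    """Merge per-source records into one golden record, field by field."""
    golden: dict = {}
    # Apply lower-precedence sources first so higher-precedence values win.
    for record in sorted(candidates, key=lambda r: SOURCE_PRECEDENCE[r["source"]]):
        for field, value in record.items():
            if field != "source" and value is not None:
                golden[field] = value
    return golden

candidates = [
    {"source": "web_form", "name": "Ann Smith", "phone": None},
    {"source": "crm", "name": "Ann P. Smith", "phone": "555-0100"},
]
print(merge_golden_record(candidates))
# {'name': 'Ann P. Smith', 'phone': '555-0100'}
```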
Organizations striving to make data a strategic advantage often need both data quality and master data management, but implementing both at once may be overly cumbersome for organizations just starting their data governance journey.
So I offer a few simple considerations.
Organizations with many internal and external data sources that have never been profiled or cleansed may choose to start with a data quality platform. Leaders can then prioritize a backlog of data cleansing work for an agile data team to deliver as incremental quality improvements.
On the other hand, if citizen data scientists and application developers require connections to a small number of primary data entities, then an MDM platform may be a better starting point. The MDM platform then gives teams a consistent data source and provides two-way integrations with the systems that consume and update records.
2. Improving data quality requires agile practices and automation
There is a misconception that plugging in a data quality tool auto-magically fixes all the underlying data issues. Unfortunately, data debt can only be addressed incrementally, especially with fast-moving sources, data extracted from documents and other unstructured content, or data entered through forms without adequate validation.
A second misconception is that improving data quality can be fully automated. In reality, every rule for handling exceptions and outliers carries an incremental automation cost, and giving data stewards off-the-shelf tools to address them is often the more prudent approach.
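Here is a rough sketch of that division of labor, assuming a simple phone-format rule; records the automation can't fix are routed to a steward queue rather than automated at any cost. The rule, record shape, and queue are illustrative only.

```python
import re

# Simplistic illustrative rule: a valid phone looks like 415-555-0100.
PHONE_RE = re.compile(r"^\d{3}-\d{3}-\d{4}$")

def process_record(record, steward_queue):
    """Auto-fix what a rule covers; route exceptions to a human steward."""
    phone = record.get("phone", "").strip()
    # Cheap automated fix: strip a leading US country code.
    if phone.startswith("+1 "):
        phone = phone[3:]
    if PHONE_RE.match(phone):
        return {**record, "phone": phone}
    # An outlier the rules don't cover: queue it for steward review
    # instead of paying to automate every edge case.
    steward_queue.append(record)
    return None

queue = []
print(process_record({"id": 1, "phone": "+1 415-555-0100"}, queue))  # auto-fixed
print(process_record({"id": 2, "phone": "ext. 4410"}, queue))        # None; queued
print(queue)
```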
Improving data quality requires agile data methodologies so teams can implement automation against defined problems in prioritized data sources. It then requires operational processes, tools, and metrics for data stewards to address issues and exceptions. Unfortunately, there’s no free lunch for managing data quality.
3. Buying data models with master data management platforms
Master data platforms form a centralized hub for consumption and cleansing activities. For example, a customer 360 platform that integrates with the enterprise’s CRM, ERP, marketing automation tools, content management systems, and customer support tools may pull and push data to any of these systems. When a customer calls customer support to change their billing address, the support system pushes this change to the MDM, and then other systems receive the updated information.
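A toy sketch of that hub-and-spoke pattern follows, with a hypothetical MdmHub class standing in for the platform. Production MDMs handle persistence, matching, and conflict resolution that this omits.

```python
class MdmHub:
    """Hypothetical hub that fans out mastered-record changes."""

    def __init__(self):
        self.subscribers = {}  # system name -> update callback

    def subscribe(self, name, callback):
        self.subscribers[name] = callback

    def publish_change(self, entity_id, field, value, source):
        # Persist to the golden record (omitted here), then notify every
        # connected system except the one that reported the change.
        for name, callback in self.subscribers.items():
            if name != source:
                callback(entity_id, field, value)

hub = MdmHub()
hub.subscribe("crm", lambda eid, f, v: print(f"CRM updated: {eid} {f} = {v}"))
hub.subscribe("erp", lambda eid, f, v: print(f"ERP updated: {eid} {f} = {v}"))
# A support agent records the customer's new billing address:
hub.publish_change("cust-42", "billing_address", "1 Main St", source="support")
```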
But today’s MDMs are more than data clearinghouses. They are centralized platforms for mastered data, and they often come with out-of-the-box, configurable data models. Using these models can save organizations significant development time and expertise, especially when the models are industry-specific, incorporate data modeling best practices, and support enhancements through no-code experiences.
Both data quality and MDM tools have come a long way over the last few years as more organizations invest in citizen data science programs, analytics, and machine learning. When leaders align organizational responsibilities, agile data practices, automation, and data ops platforms, they establish the data refinery organizations need to compete today.