Whether mandated by regulatory considerations, driven by executive dashboards, or meant to enable personalized targeting of marketing messages to consumers, the rapidly increasing reliance on analytics has made Data Quality a higher priority than ever before. In turn, this new status has reshaped the very meaning of Data Quality. There was a time when Data Quality really meant one thing: a simple, binary assessment of the accuracy of data. That was the beginning and end of the Data Quality discussion. Today, however, the questions have grown more complex.
From “Is my data correct?” to “What does my data actually mean?,” the questions surrounding Data Quality are undergoing a rapid transformation. This change has been driven by four major factors:
- The disruptive data flow and data model concepts of Big Data and unstructured data. The concepts of Data Quality must now take into account the greater variety, volume and velocity of Big Data. Data Quality rules and metrics, for example, must be applicable to a range of data types, including semi-structured and unstructured data, across huge volumes of data. They may also need to be tailored for real-time or streaming content that is continuously consumed.
- The explosion of commercial, open-source and home-coded analytics tools and their underlying data integration methodologies. Users now have access to a range of emerging tools and methods that can be used to analyze data. As a result, the ways that the quality of data can be measured, controlled, monitored and reported have evolved to address this vast array of tools and methodologies.
- The new emphasis on user- and context-driven organization and curation of information. The data in user-driven applications frequently ranges from high-quality, well-curated structured datasets to unstructured, crowdsourced, low-quality data. Similarly, aggregations of data for a specific purpose, or with an eye to how the data will be used, may have a flexible definition of “quality.” Is Data Quality still important for these applications?
- The ability to measure Data Quality performance and impact. As data quality management tools have proliferated and matured, it has become easier to measure, control, monitor, report, visualize, and trend data quality across the enterprise. Data Quality management platforms now allow reusable data quality rules to be defined once and then executed on multiple platforms, providing broad, consistent monitoring and reporting of data quality. This has dramatically increased the visibility of data quality at executive levels.
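The notion of reusable data quality rules described above can be sketched in a few lines of code. This is an illustrative example only, not the API of any particular platform: the `QualityRule` class, the `profile` function, and the sample customer records are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityRule:
    """A reusable data quality rule: a name plus a record-level check."""
    name: str
    check: Callable[[dict], bool]

def profile(records: list[dict], rules: list[QualityRule]) -> dict[str, float]:
    """Apply every rule to every record and report each rule's pass rate."""
    return {
        rule.name: sum(rule.check(r) for r in records) / len(records)
        for rule in rules
    }

# Hypothetical rules for a customer feed; the same rule objects could be
# reused against any dataset that exposes dict-like records.
rules = [
    QualityRule("email_present", lambda r: bool(r.get("email"))),
    QualityRule("age_in_range",
                lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
]

records = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 150},
    {"email": "b@example.com", "age": 28},
]

scores = profile(records, rules)
print(scores)  # pass rate per rule, e.g. {'email_present': 0.666..., 'age_in_range': 0.666...}
```

Because each rule is just data (a name and a predicate), the same rule set can feed dashboards, trend reports, or alerts regardless of where the records come from, which is the reuse property the bullet above describes.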
So what is Data Quality, if it has evolved to no longer be just about the accuracy of data? In response to this evolution, Knowledgent defines Data Quality as the consistent availability, interpretation, and accuracy of current and future data flowing throughout the organization, including ease of integrating new sources, types and volumes of data over time.
As a result of this transformation, figuring out how to achieve high Data Quality has become a much more nuanced challenge. Organizations looking to effectively manage data quality need both an operating model tailored to their data management needs and the appropriate tools and technologies to execute it. And the technology selection is actually the easier part of the equation. The increased importance and complexity of Data Quality management demands that the data management program be much more clearly defined. (Check out our white paper, “How to Build a Successful Data Quality Management Program,” for best practices for managing quality.)
Are there other factors behind the transformation of data quality? Share your thoughts in the comments.