All posts by Ariel Fabian

Deterministic versus Probabilistic Matching in Big Data

“Information is the new oil” has become a popular saying, and like crude oil, raw data needs to be refined before it can be consumed. In other words, big data serves little purpose unless it is good enough to be useful. Ingesting data from disparate sources invites mismatches, duplicates, and other quality threats, so ensuring the accuracy and quality of data is more important than ever.

This is where big data meets Master Data Management (MDM). Following the principle of “better safe than sorry,” MDM users can apply data matching techniques to resolve some data quality conflicts. These techniques help users determine which data is “most likely” to be correct, yielding data that is, if not perfect, at least of “fit for purpose” quality. This post discusses two matching techniques, Deterministic Matching and Probabilistic, or “Fuzzy,” Matching, in the context of big data.