5 Best Practices for Applying MDM to Non-Traditional Data

The concept that, more than any other variable, has put the “big” in Big Data has to be the notion of uncontexted, unstructured, or non-traditional data and the potential it represents. The term “non-traditional,” when applied to data, generally refers to data that does not lend itself to being captured in spreadsheets, tables, or relational databases. Examples include documents, email, instant messaging (IM) and texts, sensor data, and “signal” data such as blogs and social media.

But non-traditional data typically can’t be captured with the same old tools or analyzed with the same old methods. Applying MDM to it raises a different set of challenges than traditional data does. Although you will ask some of the same questions as you would with traditional data, you may need a different approach, or a completely new perspective, to realize Big Data’s potential.

Here are five best practices to keep in mind when applying MDM to non-traditional data:

  1. Clarify your purpose. What question are you asking? What’s your hypothesis? What problem is Big Data going to solve? Do you want to be proactive or reactive? Are you mining historical data, or forecasting and predicting based on patterns or signals? A clearly articulated business goal will focus your efforts and inform how you apply MDM.
  2. Understand your environment. What flavor of Big Data are you living with? Is it well defined but heavy on transformations? Is it ever changing, fast and furious? Is there still too much “noise” in your social media? Is it coming in from all directions at massive volume? Understanding what you have is key to implementing MDM effectively.
  3. Develop a solid foundation. Non-traditional data is mostly unstructured, but MDM relies on constructing a “master record” within a domain. You can’t have Master Data without a “master.” Deciding what can or cannot be mastered from tweets and blogs, documents and emails, IM and texts, and sensor data will be a challenge. What data elements can be carved out and populated from these non-traditional sources?
  4. Define your standards. Analytics can be based on metrics, which in turn should be based on enterprise or global standards. But non-traditional data is, by definition, unstructured. If no standards can be developed or applied, define parameters, ranges, and thresholds for non-traditional data elements. Analytics on deviations may help you locate a signal within the noise.
  5. Choose the right tools. What’s the best tool for your enterprise and your processes? Choosing a tool should be the last thing you do; for many organizations, however, it seems to be one of the first. Whether your data is traditional or not, pick the tool that best supports the processes that help your organization achieve its business goals, not the one that requires you to retrofit your processes to support the hot new tool.
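The “analytics on deviations” idea in practice 4 can be sketched in a few lines. This is a minimal illustration, not any particular MDM product’s feature: with no enterprise standard to validate against, it derives a tolerance band from the data itself and flags readings outside it. The field values and the threshold multiplier are hypothetical.

```python
from statistics import mean, stdev

def flag_deviations(readings, k=2.0):
    """Flag readings outside mean +/- k standard deviations.

    A stand-in for defining 'parameters, ranges and thresholds' when
    no formal standard exists: the band is derived from the data.
    """
    mu = mean(readings)
    sigma = stdev(readings)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [(i, r) for i, r in enumerate(readings) if r < lo or r > hi]

# A mostly steady sensor signal with one spike -- the "signal within the noise"
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.7, 20.2, 20.1]
print(flag_deviations(readings))  # → [(5, 35.7)]
```

In a real pipeline the band would come from an agreed-upon range per data element rather than the sample itself, but the shape of the check is the same.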

What are your best practices for applying MDM to non-traditional data? Let us know in the comments!

2 thoughts on “5 Best Practices for Applying MDM to Non-Traditional Data”

  1. Nice read, Mark. I think the most pertinent point about the problem of implementing MDM on non-traditional data is the lack of structure inherent in it. Even with structures and standards in place, we have found that MDM efforts can go haywire. Do you have any suggestions, or examples, of how a ‘master record’ can be identified or constructed in non-traditional data?

    1. Vipul: A couple of thoughts. One relates to the more traditional idea of MDM, which essentially depends on a key, or keys, that uniquely identify a record or object. This key is matched against a master key to determine whether the record already exists or is new. If the unstructured data lends itself to key extraction (for example, identifiers in a JSON object, or entities that text analytics can pull from a less-structured document via entity/name extraction), then this approach works, with the additional step of “extracting” the keys. The other thought relates to more statistical methods, where the content of the unstructured source is semantically parsed (keywords, homonyms, synonyms, etc.) and categorized to obtain a “fingerprint.” We have used this method to match and master information found in documents relating to chemical compounds used in pharmaceutical research.
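The two paths in the reply above can be sketched roughly as follows. This is an illustrative sketch, not any vendor’s API: the `customer_id` field, the master index, and the stopword list are all hypothetical, and simple Jaccard overlap on keyword sets stands in for the richer semantic parsing described.

```python
import json

MASTER_INDEX = {"CUST-1001": {"name": "Acme Corp"}}  # hypothetical master records

def match_by_key(raw_json, key_field="customer_id"):
    """Path 1: extract an identifier from semi-structured JSON and
    check it against the master index (existing vs. new)."""
    record = json.loads(raw_json)
    key = record.get(key_field)
    return key, key in MASTER_INDEX

def fingerprint(text, stopwords=frozenset({"the", "a", "of", "and", "in"})):
    """Path 2: reduce free text to a keyword set -- a crude 'fingerprint'."""
    return {w for w in text.lower().split() if w not in stopwords}

def jaccard(a, b):
    """Overlap between two fingerprints; stands in for semantic matching."""
    return len(a & b) / len(a | b) if a | b else 0.0

key, known = match_by_key('{"customer_id": "CUST-1001", "note": "renewal"}')
print(key, known)  # → CUST-1001 True

doc1 = fingerprint("Acme Corp renewal of annual support contract")
doc2 = fingerprint("Annual support contract renewal for Acme Corp")
print(round(jaccard(doc1, doc2), 2))
```

A production version would normalize and deduplicate far more aggressively (stemming, synonym expansion, entity extraction), but the master-key lookup and the fingerprint comparison are the two matching strategies in miniature.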
