More than any other factor, what has put the “big” in Big Data is non-traditional data (often described as unstructured or lacking context) and the potential it represents. “Non-traditional” generally refers to data that does not lend itself to being captured in spreadsheets, tables, or relational databases. Examples include non-relational data such as documents, email, instant messaging (IM) and text messages, and sensor data, as well as “signal” data such as blogs and social media.
But non-traditional data typically can’t be captured with the same old tools or analyzed with the same old methods. Applying Master Data Management (MDM) to it raises a different set of challenges than applying MDM to traditional data. Although you will ask some of the same questions as you would of traditional data, you may need a different approach or a completely new perspective to realize Big Data’s potential.
Here are five best practices to keep in mind when applying MDM to non-traditional data:
- Clarify your purpose. What question are you asking? What’s your hypothesis? What’s the problem Big Data is going to answer? Do you want to be proactive or reactive? Are you mining historical data or forecasting and predicting based on patterns or signals? A clearly articulated business goal will focus your efforts and inform how you apply MDM.
- Understand your environment. What flavor of Big Data are you living with? Is it well defined but heavy on transformations? Is it ever-changing, fast and furious? Is there still too much “noise” in your social media? Is it coming in from all directions in massive volumes? Understanding what you have is key to implementing MDM effectively.
- Develop a solid foundation. Non-traditional data is mostly unstructured, but MDM relies on constructing a “master record” within a domain. You can’t have Master Data without a “master”. Deciding what can and cannot be mastered from tweets, blogs, documents, emails, IMs, texts, and sensor data will be a challenge. Which data elements can be carved out of these non-traditional sources and used to populate a master record?
- Define your standards. Analytics can be based on metrics, which in turn should be based on enterprise or global standards. But non-traditional data is largely unstructured, so existing standards may not fit. If no standards can be developed or applied, define parameters, ranges, and thresholds for the non-traditional data elements instead. Analyzing deviations from them may help you locate a signal within the noise.
- Choose the right tools. What’s the best tool for your enterprise and your processes? Choosing a tool should be the last thing you do, yet for many organizations it seems to be one of the first. Whether your data is traditional or not, pick the tool that best supports the processes that enable your organization to achieve its business goals, not the one that forces you to retrofit your processes to the hot new tool.
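
To make the “define your standards” practice a little more concrete, here is a minimal Python sketch of checking sensor readings against defined ranges and reporting the deviations. The element names (`temperature_c`, `humidity_pct`), the ranges, and the sample readings are all hypothetical, chosen only for illustration; real thresholds would come from your own domain and governance process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Acceptable range for one data element (hypothetical values below)."""
    low: float
    high: float


# Assumed element names and ranges, for illustration only.
THRESHOLDS = {
    "temperature_c": Threshold(low=-10.0, high=45.0),
    "humidity_pct": Threshold(low=0.0, high=100.0),
}


def find_deviations(readings):
    """Return (index, element, value) for each value outside its range."""
    deviations = []
    for index, reading in enumerate(readings):
        for element, value in reading.items():
            limits = THRESHOLDS.get(element)
            if limits is None:
                continue  # no range defined for this element, so skip it
            if not (limits.low <= value <= limits.high):
                deviations.append((index, element, value))
    return deviations


# Made-up sample readings; two of them fall outside the assumed ranges.
readings = [
    {"temperature_c": 21.5, "humidity_pct": 40.0},
    {"temperature_c": 72.0, "humidity_pct": 38.0},
    {"temperature_c": 20.9, "humidity_pct": 130.0},
]

for index, element, value in find_deviations(readings):
    print(f"reading {index}: {element}={value} is outside its defined range")
```

Keeping the ranges in one place, separate from the checking logic, means they can be reviewed and governed like any other standard, while the deviations themselves become candidates for the “signal” you are looking for.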
What are your best practices for applying MDM to non-traditional data? Let us know in the comments!