With the rise of Hadoop, the demise of the data warehouse seemed only a matter of time. Surprisingly (or not), that’s not the case. Instead, organizations are looking to augment their current enterprise data warehousing solutions with the analytics and cheaper storage that Hadoop brings.
As discussed in my previous blog, data in Hadoop is mainly accessed through programming frameworks such as MapReduce (typically written in Java or Python), a data-flow language like Pig, or an SQL-like language such as Hive. However, the data analysis skillset most prevalent in IT shops is SQL, which means Hadoop will have to support a SQL interface in some capacity to appeal to these practitioners and to the widespread BI tools already in use. The challenge is that Hadoop was created to process data in a “batch” mode – you submit jobs to analyze massive datasets stored in HDFS, rather than running interactive queries. A number of initiatives and solutions focused on marrying SQL with Hadoop are underway, and this convergence is well in progress.
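To illustrate the gap these SQL-on-Hadoop initiatives bridge, here is a minimal sketch of the same aggregation expressed two ways: as a Hadoop Streaming-style mapper and reducer in Python, and as the one-line Hive query a SQL analyst would write instead. The table and column names (`purchases`, `customer_id`, `amount`) are hypothetical, chosen only for illustration.

```python
from itertools import groupby

# Hypothetical input: tab-separated lines of (customer_id, amount).
# In Hive, the same result is a single SQL-like query:
#   SELECT customer_id, SUM(amount) FROM purchases GROUP BY customer_id;

def mapper(lines):
    """Map phase: emit one (customer_id, amount) pair per input line."""
    for line in lines:
        customer_id, amount = line.rstrip("\n").split("\t")
        yield customer_id, float(amount)

def reducer(pairs):
    """Reduce phase: sum amounts per customer. Assumes pairs arrive
    sorted by key, as Hadoop's shuffle/sort phase guarantees."""
    for customer_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield customer_id, sum(amount for _, amount in group)

if __name__ == "__main__":
    sample = ["c1\t10.0", "c2\t5.0", "c1\t2.5"]
    pairs = sorted(mapper(sample))  # stand-in for Hadoop's sort/shuffle
    for customer_id, total in reducer(pairs):
        print(customer_id, total)
```

The point of the comparison is not that MapReduce is hard, but that the Hive version requires only SQL skills – exactly the skills already abundant in most IT shops.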
Most of our conversations about information management and other data-focused topics touch on data governance (specifically, what good data governance is and how to leverage it). This may seem like a good place to start; after all, if you’re trying to get the most value out of your data, you want the data to be clean, accurate, and accessible.
However, at Knowledgent, we like to look at things from another angle and ask a different question: Why data governance?
Knowledgent strongly believes all companies have the data required to know their customers a whole lot better. But who out there is developing the necessary customer insights to make an impact on their business? Is Big Data the solution to developing some of those insights? The hype tells us it’ll be just that. But will Big Data play all by itself at the deep end of the pool, or will it need a helping hand? The clear answer is that synergistic technologies are the key.
Given the sheer volume and variety of data available in a Big Data platform, it is important to understand the situational relevance of the specific types of data needed to drive business value. Understanding and managing the semantics of that data is central to optimizing its relevance.
Many organizations approach Master Data Management (MDM) by taking an inventory of the data in their source systems, defining policies to improve the quality and usefulness of that data, and building a consolidated hub of that data. This “build it and they will come” approach too often results in an MDM hub that fails to meet the needs of the business processes meant to consume the data.
We have just released the 2014 version of our Big Data Ecosystem, our industry-leading reference architecture that categorizes Big Data vendors, tools, and technologies to help you determine the Big Data solution that is right for you.
This year’s Ecosystem features a new Data Security category, which we added in response to increased questions about the options available to secure and protect data. Sub-categories have also been added to the Hadoop and NoSQL Data Management categories for faster and more accurate differentiation among vendors.