This is the second part of our two-part blog series on why data lake data governance is different. To read part 1, please click here.
Data Lakes promise speed and agility in change management. Companies can quickly ingest new and ever larger sources of data and start working with them immediately. The tendency is to brush governance aside when it is viewed as something that will slow down that agility. With the technology advances a Lake introduces, and especially with elastic, "serverless" cloud technologies, enterprises will start to explore real-time streaming, Big Data storage, and advanced data analytics and modeling.
Governance will need to keep up with technology trends such as real-time streaming and rapid ingestion. This means that the cataloging process and the data quality monitoring processes need to fit within the fabric of Data Lake and Cloud technology. Governance will remain proactive (e.g., "all data must be cataloged in the Raw zone before moving to curation"), but it will also have to establish second lines of defense, such as automatically detecting data that has been ingested into the Raw zone without being cataloged during acquisition, and raising an alarm when that happens.
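That second line of defense can be as simple as reconciling what actually landed in the Raw zone against the catalog. A minimal sketch, assuming a storage listing and a catalog can each be reduced to lists of dataset paths (the function and variable names here are illustrative, not any particular product's API):

```python
# Hypothetical second-line-of-defense check: flag any dataset that has
# landed in the Raw zone without a matching catalog entry.

def find_uncataloged(raw_zone_listing, catalog_entries):
    """Return raw-zone dataset paths that are missing from the catalog."""
    return sorted(set(raw_zone_listing) - set(catalog_entries))

# Stand-in data: in practice these would come from a storage listing API
# and the metadata catalog, respectively.
raw_zone_listing = [
    "raw/sales/2024-06-01/orders.parquet",
    "raw/sales/2024-06-02/orders.parquet",
    "raw/clickstream/2024-06-02/events.json",
]
catalog_entries = [
    "raw/sales/2024-06-01/orders.parquet",
    "raw/sales/2024-06-02/orders.parquet",
]

violations = find_uncataloged(raw_zone_listing, catalog_entries)
if violations:
    # In a real pipeline this would raise an alert or block
    # promotion of the data to the curated zone.
    print("Uncataloged data detected:", violations)
```

The point of the sketch is the pattern, not the code: the proactive rule lives in the acquisition process, while this reconciliation job runs independently and catches whatever slips past it.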
The same goes for Data Quality, where real-time streaming data may end up on a client's report: is it accurate? The ever more complex and evolving technology stack behind Cloud and Data Lakes will require an equally responsive evolution of metadata technology, data quality monitoring technology, and leaner, more agile data governance processes to keep up with the velocity and volume of data we will see in the future.
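In a streaming context, data quality checks have to run inline, record by record, rather than as an after-the-fact batch audit. A minimal sketch of that idea, assuming each record is a dict with `account_id` and `balance` fields (both field names are invented for illustration):

```python
# Inline data-quality checks on a stream: validate each record as it
# arrives and quarantine failures instead of letting them reach a report.

def validate(record):
    """Return a list of rule violations for a single record."""
    problems = []
    if not record.get("account_id"):
        problems.append("missing account_id")
    if not isinstance(record.get("balance"), (int, float)):
        problems.append("non-numeric balance")
    return problems

def process_stream(records):
    """Split a stream into clean records and (record, reasons) quarantine pairs."""
    clean, quarantine = [], []
    for rec in records:
        issues = validate(rec)
        if issues:
            quarantine.append((rec, issues))
        else:
            clean.append(rec)
    return clean, quarantine

# Stand-in stream; in practice records would arrive from a message bus.
stream = [
    {"account_id": "A1", "balance": 120.5},
    {"account_id": "", "balance": 9.0},
    {"account_id": "A3", "balance": "n/a"},
]
clean, quarantine = process_stream(stream)
```

Quarantining rather than dropping bad records matters for governance: the violations themselves become metadata that stewards can monitor and act on.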
Data Governance, meaning the People, Process, and Technology, will need to keep up with this evolving technology landscape, and, more importantly, do so in a manner that is non-invasive and capable of enabling proactive as well as fail-safe data governance in a rapidly changing environment. This is both science and art.