By Stewart Bond
Published on September 21, 2021
I grew up in a family that did a lot of camping in recreational vehicles. My dad had this uncanny ability to go somewhere once, and never need a map to get back there again, so we never relied on maps when we went to our favorite campgrounds close to home. Once or twice a year we would do longer trips, going to places throughout North America, places my dad had never been, and so we relied on maps – paper maps! We also relied on a citizen’s band radio in our vehicle to hear about traffic conditions and, on occasion, took the advice of strangers on how to get around traffic jams.
Compare this story to what we have today: smartphones with traffic navigation apps that provide us with turn-by-turn directions that continuously try to find the optimal path based on traffic conditions on the route being traveled. I like to think that I inherited my dad’s travel memory, but I will also be the first to admit that even if I know where I am going, I will use the navigation app for the benefit of route optimization based on current traffic conditions.
When we work with data, we know where we are starting from, often with a destination in mind, but the path we take is not always the fastest because we may not have all the necessary intelligence to optimize our route. Modern data environments are highly distributed, diverse, and dynamic. Many different data types are managed in the cloud and on-premises, across many different data management technologies, and data is continuously flowing and changing, not unlike traffic on a highway.
Not only is the data distributed, diverse, and dynamic, but so are the people working with data. At IDC, we refer to data-native workers as a generation: Generation Data, or Gen-D for short. Gen-D is not a chronological generation but a vocational one. It includes those who have “data” in their title, but also people in the business who work with data daily as part of their job function. With the 3 D’s of data and data-native workers, connecting data producers to consumers in an optimal way requires a new approach, an approach called DataOps. DataOps is a combination of technologies and methods with a focus on quality, for consistent and continuous delivery of data value. It is people, process, technology, and data; more importantly, it is metadata.
Metadata provides technical and behavioral intelligence about data that can be used to decide where to start and how to plot a course to the destination, including who needs to be involved and who can offer assistance if roadblocks are encountered. Metadata is increasingly important to DataOps and data-driven decision making, because it helps assure that the right data is being used at the right time, by the right resource, and for the right reason(s). Despite this, organizations sometimes put metadata management software lower on their priority list.
DataOps is not DevOps for data, but ambiguity about the term persists. According to a 2021 IDC survey of Gen-D personas involved in DataOps, 85% claim to have had DataOps in place for more than a year. However, we see DevOps methods being more prominently used in what these respondents believe is DataOps. DevOps is about the continuous development, testing, and deployment of applications. DataOps is about the continuous development, testing, and deployment of data products, and of the data itself. Continuously testing the definitions, values, and context of data flowing within pipelines against acceptable tolerances, policies, and thresholds can stop bad data from being used to make decisions and protect against data governance and compliance exceptions.
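To make the idea of testing data values against tolerances and thresholds concrete, here is a minimal sketch in Python. It is illustrative only and is not drawn from the IDC survey or from any particular product: the `Tolerance` class, the `check_batch` function, and the `speed` field are all hypothetical names chosen for the example, and a real pipeline would also test definitions and context, not just values.

```python
from dataclasses import dataclass


@dataclass
class Tolerance:
    """Hypothetical policy: acceptable range and null rate for one field."""
    field: str
    min_value: float
    max_value: float
    max_null_rate: float


def check_batch(records, tolerance):
    """Return violation messages for one batch; an empty list means it passes."""
    if not records:
        return ["batch is empty"]

    violations = []
    values = [record.get(tolerance.field) for record in records]

    # Share of records where the field is missing or None.
    null_rate = sum(1 for v in values if v is None) / len(values)
    if null_rate > tolerance.max_null_rate:
        violations.append(
            f"{tolerance.field}: null rate {null_rate:.2f} "
            f"exceeds {tolerance.max_null_rate:.2f}"
        )

    # Non-null values that fall outside the accepted range.
    out_of_range = [
        v for v in values
        if v is not None
        and not (tolerance.min_value <= v <= tolerance.max_value)
    ]
    if out_of_range:
        violations.append(
            f"{tolerance.field}: {len(out_of_range)} value(s) outside "
            f"[{tolerance.min_value}, {tolerance.max_value}]"
        )
    return violations


# Assumed example policy: speeds between 0 and 200, at most 10% missing.
speed_policy = Tolerance("speed", 0, 200, 0.10)
good_batch = [{"speed": 50}, {"speed": 60}]
bad_batch = [{"speed": 50}, {"speed": None}, {"speed": 500}]
```

Running a check like this on every batch as it moves through a pipeline, and holding back any batch that returns violations, is one simple way to keep bad data from reaching the people making decisions with it.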
Another key difference is in how the term “continuous” is used. DevOps sprints deliver application capabilities in weeks or, at most, months. Data is always flowing and changing, and therefore continuous in the context of DataOps needs to be synonymous with real-time. Similarly, governance in DevOps is concerned with the code and its testing, which requires version control. DataOps governance is concerned with the code and testing of data products, and it is also concerned with governance of the data itself. Continuous governance requires continuous collection and analysis of metadata to optimize and control the path between data producer and consumer, keeping within governance and compliance guardrails.
Data intelligence in DataOps can help organizations draw the map of data flows throughout their environments, safely and optimally navigating data-driven journeys in the following areas:
Intelligence about who is producing and consuming data will foster collaboration and good data culture.
Intelligence about where data is, what it looks like, what it means, and how it is flowing will combat the 3 D’s and help Gen-D visualize data pipelines from producer to consumer.
Intelligence about the quality and value profile of data will help deliver trusted data solutions.
Intelligence about operational data behaviors can inform automated testing and monitoring to help prevent bad data from being used in decisioning.
Continuous intelligence will inform DataOps disciplines and data governance policies, enabling greater control over data and the pipelines it flows through.
Taking a data-driven journey without intelligence about data is like going for a drive with an idea of where you are going but no in-car dynamic navigation, relying instead on periodic traffic updates from the local radio station. You may be able to get to your destination, but data availability and quality issues, data access delays, and data drifting and shifting can cause slow-downs and dead-ends that require you to turn around. There is also an increased risk of driving outside the lines of data compliance, possibly causing damage to you and the organization. Stay safe and informed in DataOps with data intelligence.