By Talo Szem
Published on July 8, 2021
Got data? Odds are yes, and you're lost in the trees of it! A central challenge for modern organizations is no longer gathering data; it's navigating a veritable forest of it. Your average data analyst will spend three to six weeks just searching for an accurate, trustworthy dataset before they can get around to doing their job: analyzing, offering insights, and enabling others to see the data forest and the trees.
Data presents an opportunity. Data-driven businesses know this. They ask: How can we ensure we're using data to grow smarter?
It takes a system. In his presentation, MIT CDOIQ: The Data Catalog as a Platform for Intelligence, Satyen Sangani, CEO of Alation, shares how that system inspired the structure of the modern data catalog. In this talk, Sangani details how academia offers the ideal model for growing data intelligence alongside human intelligence.
Long before Satyen Sangani co-founded Alation, the first data catalog, he was a software developer at Oracle, which today is the second-largest software company in the world.
"At Oracle I delivered and built analytical software," Sangani recalls. "And through that process, I'd often end up seeing how much of a mess of data there was within our companies." But that crushing volume of data wasn't the real problem. "More fundamentally," says Sangani, "how people didn't really know how to use it."
When faced with a mountain of data, where does an analyst even begin? This is where Sangani's background in academia came in handy (he holds a bachelor's degree in economics from Columbia University and a Master of Science in Economics for Development from the University of Oxford). In the scientific method, academia boasts an established system for creating truth.
"I was trained in economics," Sangani explains. "And you get this in academia, where there are lots of different papers, which become published, and offered to multiple different journals, and all of those papers cite previous articles from those journals, so you come to see this idea of truth as this evolutionary thing."
Transparency is key. Academics use the database JSTOR (short for "Journal Storage") to research and cite from a canon of articles, which help them build more complex arguments toward a more highly evolved truth.
This got Sangani thinking, "from a software perspective, borrowing this academic model, how do you get to this intelligence?" He soon came up with a series of key parameters for data intelligence.
You need the right information at the right time
You're able to link to evidence (and show your work)
You have open access to all the data (to the extent that it's legal)
You're able to test multiple scenarios
You can access and add documented prose of the claim, the "truth"
The claims are traceable as they build on one another, both the datasets and the assumptions guiding them
These parameters are nothing new; they've guided academics' process, and progress, for decades. But in the world of software, applying this thinking was a breakthrough.
Inspired by academia, Sangani modeled the Alation data management framework on the scientific method. This means analysts have insight into not only who is using what data, but also how they're using it and the conclusions drawn. They have complete transparency into the supporting queries for a given report.
"Historically, there's this idea, which many of us have heard about, which is called 'the single source of truth,'" says Sangani. But his training in academia led him to realize that, when it comes to data management, the real need isn't laying bare a "single truth." What analysts and data managers truly need, just like academics, is a single system of reference.
"In this quest for intelligence, particularly in a world of physical information where evidence is distributed, what you really need is a single system of reference," he elaborates. At Oracle, faced with that mountain of data, Sangani wondered: What if data analysts had a point of reference they could trust, just as academics have scholarly journals, the library, and JSTOR?
"Truth isn't singular," Sangani points out. "It's this thing that exists in a lot of different places, and it largely depends on your point of view." But a common source of reference enables data users to draw distinct conclusions from the same evidence, which all can trust.
"Your truth exists in systems," he explains. "So there can be a truth according to the procurement department, which is different from the truth according to the finance department... and there may be a truth in the sales department that's totally different! And every one of these systems is on some level a mask for a perspective on truth."
Truth has many faces, even in the world of data. This makes a system of reference, paired with a process for truth-seeking, absolutely essential.
For an organization to grow smarter, data alone is never enough. Progress needs a process. Why reinvent the wheel? The scientific method offers a brilliant action model for hypothesizing, testing, and incrementally proving new truths. Academia is the shared point of reference for these agreed-upon truths. Alation's framework for data management, modeled on these systems, enables data users to collaborate toward a fluid, evolving truth in a changing world.