Truth, Data, and FAIRness

Francesco Marzoni, Chief Data and Analytics Officer, Ingka Group

Francesco Marzoni

Francesco Marzoni has been the Chief Data & Analytics Officer of Ingka Group since May 2021. Prior to holding that role, he was Group Head of Analytics, Data & Integration at Nestlé. He has held several country, regional, and global roles in the business analytics space — from data analyst to division analytics head — at companies including Procter & Gamble and Bayer.


Satyen Sangani

As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”


Satyen Sangani: (00:03) For today’s guest, an unlikely source inspired him to enter data science: Aristotle. One of Aristotle’s seminal works is Categories. This was one of the first attempts in human history to capture and organize all knowledge. The essence of Aristotle’s argument was that information is converted into knowledge by categorizing it, and further defined what these categories ought to be: things like substance, quantity, and quality, amongst others. So apparently, Aristotle was our first-ever head of data governance. But I digress. So, anyway, reading this work, my guest began to see the power of structured knowledge and, more importantly, the act of structuring knowledge.

Satyen Sangani: (00:47) Since then, he — and, really, all of us — have continued Aristotle’s project. We are still refining how we organize our data. The better we get at categorizing data, the better we can leverage it for success. The same idea is at the center of FAIR, which is a set of data management principles established in 2016 in an article in the journal Scientific Data. When a data set is FAIR, it is findable, accessible, interoperable, and reusable. When a data set is FAIR, it becomes something more than data. It becomes a competitive asset. Today’s guest, Francesco Marzoni, was the first person to introduce me to the FAIR framework. So today, Francesco and I discuss the FAIR framework and how you can set the stage for it within your own organization. Francesco is currently the chief data and analytics officer at Ingka Group, a holding company for Ikea, and prior to this position, Francesco worked for other leading global brands, such as Bayer, Procter & Gamble, and Nestlé.

Producer: (01:53) Welcome to Data Radicals, a show about the people who use data to see things that nobody else can. This episode features an interview with Francesco Marzoni, chief data and analytics officer of Ingka Group. In this episode, he and Satyen discuss how to implement FAIR principles at your organization, the dangers of data silos, and the importance of effective data governance. This podcast is brought to you by Alation. Our platform makes data easy to find, understand, use, and govern so analysts are confident they’re using the best data to build reports the C-suite can trust. The best part? Data governance is woven into the interface, so it becomes part of the way you work with data. Learn more about Alation at alation.com.

What is FAIR data?

Satyen Sangani: (02:38) Francesco and I began our conversation by breaking down the four components of FAIR. We started with findability.

Francesco Marzoni: (02:45) Findability is about making sure that you don’t have to pick up the phone and have an informal network of people within your organization if you want to get hold of a specific data set but, rather, you have a structured way to do that, right? As you will teach me — and you’ve definitely mastered more than me — metadata plays a big, big, big role in that space. Accessibility is about setting rules for data to be accessed and also default ways to access data without having to figure out every time, “How do I get access to this data set once I know that I need it?” So having predefined ways to access data but also a policy that somehow starts from data open to everyone in the organization, unless there is a reason — legal, ethical, or whatever good reason — to restrict access to data. The more data gets shared, whenever it’s possible, the better.

Francesco Marzoni: (03:40) Interoperable is about making sure that different domains or different parts of the organization use a common dictionary when it comes to data so data can travel across different corners of the organization. You can integrate them, and you can unlock more and more knowledge out of new data sets that you integrate. The reusability, reusable, is probably the one that is closest to my heart because it speaks to the fact that organizations cannot afford to think about data every time that they need it for a specific purpose. We do not know the value that we can get out of a data set. We can know maybe a specific use case today about a given data set, maybe two or three use cases, but the goal is really to make sure that we manage data in a way that it then becomes reusable over time with very little lead time because we don’t know yet how and in which context we will need to use certain data.

Francesco Marzoni: (04:37) I think that’s the biggest shift probably that the industry needs to still embrace, right? We come from several years in the data space, where companies and teams have built point-to-point solutions: I need to build an app. I need to build an application. I need this data for that application. So I go and reinvent from scratch every time the pipelines, the data pipelines, that I need for that application. Now, we need to shift and build data assets or actual data products that then become reusable over time as we need them. I think that’s probably been also one of the big lessons that I got during the pandemic, the importance of having reusability in the data assets that you can tap into and how much it can hurt you if you don’t have reusable data assets at hand in times of crisis.
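The accessibility rule described above (data open to everyone unless there is a legal, ethical, or other good reason to restrict it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Dataset` record, its field names, and the group names are hypothetical and do not come from the conversation or from any particular catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical catalog entry; all field names are illustrative."""

    name: str
    # Recorded reasons (legal, ethical, ...) that justify restricting access.
    restrictions: set[str] = field(default_factory=set)
    # Groups cleared to read the data when a restriction applies.
    allowed_groups: set[str] = field(default_factory=set)


def can_access(dataset: Dataset, user_groups: set[str]) -> bool:
    """Open by default; restricted only when a reason has been recorded."""
    if not dataset.restrictions:
        return True
    # A restriction exists, so the user needs at least one cleared group.
    return bool(dataset.allowed_groups & user_groups)


product_range = Dataset(name="product_range")
customer_pii = Dataset(
    name="customer_pii",
    restrictions={"legal"},
    allowed_groups={"privacy_office"},
)

print(can_access(product_range, {"marketing"}))      # no restriction recorded
print(can_access(customer_pii, {"marketing"}))       # restricted, not cleared
print(can_access(customer_pii, {"privacy_office"}))  # restricted, cleared
```

The design choice worth noticing is the direction of the default: a data set with no recorded restriction is readable, so the burden is on documenting why something is closed rather than on requesting that it be opened.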

Satyen Sangani: (05:27) Yeah, which is such an interesting deconstruction of this problem of, in this case, building open science, but on some level, that’s another way of saying building data culture, which is obviously what this podcast is all about. It gives people a framework to think about, “Okay, well, if I’m going to build this data culture and I want to enable people to use data, these are the four factors that I have to focus on.” How does one take that framework and follow this verification process? So I’ve got this findable, accessible, interoperable, reusable. Now, how do I make that more practical? What do I need to do as a chief data officer if I wanted to start down this journey?

Francesco Marzoni: (06:10) The first thing I would start from is identifying what are the core data assets that you want to start your journey from. Those data assets are typically the ones that are about the qualification of the knowledge of those entities that you need to master to run your business. So if you are, for example, just a random example, a home furnishing manufacturer, you may want, most likely, to have as one of your core data assets all the data that speak about your furnishing range. Once you have identified those core data assets, it’s critical that you define end-to-end accountability around those assets. I think probably the starting point that is the most undervalued in what I’ve seen in the past years is the importance of creating accountability around data.

Francesco Marzoni: (07:03) Most companies always have clear accountability around systems. Name an application or a system of record: very rarely will a company not have an owner or an accountable person or team for it. Companies are very good at creating accountability for business outcomes, of course, so for P&Ls, et cetera. Accountability around data is something less natural, I have to say, and, for me, that’s always the starting point, so understanding what are the core data entities that we need to run the business and creating accountability around them because then, once you create accountability and you know which teams are accountable for the different data elements that you have in your set of priorities, that’s where you can start discussing, “Okay, this accountable team then has also, as part of its measures, the elements of the FAIR framework. How do you bring FAIR to life?”

Francesco Marzoni: (08:00) It is definitely, as we said, a metadata management play. So if you want FAIR data, you need to make sure that you have and you create an activity system in the company around metadata. The truth is there are still many companies that did run for several years without even, let’s say, focusing or realizing the importance of having metadata as one of the types of data that we need to manage to get value out of all of them.

Satyen Sangani: (08:30) Some of the FAIR terminology may sound unusual, but, really, FAIR is a set of best practices, and its two final steps will sound very familiar to data radicals.

Francesco Marzoni: (08:41) So, governance is a key element because, even though we all believe that the more we decentralize data accountability, the easier it is to bring data to life in a relevant way, especially in a big organization, a certain level of governance is still critical to ensure interoperability of the different data that travel across the organization. Then it is about figuring out how you shift and help the organization to go from talking about data to talking about specific data. Talking about data in general is probably, for me, a sign that the level of data culture within an organization is not high. A good proxy for where data culture stands in an organization is how people talk about data. Do people talk about data in general, or do they talk about customer data rather than product data, rather than consumer data, rather than other types of data? So, again, metadata rather than master data. I think that’s a very important thing because if we don’t make that disclaimer about what data we’re talking about, it’s very difficult to land the first step toward the FAIR data journey.

Satyen Sangani: (09:56) That specificity matters a ton. It’s funny because as you describe this idea of authority around data, it reminds me of one of the former podcasts where we had Paola Saibene, who’s a data governance expert, talk about data governance. I believe she described it as the assignment of authority and accountability around data. What you’re describing feels a lot like data governance. Is FAIR just another form of data governance or a replacement for data governance or complementary to it — or how do you think about that?

Francesco Marzoni: (10:32) First of all, I think, of course, the concept of authority is important, especially in a big organization because, otherwise, without that, we might have chaos. Nevertheless, I have to say, my entry point is always around duties rather than rights when it comes to accountability around data. So, ultimately, authority needs to be linked to who has the duty to do what with which data and a bit less about who has the rights to do what with which data because, at the end of the day, the rights on data should be more defined by a policy rather than by specific individuals.

FAIR data and data governance

Francesco Marzoni: (11:04) But then, to your point regarding the link between FAIR data and the concept of data governance, I think the interlink is total. I would probably think about FAIR, though, as one, a framework to measure what you want to govern — so how you set goals and how you say, “Okay, I achieved that goal on this specific data set with my governance, so now it is findable, so now it is accessible, now it is interoperable.” It’s also about … you need governance to ultimately then establish all those base practices that are needed to deliver FAIR data. So, without the governance, most likely, you will not have master data management, you will not have metadata management — and without those you cannot bring simply the FAIR concept to life in a real way.

Satyen Sangani: (11:54) When I heard the original concept of FAIR, it was insightful for me because I remember back to a conversation I had with this guy, Andrew White, who’s a Gartner analyst. I don’t know if you know him. But very early in our journey, we got on the phone, and we were just learning about what data governance was and were thinking about the space, and he said, “Satyen, you just have to remember that there are really only eight data governance policy types, and I’ve written about this, and you should just go read my article.” So, of course, I go and read the article, and all of these policies, as it turns out, are about very negative things. There’s a security policy type and an access policy type and a privacy policy type and a data quality policy type and a retention policy type.

Satyen Sangani: (12:40) It strikes me that FAIR has the ability to complement that framework by adding maybe four other or five other policy types, basically, around what does it mean for something to be findable, what does it mean for something to be accessible, how do we measure that, and how do we say, “Oh, if it is findable, then it has the following three elements of metadata,” and that could be a policy through which then a data steward could apply authority and review. So it does strike me as a way of extending data governance to be something that’s actually quite offensive and differentiating and empowering as opposed to simply being this thing that’s all about restricting people’s access to data and making sure that risk is at the forefront of everyone’s imagination.
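The idea of a findability policy ("if it is findable, then it has the following three elements of metadata") lends itself to a small check that a data steward could run against a catalog entry. The sketch below is an assumption-laden illustration: the conversation does not say which three metadata elements would be required, so `title`, `owner`, and `description` are placeholders, and the function names are hypothetical.

```python
# Assumed policy: the three required fields are placeholders, not taken
# from the FAIR principles or from the conversation.
REQUIRED_FINDABILITY_FIELDS = ("title", "owner", "description")


def findability_gaps(metadata: dict) -> list:
    """Return the required fields that are missing or blank."""
    return [
        name
        for name in REQUIRED_FINDABILITY_FIELDS
        if not str(metadata.get(name) or "").strip()
    ]


def is_findable(metadata: dict) -> bool:
    """A data set passes the policy when no required field is missing."""
    return not findability_gaps(metadata)


complete = {
    "title": "Weekly store sales",
    "owner": "commercial-data-team",
    "description": "Sales per store and week, refreshed every Monday.",
}
incomplete = {"title": "Weekly store sales", "owner": "  "}

print(is_findable(complete))         # True
print(findability_gaps(incomplete))  # ['owner', 'description']
```

Returning the list of gaps rather than a bare pass or fail gives the accountable team something concrete to fix, which fits the framing of governance as enabling rather than only restricting.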

Francesco Marzoni: (13:20) I love what you say, and I would add that, indeed, the four letters of FAIR allow you also easily to build stories around how each of those characteristics of data, from findability to reusability, can have a one-to-one link with how you can unlock a specific value for your organization if that condition is verified and how much, instead, the organization might suffer if that condition is not verified.

Satyen Sangani: (13:53) During the early days of the pandemic, businesses struggled to manage the supply chains that were in chaos. At the time, Francesco was at Nestlé confronting that chaos up close. He credits the FAIR framework for the company’s ability to react swiftly during the crisis.

Francesco Marzoni: (14:07) At a time in which we had to figure out a number of things that we had not thought about before, and in a moment in which we had to start to figure out how to collect more and new data points in the face of this completely unforeseen situation, we started to look for partnerships, for external partnerships, to understand how we could get hold of new data points regarding, for example, mobility data around physical stores or how the different governments’ policies were reflected in consumer behaviors. So, ultimately, we started to need a number of data sets that we did not have before to understand what we had to do to keep food on the shelf back then.

Francesco Marzoni: (14:52) There was a clear predictor of whether a third-party partner would help us in the journey fast enough based on our needs, and it turned out to be that the partners were, let’s say, more advanced in their data journey. I’m thinking about a Google. I’m thinking about startups in charge of last mile delivery in the food tech ecosystem. With those companies, once we identified what were data that would’ve been valuable to exchange (where it was allowed, of course, and in a fully anonymized way, of course) in order to address a real challenge for the communities where we were operating, we went from idea to execution within hours.

Francesco Marzoni: (15:40) So, “Hey, we could leverage your data to do something good. Tomorrow, we can start doing it.” Because there was already a framework for those companies to share data because data were accessible, data were findable in their organizations. There were many other opportunities that we had to leave on the table, simply because, yes, there was a good intention and a good idea about how two companies could come together and share data to understand better what was going on, but then the partnership did not materialize because there was still a fully manual activity system around data. FAIRness was not in place, so it was really pointless to invest because, between idea and execution, we would’ve needed projects of months to either integrate data or to retrieve data.

Satyen Sangani: (16:26) Yeah, because the reactivity of the organization was born out of your ability to have FAIR data in place and your partner’s ability to have FAIR data in place. For those that didn’t, the ability to build and prospect around these new business models, just to ensure freshness of food on the shelf, essentially went away.

This dovetails into another topic that you and I have talked a little bit about, which is that often when you free data or open the data up, you’re able to discover new opportunities. You talk about the relationship between data silos and network effects, and one can intuit why those things might be related. But tell us more and tell us about why you see these relationships existing and how you see value being created.

Francesco Marzoni: (17:17) The way most companies are organized and make decisions and plan their budget, their business, their activities is around the concept of function. So there is a little bit, I would say, of a natural inclination for a company to operate within the silos created by functions despite that the world doesn’t work based on functions. The world works based on processes, processes that are end-to-end and that operate across functional areas. So the importance of removing data silos is ultimately about reflecting how the world actually operates, how the different actors of a given organization care about outcome and not about the task that each and every individual is in charge of.

Francesco Marzoni: (18:08) That’s why I really believe in data being ultimately the glue of an organization, because the more you become mature in your data journey, the more it means that you have taken a process lens, an outcome-based lens, on your way to drive the business. I think it’s super powerful to then start from small examples, where you say, “Okay, if we have three different functions in the company, a finance function, a planning function, and a commercial function, and they all look at data in a very siloed way, we will end up having three different ways to make forecasts around how much profit we will make, how much product we will sell to a given customer, or how many products we need to foresee that we will sell so that then we produce them accordingly.”

Francesco Marzoni: (19:01) This is a very typical example where you realize that if you take a functional siloed approach to data, data will give you an output, but it will be very controversial and very, let’s say, counterproductive for your organization because if you operate with three forecasts in your company, chances are that your business operations will get stuck at a certain point. So this is one of many examples where we instead take a process-based lens on data, which means, in this specific example around forecasting, that we ask: what are the data sets that inform my overall 360-degree forecast, so that I have consistency of forecast across the three different functional areas? That’s where you see the power of having data as a strategic asset, per se, and how much that strategic asset would lose value if instead you stick to your traditional functional silos in the company.

The power of reusable data

Satyen Sangani: (20:01) There’s an interesting interplay here. We had another guest, General Stanley McChrystal, who wrote a book around “the team of teams,” and he also talked about how organizations are optimized for efficiency and functional competency and expertise. That means that while, on one hand, you have a lot of efficiency, on the other, you don’t have a lot of resilience and you don’t create opportunity for knowledge-sharing for new processes. So you are in good company, as it were, in making this observation. But how does that tie into these network effects? So I can understand this idea that says, “Hey, if you break down the silos, you make the data more FAIR, as it were, you’re unlocking process efficiency, you’re unlocking process intelligence, as it were,” where do these network effects come in, and how does that result?

Francesco Marzoni: (20:52) I think the core stands on the point of nailing down the power of reusability of data. So the opposite of FAIR is every time we need data, we figure out where they are, how to retrieve them, how to play with them for our specific single-purpose mission, which might be building a specific application or running a specific analysis. So the network effect comes into play when we have a specific application of data that is in charge of building the first brick of my data asset. Then we have a second use-case scenario that comes into play, can reuse the first brick that I built, but then will contribute to my company’s assets by building a second brick and then make it further available for further use, to the point in which, if we do enforce FAIR data as part of how we drive the work on our data activity system, and if we enforce incentives for reusability, then the organization comes to a point at which the lead time between when I need data and when I can activate them is minimal.

Francesco Marzoni: (22:11) This lead time becomes as small as possible over time if we do have a critical mass of actors that go and build those bricks of data assets in a reusable fashion. So I think, really, for me, the network effect comes into play when we think about the link between how many people in the company are already part of an activity system that builds data, a reusable data asset for a given purpose, and makes that data asset available for future use, and how much time each and every player then over time will need to get access to data and build value with it. So I think it’s really about this correlation between volume of practitioners and lead-time reduction. That’s why, by the way, it’s critical that we still have data governance as a foundation layer because, without that data governance, the first step of this network effect is never triggered, while once we trigger it, then over time it will sustain itself because everyone in the organization will see the benefits.
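The brick-by-brick picture above, where each use case reuses what already exists and gives back what it had to build, can be made concrete with a toy simulation. Everything numeric here is an assumption chosen for illustration: the 30-day build cost, the 1-day reuse cost, the asset names, and the simplification that work on each asset happens sequentially. It is a sketch of the argument, not a model of any real organization.

```python
BUILD_DAYS = 30  # assumed lead time to build a new asset and make it FAIR
REUSE_DAYS = 1   # assumed lead time to pick up an asset that is already FAIR


def lead_times(use_cases):
    """Return the lead time in days of each use case, in arrival order.

    Each use case names the data assets it needs. Assets already in the
    shared catalog are reused; missing ones are built and then added to
    the catalog so that later use cases can reuse them.
    """
    catalog = set()
    days_per_use_case = []
    for needed in use_cases:
        missing = set(needed) - catalog
        reused = set(needed) & catalog
        days_per_use_case.append(
            BUILD_DAYS * len(missing) + REUSE_DAYS * len(reused)
        )
        catalog |= missing  # the "duty of giving back to the system"
    return days_per_use_case


use_cases = [
    {"product", "sales"},     # first brick: everything is built from scratch
    {"product", "customer"},  # reuses product data, builds customer data
    {"sales", "customer"},    # everything it needs already exists
]

print(lead_times(use_cases))  # [60, 31, 2]
```

Under these assumptions the third use case is activated in two days instead of sixty, purely because earlier teams published what they built. If the `catalog |= missing` step is removed, every use case pays the full build cost again, which is the point-to-point pattern described earlier in the conversation.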

Satyen Sangani: (23:21) So conceptually, then, you start by making data more FAIR. This programmatically unlocks these silos. The unlocked silos on some level give data this ubiquitous availability so that people can either improve or implement new processes. Those can be major significant process improvements or in themselves new big processes, which can then transform the business, which makes sense. I mean, it’s funny because you talk about network effects. It seems a lot like a virus. We’ve all taken a bachelor’s course in epidemiology over the last two years, and on some level that seems like something that we’ve tried to fight. But on the other hand, what it seems like you’re saying is, “Look, we want to make data as transmissible as possible, and maybe some of the silos that we unlock may have a transmissibility rate of less than one, but many could have multiplicative transmissibility rates,” or something like that. Am I bastardizing your network effect premise or am I…

Francesco Marzoni: (24:30) Well, I guess we take the risk of trying to build a positive story out of the viral story, but no, I’m just smiling about that because…. No, I really think about that as the moment in which you need to get access to data — to a specific set of data, to do something that produces value — I think it’s ultimately about narrowing down your options to two things. Either those data are already available somewhere because someone made them FAIR and they’re reusable, so you just go and use them, or your other option is you build it for the first time on your own within the organization, you make them FAIR and you make them available for the first time, so that you achieve your goal of using them. But then you have this duty of giving back to the system, and you give back to the system by exposing those data in a way that then the rest of your organization and of the community can use them. So I guess if we take away from the virus effect — the fact that there is also a good way to give back — I think that’s a perfect, spot-on analogy.

Satyen Sangani: (25:46) Yeah. You’re trying to infect people with “data culture virus.” All right. So —

Francesco Marzoni: (25:54) Now, it’s getting tricky, though, because talking about data culture and spreading data culture like a virus, that might become more difficult to position in a very positive way.

The role of training in data culture

Satyen Sangani: (26:06) Yeah. My marketing needs some work. So you bring up data culture. You’re obviously a practiced CDO who’s thought about and implemented these topics. One of the things that historically I’ve talked to and listened and heard from lots of data governance professionals is there’s an old trope that says data governance is about people, process, and technology, and it’s in that order.

One of the things that I’ve heard you say is actually quite in contrast to this, and I’m not sure if this is actually oppositional or if it’s just you’re talking about two different things. But you said data as a product and technology must happen first in the context of data culture. Tell us, how do you start? Do you in your work start with the data and the technology, or do you start with the people? How do you think about developing these programs, and what comes first?

Francesco Marzoni: (27:00) Well, I would definitely say it is the people first. I would never start from a technology play. Data is a people-first play for two reasons: one, because data is not a technical matter. Data is a business matter if I think about organizations that run businesses, right? So that’s the first. The second one is that succeeding in your data journey requires you to make data anybody’s, everybody’s business. So when I think about the steps that we take with our organization around building a data culture, around developing data literacy, around explaining to people what’s unique for them if we continue to invest in our data journey, it really starts from saying we need to aim for data to become one of the tools at the disposal of everyone in the company.

Francesco Marzoni: (27:58) I think the biggest mistake that we can make as data professionals is to think that you can drive, let’s say, data-driven transformation of a business by injecting a specific and limited group of data experts and data practitioners, and saying, “Now, this limited group of people will go and turn this company into a data-driven company that can drive value with data.” I think the starting point needs to be the other way around. You need to first make sure that you reach out to all the employees or all the members of your organization, explaining what’s needed for them, which means then, for me, that the data culture efforts, with a lens on the future, start also from identifying the skills that we now want to start injecting in each and every job description across the organization.

Francesco Marzoni: (28:55) So what does it mean to be today a marketing manager and tomorrow data-literate marketing manager? What does it mean today to be a sales assistant and be tomorrow a data-literate sales assistant? Then develop models around training, upskilling, also development criteria or the recruiting criteria based on also that data lens that you then apply to each and every role in the company.

The biggest challenge in data: translation

Satyen Sangani: (29:23) As we look forward and close out the conversation, I’d love to ask you about your challenges today. You’ve been in a variety of different roles. You’ve seen best-in-class implementations. Obviously, you’ve also seen your fair share of challenging journeys and failures. What are the big problems that you’d like to see solved today, and where do you think the biggest opportunities exist now with data over the next two to five years?

Francesco Marzoni: (29:54) The biggest challenge is to reduce losses in translation when we talk about what we can do with data because there are still two worlds that need to come together every time that we try to do something big with data. They’re the world of technical practitioners who think about data from a technology system implementation perspective and the world of business leaders or, let’s say, knowledge professionals who think about data from a data content standpoint. We do not yet have a silver bullet or a perfect recipe for how to get these two worlds within every organization to talk to each other and understand that data is a common denominator across them.

Francesco Marzoni: (30:40) So I still see that across many companies we still have this dichotomy, this famous dichotomy of business versus IT, which, when you think about a data journey and how you make data omnipresent across the organization, is a big barrier because as long as a company still thinks in terms of IT organizations and business organizations, tech team and business team, there is somehow a normal divide that gets in the way of creating value end-to-end with data. When I think about the top opportunities that we have for the industry, it’s really about getting to a state where, by having companies proficient in how to work with data, which means as well having companies proficient in how to make data a currency through which they build partnerships with each other, we land in a new business landscape or a new, I would say, societal landscape where different actors can join forces around specific problems in a much more seamless way.

Francesco Marzoni: (31:48) Because if data then becomes the new currency of how partnerships are made across companies, if you can unlock that currency in another way — like today we unlock, for example, financial currencies, right? If I want to partner with you and have a transaction with you, we do that instantly once we have decided why we want to do it and that it’s the right thing to do it. For me, the opportunity is to get to a place where we have a critical mass of companies and actors in society where this is possible also with data, not only with financial assets, because then it’s where we become a way more serious society because we can get together and organize around specific challenges and then tackle them more over time.

Satyen Sangani: (32:29) Yeah. That’s pretty aspirational, and I couldn’t agree more. I think the ability to be able to take pragmatic solutions to problems centers around scientific thinking and thinking with data. So I couldn’t agree more. Francesco, this has been awesome as I expected and forecasted and predicted. So thank you for taking the time, and it was wonderful to have you on the show.

Francesco Marzoni: (32:55) Big thanks, Satyen. It was a great pleasure.

Satyen Sangani: (33:02) FAIR data is powerful, and it raises a question. What if all data was findable, accessible, interoperable, and reusable? Think of the combinations you could create, the connections you could draw, and the insights you could glean. You’d be able to move fast.

Consider Francesco’s story about working at Nestlé during the pandemic. Time was of the essence. There were many teams they could partner with to address their supply chain problems. But, ultimately, they chose to work with companies who used FAIR data. The reason was simple. FAIR data was the fastest, most efficient way to establish and share data across organizations. So it doesn’t matter if you’re a startup or an enterprise. FAIR data empowers your team to glean new insights from data. I’d say that’s pretty radical. This is Satyen Sangani, CEO and co-founder of Alation. Thank you, Francesco, for joining us, and thank you for listening.

Producer: (33:59) Data mature organizations can effectively find and leverage high volumes of data. They're also more likely to acquire and retain customers, while also outperforming peers.
