Michelle Hoiseth is the Chief Data Officer at Parexel, a $2.5B global provider of biopharmaceutical services, and has worked in the drug and device development industry for three decades. Today she uses data for the faster delivery of life-saving medicines at Parexel.
As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”
Michelle Hoiseth (00:01): It is a myth that the average Joe doesn’t care about data. It’s not that they don’t care about data. They just care about their context and you have to speak to them in their context. It’s a myth that you need a burning platform to motivate people. I would argue that, instead, you need a better future.
Satyen Sangani (00:25): Those are the words of our guest today, Michelle Hoiseth. Michelle is the chief data officer and senior vice president at Parexel, a clinical research organization that works with pharmaceutical companies to develop new medicines. According to Michelle, it costs two and a half billion dollars, over 12 years, to bring a new medicine to market. And that cost is expected to grow. A few years ago, Parexel became a privately owned company, in part, to address this challenge head-on. This moment gave Michelle and her team a unique opportunity to invest in their technological infrastructure and modernize their data practices. More fundamentally, it gave them a chance to transform their culture, putting data at the heart of how they make decisions. It was necessary work and she’s not done yet.
(01:12): So today, on Data Radicals, Michelle is going to use her experiences at Parexel to do some myth busting. Creating an organization with a healthy data enablement was essential to Parexel’s bottom line. And she has so many learnings to share with us. When she encountered resistance, Michelle and her team didn’t insist it was their way or the highway. They didn’t stand on a desk and scream that the sky was falling. Instead, they listened to objections. They led with empathy and understanding. And Michelle can tell the story a lot better than I can. So let’s get to some myth busting.
Producer Read (01:53): Welcome to Data Radicals, a show about the people who use data to see things that nobody else can. This episode features an interview with Michelle Hoiseth, chief data officer at Parexel. In this episode, she and Satyen discuss the challenges of clinical research, creating a data culture, and how creating actionable results can create buy-in across an organization.
(02:15): Data Radicals is brought to you by the generous support of Alation. Alation gives enterprises the tools to make data-driven decisions and grow a data culture. Our data catalog can minimize the time workers spend searching for, and worrying about, the data they need to do their jobs, turning months of frustration into minutes of action. Visit alation.com. That’s Alation with an A, dot com, today.
Satyen Sangani (02:41): Michelle and I began our conversation by discussing Parexel’s biggest challenges.
Michelle Hoiseth (02:45): Right now, it takes about 12 years and about two and a half billion dollars to bring a medicine to patients. And that is anticipated, over the next 15 to 20 years, to be another five years longer and 20 billion. It’s an unsustainable trajectory. So, as an industry, we have to find a way to leverage the data that’s available to us on populations, on patients, and design better studies, perhaps conduct fewer studies because we’re using healthcare data, and do a better job.
Satyen Sangani (03:23): What is it that, in your industry, if you were to describe, from your management team’s or executive team’s perspective, what are the biggest challenges that Parexel faces, as a provider, and the industry faces, at large? Where are the opportunities and the risks, moving forward for your organization?
Michelle Hoiseth (03:45): We have to do a better job designing studies that fit within our patients’ lives, that really address their needs, that capture data around the impact of a treatment on them, that matters to them — not just matters to us or to regulators, as to whether that drug is associated with a certain treatment effect. But their sense of wellness from the therapy is also captured. So, that requires us to use data differently. It requires us to engage with patient communities differently. If I expand that to the industry, we’re all competing for the same patients. So protocols that really balance the needed scientific rigor, and regulatory rigor, with being very patient-centric, when those are much more accessible studies to patients. So that’s an issue we all face, and one that we’re all trying to solve.
(04:40): The other point that derives from that is that we have to be able to use more data and more analytics to do a better job. Then, the last part, there’s a lot going on in the world, right now, and therapies are developed worldwide. We don’t develop therapies just for the U.S. Protocols are always in eight, 10, 15 and more countries. And between COVID and geo-politics, and other things, we have to stay very nimble in today’s world, and do what’s best to get the study done well, with the integrity we need.
Satyen Sangani (05:22): How does data come into play in that environment? I mean, you would think that an institution like Parexel would be data native, day one. So tell us about, what did it look like, when you started at the company? And how have things evolved?
Michelle Hoiseth (05:38): So, remember that the data that you generate through the course of executing a study is associated with the product that you’re studying. So you’ll have the investigational product that you’re controlling for, and you’ll have placebo or some other control arm. Well, that data is all part of the intellectual property, so to speak. It belongs with that asset. It really is talking about the performance of that investigational asset in the target population, which means that that data was always owned by Johnson & Johnson or Gilead, whomever we were executing the research for.
(06:17): So, as an industry, CROs, I would argue, grew up as always temporary stewards of somebody else’s data. And, as a consequence, we were a little bit slow, as an industry, to realize we were developing data in our management systems that actually had value and could better inform the design of clinical research, for example, for that next generation of compounds in that therapy. We didn’t invest in the way that we needed to, early enough, with respect to systems architecture, enterprise data models, controls. We were a little bit behind, I think, other industries. We, in Parexel, were actually really behind. We were slow to realize that. So, by the time we realized that we weren’t treating our data as an asset, we had a lot of ground that we had to make up.
Satyen Sangani (07:09): Yeah. And I would imagine that, in this world, every project is a snowflake, and every data asset is effectively owned by your end client, which is the pharma. This idea of being able to create learning and structure across experiments probably was really hard because, in a $2.5 billion project, people’s careers live and die probably during the evolution of that singular project. So this idea of crossing across is very limited, and there’s just probably not a lot of people who see the lifespan of multiple projects over time. Or, at least, you only see a few in your career.
Michelle Hoiseth (07:44): Each protocol is a snowflake, but the data does really kind of group into categories of data that have different utility. So, while the data associated with all of the patients that enrolled in that phase three diabetes study are part of the differentiation of that particular asset and go with that — all of the data that forms around the execution, how long it took you to enroll that study, what the eligibility criteria were, what countries enrolled well versus not — that builds over time. And that becomes the basis for operational modeling that helps you improve the next time you do another diabetes study. That’s the piece that we had to get together and master. It was, for us, all of our systems, anything we licensed in our past, we licensed for a particular business function.
(08:34): That business function had become the de facto administrator in the system and set up the data definitions, the data models within the system, set up the architecture. And, as we began to flow data from one system to the other, we would have the inevitable clashes and collisions, the inability to get to an oracle of truth around start dates, stop dates, you name it. That was the stuff that we’re actually still, even today, we’ve come a long way in a few years. But we’re still really wrestling aspects of that to the ground.
Satyen Sangani (09:10): I would imagine a huge source of competitive differentiation because, yes, it’s true that, given one therapy or another, you can’t have the implicit data or explicit data around how that therapy works. But being able to say, “Look, we can scalably, and economically, and repeatedly, get you to deliver trials that are successful and cost-optimal,” in a world where this could cost you $2.5 billion, is incredible differentiation. What did that process look like for starting to collect that data internally? Did somebody come along and just say, “Hey, we should just be collecting all this data”? How did that process evolve?
Michelle Hoiseth (09:48): The data was forming in the management, the study management system, Satyen. It just wasn’t governed. So it wasn’t forming in a way that it was being prepared for additional use. So all the actions around that data would be taken on the basis of local reporting, for example. So the idea of seaming data from a study management system to data on our people resources, for example, we just were slow to anticipate that kind of future. We had to create enterprise data models. We had to create standards, definitions. I would argue that our master data management system was optional, not required. We had a lot we had to do to get the data to a common base where it could work together.
Satyen Sangani (10:42): Tell us a little bit about that emotional journey. Did everybody in the organization really understand that this change needed to happen? And what did that look like, in terms of socializing within the organization, that this evolution and maturity needed to take place?
Michelle Hoiseth (10:56): I use a cartoon, still, where the top frame of the cartoon is, “Who wants clean data?” And everybody in the crowd has their hand up. And then the next frame is, “Who wants to clean their data?” And everyone’s hand is down and they’re looking at their shoes.
And that was us. Basically, people wanted things to work, but they’re heads down in the business, delivering this research. They don’t really understand why the data isn’t accurate, can’t work together. But they don’t really care. “Fix it, go fix it! I’ve got to conduct this research. I have to make sure these patients are safe. I have to make sure it’s getting on time. I have to take care of these regulatory inspections. Go fix the data.”
(11:45): So, right out of the gate, we recognized that our biggest hurdle was going to be enrolling people from the beginning and helping educate them as we went, as to why their own actions were important and how it couldn’t be done around them. They had to be a part. So, there was initial talking about some of our high-profile advanced analytics initiatives, and how they depended on interoperable data that met a certain standard, and tracing that back to show them where that data was created and, therefore, why their own actions mattered, a lot of using examples of where things broke down, and why that was, and so, therefore how it would happen differently in a governed world.
Satyen Sangani (12:31): Michelle and her team also made sure they used language that encouraged buy-in across the organization.
Michelle Hoiseth (12:37): We were very careful about using language around data enablement, not data governance. Yes, there’s a certain amount of this, of course, that’s about control. But it’s control in service to the aims of the business. It’s not a gate control for the sake of pure defensive posture. As a business, we needed to be able to do more with our data. It’s been an educational journey and it had to be timed with the advanced analytics initiatives. Everybody was aware of what we were investing and why, and the potential they held. And, if we failed to bring them along on the data story with it, they were not going to succeed.
Satyen Sangani (17:29): This idea of organizational literacy, departmental literacy, and then obviously individual literacy: What do you think the characteristics and the hallmarks of each of those levels would be like? What does it mean for an organization to be literate relative to a department? And how do you think of those differences?
Michelle Hoiseth (13:35): It’s not an army, unfortunately. It’s more like a very small tactical team, which was built out as we went. There were a lot of competing priorities, as we started in the business, across the business, as we started. So we grew as we went. And I would tell you that the central team is still only about 15 people.
(14:00): Basically, we’ve got a hub-and-spoke kind of a model here. And the data cuts across a variety of domains. But the domains transact around nodes of data. The node being the patient. The node being a clinical site. The node being revenue for project management or project milestones. It’s a multi-stakeholder environment across all those nodes, and those systems. Remember, they all grow up independent of one another. They all have to work on a profit data model that becomes harmonized off of our common data store. So they all had to be at the table. If we were going to change the definition, or if we were going to … let’s not say change. Let’s say we were going to establish the first project start definition that was enterprise-wide, because that’s where we were, they all had to be determining the impact on whether it could change in those systems, or we had to do it in flight, in transformation, or whatever would need to happen.
(15:00): So, basically, the central team is surrounded by a group of senior leaders that are accountable for the successful use and enablement of our data across those domains. Then, they’re surrounded by a ring of technical data stewards and domain data stewards that are actually transacting, working with the data generators, et cetera, to bring the data to the enterprise-level profiles and requirements that need definitions.
(15:32): So that’s how we act across the business. Our team is also small because, while we’re focused on policy process prioritization, in the business, addressing the needs, adjudicating requirements, et cetera – when we started the journey, the first realization was that we really needed to have three levels of conversation across the business. People just needed to be aware of what it was, why we were doing it, why it was important, what was in it for them, why they needed to come on this journey with us. Then there was that next wave level of operational engagement, people in the business who, maybe, are not technically data people. But they have to deal with processing, implementation, and adoption, and compliance.
(16:22): And then, there was a very specific set of trainings, et cetera, onboarding, that really dealt with the technicians. We still, today, have to maintain those three levels of conversation so that we’re speaking into the listening ear, depending on who a person is in the organization.
We might’ve been temporary stewards of other people’s data for a long time in our past. But what do we deal with? So many of us. There’s 18,000 people worldwide here. Pretty much everybody is either a data generator or a data consumer. So one of the biggest challenges, the things that kind of kept me up at night in the beginning, was how do we address… how do we kind of enroll people so broadly across the world, so many roles, and begin to really affect change, in a very targeted way? How do we balance those things? And it was definitely the level of conversation that we were having.
Satyen Sangani (17:23): So day one, did you have… I mean, you basically highlighted two structures. The first was sort of a 101, 201, 301, kind of educational structure around the program, and what that looked like. And I think it’s helpful to get into details for the listeners all around that. But, also, there was kind of this other organizational structure around domain ownership for the key data assets within the organization. Was that organizational and operational structure in place, or did you have to put that in place?
Michelle Hoiseth (17:54): No, it wasn’t in place. When we started, people wouldn’t know where to go with that kind of request. It would come over as a request into IT. But how it would get shuttled and serviced could be different from one to the next.
We did have data marts. We did have data warehouses. It wasn’t as though it was completely the Wild West. But they were managed in the business, and the rules that, even today, are executed where those things are still being used were self-contained. They kind of grew up within that team, or in that system. You had asked me on day one, what did it look like. Well, day one just began this investigation of the current state, exactly what was going on. Did we have an MDM? It was like a closely held secret. I think if you weren’t in Corporate IT, you didn’t know we had an MDM.
(18:50): So, trying to figure all that out, trying to take account for the data models that were in place, trying to understand how decisions had been made historically, in terms of flowing data from one system to the next, whether we could even see lineage. Just, affectionately, our systems architecture was a bit of a Frankenstein creation. And we had to spend a lot of time, manually, with people, doing interviews, just getting that base state defined and figuring out where to go from there.
Satyen Sangani (19:29): Yeah. Sort of like this kind of archeology-cross-anthropology around all of the data relics that existed in your organization. How long did that process take you?
Michelle Hoiseth (19:41): It was probably, I want to tell you, it was the better part of nine months. I mean, it was just sheer force for six months and then a lot of refinement from there. Then, we started moving into working with the domains to create the enterprise definition. So they had never created an enterprise definition. They didn’t even understand why it was important.
Satyen Sangani (20:06): Who else did you partner with in the organization to shepherd it through that hard time?
Michelle Hoiseth (20:12): One nice thing about coming from the Operations side of the business is that I’ve grown up in the leadership track. There’s been a lot of baptism by fire. Bonded over fire, I think, is the saying I’m looking for. But, I have a lot of peers who, we understand what needs to happen. We have accountabilities. We have equal responsibility. But we talk the same language.
So, the ability to address what is breaking down in the data, and why we need to change, in terms that they understood and, regardless of where they sat in the business, was really helpful, and helped enroll them and kind of get us more aligned.
(20:55): So, it’s different. It’s very different. Nobody wants to hear about your enterprise data model, but they do want to hear about why they can’t get this same headcount forecast from two different systems, and which one do they trust? You know, you just really have to speak to it in terms that matter, and just translate it back to a set of root cause changes, or root cause issues with some mechanistic changes.
Satyen Sangani (21:23): When did you get the first win? When did you sort of say, “Oh, there was like something here tangible that we could look at that everybody then said, ‘Hey, this is really adding value,’” or some people, at least.
Michelle Hoiseth (21:34): Yeah. It was really some of the launches of, well, one advanced analytics project, in particular, to consume data off the data lake, and the results mattering, the dashboard mattering, that teams were using to understand the risks and progress, the status of their studies.
And it looked the same. If you were a member of the team conducting the study, the data you saw was the same data that our CEO saw for the first time, as it rolled out. So there wasn’t a hundred separate reports around that study which, on any given day, we’re doing 2,000, 2,500 studies. Think about the cost to the business. Not only is there an issue with respect to reporting accuracy that we have to address, but think about all that separate manual reporting that was going on. And multiply that by 2,000, 2,500.
Satyen Sangani (22:35): The opportunity is incredible. And do you look back on that experience and like, now, having had the wisdom of looking back to that moment in time, is there anything you would’ve done differently or any advice you would give to somebody who might be entering a situation that is similar to the one that you entered?
Michelle Hoiseth (22:50): So I don’t feel that I’m through it. So I still feel the anxiety that I felt then, although it shifted. The things that provoked the feeling are a little different. So, then, we didn’t have a notion of an enterprise data model. And we had to start from scratch. Now, it’s showing up in, sort of, more downstream problems, where we go to make a change in a system to a new data scheme. And you discover that, in IT, we don’t have a testing environment.
(23:25): So, now, we have the ability. We can see ahead. So we want to make this change. And now we’ve never been in a position to work that way. And we have now another fundamental thing that we’ve got to fix. But it’s a happy problem, I would say. But still, it provokes some angst still. I don’t know. If we looked at this on four or five levels of maturity, I still feel like we’re maybe kissing two, yet. We’ve got a ways to go. So, with that said, what do I wish I did differently? I wish I was more aggressive. I wish I’d pushed harder. I wish we went faster. I would like us to be at a maturity level of three, right now, the better part of two concerted years in.
Satyen Sangani (24:10): I can imagine. And how would that have looked? Would that have been more analytical outcomes sooner? Would that have been driving standardization more aggressively? Would it have been more broadly publicizing of initiatives, or all of the above? I mean, how do you think about where you could have gone faster and where you would’ve wanted to go faster?
Michelle Hoiseth (24:29): I think I would’ve been more aggressive in two areas, and one would’ve been driving closer engagement with corporate IT, more directly. We did have a change in our CIO, partway through this journey, which has made all the world of difference. I feel like I have an unbelievable partner, in our CIO, to succeed together in what we want to do with our data. So, that was a little bit outside of my control. But what would’ve been inside my control is how our teams partnered beneath the old leadership. I would’ve been more aggressive there. I would’ve probably been more aggressive with respect to funding, and the rate at which we grew the team. We took a very pragmatic approach. I was very respectful of everything else that was going on here. I wonder, some days, whether I was a little too respectful.
Satyen Sangani (25:23): There are two other topics I’d love to get your perspective on. The first is… This is a podcast about data culture. So, is culture something that you talk about internally? And how do you think about the cultural evolution? And obviously that’s something that even your executive leadership team would be involved with. So how do you think about driving that internally?
Michelle Hoiseth (25:42): It’s culture within a culture, really because, as we’ve gone through these last few years, going private, et cetera, we have worked very hard to reestablish our Parexel culture and how we operate. And there’s really three major pillars to it. One is keeping the patient at the center of everything we do. The other is being easier to do business with, so more accessible, doing a better job, just ease in the way of working. And then the third is leveraging our expertise. Well, think about those three things. They lend themselves greatly to what we need to transform with respect to our data culture, and how people need to think about that. We cannot be easier to do business with, if we can’t better access and utilize our own data. We can’t leverage our expertise if we can’t find it.
(26:36): If I go back to the diabetes example, Satyen, for a pharma company that’s developing the next-generation antidiabetic treatment, they have one, maybe two, maybe three, compounds in development. Any single CRO is probably seeing twice, three times, that number. The amount of experience that we sit on, in terms of what worked in studies and what didn’t, is far greater. We have to be bringing that to bear, to do our part, to design the next studies better.
And then, last but not least, to be at a state of readiness, to be able to safely, and in a trusted way, deploy new methodologies so we are getting therapies to the patients that need them more affordably, faster, it requires a change in the way we treat our data as an asset. So, we do talk about it like that. We try to make the connections.
Satyen Sangani (27:37): To your values and to your operating principles, do you feel like, now in year three of the journey, do you feel like your work is accelerating?
Michelle Hoiseth (27:46): Yes. All of a sudden, on both sides too, and with the real world data and data governance. It’s a hard case to build. We need to improve our data quality. Like what do you mean? We need to have data standards. What do you mean? Why? For a lot of people who are distant from the requirements, I would say another challenge that people in roles like ours face is that the first 12, maybe 15, in our instance, maybe 18, months of foundational work is all cost with almost very little value. Or the wins are small compared to what you’re building. And then you hit this tipping point, where you can begin to open up access, and you begin to open up utility. And then you can really like, feel that momentum start to build and you’re not pushing people. They’re pulling you now.
So I feel like we’re at that point, in both areas, right now.
Satyen Sangani (28:49): Michelle’s experiences at Parexel can be a guiding light for anyone leading transformation at their organization. A lot of her experience was based on her willingness to challenge conventional wisdom and bust some myths about how to use data well.
Got any more data myths that you want us to bust? Post them on social media, with the hashtag #dataradicals. We can’t wait to hear what you have to say.
Thank you to Michelle for joining us on this episode of Data Radicals. This is Satyen Sangani, co-founder and CEO of Alation. Thank you for listening.
Producer (29:21): This podcast is brought to you by Alation. Are you a CDO or aspiring leader in data? Learn how you can cultivate a data-driven organization in this white paper from Gartner. Get it at alation.com/gdc