Ari Kaplan is head of evangelism at Databricks and a leading influencer in AI, databases, and analytics. He is most known for innovating sports analytics by creating the Cubs and Dodgers analytics departments, which inspired the book and movie, Moneyball.
As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”
Producer 1: (00:01) Hello and welcome to Data Radicals. On today's episode, Satyen sits down with Ari Kaplan, Head Evangelist at Databricks. Ari is a leading influencer in AI, data, and analytics, and has established analytics functions for several major league baseball teams. On the side, Ari serves as president of the Independent Investigation into the Fate of Raoul Wallenberg, Sweden's humanitarian hero, where he leverages data analytics to find missing POWs. In this episode, Satyen and Ari discuss data analytics and sports, how data intelligence is shifting the landscape, and the concept of generation AI.
Producer 2: (00:37) This podcast is brought to you by Alation. Successful companies make data-driven decisions at the right time, quickly, by combining the brilliance of their people with the power of their data. See why thousands of business and data leaders embrace Alation at Alation.com. That's A-L-A-T-I-O-N.com.
Satyen Sangani: (01:02) Today on Data Radicals, I'm speaking with Ari Kaplan, head of evangelism at Databricks. He's also known as "The Real Moneyball guy," in which his analytical and scouting experience partly inspired the 2011 movie starring Brad Pitt. Ari created the analytics departments for the Chicago Cubs, the Los Angeles Dodgers, Boo, and the Baltimore Orioles. He's also served as the president of the Worldwide Oracle Users Group, and has traveled much of the world with McLaren Formula 1. Ari, welcome to Data Radicals.
Ari Kaplan: (01:30) Satyen, thank you so much. I love being here.
Satyen Sangani: (01:33) Yeah, I had to boo the Dodgers because I'm not a big baseball fan, but as a lifetime San Francisco Giants follower, it's a little tough to see their recent signings and acquisitions. Switching gears a little bit, you are best known as "The Moneyball guy." And obviously, there's so much in that movie around sports and analytics and performance measurement that revolutionized not only the sporting industry, but others. Tell us a little bit about that nickname, where it came from, and how you participated in the advent of analytics and sports.
Ari Kaplan: (02:02) Great. Well, that nickname comes from having been one of the early five people working with a sports team in what we would now consider to be data analytics. And so from those five people, now there's tens of thousands of people working directly at teams, vendors, gaming, betting companies. It's actually over a billion dollars for sports analytics, specifically in the US alone. And that Moneyball storyline, that was based partly on a composite, including my own experiences from a technical standpoint. How do teams implement data engineering, insights, and really helping shift that culture of the whole industry. And Moneyball specifically was the movie and the book on that shift from gut-driven to more of a data-driven decision.
Ari Kaplan: (02:47) And I think what resonates well is that's really applied to every single industry out there. Every person, every business, every company is being affected by the data and analytics revolutions.
Satyen Sangani: (03:00) But at the time, I've got to imagine that when you started it, five people thinking about being able to empirically measure something that feels like talent in sports feels like something that people... There wasn't maybe a growth mindset around it. And so I would imagine at the time it was quite revolutionary. Tell us a little bit about the germ and how it evolved.
Ari Kaplan: (03:19) I'll elicit a yay and a boo from you talking about the Giants and the Dodgers, but it really started when I was an undergraduate at Caltech, the California Institute of Technology, and just saw that as a fan, some of the players that I thought had good statistics sometimes would be poor performers in my eyes and vice versa. And a lot of people felt the same way, but in addition to complaining about it, I was able to come up with succinct measurements and metrics to better evaluate these players by taking the luck as much as I can out of it and really attributing the skill for the forecasting techniques. And that caught the attention of a lot of media, the Today Show, LA Times.
Ari Kaplan: (04:03) And at the time, the general manager of the Dodgers, Fred Claire, just heard of me, saw me in the LA Times, and picked up the phone, called my dorm room, and made a quick introduction saying he thinks that the better ways to evaluate players might potentially help them. And could I come to Dodger Stadium the coming weeks? And I'm like, I will be there this afternoon before you change your mind. And I drove up there within the hour, and he was very gracious and went from being just a regular fan to sitting in the dugout with Eddie Murray, Orel Hershiser, Kirk Gibson, and many others.
Ari Kaplan: (04:39) That was really the main entry point, but once I got in, I had to keep redefining everything that I would do every couple of years. At that point, he was one of the forward-thinking general managers that was humble enough to realize there are potentially better ways to do things and quickly found it was helpful to help recruit better and better players. And I did want to get a yay from you, so along the journey, Hunter Pence was one of the players, not just from an analytical standpoint, but I also have had the honor to be a Major League scout to look at him as a person that could adapt, was willing to listen to constructive feedback and change his approach, which is so needed as a philosophy in life, but especially in Major League baseball. And for the listeners who don't know him, he ended up being the captain of the San Francisco Giants, but I helped make the call with the Houston Astros to recommend he get called up to the majors and get his debut, and a lot of people did not believe in that same way, did not think he had Major League potential.
Satyen Sangani: (05:39) What did you see in him that gave you the clue that that was somebody that you thought was worth investing in?
Ari Kaplan: (05:47) Yeah, so there are a couple aspects. One is, do you have the physical ability? Can you consistently hit a Major League baseball? And then the mental ability to make that adjustment. And the physical ability is fairly easy to measure. There are ways even back then that you could look at it. For example, the challenge is he was in the minor leagues, so you don't face the same level of competition as the major leagues.
Ari Kaplan: (06:09) Players aren't quite as skilled in your opposition, so I was able to filter out just Major League quality opponents. How did he fare against them? So would he be able to hit a Major League fastball, curveball? So that was, simple as it sounds, not easy to do back then when you didn't have the data readily available. And then from the non-standpoint, there's a big question. If you haven't seen him, he's kind of like a tall, skinny, they call it lanky player. He didn't appear to have the strength, so a lot of professional scouts did not think he had physically what it takes. But the great thing in life, as in sports, is it's not just about strength, which he does have, it's about agility, it's about recognition, about being able to predict where a pitch will be. So we call that athleticism, and I saw that in him as well. But really what won me over is I did do a data science analysis of each subsequent time he saw the same pitcher in the same game or throughout the season.
Ari Kaplan: (07:10) Did he get better or did his opponent get better? And how did that compare to his peers? And it turned out he was in the top just 5% of AAA players that would adjust as a batter against the same pitcher. So that was my recommendation: he was willing to listen, he's willing to adjust, and he did have inherent athletic ability.
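The adjustment analysis Ari describes (how a batter fares on later encounters with the same pitcher, ranked against his peers) could be sketched roughly as follows. This is an illustrative reconstruction, not his actual method: the record layout, the use of on-base rate as the outcome, and the names `adjustment_scores` and `percentile_rank` are all assumptions.

```python
from collections import defaultdict

def adjustment_scores(plate_appearances):
    """Per batter: on-base rate in repeat encounters minus rate in first encounters.

    Each record is a hypothetical tuple:
    (batter, pitcher, encounter_number, reached_base).
    """
    first = defaultdict(list)
    later = defaultdict(list)
    for batter, _pitcher, encounter, reached in plate_appearances:
        bucket = first if encounter == 1 else later
        bucket[batter].append(1 if reached else 0)
    scores = {}
    for batter, first_results in first.items():
        later_results = later.get(batter)
        if later_results:  # skip batters with no repeat encounters
            scores[batter] = (sum(later_results) / len(later_results)
                              - sum(first_results) / len(first_results))
    return scores

def percentile_rank(scores, batter):
    """Percentage of batters whose score is at or below this batter's score."""
    target = scores[batter]
    return 100.0 * sum(1 for s in scores.values() if s <= target) / len(scores)

# Made-up sample data for three batters.
sample = [
    ("A", "P1", 1, False), ("A", "P1", 2, True), ("A", "P1", 3, True),
    ("B", "P1", 1, True), ("B", "P1", 2, False),
    ("C", "P2", 1, False), ("C", "P2", 2, False),
]
scores = adjustment_scores(sample)
top_rank = percentile_rank(scores, "A")
```

A positive score means the batter did better the second and third time he saw a pitcher; a real analysis would also need to account for the pitcher adjusting and for the quality of the opposition, as Ari notes.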
Satyen Sangani: (07:27) Yeah, super interesting. He's such a quirky character too, right? He's wide-eyed and grew the beard before that was very fashionable and was somewhat of a character. Rewinding a little bit though, back to the time that you were at Caltech, at the time, like today, if you have a good idea, you can put it up on the internet and somebody might pick it up or more ridiculously, something like TikTok or Twitter. But at the time, there was nothing of the sort. How did they pick up on your idea in the first place? How did they even know that you were doing this work?
Ari Kaplan: (08:00) Yeah, so I did the work as part of my Caltech, it was called a SURF, a Summer Undergraduate Research Fellowship. So I had to go around and try to get funding from Caltech. And then when I did the project, I pitched it to the president of Caltech, and he liked it. He had me come out to the board of trustees of Caltech, and then their PR machine, like really liked it. They were like oh wow, this student's onto something potentially big, he could change a billion dollar industry. So it was really Caltech PR that started it. And then it just caught on its own. And I had television stations and print, radio, everything, kind of wanted to hear the storyline.
Satyen Sangani: (08:43) It must have been an insane ride as a 20-something, maybe 19-year-old to go on. And it's particularly sitting in the dugout with these historic greats. So you did that work, and that obviously carried your career forward for a reasonably long period of time. As you reflect on that, there's so many data scientists and people today who, like obviously, have an insight, but tend to be maybe a little bit more introverted. What advice would you give them in terms of your own experience about how to evangelize their ideas and get out there?
Ari Kaplan: (09:13) Yeah. Well, if you're an introvert, you likely also have an inward passion for what you're doing, whether it's writing code, whether it's finding data science ideas, whether it's the new generative AI, whatever that passion is. If you have it internally, people want to hear that. They want to learn from you. You oftentimes have things to offer other people. Just getting over and getting out there and talking, whether it's your family, your inner circle, and then you try speaking at a local meetup or try just recording your own video and putting it out there somewhere on social media. And once you get that and you see there's a positive reaction, then hopefully you're the personality that you catch that spirit and keep it going. And if you already are kind of an evangelist, the recommendation is just building your own brand and constantly be a learner. I personally try to reinvent myself every couple of years and it's something everyone can and should be doing.
Satyen Sangani: (10:11) Yeah. Hard to do. And obviously commendable that you've done it. You now are at Databricks and you are responsible for evangelism. And in that context, one of the stories that you talk about very relevant is the Texas Rangers. They just so happened to win the World Series. What did they do uniquely different that helped them win the World Series? And how much of it do you think was due to the analysis of data and relative to everything else that they did to obviously build the organization and the culture and the staff to win the title?
Ari Kaplan: (10:45) Yeah. So first with my role at Databricks, it's a big honor. People don't know we're known as the data and AI company. We've grown to over one and a half billion in revenue and are a private company still, but the latest round was 43 billion, making us one of the, I think, fourth largest private tech companies in the world, creating what's called the lakehouse architecture, which is a paradigm where you can get structured and unstructured data. And this will lead into what the Rangers and other companies are doing. But now almost 75% of major companies have adopted this lakehouse architecture.
Ari Kaplan: (11:20) So I'm personally excited in what I believe will be the next major market, which is the data intelligence platform, how to get intelligent and AI driven insights on your company's data assets. So Texas Rangers, they are one of our more visible customers. We have customers in every single industry, 10,000 plus, finance, healthcare, and so on, but they were one of my favorites. I've been on the Databricks team as part of that, and they did win the World Series for the first time. I love seeing that. It was great for their fans.
Ari Kaplan: (11:51) And by the way, if you Google it, they have a lot of videos and blogs and case studies on how they did it. But it was really a journey of enabling their own staff to take data that they collected, data that Major League Baseball collects and shares on biomechanics and this technology called Hawk-Eye, and synthesizing it and then having a culture in place, whether it's their general manager who comes more from the baseball background or some of their more analytically driven business people and strategists, to make these recommendations.
Ari Kaplan: (12:28) So it's really like a collaboration and a partnership. And with Databricks, think of us as like the underlying plumbing. We enable them to ingest, transform, do data workloads, create predictive analytics. And the other nice thing is the democratization story. So before Databricks, only a subset of their staff was able to access all of their data. And now even non-technical people have access to it. So really, all the credit, I think, would go to their analyst R&D team, the people who develop those insights off of the data to help make it happen.
Ari Kaplan: (13:12) And Databricks was like the platform that made it lower cost, higher performance, and like a collaborative environment. So all different personas can work.
Satyen Sangani: (13:17) And so you've seen a lot of these baseball teams and obviously professional sports organizations. What was it about them that was unique? Did they invest in more analysts? Did they apply data to coaching as well as player selection? Are there a few things that they did that were different? Or did they just do it more relative to their baseball peers? Or how much did the analytics actually matter in the context of this story?
Ari Kaplan: (13:42) Yeah. So I should say every single team and in every sport is building and growing out their data and data analytics department. Some have two, some have five, some have 10, 20, 40 or more. But I think what was the winning combination for them was having this culture of natural curiosity, this culture where you could ask questions, you could challenge people. You're not just doing work, but you could come up with your own proposals for doing your own work. So I think it's really like that data-driven culture that helps set them up for all the success.
Satyen Sangani: (14:27) Yeah. That makes a lot of sense. And are there patterns that you've seen have worked well for certain teams relative to others or the ones that sort of use it the best?
Ari Kaplan: (14:31) Yeah. And going across all different sports. Yeah, it's an interesting challenge. So now every team is asking the same types of questions, getting generally the same source of data, especially at the Major League level or the professional level. So it really comes down to the talent evaluators, people who could synthesize these insights, merge it with what they see of players as a human. So when you're talking on the baseball and strategy side, people who are the players who are willing to learn, and that's getting more and more prevalent, and managers, the people who are implementing it on a game by game basis. And that has great analogies to all businesses. So if you're a non-sport business and you have that culture to try to improve yourself and innovate your business, whether it's at the data level or supply chain level, these are all great lessons learned from all of this.
Ari Kaplan: (15:23) That's one of the key cores to it. Then from within there, you have public data, but then you also have some of your own proprietary data. And then there's this concept called feature engineering, which is taking data you already have and making new combinations, almost like a business logic expert system. And companies that do that often have better predictions.
Ari Kaplan: (15:49) So in the financial world, one great example is you have price as a column, you have earning as another column, and it took somebody like a little creativity to say, what if we do price per earning? In many cases, that's even a better predictor than either of those data separately. Similarly, in baseball, you can create new combinations of information to get situational data or data that's better in context. So when you run these predictions, like who should we draft? What is the monetary value of a player? They align better with the complexities of real life.
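The price-per-earning example above is feature engineering in its simplest form: deriving a new column from two existing ones. A minimal sketch follows, assuming each row is a dictionary with hypothetical `price` and `earnings` keys; the function name and sample values are made up for illustration.

```python
def add_price_to_earnings(rows):
    """Return copies of the rows with a derived 'price_to_earnings' column."""
    enriched_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        earnings = row["earnings"]
        # The ratio is undefined when earnings are zero, so store None instead.
        new_row["price_to_earnings"] = row["price"] / earnings if earnings else None
        enriched_rows.append(new_row)
    return enriched_rows

# Made-up sample data.
stocks = [
    {"ticker": "AAA", "price": 100.0, "earnings": 5.0},
    {"ticker": "BBB", "price": 30.0, "earnings": 3.0},
    {"ticker": "CCC", "price": 12.0, "earnings": 0.0},
]
enriched = add_price_to_earnings(stocks)
```

The derived column can then be fed to a predictive model alongside, or instead of, the raw columns it was built from.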
Satyen Sangani: (16:29) In baseball, and I think in lots of sports, I would imagine Gen 1, there's some progression. And on some level, if you take calculus as an analogy, there's the first order effects, and then there's second order effects, and third derivatives, and fourth. I would imagine that now the competition is, everybody has the same data on the talent side for the newest draft prospect that may be coming up. And then you're starting to watch second order things like how do they learn over time? Where do you see this going? What's the next evolution of where people are gonna start being able to differentiate in prediction so that they're able to get to more high fidelity, higher quality differentiation relative to their peers? Because if everybody's doing it, then the question is, now what do you have to do to actually stay ahead?
Ari Kaplan: (17:08) Absolutely great and fundamental question. Throughout history, I started with Oracle back, think database, structured data. At one point, I was the president of the worldwide Oracle User Group when we acquired Java and MySQL and others. So that was like one evolution. Kind of got you to a scale of hundreds of millions, if not billions of records. And then the next evolution in these insights was unstructured data. How can you take in and synthesize video or spatial data or even real-time streaming? So like in the world of sports, that's biomechanical. How is the sequence of pitching or the seam orientation affecting the physics of a ball moving? Which sounds pedantic, but it's crucial to have that movement deceive the batter. That's really what a lot of winning the game comes down to: the deception and how you can vary that movement. So being able to incorporate unstructured data into your same predictive models and strategy is where we've been like the last decade.
Ari Kaplan: (18:12) And now, the next thing is data intelligence and what data intelligence is, is you're making more intelligent insights, could be generative AI, for example, building LLMs. So if you're the baseball team and you have 20 scouting reports on a player, it's kind of tough. I've written them, I've read them. You have some, this player is great, but on the other hand he is lacking this. You just want to know what is the summary sentiment of 20 scouts on a scale of one to 10.
Ari Kaplan: (18:43) So this is like where the next thing is. If you're a manufacturer of an airline and you want to know where in your supply chain are the mechanics breaking down, you want to be able to use this data intelligence to help just say of all of the data that we have, answer this question, where are our supply chain bottlenecks? Or you're using the real time streaming, you could say which customers are most likely to churn and why and how.
Ari Kaplan: (19:09) So that's part of data intelligence is kind of the question and answer standpoint, but then also data intelligence is being able to just understand what are your assets telling you.
Satyen Sangani: (19:20) I come at the world from an economics background and you think of this kind of domain of industrial organization and you think of how firms grow and in many cases you can horizontally integrate or you can vertically integrate and in a vertical integration strategy you're trying to pull everything together and instead of amassing all of the compute across all of the world, which would be like a very horizontal strategy, like we're doing every workload that we possibly can. You guys are taking like a very vertical orientation to the strategy. And then there's this question of like, okay, well, how do we then partner with these other people who are along that spectrum of vertical swim lanes?
Satyen Sangani: (19:58) What is the advice that you give to your sales reps, your customers about when to use the vertical platform that you provide, the straight through data intelligence platform versus other things, whether it's like an Alation or a DataRobot or anything else. How do you navigate that complexity of tools or that complexity of use cases?
Ari Kaplan: (20:15) Sure. Well, the good news is, it's an easy answer since there are so many partners out there that have a better together story, companies that are experts in a certain industry or experts in a certain technology or in a certain capability. And you want this whole ecosystem where there is that better together story. One thing I was super excited about coming into Databricks is I've been at other companies where you have partners on the website, but in the end it's just a logo and nothing happens. But here I'm like blown away by how the better together story truly works. We get so much better value added to the customers when it's partners, not just system integrators or ISVs, but people who offer these capabilities like Alation.
Ari Kaplan: (21:06) And we have dozens and dozens of different examples, all the way from pre-data to data to the AI aspect to visualization and semantic layers and security and all of that. But really it's like an ecosystem. That phrase, it takes a village, applies to getting value out of all of these great data assets. Partners are a wonderful part of that ecosystem.
Satyen Sangani: (21:33) Yeah. Often you would like for there to be black and white differentiation between a lot of what happens in a lot of these tools. And to your point, there is quite a bit of overlap. We also are building an ecosystem and have things where we have bits of overlap with partners that we've got. And often what we say is like, oh, well, there's like good, better, best and we are gonna provide sort of good enough experience in some cases for you to get your work done, but often you might need better or you might need best. And in those cases that's when you wanna fragment off and use something else that maybe is more advanced than what we've provided.
Satyen Sangani: (22:06) And what we're trying to do is really just, in our particular case, just maximize the value that comes out of our platform for any given customer or user. And it sounds like that's kind of similar to the strategy that you are evangelizing and maybe that Databricks is taking. Is that a fair summary or did I bastardize it in any way?
Ari Kaplan: (22:22) No, no, that's great. And just me personally, about a third of my job is helping both enable our partners, but help them, like for each individual partner, what is that better together story? What can we take out to the market? Can we do joint customer use cases or joint webinars to explain that. So people out in, who are... And customers understand where that value add is. And the result has been tremendous. We get huge amount coming to and from our partners and we have our data and AI world tour in our summit and each one is filled with partner exhibits, presentations, dinners, local events. And it's really remarkable to see that whole ecosystem come to life.
Satyen Sangani: (23:06) Yeah, and I think also to get customers who can sort of speak to those examples and really say, hey, this is like what I got. 'Cause I think there's the story and then there's the realization of the story. And it's funny, I mean, when we talk to partners, one of the things that we've always tried to do is say like, look, that's all great. How do you actually, do you have evidence that it's actually working together. And those are really the fulfilling moments that I find arbitrate success. I actually want to circle back to baseball though, or at least sports because I think it's such an interesting topic. And to me one of the interesting things about it is that there's this element of almost a zero sum game like every sports team has.
Satyen Sangani: (23:42) I mean literally it's one of those places where every... We all talk about differentiation and we all talk about differentiation based upon analytics. And we pro... Like you evangelize that every single day and you wanna get people to buy into the business case. In the sports industry, like if you are not smarter longer term, because you're playing on the same set of rules, there's not very much differentiation. I mean there is I guess big market team, small market team dynamics and so maybe there's a financial set of differences, but what I think it... It's almost a Petri dish for how useful analytics is. Have you ever thought about the meta on that? How do we actually figure out how analytics is affecting this industry at large? Is that something you've ever given thought to like almost even analyzing like which teams do it the best and assessing how much impact it has on their business? 'Cause you really couldn't do that in any other industry.
Ari Kaplan: (24:31) That's an awesome question and yeah, at one point I created and led the Cubs analytics department and that was one of the key driving questions is, how can we get an advantage? What are other teams doing? Who are the people at these other teams? What are their backgrounds? What type of data might they get? And now a lot of teams... One of the challenges is that there is still a lot of room for innovation, but talent switches teams and sometimes they take that knowledge with them. So really you have just a couple of years advantage. And I was also assistant to the GM of the Baltimore Orioles when we made the playoffs three times. But yeah, a lot of it was like, how can we either do that feature engineering and create new data or get creative. Earlier on, like one example, before they had cameras to collect everything, was to capture where the catcher's glove was being positioned to do what's called framing, which is kind of tricking the umpire to call more strikes than balls. As well as, if the glove is here and the pitch keeps coming up here, it's a development tool to help the pitchers better evaluate themselves.
Ari Kaplan: (25:37) So for example, I worked with Kerry Wood. Listeners may not know who he is, but a phenom at 20 years old, he had like a game where he struck out 20 players, which was like a record for the league at the time. And he was getting older. So his brain would say the pitch is gonna end up here, but his body was a little bit slower, so it would end up a couple inches higher and we would have to have interns physically on a screen with a pen touch where the glove was and where his pitch was.
Ari Kaplan: (26:03) That was unique at the time. Now everyone does that, now it's actually automated with a use of cameras and AI. But that's like one advantage is how can you collect proprietary data and then how can you detect using data science, like it's called feature importance, which variables are most important?
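The manual charting Ari describes boils down to comparing where the catcher's glove was set with where the pitch actually arrived. A minimal sketch of that comparison follows, assuming each charted pitch is recorded as hypothetical glove and pitch coordinates in inches; the field names, the function `average_miss`, and the sample values are illustrative and not taken from any team's system.

```python
from statistics import mean

def average_miss(charted_pitches):
    """Mean (horizontal, vertical) offset of the pitch from the glove target, in inches."""
    horizontal = [p["pitch_x"] - p["glove_x"] for p in charted_pitches]
    vertical = [p["pitch_y"] - p["glove_y"] for p in charted_pitches]
    return mean(horizontal), mean(vertical)

# Made-up sample: each entry is one pitch marked by hand on a screen.
charted = [
    {"glove_x": 0.0, "glove_y": 24.0, "pitch_x": 1.0, "pitch_y": 27.0},
    {"glove_x": -4.0, "glove_y": 20.0, "pitch_x": -4.0, "pitch_y": 22.0},
    {"glove_x": 3.0, "glove_y": 30.0, "pitch_x": 2.0, "pitch_y": 31.0},
]
horizontal_miss, vertical_miss = average_miss(charted)
```

A consistently positive vertical miss would match the pattern in the Kerry Wood example, where pitches ended up a couple of inches higher than intended.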
Ari Kaplan: (26:21) And this will still be 10 years from now, an ongoing race. And when I say 10 years from now is this Hawk-Eye data, which is a technology Major League Baseball introduced, captures hundreds of times every second, everything that goes on in the field, that only two years ago was at the Major League level and that's it. Then this past year they introduced it to AAA which is minor league level and that's it. But to really evaluate what are the skills that start at age 18 when you're the minors and progress until you're age 25 or 30, how is that aging curve? What skills improve, what don't? That's gonna take 10 years of data, even if we change nothing else, to be able to make better predictions of player development, finding what skills are better in the draft, predicting injuries and so on.
Ari Kaplan: (27:11) So that's part of the competitive advantage is, how can we ingest this data and it's a ton of data, terabytes of data every game, multiply that by dozens and dozens of teams at all levels around the world. Right now teams are struggling to store it, process it on a daily basis. So teams that could do that faster will be at an advantage. People that have that creativity and still being able to potentially even manually find new innovative types of data and see what's actionable, that's where your advantage is. And to your point, you still need an owner that would be willing to spend more money for better players.
Ari Kaplan: (27:50) Tampa Bay Rays is one great example where they were creative and they have a very low budget, but they've been competitive. The Montreal Expos, when I worked with them, they had the smallest payroll in baseball, but the best record in baseball since we used a lot of creativity to find players that were injured, that others had given up on, that were good at one point and then went sour.
Ari Kaplan: (28:12) Who could we fix? Jake Arrieta, when I was with the Cubs, one of the other examples we thought we could fix or change one thing in his approach and he did and became an elite pitcher after that. So those are some of the ideas. And for listeners, if you're not in the baseball world, same idea if you're in retail CPG healthcare, it's finding new sources of data, proprietary, non-proprietary, how could you synthesize it? And then how can you start working with the business people to get better and better and better insights?
Satyen Sangani: (28:46) Well, I think the interesting thing about it is, it's not so much you have to be a sports fan. My interest in this is actually less as a sports fan, I am one, but it's not really so much that. I think what's interesting to me is that often what you find is that people have a hesitancy to invest in data often because they just, it's a speculative investment and you don't really know the counterfactual. It's like, okay, I don't know what would happen if I didn't know what I didn't know, and so I'm just gonna keep on doing what I'm gonna do. And so there's never an urgency to it. There's always an importance to it. And so when you flip and you look at sports leagues, you've got almost the exact opposite phenomena where it's like, look, there's 32 teams in the NBA or whatever it is.
Satyen Sangani: (29:26) And if we're not doing it and we're not using it, then we're clearly at a structural disadvantage and it's gonna show up in wins and it's gonna show up in gate revenues and it's gonna show up in TV deals and it's gonna show up in players that we can go get. And so that's, to me, it's super interesting because it's almost like a forced march to it. In some ways like, this is a little bit more distasteful, but like porn is like this flagship industry that propelled the internet forward and a lot of the streaming technologies that were developed were developed because of the fact that it was pushing so much of the technical innovation. And I think sports has this really interesting characteristic and I think it also maybe has the opportunity to teach people about sort of this value that companies get from data.
Satyen Sangani: (30:10) Like it can teach you like, oh, now like if I have to prove the value case, here's how I might do it because I can look at this analogy and I wonder whether there's learnings there. When you think about the general industry, 'cause you have this role where you've grown up in sports, but now you have to go out and pitch like, hey, invest in this data intelligence platform and it's a lot of money. And so how do you use that knowledge to sort of push people to actually change and evolve how they do what they do?
Ari Kaplan: (30:38) Yeah, and going back to the Moneyball, it is a universal theme. All of these points you have are great, that it resonates, that you have real-time decisions, multimillion-dollar decisions that succeed or fail. You have the themes of teamwork, people collaborating together. You have all data-driven themes versus gut feel themes, which is also very relevant. So everyone could get value from these types of insights, but also there's a little inspiration as well. I think when I speak, people do resonate well with me having been in that seat to make recommendations based both on the gut feel and on the data science, since when it comes down to it, in every industry, it's people being involved in human behavior and imperfect systems dealing with uncertainty.
Ari Kaplan: (31:24) And even within baseball, it's not just we make a prediction and that's it, it's about risk management as well. We want a player with these characteristics, but what is the risk involved and how do I hedge my bets against that risk? These are like themes of human nature that resonate well with everyone. So yeah, I love, in the evangelism role that I have, being able to tell those stories, personal stories, having been in the hot seat, so to speak, and just being empathetic to the audience.
Satyen Sangani: (31:54) Yeah, for sure. Maybe switching gears a little bit, you've done some interesting humanitarian work with a person by the name of Raoul Wallenberg. Can you tell us a little bit about that? Because I think it obviously, sports is obviously something that captures the imagination, but there's also like really altruistic and somewhat impactful work that you're doing in another domain.
Ari Kaplan: (32:16) Well, I appreciate you bringing up Raoul Wallenberg. It's a part of my life that I take great honor and humility to have been part of. And if you have not heard of Raoul Wallenberg, he was a Swedish diplomat during World War II, during the Holocaust, who was from a very wealthy family. I actually just saw his relatives, the Wallenbergs, they're now investing in AI, but the family owned Enskilda Bank, Ericsson phone, ADI, Volvo, oil fields and so on.
Ari Kaplan: (32:46) So he left what could have been a life of luxury and ended up moving, becoming a diplomat in Budapest, Hungary during the Holocaust and risked and actually gave up his life trying to save as many innocent civilians as possible. So it could be a whole episode in and of itself. There's been movies and documentaries made on his heroic efforts.
Ari Kaplan: (33:10) He actually led a team of hundreds of people that would issue fake passports or counterfeit visas, setting up safe houses. He was one of the few people that stopped trains transporting people and marches of people through cities to give out these fake passports. And then right as the war had ended and the Soviet Union took over, they arrested him and thought he might be a spy or a double agent. And after a couple weeks of interrogation, they realized they had made a grave mistake. And then this is where the interesting part happens, in that they claimed he was never in their custody and he basically disappeared to history.
Ari Kaplan: (33:52) So the History Channel made a documentary that called it one of the two biggest mysteries of the last century: Amelia Earhart and Raoul Wallenberg. What happened to him? So where I jump in is, I'm now president of the investigation into the fate of Raoul Wallenberg, which is to use data, evidence, eyewitness accounts, transportation records, cell information of various prisons to find out what happened to him. His family, who I've been close with, I'm still close with, they need to know, the world needs to know, that is... The descendants of people he rescued now number hundreds of thousands of people. A small trivia point: Kofi Annan, who won the Nobel Peace Prize and became secretary general of the UN, Raoul is his uncle-in-law.
Ari Kaplan: (34:40) And so, as a result of all of that, I and a handful of other people have been given clearance to go into the Russian archives and make databases of every cell and every eyewitness and trying to reconstruct on a day-by-day basis where he might have been. And if people are interested to learn more, there's a website, raoulsfate.org, where they can learn more. But one of the big things where Russia at one point said he was never in our custody, and then they kept changing their story that he had a heart attack, and then he didn't have a heart attack, he was executed. And then we found interrogations.
Ari Kaplan: (35:19) We have every interrogation at Lefortovo and Lubyanka in the time period, and somebody with the same number he was given was interrogated for 16 hours after his supposed death. So you can't interrogate a dead body for 16 hours. And, yeah, the work is going on if people want to donate or get involved or what have you, always appreciated. But given the current situation in Russia right now, I haven't been back for a couple of years, but there's a ton of research that still can and will be done, and I'm hopeful that the ultimate fate is absolutely out there in archives that we are trying to get access to still. We know specific document numbers and case numbers and where these files could be.
Ari Kaplan: (36:03) So once we get that, we think the case will be pretty much closed and the world will know the fate. But it's been a big honor to be part of this humanitarian opportunity.
Satyen Sangani: (36:12) Yeah. And it's sort of more of a forensics. I mean, you're almost a detective more than you are essentially an analyst. Obviously, your analytical skills come into play, but getting these nuggets of information, you're doing a little bit of archaeology and a lot of discovery, it's hard work. And I would imagine has required you to acquire a whole bunch of skills that you otherwise didn't have.
Ari Kaplan: (36:34) Yeah. Able to drink with Russian guards until 3:00 in the morning so we can have conversations. So built up my alcohol tolerance. But other skills: persuasion, determination, since this is like a decades-long investigation with some heads of state and ambassadors and presidents. Even Pope Francis was on the board of the Raoul Wallenberg Foundation before he became pope. So a lot of very important people, and you need diplomacy. You need to be trusted with confidential information.
Ari Kaplan: (37:07) So the Russians have entrusted that with me as has the State Department. So if we want access to additional information, it's building up that trust. But determination is probably the key thing, since we've been told no... We've been told yes a lot, but we've been told no a lot more.
Satyen Sangani: (37:25) It's incredible work. And maybe before we close out, no conversation in 2024 on analytics and data would be real without sort of talking about generative AI. And you referenced it a little bit. And certainly Databricks, as with many other companies, is trying to help people develop and sort of build their own models, use-models in order to do the work that they're trying to do. There's, of course, that, and then there's the large public models. And you hear increasingly this kind of world where people are like, you know, we may not be all that far from real AGI, sort of an artificial intelligence, a true sort of sentient-ness, if you will, or something that approximates that.
Satyen Sangani: (38:05) How do you think about this world of private models, public models? I mean, I imagine this is something that people ask you about all the time. How do you think about sort of what role we all need to play? How much literacy people need to have about this topic?
Ari Kaplan: (38:19) Yeah, this is a great topic. And I love being a futurist as well, having gone to Caltech, and naturally curious, how can we all build a more intelligent tomorrow for the future? And one inspirational thing, having joined Databricks, was we just did a year of world tours, and the theme of the world tour was generation AI. There's generative AI, but generation AI alludes to the idea that everyone listening, every company out there, we are all part of this next generation of building really the next, how humanity and how society will all work.
Ari Kaplan: (38:54) The point is that everyone is involved and can and should be involved, whether it's ethics, whether it's governance, or whether it's taking things to the next level. So that's like the future part, something more practical in the short-term. There's public ChatGPT, that has great purpose for the masses. There's GenAI that can make new music and art forms. For businesses, there's that whole GenAI that can take vast amounts of their data and enable people to ask questions or to do that summary.
Ari Kaplan: (39:30) And what companies are starting to find out is it's easier said than done. You want to be able to do that where, if it's based on your own data, you want to make sure your data that's proprietary, whether it's HIPAA controls or anything else, privacy information, does not get out, not just in the data, not just in the LLMs, but in all the prompt engineering. You have this concept of RAG or fine-tuning where you could ask a question saying, here, let me give you this set of documents and I want you to do something with that.
Ari Kaplan: (40:03) And if you use ChatGPT, those documents that you kind of inject into the prompts could get out there in the open. Samsung, everyone uses that example, was one, but it's happened a lot more. So companies, from the business perspective, want to make their own, either fine-tuned or made from scratch, their own GenAI, such as LLMs.
Ari Kaplan: (40:24) And then the next paradigm will be kind of like LangChain. How do you chain together multiple... The input of one model is the output of another model, and each smaller specific-purpose model might be able to outperform a general, like, just one winner-takes-all type of model. So depending on the use case, you're going to get general generative AI, you're going to get models specifically trained for different use cases, and merge them all together. And yeah so, 2024 absolutely is going to be a year of incredible growth in this area, especially among enterprise adoption.
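The chaining idea Ari describes, where the output of one model becomes the input of the next, can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in: `classify_topic` and `summarize` are plain functions playing the role of small, specific-purpose models, and `chain` is an illustrative helper, not the API of LangChain or any other library.

```python
from typing import Callable, List

# A "model" here is just any function from text to text (an assumption for illustration).
Model = Callable[[str], str]


def classify_topic(text: str) -> str:
    """Hypothetical stand-in for a small routing model: tags the text with a topic."""
    topic = "sports" if "pitcher" in text.lower() else "general"
    return f"[{topic}] {text}"


def summarize(text: str) -> str:
    """Hypothetical stand-in for a small summarization model: keeps the first 8 words."""
    return " ".join(text.split()[:8])


def chain(models: List[Model]) -> Model:
    """Compose models so that each one's output is the next one's input."""
    def run(prompt: str) -> str:
        result = prompt
        for model in models:
            result = model(result)
        return result
    return run


pipeline = chain([classify_topic, summarize])
print(pipeline("The pitcher changed one thing in his approach and became elite after that"))
```

In a real system each stand-in function would be replaced by a call to a fine-tuned or general model; the composition step stays the same.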
Satyen Sangani: (41:00) Yeah, for sure. Well, Ari, listen, this has been a phenomenal conversation, and I appreciate it. I mean, it's just a breadth of knowledge ranging from Raoul to sports, to all of the evangelism work you do. So it's been a real fun time and a pleasure. Thank you for taking the time.
Ari Kaplan: (41:17) Thank you. I'm a big fan of your podcast and looking forward to hearing more from you, and this has been great. Thank you.
[music]
Satyen Sangani: (41:46) Having been at the forefront of many MLB analytics departments, Ari has leveraged data for player selection, prediction, and strategic planning. His work empowered sports teams to generate insights, develop a competitive edge, and even helped the Texas Rangers win the 2023 World Series. Sports is an interesting case study for analytics because it’s a zero-sum game. In economics, the principle of a zero-sum game is where there can be only one winner. If I win, you lose. If my analytics are better than yours, you lose. But a lot of tech and capitalism isn’t played as a zero-sum game. Rather, most people play a zero plus game. In a zero plus game, there can be multiple winners. The best example is trade. If I have apples and you have oranges, I can trade my apples for your oranges and we all are better off. And the point to all of this economic theory is that – what sports tells us is that since analytics are amazingly impactful in a zero-sum game – literally the difference between winning the World Series and not – they have so much more potential in a world where insights lead to innovation that can benefit companies and their customers. Whether you work in retail, healthcare, or CPG, data analytics is key to making your business stand out. You’re able to find new sources of data, both proprietary and nonproprietary, synthesize them, and then work with business folks to get better and better insights. Even with all of the advantages analytics offers us, many organizations are hesitant to invest in data because they don’t see an immediate return on investment. We know there’s an importance to it, but there’s no urgency. Ironically, in sports, it’s the exact opposite. The use of data is felt immediately in game wins, player selection, and gate revenue. So, learn from the world of sports and let data evolve the work that you do. Thanks for listening, and thanks to Ari for joining today. I'm Satyen Sangani, CEO of Alation.
Data radicals, keep learning and sharing. Until next time!
[music]
Producer 2: (42:37) This podcast is brought to you by Alation. The role of Chief Data Officer, CDO, is more vital and challenging than ever before. Alation offers a vision for building a strong data culture that empowers people to find, use and trust data. Download the CDOs Toolbox, 7 Tips for Building a Successful and Sustainable Data Culture, a white paper available at Alation.com/cdo-tools, that's A-L-A-T-I-O-N dot com/cdo-tools.