George Fraser (CEO) and Taylor Brown (COO) co-founded Fivetran, a fully managed, automated data integration provider, in 2012 and took it through the prestigious Y Combinator accelerator program. Previously, George was a scientist at Emerald Therapeutics and Taylor — who popularized the term “modern data stack” — was a designer at North Social.
As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”
George Fraser: (00:03) These data sources are very complicated and you have to write a lot of rules in order to deal with all these little corner cases. And so that was a heavy, heavy focus, just fixing things when they broke. And I would actually, every night just watch the log file when it ran at midnight. So it would run every day at midnight, and I would just watch the log files scroll by and if anything broke I would just fix it right then. So that tells you we had this fanatical devotion to reliability in the early days, which was super important because we were taking on the task of dealing with all the weirdness of these data sources.
Satyen Sangani: (00:38) That voice belongs to one of today's guests, George Fraser. George is the CEO and co-founder of Fivetran. Today he joins us with Taylor Brown, COO and his fellow co-founder. Fivetran is a pioneer in automated data integration and was most recently valued at $5.8 billion. They've also been at the forefront of many innovations in our field. In fact, Taylor himself popularized the term “modern data stack,” and we'll get into exactly what that means. And yet all of this innovation was really the product of focusing on doing one thing and doing it extremely well: moving data. So let's get into it.
Producer: (01:26) Welcome to Data Radicals, a show about the people who use data to see things that nobody else can. This episode features an interview with George Fraser and Taylor Brown, the co-founders of Fivetran. In this episode, they and Satyen discuss founding Fivetran, the origins of the modern data stack, and creating a data culture. This podcast is brought to you by Alation. Alation enables people to find, understand, trust, govern, and use data with confidence. More than 25 percent of all Fortune 100 companies use Alation to support data-driven decision making. Organizations like Cisco, Pfizer, and US Foods drive data culture and innovation with Alation. Learn more about Alation at alation.com.
Satyen Sangani: (02:10) Today I'm excited to welcome the founders of Fivetran: George Fraser and Taylor Brown. Fivetran is a leading provider of automated data integration. George and Taylor launched the company in 2013 — right around the time when we launched Alation — after completing the prestigious Y Combinator incubation program; Fivetran is one of YC's very few B2B successes. Today, Fivetran serves thousands of customers and hundreds of global brands, and it's super exciting to welcome both George and Taylor to the show. George is the CEO. Previously he was a scientist at Emerald Therapeutics. He received his PhD in neurobiology from the University of Pittsburgh in 2011. George, welcome to the program.
George Fraser: (02:51) Thank you. It's great to be with you.
Satyen Sangani: (02:52) And Taylor is the COO. He's passionate about helping good humans build awesome products. He draws inspiration from his time as a liberal arts student at Amherst College with experience in design and a lifelong set of athletic endeavors. Taylor, welcome to Data Radicals.
Taylor Brown: (03:09) Excited to be here.
Satyen Sangani: (03:10) George, you've got the CEO title. So I'll start with you. Tell us about what Fivetran is, and how do you describe it to somebody who maybe doesn't know very much about data?
George Fraser: (03:19) “Fivetran gets all of your business's data into one place” is the very short version. It takes the data from all the places that it lives, all the tools that your business uses: CRM systems like Salesforce, payment systems like Stripe, databases like Oracle. We have hundreds of data sources we support. So it replicates all the data from those systems into a single central database and then keeps it up-to-date. And people use that mostly to just understand what's happening in their business.
Satyen Sangani: (03:51) And I would imagine that there were tools or things or products or software that did this before Fivetran was invented. So maybe, Taylor, to you: why Fivetran and why does it exist now and what was different relative to the things that existed before?
Taylor Brown: (04:09) It's a great question. The biggest change that happened in the space was the cloud data warehouses — initially Redshift, and then Snowflake and BigQuery. That technology, which was much more elastic and cloud-based, was like 10 to 100 times better than the existing data warehouses that were on premise. And this fundamentally changed the way companies were setting up their data warehouses: they were replicating everything, which allowed for new types of automation — i.e., Fivetran — versus only replicating some pieces and actually transforming data prior to loading it into the database.
Satyen Sangani: (04:49) In hindsight, investing behind the biggest companies in data seems as clear as day, but as a founder in the midst of an early trend, the picture looked a whole lot foggier.
George Fraser: (05:01) Fivetran is actually a pivot. So the original idea was to make a vertically integrated data analysis environment. It took a series of forms from the end of 2012 through 2013 and 2014. And data integration was part of it — it was at the bottom of the stack, initially just part of the concept. And then we actually built some connectors to Stripe and I think Salesforce and a couple other things — Zendesk, I think. It was more like a BI tool for most of that time. It was sort of like GoodData or something like that: vertically integrated, cloud-based, all the way from data source to dashboard.
Satyen Sangani: (05:42) You mentioned GoodData, but there was also Platfora. And I thought there was this other company, I'm forgetting the name, that kind of had an Excel interface, but that was kind of what people were doing. How did you come to Y Combinator with the plan of building a vertically integrated analytics tool? Why did you start talking about that?
George Fraser: (06:00) Well, the very first version of it, the idea was that it was a data analysis tool for scientists, which we were basically making the data analysis environment that I had always wanted as a scientist. And that only lasted weeks. We quickly figured out that basically I was the only one who wanted this. And so it turned into: okay, data analysis for business people. And then we were kind of chasing a trend there as you pointed out; at that time, that was the thing a lot of people were doing. And we kind of got drawn into that and we were tilting at windmills a little bit because it was sort of absurd to imagine that we would — that's a big thing to build, right? That's hard to build as three guys in an apartment.
And we didn't have a pedigree to raise money on at that point or anything like that. So there was maybe not the best founder idea fit in that early beginning phase. But man, we worked hard at it and we stumbled on the idea that ultimately would work, which was the connectors. So the connectors that we built as part of that ultimately turned into the business.
Satyen Sangani: (07:06) And Taylor, how long did that take? Was it months, quarters, years to get the “aha” that the vertically integrated stack wasn't going to work? How long did it take you to convince George that his idea wasn't very good?
Taylor Brown: (07:19) We got out of YC in spring of 2013 and we started just working on the full-stack tool around then. And it was probably the next summer, when I was out chatting with various folks, that one or two companies pointed out, “Hey, you have this Redshift cluster that you're loading data into and you have this kind of BI tool thing on top of it. Could I just pay you to move the data into my Redshift cluster?” And I think at that point George was like, “Nah, that's stupid, that's not, whatever.” And so we're like, yeah, okay, let's keep going forward. And we finally just honestly started running out of money. So we launched this full product in December of 2015, and we then started really pushing to talk to customers in earnest and get them to pay us. And that's where we came across Zenefits, who said, “Hey, I'll pay you a lot of money if you just move my Salesforce data into my Redshift without me having to do a whole lot of setup, configuration, maintenance, whatever.” And I think at that point George was like, “Yeah, we can do that,” and then, “Oh wow, okay, this is probably something we should focus more energy on.”
Satyen Sangani: (08:24) Did you have a lot of other customers that were saying the same thing as Zenefits? Or was it that the one Zenefits contract became transformational — like, oh, we're getting paid to do this, and therefore let's try to do exactly the same thing with other folks?
Taylor Brown: (08:39) As I remember it, as soon as we realized that it was really needed, we turned around and said, what other tools are in the space? And we kind of looked around and said, “Wow, this is crazy!” The previous set of tools, like Informatica and such — they're so hard to set up and configure and maintain, and they're expensive, and they're on premise. And we were just like, “Why is nobody else doing this?” And so then we're like, “This is great!” And over a week, I think, we changed the whole website, deleted all the code we'd been working on, and just were like, “Now we're integrations for Redshift.”
And then we put up a pretty big kind of SEO net on our website, and then we ended up actually having a good conversation with the folks at Looker, and we helped them in a competitive deal against Domo, who was the hot vertically integrated tool at the time. That deal was very competitive — because Looker was not vertically integrated; they're just a BI tool that sits on top of the warehouse — and coming in and helping them win that customer kind of propelled us into a lot of wins with their reps. So their sales reps started bringing us into a lot of deals early on, because they were competing with Domo head-to-head. And so it was really fortuitous for us.
Satyen Sangani: (09:55) George, tell us about the journey of the product post 2015. So you realize that you have this product hypothesis where you're moving data from one operational source to some central warehouse. How did that get better and how did that evolve over time?
George Fraser: (10:10) There were a few critical decisions that were made really early on there. I mean, product-market fit, in a sense, was moving Zenefits' Salesforce data to their Redshift cluster. That was zero to one. And a few key decisions we made. Number one, we said we're going to have to move everything incrementally. And it's funny — I joke that we rejected our first feature request. They asked us to sync a snapshot of the data every night, once a day, which is sort of the classic way of building ETL pipelines. And we said no, because we decided that would not scale. It might work now, but in six months, as their usage grew, it would become impossible. So we said, okay, we have to move the data incrementally. That was a really fundamental decision. It's a lot more complicated, but the only way you can make something that will just work for everyone is if you say, “I'm always going to move the data incrementally. I'm just going to accept the additional complexity and do that work, and then I will have something that always works.”
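To make the snapshot-versus-incremental trade-off George describes concrete, here is a minimal sketch — the data shapes and function names are invented for illustration and are not Fivetran's actual connector API:

```python
from datetime import datetime, timezone

# Toy stand-ins for a source system and a destination warehouse; the names
# and shapes here are invented for illustration, not Fivetran's actual API.
SOURCE = {1: {"id": 1, "amount": 100.0,
              "updated_at": datetime(2016, 1, 1, tzinfo=timezone.utc)}}
WAREHOUSE = {}                 # destination table, keyed by primary key
STATE = {"cursor": None}       # connector state: last successful sync time

def snapshot_sync():
    """Classic nightly ETL: re-copy everything. Cost grows with total rows,
    which is why it stops scaling as usage grows."""
    WAREHOUSE.clear()
    WAREHOUSE.update(SOURCE)

def incremental_sync():
    """Pull only rows changed since the cursor, then upsert by key. Cost
    grows with the volume of *changes*, not the size of the table."""
    cursor = STATE["cursor"]
    changed = {k: v for k, v in SOURCE.items()
               if cursor is None or v["updated_at"] > cursor}
    WAREHOUSE.update(changed)                  # merge / upsert
    STATE["cursor"] = datetime.now(timezone.utc)
```

The cost of `snapshot_sync` grows with the total size of the source, while `incremental_sync` only pays for what changed since the stored cursor — at the price of durable state and a merge step, which is the “additional complexity” George is accepting.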
And then the second decision we made is, we said we're going to have a normalized schema. And Salesforce actually did us a big favor here. The structure of Salesforce API makes this kind of the easy path. If you look at how the Salesforce API works, if you just sort of do what is obvious, you end up with a normalized schema on the other side that exactly matches the schema of Salesforce. And that turns out to be a really good decision if you're trying to make something that's automated and repeatable that every customer can just take it out of the box and use it.
The reason is that data warehouse schemas — and, by the way, this was a contrarian thing to do; especially back then, data warehouse schemas were typically denormalized. And in fact we told the early customers, look, if you want a denormalized schema, you can go ahead and just do that after we deliver the data. We'll deliver it normalized, and then you can write a bunch of SQL scripts that denormalize it. We're not stopping you from doing that, but we're prescribing this pattern, right? And the reason that ends up being a good decision is that there is only one normalized schema, but there are infinite denormalized schemas. If you go down that route, every customer is going to want something a bit different. But if you say, look, I'm just going to make it normalized and then you can remix it however you like, then you have a repeatable product that you can actually automate.
Satyen Sangani: (12:23) As I've discussed in previous episodes, the act of structuring data is both an analytical decision and a moment of information loss. When you start building an app, the schema consists of what you choose to capture from the user, but of course there's so much more that's not captured. There's a lot of power and risk in these choices. So being thoughtful and transparent about how we make them isn't just the job of the data engineer, it's the responsibility of every stakeholder.
George Fraser: (12:51) And then the other sort of early evolutions were, initially it only ran once a day, so it was a 24-hour cycle in the first version. And the initial focus was heavily on reliability. So the concept of “Hey, I'm going to give you an automated data pipeline where all the details are hidden,” it was very different at that point. It's hard to actually fulfill that promise because these data sources are very complicated and you have to write a lot of rules in order to deal with all these little corner cases. And so that was a heavy, heavy focus, just fixing things when they broke. In the early months, Salesforce was the most important connector and Zenefits was the most important customer. And I would actually every night just watch the log file when it ran at midnight. So it'd run every day at midnight and I would just watch the log file scroll by and if anything broke I would just fix it right then.
So that tells you we had this fanatical devotion to reliability in the early days, which was super important, because we were taking on the task of dealing with all the weirdness of these data sources — which traditional data pipelines back then, and even today most data pipelines, put on the customer. They say, “Hey, our job is we just call the API, we take whatever comes out of it, and we do something with it. And if there's a 500 error because there's a setting that you need to make that we have never observed before, that's your problem. We just don't view that as our job. As long as we're doing the actions that we say we'll do, we're up.” And we said something very different. We said, “No, no, the data will get there one way or another. The data will get there.” And also, it's not going to be powered by pro services or something — it's automated. Some of that automation had to develop over time, for sure, but the vision was very firm on that point from the beginning.
Satyen Sangani: (14:44) There's so much in what you just said. So maybe let's get back to the normalization versus denormalization point, because I think a lot of people may not totally understand that language. Maybe just describe for our audience what normalization is, what's an example of it, and how it relates to Fivetran's out-of-the-box normalization.
George Fraser: (15:02) So normalized schemas, they have one table for each concept. So in this example of Salesforce, you have an accounts table, you have a contacts table, you have many, many other tables, right?
Denormalized schemas have fewer tables, and they often represent more aggregated data — they'll mix multiple concepts together. So an example of a denormalized table might be revenue, right? Revenue is something that is calculated from subscriptions and discounts and all kinds of other concepts that exist in separate tables. You pull all that together and you make one revenue table. And usually when you're trying to ask questions of data, you actually want a denormalized schema, because it's fundamentally simplified. If I want to ask a question like “How much revenue did I do last month?”, a denormalized schema is simpler. And so typically a lot of the work of building a data warehouse is coming up with a good denormalized schema and transforming the data into that format.
And so what we did, where we said, “Hey, we're going to deliver a normalized schema to your data warehouse,” that was a little bit weird, because then there's this remaining work that needs to be done to transform it after it arrives, and we're putting that onto the user. But the upside was that it made it possible for us to truly solve the first half of the problem, because we could build an automated pipeline that works the same way for every user. Everyone's going to have a different denormalized schema, right? You fundamentally can't automate that part of the process. And so by cutting that off — by making our scope a little bit smaller in that respect — we were able to deliver a much more automated solution. A product, really: we were able to build a product rather than a solution.
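To illustrate the distinction George is drawing, here is a self-contained sketch using SQLite as a stand-in warehouse — the tables and columns are invented for the example, not Fivetran's actual Salesforce schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Normalized: one table per concept, delivered as-is from the source.
con.executescript("""
CREATE TABLE subscription (id INTEGER PRIMARY KEY, account_id INT,
                           amount REAL, month TEXT);
CREATE TABLE discount     (id INTEGER PRIMARY KEY, subscription_id INT,
                           amount REAL);
INSERT INTO subscription VALUES (1, 10, 100.0, '2016-01'),
                                (2, 11, 200.0, '2016-01');
INSERT INTO discount     VALUES (1, 2, 50.0);
""")

# Denormalized: a "revenue" table remixed from those concepts with SQL,
# inside the warehouse, after the raw data has landed.
con.executescript("""
CREATE TABLE revenue AS
SELECT s.month,
       SUM(s.amount - COALESCE(d.amount, 0)) AS revenue
FROM subscription s
LEFT JOIN discount d ON d.subscription_id = s.id
GROUP BY s.month;
""")

print(con.execute("SELECT * FROM revenue").fetchall())  # [('2016-01', 250.0)]
```

The first block can be identical for every customer, which is why it can be automated; the second is customer-specific SQL, which is why George argues it can't be — there is one normalized schema but infinitely many denormalized remixes of it.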
Satyen Sangani: (16:53) As George and Taylor built Fivetran, they learned a very important lesson: the power of saying no.
Taylor Brown: (17:00) I remember doing a conference at Looker in New York on a rooftop and it was middle of summer and I was the only person there. So most people at their booth have three or four people and it was just me sitting there getting a massive sunburn, kind of manning the booth. But it was just scrappiness at the end of the day. It was like, I was doing a bunch of ... I set up this system to have someone who would scrape all of the job postings for anything that had Redshift in it. And then I had a team in Eastern Europe kind of find people's names and then, they would all get loaded into my Outreach and then I would blast lots of emails, let's say like 40,000 emails a month at that point — so I was just basically doing anything possible to try and get folks to come.
I think we set up a pretty good SEO net with almost 500 pages. Again, it was just as many scrappy things as possible and I was doing almost five demos a day and George was kind of like my SE. So he'd be like, “Oh, we're talking to X large customer and they have a bunch of questions about database replication.” Like, “All right George, let's go talk to him about it.” But it was good because we learned so much. I wouldn't have traded that for anything because I learned a lot about what the customers wanted and I learned how to sell it appropriately. We also learned how to say no. I mean I think the other thing George is not necessarily saying is that customers ask for a lot. They ask for transformations, they ask for a data lake in the middle, they ask to load to all different kinds of places.
They definitely ask for a lot more configuration and we just said, “No, we're going to just do the replication thing super well or as well as possible.” And that took some time and it took some effort to learn how to also sell that to our customers. Because effectively they would come and say, I'm right, I know what I need, I need this. And we'd say, “Well, then it's probably not the right product for you because this is what we do and this is why this is important.” And that “aha” moment for them would then be like, “Wow! This is really revolutionary!”
Satyen Sangani: (19:00) Where did that rudder come from? Where did you guys — when did you have that focus? Was it as soon as you had the “aha” from Zenefits in 2016?
Taylor Brown: (19:11) Well, Zenefits actually asked us to build a connection back into Salesforce, which we did using Lightning Connect. So we at first were like, “Oh! Big customer that asks for stuff, let's build it for them, kind of.” And then within, I don't know, three or four months we were like, “That's a bad idea. This is a bad idea.” We got rid of it and decided to really focus. So it took us a little bit of time.
George Fraser: (19:33) We also just had so little money at the time. I mean we were just running the company off of revenue at that point. The YC money had run out, we hadn't raised more money and we were just paying the bills. We had a handful of people. I mean we probably still had less than 10 people at that point in 2017 and we just didn't have capacity to not focus. It was not a choice, in some ways.
Satyen Sangani: (19:56) There's this concept of a modern data stack. You guys — and, George, I think you in particular — have been pretty vocal about it. Maybe first tell me what the modern data stack is, and then, secondarily, I'd love to hear a little bit about who you most commonly complement and how you see that evolving.
George Fraser: (20:15) Well, I think Taylor was the one who really locked onto this term years ago. Even back when we were still in the apartment and it was just a three-person company, I remember him saying, “Modern data stack! That's what we're doing!” And it was not a commonly used term at the time. He's like, “That's the term we need to lock onto.” And over the years it has really grown, but I think you have a decent amount to do with the present-day popularity of that term, Taylor.
Satyen Sangani: (20:45) Oh I'm so sorry for misappropriating credit. So you are the father of the modern data stack, Taylor?
Taylor Brown: (20:50) Well, I used to have this slide that I showed in every single pitch. As I was saying, I was doing an average of five demos a day. On one side of the screen there was your classic ETL, and it was literally titled “Classic ETL”: you're extracting the data out of the sources, you're transforming it, you're putting it into the data warehouse, and then you have a BI layer on top of it. And on the other side there was the modern data stack — literally, on the screen, that's what it said: you're extracting the data directly and loading it directly, so it's automated, and then you're doing your transformations and data modeling within the warehouse, and then you're putting the BI layer on top of that.
And so that was the original kind of incarnation of the modern data stack, which I think has since then grown in terms of what the concept means. But that was where originally I pitched it quite a lot and I don't know if I originally came up with it, but that was just what made sense to me because I'm like, “This is modern and this one's really classic. So let's go with that!”
Satyen Sangani: (21:52) But now, as you pointed out, the term has evolved — or at least everybody wants to be a part of the modern data stack. So how have you seen it evolve? What do you think are the most important elements for somebody who's curious about the topic and interested in investing in it? And who is it for — is everybody going to do the modern data stack?
Taylor Brown: (22:11) I'll let George take this one. I mean, what do you think?
George Fraser: (22:14) It has definitely evolved. The original modern data stack was, I would say, Fivetran, Redshift, and Looker. Looker was also a really important player in developing this ecosystem, because this was a BI tool that had the ability to really do data modeling inside the data warehouse in a way that had not been emphasized before. So that was key. That was kind of the original modern data stack. And then this fundamental idea — you move the data and then you model it in place in the data warehouse — really took over. And so other BI tools adopted that, and dbt emerged; the precursor to dbt, I think, was LookML in many ways. That basically allowed you to adopt that workflow with other BI tools. And then we've seen the “reverse ETL” vendors, as they call themselves — or as some people call them — emerge. It's funny, when Taylor was talking earlier about how we had customers asking us for things left and right but we just said no: people were asking us for a reverse ETL for the whole first couple years. Constantly, constantly.
Satyen Sangani: (23:19) Yeah. “If you can get it out, why can't you put it back?”
George Fraser: (23:22) Yeah, exactly. They're like, “Can't you just reverse the polarity? I mean, just turn it around!” We're like, no, no, that's not how it works. And now I think we've seen a lot of other tools attach themselves to that term, “the modern data stack,” but to me it still means that fundamental insight: move everything without modification and then deal with it in the data warehouse. It's fundamentally about, hey, these data warehouses are so much faster and so much cheaper that we can refactor the order of operations in a way that is going to be a lot more efficient for humans. It's not efficient for the machines, but it doesn't matter, because they're so cheap now. It's going to be a lot more efficient for people, and so we're just going to be able to get way more done than we used to. We're going to be able to have way more sources, way more use cases. And that's what the modern data stack is to me — as well as a cool term that everyone attaches themselves to.
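That “refactored order of operations” can be reduced to a toy contrast — the function names and data below are invented for illustration, not a real pipeline API:

```python
# Toy stand-ins, invented for this contrast — not a real pipeline API.
def extract():
    return [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 200.0}]

def classic_etl(custom_transform):
    # Transform *before* loading: the pipeline has to know each customer's
    # final schema up front, so little of it is shared or automatable.
    return {"revenue": custom_transform(extract())}

def modern_elt(modeling_step):
    # Load everything unmodified first, then model it "in the warehouse",
    # where cheap elastic compute absorbs the machine inefficiency.
    warehouse = {"raw": extract()}                          # same for everyone
    warehouse["revenue"] = modeling_step(warehouse["raw"])  # per-customer step
    return warehouse

total_revenue = lambda rows: sum(r["amount"] for r in rows)
assert classic_etl(total_revenue)["revenue"] == 300.0
assert modern_elt(total_revenue)["revenue"] == 300.0
```

Both paths end with the same revenue number; the difference is that in the ELT version the extract-and-load step is identical for every customer, and the customer-specific modeling happens afterwards, inside the warehouse.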
Satyen Sangani: (24:19) This idea of optimizing for the compute capacity in the cloud — which I think drove Looker's founding, certainly drove your founding, and seems like it also drove dbt's founding and traction — is the core insight, and it's obviously made possible by things like Snowflake and Databricks. With HVR — I believe that's a more enterprise-focused product — do you see companies transforming a lot of their non-modern workloads? Is a big part of the growth you see moving forward transforming what may not be totally modern today into more of a modern framework?
George Fraser: (24:57) Well HVR, which was acquired by Fivetran last year, has been for many years the best in the world at replicating big, bad enterprise databases like Oracle, SQL Server and particularly ones that are under extreme load with extreme requirements of throughput and latency. And they have some amazing people and some amazing code that is getting rapidly incorporated into Fivetran that makes that possible. And much like Fivetran, they have a very long history — or they had a very long history before the acquisition — which involved solving a lot of incidental complexity over a long time.
So very, very similar companies. But as you point out, they had much more of a focus on enterprise and much more of a focus on databases. We see a ton of those workloads. A huge part of our revenue growth now is moving these old enterprise databases, replicating them into an analytics environment. But they're not getting taken out. Oracle's not going anywhere, SQL Server is not going anywhere, and they may not even move to the cloud, by the way. I think this is widely misunderstood: there's nothing outdated about relational databases in general or Oracle in particular. These technologies are here to stay. Now, you're going to want to integrate that data in a cloud data warehouse — the cloud has huge advantages for analytics — but these on-prem databases are not going away.
Satyen Sangani: (26:24) Obviously the modern data stack is a big thing now, but you still have both on-prem and sort of off-cloud data sources. So as you think about the evolution, then, of your product, what's fun is I think we are seeing you guys in the market more and it seems like there's more interaction points and intersections between the two companies and the use cases that we're seeing. And I think in particular this idea of normalization is really interesting because if you're a catalog and you're cataloging all of these schemas that are totally normalized, it gives you ground-level truth and so you have full fidelity of that data, and it means for discovery that you can do so many interesting things.
In this new world, where people can discover these sources in the cloud and can manipulate the data arbitrarily, what is your guys' vision moving forward? What do you expect to want to do over the next five years? Is it more scale, more reliability, or how do you see your company's evolution happening?
George Fraser: (27:26) Reliability is always — it should always be higher. It's a never-ending journey to make that better and better, and coverage is really important. One of the biggest things we need to be able to do that we still can't do today is to go to someone and say, “We can support all your company's data sources. We have a connector for that.” And there are some really hard engineering and engineering-management problems that we are solving in order to make it possible to support thousands of connectors. Right now we have about 200, but the day will come when Fivetran has 2,000 connectors and probably 10,000 connectors, because you really need to be able to go — especially with big companies — and say, “Look, I can cover a hundred percent.” So that's a really important vector for us. Databases are a really important vector — hence the HVR acquisition — getting really great at those. Because especially with older companies, often the most important data in the company is in an Oracle database or an SAP or ERP system or something like that. So that's really important. And then we're doing some things around transformation and governance: some things ourselves, some things by integrating with partners like Alation and dbt Labs. But the way we see it is that connectors are the core — they're the core of the solar system for us, and everything else is something that's connected to that, orbiting around it. Great connectors will always be the core of Fivetran.
Taylor Brown: (28:53) I might also just add that, in what we've seen over time, initially we were largely being used for internal analytical use cases. As the speed and reliability of Fivetran have increased, and as the latency of the data warehouses has decreased, more customers are utilizing Fivetran, or the modern data stack, for more than just analytics. They're building applications on top of it, they're running their own kind of scenario planning or whatever else downstream, doing payments, doing fraud detection. There are just a lot of different use cases that we're starting to see, and that opens up with more speed and reliability. So I'm excited to see more of that, and I think it will just continue to evolve over time.
Satyen Sangani: (29:39) For the final portion of our conversation, we shifted gears to discuss how data impacts George and Taylor's own decision making.
Taylor Brown: (29:47) So that's worked, I think, in terms of data. We've truly tried, especially for the big decisions, to bring in as much data as possible. And I think early on this was like, “We just need to dogfood it.” And then it just became so clear: “Hey, if you don't have the data, it's really hard to make an educated decision.” And now it's gotten to the point where the decisions have gotten bigger and bigger over time, but what we should do is actually a little bit clearer. The data's like, “Hey, it's pretty clear you should do this thing.” It doesn't make it less scary, because it's usually a big decision. But I think that's kind of shifted, and that's really how we think about most of our big decisions these days.
Satyen Sangani: (30:28) Can you recount a decision where you used data where you might not have been able to do so five years ago?
George Fraser: (30:35) Well, I can think of a recent example. We added another tier to our pricing model that was cheaper, with some limitations — Standard Select — but it was significantly cheaper. The thesis was that we were sort of testing the waters for a freemium model. And there were a lot of claims that we had seen a huge lift in the number of new customers, and I was skeptical of them, because I thought people were doing a kind of naive analysis where they were just comparing raw counts. Some of that growth would've happened anyway — we were growing; it's not all because of that one factor. And I actually sat down and explained to one of our product managers how to do a regression discontinuity analysis to address this question more rigorously. And she did it exactly right, and it was very convincing in the end. I'm like, “Oh wow, that really did work!” So that's kind of a weird example that's more on the analysis front than on the data front. The underlying data is pretty simple — it's just customer counts by month.
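For listeners unfamiliar with the technique George mentions, here is a minimal regression discontinuity sketch on made-up monthly customer counts — the numbers, the cutoff, and the true effect size are all invented for illustration:

```python
import numpy as np

# Invented data: new customers per month, 12 months on each side of a
# pricing change at month 0 (the "discontinuity"); true jump is 25.
months = np.arange(-12, 12)
before = months < 0
rng = np.random.default_rng(0)
counts = 100 + 3 * months + np.where(before, 0, 25) + rng.normal(0, 5, months.size)

# Fit a separate linear trend on each side of the cutoff. Each intercept
# is that side's fitted value at month 0, so their gap estimates the
# effect of the pricing change net of the pre-existing growth trend.
slope_b, intercept_b = np.polyfit(months[before], counts[before], 1)
slope_a, intercept_a = np.polyfit(months[~before], counts[~before], 1)
lift = intercept_a - intercept_b
print(f"Estimated lift at the cutoff: {lift:.1f} new customers/month")
```

The point of the method is to compare the two fitted trend lines at the cutoff itself, so that growth that would have happened anyway shows up in the slopes rather than in the estimated jump — exactly the naive-comparison problem George was pushing back on.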
Taylor Brown: (31:28) I have another example, which definitely was probably not possible five years ago, for multiple reasons. We've invested recently in a couple of nice offices — we have a nice office in Denver and an especially nice office in Oakland. And we wanted to understand how folks were utilizing these offices: how often were they coming in? Were the programs that we were running around lunch and breakfast starting to work? So we set up a pipeline to get all the information about when people are actually in the office. And it's been pretty enlightening to see which offices people are actually coming into, and on what days. It's still early days on how we're going to optimize the programs around this, but I would say that probably wasn't something people were spending a whole lot of time on five years ago. We certainly weren't.
George Fraser: (32:11) A somewhat exotic data source that I've been paying attention to a lot recently — that I think a lot of people don't use that more people should — is Google Search Console. So Google will tell you how often you appear in Google searches and if your name is something unusual like Fivetran, you will appear on the first page in every single Google search for that term.
And you can use this data — it tells you something very significant. What we type into that Google search bar is like a psychic straw going into our brain: it's just whatever you're thinking about. And so it's a very powerful metric of awareness, of how many people are thinking about you. And I follow it very closely. And we have the data for ourselves from our own — plug warning — Google Search Console connector, of course, which will deliver these data to your data warehouse. So I have a dashboard of what I consider some fundamental trends driving the business, and this is one of them: how many people type “Fivetran” into Google each week tells you something really important about how we're doing in terms of growing awareness.
Satyen Sangani: (33:16) So how have George and Taylor built Fivetran's data culture?
George Fraser: (33:20) It is tough, first of all, to create a real data culture. It does not come naturally to people. People do not like being proven wrong by numbers. People do not like changing their mind. So I think first you have to acknowledge that it's very hard and that if you're making any progress, you're doing great.
Some of the things we have done are leveraging the All Hands: trying to incorporate data presentations into the All Hands and trying to make them good. One thing we did recently — which I think has been a big improvement — is I put in place a rule that everything in the All Hands had to be a time series, with history on the x-axis, and a goal, unless it was really impossible to have a goal, because those are so much more useful for presenting things like sales results, as opposed to just “What is the number this month?”
It's like, well, what has our performance been for the last six months, and what was our goal for each of those months? Because you can see trends. So I think the All Hands is a valuable tool, and following best practices in the way that you present in the All Hands is a valuable tool.
One thing we've done recently is we put a bunch of dashboards with key company metrics around the offices, and they're fixed, so any given dashboard shows the same metrics all the time. And the theory there is that when you associate something with a place in the world, it makes it easy to remember. Going back to my psychology classes in college here, but it's actually a memory trick that people often use. So the idea is that you start to think about revenue — that's in this corner of the office — and then you actually remember what that trend looks like over time. And the same goes for a lot of other things: uptime is a metric we track very closely, and that's in a particular place in the office.
Taylor Brown: (35:06) I would also add that for our largest decisions, we have really tried to use as much data as possible. And there have been times when George and myself — especially George — have been against certain things, and then, after really digging into the data and actually thinking about it and going back and forth, we've seen George change his mind, or the person who was pushing for it change their mind, based on what we're actually seeing in the data. And I think that, across a bunch of different examples, has set a precedent: look, the data's showing this, so this is what we're going to do, even if the CEO is like, “No, I think this is really stupid.” If we can prove that, no, actually this is a really good idea, or this is worth trying, that starts to really shape how people think about the entire business.
Satyen Sangani: (35:54) Well, I guess maybe take us out by giving our listeners one bit of advice on what you would tell them if they were trying to build a data culture. What's the one area they can focus on? What's the one mistake you see people making all the time? Maybe, Taylor, I'll start with you.
Taylor Brown: (36:19) Yeah, I guess I would just go with lead by example. I think especially the leadership team are the ones that have to dig in and request data and actually follow what the data says, not just what their gut is telling them. And I think that's especially hard for folks who are not used to working with data and maybe not used to having more of an empirical background. And so I'd urge them, those types of leaders, maybe sales leaders or otherwise, to bring other folks in who have a background in data and who will push them and trust them to make sure that they're right. I think that's the number one thing that leaders can do to change their culture — and then try and broadcast it to the rest of the company — so everyone knows this is how things are done.
Satyen Sangani: (37:02) Yeah. I love that lesson.
George Fraser: (37:04) I would say a lesson from a little bit of the inverse perspective: make sure you get the foundation right. There's nothing more toxic — nothing that will destroy your data culture faster — than when the numbers are wrong. If people see numbers that they later find out are wrong, they lose trust for a year. So you really need to make sure that your systems are good, that your reports are good. Get the basics right in order to build trust in the numbers. That is really important.
Satyen Sangani: (37:41) To me, the two biggest lessons that come out of George and Taylor's journey are the willingness to be wrong and the willingness to focus. It's not just that they use data, it's that they're open to data teaching them new things — and even proving them wrong. And if they weren't able to focus, they would've spread themselves too thin across a data landscape that has a near-infinite set of requirements. It's a great lesson for startups and it's also a great lesson for data teams. Focus, learn, iterate, and do it again and again and again. This is Satyen Sangani, CEO and co-founder of Alation. Thanks to our guests, George and Taylor, for joining me. And thank you for listening.
Producer: (38:30) This podcast is brought to you by Alation. Are you curious about metadata management? If so, then this white paper is for you. If not, then you should be. Learn how to evaluate leading metadata software by clicking the link in the podcast notes.