Jeff Chou is the CEO and co-founder of Sync Computing. He holds a PhD in EECS from UC Berkeley and was a Battelle Post-Doctoral Scholar at MIT. An Entrepreneurial Research Fellow at Activate, he previously worked at MIT Lincoln Laboratory, specializing in high-performance computing and optimization.
As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”
0:00:03.7 Satyen Sangani: Welcome back to Data Radicals. Imagine a world where computing isn't just about ones and zeros, but where nature itself helps solve complex problems. That's the promise of analog computing, a fascinating concept that has influenced everything from quantum mechanics to AI optimization.
Today I sit down with Jeff Chou, co-founder and CEO of Sync Computing, to explore how his company is revolutionizing cloud infrastructure. With some help from analog computing, Sync is tackling one of the most pressing challenges in the AI era: the massive inefficiencies in cloud compute spend. Jeff explains how bridging the gap between software and hardware can unlock game-changing optimizations, helping businesses slash costs while improving performance. From his early work in high-performance computing to pioneering automated resource allocation, Jeff shares insights that will change the way you think about cloud infrastructure, data workloads, and AI efficiency. Whether you're a startup, an enterprise, or just curious about the future of computing, you won't want to miss this conversation.
0:01:11.1 Producer: This podcast is brought to you by Alation, a platform that delivers trusted data. AI creators know you can't have trusted AI without trusted data. Today our customers use Alation to build game changing AI solutions that streamline productivity and improve the customer experience. Learn more about Alation at alation.com.
0:01:33.5 Satyen Sangani: Today on Data Radicals, I'm excited to welcome Jeff Chou, co-founder and CEO of Sync Computing. Sync is a startup that automates cloud infrastructure for data and AI, helping companies cut costs and meet performance targets. Their machine-learning models boost cloud efficiency for startups and Fortune 500 companies alike. Jeff holds a bachelor's and a PhD in Electrical Engineering and Computer Science from UC Berkeley. He's been a postdoc at MIT and an Entrepreneurial Research Fellow at Activate. Jeff, welcome to Data Radicals.
0:02:06.3 Jeff Chou: Satyen, thanks for having me. Excited to be here.
0:02:09.9 Satyen Sangani: We of course go a little bit farther back, but I would love to have you tell our listeners how you got here. Tell us maybe about your early start at Berkeley. How did you get to the place of wanting to do a PhD in computer science, and what led you to starting Sync?
0:02:30.3 Jeff Chou: Yeah, that's a great question that goes really far back. Let me think all the way back to undergrad. I was at Berkeley. I grew up in the Bay Area, so I didn't move far from home; Berkeley is about 30 minutes from where I grew up. I was an EECS major at Berkeley, actually at the intersection of software and hardware. Half my class was CS, half EE, but most of my concentration was FPGAs, compute, low-level coding, MIPS-type stuff.
And then I went on to a PhD, also slightly more on the hardware side. My PhD was in optical interconnects for large data centers, looking at networking and how you improve the optical interconnects between systems. Nowadays it's a super hot area, obviously, when you want to interconnect a bunch of GPUs together. So I've always been very interested in computer science and large-scale computing, probably skewing a little more toward the hardware than the pure software engineer; that's where my interests have always been. And then after that I went to MIT to do a postdoc.
0:03:31.3 Jeff Chou: I switched directions a little bit to try something new and did some energy research. And then after that, I went to MIT Lincoln Lab, where I did a whole bunch of stuff. One of them was exploring new paradigms of computing, and our first thing was actually this paper we published on analog computing. The thesis was that analog computing could do what a GPU is trying to do, but much cheaper, lower power, and much faster. That paper was the seed idea for Sync Computing. It was the genesis, and at the time it actually got us our first VC meeting. That's how we started that journey.
0:04:10.1 Satyen Sangani: What is analog computing?
0:04:13.0 Jeff Chou: This is a really interesting topic. It's actually not a new idea; in the hardware world it's been a known thing for the past 30, 40 years. But obviously today it's all about digital computing, right? Zeros and ones, all numbers represented in some sort of binary format. In analog computing, you don't represent numbers in zeros and ones anymore. It's actually more related to the field of quantum computing: you have a physical system, a bunch of coupled LC circuits, and when you have all of these circuits, all of these resonators, coupled together and you program the weights in between them, where it naturally settles is the answer to your optimization problem.
It's actually a totally different way of thinking about math problems. The best analogy is like when it rains, water collects into puddles. And let's say your question was where is the lowest point on this pavement? Well, you could let it rain and then water will collect and you will see where water collects is probably where the lowest points are. So nature kind of helped you solve and answer that question.
0:05:23.6 Jeff Chou: And what nature always wants to do is minimize energy. If you program the physical system correctly, what it settles to can be the answer to your mathematical question. I don't know if that makes sense, but that's the general idea: you're using nature's desire to minimize things to answer your question. In the rain analogy, if you can program your question into the system, instead of flat pavement you purposely put in lumps and hills and valleys, and then where the water collects is the answer. That's the best analogy I've come up with to explain it.
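To make the energy-minimization idea concrete, here is a minimal sketch, not Sync's hardware or any real analog circuit, of the same principle in software: couple a handful of binary "spins" with weights, then let the system greedily settle into a low-energy state, which is a candidate answer to the encoded optimization problem. The couplings and helper names are invented for illustration.

```python
import numpy as np

# Ising-style toy problem: E(s) = -1/2 * sum_ij J_ij * s_i * s_j with s_i in {-1, +1}.
# The coupling matrix J encodes the question; the state the system "settles" into
# (a local energy minimum) is a candidate answer.
rng = np.random.default_rng(0)
n = 8
J = rng.normal(size=(n, n))
J = (J + J.T) / 2            # symmetric couplings
np.fill_diagonal(J, 0.0)     # no self-coupling

def energy(s):
    """Total energy of spin configuration s under couplings J."""
    return -0.5 * s @ J @ s

s = rng.choice([-1, 1], size=n)   # random starting state, like un-settled water
improved = True
while improved:                   # greedy descent: keep flipping spins while energy drops
    improved = False
    for i in range(n):
        candidate = s.copy()
        candidate[i] *= -1
        if energy(candidate) < energy(s):
            s, improved = candidate, True

print("settled state:", s, "energy:", energy(s))
```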
0:06:01.0 Satyen Sangani: Yeah, these quantum phenomena are so complex intellectually. And I think even now you're seeing a lot of probabilistic computing in the world of LLMs and vector databases, where vector math predicts these next tokens, which is a fundamentally probabilistic event. So there's this entire area of computing where you've moved from deterministic and binary to probabilistic, and there are so many different trends coming together at the same time, obviously accelerated by the advent of AI. And you have now, with Sync, found a new use case, a different application of this computing. Tell us a little bit about that. How did you get from this analog computing concept to what you're doing right now?
0:06:53.3 Jeff Chou: It's a very interesting story. Like most startups, we had a very windy path, and it was actually more on the business side than the technical side. When we started, we had this idea, basically this optimization algorithm. But what we found is that it was very hard to find a concrete business problem. Are businesses actually bottlenecked by their inability to solve a math problem? Maybe a couple of finance firms around the world are, but most retail companies and online companies, are they really bottlenecked by a math problem? Probably not. It's usually, I can't get customers or I can't sell. That's usually what the problems are.
So essentially we had a hammer looking for a nail. And we're like, all right, this is really hard. Let's go backwards; let's go find a problem first. Since computing is our background, my co-founder and I started in high-performance computing. And there we found the problem of this gap between software and hardware. This actually stems back to my days doing large-scale simulations. The big problem was resource allocation, which is: okay, I'm going to run this big simulation, a big AI model, some giant query.
0:08:07.6 Jeff Chou: How many compute resources do you need? How many nodes, how much memory, interconnect, et cetera? I would say the vast majority of people, including me back in the day, just ask: what's the maximum? And I'm going to ask for that. I used to ask for 2,000 nodes to run my simulations and I had no idea why. Whatever was max. My logic was: the more the merrier, and I'm not paying for it. It's some data center, it's not coming out of my wallet. I don't care, just max.
0:08:38.5 Satyen Sangani: The "I'm not paying for it" is the key statement.
0:08:40.6 Jeff Chou: So there is this gap between software and hardware: you have some software and obviously there's a bunch of hardware, and these two worlds don't typically talk to each other. The user is in the middle and has to pick the hardware, then they click submit and it runs. And that was the source of incredible inefficiency: a human who doesn't really understand hardware, and most people don't, is picking the hardware. So we thought, can't you automate this? Can't you profile the software and then automatically detect what hardware is best?
So that was the thesis. And if you can map it to a math problem, that's how it relates to our original optimization thesis. So we started there, in the government space, in the high-performance computing space, and we got a couple of government grants to do it. We had this proof of concept, and it was pretty cool: we could optimize computing, make it cheaper and faster. And then the VC world at that time, this was 2020, 2021, when VC markets were quite hot, we showed it to everyone and they're like, yeah, cool technology, but can you do a bigger market? And we're like, okay.
0:09:48.2 Jeff Chou: Probably as is typical, most VCs want the biggest thing you can go for. That's when we made the choice to go after something much larger: what's the biggest commercial version of this? Because high-performance computing and the government are very niche. Even then it was Databricks or Snowflake. We just picked Databricks because it's more open source and it had all the knobs accessible, all the cluster stuff, et cetera. That's how we landed there. Then we just had to point our solution, which had been doing Air Force simulations, fluid dynamics simulations in high-performance computing, toward big data and Spark, essentially. That's how we evolved to where we are today, and since then we haven't had a major pivot. That's how we arrived at the market.
0:10:35.6 Satyen Sangani: So let's talk about Databricks and Spark in a pre-Sync world. What does it do? How does it work? You mentioned that you as an engineer would just say, okay, let's go max resources. How does Spark work when you're unoptimized and not really thinking about resource allocation?
0:11:00.4 Jeff Chou: Oh, great question. Spark, obviously, is code dependent: what is your code trying to do? But from a fundamental computing standpoint, there are a few major things that can bottleneck or choke your system. It's either CPU limited, storage limited, or network limited. Those are the three main buckets of what can really choke your job. Even in Spark land, maybe the bottleneck is just moving files from S3 into your cluster to compute; then really it's the network that's killing you. Or maybe you're running some complex ML algorithm, and it's actually the CPU clock that's killing you.
In Spark land, it's the same thing. You can obviously have a very complex DAG or query plan, and it depends on what you're doing. There are a lot of profilers out there where you can actually see: what is the majority of my time being spent on? Is it IO, is it compute, is it storage, is it network? The same problem exists in Spark land; it depends on what your application is trying to do. And then the other side is the same thing I mentioned before: a human has to select which nodes, which instances on Amazon, how many instances.
0:12:16.7 Jeff Chou: And Databricks has done a good job; they've kind of solved this part of the problem. But in old-school Spark, you then have to select the number of executors, the number of workers, and the memory allocation of each worker. That's where Spark is famous for being a very painful tuning problem. Databricks has done a good job of hiding that and giving you a really good average configuration, so you don't really have to deal with it these days in the Databricks world. But old-school Spark was just a mess of configuration: memory, compute, network, number of workers, executors, off-heap memory, on-heap memory. It was just a bunch of stuff. I would say 99.9% of the population has no idea what to pick. So that's the old Spark world, I would say.
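For readers who haven't lived through old-school Spark tuning, here is a small illustrative sketch of the kind of knobs Jeff is describing. The configuration keys are standard Spark settings; the specific values are arbitrary assumptions, not recommendations, and picking them well is exactly the hard part.

```python
from pyspark.sql import SparkSession

# Illustrative only: the values below are placeholders, not tuning advice.
spark = (
    SparkSession.builder.appName("nightly_etl")
    # How many workers, and how big is each one?
    .config("spark.executor.instances", "20")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "16g")
    # On-heap vs. off-heap memory split
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "4g")
    # Shuffle parallelism, which interacts with all of the above
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)
```

Every one of these settings interacts with the code, the data size, and the instance type, which is why a human guessing defaults tends to overprovision.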
0:13:06.3 Satyen Sangani: So I'm an engineer sitting in some Fortune 500 company. I may have some awareness of how to turn some of these dials, the number of workers for a given job, how the job is working and what all the steps are, because I've run some sort of analysis of the application or the workload I'm trying to drive, but I don't know all of it. It's all dynamic and hard. And by the way, I'm spending a lot of money that's not in my budget, so who cares? XYZ bank is pretty rich and we've got a whole bunch of EC2 anyway, so what does it matter? I'm doing this, and all of a sudden the CIO or VP of infrastructure comes along and says, oh my God, we're spending bajillions of dollars on this compute thing. And now Jeff, you appear with a cape and say, Sync to the rescue. So what do you do?
0:14:00.3 Jeff Chou: Yeah, so we wanted to make this problem go away. We have this high-level thesis we call declarative computing. The idea is, let's flip the story. Instead of a human having to pick the resources and all these configurations, which is really hard and most people don't know any of that stuff, what people do understand is the outcome. How long did it take? How much did it cost? What was the latency? These are very understandable; these metrics are tied much more to the business, I would say. So our whole thesis is: why can't we flip the story? Why can't you declare the outcomes that you want? I want a one-hour runtime, I don't want to spend more than 20 bucks, I want this kind of performance on my job, and then some magical system goes and figures out what hardware that is. That's the big vision of what Sync is trying to do, and that's what we do in the Databricks world. So now let's say you have hundreds or thousands of ETL pipelines running daily or hourly.
0:15:03.9 Jeff Chou: What most people do today is have default cluster settings, like 10 workers, 20 workers, this default instance type, and everything runs on that. Instead, what you can do with us is say: okay, these have a one-hour SLA, these have a two-hour SLA, and then, Sync, go find the cheapest cluster that meets my SLA. That's what we do at scale. We'll come in and, using our ML algorithms, live in production, tune all 500 clusters, all 500 ETL jobs simultaneously, so that each one is custom fitted to hit its SLA given whatever that job is trying to do.
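As a thought experiment, the declarative idea might look something like the sketch below: declare the outcomes per job and let an optimizer pick the cheapest cluster whose estimates satisfy them. This is a hypothetical illustration, not Sync's actual API; the class names, fields, cluster names, and numbers are all invented.

```python
from dataclasses import dataclass

@dataclass
class JobTarget:
    """Declared outcomes for one recurring job (hypothetical, not Sync's API)."""
    job_id: str
    max_runtime_hours: float   # the SLA
    max_cost_usd: float        # the budget

@dataclass
class ClusterOption:
    name: str
    est_runtime_hours: float   # estimated from this job's past runs
    est_cost_usd: float

def cheapest_cluster_meeting_sla(target: JobTarget, options: list[ClusterOption]):
    """Pick the lowest-cost cluster whose estimates satisfy the declared targets."""
    feasible = [
        o for o in options
        if o.est_runtime_hours <= target.max_runtime_hours
        and o.est_cost_usd <= target.max_cost_usd
    ]
    return min(feasible, key=lambda o: o.est_cost_usd) if feasible else None

# Example usage with made-up numbers:
target = JobTarget("hourly_etl_42", max_runtime_hours=1.0, max_cost_usd=20.0)
options = [
    ClusterOption("4_workers", est_runtime_hours=1.6, est_cost_usd=8.0),
    ClusterOption("8_workers", est_runtime_hours=0.9, est_cost_usd=14.0),
    ClusterOption("16_workers", est_runtime_hours=0.5, est_cost_usd=22.0),
]
print(cheapest_cluster_meeting_sla(target, options))   # -> the 8-worker option
```

The hard part, of course, is producing the runtime and cost estimates in the first place, which is where the per-job machine learning comes in.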
0:15:44.9 Satyen Sangani: Got it. And you said that's what you're trying to do. How close are you to that exact statement? How much does it look like a form or a SQL statement where I'm simply choosing outputs, and I guess the variables ultimately are going to be time and cost?
0:16:00.8 Jeff Chou: Yeah, we're pretty far along now. We have a full product out there. We had to fight the journey of making it easy to integrate into Databricks, which is a very complex ecosystem, but now it's a full-blown product. We have companies running us in production at scale, so we're actually managing our customers' infrastructure. Today, people can come in and try it; it's out there, it's real. It's not a vision or some vague thing. It's a real thing for Databricks. And beyond Databricks, we have a new partnership with NVIDIA; we'd love to do the same thing for GPUs. There's nothing really specific to Databricks or Spark for us. We're just compute experts; that's what we want to tackle, and it doesn't matter what the compute is. We had a new large customer where the projected savings was around $600,000, and we've only been deployed on a small percentage of their production jobs. That shows you the scale of the problem. These companies run hundreds or thousands of jobs daily or hourly, and their bills are in the tens of millions of dollars a year. So if we can save 50% of that, it's some serious cash we're helping companies save.
0:17:11.0 Satyen Sangani: Yeah. So today you do this for Databricks. And so it works across primarily repeated workloads. You're not obviously taking some query that's never been written or some job that's never been run and saying, okay, I can optimize this thing that we've never actually looked at. You're looking at the things that are running repeatedly and optimizing those things to run much more efficiently or optimally.
0:17:35.8 Jeff Chou: Yeah, that's the one limitation or restriction we have: our focus is on repeat workloads. The big innovation of Sync is that we have this closed loop and we are actually managing and controlling your production environment. We're not just a passive recommendation system. In my opinion, that's the old world: hey, try this instance, give it a shot. Then an engineer has to come in and actually try it, and they still have to do an A/B comparison. Was that better or not? Clearly that doesn't work at scale. Let's say you have a thousand jobs running daily. What are you going to do with 1,000 recommendations every day? That doesn't work.
What we do is have a closed loop that we're actually managing. We have a machine learning algorithm, and each job we're attached to has its own trained ML model. One of the hard lessons we learned is that you can't just have global heuristic rules, like, here are 10 Spark rules that can tune all jobs. We found that fails pretty quickly because code is so different and data size fluctuates all the time.
0:18:38.0 Jeff Chou: That's actually a huge thing we're putting in now. What we learned is that you cannot treat workloads globally. Each one is unique. They could be IO bound, network bound, CPU bound, who knows? And it could change day to day because the data size can change. So that's probably the big thing about us: if we're managing a thousand jobs, there are a thousand ML models, each one custom trained and individually tuned for its job, optimizing each one. That's at the core of what we're doing. And you need a repeat job to do this. So we don't really work with ad hoc, exploratory, submit-random-code workloads. We don't optimize those because they're not repeatable or predictable; they're changing all the time. It's not really possible to optimize something that's essentially random.
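Here is a minimal sketch of the one-model-per-job closed loop, with everything invented for illustration since Sync's actual models and features aren't described here: each recurring job gets its own lightweight tuner that observes every production run and proposes the next cluster size, shrinking when there is headroom against the SLA and growing when the SLA is missed.

```python
# Hypothetical closed-loop tuner: one tiny model per recurring job, updated after
# every production run. Not Sync's algorithm; just the shape of the control loop.
from collections import defaultdict

class PerJobTuner:
    def __init__(self):
        self.history = []          # (workers, runtime_hours, cost_usd) per run

    def observe(self, workers, runtime_hours, cost_usd):
        """Record the outcome of one production run of this job."""
        self.history.append((workers, runtime_hours, cost_usd))

    def propose(self, sla_hours, current_workers):
        """Nudge the cluster: grow if the SLA was missed, shrink if there's headroom."""
        if not self.history:
            return current_workers
        _, last_runtime, _ = self.history[-1]
        if last_runtime > sla_hours:
            return current_workers + 2              # missed SLA: add capacity
        if last_runtime < 0.7 * sla_hours:
            return max(1, current_workers - 1)      # lots of headroom: save cost
        return current_workers

tuners = defaultdict(PerJobTuner)   # job_id -> its own tuner, trained only on that job

# After each run of a hypothetical job "daily_revenue_etl":
tuners["daily_revenue_etl"].observe(workers=10, runtime_hours=0.5, cost_usd=12.0)
print(tuners["daily_revenue_etl"].propose(sla_hours=1.0, current_workers=10))  # -> 9
```

A real system would learn from far more signals than the last runtime, but the point is the per-job state: a rule that works for one pipeline can be wrong for the next.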
0:19:22.8 Satyen Sangani: Yeah. And unless you're running some insane query or job that touches tons of data across tons of different nodes, you're really in a situation where most of the cost is from these repeat workloads anyway.
0:19:38.4 Jeff Chou: Yeah, repeat workloads are like the cost of goods sold. For any online SaaS company, it's the production repeat jobs that are the biggest cost. A huge use case that I would love to get into next is ML and AI inference. We've spoken to a lot of companies, and obviously the LLM world is very hot, but running LLMs for inference is where the costs are. Training, from the companies I've spoken to, you do maybe once a quarter, once a month. It's not that big of a deal.
But inference is every time someone queries your system, you're spinning up a bunch of GPUs. It adds up, you're running it a million times a day. Inference is exactly that. It is a repeat workload over and over and over again. And so that's probably, maybe I'm jumping a question ahead, but that's kind of where I would love to see Sync go in the future. That's a huge market, in my opinion.
0:20:29.3 Satyen Sangani: Yeah, maybe let's stay there. There are so many possible applications of this technology, so many directions it can go. One thing I think all CIOs would probably dream of is the idea that you could, on some level, decide at runtime where to run. If you think about all of these hyperscalers, they, like energy utilities, have times when they're running at peak capacity and times when they're running at low capacity. You can imagine a world where it becomes utility-like, and you would need something like Sync for that, but then there are all sorts of other costs, like egress fees, in the same way that you have transport fees with electricity. Still, you can imagine a world in which this is all pretty elastic, and you're essentially saying, today we should run Databricks here, and tomorrow we should run maybe not on Azure, but on AWS. Obviously there are limitations to that. But is that something you've thought about or conceived of? And how far are we from such a vision?
0:21:32.2 Jeff Chou: Yeah, that was probably one of my early investor pitches, this really big blue-sky vision of the cloud router, right? You have a query, you have some workload, and we'll pick Databricks, Snowflake, DuckDB, Azure, AWS, Google Cloud, some third-party cloud service. The challenge with a vision like that is that the devil's in the details: where is your data located, and is it easy to move it out? Or let's say you're just using Databricks. Databricks has a bunch of custom functions, so you can't just lift and shift Databricks code and dump it in Snowflake; you have to rewrite the code. So unfortunately there's a lot of vendor lock-in. We're not all settled on pure ANSI SQL that's totally portable, and I think that's probably strategically beneficial to these vendors like Databricks and Snowflake, et cetera.
0:22:25.4 Jeff Chou: With these custom functions, they want you to just live there, and it's hard to move. It makes total business sense for them to lock you in. They don't want to make it universal; otherwise, if I can just click a button and everything moves to Snowflake, that is not great for them, or vice versa, or moving to some open-source platform. And then when you get to the enterprise, there's a whole bunch of security stuff, governance, and, between the US and Europe, all of the different GDPR requirements.
0:22:57.0 Satyen Sangani: You can't just move data elastically. It's super hard.
0:23:00.5 Jeff Chou: There are like 10 layers of pain you have to get through to move data, compute, and code. That to me is really the problem. It's not a technical problem; we could do it today. It's the realities of governance and security and all the vendor lock-in, the vendor-specific code, that make it really hard. So unfortunately it's not a technical blocker; it's more of an implementation blocker, I would say.
0:23:29.8 Satyen Sangani: Yeah. Although all the custom function work is brutal, having to decompile and recompile all of that into some other vendor's language. People have a hard time doing that from SQL database to SQL database, let alone in other languages.
So let's go back to the notion of doing it for inference. Tell us a little about that. Obviously, everybody's using LLMs now and everybody wants to use them more. Does it work better, in the sense that you're now using this analog computing to address many of these stochastic computing paradigms? Do you think the impact will be greater in the domain of inference?
0:24:12.8 Jeff Chou: I should probably clarify: we don't really use the analog compute stuff anymore. That was more the inspiration behind the original idea. Our algorithm today is much more traditional, I would say; we're not running analog circuits behind the scenes. But the general idea of optimization is the thing that carried forward.
But in the inference space, tuning ML models is not a new thing. There's a whole field of hyperparameter tuners, a whole bunch of open-source packages that tune your ML model. Our big difference, what we're bringing that's new to the world, and what we did with Databricks and our product Gradient, is that we're doing it live, in production. Because the old way to do this is offline: you train, or you search for the best instances, or you tune your model offline, and you spend, I don't know, a week tuning that.
0:25:14.4 Jeff Chou: Then you find the good parameters and put them in production. And when it's in production, everything is static. Nothing changes. It is done, and that thing runs a million times a day. Then maybe once a month you take it offline, tune it, make sure it's all good, and bring it back. What we're doing, and it's a bit provocative, I would say, is: don't go back and forth, just do it all in production, because you're running it millions of times a day anyway. That repeat run is fertile ground to train, to learn, and to do the things you want to do. So you can do hyperparameter tuning, for example, live.
0:25:55.6 Jeff Chou: That's what we do now with Databricks and their jobs: as you're running hourly, we are using those runs, playing with parameters, tweaking, learning, and exploring the space. You just run your production jobs, and in our UI you can see your costs go down Monday, Tuesday, Wednesday, Thursday, Friday. We totally circumvent the old step of take it offline, train it, figure out the best instances, and bring it back in. That's what we would try to do in the inference space: why don't you just do it live while you're inferencing millions of times a day? All the hyperparameter tuning you're trying to do, but you throw in the other quantitative metrics you want to hit, like latency, accuracy, and cost as well.
0:26:45.5 Satyen Sangani: Yeah, we just had a guest who's CEO of Humanloop, a startup out of the UK. They're essentially helping companies manage the process of generating and managing context, and also fine-tuning, as companies start to think about how to tune these models. The critical idea there was that the context is, on some level, both data and code at the same time, and you're always observing what's coming into the model in order to tune a better response. So on some level, doing the compute optimization in production makes total sense, because the context itself is stochastic. You don't know what's going to come into the model, and you don't know what's going to come out of the model.
0:27:38.5 Satyen Sangani: And you don't know what latency you want to achieve for a given use case. In some cases 10 seconds might be okay; in most cases it probably isn't. So how do you think about this constant optimization? Because on some level these apps require you, as the person running them, to constantly learn and constantly get better. You're not just putting a chatbot out there and saying, oh, it's in production now, I can just let people use it. You've got to constantly tune it and make it better.
0:28:06.1 Jeff Chou: Yeah, exactly. Things are constantly changing. Even with these ETL pipelines we're managing, there's a misconception, I think, that they're all very static: every day they go through the same hundred gigs of data and do everything the same. What we see in reality is that the data fluctuates massively. And because of that, offline optimization is totally invalid: okay, you picked 100 gigs, great, you trained, and then you put it in production. Tomorrow's run is a terabyte. Well, great, what you just spent a week tuning is totally out the window.
And so I think you're right: in inference land and LLM chatbot land, production is like a wild animal. You don't know what's going to happen. When you're tuning offline, you're not responsive; you're a month behind, or you picked a random data point to tune for. So what our product is trying to do is have a robust optimization. What we call it, actually, is more of a management system. So yes, there is optimization, but we also help manage it.
0:29:15.5 Jeff Chou: For example, as data sizes fluctuate up and down, you can still set a constant SLA. So even when it's a really big data size, we can spin up a huge cluster; when it's a small data size, we can make it really small. But your SLA, if you look at it, is always one hour. In the old model, your runtime is going up and down, up and down. Now you can dynamically tune, so it's always one hour. So it's not just cost savings, but also management. This is that whole declarative computing concept: what outputs do you want to control?
0:29:43.8 Satyen Sangani: You've mentioned this data variance problem a few times and you kind of alluded to the idea that you're doing more work there. Tell us about that.
0:29:51.6 Jeff Chou: Yeah, this is something we learned in the wild. Well, we always kind of suspected it's a problem, but it wasn't until we got a bunch of customer data that we really saw it. But basically, like I said, the data size is constantly changing. Sometimes it's cyclical, like over a 24-hour cycle, sometimes it's random. Some of our customers have it such that they don't even know what the data size is because it's coming from their customers. And so that kind of randomness massively changes how your job runs.
If it's a really small data size, it can all just fit in memory, and you probably don't need any workers because one node can run it. But the next time it runs, it could require 10 workers because you have a terabyte. So what we had to do in our ML model was incorporate data size, such that when we're managing these pipelines, we're also tracking the data size. Now we know, to keep it simple: it's 50 gigs today, that means you need this cluster. The next time it runs, it's 800 gigs, that's this cluster size.
0:30:56.1 Jeff Chou: We now know how your job scales relative to the data size, so we can predict: oh, it's 800 gigs, we can project what cluster you're going to need for that. We found this was actually a really critical piece of the core technology for running reliably at scale, because we're managing hundreds and thousands of jobs. You have to be able to support variable data sizes. It's pretty fundamental.
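One way to picture the data-size-aware piece, purely as an illustrative sketch: fit how a job's runtime scales with input size from its past runs, then size today's cluster so the SLA still holds. The numbers are made up and the deliberately simple linear model is an assumption; real workloads rarely scale this cleanly.

```python
# Hypothetical sketch: learn runtime ~ a * (GB per worker) + b from past runs of one
# job, then pick the smallest worker count whose predicted runtime meets the SLA.
import numpy as np

# (input_gb, workers, runtime_hours) observed from previous production runs
history = [
    (50, 4, 0.40),
    (200, 8, 0.75),
    (800, 16, 1.45),
]

x = np.array([gb / w for gb, w, _ in history])   # GB per worker
y = np.array([rt for _, _, rt in history])       # runtime in hours
a, b = np.polyfit(x, y, 1)                       # least-squares line: runtime = a*x + b

def workers_for(input_gb, sla_hours, max_workers=64):
    """Smallest cluster whose predicted runtime fits under the SLA."""
    for workers in range(1, max_workers + 1):
        predicted = a * (input_gb / workers) + b
        if predicted <= sla_hours:
            return workers
    return max_workers

print(workers_for(input_gb=800, sla_hours=1.0))   # a big day: large cluster
print(workers_for(input_gb=50, sla_hours=1.0))    # a small day: shrink the cluster
```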
0:31:23.6 Satyen Sangani: Yeah, that sounds super exciting. Have you figured out what the performance gains are as you incorporate data size into the algorithm? And I would imagine there's also a lot of cost in even just assessing the data size, because if you have to do a scan, how do you know? Do you have to profile it in advance and then use that, or?
0:31:44.2 Jeff Chou: There are a couple of ways. The easiest way in Spark is that it's actually included in the Spark event log, so you can see the data size. On our roadmap down the road, we'd love to do what you mentioned: scan the data beforehand. You tell us where the data is, and before you even run the job, we can go read it and see what the data size is.
But in terms of the impact, obviously the cost savings are very good, but I think stability in the algorithm is really where it also helps. Because if you don't incorporate data size into your model, then as your data is jumping up and down, up and down, if you're not aware of that or you don't respond to it, you could get some very bad results in terms of the cluster you're recommending.
And so our customers demand very high reliability. This is another early lesson we learned as we were kind of doing PoCs with design partners and trying to figure out what do people like? This is actually a good separate topic perhaps, but when you're building an ML algorithm, accuracy is really critical.
0:32:48.5 Jeff Chou: What we found out early on, in the beginning, we thought, well, maybe people are okay if we're 50-50: half the time we save you money, half the time maybe it stays the same. And we quickly found out people did not like that. When they're buying a product that saves money or hits SLAs, they want it to work 99% of the time. So we were like, well, maybe 50 is okay. No, people don't like that. 75? No. And we found out, okay, this thing has to be rock solid. Because when you do a PoC, for example, you might do five jobs. They'll kick the tires with five workloads, and if one of those five doesn't work right, they're like, nah, see you later. And we're like, okay, this is really hard. So what we found out is that to really close the sale, we need five out of five to be awesome.
0:33:45.9 Satyen Sangani: But why is that the case? That seems a little odd if I understand the problem shape right, because you're not changing the fundamentals of the underlying compute. If you don't improve cost or reliability relative to the baseline for a given job, but I do get significant improvement across other jobs, then why would I care? I'm still getting improvement overall. It's like five projects where three of them make me a lot of money and two of them lose me money, but not very much. I don't know, I'd rather just run all five and make the bet. It seems like it would still have value. What is the psychology around why people need improvement on everything?
0:34:28.2 Jeff Chou: Yeah, I think what you thought was what I thought. I was like you sum over five, and if you're net positive, you should be okay. Like, what's the big deal?
I think the psychology we found is that we cut really deep. We're kind of a provocative company in the sense that we're bucking a trend. Production is like a holy shrine. No one touches production. It's in production; that thing is static. So what we found is that trust is a really big problem. People's mindset is that production is always operational; it's on 99.99% of the time. So when they try out a product that touches production and the value prop is 50-50 or 70-30, I think it's just the psychology of fear, because they can get fired if production goes down or production costs too much. People's jobs are on the line. So the psychology is that production has to be 99%, and any tool that touches production should also be 99%. That's maybe the best way I would describe it.
0:35:40.3 Jeff Chou: It's this mentality of ultimate performance for production, because it's so sacred; that's maybe the best way I can describe it, but I found it's pretty universal. I had the same thesis: as long as the net average is savings, you should be all right. But pretty much universally, all these heads of infrastructure, CTOs, and VPs of engineering had very low tolerance for not delivering. That was a hard lesson learned on our journey here.
0:36:16.1 Satyen Sangani: Yeah, it's an interesting observation. One more question there, and then I want to go back to the inference world. You mentioned that you have machine learning models deployed for every workload. I presume there's no person turning the dials on the weights for every model, so you have a model that basically optimizes all of the models. Is that essentially right?
0:36:40.5 Jeff Chou: Yeah, we have a base model, and then each one is custom trained as it monitors its pipeline every day. That's what I mean by each one being different: it's the same base model, but the coefficients are tuned for each job.
0:36:55.1 Satyen Sangani: And how long does it take to do the train?
0:36:57.5 Jeff Chou: It takes, I would say about five to 10 iterations. So if it's an hourly job, by the end of the day you'll start seeing it starting to optimize essentially.
0:37:07.4 Satyen Sangani: So let's switch gears to inference. In some sense, the mindset you're talking about has to fundamentally change as you move toward inference. You mentioned this partnership with NVIDIA. Tell us what you're doing there, when you expect to release product, and where you are in that process.
0:37:28.4 Jeff Chou: So we're part of the NVIDIA Inception program, where they work with startups. Because we're in the Spark and Databricks world, we're working pretty closely with the NVIDIA RAPIDS team, and that's their Spark-on-GPUs solution. NVIDIA is really great because they just want the world to use more GPUs. What's really interesting about working with NVIDIA, versus our conversations with Databricks for example, is that NVIDIA doesn't have a cloud service; NVIDIA just wants to sell more hardware. Even RAPIDS itself is open source, totally free. So NVIDIA is actually very well aligned, incentive-wise, with what we're trying to do. They don't care if the cloud costs for GPUs are cheaper; they just want more usage. So they've been really great partners. But the overall problem we're trying to solve exists there as well, whether you're running Spark on GPUs or PyTorch, et cetera. There are just infinite knobs in GPU land, and computing on GPUs is much harder than on CPUs. NVIDIA actually has their own auto-tuners to try to figure this out.
0:38:44.3 Jeff Chou: But it's not what we do; it's not this closed-loop thing. This closed-loop innovation is the new sauce we're bringing to this market. That's the genesis of the idea: there are just way more knobs and things to tune in GPU land than in CPU land, and we're bringing our innovation, which is: we'll plug in, close the loop in production, and tune all these things on the fly in production. That's the basic idea.
0:39:14.4 Satyen Sangani: Super cool. What are you most excited about in computing right now? You've obviously seen some very fundamental stuff, having come from the world of hardware. What do you think is going to be most revolutionary? Where do you feel most of the gains are going to come from over the next five to 10 years?
0:39:32.4 Jeff Chou: Great question. On the hardware side, we're at a very interesting point in compute history. Moore's law has effectively been over for a while, and the easy knobs we had to make computing 10,000X, a million X faster, cheaper, and smaller are all gone, exhausted, or soon to be. Now we're at an interesting point where it's like, well, what are we going to do for the next 10, 20 years? You can see it with Intel: what are we going to do? Intel clock speeds haven't changed in like 20 years. I think my old desktop Intel chip had the same clock speed as my latest MacBook. That's all plateaued.
What I'm really interested in, where I think the world is going, is specialization. Obviously NVIDIA benefited from that massively with GPUs, et cetera. So I see a lot of really exciting innovation there; for example, if LLMs continue to dominate, what does a chip that only does LLMs look like? I think that's the next frontier in terms of big hardware breakthroughs: specialization.
0:40:45.8 Jeff Chou: Intel chips were super general: you can run Windows, you can play Doom, you can chat with people. GPUs do one special thing, and as you get more and more specialized, you can reap those 10,000X gains. And I think it's all about the economics: where is the business? That's why LLMs are really exciting, because if this momentum continues, there are probably already 10 startups around the world trying to do exactly this, build a chip just for LLMs, LLM inference specifically. Actually, there are a bunch. That's probably where the world is going to go. You can even see this with startups like Cerebras with their wafer-scale chips, which are pretty exciting.
In the software world, I think what's really cool, and obviously as a startup founder I have a new appreciation for product, is that it's not just about technology but about how you actually make something usable by the world. What OpenAI has done that's so cool is they took an army of PhDs to create this crazy model, but then they made it so easy to use.
0:41:52.5 Jeff Chou: It's literally just a bar and you talk to it. From a product-minded person's perspective, that is so cool because it's so accessible. I love stuff like that, even if it makes me cry a little bit, in the sense that all of the cool, hard, PhD-level innovation is hidden and most people have no idea what's going on. But you can't help but appreciate the accessibility of it. My grandma can use it, I can use it, my kids can use it. That's really hard to do: take something super technical and make it so easy that literally any human on the planet can use it. I think that's really cool and very exciting. And that ties into my earlier point about what business is going to drive the next hardware innovation. I think this is the right scale; you need something at global scale to drive it, because hardware is super expensive. So these two things coming together could be really exciting.
0:42:51.0 Satyen Sangani: Yeah, well, perfect inspiration to take us out. Thanks Jeff. It's been fun.
0:42:56.6 Jeff Chou: Cool. Thanks Satyen. It was a lot of fun.
0:43:00.4 Satyen Sangani: What a conversation. Jeff Chou gave us a masterclass on the inefficiencies hiding in cloud infrastructure and how Sync Computing is fixing them. He took us through the intriguing world of analog computing, where nature itself solves problems just like water finding the lowest point. He then connected that idea to Sync's mission, automating the selection of the best hardware for any software workload, eliminating waste, and making AI compute more cost-efficient. With AI driving unprecedented demand for cloud computing, optimizing compute resources is no longer optional – it's essential. Jeff's vision of automated intelligent resource allocation could reshape the future of cloud infrastructure, making it faster, cheaper, and smarter. I'm Satyen Sangani, CEO of Alation. Thanks for tuning in to Data Radicals. Keep questioning, keep optimizing, and keep sharing. Until next time.
0:43:55.5 Producer: This podcast is brought to you by Alation. Your boss may be AI ready, but is your data? Learn how to prepare your data for a range of AI use cases. This white paper will show you how to build an AI success strategy and avoid common pitfalls. Visit alation.com/AI-Ready, that's alation.com/AI-Ready.