Oct 11, 2022 · 30 min

S01 E05: Streaming Data and the Modern Real-Time Data Stack, Lightspeed Ventures

With the modern data stack evolving constantly, the next thing to look forward to is a real-time data stack, where companies are not just producing data in real time but also consuming it in real time. In this episode of the Modern Data Show, we discuss this shift with our guest Nnamdi Iregbulem, who has invested in many modern real-time data stack tools.

Available On:
Spotify
Google Podcasts
YouTube
Amazon Music
Apple Podcasts

About the guest

Nnamdi Iregbulem
Partner at Lightspeed Ventures

Nnamdi is a coder, an economist and a venture investor currently working as a Partner at Lightspeed Venture Partners, where he has worked with companies like Redpanda, Materialize, Matillion and Voltron Data. Before Lightspeed, Nnamdi was an investor at ICONIQ Capital and a Product Manager at Confluent. Among the many deals that Nnamdi has sourced or invested in, some of the prominent names are GitLab, Epic Games, Alteryx, Uber, SurveyMonkey, Snorkel, and Fastly.

In this episode

  • Nnamdi’s journey from being a coder to investor.
  • Hypothesis for investing in data tools.
  • The real-time modern data stack.
  • Has the ETL space matured?
  • Advice to founders of data companies.

Transcript

00:00:00
Hello everyone, and welcome back to another episode of the Modern Data Show. For today's episode, we have Nnamdi Iregbulem joining us from San Francisco, California. He's a coder, an economist and a venture investor currently working as a partner at Lightspeed Venture Partners, where he has worked with companies like Redpanda, Materialize, Matillion and Voltron Data. Before Lightspeed, Nnamdi was an investor at ICONIQ Capital before joining Confluent as a product manager. Among the many deals that Nnamdi has sourced or invested in, some of the most prominent names are GitLab, Epic Games, Alteryx, Uber, SurveyMonkey, Snorkel and Fastly. Nnamdi was also selected for Forbes 30 Under 30 in venture capital and Venture Capital Journal's 40 Under 40. Welcome to the show, Nnamdi. It's gonna be fun.
00:00:47
Good. Thanks for the invitation. Thanks for pronouncing my name correctly too. You didn't even ask me how to pronounce it, so I'm already impressed.
00:00:54
Yeah, I did my homework. So thank you so much, Nnamdi. Let's first start with your journey from being a coder to now a VC. How's it going?
00:01:01
Good. Good. The things are quite related and intersecting. I see a lot of benefits from having both skill sets. I've always been a huge technical nerd; I was sort of self-taught in programming growing up. I'd build PHP websites back in the day and try to monetize them using Google AdSense and different things like that. I'd get these checks in the mail from Google every month. The minimum payout was a hundred dollars, and I'd usually just barely make the minimum. My parents would wonder why Google was sending me money in the mail, which I tried not to explain too much. But it was pretty clear to me that my career was gonna involve technology in some capacity. In a lot of ways, I feel like venture is the best way for me to participate in this broader Silicon Valley ecosystem. And then again, those skills come in handy.
00:01:56
That's super interesting, because a lot of VCs are ex-engineers, and once you are in the game, that's how you really understand the stuff that's going on. Tell us a little bit more about your time at Confluent. You were a product manager there.
00:02:15
Yeah. Confluent is just an amazing company, an amazingly technical company. One thing to understand about my background is that I never worked as an engineer formally. It was always self-taught. And when you're self-taught, it's easy to think that you know what you're talking about, because you've never been in a formal environment where anybody could tell you that you didn't. So the experience at Confluent was great specifically because Confluent is such a technical company, and Apache Kafka is such a technical piece of technology. For me, it was a great experience to basically verify that I knew what I was talking about, and to verify that there was some benefit to this combination of being both technical and business-minded. Spending a lot of time in the Valley, you can presume that everybody has that, but that's actually not the case. There are a lot of folks, let's say engineers, who are just focused on that and, frankly, don't want to have to think about business concerns. And there are a lot of folks who are totally focused on the commercial side of things and don't have a lot of technical capability despite working in tech. So being someone who had both sides, and being able to liaise between those different groups, was actually quite valuable. I cherish the experience both for all the things I learned there and as sort of a confidence booster: yeah, I have a skill set that isn't just what everybody else has.
00:03:51
Nice, amazing. There are a bunch of questions that I have for you around real-time streaming and real-time infrastructure. But before I get into that, let me ask you the first basic question that I would probably ask any VC working in data: what's your thesis on investing in data companies? I see you have invested in a lot of them, so what's your thesis when it comes specifically to data companies?
00:04:18
There are all sorts of ways you can tell this story, but I'll just give maybe one version of it. Typically within enterprise technology, the companies that have the largest market opportunities, the companies that can achieve the largest scale, are the ones that aggregate an enterprise-relevant modality of some sort, do that at scale and in a production context, and drive all the different fundamental units of infrastructure consumption, particularly compute, storage and networking. And one of the great things about data infrastructure is that it tends to do all three. Think about a Snowflake: there's obviously the compute, if you're doing transformation of that data within Snowflake; the storage goes without saying; and the networking, in terms of data going in and out of Snowflake, drives utilization. So when you put dollar signs on all the different things, multiply by the relevant quantities and add them all up, you get to large market sizes very quickly. As an investor, it's one of the most perennially interesting and profitable categories to be investing in, versus other parts of enterprise tooling or infrastructure where you're only getting one of these things, perhaps, and maybe two. To get the full trifecta, data infrastructure is one of the few. So that's a very high-level way of thinking about it. I think within data infrastructure, the companies that are even more interesting than the rest of the field are the ones that achieve this data gravity, just pulling in the most data and holding it within that system. Splunk did a great job of this. Snowflake is doing a great job of this today. And then it's just super interesting: the technical problems that need to be solved around data infrastructure are some of the hardest within this entire ecosystem.
Some people will say it takes 10 years to build a database. The reason it takes 10 years is not because V1 takes 10 years; it's because V-whatever that enterprises actually want takes 10 years. These things have to be hardened. They have to be tested. They have to be fault tolerant. They have to be able to handle large-scale, production-grade transactional workloads, what have you. So it's super complicated. If you have the technical expertise and you're able to build it, you almost have an implicit moat because of all the sunk R&D dollars that go into it. And then, to my earlier point, you basically have a guaranteed market, because there is enterprise spend happening already. It's not one of these things where you're not sure if they'll come if you build it; you're much more sure that they'll be interested if you build a production-grade, high-quality data infrastructure technology. So those are some of the reasons why it's so interesting to me, and I think to most investors.
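The "put dollar signs on everything, multiply by the quantities and add them up" arithmetic from this answer can be sketched in a few lines of Python. Every price, quantity and customer count below is a hypothetical placeholder chosen only to show the shape of the calculation; none of these figures come from the episode or from any vendor.

```python
# All unit prices, usage figures and customer counts are hypothetical
# placeholders, used only to illustrate the price-times-quantity sum.
UNIT_PRICES = {
    "compute_hours": 2.0,        # dollars per compute hour (assumed)
    "storage_tb_months": 20.0,   # dollars per TB-month stored (assumed)
    "network_tb": 50.0,          # dollars per TB transferred (assumed)
}


def annual_spend(usage):
    """Sum price * quantity over compute, storage and networking."""
    return sum(UNIT_PRICES[unit] * quantity for unit, quantity in usage.items())


# One hypothetical customer's yearly usage of each unit.
customer_usage = {
    "compute_hours": 10_000,
    "storage_tb_months": 600,
    "network_tb": 100,
}

per_customer = annual_spend(customer_usage)
market_size = per_customer * 5_000  # assumed number of similar customers

print(f"per customer: ${per_customer:,.0f}")
print(f"market size:  ${market_size:,.0f}")
```

The point of the sketch is only that a product touching all three units adds three terms to the sum, where a product touching one unit adds a single term.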
00:07:26
Yeah. And another thing that comes across very strongly if you look at your investment portfolio is this focus on the modern real-time data stack. You have got Materialize, and you have got Vectorized, which is basically Redpanda. What was so interesting about these companies? I think you have invested very heavily in them. Why so?
00:07:51
Yeah, it was a little bit related to my experience at Confluent. There's this shift in infrastructure systems that is ongoing, and it's not even remotely close to being done: a shift from more batch-oriented systems to what is referred to as real time. What does that mean? Traditionally, data infrastructure systems operated in a batch-oriented way, which meant that you operated on large sets of data periodically. So once a day, or whatever it was, you would ingest a bunch of new data, do some massive transformation on that data, and extract a bunch of data. It was this clockwork thing that would happen on a daily schedule or whatever it was, and that worked fine for a long time. But what it meant was that in the interim between the big changeovers, data was effectively stale, because you only had data that was as correct as the last changeover. So if it's 12 o'clock on a Tuesday and the last run was at midnight, then the data by definition is 12 hours old at least. That was a status quo that worked well for a long time, but over time, more and more businesses, partly driven by consumer demand and partly driven by technical innovations that have happened, have wanted to shift towards what is now referred to as real time: data that's being updated, ingested, transformed, what have you, on an ongoing basis throughout the day, throughout the hour or throughout the minute. And it turns out that it's not just a matter of turning some knobs and dials on your old systems to get them to work in real time. A lot of these things need to be rewritten basically from scratch in order to be performant enough for this real-time setting. That creates a ton of opportunity, not just to recreate things that already existed but with a real-time capability, but also to build new systems that just couldn't have existed in any shape or form in the prior batch setting.
And so Confluent laid the groundwork for a lot of this, but there's an emerging set of other companies that are drafting off of that and creating their own opportunities. Redpanda is one of them, one of our investments. It's a real-time streaming engine, rewritten in C++, which is a lot more performant than prior systems. It has a lot better developer experience and it's a lot easier to use, and that opens up the streaming ecosystem to a much broader group of developers. Materialize is an analytics database built specifically for real-time data. Again, this is an area that tends to change when you wanna do it in a real-time setting, and you need to rethink the whole thing. We're also investors in some other companies, some of which are in stealth and will be announced hopefully in the coming months. But yeah, we're heavily invested in real time. I'm personally very excited about it, and again, there are these technical moats that you get if you do it right. All the people we work with in this realm are just the most technically brilliant people you'll ever meet, so it's definitely a lot of fun for me.
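The staleness gap described in this answer can be made concrete with a small Python sketch. It is a toy illustration only, not how Kafka, Redpanda or Materialize work internally: the event data, the `batch_totals` function and the `StreamingTotals` class are all hypothetical names invented for this example.

```python
from collections import defaultdict

# Hypothetical events that arrived today: (hour_of_day, customer, amount).
EVENTS = [
    (1, "alice", 20.0),
    (5, "bob", 35.0),
    (9, "alice", 15.0),
    (11, "carol", 50.0),
]


def batch_totals(events, last_run_hour):
    """Batch view: only includes events that arrived before the last scheduled run."""
    totals = defaultdict(float)
    for hour, customer, amount in events:
        if hour < last_run_hour:
            totals[customer] += amount
    return dict(totals)


class StreamingTotals:
    """Streaming view: updates the running total as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, customer, amount):
        self.totals[customer] += amount


# The batch job last ran at midnight (hour 0); it is now noon.
stale_view = batch_totals(EVENTS, last_run_hour=0)

live = StreamingTotals()
for _hour, customer, amount in EVENTS:
    live.on_event(customer, amount)
fresh_view = dict(live.totals)

print("batch view at noon:    ", stale_view)
print("streaming view at noon:", fresh_view)
```

At noon the batch view contains none of the day's events, while the incrementally updated view reflects all of them. Real streaming systems also have to handle ordering, fault tolerance and scale, which this sketch ignores.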
00:11:14
And if we talk more specifically about real time: you have got real-time data movement, not exactly ETL, but ways to move data from one place to another, like Kafka, Redpanda and Pulsar. Then from a stream processing perspective, you've got stuff like Flink and Samza, and even a couple of newer ones coming up around the whole idea of stream processing and complex event processing. And then you finally have real-time analytics databases: Materialize, or ksqlDB, again from Confluent, and ClickHouse, which has some of those materialized view functionalities now. And then I saw in one of your articles you also mentioned real-time machine learning, stuff like Tecton. Do you think enterprises are ready to consume data in real time? We have made a lot of progress in building the infrastructure to produce, store and process this real-time data, real-time analytics and real-time insights. There would be a few leaders who are able to consume that, but from a general market perspective, do you think enterprises are ready for real time?
00:12:29
Yeah, it's a good question. On the one hand, I'm quite excited about this shift to real time and I definitely think it's happening, and there are certain pockets where it's very clearly happening, like in a lot of consumer-centric companies where you have high volumes of transactional data coming in. There's a desire on the consumer end to have these updates, to have these applications react in real time to what's going on, and that improves the user experience in different ways depending on the setting. Netflix is a classic example of this, where they've done a lot in real time. Uber has done a lot, and Lyft has done a lot of things in real time to improve the user experience. So I think the case in more consumer-centric companies has already been made. Yes, they are ready and they want this stuff. On the more B2B side, I think the market is definitely still evolving or emerging, for various reasons. B2B use cases aren't as real time as consumer use cases tend to be, but they are becoming more real time. For example, in certain areas like financial technology, almost every trading technology firm, all these crypto exchanges and whatnot, they all use streaming. All of them. They all use real-time infrastructure because they want these real-time updates on what's happening in their market to be broadcast to their respective communities. So that's one area that's been growing quite rapidly, but I think the rest of the enterprise is coming along as well. Part of the reason it has been tougher to date is that the tooling, I think, has lagged the interest in these kinds of capabilities. And frankly, on the skill set, there are still very few people who really get this stuff. If you're a typical organization that doesn't necessarily have access to this skill set, it can be quite daunting to say, oh, we're gonna spin up a real-time ML system or something.
You can barely find just generic data scientists, let alone someone who can do this stuff in an online, real-time context. So there's still a talent problem and there's still a tooling problem, but the desire to do it, I think, is there. I think people wanna do this stuff, if they could only be enabled.
00:14:51
Another thing we are seeing is that a lot of investment in the modern data stack has gone into building the core infrastructure: producing the data, storing it in a data warehouse, putting it in a place where it can be consumed. But the mode of consumption of this data hasn't changed in the past 20 years. It was always dashboards, and it's still dashboards. What do you think is next when it comes to consuming this huge amount of data that is being generated by organizations?
00:15:26
Yeah. To your point, dashboards are still a thing and aren't going away anytime soon. But if I had to highlight another modality for accessing this data that I think is growing very quickly, I would describe it as ad hoc, code-based querying and analytics, particularly SQL and Python. The number of people who are familiar with those languages is only growing over time, and it's growing quite rapidly, especially in the case of Python, but also in the case of SQL. There are more and more people who can plausibly just write a SQL query against your data warehouse, or write up some kind of short Python script to do some data analysis, and that's the end of the analysis. It's not getting sent to some dashboard. They actually just want the immediate result, and then they have what they need. I think there's a lot of that happening, and there's going to be a lot more of it going forward. And I think the barriers to learning those specific languages are low enough that you could actually see a large number of people doing that: more technically savvy business users within your organization, not just the data engineers, not just the data scientists, what have you. So I think that's emerging for sure. What exactly the products of that space will look like, I think that story is still being told, but that would be one thing I'd look out for.
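As a rough picture of the ad hoc, code-based analysis described here, the following Python sketch runs a one-off SQL query and prints the result, with no dashboard involved. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a data warehouse; the `orders` table and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory SQLite database standing in for a data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("west", 120.0), ("east", 80.0), ("west", 30.0), ("east", 45.0)],
)

# The ad hoc question: which region brought in the most revenue?
rows = conn.execute(
    "SELECT region, SUM(amount) AS revenue "
    "FROM orders GROUP BY region ORDER BY revenue DESC"
).fetchall()

# The analysis ends here: the immediate result is all the user wanted.
for region, revenue in rows:
    print(f"{region}: {revenue:.2f}")

conn.close()
```

Against a real warehouse the connection line would change to that system's own driver, but the workflow of writing a short query and reading the answer directly stays the same.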
00:16:57
Yeah, I think so. We are seeing one of those early signals of that if you look at the popularity of tools like Hex, you know,
00:17:05
Yes.
00:17:06
allowing people to build those data applications in a very simple way. I think you're right on that. The next thing, which is something that we ask very often, is about what's happening in the modern data space: there are a lot of solutions being pushed for a lot of problems. On moderndatastack.xyz, we've got around 30 different categories of products that are out there. Apart from the few obvious ones, dbt or maybe Kafka and stuff like Fivetran, do you think we are in a situation where a lot of vendors are pushing out a lot of overlapping solutions and positioning themselves as category creators? Do you think that's happening, or do you think these are real tools and technologies that are needed by these companies?
00:17:59
Yeah. It's a good question. I tend to be a little bit contrarian in the sense that once a term exists, it's almost certainly overblown. Use the modern data stack as an example. You know better than I do what that term probably means at this point, since you named the whole thing after it. But I think it's still up in the air what exactly that means. To me, it means a cloud data warehouse of some sort, and then things that plug into it. But that's up for debate, I guess. So, to your actual question: look, I love nerding out about all these different technologies and I think they're all really interesting. In terms of their business opportunity, I think it's tougher, mainly because the poster child of the modern data stack has been either Snowflake or dbt, and most companies aren't going to be able to find that synergy between the data warehouse and some other technology as tightly as dbt has managed to. There's just something special about ETL in that it attaches to your data warehouse at very high rates. If you have a data warehouse, you basically need ETL. That's just a thing. But if you are a company that's trying to do ML out of the data warehouse, I don't know to what extent the fact that someone has a data warehouse means that they want to use ML that is pointing at that data warehouse as a source of truth directly. I think it's up for debate. You can go through the list of all these different things, ETL or operational analytics or feature stores or whatever, and ask the question: does the data warehouse really need to be the connective tissue for this thing? So that's one. And then two is there's a competitive point amongst these different vendors: how do you differentiate? I think it's hard.
The typical data analyst or data engineering organization is just being inundated with, oh, use our X for Y, or use our Z for A, or B for D. It's such a mess at this point. Most organizations can barely wrangle all the tools they're already using. It's a very common story: you're a company, you wanna spin up a modern data stack, you buy four or five of these tools, and six months later you're like, you know what, we need help. So I think it becomes tricky over time. Again, the nerd in me just loves more tools; more technology is better. The investor in me starts to scratch his head after a bit. But yeah, I'll leave it at that.
00:20:44
Yeah. You talked about ETL, and you've also invested in this company called Matillion, which has been around for a while. We're seeing a lot of players emerging in the ETL space, the recent ones being Airbyte and Meltano. What's your take on the overall ETL market? Do you think it's a problem that's already solved and the space is already too mature? Is it like the new CRM, where you've got so many tools and so many technologies? Or do you still think that there are problems that need to be solved in the ETL space?
00:21:17
It's a good question. To your point, ETL as a technology has existed for some time, even before all these cloud data warehouses. I think those systems had to evolve for the cloud, and that was a big leap for ETL. But let's say we fast forward and cloud is the dominant paradigm: are there more leaps that have to be made? I'm not sure. I tend to think that for most use cases, for the median user, the existing tooling is basically good enough. But there are always these things around the edges, the edge cases where these tools start to fall over a little bit. So there's definitely more innovation to be had, but I think the bulk of the distribution of use cases is being reasonably well addressed today.
00:22:10
Another thing that comes to my mind is: what role do you think open source plays in the modern data stack? You must have come across a lot of open source companies pitching to you guys. What's your take on that? Apart from the obvious business benefits of using an open source technology, if someone were to come and pitch an open source alternative to an existing solution in the modern data stack, what thought comes to your mind?
00:22:40
The thought that comes to mind for me is: why? And not because I don't like open source. I love open source. But going back to my point that once a term exists, it tends to be overused or overblown, it's become this default thing where people say, oh, we're building an open source whatever, and they haven't really thought through why it needs to be open source and what trade-offs they're implicitly making by doing so. It's funny, because some of the most important companies in the modern data stack, let's just pick two, like Snowflake and, let's say, Fivetran, are not open source. Not really. And yet, right? So it's a question I ask. Historically, open source was, oh, we're having this collaborative development process, and people are committing to our repo, and it's going to be great for the community and everything. That's less of a thing now. It's becoming more and more, you know, unless we're open source, people won't use us. That's increasingly the reason why people are going open source. It's more of a way to open doors than it is a way to push code and actually develop the technology. For me, either answer is fine. Maybe you are focused on this community thing, maybe you're more focused on the go-to-market implications; either can make sense. I just ask founders to know why. And a lot of people, surprisingly, don't know why they're open source. So it's more important that they have an answer than what the particular answer is.
00:24:17
That's great advice. And what would be that one company in the whole modern data stack, apart from the obvious ones, let's ignore Snowflake and dbt for a while, that you wish you had invested in?
00:24:32
The problem is this category is so hot, you just wish you had invested in all of them a little bit. It's hard to pick just one. I'd be lying if I said there was one that stood out so much more. And I have personally been lucky enough to be involved in some of these companies as an investor. So, yeah, hard to pick one.
00:24:53
Okay, yeah, no worries. Before we wrap up, there's just one last question we want to leave you with. What would be the one piece of advice you would give to all the founders who are building for the data stack,
00:25:08
you know,
00:25:09
for those who are building data companies? What would be that one piece of advice for these founders from your side?

The one piece of advice: obviously this is a fast-growing, developing category, and it's very exciting to be in. From a go-to-market standpoint, merely saying you're the X or Y or Z for the modern data stack is not enough. This has become the default pitch for tools in this space: we're A, B, C, X, Y, Z for the modern data stack. They think that's gonna build a business. It's not. That's just not enough. There are so many other tools out there at this point. People will often be like, oh, we're gonna hang out in the dbt Slack channel and build our community via that. I'm like, you and every other person; it's like the town square at this point. So what I encourage founders building in this space to do is: obviously get the tech right, that goes without saying, but then think real long and hard about what your go-to-market strategy is going to be and how you're gonna break out from what is quite a noisy space. That's what I would say.

Yeah, that's a great one. And just one last thing: tell us a little bit about Lightspeed. At what stage do you guys invest? What should a founder do if they want to come and pitch to you guys?
00:26:34
We are a large, tenured venture firm. We've been investing for 20-plus years at this point. We're investors across all stages, early to late, and across all categories: enterprise, consumer, fintech, crypto, healthcare. We've backed some of the most impressive Silicon Valley stories out there, and we've invested in some of the most interesting global companies out there. I specifically focus on new enterprise technology, and it's a place where we've been quite active over the years. It's one of the places that we're most known for, in fact. So reach out, do something interesting, build an interesting open source project. We're quite technical as a team. We love nerding out about this stuff. Build something interesting and we'll always be willing to chat, with our VC hat on or our engineer hat.
00:27:26
Do you always do warm intros?
00:27:29
I think we respond to cold emails too. It depends on the quality of the cold email, so it's high variance, but yes.
00:27:36
Okay. Amazing. Perfect. So thank you so much for your time, Nnamdi. That was such an amazing conversation, and I'm sure the audience will enjoy it as much as we did. Thank you again for your time.

Awesome. Thank you so much. It was fun.