Nov 01, 2022 · 31 min

S01 E08 On software engineering approach towards data observability with Shane Murray, Field CTO at Monte Carlo Data

For early-stage startups, bringing in full-fledged data observability can sometimes be overkill. And even when an established organisation starts monitoring its data quality, it is often hard to judge whether quality is a tech problem or a people problem. In the latest episode of the Modern Data Show, Shane Murray, who went from being a Monte Carlo customer to joining the company as its field CTO, helps us understand these problems and how the Monte Carlo tool, using software engineering principles, is addressing the issue of data downtime.

Available On:
Google Podcasts
Amazon Music
Apple Podcasts

About the guest

Shane Murray
Field CTO

Shane Murray is the field CTO at Monte Carlo Data, a data observability platform that connects to your existing data stack to continuously monitor for freshness, distribution, volume, and schema changes, and immediately notifies stakeholders once incidents are detected. Before Monte Carlo, Shane was SVP of Data and Insights at the New York Times, leading 150-plus employees across data science, analytics, governance, and data platforms. Under his leadership, the team expanded into areas like applied machine learning, experimentation, and data privacy, delivering research and insights that improved the Times' ability to draw and retain a large audience and scale the digital subscription business, which grew tenfold. Before joining the Times, Shane led data teams at a startup called Memetrics and at Accenture, helping companies build and scale experimentation programs within those organizations.

In this episode

  • Shane’s background and the role of field CTO.
  • How can small organisations start their data observability journey?
  • Is data quality purely a technological problem or is it more of an organisational problem?
  • Software engineering-related approach towards data observability.
  • Data lineage and why it is a hard problem to solve.


Welcome to another episode of the Modern Data Show. Our guest today is Shane Murray, the field CTO at Monte Carlo Data, a data observability platform that connects to your existing data stack to continuously monitor for freshness, distribution, volume, and schema changes, and immediately notifies stakeholders once incidents are detected. Monte Carlo has raised $235 million from Accel, ICONIQ Capital, GGV Capital, Redpoint Ventures, IVP, and Salesforce Ventures since its inception in 2020. Shane has extensive experience in building online measurement products and working closely with engineering teams to help build them. Before Monte Carlo, Shane was SVP of Data and Insights at the New York Times, leading 150-plus employees across data science, analytics, governance, and data platforms. Under his leadership, the team expanded into areas like applied machine learning, experimentation, and data privacy, delivering research and insights that improved the Times' ability to draw and retain a large audience and scale the digital subscription business, which grew tenfold. Before joining the Times, Shane led data teams at a software startup called Memetrics and at Accenture, helping companies build and scale experimentation programs within those organizations. Welcome to the episode.
Thank you Aayush.
So first of all, Shane, tell us a little bit about your role as field CTO. That's an unusual title. How is it different from that of a CTO?
Yeah, it's quite different from the CTO role. A field CTO is a newish role that we're seeing especially in software companies; Google and Snowflake have field CTO offices, and we're seeing some startups invest in this role. But the way I see it is, I'm leveraging a lot of my experience in this space as a data leader, and in particular a data platform leader, to partner with Monte Carlo customers on their journey in observability. Data observability is a very new space, and so a lot of leaders that are leaning in here are charting new territory in their company, both technologically and organizationally. Part of the role I play is connecting to their broader data strategy and understanding how we can accelerate that strategy through an observability program.
And how does it differ from the role of a traditional CTO?
Yeah, so for a start, I'm not leading the engineering team, and so I'm out in the field talking to CDOs and CTOs and other heads of data who are essentially running their teams. It's somewhat of a consulting activity, providing my expertise to them on this journey.
So it's kind of like a CTO offered to these companies on demand.
That's right.
And Shane, you joined Monte Carlo Data from the New York Times, where you worked in data for around eight years. Tell us a little bit more about your job there: what kind of data you dealt with, the data issues you faced, and what made you join Monte Carlo Data. Why Monte Carlo Data?
I'll start early on at the Times. I joined in 2013 and arrived at a time when the New York Times was navigating a shift from being predominantly an advertising and print business into the early stages of becoming the digital subscription business that it is today. And today it's the leading digital news subscription business on the planet, so it's been a significant shift away from a history of print and advertising products. I came in with a background, as you mentioned with Memetrics at the start, in experimentation software and running large-scale experiments online, and identified early on that the Times really couldn't run these types of experiments or even analyze user-level data. When you're driving a subscription business and you have mechanisms like a paywall, a balance between what's freely available to visitors to the New York Times and what they need to pay for, you need to do a lot of experimentation. We had a legacy data stack at the time: a lot of Oracle, some on-premise Hadoop we were playing around with, and a lot of vendor point solutions for product analytics. None of these allowed us to connect and provide a view of the customer, and they weren't very product-centric. They were designed to support an ad sales business, so we really didn't have a lot of usable user data, and we weren't interacting with the newsroom, which is largely the driving force behind the subscription business. So it was around 2015 or 2016 that I was one of the leaders of the move to the Google Cloud Platform, and obviously BigQuery is the central piece of that, along with both the streaming and batch capabilities around Google and some of the tools there.
And this was a huge unlock for us in modernizing our stack, really throwing away a lot of this legacy technology and centralizing around a cloud data warehouse that made different domains of data effectively a join away, and let us take on analytics and data science projects for the first time that started to unlock value for the business. Bouncing off that cloud warehouse foundation, we were able to do much stronger experimentation for the company across product analytics and marketing teams, and we were able to move data into the newsroom. I had an early experience where an investigative journalist from the newsroom worked with me to produce an insight report on the Times audience and coverage, which we then toured around the newsroom. That was a foray into starting to build analytics and data science in the newsroom. We invested in first-party ML-driven data products that drove both the way we approached subscription marketing and the building of segmented and tailored ad products for our advertisers. And then finally, operating with the newsroom, we were able to apply machine learning in ways that started to scale our journalism, use judgment from the newsroom, and build a product experience that was much more tailored to the user. Throughout this journey, it felt like a bit of a high-wire act to maintain quality and trust in data. As you've probably heard from many people you've spoken to, as a data leader, whether it's with the executive suite, with your consumers, or with internal users of data, sometimes one false move and you lose a lot of trust, and then you have to slowly build that back with internal and external customers.
And so for me, some of these challenges were complex commerce data that was transformed into financial reporting data, which had to be accurate to the penny but also available for operational analytics. We had high-velocity, high-volume event data used in machine learning that had to be very available, on time, and fresh. And we had content data: the newsroom at the New York Times manually tags articles with all this rich metadata, and that becomes part of how we feed algorithmic recommendations and other machine learning. All in all, these data environments evolve much faster than you can keep up with manual quality checks. We had looked into and used some solutions that tested for data quality, but you're tackling a small percentage of what you know at the time, and the data's evolving around you at a pace that requires a different solution. So at that point I was focused on how we establish quality and trust across the breadth and the depth of Times data, and launched an initiative where we actually brought Monte Carlo in as a partner; this is probably two and a half years ago now. What we were really looking at was how we identify more of those unknown issues, how we have a solution that scales to our environment, and how we provide the ability to look upstream and downstream to resolve data issues. I was struck by Monte Carlo's team and their ability to deliver on that promise. How I got to Monte Carlo was really that, after leaving the Times and taking a nine-month break to do some travel, I started talking to Barr about ways I could come into Monte Carlo to support customers on that same journey I had taken at the Times. And I guess the rest is recent history.
I think that's an amazing story, where you started as a Monte Carlo customer and then ended up joining. That's a big testimonial to the kind of work you guys are doing at Monte Carlo Data, so that's amazing to hear. Before we jump into the details about Monte Carlo Data and the kind of work you guys are doing there, what would be your advice to companies who want to get started on their data quality journey? Sometimes bringing in a full-fledged tool like Monte Carlo can seem overkill when you are at an early stage. How would you advise companies to start with data quality in those early days, so that they can eventually move towards a full-fledged solution like Monte Carlo Data?
Yeah, and I think an observability tool isn't going to be for everyone. Sometimes, when your data environment is small enough, when you have either a small set of tables to maintain or a small data team that is centrally responsible for the data, you can often cover that with some manual testing using SQL. We built up some manual tests and a tool we called Snitch, which measured the deviation from metrics, and we were able to implement that in our environment. Similarly, there are open-source tools, like Great Expectations, that allow engineers to test the data quality in their pipelines. I think that's where to start: when you have known data quality problems that you can get a handle on, and they're very clear to the business. Then there are things that trigger a need for a real investment in observability. One is that your data environment is moving to a central enterprise data warehouse that manages different types of data, moving at different paces, and the complexity is just more than you can handle with manual testing. The other trigger we've seen is organizational change: you're trying to democratize usage, both in terms of building data products and in terms of consuming them, and there you need more of a self-service offering for users in terms of managing the quality of their data. And thirdly, it's when you experience a loss of trust. I mentioned that earlier, but trust can be this ambiguous concept that you have until you've lost it, and I think that can be perilous, whether it's with your consumers or with your executive team. Getting in before you've lost that trust is probably the key.
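The kind of manual, deviation-from-metric check Shane describes (in the spirit of the homegrown Snitch tool) can be sketched in a few lines. This is purely illustrative; the table, column names, and 50% threshold below are hypothetical assumptions, and in practice the queries would run against your warehouse rather than an in-memory SQLite database.

```python
import sqlite3

def check_metric_deviation(conn, table, column, threshold=0.5):
    """Alert if the latest day's metric deviates from the trailing average
    by more than `threshold` (a fraction, e.g. 0.5 = 50%)."""
    # Baseline: average of the metric over all days before the latest one.
    baseline = conn.execute(
        f"SELECT AVG({column}) FROM {table} "
        f"WHERE day < (SELECT MAX(day) FROM {table})"
    ).fetchone()[0]
    # Latest value: the metric on the most recent day.
    latest = conn.execute(
        f"SELECT AVG({column}) FROM {table} "
        f"WHERE day = (SELECT MAX(day) FROM {table})"
    ).fetchone()[0]
    deviation = abs(latest - baseline) / baseline
    return deviation > threshold, deviation

# Demo with a hypothetical daily_signups table and a sudden drop on the last day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_signups (day TEXT, signups REAL)")
conn.executemany(
    "INSERT INTO daily_signups VALUES (?, ?)",
    [("2022-10-01", 100), ("2022-10-02", 110), ("2022-10-03", 20)],
)
alert, dev = check_metric_deviation(conn, "daily_signups", "signups")
# alert is True: 20 signups deviates ~81% from the 105-signup baseline
```

A check like this catches the known failure modes you write it for; as Shane notes, it's the unknown issues, at warehouse scale, that eventually motivate a dedicated observability tool.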
Yeah, following up on that, you talked about organizational changes that happen and lead to data quality issues. A bit of a controversial question: is data quality purely a technological problem, or is it more of an organisational problem? How would you balance that? As a vendor, as someone who is building in this space day in and day out, do you think data quality is something that can be solved mostly using technology, or do you need a lot of organizational changes, and if so, what are those?
It's a great question. I think it is a mix of technology, organization, and culture. If you follow the DevOps movement, it has charted the course both in tooling change, with iconic companies like Datadog, and in the cultural change around DevOps within companies: how you establish accountability, measurement, and communication around reliability. At the very top of what Monte Carlo is trying to do is accelerate the adoption of data, whether that's data platforms or data products, and we're doing that by dramatically improving the reliability of an organization's data. That's the real top-level change we're trying to drive, and it's happening, even beyond Monte Carlo, through a shift from data traditionally being trapped in warehouses and used in analytical dashboards to data teams now doing much more with customer-facing data applications, production algorithms, and internal data products. The level of reliability and extensibility required in these products is well beyond what it was ten or even five years ago. And we've seen some studies here that show we're not even close to the reliability of systems engineering: typically, you see about 70 data incidents for every thousand tables in the environment. That's a pretty high rate of issues. To your point, this is a technical problem: do we have the right observability tools in place, the equivalent of what Datadog is for systems, but for monitoring your data pipelines? It's also organizational: do we have the right people in the organization who own data products, the right investment in data engineering and data product management, really treating data as a product? And do we have the right culture around data?
I've found over my career that data is often treated like exhaust by upstream software engineering teams, and it's up to the data team to collate, refine, and manage that data into something useful for the organization. In the most advanced companies, we're seeing that shift, with a lot more responsibility on data producers to meet SLOs or SLAs around the data quality they're providing, and with a tool like Monte Carlo, you can push upstream into those teams. There are also often data product teams taking end-to-end ownership of a data product, responsible for the SLAs for either internal or external customers. So a lot of what I'm working with leaders on is how to make that cultural change and how to shift that accountability and expectation around data quality in the organization.
Yeah, that's an amazing answer. Let's dive a little bit deeper into Monte Carlo Data. You guys have grown fast in the past couple of years, and one of the key things we have noticed, and I think you touched upon this in your previous answer when you drew the analogy between DevOps and what tools like Datadog have done, is that we see a lot of software engineering approaches in the way Monte Carlo solves things for data. I was going through some of the documentation around Monte Carlo; you can define your tests in the form of YAML files, which is a very typical software engineering approach. So walk us through your product vision, in terms of how you envision the Monte Carlo observability suite evolving from here.
Yeah. I'd say at the core, Monte Carlo's promise is to minimize what we refer to as data downtime: the periods when data is unavailable, late, or erroneous. This is the core problem we're trying to solve, and the way Monte Carlo handles it is as a platform to detect, resolve, and prevent these downtime incidents. That's done through machine learning monitors that allow you to alert on the incidents you mentioned early on: freshness, schema changes, distribution in the underlying data, and the volume of data. And we use lineage and root cause analysis to fix these problems. As for where we see the product going, one of the real promises of Monte Carlo is end-to-end lineage, and so we see ourselves doing more integrations into the modern data stack to meet that end-to-end requirement, whether that's into areas like streaming or machine learning; it's often a question of where our customers want and need to go. Secondly, I think we're doing some things to make data quality more understandable: working on a level of reporting that can be consumed by teams locally to understand the quality and reliability of their data, but also at an executive level. I found at the Times it was useful to be able to look into different domains of data or data products and understand how we were engaging with incidents, how we were reducing the downtime related to incidents, and how we were meeting the uptime if there were certain SLOs or SLAs in place.
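To make the monitor types concrete, here is a toy sketch of the idea behind a volume monitor: learn a table's historical row-count pattern and flag days that deviate beyond a threshold. Monte Carlo's actual machine learning models are far more sophisticated; the simple z-score rule and the cutoff of 2.0 below are assumptions chosen purely for illustration.

```python
from statistics import mean, stdev

def volume_anomalies(row_counts, z_threshold=2.0):
    """Return indices of days whose row count deviates from the historical
    mean by more than `z_threshold` standard deviations."""
    mu, sigma = mean(row_counts), stdev(row_counts)
    if sigma == 0:
        return []  # perfectly flat history: nothing to flag
    return [i for i, c in enumerate(row_counts)
            if abs(c - mu) / sigma > z_threshold]

# Six normal days of ~10k rows, then a day where most of the data went missing.
history = [10_000, 10_200, 9_900, 10_100, 9_950, 10_050, 1_200]
anomalies = volume_anomalies(history)
# anomalies == [6]: only the final, collapsed day is flagged
```

The same shape of check applies to the other monitor types Shane lists: freshness (time since last load), schema (column set changes), and distribution (per-field statistics), each compared against a learned baseline.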
You mentioned data lineage. Data lineage as a concept has been around for a few years, and it hasn't seen the kind of adoption it probably should have seen by now. What do you think is the biggest barrier for an organization to have perfect lineage across its tables and data sources? Why do you think that's a hard problem?
Having experienced this firsthand at the Times, I think it's a tough problem because you have data products that were built in different eras of the company, on completely different stacks. At one point, I think we were using six different ETL tools, both off-the-shelf and homegrown, depending on the type of data product we were building. With this mix of homegrown, legacy, and some new tools, it's often very hard for companies to get end-to-end visibility of their environment. And you might have all sorts of different warehouses with data sitting in them, not to mention the ETL tools. So one thing I often encourage executives to do is focus on the concept of a golden pathway: what is the future stack, and this aligns with what you write about in Modern Data Stack, that you're looking to provide the organization? Never mind some of the legacy that's going to exist for the next two to five years, which you should, as a leader, be focused on shifting away from and reducing the cost of. How can you outline to teams this golden pathway, such that all new data products will be built on this stack? That's really where we can help with lineage, because we're very focused on the modern data stack, from ingestion through lakes and warehouses, the transformation tooling, and then through to the BI layer. So I think having that focus on where you're going, rather than where you are today, is critical for thinking about lineage and the lineage solutions you want to provide. Another question is why you care about lineage: is it lineage for lineage's sake, or lineage for resolution's sake? For us, there's a very clear focus on using lineage so that analysts can look upstream when they see a data anomaly and understand where the breakage occurred, so they can more easily trace it.
We do some things like correlation analysis between issues to try to understand where an issue occurred. And then you can also have data producers looking downstream to see the blast radius of data issues: how many downstream consumers and reports have been affected.
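The downstream "blast radius" idea reduces to reachability in the lineage graph. Here is a minimal sketch, assuming lineage is available as a list of (upstream, downstream) edges; the table names are hypothetical, and real lineage tools derive these edges automatically from query logs and transformation code.

```python
from collections import defaultdict, deque

def blast_radius(edges, broken_table):
    """Given lineage edges (upstream, downstream), return every asset
    reachable downstream of `broken_table`, i.e. everything an incident
    in that table could affect."""
    downstream = defaultdict(list)
    for upstream, consumer in edges:
        downstream[upstream].append(consumer)
    affected, queue = set(), deque([broken_table])
    while queue:  # breadth-first traversal of downstream consumers
        node = queue.popleft()
        for child in downstream[node]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected

# Hypothetical lineage: raw tables feed staging, facts, a dashboard, and ML features.
lineage = [
    ("raw_events", "stg_events"),
    ("stg_events", "fct_sessions"),
    ("fct_sessions", "dashboard_engagement"),
    ("stg_events", "ml_churn_features"),
    ("raw_payments", "fct_revenue"),
]
affected = blast_radius(lineage, "stg_events")
# affected == {"fct_sessions", "dashboard_engagement", "ml_churn_features"}
```

The upstream direction works the same way with the edges reversed, which is what lets an analyst trace an anomaly back toward its source.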
Shane, in recent years a ton of data quality monitoring tools have emerged; there are around 30 to 35 data quality tools listed out there. And unarguably, Monte Carlo Data is doing an amazing job of leading this whole movement around data quality. I think you were one of the first to even start talking about this problem of data downtime. What, according to you, is your biggest defensibility when it comes to both your product and your approach towards solving this problem of observability?
In terms of the product, I'll tackle that first: the enablement we have and the time to value for teams, and that's driven by the machine-learning capabilities of the software. Our models are constantly improving, but the idea is that you can deploy Monte Carlo within 20 minutes, and that's typically the time it takes for companies to deploy it. Then within a week or two, you're collecting enough history on these pipelines, on how fresh they are, on the volumes you're expecting, on the schema, and on the distribution of fields, that you can start alerting on issues based on the machine learning we're doing in the platform. The one thing I've seen over and over again is that the time to value customers get from Monte Carlo, from implementation to an alert that is business-changing for them, is dramatic, and I think we're the number one tool in that regard. In terms of what we offer as a company beyond that, I just look at the range of customers. We have everything from startups through to enterprises, and the way we're able to help those customers is quite different. At the smaller end, you often have a FinTech or healthcare startup that is looking at data as a consumer product experience, and for them, their entire customer relationship relies on the quality of that data. We're able to work with them to make sure they're building up reliability with Monte Carlo and the checks needed to ensure they have a reliable, high-quality, and trustworthy product. At the other end of the spectrum, you have large organizations like Roche or Jaguar Land Rover, big enterprises that are on some form of data mesh or other organizational initiative to enable people throughout the organization. And I think Monte Carlo is uniquely positioned to be a strategic partner on that journey.
That's in terms of how you go about a data mesh within a large organization, and also having a solution that meets those needs and can scale and be adopted across an organization of that size, with different user groups: both the data engineers and the analysts and data scientists who might be writing more of the quality checks around data.
Yeah. Shane, you talked about the time to value, but how do you quantify the investment required to bring in a tool like Monte Carlo Data? By investment, I don't mean the commercial investment of buying the tool, but the people investment that needs to be made. From an organizational perspective, how do you quantify it?
Yeah, it's one of the reasons I love being in this space, because it is very quantifiable. First, one of the things we see, and we've done this through extensive surveys, is that data engineering teams are spending around 40% of their time on data quality issues. One of the earliest cases we make, and it's very clear to customers even in their first month of engagement, is the shift in time spent uncovering and resolving data quality issues. We measure time to detect and time to resolve, and you typically see that an incident that might take three days to resolve, and I'd say this is on the good side, goes down to less than one day. The engineering hours you're saving there can be a saving in terms of headcount, but can also be engineering hours you put into growth initiatives as opposed to maintaining a data platform. The second thing is how you think about resolving or mitigating the data downtime problem and providing greater uptime for your systems. Data downtime has been shown to have a dramatic cost on a business's ability to drive revenue. I saw this in my experience at the Times: the number of initiatives you can take on as a leader that drive the growth of the business shifts dramatically when you have a reliable and trustworthy data environment, whether you push it out internally to democratize data and make faster decisions within the business, or build consumer-facing machine learning products on top of it.
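The figures Shane cites lend themselves to back-of-the-envelope arithmetic. The sketch below plugs in his numbers (about 40% of engineering time on quality, resolution dropping from roughly three days to one); the team size, incident count, and eight-hour workday are hypothetical inputs for illustration only.

```python
def quality_time_and_savings(engineers, weekly_hours=40, quality_share=0.4,
                             incidents_per_month=10, days_before=3,
                             days_after=1, hours_per_day=8):
    """Estimate (a) engineer-hours per week currently spent on data quality
    and (b) engineer-hours per month reclaimed by faster incident resolution."""
    # 40% of the team's week goes to data quality issues.
    weekly_quality_hours = engineers * weekly_hours * quality_share
    # Each incident resolves (days_before - days_after) days sooner.
    monthly_hours_saved = incidents_per_month * (days_before - days_after) * hours_per_day
    return weekly_quality_hours, monthly_hours_saved

baseline, saved = quality_time_and_savings(engineers=5)
# A 5-person team: 80 engineer-hours/week on quality today,
# and 160 engineer-hours/month reclaimed if resolution drops from 3 days to 1.
```

The point of the exercise is the one Shane makes: the reclaimed hours are fungible, either as headcount savings or as capacity redirected to growth work.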
And within an organization, in your experience, who owns data observability?
Yeah, it's a good question, and I think it's one that's evolved. It's often the data platform leader, or the CTO. It can depend organizationally on whether you have the data platform under a CDO or a CTO; those are typically the customers and the immediate landing place of a data observability platform. Sometimes that owner is also responsible for the quality of data across the organization. At the Times, I was responsible for both the platform and the quality of the data we were delivering through analyst and data science projects. Sometimes you might have a business counterpart, a data leader on the business side who is the counterpart to the engineering or data platform group and is more responsible for data quality, and that could be the person we deal with. But I'd say typically we land in either a governance initiative or a data engineering platform initiative, and then expand from there.
Amazing, amazing. Before we wrap up our episode, per tradition, I have one last question. You have actively invested in data companies through Invest in Data. Tell us a little bit more about what kind of companies you are looking to invest in, and what kinds of problems within the whole data space you find interesting.
So I've been in this group, Invest in Data, for about a year, and typically we are looking around the modern data stack, because it's where so much transformation and growth is happening. It can range across how we more effectively manage data, and I think a space that's still really immature around the modern data stack is machine learning. Similar to what we talked about earlier, it's both the organizational change and the technology change around machine learning, but I expect that's a space where we'll see many vendor-driven solutions that allow people to scale machine learning opportunities at their company. And then I'm super interested in the space of data privacy as well as AI ethics, and I think we're seeing several solutions come into this space that are more about the operations of that: how you go from policy to actual technical enablement. So any range of solutions that help a data leader or a data platform leader on their journey is where we're focused.
And what would be the best way for any founder to come and reach out to you?
Yeah, you can either find me on LinkedIn, Shane Murray at Monte Carlo, or you can reach out to me... let's see, can we provide an address for the show?
We will put that in the show links. Perfect. And before we let you go, congratulations: you have a big conference coming up, the Impact Conference, late this October. Congratulations on getting guests like Daniel Kahneman, Ali Ghodsi, and George Fraser; you've got an amazing lineup, and I hope all of our audience and listeners take out time to attend that event as well. So congratulations on that, and thank you so much for taking out time for this episode as well, Shane.
Thanks so much, Aayush. This was so much fun.

Relevant Links