Mar 14, 2023 · 30 min

S02 E04: Legacy to Modern: Transforming Analytics Infrastructure with Ian Macomber, Head of Analytics Engineering & Data Science at Ramp

Ian Macomber, Head of Analytics Engineering & Data Science at Ramp, discusses the company's approach to automating finance tools and building the next generation of finance through data-driven decision-making. Macomber emphasizes the importance of cross-functional collaboration and of embedding the data team into every part of the product engineering process. He also highlights that data compliance and privacy need to be invested in every day rather than treated as a one-time effort. Macomber warns against "Layerinitis," where teams put code where it can be shipped fastest rather than where it belongs long-term, and advises celebrating the hardening of code and inviting people into codebases to teach them best practices.

Available On:
Spotify
Google Podcasts
YouTube
Amazon Music
Apple Podcasts

About the guest

Ian Macomber
Head of Analytics Engineering & Data Science

Ian Macomber is Head of Analytics Engineering & Data Science at Ramp, a company that's revolutionizing the way companies spend by helping them spend less. Ian brings a wealth of experience to the table, having previously led data teams at Wayfair and Drizly. He is known for his expertise in developing and improving analytics platforms and his ability to guide companies through fast-paced changes in compliance and security. He's also an MBA graduate of Harvard Business School.

In this episode

  • Data team at Ramp
  • Security and compliance aspects of Ramp
  • Transforming legacy systems into modern infrastructure
  • About 'Layerinitis'
  • Building a career in data

Transcript

00:00:00
Welcome to the Modern Data Show, where we explore the latest trends and technologies in data and analytics with some of the brightest minds in the industry. Today, we are excited to be joined by Ian Macomber, the Head of Analytics Engineering and Data Science at Ramp, a company that's revolutionizing the way companies spend by helping them spend less. Ian brings a wealth of experience to the table, having previously led data teams at Wayfair and Drizly. Ian is known for his expertise in developing and improving analytics platforms and his ability to guide companies through fast-paced changes in compliance and security. He's also an MBA graduate of Harvard Business School. We are looking forward to diving into Ian's insights on data-driven decision-making, modern BI stacks, and the future of corporate spending. So let's get started. Ian, thank you for being with us.
00:00:47
Thank you so much for having me. Looking forward to the chat.
00:00:51
Perfect. So, Ian, let's start with the first question. Can you tell us a little bit more about yourself, your role at Ramp, and precisely what you and your team are responsible for?
00:01:01
Yeah. Maybe I can start by talking a little bit about Ramp. It's a company that's hard to understand if you haven't used it. Very generally, Ramp is automating the finance stack for businesses right now, really building out the next generation of finance automation tools. So that's corporate cards, which is what we were first known for. Now it's expense management, bill pay, and accounting integrations, and very much a company that is designed around saving businesses time and money. That is truly what our vision is. That's how we make decisions. That's how we think about products. And largely this has been a company that's grown super quickly. What we're working on with the data team is figuring out how, as we go from one to ten to a hundred to tens of thousands of businesses, we can start to take that data exhaust and help Ramp internally make better decisions, but also, externally, help Ramp build products that we've now earned the right to build, with all the data that we have, to help save our customers more time and money.
00:01:59
Amazing. And how did that transition from the Harvard MBA to data and analytics happen?
00:02:06
Yeah. So the MBA was a bit of a detour. I started my analytics career at Wayfair. They had, at the time, legions of business intelligence analysts they hired out of school, and I was one of them. And that's really where I learned how a massive, soon-to-be-public, international company thought about data, really one generation of tech stack ago. So in my first couple of years, I had experience with on-prem hardware, things like Hadoop for clickstream data, and distributed computing, having issues where, once you got to data of a certain size, you either had to not do the analysis or leverage custom infrastructure. At Wayfair I got a little bit deeper into pricing and recommendation machine learning. And when I left, it was mostly because I felt I didn't want to get pigeonholed into the pricing machine learning space; especially for recommendations, you can spend an entire career on that. I went to business school with the intent that I would return to analytics. I probably went in thinking it would be at a mega tech company, like Facebook or Google, and came out wanting to lead a data team, and I had that opportunity at Drizly. Drizly was my first experience at a cloud-native company. It was founded in 2012 or 2013, never owned a server, and was entirely on AWS. And I really remember thinking, okay, this is not a culture of "we build everything in-house." It's a culture of having to be very precise. If you think about the modern data stack, I think what's valuable about it is that if you take someone who's pretty talented and can read some docs and hack at some stuff, and you give them a credit card, they can build most of an analytics stack, maybe in an afternoon. So that is really where I got deep into the modern data stack; I think I grew my career along with the trajectory of the modern data stack and these tools. Drizly was successful and the team grew substantially; there was a pretty big COVID boom, as it was alcohol on demand. And then I came to Ramp recently, and Ramp largely is a company that's also been pretty successful, and the same thing: I really wanted to inject data. I think the first year was building a lot of the data building blocks, things that are just foundational to the business, right? Like how many website visitors did we have? Building out some of our first multi-touch attributions, and being able to answer product analytics questions. And now what this next year is gonna be defined by is building things that are a lot more proprietary, building out things that there might not be a Medium article or a dbt Slack talk on, thinking about things that Ramp can build that are proprietary for the first time.
00:04:50
Amazing. And can you help us understand how Ramp's corporate card differs from traditional card offerings available in the market?
00:04:59
Absolutely. There are a ton of answers to this question, but I'll just start with a simple example. If you think about maybe five years ago, you might have had an American Express and Expensify. Think about spending money at a restaurant: you have to get the receipt from the restaurant, and you pay a SaaS fee for Expensify, where you input the receipt, which probably happens at the end of the month, and someone has to go through all of that and put it all together. Just the fact that these are two different tools really doesn't make any sense. Expensify doesn't necessarily have the context that you have swiped that credit card until they see the receipt. Amex doesn't have context on what the receipt is until you input it, so it's a very reactive way to run a business. Ramp's rise has been because it has become so easy to issue virtual credit cards. We use Stripe and Marqeta as issuing partners and can spin up credit cards much more proactively. I'll give you an example from an engineering offsite that we did in Miami. We can use an HRIS integration to figure out every single engineer, or person in the engineering org, and issue each of them a credit card for $1,000 to be used only on flights and transportation. All of that already has the accounting coding done. The card expires three days after the engineering offsite is over. So if you think about what this enables for the person who uses the card, it's a little easier: you don't have to do receipt matching. But if you think about what it enables for a finance team, it's actually tremendous, and there are two reasons why. The first is that all of this spending is proactively approved; you've set the maximum that can be spent on this engineering offsite when you issue the cards. Second, all of the accounting coding and context is encoded on the card. You know that this is for an engineering offsite; you don't have to go in and manually click. And then additionally, we have a lot of integrations, whether it's Gmail, whether it's Uber or Lyft, such that these receipts automatically flow in. So really, I would say it's a card that allows you to be very proactive and add context to every single dollar that you spend. For a general employee, it saves you a little bit of time on receipt matching and accounting coding; you don't have to do the thing where you chase down receipts at the end of the month. For a finance team and a CFO, it's a bit of a superpower.
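To make the constrained virtual card idea concrete, here is a minimal, self-contained sketch in Python. The types, field names, and values are invented for illustration only; they are not Ramp's data model or its issuing partners' APIs, and in practice the employee list would come from an HRIS integration rather than being hard-coded.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative only -- not Ramp's real card object.
@dataclass
class VirtualCard:
    employee_id: str
    limit_cents: int
    allowed_categories: list
    accounting_code: str
    expires_on: date

def issue_offsite_cards(engineering_org, offsite_end: date):
    """Issue one constrained virtual card per person in the engineering org."""
    cards = []
    for person in engineering_org:
        cards.append(VirtualCard(
            employee_id=person["employee_id"],
            limit_cents=100_000,                          # $1,000, approved up front
            allowed_categories=["airlines", "ground_transportation"],
            accounting_code="ENG-OFFSITE-MIAMI",          # accounting context lives on the card
            expires_on=offsite_end + timedelta(days=3),   # card dies 3 days after the offsite
        ))
    return cards

# Example usage with a hard-coded org list (stand-in for the HRIS integration).
org = [{"employee_id": "e-101"}, {"employee_id": "e-102"}]
print(issue_offsite_cards(org, offsite_end=date(2023, 3, 10)))
```

The point of the sketch is that the spend limit, category restriction, accounting code, and expiry are all set proactively at issuance time, which is what removes the reactive receipt-chasing work described above.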
00:07:26
Very interesting. Let's dive a little bit deeper into your team. Can you explain how your data team is structured at Ramp and how you have scaled this team so far?
00:07:35
Absolutely. So on day one for me at Ramp, we were a four or five-person data team at a company that had already raised at an $8 billion valuation, which is just nuts. It was the wrong size data team. And so we certainly had a lot of growth. I would say the answer for me is to assume, in perpetuity, that how you have set up your team is incorrect, or at least that if it is correct right this second, it will be incorrect in three months. So the number one answer I have is more about process than outcome, which is to create a culture of feedback where you understand where teams are being well supported, where they're not, where people are stretched too thin and where they're not. Ramp also really believes in tightly embedded teams of product, engineering, data, and design. So that's how we're organized. The data team rolls up through the CTO, and every single person on the data team is in one or two product and engineering pods. And so what this really means for me is that when I hire someone, it is my responsibility to have an opinion on what the best product pod is. I work, obviously, with the engineering, design, and product stakeholders to do that, and then largely what I do is put together a 30-60-90 for them. And the expectation is that 30 days into it, they're gonna rip that up, tell me why it's wrong, and tell me what they're gonna work on instead. So we really, at Ramp, value having data team members embedded quite tightly into engineering and product. We give them a ton of autonomy. It's very much a bottom-up process. And one other part of both our culture and how we're organized, really what I ask for from the team, is that they come up with their three top priorities for a product pod and cross out the bottom two. And that's really what we judge them on: did they deliver the number one thing that they needed to in that quarter? So, super generally, that is how our data team is organized.
00:09:38
Wow. And tell us a little bit more about these 30, 60, and 90 things. You said the first 30 days are about ripping apart whatever is already there; what are 60 and 90 for?
00:09:50
Yeah. So I'll give you an example. I think in all jobs, in an ideal world, 50% of it you come in knowing how to do very well and 50% of it you don't know how to do at all. And I think that's how you get a steep learning curve. That's how you inspire people. That's how you challenge people. That's how you get people to feel like they're progressing. Certainly, for me, I think I came into Ramp with a lot of experience in growth and product analytics and very little in terms of enterprise sales and especially risk, credit underwriting, and fraud. And when I think about 30-60-90s, I think we construct them in a pretty similar way, where we try to set up a pretty tightly defined project that we know needs to be done and know needs some data team support, essentially setting someone up for success on a project; that's a good thing to have in their 30-60-90. And then I'd say the other two things that I look for are a bigger, murkier problem space with a team that they can embed in. So, for example, working on price intelligence with this engineer and this product person, going to these meetings: what network can we create? What meetings can we put on your calendar? Who can you get to know? And then the last piece for me is my expectation that everyone on the data team also has the data team as a stakeholder. So really thinking about how we can have a team that raises the bar, how we can have a team that educates one another, how we can have a team that thinks it's its responsibility to accelerate everyone's career as data professionals. So largely, I think that's the type of culture I like to have, and really, in a 30-60-90, calling out a project that someone's gonna be successful on, a team that they're gonna embed within, and a surface area where they can raise the bar for the data team.
00:11:30
And talking more about stakeholders, how does the Ramp data team work with other teams like growth, go-to-market, and your risk and compliance team? One of the very common structures that we have seen across organizations is having a central data platform team that is responsible for the core data platform, and then you have federated data engineers who support different functions. So how does that structure work for you guys?
00:11:59
So I would say that really what we do is embed tightly into product engineering pods for every single part of the team, and that includes data platform. And so I'll give you an example of this. Like many data companies, we are excited to introduce lower-latency systems, streaming, Kafka, feature stores. All of these things I would consider to be data platform, the same way I would Fivetran, the same way I would Airflow or dbt. When we introduce them, we do not introduce them generally; we introduce them for a very specific and bespoke use case, with our heads up thinking about how it can generalize. So I'll give you an example of this. When we rolled out some of our data science stack for the first time, we didn't go across the entire company, survey all the data science projects, and think about what to work on. We were tightly embedded into one team and one use case, which for us is risk and credit. We said we are going to design a data platform and stack that works for this specific use case. And the data platform engineers who are going to work on this are going to show up to risk and credit sprint planning. They're gonna know a ton about how we underwrite. They're gonna get to know the engineers by name. They're gonna get to know the product managers by name. They're gonna be incentivized on these KPIs. So largely I'd say, for data platform stuff, the muscle that we've been working on is: if you can build something that solves 15 use cases, that's great, but let's be super precise about what the one use case you're gonna solve first is, and work full speed to set up your data platform to solve that use case first. Then figure out how it generalizes.
00:13:40
And I'm glad you mentioned risk, security, compliance, and credit. Unlike your previous experience with the data teams at Wayfair or Drizly, the situation at Ramp is probably different in the context of security and compliance. When it comes to data, you're dealing with financial transactions. What changed from your data leadership perspective, in terms of the tools that you operate and the way you run teams, to be able to take care of the security and compliance aspects of Ramp?
00:14:16
Yeah. So I don't think it's all that different from B2C, and there are two different counterpoints, right? We have banking transaction data. We have KYC/KYB bureau data. That stuff is highly sensitive. But if you think about Drizly or Wayfair, their customers are individuals, right? So I think from a consumer protection standpoint, it's a little bit different. The number one thing I would say is that the data team needs to lead from the front and get people extremely excited, cross-functionally, about data compliance, and raise that bar, because it's not something you can do on your own. If you think about a company, you're gonna have, whatever it is, a hundred, 200, 400 people that all have access to some level of private data. And if you think about compliance and privacy and culture, it is a bit like trust: something that's built in drops and lost in buckets. For us, we really have to think about every individual. We have to think about every surface area and vector. If you look at some of the stuff that the FTC has been doing in the US, they're prosecuting individuals as opposed to companies for the first time. That is deliberate. That is a point that they're making. And so this needs to be something where you work cross-functionally with your engineering stakeholders and say, hey, unlike other parts of tech debt across an org, this is not something we can pay down in two years. This is not something where we can focus on security and compliance two years from now, right before we go public or whatever the event is. This is something that we need to invest in every single day. I have found that the only way to get privacy, security, and compliance initiatives done is by working on them a little bit every month. Said another way: you never really want to be in a situation where you're asking, what are we gonna work on this month, the new product launch or privacy? Security and compliance need to have a strong working group that is enabled by the CEO, enabled by the CTO, something that you invest in, where every single month you say, what are the initiatives that we push forward this month? And I think if you do that for two years, you end up in a pretty good place.
00:16:26
Yeah, and does it change anything from the data engineering perspective? We are seeing a lot of data teams adopting various tools like data catalogues or metadata management tools, where the ultimate goal is broader collaboration across data assets, right? And this is pretty much contradictory to the goals you would have at Ramp in terms of data security and data compliance. So what changes from an engineering perspective?
00:17:00
What I have found, certainly, is that like water running downhill, people will do their work in the area where they have the most access. And so if you think about datasets, oftentimes they're replicated in multiple places, right? You have them in your core database. You might have something like Databricks also pointed at your core SQL database. You move that data via Fivetran to Snowflake. You move that data into Looker, right? I would say we have Okta-group-defined security access for Snowflake, and we have it for Looker, and we're really happy about that. But then the next question is how you can take that access level and have it permeate the entire org. Said another way: if someone can't look up a row of data in Snowflake and Looker, they should not be able to do it in Retool or some additional tool that's hooked up to a Postgres system. So this is something that we're working on right now: thinking about, if there is sensitive data in Snowflake, well, Snowflake didn't generate that row, it came from somewhere else. So how can we be proactive about moving upstream, communicating that we have identified sensitive PII in our data warehouse, figuring out where else it went and stemming it at the source, and making sure that whatever masking or data retention policies we have as a data team also apply to the rest of the company? Fortunately, we have some great privacy partners and engineering partners to work on this with us.
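As a rough illustration of the kind of masking being described, here is a sketch using the snowflake-connector-python library. The connection details, database, schema, role, and column names are made up for the example, and this is only one narrow piece of what a real access-control program involves; it is not Ramp's actual setup.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder connection details for the example.
conn = snowflake.connector.connect(
    account="my_account", user="data_platform", password="...",
    warehouse="TRANSFORMING", role="SECURITYADMIN",
)
cur = conn.cursor()

# A masking policy: only roles on the allow-list see raw PII; everyone else sees a mask.
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS analytics.policies.pii_email_mask
      AS (val STRING) RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('PII_READER') THEN val
        ELSE '***MASKED***'
      END
""")

# Attach the policy to a sensitive column so Snowflake enforces it for every tool
# that queries through Snowflake (Looker, notebooks, BI extracts, and so on).
cur.execute("""
    ALTER TABLE analytics.core.users
      MODIFY COLUMN email
      SET MASKING POLICY analytics.policies.pii_email_mask
""")

cur.close()
conn.close()
```

Policies like this only cover access paths that go through the warehouse, which is exactly why the answer above emphasizes moving upstream to the source systems as well.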
00:18:33
So you have had a couple of experiences in the past where you have taken completely legacy analytics infrastructure and made an overhaul to the modern data stack. How do you decide whether to build a solution in-house or purchase an existing tool that is out there in the market? How do you make that decision in the first place?
00:18:56
Yeah. So a phrase that I love is: build what is necessary and strategic, buy what is necessary, and ignore everything else. And the thought there, and this comes from Zack Kanter, is really that if you build something, it is not an asset, it is a liability. And that means it needs to be supported forever. And so really thinking about: what things that my team builds can I honestly believe will improve faster within Ramp's four walls than anywhere else? And I'll give you an example of things that aren't that. The ability to move data out of Google Ads into Snowflake is not a core competency of Ramp. It is something Fivetran is working hard on every day with their engineering team. If I take that dependency on Fivetran, I can assume that it will be more stable a year from now than it is today. That is not the case if I build that pipeline in-house; if I build that pipeline in-house and the data engineer leaves, I can assume that the pipeline will be less stable a year from now. Some other examples: if you're going to issue a credit card, take a dependency on Stripe and Marqeta; they will be better at issuing credit cards than you will be a year from now, because that is their business model. And I think, largely, that focuses you on saying, what do I need to do better than any company in the world? And that's really what we try to emphasize building.
00:20:30
And assuming the case where you decide to buy an existing data tool, what's typically the process for you guys? I think this question might be very helpful for a lot of data companies and data startups who are looking to provide solutions to companies at Ramp's scale. So what's your typical process? And by process, I would love to understand how you evaluate tools. I'm sure, unlike a lot of companies, and correct me if I'm wrong, you wouldn't have a standard procurement process in place; it's still very much team-driven rather than procurement-process-driven. So how does that process look for you?
00:21:12
Yeah. So I will even start before we decide to buy a tool or even try it out. One piece of advice that I have been given that's stuck with me is just the importance of asking for help. All of these tools in the space move so quickly; there is no way that I should know about all these tools. And that's true for everyone, right? And so I think sometimes my team thinks of me as someone who is extremely well-networked in the data industry. That is not true. My network is what I would call just-in-time. And I'll give you an example. We recently evaluated a tool called Metaflow, which is open-source data science tooling; the for-profit open-core company behind it is Outerbounds. I joined their Slack channel, I DMed their CEO, and I said, we want to know more about this tool, we have some questions. And he made time for me. And I think that is one of the coolest things about whether it's data Twitter or these Slack channels: it's a very open community and you can ask for help. This specific guy, Ville, set us up with probably two or three people that have adopted Metaflow. We heard some phenomenal things about it. We heard about some things that are growing pains for it. We learned a ton about it. So I would say, before even really hopping into anything around procurement, that's where I start: figuring out who can help me in the space and trying to get some time on their calendar. I think you'll be really surprised at who raises their hand and gives you great advice if you ask for it.
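For readers who haven't seen Metaflow, it structures data science work as Python classes made of steps. The toy flow below is a generic illustration of what that looks like, not Ramp's evaluation code.

```python
# hello_flow.py -- a toy Metaflow flow; run with `python hello_flow.py run`.
from metaflow import FlowSpec, step

class HelloFlow(FlowSpec):

    @step
    def start(self):
        # Attributes assigned to self become versioned artifacts tracked by Metaflow.
        self.numbers = [1, 2, 3]
        self.next(self.total)

    @step
    def total(self):
        self.result = sum(self.numbers)
        self.next(self.end)

    @step
    def end(self):
        print(f"sum = {self.result}")

if __name__ == "__main__":
    HelloFlow()
```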
00:22:45
Oh, that's lovely advice. And so would it be fair for me to say that having a community around a data product is a big consideration for you while evaluating these products?
00:23:02
I think it is. And I think some of the best advice or context I've heard on this is a podcast with Jeremiah of Prefect and Chetan of Benchmark on Invest Like the Best. They talk a lot about open-source and open-core products. Especially for an open-source project, it is exciting and it is terrifying if you work either as an investor or as an employee at that company, because everything you're doing is in public: you can see if people are engaging in your Slack community, you can see if people are downloading your stuff on GitHub or not, you can see if people are upgrading to the paid version. And I think an engineering question that I quite like in an interview is, how do you know which AWS services to use? Because, pretty famously, they have over 200 services now, and some of them are the building blocks of tech and others have quietly fallen by the wayside. And I think it's similar here: it makes more sense to take a dependency on something you think is going to be around two years from now and is gonna be better tomorrow than it is today. And evaluating the community around a tool, I think, helps you make those calls.
00:24:15
Right. And in one of your articles you mentioned this term, "Layerinitis". It was very interesting for me to see that term. Can you tell the audience what exactly that is and how you tackle it at Ramp?
00:24:30
Yeah. So this is a concept that is entirely copied from, and I might butcher the name, Jean-Michel Lemieux, a Shopify VP; he had a great Twitter thread about it. The observation that he had is that, by default, teams put code where they can put it fastest, as opposed to where it goes when considering the long-term effect on the overall system. I think, unquestionably, this is something that I've seen at all stops on my data journey, which is that people will use your work. People will take your output. They will transform it for their specific use case. That is exciting; it is phenomenal that they're doing that. You should always have that ability, but you need to think about what that pull is, that gravitational pull towards centralization and building production-grade stuff. And so for us, like many teams, we have dim and fact tables, we have great modelling, we think about Kimball, and then we have additional surface areas where people can say, I'm gonna iterate on this stuff, I'm gonna build things in my schema, I'm gonna build things in notebooks. And actually, before you know it, and this happened, Ramp had one of our first account takeover events. We had people stay up late, and they were doing incredible research on account takeovers, and it was all in notebooks. To a large extent, Ramp's entire account takeover fraud program for two weeks was running out of one analyst's notebook on their schema. It was like a branch off of our repo, and it had a ton of commits because they were iterating quickly. And all of that was incredible. That's exactly what we wanna enable, because fighting fraud is oftentimes like fighting fires: if you come up with a solution two weeks later, it doesn't matter, the house has already burnt down. So we enabled that person to iterate quickly, but then we thought about all of this logic, everything that this person had built that will benefit many teams at Ramp, living in a pretty siloed area in terms of both visibility and dependencies; we don't know what dependencies have been built on top of it. And so thinking about how we can invite that person into our codebase, how we can show them how to commit, how we can help them on this journey, how we can take that work back and distribute it to the entire company in a version-controlled way: that is how you solve this Layerinitis problem. And the number one and two things that I think are important are, one, a culture of celebrating hardening code, right? Because it is always easy to say, I would rather focus on the next thing; I wanna do the next analysis, I wanna build the next model, I wanna do the next product launch. It's harder to say we are going to take some of our work, refactor it, and build it the way it should be. So that's important. And the other is a culture of really inviting people into codebases. I'd say this is probably the number one thing I've changed my mind on in the last year: one year ago, I would've said the dbt repo is for analytics engineers and analytics engineers only. That is a recipe for people putting business logic in other parts of the stack. So for me now, it's really about how I can invite people in, how I can have them do work that adapts well to dbt, how we can teach them a little bit about dim and fact tables, how we can teach them about modelling, how we can think about making things accessible in Looker for the entire company, as opposed to in a super custom Databricks notebook with 250 or 600 lines of code.
So largely, those are the two ways we try to combat Layerinitis: a celebration of hardening systems, and inviting people into codebases and teaching them our best standards.
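To make "hardening" concrete, the sketch below shows the general move from a one-off notebook heuristic to a small, version-controlled, testable module. The module name, rule names, and thresholds are invented for illustration; they are not Ramp's actual account-takeover logic.

```python
# fraud/account_takeover.py -- hypothetical module; thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class LoginEvent:
    user_id: str
    ip_country: str
    device_is_new: bool
    failed_logins_last_hour: int

def is_suspected_takeover(event: LoginEvent, home_country: str) -> bool:
    """Flag a login as a possible account takeover when 2+ risk signals fire."""
    signals = [
        event.ip_country != home_country,      # login from an unexpected country
        event.device_is_new,                   # never-before-seen device
        event.failed_logins_last_hour >= 3,    # burst of failed attempts
    ]
    return sum(signals) >= 2

# tests/test_account_takeover.py -- the payoff of moving out of the notebook:
# the logic is importable, reviewable in a pull request, and protected by tests.
def test_flags_foreign_ip_on_new_device():
    event = LoginEvent("u-1", ip_country="BR", device_is_new=True, failed_logins_last_hour=0)
    assert is_suspected_takeover(event, home_country="US")

def test_allows_normal_login():
    event = LoginEvent("u-1", ip_country="US", device_is_new=False, failed_logins_last_hour=0)
    assert not is_suspected_takeover(event, home_country="US")
```

The same idea applies whether the hardened home is a Python module like this or a dbt model: the logic leaves the analyst's personal schema and gains visibility, review, and dependency tracking.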
00:28:16
Wow. Before we wrap up today's episode, Ian, a last couple of questions before we let you go. What is the number one piece of advice that you would give individuals who are looking to build a career in data and analytics?
00:28:32
I think the number one piece of advice I would give is this: no one will ever stop you from learning something. And it is really surprising the extent to which you can become an expert on something in two weeks, or potentially even one weekend, of grinding on a new tool or package. No one will stop you from doing this. No one will tell you how to do this. But I think the number one thing is just saying, instead of wanting to know about Docker and watching some YouTube videos, learn Docker. If you know how to build a Jupyter notebook, you can go from building a Jupyter notebook to building a Flask app that is hosted as a service in a weekend, if you put your head down and grind on it. So that's just the number one thing I'd say: you can always read the documentation, no one will ever stop you, you can always figure something out quickly, but you have to invest in putting that time in.
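As a sense of scale for that notebook-to-service jump, here is a minimal Flask wrapper around a model trained in a notebook. It assumes you've already exported a scikit-learn-style model to a pickle file; `model.pkl` and the request shape are placeholders, not a prescribed interface.

```python
# app.py -- a minimal Flask wrapper around a model trained in a notebook.
# Run with: python app.py   then POST JSON to http://localhost:5000/predict
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

# The model artifact is assumed to have been exported from the notebook,
# e.g. pickle.dump(model, open("model.pkl", "wb")).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]     # e.g. {"features": [1.0, 2.0, 3.0]}
    prediction = model.predict([features])[0]      # scikit-learn-style predict()
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```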
00:29:26
Nice. And last question: how can the audience learn more about Ramp and the work that you guys are doing? What would be the best way to keep updated with your work?
00:29:37
I would say certainly Ramp's website, but then also Ramp's engineering blog. We had a phenomenal post recently by Kevin, who leads our data platform team for analytics engineering, talking about how Ramp saves money with some of our CI/CD processes. Certainly, Ramp is all about saving companies time and money, and hopefully we can contribute back a little bit to the analytics engineering community as well. So I'd say, yeah, start with the analytics engineering blog post on Ramp.
00:30:00
Perfect. So thank you so much, Ian, for your time on this episode. I'm sure our audience will have learned a lot and received a lot of insights that will help them in their day-to-day work. So thank you, thank you very much for being a part of the show.