Sep 20, 2022 · 36 min

S01 E02: The third wave of data technologies with Mahdi Karabiben

The data space has grown rapidly in the last few years. To put this progress in context, Mahdi divides it into three waves: the first was the era of ETL, OLAP, and relational data warehouses; the second was the era of scalability; and now we have entered the third wave, the Modern Data Stack. Join us for this conversation, in which Mahdi talks about various aspects of the modern data stack such as ETL, CDC, data observability, and much more.

Available On:
Spotify
Google Podcasts
YouTube
Amazon Music
Apple Podcasts

About the guest

Mahdi Karabiben
Senior Data Engineer

Mahdi is a senior data engineer at Zendesk who enjoys working with data and building scalable data platforms. Zendesk is a leading customer service software company that was founded in Copenhagen, Denmark in 2007. At Zendesk, Mahdi is responsible for building data products and enhancing the existing data stack to help internal teams access product data at scale. He loves to contribute to open source, and recently Mahdi also authored a live project series on 'Building end-to-end batch data pipelines with Spark' with Manning Publications.

In this episode

  • Mahdi’s journey from Democracy International to Zendesk
  • Data stack and data team structure at Zendesk
  • Building and managing custom connectors
  • Building Vs buying data tools
  • The third wave of data technologies

Transcript

00:00:00
Hello, everyone. You're listening to another episode of the Modern Data Show. Today we have Mahdi Karabiben joining us from the beautiful city of Paris. Mahdi is a senior data engineer at Zendesk who enjoys working with data and building scalable data platforms. Zendesk is a customer service software company that was founded in Copenhagen, Denmark in 2007 and has since grown to over 5,400 employees serving 160,000 paid customer accounts in 160 countries, powering billions of conversations with hundreds of millions of customers over telephony, chat, and many more channels. At Zendesk, Mahdi is responsible for building data products and enhancing the existing data stack to help internal teams access product data at scale, and he loves to contribute to open source. Recently, Mahdi also authored a live project series on 'Building end-to-end batch data pipelines with Spark' with Manning Publications. Mahdi started his career at Democracy International, helping Tunisian ministries work with electoral data, and then joined an AdTech firm, Numberly, to revamp the data architecture of one of its core products, followed by stints at Credit Agricole and FactSet. Welcome, Mahdi. We are super happy to have you on our podcast, and thank you for joining us.
00:01:16
Thank you, Aayush. Thank you for having me. I'm very happy to be on the podcast, and thanks again for the invite. I'm looking forward to talking about data and data stacks.
00:01:26
Amazing. Amazing. So before jumping into anything specific, Mahdi, let's first start with your work with Democracy International. How did that even happen?
00:01:35
Yeah, I mean, it was around six years ago. I was still a student. Basically, Democracy International is an American NGO that mainly works with governments on trying to democratize access to data. You know, data that comes from ministries and from official sources is usually not very easy for any citizen to understand and comprehend. So Democracy International basically tries to democratize access to it by building interactive charts and dashboards that are easily accessible, and by building UIs for open data platforms that allow people to easily understand those data sets. They reached out when I was still a student because I had already started being active in some open source projects. They said, hey, we have this project with the Tunisian government to build platforms and UIs for different open data sets. That was mainly what I worked on: revamping that type of data and thinking about how any citizen can access it in a very efficient, interactive, and intuitive manner to get a full picture of the data, whether it's about municipalities or about how the government is doing on budgets and so on. So it was a lot of fun, and it was my first dive into data, exploring different data sets, data quality issues, and so on. It was quite interesting.
00:03:03
Awesome. Awesome. And we talked about this when we met in person last month, but tell us a little bit about your journey from Tunisia to now in Paris.
00:03:13
Yeah, I think it was quite interesting for me. Since I started working with Democracy International, it was clear that I wanted to work in data. I enjoyed the work so much. I enjoyed working with the data, whether it's modifying it, trying to build dashboards, and so on. It was really interesting work for me. So for my end-of-studies internship, I wanted to work with as much data as possible and try to work on other parts of the stack, whether it's the data warehouse or building data lakes, just trying to explore that whole ecosystem, and an opportunity with Numberly was available. Numberly is basically an AdTech company based in Paris. They work on most of the marketing campaigns that are done here in France, whether it's via phone, SMS, or emails, or even publications or ads on websites. For that, we used a lot of cookie data and a lot of data from external sources. They have a large Hadoop cluster, and the project was very interesting. It was about migrating existing pipelines that were built with Hive to Spark and Airflow. So there was a lot of work to be done, and also a lot of optimization and going into very interesting parts of how Spark actually executes queries and so on. That project is what led me to come to Paris. Since then I have remained in Paris, and I have remained working with data.
00:04:46
Amazing. Let's talk a little bit about Zendesk now. Help us understand: how is the data organization at Zendesk structured, and how does data work really happen at Zendesk?
00:05:02
Yeah. I'd say it's the typical path of any tech-oriented company that grew really quickly. You go through a period of very quick growth in which there are acquisitions, new organizations, new ways of working with data, and so on. So the whole tech stack grows in different directions very fast, and just keeping track of it becomes a very hard task. It's the same for data. Zendesk is by nature a very data-oriented company, and all the decisions are very data-driven. Actually, I was surprised when I joined by how data-driven the company is. Every opportunity to leverage data in decision-making is taken; data is never left sitting there, available but unused. With that, we eventually ended up with many data teams, some of them created organically, some via acquisitions, and so on. There are three main teams within Zendesk. The first is Zendesk Explore, which is a Zendesk product that allows Zendesk customers to have analytics and reports on how their teams are using Zendesk and how many tickets are created. Behind that there is a dedicated data engineering team that works on offering those capabilities and data apps to Zendesk customers. Then there is Foundation, the engineering team that works on the foundational data platform, which is mainly for product data: how product teams publish their data, how data is stored, how it's processed, and then how it's made available to internal teams within an internal platform. The last team, which is the team I'm part of, is EDA, Enterprise Data and Analytics. We work on building curated data sets and on the Zendesk data warehouse, working with internal stakeholders to offer them curated data sets and access to data, whether it's for business purposes or for product analytics and so on. So we have many data domains that we own.
For example, take finance data: how is the company doing financially? Or take product data: how are users actually using the features that we release? How are users interacting with different features of Zendesk products? That helps product teams better organize their roadmap and better think about the features they are releasing. So in total, we're currently working on six data domains within EDA, but we're a very small part of the big picture when it comes to data at Zendesk, which is Explore, Foundation, and then EDA.
00:07:37
We saw a similar org structure when we spoke with Canva. They have a very similar federated structure, especially when it comes to the data team. So that's super cool to hear. Taking this a little further, help us understand what the Zendesk stack looks like. What kind of tools and platforms do you guys use internally?
00:07:59
Yeah, honestly, when I joined, I was very surprised to see this because, again, there are many data orgs and many people within the company thinking about data and about how to build a platform and make it as accessible as possible. So it goes in many different directions, but as you said, the main component is the central foundation data platform, which is on AWS, on S3. It has two components: one is the data lake and the other is the data hub. The data lake is basically datasets on S3 that are managed by Apache Hudi and offered for consumption with Athena. So you can query it via Athena, and the tables are in the Glue catalog. It's the typical AWS stack for managing a data lake. That data comes in via change data capture from the product databases: from the binlogs you extract all the changes, and the events are pushed to the data lake via Kafka. So it's a typical CDC process, in which the advantage of using Hudi is that we can manage the data at the row level. For GDPR, for example, and for other reasons, Hudi offers capabilities to ensure that you can delete one specific row without any large repercussions, and you can modify row-level information in a very efficient manner. We also use Lake Formation to manage access to that data, with row-level and column-level access, for security and for data governance. Then there is the second part, the data hub, which is part of an initiative called Platform Data Architecture. The idea here is that product teams publish the data on their own. Product teams publish events via Kafka, then the events get consumed via Flink, and they get pushed to S3, where again you have tables that are managed by Apache Hudi. This is the direction the company is going in.
So the idea is to get product teams to publish their own data. That makes it easier to establish data contracts and to define standards for how to publish that data: what are the standards that every data asset published on PDA should meet? It draws heavily on data mesh principles, in which product teams own their data and publish the data that their applications generate in a very standardized manner. That's the path the company is taking. There is a very interesting blog post, published very recently on the AWS tech blog by the head of foundation engineering, that goes into the details of this platform and how the initiative is implemented on AWS. So for a very deep dive, I totally recommend that post. It goes into the details of how this came together and how the platform is doing now.
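As a rough illustration of the data contract idea Mahdi describes, a producer-side check could validate each event against an agreed schema before it is published. This is only a sketch under stated assumptions: the `FieldSpec` type, the `ticket_created` contract, and its field names are hypothetical and are not taken from Zendesk's platform.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One field in a (hypothetical) data contract."""
    name: str
    type_: type
    required: bool = True


# Hypothetical contract for a "ticket_created" event; field names are illustrative only.
TICKET_CREATED_CONTRACT = [
    FieldSpec("ticket_id", int),
    FieldSpec("account_id", int),
    FieldSpec("created_at", str),
    FieldSpec("channel", str, required=False),
]


def validate_event(event: dict[str, Any], contract: list[FieldSpec]) -> list[str]:
    """Return a list of contract violations; an empty list means the event conforms."""
    errors = []
    for spec in contract:
        if spec.name not in event:
            if spec.required:
                errors.append(f"missing required field: {spec.name}")
            continue
        if not isinstance(event[spec.name], spec.type_):
            errors.append(
                f"field {spec.name} should be {spec.type_.__name__}, "
                f"got {type(event[spec.name]).__name__}"
            )
    # Fields that the contract does not know about are also reported.
    known = {spec.name for spec in contract}
    for name in sorted(event.keys() - known):
        errors.append(f"unexpected field: {name}")
    return errors
```

A producer would call `validate_event` before publishing and refuse to emit events that return a non-empty list. In practice this role is often played by a schema registry rather than hand-written checks.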
00:10:57
Sure. We will post the link to the blog along with the episode notes, so thank you so much for that. Mahdi, you touched on this a bit: you have the change data capture process that publishes those key product events into your data lake, managed by Hudi and queried with Athena. Do you have your own custom-built CDC infrastructure, or are you using something like Debezium?
00:11:26
Yeah, for that we're using the AWS product. It's called DMS, if I'm not mistaken. Zendesk data is on around 2,000 Aurora MySQL databases, and to get the CDC from all those databases we're using DMS, which captures all those changes and pushes them into dedicated S3 buckets, where they are managed by Hudi and where you can access the data via Athena. That's the process on the foundation side, and that's how the product data gets into the data lake. All of that happens before the team I'm part of, EDA, even starts operating on the data. Our team consumes the data from the data lake, and then we get it into our own data warehouse, which is on BigQuery. That's where we add other sources to it. What we get from AWS is mainly product data, but you can imagine that you want to add other sources to that: data from your CRM, data from your billing system, even data from your HR system. So you have a very large number of systems whose data you want to add to that product data, and that happens on BigQuery, in the data warehouse. We move data from AWS into BigQuery using Kafka again. The company as a whole is very focused on event-driven architecture, so you very rarely see data moving within Zendesk without Kafka in between or without relying on events.
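To make the CDC flow described above more concrete, here is a minimal sketch of applying an ordered stream of change events to a table keyed by primary key. It is plain Python rather than the DMS or Hudi APIs, and the event shape (`op`, `key`, `row`) is an assumption chosen for illustration. It shows why row-level upserts and deletes matter, for example for a GDPR erasure request.

```python
from typing import Any, Iterable

Row = dict[str, Any]


def apply_changes(table: dict[int, Row], changes: Iterable[dict[str, Any]]) -> dict[int, Row]:
    """Apply ordered change events to a table keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            # Upsert: merge the new column values into the existing row, if any.
            table[key] = {**table.get(key, {}), **change["row"]}
        elif op == "delete":
            # Row-level delete, e.g. for a GDPR erasure request.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table
```

The order of events matters: replaying the same changes in a different order can yield a different final table, which is one reason log-based CDC reads changes in commit order.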
00:13:07
You talked about consuming the product data using CDC, and you just mentioned that there are other data sources as well. Have you built your own custom connectors to pull data from these sources, or are you using off-the-shelf tools like Airbyte or Fivetran?
00:13:28
Yeah, it's mostly custom connectors because, again, with the company being very data-driven, we had to think about those needs and those initiatives before they were, let's say, democratized and all companies started to implement them. When you start thinking about it in 2017, for example, you have very few options to pick from, and most of those options at the time didn't have all the functionality you were looking for. So if you wanted to do that six or seven years ago, you had to build a lot of stuff on your own. That was the case for Zendesk. A lot of it is internal components that run on Kubernetes and that do a lot of what can be done today with Airbyte or Fivetran. But for us, we had to start from scratch for most of it. So it's mostly internal components.
00:14:18
Amazing. And what kind of data volumes are we talking about? I would assume that, given the huge scale of Zendesk, they would be huge. Any sense of the data volumes you're dealing with?
00:14:29
Yeah, it's petabytes. As you said in the intro, Zendesk has, I think, more than 100,000 customers, and you can imagine the number of tickets that are being created on a daily basis. So we have billions of tickets. Users usually have a lot of complaints about the companies they work with, so a lot of support tickets get created, and yeah, it's billions and billions of tickets. The total amount of data is in the petabytes. And again, we're talking about the product data that gets into the data lake. A portion of that is what comes into the data warehouse, and in the data warehouse we add yet more sources, but just the product data created by Zendesk products is in the petabytes.
00:15:15
Got it. One of the key things that we keep hearing in the data space, and which is one of the pitches for most of these ETL companies, is that these connectors are hard to maintain. They break, API specifications change, and tons of things keep happening. You're managing this internally, so how often do you face these challenges, where some connector breaks and your pipelines are broken? How often does that happen at Zendesk, and what do you do to solve it?
00:15:51
Yeah, it does happen quite often because, as you said, when you work on data integration, you may start with maybe five data sources, but in one year those five data sources will become 20. So you can't build something without thinking about making it flexible and resilient, because things will break. Things will change. And you will also add new types of data assets. So yeah, we do encounter those problems quite frequently. The idea is to ensure that you have logging and monitoring, so that if something breaks, you are aware of it immediately and you can go and fix the issue right away. Sometimes it's due to an API change. Sometimes the API is down for a period of time because that data source is, let's say, doing maintenance or updating something. And it's very hard to ensure communication about it and to get the information when you need it when there's that type of change. So let's say the brute-force way of ensuring that, at least when something goes down, you're aware of it is to make sure that you're logging everything and that you have monitoring and alerting. Whenever something breaks, we immediately get an alert on Slack that says, hey, this task failed, and you can immediately go and see why.
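The "log everything and alert on failure" approach can be sketched as a small wrapper around each connector task. This is an illustrative sketch, not Zendesk's implementation: `run_with_alert` and the `notify` callback are hypothetical names, and `notify` stands in for whatever actually posts to a chat channel (for instance, a function that calls a Slack webhook).

```python
import logging
from typing import Callable

logger = logging.getLogger("pipeline")


def run_with_alert(
    task_name: str,
    task: Callable[[], object],
    notify: Callable[[str], None],
) -> bool:
    """Run a task, log the outcome, and send an alert if it raises."""
    try:
        task()
    except Exception as exc:
        # Log the full traceback for debugging and push a short alert to humans.
        logger.exception("task %s failed", task_name)
        notify(f"Task {task_name} failed: {exc}")
        return False
    logger.info("task %s succeeded", task_name)
    return True
```

Passing the notifier in as a callable keeps the wrapper testable without network access. Orchestrators such as Airflow offer failure callbacks that serve the same purpose.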
00:17:13
Tell us a little bit about how you have built observability around these processes. We are seeing two waves of tools coming into the data observability space. One is around data at rest: once you have your data in the data warehouse, you have a set of tools that help you maintain or check the sanctity of that data, tools like Monte Carlo Data and Anomalo, and there are tons of tools out there. Then there is also a set of tools that are focused more on observing data in motion, which means observing the data pipelines. Do you guys have any such tools, or is that something that is developed in-house?
00:17:59
Yeah. For the data quality itself, we worked a lot on that in the past year, especially on data at rest and the monitoring and observability on top of that. Some of it is in-house processes because, again, very large parts of the stack are built in-house. We use Airflow as our orchestration engine, but a lot of components, and even how we use Airflow, are very customized to our own platform. So we also have customization on data quality. As soon as we receive the data, and as soon as it lands in the raw layer, we perform data quality tests on it to ensure that the data meets the standards we defined, and only then do we move it to the next step of the workflow. In terms of checking data quality and observability while the data is still in the streaming process, still in movement, that's something we're still working on, at least in the parts that I'm aware of. Currently we're mostly monitoring breaking changes, so if something breaks, we're aware of it immediately. Catching all potential issues as soon as they happen is a very big effort, so I think the bare minimum is that if something breaks, you're aware of it immediately.
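The pattern of testing data in the raw layer before moving it on can be sketched as a simple quality gate. The check names and the `ticket_id` column used in the usage note are hypothetical, and a real setup would more likely run such checks as SQL or through a dedicated data quality tool.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[list[Row]], bool]


def not_empty(rows: list[Row]) -> bool:
    """The batch must contain at least one row."""
    return len(rows) > 0


def no_nulls(column: str) -> Check:
    """Build a check that fails if any row has a missing or None value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def unique(column: str) -> Check:
    """Build a check that fails if `column` contains duplicate values."""
    def check(rows: list[Row]) -> bool:
        values = [row.get(column) for row in rows]
        return len(values) == len(set(values))
    return check


def run_quality_gate(rows: list[Row], checks: dict[str, Check]) -> list[str]:
    """Return the names of failed checks; an empty list means the batch may be promoted."""
    return [name for name, check in checks.items() if not check(rows)]
```

A workflow step would call `run_quality_gate` with, say, `{"not_empty": not_empty, "ticket_id_not_null": no_nulls("ticket_id")}` on the raw batch and only trigger the next step when the returned list is empty.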
00:19:22
Understood. Mahdi, you talked about the ETL process at Zendesk. Now, keeping Zendesk aside, as a senior data engineer, what are your thoughts on open source versus commercial solutions for ETL? There's a new wave of open source ETL platforms coming up, and the basic pitch is that by democratizing connectors, they can build the long tail of connectors that a single commercial vendor can't afford to maintain. So what are your thoughts on that, and how should a company earlier in its journey go about choosing between a commercial solution and an on-prem or open source solution like Airbyte or Meltano?
00:20:17
Yeah. I mean, that's the eternal debate of build versus buy. I think it's a very delicate equation. You can't have an absolute right answer. It'll always depend on your use case. You just have to keep in mind that, yes, open source is free, but you also need an engineer to actually implement what's open source. You don't just implement it and deploy it; you have to maintain it, and you have to make sure that you can afford the resources and the time to maintain that product. On the other hand, you have the financial requirement of buying something from a vendor, which will allow you to put your engineering resources into something else. So it's a very delicate equation, and it changes very easily: a lot of factors define the fine line between one option and the other. In general, for companies that are just getting started, if you have very limited data engineering resources within the team, you might as well put those resources into actually producing value from the data, which means ensuring that you can build data products. You can deliver data assets, whether it's data apps or just curated data sets, or allow your business users to access the data in a very efficient manner. That means that for the more abstracted parts of the stack, where you're getting the data from different sources, you can start by getting a vendor to deliver that capability for you, and you get your team to work on the more business-specific areas. For example, you have a lot of business logic in data transformation. You have a lot of business logic in how you deliver the data, and that's where you need people who are aware of all the constraints of your own team. But moving data from place A to place B is usually not that specific to your case.
So if there's a vendor that offers all the connectors you need, you might as well start with that, and eventually you can move to something that you build in-house or to an open source project, if you can afford those resources. But I think choosing an open source project or a vendor based on the number of connectors isn't always a good metric, because you wouldn't need 150 connectors when getting started. As a small company, you would need maybe 10 basic connectors that most tools would offer. And if you have maybe one or two niche tools that you're using, the cost of adding connectors for those tools, no matter which platform you're using, is very low compared to the effort you would need to implement the platform as a whole. So I think that's the wrong metric to look at, especially if you're just getting started. I think you need to list what you can afford to spend in engineering time and ask: can I dedicate this time to building a capability that won't deliver immediate value to my customers? Or should I work with a vendor, at least to get started, and then eventually, maybe in the mid-term, think about what would be the most efficient solution for me in the long term?
00:23:21
So I saw in one of your articles that you mentioned that data technologies are going through a third wave. Tell us more. What do you mean by that?
00:23:30
Yeah, I mean, I think most data engineers who are working on the current data stack started with the Hadoop era and the Hadoop ecosystem, which, when it started, was the second wave of, let's say, data technologies, because people started working with data more than 20 years ago. The notion of the data warehouse, the notion of ETL, and so on have been here for decades. Initially it was a lot of heavy, slow processing to get the data into a consumable state. And it had certain limitations, like the size of the data and the size of the resources you could spend, because the data warehouse is something that lives within your own infrastructure, and you have to maintain it. If you want more data, you need to buy new disks and so on. Then, with Hadoop, came the second wave: the notion that you can actually do this at scale. You no longer have the limitation of one data warehouse or one instance of a data warehouse. You can have an infinite number, or at least a very large number, of disposable machines on which you can run data processing, store your data, and do it at scale. That was a very important point for a lot of companies, who said, hey, all of this data that I can leverage is something interesting. It's a big asset for me. So a lot of companies started betting on Hadoop, trying to build that whole data platform internally. Hadoop offered solutions for storing data at scale and processing it at scale with Spark, but it was still very complex. It was still a long process to get up and running. And once you do, you have new questions. How do you ensure data quality at scale? How do you ensure that the data you're offering at scale is accessible, democratized, and easily queried by different types of users?
Some of them want to use notebooks, some want to query data in an interactive manner, and some want dashboards on top of your data. How do you offer that? Then the third wave, with the modern data stack, goes even further. Instead of just focusing on those core capabilities, which are now available, it's tackling those smaller features. It's like, okay, you don't need to worry about data quality: you now have a tool that will do data quality for you. You don't need to worry about making the data on your distributed platform accessible: you have a data catalog that will do that. You don't need to worry about capturing metadata: you have a tool that will collect metadata for you and make it accessible. So each phase builds on top of the previous one. The modern data stack is basically building on top of what we started in the Hadoop era and making the hard bits, the stuff that was complex to implement in Hadoop, much easier, because now there are managed services that you can get out of the box, and you just need to plug your data into them.
00:26:32
One other thing that we have seen happen along with the evolution of the modern data stack is the shift from ETL to ELT, where you push all the data into a data warehouse or a data lake without transforming it. It just sits there until someone needs it, and practically that data is of no use until then. And we are all sold on these cloud data warehouses, like Snowflake, where storage is cheap and computation is expensive, so you only pay for compute. What impact do you think this has on overall modeling and governance around data? Because now you don't think much about creating data and putting it into a data lake; you only worry about making it consumable when you get to the compute part. What are your thoughts on that?
00:27:24
Yeah, I think it's a double-edged sword because it makes things too easy, and when things get too easy, it's also very easy to go into an abundance of creating tables and using data in any way possible, just because you can. If you leave your teams to do that, eventually you'll end up with what's now called the data swamp. That's where you land, because it's very easy to transform data and build tables on top of it. So I think we go back to the point about standards: you need to define standards, even if you can theoretically denormalize all your data and build any table possible. Even if there are no longer compute or storage constraints, you still need to define standards. Eventually you'll realize that it's much more efficient, and much easier for people to understand how the data is structured and what's in the data warehouse, if you apply standards. I personally don't think that there's one golden standard. You can go with the Kimball approach, or you can even go with denormalized tables. You just need to standardize it. You just need to say: this is how we model data. This is how we're going to go forward with defining our datasets. This is how we want the data to be visible to users. This is how our consumption layer on the data warehouse is going to look, and we need to respect those standards. I think with that, it'll be much more efficient. First, anyone in the company, no matter what data model they're looking at, will be able to understand the current process, how the data is being built, and how they can consume it. And it'll be much easier for you to scale. Otherwise, if you just scale using the compute and storage capabilities because you can, eventually you'll have the data, but it'll be very complex for you to consume it in an efficient manner.
And anyone who looks at the data warehouse will be lost. They won't understand: should they use this table or that table? How is this being created? Should they build something else? Sometimes they'll go and build something else, and eventually that just means more cost for you and data that is more complex to use.
00:29:34
That's an amazing thought, Mahdi. So we are moving towards the end of our episode, and we would like to wrap up with a set of quick, rapid-fire questions, the kind of questions you don't have to think much about. Let me start with the first one. Tell me one tool that you feel you just can't live without, one tool in this whole data value chain where you feel, damn, life would have been much different had it not been for that tool.
00:30:01
I mean, if you had asked me that question maybe five years ago, I would've said Spark, because Spark was a complete game changer. As I said, I worked on moving data processing from Hive to Spark, and Spark makes life much easier. But right now I'd say dbt. When you look at how SQL-based pipelines were three years ago and how things are now with dbt, the quality-of-life upgrades you get are massive.
00:30:29
The next question: we are seeing a massive explosion of tools coming out of the modern data stack ecosystem, where a lot of tools are solving very small, niche problems in the data value chain. What do you think the future is? Is it having a separate tool for every single problem in the value chain, or do you expect consolidation?
00:30:56
I honestly think there will be a lot of consolidation, because currently there's a lot of overlap on any aspect of the modern data stack. So you have your data quality tool that has dashboards to show you how the data is doing, but then you can get that same capability on a data catalog, and even for how you discuss data, you have every tool offering you those basic capabilities. And since it's basically a whole company that's tackling a small feature, they're obliged to just expand. And with each one expanding out of that core feature into other adjacent features, eventually there's just a big overlap and very solid chances for consolidation, because it doesn't make sense for data teams to have like 20 contracts or work with 20 vendors. Eventually you'd want to minimize that. And I think that SaaS companies will figure that out eventually and will realize that consolidation is the only way forward.
00:31:52
Perfect. And then the next question is, what's your go-to place for learning new stuff about data? Any particular blogs or, you know, newsletters that you follow and that you would like to share with the audience?
00:32:08
Sure. I mean, I think the first asset is the newsletter of my coworker at Zendesk, Ananth, Data Engineering Weekly. I think it's a great resource, and I think the effort he puts in every week in curating those articles is really just massive. Like, I personally try to write maybe one article per month, and I sometimes struggle with that, and you see that he reads all those articles every week and publishes stuff. So I think it's a very high-quality asset. And then honestly, I try to follow discussions on data Twitter. Usually that's where you get intrigued by different topics and bits of discussion. I think that Twitter offers a very digestible way of getting information. So you read a few tweets, you start thinking about something, and then you go and do a deep dive into it, you read articles, you read blog posts and so on. So that's usually the workflow for me.
00:33:02
Yeah, honestly, there are quite a few to follow. For example, Barr Moses, CEO of Monte Carlo, and there's also Pedram from Hightouch, who's also very active and who is the person who started most of the interesting debates. And when you start from their end, what I sometimes do is that when I find someone who's publishing interesting stuff on the data stack, I just look at who they are following, and then I start following those people.
00:33:32
Amazing. Then wrapping up with a last question. So tell us one thing that you love about your job and tell us one thing that you hate about your job.
00:33:40
As a data engineer in general?
00:33:43
Yes. As a data engineer.
00:33:46
I mean, one thing that I love is seeing the impact of the things we build. And this is especially true, for example, within a company like Zendesk: you actually see that you are impacting decision making through the data assets that you're working on and the data pipelines that you're building. Honestly, I think there are very few jobs within tech where you see that immediate impact at that scale. So I think it's quite rewarding and it's a very nice feeling. The thing that I hate, I think it's something that the whole community is working on: as a data team you work with a lot of external stakeholders and you have a lot of potential things that can go wrong, because the data platform depends on many things that are out of your control. Especially if you look at a more old-school approach, where you have a data platform team that's working on the data and you have data producers that don't care about what they're creating. It's very frustrating to have to fix something that you didn't actually create. So it's an issue that didn't start from your end, but yet you have to fix it and you have to just worry about it. And the team that actually created the data doesn't even care about it.
00:34:57
Thank you so much, Mahdi, for taking the time for this episode. I think we had an amazing conversation, and I hope the audience will enjoy it as much as we did having it. So thank you so much for your time.
00:35:08
Yeah. Thank you. It was a lot of fun and it was great to discuss all these data topics. Yeah. Thank you again for having me, and looking forward to chatting again.
00:35:18
Thank you so much.