I started my career as a first-generation analyst focusing on writing SQL scripts, learning R, and publishing dashboards. As things progressed, I graduated into Data Science and Data Engineering, where my focus shifted to managing the lifecycle of ML models and data pipelines. 2022 is my 16th year in the data industry and I am still learning new ways to be productive and impactful. Today, I am the head of a data science & data engineering function in one of the unicorns, and I would like to share my findings and where I am heading next.
When I look at the big picture, I realise that the problems most companies face are quite similar. Their vision of becoming data-driven has turned into a BHAG (Big Hairy Audacious Goal, pronounced “bee hag”). We data folks like patterns, so here are my findings:
The list is long, and I am sure you can relate or add more to it.
In a nutshell, I found that data reliability is a BIG challenge, and there is a need for a solution that is easy to use, understand, and deploy, and that is not heavy on investment.
Hello, I am Jatin Solanki, and I am on a mission to build a solution that makes your data reliable.
Complexities around data infrastructure are surging as companies gear up to gain a competitive edge and out-of-the-box offerings.
Every company goes through a data maturity matrix. In order to reach a level where you deploy AI models or self-service models, you need to invest in a robust foundation.
In my opinion, the foundation begins with a reliable data source, or a defined source of truth. Your data models won’t be impactful if they ingest bad data. You know the saying: garbage in, garbage out.
On a high level, here are a few checks you can implement to ensure data reliability:
staging and production, or between source and destination. This could be effective for running financial reconciliation too, such as payment gateway to the sales table.
The most common question people face:
Build versus Buy
I am a big fan of open source tech. However, for some critical modules I prefer buying an out-of-the-box solution because it is scalable and already tested in the market. Developing in-house might cost you around US$2k per month, which includes a few hours of an engineer’s time along with cloud cost.
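To give a sense of what the in-house route involves, here is a minimal sketch of a source-versus-destination reconciliation, using the payment gateway to sales table case as the example. It assumes both datasets are reachable from a single SQL connection. The table names `gateway_payments` and `sales`, the columns `order_id` and `amount`, and the helper `reconcile` are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3


def reconcile(conn, source_table, dest_table,
              key_col="order_id", amount_col="amount", tolerance=0.01):
    """Compare row counts, summed amounts, and keys between two tables.

    Table and column names are interpolated into the SQL text, so pass
    trusted identifiers only (never user input).
    """
    report = {}
    for label, table in (("source", source_table), ("destination", dest_table)):
        rows, total = conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
        ).fetchone()
        report[label] = {"rows": rows, "total": total}

    # Keys present in the source but absent from the destination.
    missing = conn.execute(
        f"SELECT s.{key_col} FROM {source_table} s "
        f"LEFT JOIN {dest_table} d ON d.{key_col} = s.{key_col} "
        f"WHERE d.{key_col} IS NULL ORDER BY s.{key_col}"
    ).fetchall()

    report["missing_in_destination"] = [row[0] for row in missing]
    report["rows_match"] = report["source"]["rows"] == report["destination"]["rows"]
    report["totals_match"] = (
        abs(report["source"]["total"] - report["destination"]["total"]) <= tolerance
    )
    return report


# Hypothetical sample data: one gateway payment never reached the sales table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gateway_payments (order_id INTEGER, amount REAL);
    CREATE TABLE sales (order_id INTEGER, amount REAL);
    INSERT INTO gateway_payments VALUES (1, 100.0), (2, 250.5), (3, 75.0);
    INSERT INTO sales VALUES (1, 100.0), (2, 250.5);
""")

report = reconcile(conn, "gateway_payments", "sales")
print(report)
```

A check like this is easy to write once. The recurring cost comes from scheduling it, alerting on failures, and maintaining it for every critical table, which is where the engineer hours go.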
If you are inclined toward buying an out-of-the-box solution, here are a few factors that should be part of your checklist.
debug.
It should be in a position to automatically detect my critical data assets and apply hygiene checks.
Finally, the solution should help you reduce data quality incidents and make your data more reliable.
If your answer to any of the below questions or scenarios is “Yes”, then you should procure or deploy a data observability solution right away.
Just as software developers rely on solutions such as Datadog and Dynatrace to ensure web/app uptime, data leaders should invest in data observability solutions to ensure data reliability.