Breaking down some of the problems I’ve seen in data collaboration and offering advice on how to make better, faster decisions with collaborative analytics.
We all know how analytics can help businesses make better decisions. Being data-driven simply isn’t optional anymore. Startups are collecting more data than ever before and are increasingly using that data to eat the world. We’ve seen the rise of OKR goal-setting, reporting tabs in nearly every SaaS tool, and metrics dashboards lining the walls in office hallways.
Yet for all of this progress, the ways we share insights and make decisions together haven’t evolved over the last decade. Too many companies spend so long collecting, preparing, and reporting on data that little time is left to act on it. In this post, I’ll break down some of the problems I’ve seen first-hand in data collaboration and offer advice on how you can create a better environment for creating, sharing, and measuring data together.
Startups now have a plethora of dashboards across their tools, but it’s still too challenging to find the data you need. Legacy BI tools suffer from dashboard sprawl and poor search experiences, and it’s hard to agree upon and standardize metrics so that common questions get answered in a scalable way.
To fix this, we’ve tried creating data catalogs so operators can more easily answer FAQs and find the data they’re looking for. We’ve tried to provide definitions for data to help operators understand what tables contain, how metrics are calculated, and what a particular column means. These efforts result in incremental progress towards becoming data-driven, but more is needed.
Too few employees have access to real-time data to make everyday decisions. When operators need to query the database or ask a question involving data from multiple sources, they’re forced to ask for help in a shared analytics channel or submit a ticket to their data or engineering teams.
This process results in a painful waiting period until the technical team can prioritize the request. Even when the data or engineering team can prioritize a request, there’s typically a frustrating back-and-forth of questions and answers across Slack, email, and Zoom.
Operators shouldn’t have to wait days or even weeks for the answers they need, while technical teams should be able to focus on strategic work without constant pings for help on tactical, ad-hoc questions.
Stop me if you’ve gone through this process before. You need to prepare for a presentation. You go to your apps, pull CSVs, build some metrics, screenshot some charts, and throw them into a deck. You present the information and get some live questions about the data. Unfortunately, because the chart is a static screenshot, you can’t dive into the data live, so you promise to follow up afterward.
The hours-long workflow we endure to prepare presentations across a suite of apps is fundamentally broken. There’s no connective tissue between the data sources, the analysis tools, the presentation, and the feedback on the data.
Because of these silos, we may get fewer eyeballs from peers before a presentation. We might forget to include a killer insight or make a potential error in our analyses. Because it's too difficult to answer questions together live during a presentation, we can get caught flat-footed and look unprepared. And because it's so difficult to measure and discuss progress on metrics together, we fail to monitor updates closely and act quickly and decisively as teams.
To help teams move quickly to get the answers they need, you should consider investing in collaborative processes and tools. See below for some key components you should consider.
Having a single source of truth that’s accurate and up-to-date ensures your teams can all work off the correct data independently and confidently. Best-in-class companies typically use a data warehouse and ETL tools to centralize data from multiple sources in one place. If you don’t know where to start, check out my guide on navigating the tools in the modern data stack.
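To make the “centralize first” idea concrete, here’s a minimal sketch of why it pays off. It uses SQLite in memory as a stand-in for a cloud warehouse, and the table and column names are illustrative, not from any particular tool: two tiny tables mimic records that an ETL pipeline might land from a CRM and a billing app, and a single join answers a cross-app question that would otherwise require manually stitching CSVs together.

```python
import sqlite3

# Illustrative rows mimicking records an ETL tool might pull from two SaaS
# apps; in practice these would land in warehouse tables automatically.
crm_rows = [("acme", "closed_won", 12000), ("globex", "open", 0)]
billing_rows = [("acme", 11500), ("globex", 0)]

# SQLite in memory stands in for the warehouse (Snowflake, BigQuery, etc.).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_deals (account TEXT, stage TEXT, amount INTEGER)")
conn.execute("CREATE TABLE billing_invoices (account TEXT, paid INTEGER)")
conn.executemany("INSERT INTO crm_deals VALUES (?, ?, ?)", crm_rows)
conn.executemany("INSERT INTO billing_invoices VALUES (?, ?)", billing_rows)

# With both sources in one place, one query compares booked vs. paid revenue
# per won deal -- a question that spans two apps.
rows = conn.execute(
    """
    SELECT d.account, d.amount AS booked, b.paid
    FROM crm_deals d
    JOIN billing_invoices b ON b.account = d.account
    WHERE d.stage = 'closed_won'
    """
).fetchall()
print(rows)  # [('acme', 12000, 11500)]
```

The point isn’t the specific query; it’s that once every source lands in one warehouse, cross-app questions become a single query instead of a manual export-and-merge exercise.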
If you don’t have the bandwidth or technical experts to implement a data stack correctly, then (time for a quick plug) Canvas provides a managed data stack for centralizing your apps and modeling your data in one place.
With your data in one place, you’ll need to organize it, provide context, and make it easy to find. Key metrics and commonly used datasets should be searchable, and social proof (who uses what, and how often) should help teams quickly identify the right data to answer their questions.
You should have clear owners for building and maintaining dashboards at the company and team levels. Those owners should define and document data clearly and establish a process for getting help with questions. You should know which dashboards are frequently used and invest in them accordingly, while continually improving or pruning dashboards that are ghost towns.
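Deciding what to invest in and what to prune can be as simple as ranking dashboards by recent views. Here’s a small sketch of that triage, using made-up view counts and an assumed staleness cutoff; in practice the numbers would come from your BI tool’s usage logs or audit API.

```python
# Hypothetical view counts per dashboard over the last 90 days.
usage = {
    "Company KPIs": 412,
    "Sales Pipeline": 187,
    "2019 Launch Retro": 2,
    "Old Churn Experiment": 0,
}

# Assumed cutoff: fewer views than this marks a pruning candidate.
STALE_THRESHOLD = 5

# Rank the dashboards worth investing in by traffic, most-viewed first.
invest = sorted(
    (name for name, views in usage.items() if views >= STALE_THRESHOLD),
    key=lambda name: -usage[name],
)
# Everything under the threshold is a ghost town to review or retire.
prune = [name for name, views in usage.items() if views < STALE_THRESHOLD]

print("Invest:", invest)          # ['Company KPIs', 'Sales Pipeline']
print("Review or prune:", prune)  # ['2019 Launch Retro', 'Old Churn Experiment']
```

Even a rough report like this, run quarterly, keeps the dashboard catalog from sprawling into the search problem described earlier.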
Shared, collaborative workspaces are critical in breaking down silos and helping teams share and discuss insights with others.
Yes, having dashboards is a must. But as I pointed out, your teams will still have everyday questions that dashboards don’t answer. Instead of a ticketing system, you should have a way for your operators to answer these questions without knowing SQL.
No-code and spreadsheet-like interfaces give your operators self-serve access without waiting days or weeks for technical teams to answer strategic and tactical questions. And when they do have a question, operators and technical teams can collaborate and comment right where the data lives. This way, you can ditch the screenshots, snippets, and Slack channels and start making decisions together from anywhere.
At Canvas, we’re thrilled to be working on the future of data collaboration and to go beyond what legacy BI tools offer.