MDS Community

The world’s first data community focused on the Modern Data Stack. Our mission is to create a single platform where data practitioners from around the world can learn and share all things Modern Data Stack.
2346 Members

Timeplus: Turn your streaming data in Kafka/Pulsar/DB into real-time charts and alerts
share

Glad to join the community. This is Jove from timeplus.com. We are building a service that helps you quickly build real-time applications, mainly with SQL. It's publicly available at timeplus.cloud: you can sign up for a free account, connect to your streaming data, and make sense of it with SQL and real-time charts.

A few example use cases:
* Streaming ETL: filter or aggregate the data in your Kafka/Confluent/Pulsar topics, route or transform it, and send it to another message bus or to a database such as Snowflake.
* Real-time charts: for example, the most active GitHub repos in the past 5-10 minutes, or hot tweets on Twitter.
* A real-time feature store for machine learning.
* Real-time alerts: for example, when a user signs up on your site and takes a certain action.

Besides the SaaS/PaaS offering, we also provide on-prem/BYOC deployment. Compared to Flink/Spark, the infrastructure cost is much lower (no JVM) and performance is high.

Looking forward to your feedback if you give it a try.

Nov 16, 2022
Build a visual data ETL pipeline and a Cow Counter Dashboard using VDP, PostgreSQL and Metabase
share

Hey folks! Background: we're building VDP (https://github.com/instill-ai/vdp), an open-source visual data ETL tool that streamlines the end-to-end visual data processing pipeline. Recently, I built a prototype to analyse livestock in drone footage of a cattle farm. First, I used VDP to build an object detection ETL pipeline that analyses the video, and stored the results in our PostgreSQL database. Then, based on that data, I created a "Cow Counter" dashboard in Metabase that tracks every time a cow 🐄 appears in the footage. Check out the step-by-step tutorial: https://www.youtube.com/watch?v=0Rdv8oqqxfw

Sep 13, 2022
VDP - Open source visual data ETL for developers
share

Hi everyone! We're building VDP (https://github.com/instill-ai/vdp), an open-source ETL tool for unstructured visual data.

When people say they are data-driven, most of the time they mean they are driven by structured data. I'll skip the part where we cite reports claiming that 80% of data is unstructured. The reality is that unstructured data is more difficult to analyse, and not many companies know how to deal with it or have the resources to do so. That's why we decided to build VDP: an open-source, general and modularised ETL infrastructure for unstructured visual data, for a broader community.

VDP is built from a data-driven perspective. Although the computer vision model is the most critical component in a visual data ETL pipeline, the ultimate goal of VDP is to streamline the end-to-end visual data flow, with the transform component able to flexibly import computer vision models from different sources.

Today, the early version of VDP supports 2 sources and all Airbyte destination connectors, and it can import computer vision models from various sources including Local, GitHub, DVC, ArtiVC and Hugging Face.

VDP can run locally with Docker Compose. We're working on Kubernetes integration and a fully managed version in Instill Cloud. Setting up a VDP pipeline is fairly easy via its low-code API and no-code Console. Please take a look at the tutorial: https://www.instill.tech/docs/tutorials/build-an-async-det-pipeline

We aim to build VDP as the single point of visual data integration, so users can sync visual data from anywhere into centralised warehouses or applications and focus on gaining insights across all data sources, just as the modern data stack does for structured data.

Thanks for reading. We are first-time open-source project maintainers, and there is definitely a lot to learn! Let us know what you think.

Aug 24, 2022
Bringing ClickHouse to data science notebooks
share

With my team at @Deepnote, we're *big* ClickHouse fans, so we’ve built a ClickHouse integration into our notebooks. It lets you run SQL queries against your CH instance in a notebook environment and 100x your query performance compared with traditional databases. The thing that has helped me most is the interoperability of SQL with Python: I can save the results of my CH queries as Python variables and switch back and forth in one place. I can also do a bunch of other things, like creating quick visualizations and building dashboards on top of my notebooks for ad hoc reporting. If you’re a ClickHouse and/or notebooks user, I’d love to hear from you and see how we can make this even more useful: https://deepnote.com/blog/clickhouse-cl4zs29ikocet0blrv08ishlj

Jun 30, 2022
Which MDS categories do you think might get bundled in the future?
seek

With @Hightouch acquiring Workbase and @Airbyte acquiring @Grouparoo, which MDS categories do you think are likely to be bundled together?

Apr 27, 2022
Is the MDS converging to open-source?
seek

Looking at the rocketship growth @Airbyte has achieved in the past year and @Transform Data open-sourcing their metrics layer, is the future of the MDS open source? Ref: https://airbyte.com/blog/goodbye-2021-welcome-2022

Apr 18, 2022