Aug 24, 2022

VDP - Open source visual data ETL for developers

Hi everyone! We're building VDP (https://github.com/instill-ai/vdp), an open-source ETL tool for unstructured visual data.

When people say they are data-driven, most of the time it means they are driven by structured data. I'll skip the part where we cite reports claiming that 80% of data is unstructured. The reality is that unstructured data is more difficult to analyse, and not many companies know how to deal with it or have the resources to. That's why we decided to build VDP: an open-source, general-purpose and modular ETL infrastructure for unstructured visual data, available to the broader community.

VDP is built from a data-driven perspective. Although the computer vision model is the most critical component in a visual data ETL pipeline, the ultimate goal of VDP is to streamline the end-to-end visual data flow, with the transform component able to flexibly import computer vision models from different sources.

Today, the early version of VDP supports two sources and all Airbyte destination connectors, and it can import computer vision models from various sources, including Local, GitHub, DVC, ArtiVC and Hugging Face.

VDP can run locally with Docker Compose. We're working on Kubernetes integration and a fully managed version in Instill Cloud. Setting up a VDP pipeline is fairly easy via its low-code API and no-code Console. Please take a look at the tutorial: https://www.instill.tech/docs/tutorials/build-an-async-det-pipeline

We aim to make VDP the single point of visual data integration, so users can sync visual data from anywhere into centralised warehouses or applications and focus on gaining insights across all data sources, just as the modern data stack does for structured data.

Thanks for reading. We are first-time open-source project maintainers, so there's definitely a lot to learn! Let us know what you think.
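For anyone new to visual data ETL, here's a toy Python sketch of the extract → transform → load flow we describe above: images come out of a source, a model turns each one into a structured record, and the records are loaded into a destination. Everything in it (`ToyPipeline`, `fake_source`, `fake_detector`, the record fields) is hypothetical and written only for illustration. It is not VDP's API and runs no real computer vision model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical aliases for illustration only; not part of VDP.
Image = bytes
Record = Dict[str, Any]


@dataclass
class ToyPipeline:
    """Extract images from a source, transform each with a model, load results into a destination."""

    source: Callable[[], Iterable[Image]]        # extract: yields raw (unstructured) images
    model: Callable[[Image], Record]             # transform: e.g. a detector turning pixels into fields
    destination: Callable[[List[Record]], None]  # load: e.g. a warehouse writer

    def run(self) -> int:
        # Turn every unstructured image into a structured record, then load them all.
        records = [self.model(image) for image in self.source()]
        self.destination(records)
        return len(records)


# Hypothetical stand-ins for a real source, model and destination connector.
def fake_source() -> Iterable[Image]:
    yield b"\x89PNG-cat"
    yield b"\x89PNG-dog-on-sofa"


def fake_detector(image: Image) -> Record:
    # A real model would run inference; this only derives toy fields from the raw bytes.
    label = "dog" if b"dog" in image else "cat"
    return {"label": label, "num_bytes": len(image)}


warehouse: List[Record] = []
count = ToyPipeline(fake_source, fake_detector, warehouse.extend).run()
print(count, warehouse)
```

Swapping `fake_detector` for a real model, or `warehouse.extend` for a real destination, is the kind of modularity we're aiming for; in VDP itself that wiring is done through the API or Console rather than in Python code like this.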
