Discover, Share & Showcase
the Best Data Engineering Projects
Explore curated Data Engineering projects from the community. Be recognized for your projects, vote for your favorites and share your own creations.
Week of Dec 28, 2025
Dec 28 - Jan 3
Week of Dec 21, 2025
Dec 21 - 27
3.Silism Commerce 360 - Local AI Native Lakehouse
AI-native e-commerce data platform you can run locally (Airflow + dbt + MCP)
4.Real-Time-Sales-Streaming-Pipeline
Modern Lakehouse Architecture with Kafka + Spark Structured Streaming + Delta Lake
5.AIRFLow Medical Data Pipeline
Enterprise-grade ETL pipeline transforming medical XML data into actionable business intelligence
6.Bluesky NBA Real-Time Sentiment Analysis
A real-time data streaming pipeline that captures live posts from Bluesky regarding the NBA, perform
Week of Dec 14, 2025
Dec 14 - 20
7.Airflow Python React Widgets
Python React Experiment
8.Yelp Batch ETL Pipeline
A batch ETL pipeline that processes Yelp business raw data to generate analytics and insights
9.Airflow and DBT Analytics Accelerator
An open-source accelerator for a ready-to-run, end-to-end analytics platform
10.Cricket Analytics Data Pipeline
CAP is an end-to-end cricket analytics platform built on Cricsheet ball-by-ball data
Week of Dec 7, 2025
Dec 7 - 13
Week of Nov 23, 2025
Nov 23 - 29
12.Smart Wardrobe Suggestion
LLM Based Smart Clothing Suggestion
13.Reddit ETL Pipeline in Docker
Reddit Data Engineering ETL Pipeline: Spark, Airflow, MinIO in Docker Medallion Architecture
14.Flink Sales Pipeline
Real-Time E-Commerce Sales Analytics Pipeline
15.Baskpipe
Fully AWS-native data pipelines for processing basketball (NBA) data.
16.Data Warehousing for Realtime Pipelines
Building a real-time data warehouse with state-of-the-art tools such as Apache Kafka, etc.
17.Github Stars Monitor
Never miss a new top starred repository
18.Macro Agents Economic Data Platform
From FRED to Forecasts: A Modern Data Stack for Economic Intelligence
19.E2E Real-Time Data Pipeline
Real-time data pipeline with Kafka, Flink, Iceberg, Trino, and Superset.
20.F1 Insights Real Time Replay
What if your dashboards were as real-time as Max Verstappen?
21.Batch data pipeline
Batch Data Pipeline with Airflow, DuckDB, Delta Lake, Trino and Metabase. Observability and quality.
22.Daggie The Airflow DAG Quality Auditor
A friendly (and sometimes strict!) animated DAG auditor for Apache Airflow 3.1+
Week of Nov 16, 2025
Nov 16 - 22
23.Automated News Intelligence Pipeline
An end-to-end automated pipeline for collecting, processing, and analyzing news articles with machin
24.Dbt power tools AI based Documentation
A powerful CLI tool that generates LLM-powered documentation for dbt models and columns
Week of Nov 9, 2025
Nov 9 - 15
25.AIRFLOW YAHOO ETL
Scalable Yahoo API ETL pipeline using Airflow
26.Airflow Bulk Pause Unpause Plugin
Bulk manage Airflow DAG states effortlessly — pause or unpause in one action.
27.AirfloGotchi
AirfloGotchi is a virtual pet game integrated with Airflow to keep your DAGs healthy