Batch Processing Data Engineering Projects
Scheduled data processing jobs
Discover open-source data engineering projects in Batch Processing from the community.
13 projects found
1. Real-Time-Sales-Streaming-Pipeline
Modern Lakehouse Architecture with Kafka + Spark Structured Streaming + Delta Lake
2. Bluesky NBA Real-Time Sentiment Analysis
A real-time data streaming pipeline that captures live posts from Bluesky regarding the NBA, perform
3. Yelp Batch ETL Pipeline
A batch ETL pipeline that processes raw Yelp business data to generate analytics and insights
4. Yelp Medallion Batch Pipeline
Building a medallion architecture for crowd-sourced reviews using Snowflake native features
5. Smart Wardrobe Suggestion
LLM-based smart clothing suggestions
6. Reddit ETL Pipeline in Docker
Reddit Data Engineering ETL Pipeline: Spark, Airflow, MinIO in Docker Medallion Architecture
7. Baskpipe
Fully AWS-native data pipelines for processing basketball (NBA) data.
8. Github Stars Monitor
Never miss a new top starred repository
9. Daggie The Airflow DAG Quality Auditor
A friendly (and sometimes strict!) animated DAG auditor for Apache Airflow 3.1+
10. Automated News Intelligence Pipeline
An end-to-end automated pipeline for collecting, processing, and analyzing news articles with machin
11. Dbt power tools AI based Documentation
A powerful CLI tool that generates LLM-powered documentation for dbt models and columns
12. AIRFLOW YAHOO ETL
SCALABLE_YAHOO_API_ETL_PIPELINE_USING_AIRFLOW
13. Airflow Bulk Pause Unpause Plugin
Bulk manage Airflow DAG states effortlessly: pause or unpause in one action.