Batch Processing Data Engineering Projects

Scheduled data processing jobs

Discover open-source data engineering projects in Batch Processing from the community.

13 projects found

1. Real-Time-Sales-Streaming-Pipeline

Modern Lakehouse Architecture with Kafka + Spark Structured Streaming + Delta Lake

by imen.bnamar

2. Bluesky NBA Real-Time Sentiment Analysis

A real-time data streaming pipeline that captures live posts from Bluesky regarding the NBA, perform...

by imen.bnamar

3. Yelp Batch ETL Pipeline

A batch ETL pipeline that processes raw Yelp business data to generate analytics and insights

by darracq.aurelien

4. Yelp Medallion Batch Pipeline

Building a medallion architecture for crowd-sourced reviews using Snowflake native features

by vikneshwararb

5. Smart Wardrobe Suggestion

LLM-based smart clothing suggestion

by Rahul Rajasekharan

6. Reddit ETL Pipeline in Docker

Reddit data engineering ETL pipeline: Spark, Airflow, and MinIO in Docker, with a medallion architecture

by Abdullah

7. Baskpipe

Fully AWS-native data pipelines for processing basketball (NBA) data.

by dominik.zsajovic

8. Github Stars Monitor

Never miss a new top starred repository

by maxime.lemaitre

9. Daggie The Airflow DAG Quality Auditor

A friendly (and sometimes strict!) animated DAG auditor for Apache Airflow 3.1+

by Rahul Rajasekharan

10. Automated News Intelligence Pipeline

An end-to-end automated pipeline for collecting, processing, and analyzing news articles with machin...

by charbeldaher34

11. Dbt power tools AI based Documentation

A powerful CLI tool that generates LLM-powered documentation for dbt models and columns

by Rahul Rajasekharan

12. AIRFLOW YAHOO ETL

Scalable Yahoo API ETL pipeline using Airflow

by ravitejach888

13. Airflow Bulk Pause Unpause Plugin

Bulk manage Airflow DAG states effortlessly — pause or unpause in one action.

by Rahul Rajasekharan