E2E Real-Time Data Pipeline

Real-time data pipeline with Kafka, Flink, Iceberg, Trino, and Superset.

Apache Flink · Apache Iceberg · Apache Kafka · AWS S3 · Trino

About this project

📖 Overview

This project demonstrates a real-time end-to-end (E2E) data pipeline designed to handle clickstream data. It shows how to ingest, process, store, query, and visualize streaming data using open-source tools, all containerized with Docker for easy deployment.

🔎 Technologies Used:

  • Data Ingestion: Apache Kafka

  • Stream Processing: Apache Flink

  • Object Storage: MinIO (S3-compatible)

  • Data Lake Table Format: Apache Iceberg

  • Query Engine: Trino

  • Visualization: Apache Superset

Flow

  1. A clickstream data generator simulates real-time user events and pushes them to a Kafka topic.

  2. Apache Flink consumes the Kafka stream, cleans the events, and writes them to Iceberg tables stored on MinIO.

  3. Trino connects to Iceberg for querying the processed data.

  4. Apache Superset visualizes the data by connecting to Trino.
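Step 1 of the flow can be sketched as follows. This is a minimal illustration, not the project's actual generator: the event fields, the page list, and the `publish` helper are hypothetical, and the Kafka producer is replaced by any callable that accepts bytes, so the sketch runs without a broker.

```python
import json
import random
import time
import uuid
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical event vocabulary; the project's real schema is not shown here.
PAGES = ["/", "/products", "/cart", "/checkout"]
EVENT_TYPES = ["page_view", "click", "add_to_cart", "purchase"]


def generate_events(count: int, seed: Optional[int] = None) -> Iterator[dict]:
    """Yield `count` simulated clickstream events."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {
            "event_id": str(uuid.UUID(int=rng.getrandbits(128))),
            "user_id": rng.randint(1, 1000),
            "event_type": rng.choice(EVENT_TYPES),
            "page": rng.choice(PAGES),
            "ts_ms": int(time.time() * 1000),
        }


def publish(events: Iterable[dict], send: Callable[[bytes], None]) -> int:
    """Serialize each event as JSON, pass it to `send`, and return the count sent."""
    sent = 0
    for event in events:
        send(json.dumps(event).encode("utf-8"))
        sent += 1
    return sent


if __name__ == "__main__":
    # Stand-in for a Kafka producer: print each payload instead of sending it.
    publish(generate_events(3, seed=42), lambda payload: print(payload.decode()))
```

In a real deployment, `send` would presumably wrap a Kafka producer's send call for the target topic; keeping it as a plain callable makes the generator easy to test in isolation.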

🏆 Key Features

🔄 Real-Time Data Processing

  • Stream processing with Apache Flink.

  • Clickstream events are transformed and filtered in real-time.
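The kind of per-event filtering and transformation described above can be illustrated in plain Python. This is only a sketch of the logic, not Flink code and not the project's actual job: the required fields and the normalization rules are assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical required fields; the real event schema is not shown in this project page.
REQUIRED_FIELDS = ("user_id", "event_type", "page", "ts_ms")


def clean_event(raw: dict) -> Optional[dict]:
    """Return a normalized event, or None if the record should be dropped."""
    # Filter: drop records with missing required fields.
    if any(raw.get(field) is None for field in REQUIRED_FIELDS):
        return None
    page = str(raw["page"]).strip().lower()
    # Filter: drop records whose page is not a path.
    if not page.startswith("/"):
        return None
    try:
        # Transform: coerce types and normalize casing and whitespace.
        return {
            "user_id": int(raw["user_id"]),
            "event_type": str(raw["event_type"]).strip().lower(),
            "page": page,
            "ts_ms": int(raw["ts_ms"]),
        }
    except (TypeError, ValueError):
        return None


def process(stream: Iterable[dict]) -> Iterator[dict]:
    """Apply clean_event to each record and keep only the valid ones."""
    for raw in stream:
        cleaned = clean_event(raw)
        if cleaned is not None:
            yield cleaned
```

In a streaming engine the same logic would typically be expressed as a map followed by a filter over the Kafka source, with the output routed to the Iceberg sink.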

📂 Modern Data Lakehouse

  • Data is stored in Apache Iceberg tables on MinIO (S3-compatible). Iceberg supports schema evolution and time travel.
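As an illustration of time travel, the helper below builds a Trino query that reads an Iceberg table as of a past timestamp, using Trino's `FOR TIMESTAMP AS OF` clause. The helper name and the table name `iceberg.analytics.clickstream` are hypothetical; the project's actual catalog, schema, and table names are not given here.

```python
from datetime import datetime, timezone


def time_travel_query(table: str, as_of: datetime) -> str:
    """Build a Trino query that reads `table` as it existed at `as_of`.

    `table` is interpolated directly, so it must be a trusted identifier.
    """
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Normalize to UTC so the literal is unambiguous.
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} FOR TIMESTAMP AS OF TIMESTAMP '{stamp} UTC'"


if __name__ == "__main__":
    # Hypothetical table name, for illustration only.
    moment = datetime(2025, 11, 25, 12, 0, tzinfo=timezone.utc)
    print(time_travel_query("iceberg.analytics.clickstream", moment))
```

The query only returns rows if a table snapshot exists at or before the requested time.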

⚡ Fast SQL Analytics

  • Trino provides fast, distributed SQL queries on Iceberg data.
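A typical analytics query over the processed events might count views per page. The sketch below runs such a query through any Python DB-API cursor. The `top_pages` helper, the `clickstream` table, and its `page` column are assumptions for illustration, and an in-memory SQLite database stands in for Trino so the example runs without a cluster.

```python
import sqlite3


def top_pages(cursor, table: str, limit: int = 5) -> list:
    """Return (page, views) rows ordered by view count, using any DB-API cursor.

    `table` is interpolated directly, so it must be a trusted identifier.
    """
    sql = (
        f"SELECT page, COUNT(*) AS views FROM {table} "
        f"GROUP BY page ORDER BY views DESC, page LIMIT {int(limit)}"
    )
    cursor.execute(sql)
    return cursor.fetchall()


def demo() -> list:
    """Run top_pages against an in-memory SQLite table standing in for Trino."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE clickstream (page TEXT)")
    cur.executemany(
        "INSERT INTO clickstream VALUES (?)",
        [("/",), ("/cart",), ("/",), ("/products",), ("/",), ("/cart",)],
    )
    rows = top_pages(cur, "clickstream", limit=2)
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Against the real pipeline, the cursor would come from a Trino connection and `table` would be the fully qualified Iceberg table name.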

📊 Interactive Dashboards

  • Apache Superset provides interactive dashboards over the data it queries through Trino.

📦 Fully Containerized Setup

  • All services run in Docker containers and are started together with Docker Compose, which simplifies deployment.

Stack:
Apache Flink · Apache Iceberg · Apache Kafka · AWS S3 · Trino

Project Info

Published on Nov 25, 2025