Yelp Medallion Batch Pipeline
Building medallion architecture for crowd-sourced reviews using Snowflake native features
This batch data pipeline implements a medallion architecture data pipeline for the Yelp Open Dataset using Snowflake Dynamic Tables. The project demonstrates modern data engineering best practices by...

About this project
This batch data pipeline implements a medallion architecture data pipeline for the Yelp Open Dataset using Snowflake Dynamic Tables. The project demonstrates modern data engineering best practices by processing raw JSON data through Bronze, Silver, and Gold layers to create analytics-ready datasets.
Business Value
This project provides:
360° Business Intelligence: Complete view of businesses, users, and reviews
Scalable Analytics: Dimensional design supports complex analytical workloads
Near-Real-time Insights: Dynamic Tables provide fresh data with minimal latency
Cost Optimization: Automated warehouse management and incremental processing
Enterprise-Ready: Production-grade data governance and quality controls
Key Highlights
Medallion Architecture: Bronze (Raw) → Silver (Cleaned) → Gold (Business Logic)
Snowflake Dynamic Tables: Automated refresh management and incremental processing
Dimensional Modeling: Star schema design optimized for analytics
Advanced SQL Transformations: JSON processing, array operations, and business logic
📋 Prerequisites
Snowflake Requirements
Snowflake account with SYSADMIN role access
Warehouse creation privileges
Database creation privileges
Dynamic Tables feature enabled
Local Environment
Python 3.7+ with standard libraries
Access to Yelp Open Dataset (see Licensing)
Snowflake connectivity (SnowSQL, Python connector, or Snowsight)
✨ Features
JSON Processing: Complex VARIANT data extraction and transformation
Array Operations: Category parsing, friend networks, elite year processing
Dynamic Tables: Automated incremental refresh management
Warehouse Management: Auto-suspend/resume for cost optimization
Scalable Architecture: Designed for large dataset processing
Audit Trails: Complete data lineage from source to gold
Metadata Tracking: File-level processing information
You must be logged in to comment
Sign in to commentComments
No comments yet
Be the first to share your thoughts!