Drift Detective

Drift Detective is a Python library for tracking schema evolution using versioned JSON snapshots.


About this project

“Did the structure of my data change, and should I care?”

Drift Detective is a Python library for tracking schema evolution and detecting structural drift in tabular datasets using versioned JSON snapshots.

It is designed for data workflows where table schemas evolve over time.

The library focuses on schema-level changes (columns added, removed, or retyped) rather than row-level changes.
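To make the schema-level versus row-level distinction concrete, here is a minimal sketch using plain pandas. The helper `schema_of` is hypothetical and is not part of Drift Detective's API; it only assumes that a schema can be represented as a mapping from column name to dtype name, as in the JSON snapshot format.

```python
import pandas as pd


def schema_of(df: pd.DataFrame) -> dict[str, str]:
    """Map each column name to its dtype name (hypothetical helper, not the library's API)."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


# Same columns and dtypes but different rows: the schema is identical,
# so a schema-level tool would report no drift here.
a = pd.DataFrame({"show_id": ["s1", "s2"], "release_year": [2020, 2021]})
b = pd.DataFrame({"show_id": ["s3"], "release_year": [1999]})
same_schema = schema_of(a) == schema_of(b)

# Dropping a column changes the schema, whatever the rows contain.
c = a.drop(columns=["release_year"])
removed = sorted(set(schema_of(a)) - set(schema_of(c)))
```

Here `same_schema` is true even though the two frames share no rows, while `removed` names the dropped column.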

Drift Detective is built around four core components, each responsible for a specific part of schema tracking and reporting:

  • DfSnapshot: Captures the schema state of a pandas DataFrame at a specific point in time and stores it as a versioned snapshot.

  • SnapshotHistory: Builds a schema evolution timeline that lists each version together with its schema changes.

  • SnapshotDiff: Compares schema changes between two snapshot versions, listing all added and removed columns across intermediate versions.

  • SchemaReport: Combines the other three components into a single report on a table's schema history.
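The comparison across intermediate versions described for SnapshotDiff can be sketched roughly as follows. This is an illustration only, not the library's implementation: `diff_versions` is a hypothetical function, and it assumes snapshots are dictionaries with an integer `version` and a `schema` mapping, with no gaps in the version numbers.

```python
def diff_versions(snapshots, old_version, new_version):
    """Collect columns added and removed between two versions.

    Hypothetical helper: walks every intermediate version so that a column
    dropped in v2 still shows up when comparing v1 with v3.
    """
    by_version = {snap["version"]: snap for snap in snapshots}
    added, removed = [], []
    for version in range(old_version + 1, new_version + 1):
        prev_cols = set(by_version[version - 1]["schema"])
        curr_cols = set(by_version[version]["schema"])
        added.extend(sorted(curr_cols - prev_cols))
        removed.extend(sorted(prev_cols - curr_cols))
    return {"added": added, "removed": removed}


# Toy history: "title" is dropped in v2, "listed_in" in v3.
history = [
    {"version": 1, "schema": {"show_id": "object", "title": "object", "listed_in": "object"}},
    {"version": 2, "schema": {"show_id": "object", "listed_in": "object"}},
    {"version": 3, "schema": {"show_id": "object"}},
]
result = diff_versions(history, 1, 3)
```

Walking the versions one step at a time keeps the order in which columns disappeared, which a direct comparison of the first and last schema would lose.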

JSON snapshot

{
  "table_name": "netflix_titles",
  "filepath": "netflix_titles.csv",
  "timestamp": "20251230_161527",
  "version": 1,
  "column_count": 12,
  "row_count": 8807,
  "schema": {
    "show_id": "object",
    "type": "object",
    "title": "object",
    "director": "object",
    "cast": "object",
    "country": "object",
    "date_added": "object",
    "release_year": "int64",
    "rating": "object",
    "duration": "object",
    "listed_in": "object",
    "description": "object"
  },
  "columns_added": [],
  "columns_removed": []
}

Snapshot Timeline for table: netflix_titles
────────────────────────────────────────────────────────────
v1 ● 20251230_162126
│ columns: 12
│ rows: 8807
│ initial snapshot
v2 ● 20251230_163649
│ columns: 11
│ rows: 8807
│ - removed columns: title
v3 ● 20251230_163729
│ columns: 10
│ rows: 8807
│ - removed columns: listed_in
────────────────────────────────────────────────────────────
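A record with the fields shown in the JSON snapshot above could be assembled along these lines. This is a hedged sketch, not Drift Detective's code: `build_snapshot` and its parameters are hypothetical, the timestamp format `%Y%m%d_%H%M%S` is inferred from the sample values, and the first version is assumed to report no added columns, as in the sample.

```python
import json
from datetime import datetime


def build_snapshot(table_name, filepath, schema, row_count, previous=None, now=None):
    """Build a snapshot dict shaped like the sample (hypothetical helper)."""
    now = now or datetime.now()
    prev_schema = previous["schema"] if previous else {}
    return {
        "table_name": table_name,
        "filepath": filepath,
        "timestamp": now.strftime("%Y%m%d_%H%M%S"),
        "version": previous["version"] + 1 if previous else 1,
        "column_count": len(schema),
        "row_count": row_count,
        "schema": dict(schema),
        # The initial snapshot lists no added columns, matching the sample.
        "columns_added": sorted(set(schema) - set(prev_schema)) if previous else [],
        "columns_removed": sorted(set(prev_schema) - set(schema)),
    }


# Toy two-column table; the second snapshot drops "title".
v1 = build_snapshot(
    "netflix_titles", "netflix_titles.csv",
    {"show_id": "object", "title": "object"}, 8807,
    now=datetime(2025, 12, 30, 16, 21, 26),
)
v2 = build_snapshot(
    "netflix_titles", "netflix_titles.csv",
    {"show_id": "object"}, 8807,
    previous=v1, now=datetime(2025, 12, 30, 16, 36, 49),
)
text = json.dumps(v2, indent=2)  # serialized form that could be written to disk
```

Storing `columns_added` and `columns_removed` with each version means a timeline can be printed from the snapshots alone, without recomputing diffs.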

Stack:
Python

Project Info

Published on Dec 30, 2025
View on GitHub