Problem
A production schema I worked with had grown past 190 tables across five logical groups. Every tool for visualizing and versioning it was unacceptable in a different way:
- MySQL Workbench: binary `.mwb` files, unreadable in git diffs
- dbdiagram.io: no auto-grouping, dumps everything into one wall of tables
- DrawDB — visibly lags past ~100 tables
- DbSchema — the only tool with smart grouping, but $59 per seat after a 15-day trial
- DBML format: its `TableGroup` directive is ignored by most renderers
Nothing free handled “200+ tables with smart auto-grouping and a git-friendly file format.”
What I built
A Tauri 2 desktop app with a Rust analyzer driving a React Flow canvas.
- Rust backend (`schema_analyzer.rs`, 1,043 lines) runs the O(n²) relationship analysis and layout. Hub-based clustering finds tables with the most foreign-key connections, treats them as cluster centroids, then assigns peripheral tables to their strongest hub. Orphans are grouped by keyword similarity: Levenshtein distance, underscore/camelCase tokenization, and prefix/suffix overlap.
- React + React Flow (WebGL canvas) renders the graph. Drag, zoom, and pan stay smooth at 200+ nodes because the heavy math never crosses into JS.
- Tauri 2 over Electron — ~3 MB binary vs. 100 MB+, with native system access for file dialogs and future direct-MySQL connections.
- Custom JSON file format stores tables, relationships, positions, colors, and groups. Designed specifically for git: stable ordering, one table per block, human-readable diffs.
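
To make the hub-based clustering step concrete, here is a minimal sketch of the idea. It is not the code in `schema_analyzer.rs`. The function name `cluster_by_hubs`, the index-based table IDs, and the `hub_count` parameter are assumptions made for illustration. A table's "strongest hub" is taken to be the hub it shares the most foreign-key edges with.

```rust
use std::collections::HashMap;

/// A foreign-key edge between two tables, identified by index.
/// (Hypothetical representation; indices must be < `table_count`.)
type Edge = (usize, usize);

/// Returns, for each table, the hub it belongs to, or `None` if it has
/// no direct edge to any hub (an "orphan" left for a later keyword pass).
/// `hub_count` is an assumed tuning parameter, not taken from the app.
pub fn cluster_by_hubs(
    table_count: usize,
    edges: &[Edge],
    hub_count: usize,
) -> Vec<Option<usize>> {
    // Degree = number of foreign-key edges touching each table.
    let mut degree = vec![0usize; table_count];
    for &(a, b) in edges {
        degree[a] += 1;
        degree[b] += 1;
    }

    // The most-connected tables become hubs. Ties go to the lower
    // index so the result is deterministic.
    let mut candidates: Vec<usize> =
        (0..table_count).filter(|&t| degree[t] > 0).collect();
    candidates.sort_by(|&a, &b| degree[b].cmp(&degree[a]).then(a.cmp(&b)));
    let mut is_hub = vec![false; table_count];
    for &h in candidates.iter().take(hub_count) {
        is_hub[h] = true;
    }

    // For every table, count its edges to each hub.
    let mut links: Vec<HashMap<usize, usize>> = vec![HashMap::new(); table_count];
    for &(a, b) in edges {
        if is_hub[b] {
            *links[a].entry(b).or_insert(0) += 1;
        }
        if is_hub[a] {
            *links[b].entry(a).or_insert(0) += 1;
        }
    }

    (0..table_count)
        .map(|t| {
            if is_hub[t] {
                return Some(t); // a hub anchors its own cluster
            }
            // Strongest hub = most shared edges; ties go to the lower index.
            links[t]
                .iter()
                .max_by(|x, y| x.1.cmp(y.1).then(y.0.cmp(x.0)))
                .map(|(&hub, _)| hub)
        })
        .collect()
}
```

With seven tables where tables 1, 2, and 3 reference table 0, and tables 4 and 5 reference table 3, calling `cluster_by_hubs(7, &edges, 2)` picks tables 0 and 3 as hubs and leaves the unconnected table 6 as an orphan.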
Interesting bits
- Layout is a three-pass pipeline. Groups are placed on a balanced grid with placement weights derived from inter-group edge counts. Tables within each group use a radial layout around their hub. A final `optimize_layout` pass detects and resolves overlaps the first two passes can’t avoid.
- Keyword tokenization handles real-world naming. `user_addresses` and `userAddresses` tokenize to `[user, addresses]` and match even though their raw strings don’t. Prefix and suffix checks catch conventions like `order_*` or `*_audit`.
- Why Rust for the math. O(n²) over 200+ tables would stall JS under GC pressure during the analysis pass, so the math lives in Rust from the start. Rust returns serialized JSON and the Tauri IPC boundary is the only crossing the UI has to make.
- JSON over binary was the single most useful decision. PR reviews now show real schema diffs instead of “binary file changed.”
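
The keyword matching described above can be sketched roughly as follows. This is an illustrative reconstruction using only the standard library, not the analyzer's actual code. The function names `tokenize`, `shares_prefix_or_suffix`, and `levenshtein` are hypothetical, and the camelCase rule is deliberately simple: it splits only where a lowercase letter or digit is followed by an uppercase letter.

```rust
/// Splits an identifier on underscores and camelCase boundaries and
/// lowercases every token. Runs of capitals (e.g. "HTTPServer") are
/// kept as one token in this simplified version.
pub fn tokenize(name: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for ch in name.chars() {
        if ch == '_' {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        // camelCase boundary: lowercase or digit followed by uppercase.
        if ch.is_uppercase() && prev_lower && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        current.extend(ch.to_lowercase());
        prev_lower = ch.is_lowercase() || ch.is_ascii_digit();
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// True when two names share their first token (`order_*`) or their
/// last token (`*_audit`).
pub fn shares_prefix_or_suffix(a: &str, b: &str) -> bool {
    let (ta, tb) = (tokenize(a), tokenize(b));
    match (ta.first(), tb.first(), ta.last(), tb.last()) {
        (Some(fa), Some(fb), Some(la), Some(lb)) => fa == fb || la == lb,
        _ => false,
    }
}

/// Classic two-row Levenshtein edit distance over Unicode scalar values.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitute (or keep)
                .min(prev[j + 1] + 1) // delete from `a`
                .min(curr[j] + 1); // insert into `a`
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}
```

Comparing token lists rather than raw strings is what lets `user_addresses` and `userAddresses` land in the same group. The edit distance is useful as a fallback for near-misses such as singular and plural forms.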
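
For the layout pipeline, the second and third passes can be approximated with a few lines of geometry. The sketch below is an assumption-laden illustration, not the app's `optimize_layout`: the `Point` and `Rect` types, the function names `radial_positions` and `resolve_overlaps`, and the "push the right-hand box sideways" strategy are all hypothetical. The grid placement of groups (the first pass) is left out.

```rust
use std::f64::consts::TAU;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Axis-aligned box for a table node; (x, y) is the top-left corner.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub w: f64,
    pub h: f64,
}

/// Radial pass: spreads `member_count` tables evenly on a circle of
/// `radius` around their hub. Returns an empty list for zero members.
pub fn radial_positions(hub: Point, member_count: usize, radius: f64) -> Vec<Point> {
    (0..member_count)
        .map(|i| {
            let angle = TAU * i as f64 / member_count as f64;
            Point {
                x: hub.x + radius * angle.cos(),
                y: hub.y + radius * angle.sin(),
            }
        })
        .collect()
}

/// True when the two boxes intersect with non-zero area.
pub fn overlaps(a: &Rect, b: &Rect) -> bool {
    a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h
}

/// Cleanup pass: for every overlapping pair, moves the right-hand box
/// to just past the left-hand one, leaving `gap` between them. Repeats
/// until a full sweep moves nothing or `max_iters` sweeps have run.
/// Returns true if the layout ended up overlap-free.
pub fn resolve_overlaps(rects: &mut [Rect], gap: f64, max_iters: usize) -> bool {
    for _ in 0..max_iters {
        let mut moved = false;
        for i in 0..rects.len() {
            for j in (i + 1)..rects.len() {
                if overlaps(&rects[i], &rects[j]) {
                    let (left, right) = if rects[i].x <= rects[j].x {
                        (i, j)
                    } else {
                        (j, i)
                    };
                    rects[right].x = rects[left].x + rects[left].w + gap;
                    moved = true;
                }
            }
        }
        if !moved {
            return true;
        }
    }
    false
}
```

The pairwise sweep is itself O(n²) per iteration, which is consistent with keeping this work on the Rust side. A real implementation would likely also move boxes vertically and respect group boundaries.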