Playbook Board
Pipeline Flow
Playbook view of the ELT pipeline from extraction to export
This page reads like a set play: bring the ball in from the NBA API, validate each possession, stage the data, and fan it out into star-schema outputs that are ready for analysis and distribution.
Playbook cue: The validation checkpoints work like replay review — they stop bad possessions before they become downstream tables.
The shape of the play stays the same across init, daily, monthly, full, and export; only the scope and runtime change.
Quick navigation
Read the whole play left to right
Start with Read the possession left to right when you need the fast version before any command or table detail.
Check the command lane
Jump to Pipeline commands when you already know the stages and only need the run-mode route.
Focus on guardrails
Use Read the possession left to right for the validation checkpoints that stop bad data before it reaches star outputs.
Leave the playbook for dependency trace
Skip to Next steps when the stage map is clear and you need endpoint coverage, ER shape, or lineage.
Use this page when…
| If you need to answer… | Start here |
|---|---|
| “Where does validation happen?” | Read the possession left to right |
| “What actually changes between init, daily, monthly, full, and export?” | Pipeline commands |
| “Which stages produce the public warehouse surface?” | Read the possession left to right |
| “Where should I go after the stage map?” | Next steps from pipeline flow |
nbadb follows an ELT (Extract, Load, Transform) pipeline pattern.
```mermaid
flowchart TD
    subgraph Extract["1. Extract"]
        API["nba_api<br/>stats + static + live"] --> Raw["Raw Polars<br/>DataFrames"]
        Static["Static Data<br/>Players & Teams"] --> Raw
    end
    subgraph Validate1["2. Raw Validation"]
        Raw --> RawSchema["Pandera Raw<br/>Schema Check"]
    end
```
Read the possession left to right
| Stage | What to look for | Why it matters |
|---|---|---|
| 1. Extract | Which endpoints and static feeds start the run | This is the inbound surface and the first place coverage gaps appear |
| 2. Raw validation | Structural checks on API-shaped payloads | Bad possessions get stopped before they are staged as if they were trustworthy |
| 3. Stage to DuckDB | Normalized stg_* landing zone | This is the operational layer most transforms depend on directly |
| 4. Staging validation | Type, nullability, and range checks | Naming is normalized here and contract drift becomes visible |
| 5. Transform | Dimension, fact, bridge, aggregate, and analytics builders | This is where warehouse shape and dependency fan-out happen |
| 6. Star validation | Final schema enforcement on public tables | It protects the analytical contract before export |
| 7-8. Export and distribute | SQLite, DuckDB, Parquet, CSV, and Kaggle lanes | This is the finish: same modeled surface, different packaging |
The short read
- Extract raw payloads from live endpoints and static reference sources.
- Validate the raw and staging layers before transform logic touches downstream models.
- Transform staging tables into public dimensions, facts, bridges, aggregates, and analytics views.
- Export and distribute the validated star surface to SQLite, DuckDB, Parquet, CSV, and Kaggle-ready artifacts.
Pipeline commands
| Command | Stages | Duration |
|---|---|---|
| nbadb init | 1-8 (full rebuild) | ~2-4h |
| nbadb daily | 1-7 (incremental, 7-day lookback) | ~5-15m |
| nbadb monthly | 1-7 (dimension refresh) | ~30-60m |
| nbadb full | 1-7 (fill gaps, preserve existing) | ~2-4h |
| nbadb export | 7-8 (re-export only) | ~5-10m |
Key Technologies
- Polars: Primary DataFrame engine for all transforms
- DuckDB: Staging engine with zero-copy Arrow interchange
- Pandera: 3-tier schema validation (raw, staging, star)
- ADBC: Arrow Database Connectivity for SQLite export
- zstd: Compression for Parquet output files
Next steps from pipeline flow
Reconnect each stage to actual source families
Use Endpoint Map when you need to know which endpoint families feed the possession before it reaches staging and transform layers.
Inspect the finishing lineup
Open ER Diagram when the playbook has shown the movement and you now need the shape of the dimensions, facts, and bridges produced at the end.
Replay one dependency chain in slow motion
Continue to Table Lineage when a pipeline stage is not specific enough and you need the exact tables involved in one downstream possession.
Keep moving
Stay in the same possession
Keep the mental model warm with adjacent pages, section hubs, and search-friendly routes into the same topic cluster.