Ball Movement
Data Lineage
Follow ball movement from NBA API sources through staging and into the star schema
Lineage is the film room for nbadb. Instead of watching a possession from the broadcast angle, you watch every touch: the inbound pass at the NBA API, the outlet into raw capture, the half-court reset in staging, and the finish in the analytical warehouse.
Replay-review note: Start here when the question is "where did this come from?" or "what breaks if I change this?"
lineage-auto.mdx is generator-owned. Use the curated pages in this section
for orientation, then use the generated map when you need exhaustive,
code-sourced dependency detail.
Pick the replay lens
Table Lineage
Start with Table Lineage when you need the full possession chain from source feed to downstream table.
Column Lineage
Start with Column Lineage when the breakage is local to one key, metric, rename, or constraint.
Generated Lineage Map
Start with Generated Lineage Map when the curated examples are not wide enough and you need exhaustive, code-sourced coverage.
Choose by symptom
| If you need to answer... | Start here | Then escalate to... |
|---|---|---|
| “Where did this table come from?” | Table Lineage | Generated Lineage Map if you need the full graph |
| “Which upstream field fed this column?” | Column Lineage | Generated Lineage Map if the example trail is not enough |
| “What else breaks if I change this staging schema or transformer?” | Table Lineage | Generated Lineage Map for exhaustive dependency blast radius |
| “Where is the exact code-derived dependency map?” | Generated Lineage Map | Schema Reference or Data Dictionary once you need contracts |
Curated vs generated boundary
| Surface | Optimized for | Not trying to do | Maintenance path |
|---|---|---|---|
| Table Lineage | Table-level dependency tracing and impact analysis | Exhaustively list every transformer edge | Hand-authored |
| Column Lineage | Field-level debugging examples and rename tracing | Replace the full auto-generated graph | Hand-authored |
| Generated Lineage Map | Exhaustive dependency lookup sourced from code metadata | Teach route selection or worked examples | Regenerate, do not hand-edit |
Watch one possession end to end
| Touch | What changes | What to verify |
|---|---|---|
| Source feed | endpoint-specific naming and result-set shape | the field still maps cleanly back to the upstream NBA surface |
| Raw capture | source shape is preserved with minimal interpretation | you can still reason about the original payload without warehouse assumptions |
| Staging | names normalize to snake_case, types tighten, and join anchors become stable | downstream transforms have a clean, typed lane to build on |
| Star / analytics surface | the field becomes part of an analyst-facing grain or convenience view | readers can join or filter without knowing the source quirks that started the possession |
Text fallback: use lineage to find the stage where a field stopped being source-shaped and became warehouse-safe. That is usually the moment when debugging, documentation, and join strategy become easier.
Why Lineage Matters
- Debugging: When a value looks wrong in a fact table, trace it back to the source API endpoint
- Impact analysis: Before changing a staging schema, see which downstream tables are affected
- Coverage: Identify which API endpoints feed which warehouse tables
- Documentation: Understand the complete data flow without reading transform code
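The debugging and impact-analysis cases above are the same graph walked in opposite directions. Here is a minimal sketch, assuming lineage is available as a plain mapping from each table to the tables it reads from. The table names and the `DEPENDS_ON` mapping are hypothetical and are not taken from nbadb.

```python
from collections import deque

# Hypothetical edges: each table maps to the tables it reads from.
# These names are illustrative only and are not nbadb's real tables.
DEPENDS_ON = {
    "raw_box_score": [],
    "stg_box_score": ["raw_box_score"],
    "dim_player": ["stg_box_score"],
    "fact_player_game": ["stg_box_score", "dim_player"],
    "agg_player_season": ["fact_player_game"],
}


def upstream(table, graph=DEPENDS_ON):
    """Debugging: every table that `table` depends on, directly or transitively."""
    seen, queue = set(), deque(graph.get(table, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


def downstream(table, graph=DEPENDS_ON):
    """Impact analysis: every table affected if `table` changes."""
    # Invert the edges so the same walk runs from a source toward its consumers.
    consumers = {}
    for child, parents in graph.items():
        for parent in parents:
            consumers.setdefault(parent, []).append(child)
    return upstream(table, consumers)
```

Calling `upstream("fact_player_game")` answers "where did this come from?", and `downstream("stg_box_score")` answers "what breaks if I change this?".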
Possession Map
flowchart LR
A["Tip-off<br/>NBA API"] --> B["Outlet pass<br/>Raw capture"]
B --> C["Half-court set<br/>Staging validation"]
C --> D["Finish<br/>Star schema"]
D --> E["Kick-out<br/>Aggregates, analytics, export"]
style A fill:#e1f5fe
style B fill:#fff8e1
…
Read it left to right: sources start the action, raw preserves the original shape, staging organizes the possession, and the star surface makes the result queryable.
Read the replay by question
| Question | Focus on | Then route to |
|---|---|---|
| “Where did the chain start?” | The source and raw touches in the possession map | Endpoints if you need source-family detail |
| “Where was the shape normalized?” | The staging touch and validation table below | Schema Reference if you need exact contracts |
| “What table or view finished the play?” | The star and export touches | Table Lineage or Column Lineage for the detailed replay |
NBA API --> Raw capture --> Staging validation --> Star surface --> Export
source preserve feed normalize + type dim/fact/agg SQLite /
+ dependency flow DuckDB / Parquet / CSV
Each stage applies progressively stricter validation:
| Stage | Schema Layer | Validation | Column Names |
|---|---|---|---|
| Extract | Raw | Structural only | UPPER_CASE (API native) |
| Stage | Staging | Types + nullability + ranges | snake_case |
| Transform | Star | Full constraints + FK refs | snake_case |
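The tightening described in the table can be sketched with plain Python checks. This is an illustration under stated assumptions, not nbadb's validation code: the function names, the `PLAYER_ID` and `PTS` columns, and the specific rules are all made up for the example.

```python
def check_raw(row, expected_columns):
    """Extract: structural only, confirm the expected columns are present."""
    return [f"missing column {c}" for c in expected_columns if c not in row]


def to_staging(row):
    """Stage: lowercase UPPER_CASE API names so they become snake_case."""
    return {name.lower(): value for name, value in row.items()}


def check_staging(row):
    """Stage: types, nullability, and ranges (hypothetical rules)."""
    errors = []
    if row.get("player_id") is None:
        errors.append("player_id is null")
    elif not isinstance(row["player_id"], int):
        errors.append("player_id is not an int")
    pts = row.get("pts")
    if pts is not None and not (isinstance(pts, int) and pts >= 0):
        errors.append("pts must be a non-negative int")
    return errors


def check_star(row, known_player_ids):
    """Transform: foreign-key reference into a (hypothetical) player dimension."""
    if row["player_id"] not in known_player_ids:
        return ["player_id has no matching dim row"]
    return []
```

Each function returns a list of problems, so a row that passes a stage returns an empty list and can move on to the next, stricter one.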
Generation
Lineage documentation can be regenerated from transform code:
uv run nbadb docs-autogen
# or: uv run python -m nbadb.docs_gen
This introspects BaseTransformer.depends_on and staging schema metadata["source"] to build lineage graphs automatically.
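To make the introspection idea concrete, here is a rough sketch of how edges could be collected from a `depends_on` attribute and a `metadata["source"]` entry. It is not the nbadb generator. The stand-in `BaseTransformer`, the `output_table` attribute, the subclasses, the table names, and the endpoint labels are all hypothetical.

```python
class BaseTransformer:
    """Stand-in for the real base class; only the attributes used below."""
    output_table = ""  # hypothetical attribute naming the table a transformer writes
    depends_on = []


class DimPlayer(BaseTransformer):
    output_table = "dim_player"
    depends_on = ["stg_player_info"]


class FactPlayerGame(BaseTransformer):
    output_table = "fact_player_game"
    depends_on = ["stg_box_score", "dim_player"]


# Hypothetical staging metadata: staging table -> the source that feeds it.
STAGING_METADATA = {
    "stg_player_info": {"source": "player_info_endpoint"},
    "stg_box_score": {"source": "box_score_endpoint"},
}


def build_edges(transformers, staging_metadata):
    """Collect (upstream, downstream) pairs from both metadata sources."""
    edges = []
    # Source -> staging edges come from each schema's metadata["source"].
    for table, meta in sorted(staging_metadata.items()):
        edges.append((meta["source"], table))
    # Staging/star -> star edges come from each transformer's depends_on.
    for cls in transformers:
        for dep in cls.depends_on:
            edges.append((dep, cls.output_table))
    return edges


def to_mermaid(edges):
    """Render the edges as Mermaid flowchart lines."""
    lines = ["flowchart LR"]
    lines += [f"    {src} --> {dst}" for src, dst in edges]
    return "\n".join(lines)
```

Running `print(to_mermaid(build_edges([DimPlayer, FactPlayerGame], STAGING_METADATA)))` prints a small flowchart definition. Passing the transformer list explicitly keeps the output order deterministic in this sketch.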
Next steps from lineage
Switch from dependency to warehouse shape
Move to Diagrams when you understand the chain of custody and now need the faster visual board for schema shape, pipeline flow, or endpoint coverage.
Verify exact contracts after the replay
Continue to Schema Reference or the Data Dictionary when the lineage answer still needs an exact column contract, field meaning, or naming convention check.
Reconnect the replay to source scouting reports
Jump to Endpoints when the upstream question is really about the nba_api family, result set, or extractor surface that starts the possession.
