
Pipeline Flow

Playbook view of the ELT pipeline from extraction to export

This page reads like a set play: bring the ball in from the NBA API, validate each possession, stage the data, and fan it out into star-schema outputs that are ready for analysis and distribution.

Playbook cue: The validation checkpoints work like replay review — they stop bad possessions before they become downstream tables.

The shape of the play stays the same across init, daily, monthly, full, and export; only the scope and runtime change.

Quick navigation

  • Read the whole play left to right: start with Read the possession left to right when you need the fast version before any command or table detail.
  • Check the command lane: jump to Pipeline commands when you already know the stages and only need the run-mode route.
  • Focus on guardrails: use Read the possession left to right for the validation checkpoints that stop bad data before it reaches the star outputs.
  • Leave the playbook for dependency trace: skip to Next steps when the stage map is clear and you need endpoint coverage, ER shape, or lineage.

Use this page when…

| If you need to answer… | Start here |
| --- | --- |
| “Where does validation happen?” | Read the possession left to right |
| “What actually changes between init, daily, monthly, full, and export?” | Pipeline commands |
| “Which stages produce the public warehouse surface?” | Read the possession left to right |
| “Where should I go after the stage map?” | Next steps from pipeline flow |

nbadb follows an ELT (Extract, Load, Transform) pipeline pattern.

Mermaid diagram

```mermaid
flowchart TD
    subgraph Extract["1. Extract"]
        API["nba_api
stats + static + live"] --> Raw["Raw Polars
DataFrames"]
        Static["Static Data
Players & Teams"] --> Raw
    end
    subgraph Validate1["2. Raw Validation"]
        Raw --> RawSchema["Pandera Raw
Schema Check"]
    end
    %% Stages 3-8, summarized from the stage table below
    RawSchema --> Stg["3. Stage to DuckDB: stg_* tables"]
    Stg --> StgSchema["4. Staging Validation"]
    StgSchema --> Star["5. Transform: dim_* / fact_* / bridge_* / aggregates"]
    Star --> StarSchema["6. Star Validation"]
    StarSchema --> Out["7-8. Export: SQLite / DuckDB / Parquet / CSV / Kaggle"]
```
Read the possession left to right

| Stage | What to look for | Why it matters |
| --- | --- | --- |
| 1. Extract | Which endpoints and static feeds start the run | This is the inbound surface and the first place coverage gaps appear |
| 2. Raw validation | Structural checks on API-shaped payloads | Bad possessions get stopped before they are staged as if they were trustworthy |
| 3. Stage to DuckDB | Normalized stg_* landing zone | This is the operational layer most transforms depend on directly |
| 4. Staging validation | Type, nullability, and range checks | Naming is normalized here and contract drift becomes visible |
| 5. Transform | Dimension, fact, bridge, aggregate, and analytics builders | This is where warehouse shape and dependency fan-out happen |
| 6. Star validation | Final schema enforcement on public tables | It protects the analytical contract before export |
| 7-8. Export and distribute | SQLite, DuckDB, Parquet, CSV, and Kaggle lanes | This is the finish: same modeled surface, different packaging |

The short read

  1. Extract raw payloads from live endpoints and static reference sources.
  2. Validate the raw and staging layers before transform logic touches downstream models.
  3. Transform staging tables into public dimensions, facts, bridges, aggregates, and analytics views.
  4. Export and distribute the validated star surface to SQLite, DuckDB, Parquet, CSV, and Kaggle-ready artifacts.
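As a toy illustration of these four steps, here is a stdlib-only sketch. The rows, table names, and the validation rule are invented for illustration; nbadb itself uses Polars, DuckDB, and Pandera rather than plain sqlite3.

```python
import sqlite3

# 1. Extract: raw payloads (hard-coded rows standing in for API responses).
raw = [
    {"game_id": "001", "pts": 112},
    {"game_id": "002", "pts": -5},  # structurally invalid possession
]

# 2. Validate: stop bad possessions before they are staged.
staged = [r for r in raw if r["pts"] >= 0]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stg_game (game_id TEXT, pts INTEGER)")
con.executemany("INSERT INTO stg_game VALUES (:game_id, :pts)", staged)

# 3. Transform: build a public fact table from the staging layer.
con.execute("CREATE TABLE fact_game AS SELECT game_id, pts FROM stg_game")

# 4. Export: downstream consumers only ever see the validated surface.
fact_rows = con.execute("SELECT game_id, pts FROM fact_game").fetchall()
```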

Pipeline commands

| Command | Stages | Duration |
| --- | --- | --- |
| nbadb init | 1-8 (full rebuild) | ~2-4h |
| nbadb daily | 1-7 (incremental, 7-day lookback) | ~5-15m |
| nbadb monthly | 1-7 (dimension refresh) | ~30-60m |
| nbadb full | 1-7 (fill gaps, preserve existing) | ~2-4h |
| nbadb export | 7-8 (re-export only) | ~5-10m |

Key Technologies

  • Polars: Primary DataFrame engine for all transforms
  • DuckDB: Staging engine with zero-copy Arrow interchange
  • Pandera: 3-tier schema validation (raw, staging, star)
  • ADBC: Arrow Database Connectivity for SQLite export
  • zstd: Compression for Parquet output files

Next steps from pipeline flow

  • Reconnect each stage to actual source families: use Endpoint Map when you need to know which endpoint families feed the possession before it reaches the staging and transform layers.
  • Inspect the finishing lineup: open ER Diagram when the playbook has shown the movement and you now need the shape of the dimensions, facts, and bridges produced at the end.
  • Replay one dependency chain in slow motion: continue to Table Lineage when a pipeline stage is not specific enough and you need the exact tables involved in one downstream possession.
