nbadb Documentation

Welcome to the Arena Data Lab for nbadb: the control tower for getting from first install to first query, first refresh, and first production handoff.

nbadb turns the NBA stats surface into a public analytical model with DuckDB staging, SQL-first transforms, and exports for DuckDB, SQLite, Parquet, and CSV.

Public surface

Star schema

dimensions, facts, bridges, aggregates, and analytics outputs

Extractors

Full coverage

registered wrappers around the current nba_api runtime surface

Core run modes

init, daily, monthly, and backfill runs

Export formats

sqlite, duckdb, csv, and parquet by default

Entry surface

Start the install

Jump to Installation for prerequisites, install routes, env defaults, and the first sanity checks.

Entry surface

Understand the pipeline

Open Architecture for the raw → staging → star flow, internal state boundaries, and why the warehouse is shaped this way.

Entry surface

Find the right command

Use CLI Reference when the question is "what command does this?" or "what flags are accepted right now?"

No install

Try the browser explorer

Open SQL Playground when you want to rehearse DuckDB query shapes in-browser before you pull the full dataset locally.

Entry surface

Find the right table

Start with Schema Reference, then move to the Data Dictionary when you need exact fields and meanings.

First five minutes

If you just want to get oriented without reading the whole site, take this route:

1. Install nbadb from Installation.
2. Run a first build or pull a published dataset.
3. Check CLI Reference for the exact command surface.
4. Use SQL Playground to rehearse the query shape if you want a browser-only warm-up.
5. Use Schema Reference to find the right table family.
6. Move into the guide lane that matches your job.

pip install nbadb
nbadb init
nbadb schema

`nbadb init` is the full historical build and usually takes hours, not minutes. Once your data directory is seeded, `nbadb daily` is the routine game-day command for refreshing the current season.
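After a first build, one quick sanity check is to count which export files landed in your data directory. The sketch below is only an illustration: the file suffixes (`.sqlite`, `.duckdb`, `.csv`, `.parquet`) are assumed from the export format names on this page, `count_exports` is a hypothetical helper, and the real directory layout is described in Installation rather than here.

```python
from collections import Counter
from pathlib import Path

# Assumed suffixes for the four export formats named on this page.
EXPORT_SUFFIXES = {".sqlite", ".duckdb", ".csv", ".parquet"}


def count_exports(data_dir):
    """Count files under data_dir by export suffix, searching recursively."""
    counts = Counter()
    for path in Path(data_dir).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in EXPORT_SUFFIXES:
            counts[suffix] += 1
    return dict(counts)
```

Call it as `count_exports("path/to/your/data/dir")`. An empty result after a build that reported success is a hint to check the configured data directory before digging further.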

Choose your route

Reader lane

I am installing or evaluating nbadb

Start with Installation. It covers the PyPI route, the source route, the NBADB_ settings that matter first, and what should appear in your data directory when things work.

Reader lane

I need to understand how the warehouse works

Read Architecture next. That page explains the pipeline stages, validation tiers, internal pipeline tables, export lanes, and the design choices behind the public model.

Reader lane

I need to query or model against the data

Use SQL Playground when you want a browser-only warm-up, then move to Schema Reference for table discovery, the Data Dictionary for field-level meaning, and DuckDB Query Examples for working SQL against the real model.

Reader lane

I need recurring operator workflows

Keep Daily Updates, Kaggle Setup, and the Troubleshooting Playbook open together. Those are the practical runbook pages.

Reader routes in one glance

Reader	Start here	Keep this open next
Evaluator or first-time installer	Installation	CLI Reference
Analyst looking for usable tables fast	SQL Playground or Schema Reference	DuckDB Query Examples
Builder changing pipeline or docs behavior	Architecture	CLI Reference
Operator running recurring refreshes	Daily Updates	Troubleshooting Playbook

What's on the floor

Surface	Read it as…	Reach for it when…	Start here
Dimensions	`dim_*` identity and lookup context	You need stable entities like players, teams, games, seasons, and history-aware reference context	Dimensions
Facts	`fact_*` event and measurement grain	You need player, team, game, play, shot, tracking, or standings data at its working grain	Facts & Bridges
Bridges	`bridge_*` connectors	You are resolving many-to-many relationships between public entities	Facts & Bridges
Aggregates	`agg_*` reusable rollups	You want pre-rolled summaries instead of rebuilding the same season or career SQL repeatedly	Derived Aggregations
Analytics outputs	`analytics_*` convenience surfaces	You want the shortest route to analysis-ready tables and views	Analytics Views
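The prefix convention in the table above makes it easy to take inventory of an export by family. Here is a minimal sketch using Python's standard-library `sqlite3` module against a SQLite database. The table names in the demo are hypothetical placeholders, not real nbadb tables, and `tables_by_family` is an illustrative helper, not part of nbadb.

```python
import sqlite3
from collections import defaultdict

# Prefixes taken from the table above; anything else is grouped as "other".
FAMILIES = ("dim_", "fact_", "bridge_", "agg_", "analytics_")


def tables_by_family(conn):
    """Group table and view names in a SQLite database by naming prefix."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    grouped = defaultdict(list)
    for (name,) in rows:
        family = next((p for p in FAMILIES if name.startswith(p)), "other")
        grouped[family].append(name)
    return dict(grouped)


# Demo on an in-memory database with made-up table names.
demo = sqlite3.connect(":memory:")
for table in ("dim_example", "fact_example", "agg_example", "scratch"):
    demo.execute(f"CREATE TABLE {table} (id INTEGER)")
print(tables_by_family(demo))
```

To run this against a real export, replace the in-memory connection with `sqlite3.connect(...)` pointed at your SQLite file. Schema Reference remains the authoritative list of table families.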

Role-based lanes

Fast lanes by reader

Who should start here

Analyst lane

Begin with SQL Playground if you want a browser-only warm-up, then keep Analytics Quickstart and DuckDB Query Examples open for real warehouse reads.

Who should start here

Builder lane

Start with Installation, then read CLI Reference and Architecture before changing pipeline behavior, schemas, or docs generation flows.

Who should start here

Operator lane

Use Daily Updates for the recurring refresh cadence, Kaggle Setup for distribution handoffs, and the Troubleshooting Playbook when artifacts or CI go sideways.

Generated vs. hand-written docs

Some docs pages are hand-authored. Others are generated from schema metadata and lineage information.

uv run nbadb docs-autogen --docs-root docs/content/docs

That command owns schema/{raw,staging,star}-reference.mdx, data-dictionary/{raw,staging,star}.mdx, diagrams/er-auto.mdx, lineage/lineage-auto.mdx, docs/lib/generated/{schema,lineage,schema-coverage}.json, and docs/lib/site-metrics.generated.ts. Regenerate those outputs when code changes; do not hand-edit them.
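The brace patterns above are shorthand for several concrete files. As a rough aid for checking whether a file is generator-owned before editing it, the sketch below expands those patterns in Python. The patterns are copied as written in the paragraph above; `expand`, `generated_paths`, and `is_generated` are hypothetical helpers, and the comparison is a plain string match that makes no assumption about which directory each path is relative to.

```python
import re
from itertools import product

# Patterns copied verbatim from the paragraph above.
GENERATED_PATTERNS = [
    "schema/{raw,staging,star}-reference.mdx",
    "data-dictionary/{raw,staging,star}.mdx",
    "diagrams/er-auto.mdx",
    "lineage/lineage-auto.mdx",
    "docs/lib/generated/{schema,lineage,schema-coverage}.json",
    "docs/lib/site-metrics.generated.ts",
]

_BRACES = re.compile(r"\{([^{}]*)\}")


def expand(pattern):
    """Expand shell-style {a,b,c} groups (no nesting) into concrete paths."""
    # re.split with one capture group alternates literal text (even indexes)
    # with the comma-separated body of each brace group (odd indexes).
    parts = _BRACES.split(pattern)
    choices = [
        part.split(",") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in product(*choices)]


def generated_paths():
    """Return every concrete path named by GENERATED_PATTERNS."""
    return [path for pattern in GENERATED_PATTERNS for path in expand(pattern)]


def is_generated(path):
    """Exact string match against the expanded list."""
    return path in set(generated_paths())
```

For example, `expand("schema/{raw,staging,star}-reference.mdx")` yields the three reference pages for the raw, staging, and star layers.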

Jump to the rest of the arena

Core entry pages

Installation — prerequisites, install routes, config defaults, and first checks
Architecture — pipeline stages, validation tiers, internal state, and design decisions
CLI Reference — exact command signatures plus operator and CI notes

Reference surfaces

Schema Reference — table families, join lanes, and generated reference coverage
Data Dictionary — column-level definitions and glossary terms
Endpoints — upstream endpoint grouping and extraction coverage

Guided routes

Guides — practice-facility hub for analysis drills, recurring ops, onboarding, and recovery workflows

Use this page when…

If you need to…	Go here first	Why this is the shortest route
Get nbadb onto a machine and prove it works	Installation	It covers prerequisites, install paths, `NBADB_` defaults, and first-run checks
Understand how raw extraction becomes queryable warehouse tables	Architecture	It explains the raw → staging → star flow, validation tiers, and internal state boundaries
Find the exact command or flag surface	CLI Reference	It maps command intent, shared behaviors, and current signatures in one place
Rehearse SQL in the browser before local setup	SQL Playground	It loads DuckDB-WASM with self-contained NBA-flavored examples so you can practice the query shape first
Find the right public table family or field	Schema Reference and Data Dictionary	Use schema pages for table discovery, then switch to field-level definitions
