Jump Ball
Getting Started
The tip-off and control tower for nbadb's public analytical surface.
nbadb Documentation
Welcome to the Arena Data Lab for nbadb: the control tower for getting from first install to first query, first refresh, and first production handoff.
nbadb turns the NBA stats surface into a public analytical model with DuckDB staging, SQL-first transforms, and exports for DuckDB, SQLite, Parquet, and CSV.
Quick navigation
Start the install
Jump to Installation for prerequisites, install routes, env defaults, and the first sanity checks.
Understand the pipeline
Open Architecture for the raw → staging → star flow, internal state boundaries, and why the warehouse is shaped this way.
Find the right command
Use CLI Reference when the question is "what command does this?" or "what flags are accepted right now?"
Try the browser explorer
Open SQL Playground when you want to rehearse DuckDB query shapes in-browser before you pull the full dataset locally.
Find the right table
Start with Schema Reference, then move to the Data Dictionary when you need exact fields and meanings.
Use this page when…
| If you need to… | Go here first | Why this is the shortest route |
|---|---|---|
| Get nbadb onto a machine and prove it works | Installation | It covers prerequisites, install paths, NBADB_ defaults, and first-run checks |
| Understand how raw extraction becomes queryable warehouse tables | Architecture | It explains the raw → staging → star flow, validation tiers, and internal state boundaries |
| Find the exact command or flag surface | CLI Reference | It maps command intent, shared behaviors, and current signatures in one place |
| Rehearse SQL in the browser before local setup | SQL Playground | It loads DuckDB-WASM with self-contained NBA-flavored examples so you can practice the query shape first |
| Find the right public table family or field | Schema Reference and Data Dictionary | Use schema pages for table discovery, then switch to field-level definitions |
First five minutes
If you just want to get oriented without reading the whole site, take this route:
- Install nbadb by following Installation.
- Run a first build or pull a published dataset.
- Check CLI Reference for the exact command surface.
- Use SQL Playground to rehearse the query shape if you want a browser-only warm-up.
- Use Schema Reference to find the right table family.
- Move into the guide lane that matches your job.
```sh
pip install nbadb
nbadb init
nbadb schema
```

`nbadb init` is the full historical build and usually takes hours, not minutes. Once your data directory is seeded, `nbadb daily` becomes the standard game-day possession for refreshing the current season.
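If you script first-run setup, the init-versus-daily split can be sketched as a small POSIX shell function. Treat it as a dry-run sketch under two assumptions: the name `nbadb_next_step` is hypothetical (it is not an nbadb command), and "seeded" is approximated as "the data directory exists and is not empty", which is a stand-in rather than nbadb's own check. It only prints `nbadb init` or `nbadb daily`; it never runs them.

```sh
# Hypothetical helper: print which nbadb command fits a data directory.
# Assumption: a missing or empty directory means "not seeded yet".
nbadb_next_step() {
    data_dir=$1
    # ls -A lists every entry (dotfiles included) except . and ..
    if [ -d "$data_dir" ] && [ -n "$(ls -A "$data_dir" 2>/dev/null)" ]; then
        # Seeded: refresh the current season.
        echo "nbadb daily"
    else
        # Not seeded: full historical build (usually hours, not minutes).
        echo "nbadb init"
    fi
}
```

For example, `nbadb_next_step ./data` prints `nbadb init` while the (made-up) `./data` directory is missing or empty. Installation describes what a healthy data directory contains, which would make a sharper test than emptiness.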
Choose your route
I am installing or evaluating nbadb
Start with Installation. It covers the PyPI route, the source route, the NBADB_ settings that matter first, and what should appear in your data directory when things work.
I need to understand how the warehouse works
Read Architecture next. That page explains the pipeline stages, validation tiers, internal pipeline tables, export lanes, and the design choices behind the public model.
I need to query or model against the data
Use SQL Playground when you want a browser-only warm-up, then move to Schema Reference for table discovery, the Data Dictionary for field-level meaning, and DuckDB Query Examples for working SQL against the real model.
I need recurring operator workflows
Keep Daily Updates, Kaggle Setup, and the Troubleshooting Playbook open together. Those are the practical runbook pages.
Reader routes in one glance
| Reader | Start here | Keep this open next |
|---|---|---|
| Evaluator or first-time installer | Installation | CLI Reference |
| Analyst looking for usable tables fast | SQL Playground or Schema Reference | DuckDB Query Examples |
| Builder changing pipeline or docs behavior | Architecture | CLI Reference |
| Operator running recurring refreshes | Daily Updates | Troubleshooting Playbook |
What's on the floor
| Surface | Read it as… | Reach for it when… | Start here |
|---|---|---|---|
| Dimensions | dim_* identity and lookup context | You need stable entities like players, teams, games, seasons, and history-aware reference context | Dimensions |
| Facts | fact_* event and measurement grain | You need player, team, game, play, shot, tracking, or standings data at its working grain | Facts & Bridges |
| Bridges | bridge_* connectors | You are resolving many-to-many relationships between public entities | Facts & Bridges |
| Aggregates | agg_* reusable rollups | You want pre-rolled summaries instead of rebuilding the same season or career SQL repeatedly | Derived Aggregations |
| Analytics outputs | analytics_* convenience surfaces | You want the shortest route to analysis-ready tables and views | Analytics Views |
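Because every public family is identified by its name prefix, sorting a table list into families is a plain prefix match. A minimal POSIX shell sketch, assuming only the five prefixes in the table above; `table_family` is a hypothetical helper, not part of nbadb, and any name outside those prefixes falls through to `unknown`.

```sh
# Hypothetical helper: map a public table name to its family by prefix.
table_family() {
    case $1 in
        dim_*)       echo "Dimensions" ;;
        fact_*)      echo "Facts" ;;
        bridge_*)    echo "Bridges" ;;
        agg_*)       echo "Aggregates" ;;
        analytics_*) echo "Analytics outputs" ;;
        *)           echo "unknown" ;;  # not one of the public prefixes
    esac
}
```

For example, `table_family agg_example` prints `Aggregates`; `agg_example` is a made-up name used only for illustration.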
Fast lanes by reader
Analyst lane
Begin with SQL Playground if you want a browser-only warm-up, then keep Analytics Quickstart and DuckDB Query Examples open for real warehouse reads.
Builder lane
Start with Installation, then read CLI Reference and Architecture before changing pipeline behavior, schemas, or docs generation flows.
Operator lane
Use Daily Updates for the recurring refresh cadence, Kaggle Setup for distribution handoffs, and the Troubleshooting Playbook when artifacts or CI go sideways.
Generated vs. hand-written docs
Some docs pages are hand-authored. Others are generated from schema metadata and lineage information.
```sh
uv run nbadb docs-autogen --docs-root docs/content/docs
```

That command owns `schema/{raw,staging,star}-reference.mdx`, `data-dictionary/{raw,staging,star}.mdx`, `diagrams/er-auto.mdx`, `lineage/lineage-auto.mdx`, `docs/lib/generated/{schema,lineage,schema-coverage}.json`, and `docs/lib/site-metrics.generated.ts`. Regenerate those outputs when the code changes; do not hand-edit them.
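To catch accidental hand edits, a review script could test each changed path against the generated set listed above. This is a minimal POSIX shell sketch, not part of nbadb: the function name `is_generated_doc` is hypothetical, and it assumes the `.mdx` outputs live under the `--docs-root` directory (`docs/content/docs`) and that callers pass repo-relative paths.

```sh
# Hypothetical guard: succeed (return 0) if a path is owned by docs-autogen.
# Assumption: .mdx outputs sit under docs/content/docs (the --docs-root);
# paths are repo-relative.
is_generated_doc() {
    case $1 in
        docs/content/docs/schema/raw-reference.mdx | \
        docs/content/docs/schema/staging-reference.mdx | \
        docs/content/docs/schema/star-reference.mdx | \
        docs/content/docs/data-dictionary/raw.mdx | \
        docs/content/docs/data-dictionary/staging.mdx | \
        docs/content/docs/data-dictionary/star.mdx | \
        docs/content/docs/diagrams/er-auto.mdx | \
        docs/content/docs/lineage/lineage-auto.mdx | \
        docs/lib/generated/schema.json | \
        docs/lib/generated/lineage.json | \
        docs/lib/generated/schema-coverage.json | \
        docs/lib/site-metrics.generated.ts)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

In a pre-commit hook you might feed it the output of `git diff --cached --name-only`. Freshly regenerated outputs are staged the same way, though, so such a hook can only warn; it cannot tell a hand edit from a clean regeneration.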
Jump to the rest of the arena
Core entry pages
- Installation — prerequisites, install routes, config defaults, and first checks
- Architecture — pipeline stages, validation tiers, internal state, and design decisions
- CLI Reference — exact command signatures plus operator and CI notes
Reference surfaces
- Schema Reference — table families, join lanes, and generated reference coverage
- Data Dictionary — column-level definitions and glossary terms
- Endpoints — upstream endpoint grouping and extraction coverage
Guided routes
- Guides — practice-facility hub for analysis drills, recurring ops, onboarding, and recovery workflows
