Jump Ball
CLI Reference
Command matrix and operator notes for the current nbadb CLI surface.
Treat this page like the sideline play sheet for nbadb. It is the fastest route from “what command do I need?” to “what exactly does that command accept?” without reading the CLI code directly.
Quick navigation

- Build or refresh data: start with Pipeline commands if you are deciding between init, daily, monthly, and full.
- Inspect or debug a dataset: jump to Inspection and quality commands.
- Move data in or out: see Distribution and artifact commands.
- Need exact flags: skip to the Full command matrix when the only question is the literal current signature.
Opening possessions
| Goal | Command |
|---|---|
| Build the full historical dataset | nbadb init |
| Refresh the current season after games settle | nbadb daily |
| Sweep the last 3 seasons more broadly | nbadb monthly |
| Targeted backfill with gap detection | nbadb backfill run |
| Detect missing data across seasons | nbadb backfill gaps |
| Re-export from the existing DuckDB database | nbadb export |
| Inspect pipeline state | nbadb status |
| Regenerate generated docs artifacts | nbadb docs-autogen |
| Download or publish the Kaggle dataset | nbadb download / nbadb upload |
| Scan for data quality issues and gaps | nbadb scan |
| Lint SQL in transform definitions | nbadb lint-sql |
| Launch the analytics chat UI | nbadb chat |
| Generate Kaggle dataset metadata | nbadb metadata |
| Export pipeline telemetry | nbadb journal-summary |
Choose the command family first
| If your question is… | Start here | Typical commands |
|---|---|---|
| “How do I build or refresh data?” | Pipeline commands | init, daily, monthly |
| “How do I fill gaps or re-extract specific data?” | Backfill commands | backfill run, backfill gaps, backfill completeness, backfill journal |
| “How do I inspect, validate, or debug what already exists?” | Inspection and quality commands | schema, status, scan, lint-sql, ask, chat, journal-summary |
| “How do I move artifacts in or out?” | Distribution and artifact commands | export, download, upload, docs-autogen, extract-completeness, audit-models, metadata, migrate |
| “What are the exact current signatures?” | Exact signatures | Grouped tables for every current command |
Pipeline commands
These are the commands most operators touch first.
`nbadb init`
Full rebuild across historical seasons. Supports --season-start/-s,
--season-end/-e, repeatable --format/-f, --data-dir/-d,
--verbose/-v, and --quality-check.
`nbadb daily`
Uses the current season plus the NBADB_DAILY_LOOKBACK_DAYS window,
refreshes active players and teams, then rebuilds downstream tables in
replace mode.
`nbadb monthly`
Refreshes the last 3 seasons and then rebuilds downstream tables in replace mode.
`nbadb full`
Deprecated. Use nbadb backfill run instead, which provides targeted
extraction with gap detection, --dry-run, and finer control.
Run modes that matter most
| Command | Primary intent | Scope |
|---|---|---|
nbadb init | Full historical build | Historical seasons from --season-start through --season-end |
nbadb daily | Current-season refresh focused on recent games | Current season, recent games within NBADB_DAILY_LOOKBACK_DAYS, plus active player/team refresh |
nbadb monthly | Broader refresh of recent history | The last 3 seasons |
nbadb full | Deprecated. Use nbadb backfill run instead | — |
Deprecated commands: nbadb full is deprecated — use nbadb backfill run
for targeted gap-fill. nbadb run-quality is deprecated and removed — use
nbadb scan instead.
daily and monthly are not lightweight star-surface upsert commands. Their
extraction scope differs, but each run ends by rebuilding downstream tables in
replace mode.
Backfill commands
The backfill command group provides targeted extraction, gap detection, and journal management. Unlike the pipeline commands above, backfill operates selectively on specific seasons, endpoints, and parameter patterns.
`nbadb backfill run`
Extracts specific endpoints/seasons with --seasons/-s, --endpoint/-e,
and --pattern/-p filters. Supports --force to re-extract,
--extract-only and --transform-only for partial runs, and --dry-run/-n
to preview.
`nbadb backfill gaps`
Scans the extraction journal to find missing data. Reports expected vs
actual counts by endpoint, season, and parameter pattern. Use
--output-format json for CI.
`nbadb backfill completeness`
Aggregated completeness view grouped by season and endpoint. Complements
gaps with summary counts rather than per-entry detail.
`nbadb backfill journal`
Show, count, reset, or clear journal entries with --action/-a. Filter by
--endpoint, --seasons, and --status (done/failed/abandoned). Use
--yes/-y to skip confirmation on destructive actions.
Shared backfill options
| Option | Short | Description |
|---|---|---|
--seasons | -s | Comma-separated seasons or range: '2015-16,2016-17' or '2015:2020' |
--endpoint | -e | Comma-separated endpoint names |
--pattern | -p | Comma-separated param patterns (e.g. 'game,season') |
--data-dir | -d | Override default data directory |
--output-format | -f | Output format: text (default) or json |
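The `--seasons` range shorthand (`'2015:2020'`) can be expanded mechanically into season labels. This is a minimal illustrative sketch, not the CLI's actual parser: it assumes the range is inclusive of both start years and that each start year Y maps to the label `Y-(Y+1)` with a two-digit suffix.

```python
def expand_seasons(spec: str) -> list[str]:
    """Expand a --seasons value into season labels like '2015-16'.

    Accepts comma-separated seasons ('2015-16,2016-17') or a year range
    ('2015:2020'). The inclusive-range semantics here are an assumption.
    """
    seasons: list[str] = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:  # range shorthand, e.g. '2015:2020'
            start, end = (int(p) for p in part.split(":"))
            for year in range(start, end + 1):
                seasons.append(f"{year}-{(year + 1) % 100:02d}")
        else:  # already a season label like '2015-16'
            seasons.append(part)
    return seasons

print(expand_seasons("2015:2018"))  # ['2015-16', '2016-17', '2017-18', '2018-19']
```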
nbadb backfill run
Use --dry-run to preview what would be extracted without executing:
```sh
nbadb backfill run --seasons 2023-24,2024-25 --endpoint box_score_traditional --dry-run
```

Use --force to re-extract entries already marked as done:

```sh
nbadb backfill run --seasons 2024-25 --force
```

Use --extract-only or --transform-only for partial pipeline runs. These flags are mutually exclusive.
nbadb backfill journal
The journal subcommand supports four actions via --action/-a:
| Action | Description | Destructive? |
|---|---|---|
show | List journal entries (default, max 100) | No |
count | Counts grouped by endpoint and status | No |
reset | Reset entries to pending (re-extractable) | Yes |
clear | Delete matching entries permanently | Yes |
Destructive actions require at least one filter (--endpoint, --seasons, or --status) and prompt for confirmation unless --yes/-y is passed.
Inspection and quality commands
`nbadb schema [TABLE]`
With no argument, it lists discovered output tables grouped by prefix. With an optional table argument, it shows lineage-style details for that table.
`nbadb status`
Shows pipeline watermarks, journal summary, and pipeline metadata. Use
--output-format json for CI or scripts.
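A CI script can consume that JSON directly. This sketch assumes only the three documented top-level keys (watermarks, journal, metadata); the "fail when no watermarks exist" policy is an invented example, not a built-in behavior.

```python
import json
import subprocess

def status_payload() -> dict:
    """Capture `nbadb status --output-format json` as a Python dict."""
    result = subprocess.run(
        ["nbadb", "status", "--output-format", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def gate(payload: dict) -> int:
    """Illustrative CI gate: non-zero exit when no watermarks are recorded."""
    if not payload.get("watermarks"):
        print("no pipeline watermarks recorded; has nbadb init been run?")
        return 1
    return 0
```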
`nbadb scan`
Scans the database for missing data, gaps, and quality issues. Supports
--category/-c, --table/-t, --severity/-s, --fail-on/-F,
--report-path/-r, --output-format/-f, and --ci.
`nbadb ask QUESTION`
Sends a natural-language question to the local query agent with --limit/-l
and an accepted-but-currently-nonfunctional --verbose/-v flag.
nbadb status
Use --output-format json for scripts and CI. The JSON payload contains three top-level keys:
```json
{
  "watermarks": [],
  "journal": {},
  "metadata": []
}
```

nbadb scan
The scan command replaces the deprecated run-quality with a broader quality and gap scanner:
```sh
nbadb scan --category data_quality,missing_table --severity warning --fail-on error
```

- `--category`/`-c` filters to specific check categories: `cross_table`, `temporal`, `missing_table`, `data_quality`
- `--table`/`-t` filters by table name prefix (e.g. `fact_box_score`, `stg_`, `dim_`)
- `--severity`/`-s` sets the minimum severity to display: `error`, `warning`, `info`
- `--fail-on`/`-F` exits non-zero when findings meet or exceed a severity threshold
- `--report-path`/`-r` writes the full JSON report to a file
- `--output-format`/`-f` switches between `text` (default) and `json`
- `--ci` enables GitHub Actions integration (step summary + annotations)
Distribution and artifact commands
| Command | What it is for | Notes |
|---|---|---|
nbadb export | Re-export public tables from DuckDB | Uses repeatable --format/-f; defaults come from NBADB_FORMATS |
nbadb download | Pull the latest published Kaggle dataset into the data directory | Seeds nba.duckdb from nba.sqlite if DuckDB is missing |
nbadb upload | Publish the data directory to Kaggle | Ensures dataset-metadata.json exists first |
nbadb docs-autogen | Regenerate generated docs artifacts | Defaults --docs-root to docs/content/docs |
nbadb extract-completeness | Generate endpoint coverage artifacts | Supports --output-dir/-o, --require-full, and --require-model-contract |
nbadb endpoint-support-matrix | Generate the strict endpoint support and warehouse contract matrix | Supports --output-dir/-o, --require-complete, and --require-season-type-contract |
nbadb endpoint-adequacy-scorecard | Generate the endpoint adequacy scorecard and report | Supports --output-dir/-o |
nbadb audit-models | Generate end-to-end model + result-table audit artifacts | Supports --mode, --strictness, --output-dir/-o, --baseline, and --require-result-table-contract |
nbadb metadata | Generate Kaggle dataset-metadata.json from table catalog | Supports --output/-o and --data-dir/-d |
nbadb migrate | Create or migrate pipeline tables | Usually a setup or recovery command |
nbadb extract-completeness
By default this writes coverage artifacts to artifacts/endpoint-coverage/:
- `endpoint-coverage-matrix.json`
- `endpoint-coverage-summary.json`
- `endpoint-coverage-report.md`
Pass --output-dir to change the destination.
Use --require-full in CI to fail when any of these source-coverage gaps remain non-zero:
- `runtime_gap`
- `staging_only`
- `extractor_only`
- `source_only`
Use --require-model-contract to fail when a staged stats surface still lacks an explicit downstream decision. The command summary prints separate counts for analytically modeled, passthrough-only, compatibility/reference-only, explicitly excluded, and still-unowned surfaces.
That endpoint-level contract is intentionally separate from the output-schema audit. The same summary now prints star schema coverage counts for transform outputs so you can see, independently, how many tables have a schema-backed final-tier contract versus how many still bypass star-schema validation.
nbadb endpoint-support-matrix
Use this when you need the strictest repo-local answer to "is every endpoint wired into the repo contract, and what kind of downstream ownership does it have?"
```sh
uv run nbadb endpoint-support-matrix --require-complete --require-season-type-contract
```

By default this writes companion artifacts next to extract-completeness under artifacts/endpoint-coverage/:

- `endpoint-support-matrix.json`
- `endpoint-support-summary.json`
- `endpoint-support-report.md`
It also refreshes the shared endpoint-coverage bundle in that directory, including the adequacy scorecard, so CI can gate the full contract from a single regeneration pass.
The command distinguishes three execution semantics:
- `historical_backfill` for stats/static surfaces that can be rebuilt from their earliest supported season or date
- `reference_snapshot` for reference/static snapshots
- `live_snapshot` for append-only live feeds such as scoreboard, odds, live play-by-play, and live box score packets
Use --require-complete to fail whenever any endpoint remains only partially wired into staging, transforms, or schemas. The summary also breaks downstream semantics into analytically modeled, passthrough-only, compatibility/reference-only, explicitly excluded, and unowned buckets so compatibility surfaces do not get mistaken for full analytical ownership. Use --require-season-type-contract to fail whenever a historical surface still lacks explicit season-type semantics or remains deliberately blocked from that contract.
nbadb endpoint-adequacy-scorecard
Use this when you want the endpoint-centric view that combines extraction coverage, support contract state, and downstream ownership into a single scorecard:
```sh
uv run nbadb endpoint-adequacy-scorecard
```

By default this writes companion artifacts next to extract-completeness under artifacts/endpoint-coverage/:

- `endpoint-adequacy-scorecard.json`
- `endpoint-adequacy-scorecard-report.md`
If you already regenerated endpoint-support-matrix or extract-completeness, those shared coverage runs also refresh this scorecard; use this command when you only need the adequacy pair itself.
The scorecard surfaces endpoint-level adequacy, downstream ownership, execution semantics, and the underlying coverage/support gap counts so you can quickly see whether each endpoint is analytically modeled, passthrough-only, compatibility/reference-only, explicitly excluded, or still unowned.
nbadb audit-models
Use this when you need the end-to-end model contract, not just endpoint completeness:
```sh
uv run nbadb audit-models --mode inventory --strictness consistency
```

The command writes model-audit artifacts under artifacts/model-audit/ by default:

- `inventory.json`
- `matrix.json`
- `report.md`
- `result-table-inventory.json`

`probe`, `build`, and `full` modes also emit:

- `live-probes.json`
- `build-validation.json`
When --baseline is provided, the command also writes baseline-comparison.json and enables no-regressions or zero-gaps enforcement. Pass --require-result-table-contract to fail when any runtime result table is still missing from staging, unowned downstream, or only weakly classified because schema coverage is incomplete.
Use the strictness levels progressively:
- `consistency` fails on discovery and audit-contract inconsistencies
- `no-regressions` compares the current inventory summary against a committed baseline, including new result-table contract failures
- `zero-gaps` requires both a baseline and zero remaining audit problems or result-table contract failures
The result-table inventory keeps warehouse ownership semantics explicit per runtime result set:
- `modeled`: analytically modeled with non-passthrough ownership
- `passthrough_only`: owned only by passthrough/union surfaces
- `compatibility_reference_only`: intentionally retained for compatibility/reference visibility
- `excluded`/`deprecated`: explicitly non-modeled but still visible
- `unowned`, `weakly_classified`, and `missing`: contract failures that should be fixed or explicitly classified
For CI canaries, keep --mode probe and use the repository’s configured canary profile so PRs exercise each source kind and parameter pattern without running the full live matrix.
nbadb docs-autogen
Use this when generated reference pages drift from code:
```sh
uv run nbadb docs-autogen --docs-root docs/content/docs
```

It regenerates:

- `schema/{raw,staging,star}-reference.mdx`
- `data-dictionary/{raw,staging,star}.mdx`
- `diagrams/er-auto.mdx`
- `lineage/lineage-auto.mdx`
- `docs/lib/generated/{schema,lineage,schema-coverage}.json`
- `docs/lib/site-metrics.generated.ts`
The same generator is also available via:

```sh
uv run python -m nbadb.docs_gen --docs-root docs/content/docs
```

The command prints `updated:` or `unchanged:` for each artifact, then a final summary line.
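Because each artifact line starts with `updated:` or `unchanged:`, a CI wrapper can count docs drift from captured output. The helper below is illustrative and not part of nbadb:

```python
def count_docs_drift(lines: list[str]) -> tuple[int, int]:
    """Count (updated, unchanged) artifacts from docs-autogen output lines."""
    updated = sum(1 for line in lines if line.startswith("updated:"))
    unchanged = sum(1 for line in lines if line.startswith("unchanged:"))
    return updated, unchanged
```

In a drift check, a non-zero `updated` count would mean the committed docs no longer match the code.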
Shared behavior for pipeline commands
These notes apply to nbadb init, daily, and monthly.
- `--data-dir` overrides the default data directory (`nbadb/`) and the `NBADB_DATA_DIR` setting.
- `--quality-check` runs post-pipeline quality checks against the DuckDB file, but those checks are informational today: empty-table warnings are reported without failing the command.
- If stdout is an interactive terminal and `--verbose` is not set, the CLI uses the Textual TUI.
- Use `--verbose` to force plain log output and DEBUG-level logging.
- A single `Ctrl+C` attempts a graceful stop and preserves resume-safe journal/checkpoint state; pressing `Ctrl+C` again forces exit.
- Successful runs print a summary like this:

```
daily: 12 tables, 245,901 rows, 38.4s
```

Shared flags and defaults that matter most
| Flag or default | Applies to | What to remember |
|---|---|---|
--data-dir/-d | Most commands that read or write local data | Overrides the default nbadb/ directory and the NBADB_DATA_DIR setting |
--format/-f | init and export | Repeatable; defaults come from NBADB_FORMATS when omitted |
--verbose/-v | Pipeline commands and ask | On pipeline commands, it forces plain logs and DEBUG-level output instead of the TUI |
--quality-check | init, daily, monthly | Runs informational post-pipeline quality checks without currently failing on empty-table warnings |
--output-format/-f | status, scan, the backfill subcommands, and journal-summary | text by default; use json for CI and scripts |
Data and output defaults
- Default export formats come from `NbaDbSettings.formats`: `sqlite`, `duckdb`, `csv`, and `parquet`.
- `nbadb init` and `nbadb export` use all configured formats when you omit `--format`.
- `nbadb download` copies the Kaggle dataset into the target data directory and seeds `nba.duckdb` from `nba.sqlite` if only SQLite is present.
- `nbadb upload` ensures `dataset-metadata.json` exists before publishing.
- `nbadb schema` groups tables by output prefix. Analysis-ready convenience outputs use the `analytics_*` prefix.
Full command matrix
The matrix below mirrors the current Typer command signatures in src/nbadb/cli/commands.
Pipeline commands
| Command | What it does | Positional args | Options |
|---|---|---|---|
nbadb init | Full rebuild across historical seasons | — | --data-dir/-d, --format/-f (repeatable), --season-start/-s (default: 1946), --season-end/-e, --verbose/-v, --quality-check |
nbadb daily | Current-season refresh focused on recent games | — | --data-dir/-d, --verbose/-v, --quality-check |
nbadb monthly | Refresh the last 3 seasons | — | --data-dir/-d, --verbose/-v, --quality-check |
nbadb full | Deprecated (use nbadb backfill run) | — | --data-dir/-d, --verbose/-v, --quality-check |
Backfill commands
| Command | What it does | Positional args | Options |
|---|---|---|---|
nbadb backfill run | Targeted backfill extraction | — | --data-dir/-d, --seasons/-s, --endpoint/-e, --pattern/-p, --force, --extract-only, --transform-only, --dry-run/-n, --verbose/-v, --quality-check |
nbadb backfill gaps | Detect and report extraction gaps | — | --data-dir/-d, --seasons/-s, --endpoint/-e, --pattern/-p, --output-format/-f |
nbadb backfill completeness | Show data completeness per season/endpoint | — | --data-dir/-d, --seasons/-s, --endpoint/-e, --output-format/-f |
nbadb backfill journal | Manage extraction journal entries | — | --data-dir/-d, --endpoint/-e, --seasons/-s, --status, --action/-a (show/count/reset/clear), --yes/-y, --output-format/-f |
Inspection and quality commands
| Command | What it does | Positional args | Options |
|---|---|---|---|
nbadb schema [TABLE] | List all discovered output tables or show lineage for one table | optional TABLE | — |
nbadb status | Show pipeline status, watermarks, and metadata | — | --data-dir/-d, --output-format/-f (default: text; json supported) |
nbadb scan | Scan database for missing data, gaps, and quality issues | — | --data-dir/-d, --category/-c, --table/-t, --severity/-s, --fail-on/-F, --report-path/-r, --output-format/-f, --ci |
nbadb lint-sql | Lint SQL in SqlTransformer _SQL ClassVars using SQLFluff | — | --fix, --table/-t, --fail-on/-F |
nbadb ask QUESTION | Ask the local query agent a natural-language question | required QUESTION | --limit/-l (default: 10), --verbose/-v |
nbadb chat | Launch the AI-powered NBA data analytics chat UI | — | --port/-p (default: 8421), --host (default: 127.0.0.1) |
nbadb journal-summary | Export pipeline telemetry for the docs admin panel | — | --data-dir/-d, --output-format/-f, --json, --output-path/-o, --window-days, --limit |
Distribution and artifact commands
| Command | What it does | Positional args | Options |
|---|---|---|---|
nbadb export | Re-export user tables from DuckDB into configured output formats | — | --data-dir/-d, --format/-f (repeatable) |
nbadb download | Download the latest Kaggle dataset into the data directory | — | --data-dir/-d |
nbadb upload | Upload the dataset directory to Kaggle | — | --data-dir/-d, --message/-m (default: "Automated update") |
nbadb migrate | Create or migrate pipeline tables | — | --data-dir/-d |
nbadb docs-autogen | Regenerate generated docs artifacts under docs/content/docs and docs/lib/generated | — | --docs-root (default: docs/content/docs) |
nbadb extract-completeness | Generate endpoint coverage artifacts | — | --output-dir/-o, --require-full, --require-model-contract |
nbadb endpoint-support-matrix | Generate the strict endpoint support and warehouse contract matrix | — | --output-dir/-o, --require-complete, --require-season-type-contract |
nbadb endpoint-adequacy-scorecard | Generate the endpoint adequacy scorecard and report | — | --output-dir/-o |
nbadb audit-models | Generate end-to-end model + result-table audit artifacts | — | --mode, --strictness, --output-dir/-o, --baseline, --require-result-table-contract |
nbadb metadata | Generate Kaggle dataset-metadata.json from table catalog | — | --output/-o (default: dataset-metadata.json), --data-dir/-d |
Related guides
- Installation for setup and environment defaults
- Architecture for run-mode context and warehouse boundaries
- Daily Updates for recurring refreshes
- Kaggle Setup for download/upload workflows
- Troubleshooting Playbook for CI, docs, and artifact failures