nbadb Arena Data Lab

Jump Ball

Installation

Opening whistle for installing, configuring, and sanity-checking nbadb.

Use this page as the pregame checklist for getting nbadb onto your machine and confirming the first possession works before you move into queries, refreshes, or docs work.

At a glance

  • Python 3.12+: required for the current package and source workflow
  • 2 install routes: PyPI for quick use, source checkout for contributors
  • 4 default exports: sqlite, duckdb, csv, and parquet
  • 7-day daily lookback: default recent-game window for nbadb daily

Quick navigation

  • Quick package install: jump to Install from PyPI if you just want the CLI on your machine fast.
  • Contributor source setup: use Install from source if you will edit code, docs, or generated artifacts.
  • Env and defaults: go straight to Configuration defaults for the NBADB_ knobs that matter first.
  • Sanity-check the setup: skip to First possession checklist if install already succeeded and you need to validate it.

Preflight

Before you install:

  • You need Python 3.12 or newer.
  • Use uv for source installs and contributor workflows.
  • Use pip if you want the quickest path to the packaged CLI.
  • If you plan to use Kaggle download/upload flows, keep your Kaggle credentials available for later.
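
The version requirement is easy to verify before installing anything. A minimal sketch, not part of nbadb itself; `meets_requirement` is a hypothetical helper:

```python
import sys

def meets_requirement(version, required=(3, 12)):
    """True when a (major, minor, ...) version tuple satisfies `required`."""
    return tuple(version[:2]) >= required

# nbadb needs Python 3.12 or newer (see the preflight list above).
if not meets_requirement(sys.version_info):
    print("warning: this interpreter is older than the Python 3.12 nbadb requires")
```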

Pick an install route

  • PyPI route: best for analysts and users who want the CLI without cloning the repo. Install the package, confirm nbadb --help works, then move into the first build or download flow.
  • Source route: best for contributors, operators, and docs writers working inside the repo. This route keeps the checked-out code, docs generator, and local CLI aligned.

Install routes in one glance

| Route | Best when… | First commands | What you get |
| --- | --- | --- | --- |
| PyPI | You want the packaged CLI quickly and do not need a repo checkout | `pip install nbadb` | The installed CLI and defaults needed for normal use |
| Source | You will edit code, docs, generated artifacts, or local project config | `uv sync --extra dev` | A checked-out repo, contributor tooling, docs generator, and repo-aligned CLI |

Install from PyPI

pip install nbadb
nbadb --help

Use this route when you do not need the repository checkout itself.

Install from source

git clone https://github.com/wyattowalsh/nbadb.git
cd nbadb
uv sync --extra dev
uv run nbadb --help

If you are contributing docs or code, prefer the source route. It keeps the docs generator, tests, and local CLI aligned with the checked-out code.

Configuration defaults

nbadb reads environment variables with the NBADB_ prefix. The fastest way to inspect or override settings is to start from the checked-in example file:

cp .env.example .env

Core settings you will usually care about first

| Variable | Purpose | Default |
| --- | --- | --- |
| `NBADB_DATA_DIR` | Root folder for local database files and exported data | `nbadb` |
| `NBADB_FORMATS` | Default export formats for `init` and `export` | `['sqlite', 'duckdb', 'csv', 'parquet']` |
| `NBADB_DAILY_LOOKBACK_DAYS` | Recent-game window used by `nbadb daily` | `7` |
| `NBADB_KAGGLE_DATASET` | Dataset slug used by download/upload flows | `wyattowalsh/basketball` |
| `NBADB_PROXY_ENABLED` | Enable proxy rotation support | `false` |
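
The override precedence the table describes can be pictured with a small sketch. This is illustrative only: `DEFAULTS` and `setting` are hypothetical names, not nbadb's real settings loader; an environment variable simply wins over the shipped default of the same name.

```python
import os

# Illustrative sketch of the NBADB_-prefix convention; `DEFAULTS` and
# `setting` are hypothetical, not nbadb's actual configuration code.
DEFAULTS = {
    "NBADB_DATA_DIR": "nbadb",
    "NBADB_DAILY_LOOKBACK_DAYS": "7",
    "NBADB_PROXY_ENABLED": "false",
}

def setting(name, env=os.environ):
    """Environment value wins; otherwise fall back to the shipped default."""
    return env.get(name, DEFAULTS[name])
```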

Fast decisions for new installs

| Question | Default answer | Change it when… |
| --- | --- | --- |
| Where should files land? | Keep the default `nbadb/` data directory | You want the dataset somewhere else or need multiple local copies |
| Which formats should export? | Keep all four: sqlite, duckdb, csv, parquet | You only want a subset for local storage or downstream tooling |
| Should proxies be enabled? | No | You are explicitly solving extraction-network or rate-limit issues |
| Should I use Kaggle credentials now? | Not unless you need download or upload | You want the fastest path to a published dataset or need to publish one |

Optional knobs worth knowing exist

| Variable | Why you might set it |
| --- | --- |
| `NBADB_LOG_DIR` | Move logs away from the default `logs/` folder |
| `NBADB_REQUEST_TIMEOUT` | Override the extractor request timeout without changing code |
| `NBADB_PROXY_URLS`, `NBADB_PROXY_USER`, `NBADB_PROXY_PASS` | Point extraction at explicit proxies or authenticated SOCKS5 proxies |
| `KAGGLE_USERNAME`, `KAGGLE_KEY` | Enable `nbadb download` and `nbadb upload` workflows |

All settings ship with defaults. Only add .env entries for the knobs you actually want to change.
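
For example, a minimal `.env` that changes only two knobs from the core table above (the path and value here are illustrative, not recommendations):

```shell
# .env: override only the knobs you need; unset variables keep their defaults
# Store data outside the working directory (hypothetical path)
NBADB_DATA_DIR=/srv/nba-data
# Widen the recent-game window used by nbadb daily
NBADB_DAILY_LOOKBACK_DAYS=14
```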

Sanity-check flow

First possession checklist

The fastest path to a usable local dataset

| If your goal is… | Take this route |
| --- | --- |
| Use nbadb as fast as possible | Install from PyPI, then run `nbadb download` |
| Build everything locally from source | Install from source, then run `nbadb init` |
| Confirm the CLI works before any long-running data command | Run `nbadb --help`, then `nbadb schema` after data exists |

1. Build or fetch data

Choose one lane:

# Full historical build
nbadb init
# Pull the latest published dataset into your data directory
nbadb download

nbadb init is the full historical rebuild. If you want the fastest path to a usable local dataset, nbadb download is usually the quicker opening possession.

2. Inspect the floor

nbadb schema
nbadb status

Use this pairing when you want a quick “did install actually produce a usable warehouse?” answer:

  • nbadb schema lists discovered public tables.
  • nbadb status shows pipeline watermarks, journal summary, and table metadata.
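
If you want to cross-check the table listing by hand, the SQLite export can be inspected with nothing but the standard library. A sketch, assuming the default export path; `list_tables` is a hypothetical helper, not part of the CLI:

```python
import sqlite3

def list_tables(db_path):
    """List table names in a SQLite file, similar in spirit to `nbadb schema`."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]

# With default settings the export lives at nbadb/nba.sqlite:
# print(list_tables("nbadb/nba.sqlite"))
```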

3. Run the standard refresh play

nbadb daily

That command refreshes the current season, looks back NBADB_DAILY_LOOKBACK_DAYS days by default, updates active players and teams, and then rebuilds downstream tables in replace mode.
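
The lookback window is plain date arithmetic. A sketch of the calculation, assuming only the documented 7-day default; `lookback_start` is a hypothetical name, not nbadb's internal function:

```python
import os
from datetime import date, timedelta

def lookback_start(today, env=os.environ):
    """Sketch of the nbadb daily window: today minus NBADB_DAILY_LOOKBACK_DAYS."""
    days = int(env.get("NBADB_DAILY_LOOKBACK_DAYS", "7"))
    return today - timedelta(days=days)

# With the default 7-day window:
# lookback_start(date(2024, 1, 10), env={}) -> date(2024, 1, 3)
```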

What lands in your data directory

By default, nbadb writes into nbadb/.

Default output map

| Path | What it is for | Typical first use |
| --- | --- | --- |
| `nba.duckdb` | Primary local warehouse | Querying, status inspection, and quality checks |
| `nba.sqlite` | Portable single-file export | Sharing and broad tool compatibility |
| `parquet/` | Columnar export lane (one directory per exported table) | DataFrame-heavy and analytics workflows |
| `csv/` | Flat-file export lane (one CSV per exported table) | Simple ingestion into tools that expect CSV |
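
A quick way to confirm the default outputs landed is a filesystem check. A sketch assuming the default `nbadb/` data directory; `missing_outputs` is a hypothetical helper, not part of the CLI:

```python
from pathlib import Path

# Default outputs from the map above; this helper is illustrative only.
EXPECTED = ["nba.duckdb", "nba.sqlite", "parquet", "csv"]

def missing_outputs(data_dir="nbadb"):
    """Report which default outputs are absent from the data directory."""
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```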

Common install-time decisions

| Decision | Reach for this | Why |
| --- | --- | --- |
| Put data somewhere other than `nbadb/` | `--data-dir` for one-off runs, `NBADB_DATA_DIR` for a lasting default | Keeps multiple datasets or non-default storage locations cleanly separated |
| Change what `init` and `export` write | `NBADB_FORMATS` or repeatable `--format` on supported commands | Lets you trim disk usage or match downstream tool expectations |
| Enable proxies | `NBADB_PROXY_ENABLED=true` plus proxy settings | Only needed when you are explicitly addressing extraction-network behavior |
| Use published datasets instead of building from scratch | `nbadb download` with Kaggle credentials configured | Usually the shortest route to a usable local dataset |

Where to go next

| If you want to… | Go here |
| --- | --- |
| Learn what `daily`, `monthly`, `full`, `status`, and `run-quality` actually do | CLI Reference |
| Understand raw → staging → star and the public table families | Architecture |
| Start querying with the easiest analysis-ready surface | Analytics Quickstart |
| Download from or publish to Kaggle | Kaggle Setup |

Docs contributors: generated artifacts boundary

The docs site mixes hand-written pages with generated reference artifacts. When generator-owned docs drift from the code, refresh them with:

uv run nbadb docs-autogen --docs-root docs/content/docs

That command regenerates schema/{raw,staging,star}-reference.mdx, data-dictionary/{raw,staging,star}.mdx, diagrams/er-auto.mdx, lineage/lineage-auto.mdx, and lineage/lineage.json. Do not hand-edit those generated outputs.
