Jump Ball
Installation
Opening whistle for installing, configuring, and sanity-checking nbadb.
Use this page as the pregame checklist for getting nbadb onto your machine and confirming the first possession works before you move into queries, refreshes, or docs work.
Quick navigation
Quick package install
Jump to Install from PyPI if you just want the CLI on your machine fast.
Contributor source setup
Use Install from source if you will edit code, docs, or generated artifacts.
Env and defaults
Go straight to Configuration defaults for the NBADB_ knobs that matter first.
Sanity-check the setup
Skip to First possession checklist if install already succeeded and you need to validate it.
Preflight
Before you install:
- You need Python 3.12 or newer.
- Use uv for source installs and contributor workflows.
- Use pip if you want the quickest path to the packaged CLI.
- If you plan to use Kaggle download/upload flows, keep your Kaggle credentials available for later.
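Before picking a route, you can verify the toolchain in one pass. This quick check assumes `python3` is on your PATH; `uv` and `pip` are only reported, not required at this point:

```shell
# Confirm the interpreter and installers the preflight list mentions.
python3 --version                                 # should report 3.12 or newer
python3 -m pip --version 2>/dev/null || echo "pip not available"
uv --version 2>/dev/null || echo "uv not installed (only needed for source installs)"
```

The `|| echo` fallbacks keep the check from failing outright on machines that have not installed uv or pip yet.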
Pick an install route
PyPI route
Analysts and users who want the CLI without cloning the repo. Install the package, confirm nbadb --help works, then move into the first build or download flow.
Source route
Contributors, operators, and docs writers working inside the repo. This route keeps the checked-out code, docs generator, and local CLI aligned.
Install routes in one glance
| Route | Best when… | First commands | What you get |
|---|---|---|---|
| PyPI | You want the packaged CLI quickly and do not need a repo checkout | pip install nbadb | The installed CLI and defaults needed for normal use |
| Source | You will edit code, docs, generated artifacts, or local project config | uv sync --extra dev | A checked-out repo, contributor tooling, docs generator, and repo-aligned CLI |
Install from PyPI
```shell
pip install nbadb
nbadb --help
```

Use this route when you do not need the repository checkout itself.
Install from source
```shell
git clone https://github.com/wyattowalsh/nbadb.git
cd nbadb
uv sync --extra dev
uv run nbadb --help
```

If you are contributing docs or code, prefer the source route. It keeps the docs generator, tests, and local CLI aligned with the checked-out code.
Configuration defaults
nbadb reads environment variables with the NBADB_ prefix. The fastest way to inspect or override settings is to start from the checked-in example file:
```shell
cp .env.example .env
```

Core settings you will usually care about first
| Variable | Purpose | Default |
|---|---|---|
| NBADB_DATA_DIR | Root folder for local database files and exported data | nbadb |
| NBADB_FORMATS | Default export formats for init and export | ['sqlite', 'duckdb', 'csv', 'parquet'] |
| NBADB_DAILY_LOOKBACK_DAYS | Recent-game window used by nbadb daily | 7 |
| NBADB_KAGGLE_DATASET | Dataset slug used by download/upload flows | wyattowalsh/basketball |
| NBADB_PROXY_ENABLED | Enable proxy rotation support | false |
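The override semantics are plain environment-variable defaulting: a set NBADB_-prefixed variable wins, otherwise the built-in default applies. The sketch below shows that behavior in plain shell with the documented defaults; the resolution logic is illustrative, not nbadb's actual implementation:

```shell
# Each setting falls back to its documented default when the env var is unset.
data_dir="${NBADB_DATA_DIR:-nbadb}"
lookback_days="${NBADB_DAILY_LOOKBACK_DAYS:-7}"
proxy_enabled="${NBADB_PROXY_ENABLED:-false}"

echo "data dir:      $data_dir"
echo "lookback days: $lookback_days"
echo "proxy enabled: $proxy_enabled"
```

With no NBADB_ variables exported, this prints the defaults from the table above; exporting, say, `NBADB_DAILY_LOOKBACK_DAYS=14` changes only that line.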
Fast decisions for new installs
| Question | Default answer | Change it when… |
|---|---|---|
| Where should files land? | Keep the default nbadb/ data directory | You want the dataset somewhere else or need multiple local copies |
| Which formats should export? | Keep all four: sqlite, duckdb, csv, parquet | You only want a subset for local storage or downstream tooling |
| Should proxies be enabled? | No | You are explicitly solving extraction-network or rate-limit issues |
| Should I use Kaggle credentials now? | Not unless you need download or upload | You want the fastest path to a published dataset or need to publish one |
Optional knobs worth knowing exist
| Variable | Why you might set it |
|---|---|
| NBADB_LOG_DIR | Move logs away from the default logs/ folder |
| NBADB_REQUEST_TIMEOUT | Override the extractor request timeout without changing code |
| NBADB_PROXY_URLS, NBADB_PROXY_USER, NBADB_PROXY_PASS | Point extraction at explicit proxies or authenticated SOCKS5 proxies |
| KAGGLE_USERNAME, KAGGLE_KEY | Enable nbadb download and nbadb upload workflows |
All settings ship with defaults. Only add .env entries for the knobs you actually want to change.
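Following that rule, a minimal .env overrides only a couple of knobs. The values below are examples, not recommendations, and every variable shown comes from the tables above:

```shell
# .env — only list the settings you want to change from their defaults.
NBADB_DATA_DIR=/data/nbadb          # example path; the default is nbadb/
NBADB_DAILY_LOOKBACK_DAYS=14        # widen the daily refresh window
# NBADB_PROXY_ENABLED=true          # leave proxies off unless you need them
```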
First possession checklist
The fastest path to a usable local dataset
| If your goal is… | Take this route |
|---|---|
| Use nbadb as fast as possible | Install from PyPI, then run nbadb download |
| Build everything locally from source | Install from source, then run nbadb init |
| Confirm the CLI works before any long-running data command | Run nbadb --help, then nbadb schema after data exists |
1. Build or fetch data
Choose one lane:
```shell
# Full historical build
nbadb init

# Pull the latest published dataset into your data directory
nbadb download
```

nbadb init is the full historical rebuild. If you want the fastest path to a usable local dataset, nbadb download is usually the quicker opening possession.
2. Inspect the floor
```shell
nbadb schema
nbadb status
```

Use this pairing when you want a quick “did install actually produce a usable warehouse?” answer:

- nbadb schema lists discovered public tables.
- nbadb status shows pipeline watermarks, journal summary, and table metadata.
3. Run the standard refresh play
```shell
nbadb daily
```

That command refreshes the current season, looks back NBADB_DAILY_LOOKBACK_DAYS days by default, updates active players and teams, and then rebuilds downstream tables in replace mode.
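If you want that refresh to run unattended, one common approach (not something nbadb prescribes) is a cron entry; the schedule, log path, and PATH assumption here are all examples:

```shell
# Example crontab entry: run the daily refresh at 06:30 local time.
# Assumes nbadb is on cron's PATH; otherwise use an absolute path.
30 6 * * * nbadb daily >> "$HOME/nbadb-daily.log" 2>&1
```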
What lands in your data directory
By default, nbadb writes into nbadb/.
Default output map
| Path | What it is for | Typical first use |
|---|---|---|
| nba.duckdb | Primary local warehouse | Querying, status inspection, and quality checks |
| nba.sqlite | Portable single-file export | Sharing and broad tool compatibility |
| parquet/ | Columnar export lane | DataFrame-heavy and analytics workflows |
| csv/ | Flat-file export lane | Simple ingestion into tools that expect CSV |
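After a build or download finishes, a quick way to confirm the expected artifacts landed (assuming the default nbadb/ data directory) is a simple existence check:

```shell
# Check that the default outputs exist under the data directory.
for path in nbadb/nba.duckdb nbadb/nba.sqlite nbadb/parquet nbadb/csv; do
  if [ -e "$path" ]; then
    echo "found:   $path"
  else
    echo "missing: $path"
  fi
done
```

A "missing" line for a format you disabled via NBADB_FORMATS is expected; a missing nba.duckdb after nbadb init is not.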
Common install-time decisions
| Decision | Reach for this | Why |
|---|---|---|
| Put data somewhere other than nbadb/ | --data-dir for one-off runs, NBADB_DATA_DIR for a lasting default | Keeps multiple datasets or non-default storage locations cleanly separated |
| Change what init and export write | NBADB_FORMATS or repeatable --format on supported commands | Lets you trim disk usage or match downstream tool expectations |
| Enable proxies | NBADB_PROXY_ENABLED=true plus proxy settings | Only needed when you are explicitly addressing extraction-network behavior |
| Use published datasets instead of building from scratch | nbadb download with Kaggle credentials configured | Usually the shortest route to a usable local dataset |
Which page should you read next?
| If you want to… | Go here |
|---|---|
| Learn what daily, monthly, full, status, and run-quality actually do | CLI Reference |
| Understand raw → staging → star and the public table families | Architecture |
| Start querying with the easiest analysis-ready surface | Analytics Quickstart |
| Download from or publish to Kaggle | Kaggle Setup |
Docs contributors: generated artifacts boundary
The docs site mixes hand-written pages with generated reference artifacts. When generator-owned docs drift from the code, refresh them with:
```shell
uv run nbadb docs-autogen --docs-root docs/content/docs
```

That command regenerates schema/{raw,staging,star}-reference.mdx, data-dictionary/{raw,staging,star}.mdx, diagrams/er-auto.mdx, lineage/lineage-auto.mdx, and lineage/lineage.json. Do not hand-edit those generated outputs.
Keep moving
Stay in the same possession
Keep the mental model warm with adjacent pages, section hubs, and search-friendly routes into the same topic cluster.