nbadb Arena Data Lab

Jump Ball

Installation

Opening whistle for installing, configuring, and sanity-checking nbadb.

Use this page as the pregame checklist for getting nbadb onto your machine and confirming the first possession works before you move into queries, refreshes, or docs work.

At a glance

  • Python 3.12+: required for the current package and source workflow
  • 2 install routes: PyPI for quick use, source checkout for contributors
  • 4 default exports: sqlite, duckdb, csv, and parquet
  • 7-day daily lookback: default recent-game window for nbadb daily

Quick navigation

  • Quick package install: jump to Install from PyPI if you just want the CLI on your machine fast.
  • Contributor source setup: use Install from source if you will edit code, docs, or generated artifacts.
  • Env and defaults: go straight to Configuration defaults for the NBADB_ knobs that matter first.
  • Sanity-check the setup: skip to First possession checklist if install already succeeded and you need to validate it.

Preflight

Before you install:

  • You need Python 3.12 or newer.
  • Use uv for source installs and contributor workflows.
  • Use pip if you want the quickest path to the packaged CLI.
  • If you plan to use Kaggle download/upload flows, keep your Kaggle credentials available for later.
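
The version requirement is easy to verify before installing anything. A minimal sketch, not part of nbadb itself; `meets_requirement` is a hypothetical helper:

```python
import sys

def meets_requirement(version, required=(3, 12)):
    """True when a (major, minor, ...) version tuple satisfies `required`."""
    return tuple(version[:2]) >= required

# nbadb needs Python 3.12 or newer (see the preflight list above).
if not meets_requirement(sys.version_info):
    print("warning: this interpreter is older than the Python 3.12 nbadb requires")
```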

Pick an install route

  • PyPI route: best for analysts and users who want the CLI without cloning the repo. Install the package, confirm nbadb --help works, then move into the first build or download flow.
  • Source route: best for contributors, operators, and docs writers working inside the repo. This route keeps the checked-out code, docs generator, and local CLI aligned.

Install routes in one glance

| Route | Best when… | First commands | What you get |
| --- | --- | --- | --- |
| PyPI | You want the packaged CLI quickly and do not need a repo checkout | `pip install nbadb` | The installed CLI and defaults needed for normal use |
| Source | You will edit code, docs, generated artifacts, or local project config | `uv sync --extra dev` | A checked-out repo, contributor tooling, docs generator, and repo-aligned CLI |

Install from PyPI

pip install nbadb
nbadb --help

Use this route when you do not need the repository checkout itself.

Install from source

git clone https://github.com/wyattowalsh/nbadb.git
cd nbadb
uv sync --extra dev
uv run nbadb --help

If you are contributing docs or code, prefer the source route. It keeps the docs generator, tests, and local CLI aligned with the checked-out code.

Configuration defaults

nbadb reads environment variables with the NBADB_ prefix. The fastest way to inspect or override settings is to start from the checked-in example file:

cp .env.example .env

Core settings you will usually care about first

| Variable | Purpose | Default |
| --- | --- | --- |
| `NBADB_DATA_DIR` | Root folder for local database files and exported data | `nbadb` |
| `NBADB_FORMATS` | Default export formats for `init` and `export` | `['sqlite', 'duckdb', 'csv', 'parquet']` |
| `NBADB_DAILY_LOOKBACK_DAYS` | Recent-game window used by `nbadb daily` | `7` |
| `NBADB_KAGGLE_DATASET` | Dataset slug used by download/upload flows | `wyattowalsh/basketball` |
| `NBADB_PROXY_ENABLED` | Enable proxy rotation support | `false` |
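
The override precedence the table describes can be pictured with a small sketch. This is illustrative only: `DEFAULTS` and `setting` are hypothetical names, not nbadb's real settings loader; an environment variable simply wins over the shipped default of the same name.

```python
import os

# Illustrative sketch of the NBADB_-prefix convention; `DEFAULTS` and
# `setting` are hypothetical, not nbadb's actual configuration code.
DEFAULTS = {
    "NBADB_DATA_DIR": "nbadb",
    "NBADB_DAILY_LOOKBACK_DAYS": "7",
    "NBADB_PROXY_ENABLED": "false",
}

def setting(name, env=os.environ):
    """Environment value wins; otherwise fall back to the shipped default."""
    return env.get(name, DEFAULTS[name])
```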

Fast decisions for new installs

| Question | Default answer | Change it when… |
| --- | --- | --- |
| Where should files land? | Keep the default `nbadb/` data directory | You want the dataset somewhere else or need multiple local copies |
| Which formats should export? | Keep all four: sqlite, duckdb, csv, parquet | You only want a subset for local storage or downstream tooling |
| Should proxies be enabled? | No | You are explicitly solving extraction-network or rate-limit issues |
| Should I use Kaggle credentials now? | Not unless you need download or upload | You want the fastest path to a published dataset or need to publish one |

Optional knobs worth knowing exist

| Variable | Why you might set it |
| --- | --- |
| `NBADB_LOG_DIR` | Move logs away from the default `logs/` folder |
| `NBADB_REQUEST_TIMEOUT` | Override the extractor request timeout without changing code |
| `NBADB_PROXY_URLS`, `NBADB_PROXY_USER`, `NBADB_PROXY_PASS` | Point extraction at explicit proxies or authenticated SOCKS5 proxies |
| `KAGGLE_USERNAME`, `KAGGLE_KEY` | Enable `nbadb download` and `nbadb upload` workflows |

All settings ship with defaults. Only add .env entries for the knobs you actually want to change.
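
For example, a minimal `.env` that changes only two knobs from the core table above (the path and value here are illustrative, not recommendations):

```shell
# .env: override only the knobs you need; unset variables keep their defaults
# Store data outside the working directory (hypothetical path)
NBADB_DATA_DIR=/srv/nba-data
# Widen the recent-game window used by nbadb daily
NBADB_DAILY_LOOKBACK_DAYS=14
```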

Sanity-check flow

First possession checklist

The fastest path to a usable local dataset

| If your goal is… | Take this route |
| --- | --- |
| Use nbadb as fast as possible | Install from PyPI, then run `nbadb download` |
| Build everything locally from source | Install from source, then run `nbadb init` |
| Confirm the CLI works before any long-running data command | Run `nbadb --help`, then `nbadb schema` after data exists |

1. Build or fetch data

Choose one lane:

# Full historical build
nbadb init
# Pull the latest published dataset into your data directory
nbadb download

nbadb init is the full historical rebuild. If you want the fastest path to a usable local dataset, nbadb download is usually the quicker opening possession.

2. Inspect the floor

nbadb schema
nbadb status

Use this pairing when you want a quick “did install actually produce a usable warehouse?” answer:

  • nbadb schema lists discovered public tables.
  • nbadb status shows pipeline watermarks, journal summary, and table metadata.
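
If you want to cross-check the table listing by hand, the SQLite export can be inspected with nothing but the standard library. A sketch, assuming the default export path; `list_tables` is a hypothetical helper, not part of the CLI:

```python
import sqlite3

def list_tables(db_path):
    """List table names in a SQLite file, similar in spirit to `nbadb schema`."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]

# With default settings the export lives at nbadb/nba.sqlite:
# print(list_tables("nbadb/nba.sqlite"))
```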

3. Run the standard refresh play

nbadb daily

That command refreshes the current season, looks back NBADB_DAILY_LOOKBACK_DAYS days by default, updates active players and teams, and then rebuilds downstream tables in replace mode.
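
The lookback window is plain date arithmetic. A sketch of the calculation, assuming only the documented 7-day default; `lookback_start` is a hypothetical name, not nbadb's internal function:

```python
import os
from datetime import date, timedelta

def lookback_start(today, env=os.environ):
    """Sketch of the nbadb daily window: today minus NBADB_DAILY_LOOKBACK_DAYS."""
    days = int(env.get("NBADB_DAILY_LOOKBACK_DAYS", "7"))
    return today - timedelta(days=days)

# With the default 7-day window:
# lookback_start(date(2024, 1, 10), env={}) -> date(2024, 1, 3)
```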

What lands in your data directory

By default, nbadb writes into nbadb/.

Default output map

| Path | What it is for | Typical first use |
| --- | --- | --- |
| `nba.duckdb` | Primary local warehouse | Querying, status inspection, and quality checks |
| `nba.sqlite` | Portable single-file export | Sharing and broad tool compatibility |
| `parquet/` | Columnar export lane (one directory per exported table) | DataFrame-heavy and analytics workflows |
| `csv/` | Flat-file export lane (one CSV per exported table) | Simple ingestion into tools that expect CSV |
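
A quick way to confirm the default outputs landed is a filesystem check. A sketch assuming the default `nbadb/` data directory; `missing_outputs` is a hypothetical helper, not part of the CLI:

```python
from pathlib import Path

# Default outputs from the map above; this helper is illustrative only.
EXPECTED = ["nba.duckdb", "nba.sqlite", "parquet", "csv"]

def missing_outputs(data_dir="nbadb"):
    """Report which default outputs are absent from the data directory."""
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```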

Common install-time decisions

| Decision | Reach for this | Why |
| --- | --- | --- |
| Put data somewhere other than `nbadb/` | `--data-dir` for one-off runs, `NBADB_DATA_DIR` for a lasting default | Keeps multiple datasets or non-default storage locations cleanly separated |
| Change what `init` and `export` write | `NBADB_FORMATS` or repeatable `--format` on supported commands | Lets you trim disk usage or match downstream tool expectations |
| Enable proxies | `NBADB_PROXY_ENABLED=true` plus proxy settings | Only needed when you are explicitly addressing extraction-network behavior |
| Use published datasets instead of building from scratch | `nbadb download` with Kaggle credentials configured | Usually the shortest route to a usable local dataset |

Where to go next

| If you want to… | Go here |
| --- | --- |
| Learn what `daily`, `monthly`, `full`, `status`, and `run-quality` actually do | CLI Reference |
| Understand raw → staging → star and the public table families | Architecture |
| Start querying with the easiest analysis-ready surface | Analytics Quickstart |
| Download from or publish to Kaggle | Kaggle Setup |

Docs contributors: generated artifacts boundary

The docs site mixes hand-written pages with generated reference artifacts. When generator-owned docs drift from the code, refresh them with:

uv run nbadb docs-autogen --docs-root docs/content/docs

That command regenerates schema/{raw,staging,star}-reference.mdx, data-dictionary/{raw,staging,star}.mdx, diagrams/er-auto.mdx, lineage/lineage-auto.mdx, and lineage/lineage.json. Do not hand-edit those generated outputs.
