Kaggle Setup

Download, inspect, and publish nbadb through the Kaggle delivery lane.

Use this guide as the loading dock for the public nbadb dataset on Kaggle: `wyattowalsh/basketball`.

Pick the right delivery format

| If you want… | Use… |
| --- | --- |
| A single portable database file | `nba.sqlite` |
| Fast local SQL and inspection | `nba.duckdb` |
| Columnar files for Polars/Pandas/Arrow | `parquet/` |
| Broadest compatibility | `csv/` |

`nbadb download` copies the latest Kaggle dataset into your configured data directory. If Kaggle provides `nba.sqlite` but not `nba.duckdb`, nbadb seeds DuckDB from the SQLite file automatically.

Choose your Kaggle route

| If you need to… | Start here | Why |
| --- | --- | --- |
| Get a ready-to-use local dataset through the CLI | Download via the nbadb CLI | Fastest path into the rest of the nbadb command surface |
| Control the download inside Python | Download via kagglehub | Easier notebook or script integration |
| Publish your own refreshed build | Upload your own build | Handles metadata generation and dataset upload |

Download via the nbadb CLI

```shell
nbadb download
```

That command downloads the dataset, copies the files into your data directory, and leaves the local folder ready for the rest of the CLI.

Download via kagglehub

```python
import kagglehub

path = kagglehub.dataset_download("wyattowalsh/basketball")
print(f"Dataset downloaded to: {path}")
```

Use this route when you want direct control over download handling inside Python.

What lands on disk

A default local layout looks like this:

```
nbadb/
├── nba.sqlite
├── nba.duckdb
├── parquet/
│   └── <table>/...
├── csv/
│   └── <table>.csv
└── dataset-metadata.json
```

| Path | What it is | Reach for it when… |
| --- | --- | --- |
| `nba.sqlite` | Portable relational export | You need maximum tool compatibility |
| `nba.duckdb` | Fast analytical local database | You want immediate SQL without file globs |
| `parquet/` | Columnar table exports | Your workflow is Polars, Pandas, Arrow, or DuckDB-over-files |
| `csv/` | Flat text exports | A downstream system cannot read DuckDB or Parquet |
| `dataset-metadata.json` | Kaggle dataset metadata | You are preparing or inspecting a publish handoff |

Load the files in Python

| Format | Best for |
| --- | --- |
| SQLite | Portable inspection or compatibility-oriented tools |
| DuckDB | Fast SQL, joins, and ad hoc analysis |
| Parquet | DataFrame-first or file-based analytical workflows |

SQLite

```python
import sqlite3

conn = sqlite3.connect("nbadb/nba.sqlite")
rows = conn.execute("SELECT * FROM dim_player LIMIT 5").fetchall()
```

DuckDB

```python
import duckdb

conn = duckdb.connect("nbadb/nba.duckdb")
df = conn.sql("SELECT * FROM dim_player LIMIT 5").pl()
```

Parquet with Polars

```python
import polars as pl

df = pl.read_parquet("nbadb/parquet/dim_player/dim_player.parquet")
```

Parquet with Pandas

```python
import pandas as pd

df = pd.read_parquet("nbadb/parquet/dim_player/dim_player.parquet")
```

Upload your own build

```shell
nbadb upload
```

Preflight checklist

| Confirm this first | Why |
| --- | --- |
| Your target data directory already contains the dataset you want to publish | `upload` publishes what is on disk |
| Kaggle credentials are available to the environment that kagglehub uses | Upload cannot authenticate without them |
| `NBADB_KAGGLE_DATASET` points at the correct slug if you are not using the default | Metadata and upload target must agree |
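The credentials item on the checklist can be verified programmatically. Kaggle's client libraries conventionally look for the `KAGGLE_USERNAME`/`KAGGLE_KEY` environment variables or a `~/.kaggle/kaggle.json` file; the helper below is a hedged sketch based on that convention, not an nbadb function.

```python
import os
from pathlib import Path


def kaggle_credentials_present() -> bool:
    """Best-effort check for Kaggle API credentials.

    Mirrors the usual lookup order: environment variables first,
    then kaggle.json in the user's home directory.
    """
    if os.environ.get("KAGGLE_USERNAME") and os.environ.get("KAGGLE_KEY"):
        return True
    return (Path.home() / ".kaggle" / "kaggle.json").is_file()
```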

Version notes and metadata

```shell
nbadb upload --message "Post-trade-deadline refresh"
```

- The CLI default message is "Automated update".
- nbadb ensures `dataset-metadata.json` exists before upload.
- The metadata generator uses the configured Kaggle dataset slug as the dataset id.
