Playbook
Analytics Quickstart
Opening possession for analysts who want to get from download to first query in minutes.
Analytics Quickstart
Get from dataset download to a real answer in a few minutes. This page is the opening possession: install a lightweight stack, open the DuckDB file, and leave with one leaderboard, one shot read, and one standings query you can reuse.
If you want to rehearse the SQL shape before downloading anything, warm up in SQL Playground. It runs DuckDB-WASM in the browser with self-contained NBA-flavored sample rows, so you can practice the structure first and connect to the full dataset second.
Choose one first possession
| If you have today… | Start here | First useful action | Stop when |
|---|---|---|---|
| A browser tab and no local setup | SQL Playground | Run one sample query and change one filter, sort, or aggregate so the SQL shape is yours | You understand the query skeleton you want to port |
| A local file or download path | Possession 3 | Connect to nba.duckdb and run SHOW TABLES | You can see analytics_*, dim_*, and fact_* surfaces |
| A basketball question and need one reusable answer | Possession 4 | Run one option end to end and keep the output dataframe or notebook cell | You have one working pattern you can reuse |
Win 1
Prove that the warehouse opens cleanly and that SHOW TABLES
returns the surfaces you expect.
Win 2
Leave with one pattern you can reuse: a leaderboard, a shot-zone slice, or a season snapshot.
Win 3
Know whether the next possession belongs in DuckDB Query Examples, Player Comparison, Shot Chart Analysis, or Schema Reference.
Pick your route
Rehearse the query shape first
Open SQL Playground if you want browser-only reps before touching the local warehouse.
Prove the dataset is usable locally
Start at Possession 1 and
stop once SHOW TABLES returns the surfaces you expect.
Grab one working pattern fast
Jump to Possession 4 if you already have the file and just need one reusable analysis pattern.
Branch deeper after the first result
Finish on Where to go next once you know whether the next possession is SQL, visualization, comparison, or schema reading.
Coaching note: Pick one lane only. Browser-only reps belong in
SQL Playground; live-data validation starts
with SHOW TABLES; deeper analysis starts after you have one working query in
hand. Start with the denormalized analytics_* surfaces when you need the
fastest first read, then drop to fact tables once you know which possession
deserves deeper inspection.
Run the opening four possessions
Possession 1 — Install dependencies
$ pip install polars duckdb kagglehubThis gives you one SQL engine (duckdb), one DataFrame tool (polars), and one download path (kagglehub) without a heavier local stack.
Possession 2 — Download the data
import kagglehub
path = kagglehub.dataset_download("wyattowalsh/basketball")
print(f"Data at: {path}")You want the printed path because the next possession reuses it directly.
Possession 3 — Connect and confirm the floor
import duckdb
conn = duckdb.connect(f"{path}/nba.duckdb")
# See all tables
tables = conn.sql("SHOW TABLES").fetchall()
for t in tables:
print(t[0])What good looks like: you can see familiar surfaces such as dimension tables (dim_*), fact tables (fact_*), and convenience analytics outputs (analytics_*).
Possession 4 — Run your first question
Pick one of these three reads before you branch out. Each one gives you a reusable pattern for joins, filters, and aggregation.
Option A
Use this when you need an all-time or season-scoped ranking and want a clean fact-plus-dimension aggregation pattern.
Option B
Use this when you need a player slice with court coordinates you can later plot or group by zone.
Option C
Use this when you need a current-season team snapshot that already lives in a purpose-built fact table.
Option A — Who are the all-time scoring leaders?
df = conn.sql("""
SELECT
p.full_name AS player,
COUNT(*) AS games,
SUM(b.pts) AS total_points,
ROUND(AVG(b.pts), 1) AS ppg
FROM fact_player_game_traditional b
JOIN dim_player p ON b.player_id = p.player_id
JOIN dim_game g ON b.game_id = g.game_id
WHERE p.is_current = true
AND g.season_type = 'Regular Season'
GROUP BY 1
ORDER BY total_points DESC
LIMIT 10
""").pl()
print(df)Use this pattern when you need a leaderboard: join a fact table, add player or team labels from dimensions, then aggregate.
Option B — What does a player's shot chart look like?
import polars as pl
shots = conn.sql("""
SELECT
sc.loc_x, sc.loc_y,
sc.shot_made_flag,
sc.shot_zone_basic
FROM fact_shot_chart sc
JOIN dim_player p ON sc.player_id = p.player_id
WHERE p.is_current = true
AND p.full_name = 'LeBron James'
AND sc.season_year = '2024-25'
""").pl()
print(f"Total shots: {len(shots)}")
print(shots.group_by("shot_zone_basic").agg([
pl.count().alias("attempts"),
pl.col("shot_made_flag").sum().alias("makes"),
]))Use this pattern when you need a player-level slice with location columns you can later plot.
Option C — How do teams compare this season?
standings = conn.sql("""
SELECT
t.full_name AS team,
s.conference,
s.wins, s.losses,
ROUND(s.win_pct, 3) AS pct,
s.conf_rank
FROM fact_standings s
JOIN dim_team t ON s.team_id = t.team_id
WHERE s.season_year = '2024-25'
AND s.season_type = 'Regular Season'
ORDER BY s.conference, s.conf_rank
""").pl()
print(standings)Use this pattern when you need a current-season snapshot that already lives in a purpose-built fact table.
Where to go next
Keep moving
Stay in the same possession
Keep the mental model warm with adjacent pages, section hubs, and search-friendly routes into the same topic cluster.
