Playbook
Analytics Quickstart
Opening possession for analysts who want to get from download to first query in minutes.
Analytics Quickstart
Get from dataset download to a real answer in a few minutes. This page is the opening possession: install a lightweight stack, open the DuckDB file, and leave with one leaderboard, one shot read, and one standings query you can reuse.
Pick your route
| If you need to… | Start here | Stop when… |
|---|---|---|
| Prove the dataset is usable locally | Install dependencies | SHOW TABLES returns the surfaces you expect |
| Grab one working analysis pattern fast | Run your first question | You have one query you can now swap filters into |
| Go deeper after the first result | Where to go next | You are on the guide that matches the next question |
Coaching note: Start with the denormalized analytics_* surfaces when you need the fastest first read, then drop to fact tables once you know which possession deserves deeper inspection.
Run the opening four possessions
Possession 1 — Install dependencies
pip install polars duckdb kagglehubThis gives you one SQL engine (duckdb), one DataFrame tool (polars), and one download path (kagglehub) without a heavier local stack.
Possession 2 — Download the data
import kagglehub
path = kagglehub.dataset_download("wyattowalsh/basketball")
print(f"Data at: {path}")You want the printed path because the next possession reuses it directly.
Possession 3 — Connect and confirm the floor
import duckdb
conn = duckdb.connect(f"{path}/nba.duckdb")
# See all tables
tables = conn.sql("SHOW TABLES").fetchall()
for t in tables:
print(t[0])What good looks like: you can see familiar surfaces such as dimension tables (dim_*), fact tables (fact_*), and convenience analytics outputs (analytics_*).
Possession 4 — Run your first question
Pick one of these three reads before you branch out. Each one gives you a reusable pattern for joins, filters, and aggregation.
Option A — Who are the all-time scoring leaders?
df = conn.sql("""
SELECT
p.first_name || ' ' || p.last_name AS player,
COUNT(*) AS games,
SUM(b.pts) AS total_points,
ROUND(AVG(b.pts), 1) AS ppg
FROM fact_box_score_player b
JOIN dim_player p ON b.player_id = p.player_id
JOIN dim_game g ON b.game_id = g.game_id
WHERE g.season_type = 'Regular Season'
GROUP BY 1
ORDER BY total_points DESC
LIMIT 10
""").pl()
print(df)Use this pattern when you need a leaderboard: join a fact table, add player or team labels from dimensions, then aggregate.
Option B — What does a player's shot chart look like?
import polars as pl
shots = conn.sql("""
SELECT
sc.loc_x, sc.loc_y,
sc.shot_made_flag,
sc.shot_zone_basic
FROM fact_shot_chart sc
JOIN dim_player p ON sc.player_id = p.player_id
JOIN dim_game g ON sc.game_id = g.game_id
WHERE p.last_name = 'James'
AND p.first_name = 'LeBron'
AND g.season_year = 2024
""").pl()
print(f"Total shots: {len(shots)}")
print(shots.group_by("shot_zone_basic").agg([
pl.count().alias("attempts"),
pl.col("shot_made_flag").sum().alias("makes"),
]))Use this pattern when you need a player-level slice with location columns you can later plot.
Option C — How do teams compare this season?
standings = conn.sql("""
SELECT
t.city || ' ' || t.nickname AS team,
s.conference,
s.wins, s.losses,
ROUND(s.win_pct, 3) AS pct,
s.playoff_rank
FROM fact_standings s
JOIN dim_team t ON s.team_id = t.team_id
WHERE s.season_id = '2024-25'
ORDER BY s.conference, s.playoff_rank
""").pl()
print(standings)Use this pattern when you need a current-season snapshot that already lives in a purpose-built fact table.
Where to go next
| Your next question | Best route |
|---|---|
| “Give me more SQL examples I can copy.” | DuckDB Query Examples |
| “Now I want to visualize shooting locations.” | Shot Chart Analysis |
| “I need side-by-side player reads.” | Player Comparison |
| “I need table-level documentation before I write more SQL.” | Schema Reference |
Keep moving
Stay in the same possession
Keep the mental model warm with adjacent pages, section hubs, and search-friendly routes into the same topic cluster.