Playbook
Reading the Schema
How to navigate the nbadb star schema -- dimensions, facts, bridges, and analytics views explained.
Reading the Schema
The nbadb warehouse follows a star schema design. Every table has a prefix that tells you its role. Once you can read the prefix, you can navigate the entire warehouse without memorizing individual table names.
The warehouse table families
| Prefix | Role | Example | What it stores |
|---|---|---|---|
dim_ | Dimension | dim_player | Slowly changing descriptors -- who, what, where. One row per entity. |
fact_ | Fact | fact_player_game_traditional | Measured events -- one row per occurrence (player-game, shot, play). |
bridge_ | Bridge | bridge_game_official | Many-to-many links between dimensions and facts. |
agg_ | Aggregate | agg_player_season | Pre-computed roll-ups built from facts and dimensions. |
analytics_ | Analytics output | analytics_player_game_complete | Wide, query-ready surfaces built for common analysis patterns. |
How star schema joins work
Dimensions sit at the tips of a star; facts sit in the centre. You always join from a fact outward to dimensions using surrogate keys like player_id or team_id.
SELECT p.full_name, t.abbreviation, f.pts, f.reb, f.ast
FROM fact_player_game_traditional f
JOIN dim_player p ON f.player_id = p.player_id
JOIN dim_team t ON f.team_id = t.team_id;Bridges resolve many-to-many relationships -- for example, bridge_game_official connects games to their three referees without duplicating rows in either table.
Navigating the docs pages
Start with a curated page to understand intent, then drill into generated pages for exact column contracts.
| Page | When to use it |
|---|---|
| Schema Reference | Understand which table tiers exist and how they relate. |
| Data Dictionary | Look up a column name, suffix convention, or stat abbreviation. |
| ER Diagram | Visualise foreign-key relationships between tables. |
| Table Lineage | Trace a table back through staging to its NBA API source endpoint. |
Reading the ER diagram
The auto-generated ER diagram uses standard notation:
- Boxes are tables. The header shows the table name; rows list columns.
- Lines show foreign-key joins. Follow them to see which dimension a fact references.
- Crow's-foot marks (the fork at one end) indicate a one-to-many relationship: one dimension row maps to many fact rows.
Start from a fact table you care about and follow every line outward to discover the dimensions you can join.
Column naming conventions
Every column name in the warehouse encodes its purpose through a suffix. Learn the suffixes and you can read any table without checking the data dictionary first.
| Suffix | Meaning | Example column | Notes |
|---|---|---|---|
_id | Surrogate key | player_id, team_id | Integer assigned by nbadb. Use for joins. |
_name | Human-readable label | player_name, arena_name | Display text -- never join on these. |
_pct | Percentage (0--100) | fg_pct, win_pct | Stored as a float. 0.456 means 45.6%. |
_per48 | Per-48-minute rate | pts_per48, reb_per48 | Pace-adjusted counting stat. |
_rank | Ordinal rank | pts_rank, conf_rank | Integer. Lower is better (1 = first). |
_flag | Boolean indicator | is_active, is_current | true / false. Often used in WHERE clauses. |
_date | Calendar date | game_date, birth_date | DATE type, no time component. |
_ts | Timestamp | created_ts, updated_ts | Full datetime with timezone. |
_sk | Surrogate key (SCD2) | player_sk | Versioned key for slowly changing dimensions. |
When you see both player_id and player_sk on the same dimension,
player_id is the durable business key (same across all versions of that
player) and player_sk is the version-specific surrogate (unique per SCD2
row). Facts join on player_id; use player_sk only when you need a specific
historical snapshot.
Stat abbreviations in measure columns
Fact and aggregate tables use NBA-standard stat abbreviations as column names. Here are the most common ones:
| Column | Stat | Column | Stat |
|---|---|---|---|
pts | Points | blk | Blocks |
reb | Rebounds | tov | Turnovers |
ast | Assists | pf | Personal fouls |
stl | Steals | min | Minutes played |
fgm / fga | Field goals made / attempted | fg3m / fg3a | Three-pointers made / attempted |
ftm / fta | Free throws made / attempted | plus_minus | Plus/minus differential |
Common pitfalls
Even experienced analysts trip over these four patterns. Tape-study them before writing your first query.
1. Joining through bridge tables
Bridge tables resolve many-to-many relationships. Always join dimension to bridge to fact -- not the other way around.
-- Correct: dim → bridge → fact
SELECT o.official_name, COUNT(*) AS games_reffed
FROM dim_official o
JOIN bridge_game_official bgo ON o.official_id = bgo.official_id
JOIN fact_player_game_traditional f ON bgo.game_id = f.game_id
GROUP BY o.official_name;If you skip the bridge and try to join a dimension directly to a fact that lacks the foreign key, you will get zero rows or a silent cartesian product.
2. SCD2 temporal filters
Two dimensions use Slowly Changing Dimension Type 2 tracking: dim_player and dim_team_history. Each entity can have multiple rows -- one per historical version. The is_current flag marks the active row.
-- Always filter unless you want historical versions
SELECT full_name, team_id, position
FROM dim_player
WHERE is_current = true;If you omit the is_current filter, you will double- or triple-count players who changed teams mid-career. Only drop the filter when you intentionally need the change history (for example, tracking when a player was traded).
3. Surrogate keys vs natural keys
The warehouse mints its own integer surrogate keys (player_id, team_id, game_id). The NBA API also assigns natural keys (person_id for players, team_id from the API). They are not the same value.
| Key type | Column example | Source | Use for |
|---|---|---|---|
| Surrogate | player_id (nbadb) | nbadb pipeline | All warehouse joins |
| Natural | person_id (NBA API) | stats.nba.com | Cross-referencing external data |
Never join a fact to a dimension using the NBA natural key. Use the nbadb surrogate key that both tables share. The natural key exists solely for traceability back to the source API.
4. Season year format
Seasons are always stored as a hyphenated string, never an integer.
-- Correct
WHERE season = '2024-25'
-- Wrong -- this filters on the integer 1999, not a season
WHERE season = 2024The format follows the NBA convention: the first year is when the season starts (October) and the second is when it ends (June). If you need to do arithmetic, extract the leading four digits and cast: CAST(LEFT(season, 4) AS INTEGER).
Suggested learning path
If you are new to nbadb, walk through these five stops in order. Each one builds on the last.
Stop 1 -- Read a dimension: dim_player
Open the schema page and scan the columns. Identify the surrogate key (player_id), the SCD2 versioning columns (player_sk, valid_from, valid_to, is_current), and the descriptive fields (full_name, position, is_active). This is your template for every dimension in the warehouse.
Stop 2 -- Read a fact: fact_player_game_traditional
This is the core box-score fact table. Find the foreign keys (player_id, team_id, game_id) and compare them to the dimension you just read. Then scan the measure columns (pts, reb, ast, fg_pct). Notice the grain: one row per player per game.
Stop 3 -- Join them together
Write a two-table join in the SQL Playground:
SELECT p.full_name, g.game_date, f.pts, f.reb, f.ast
FROM fact_player_game_traditional f
JOIN dim_player p ON f.player_id = p.player_id
JOIN dim_game g ON f.game_id = g.game_id
WHERE p.is_current = true
AND p.full_name = 'LeBron James'
ORDER BY g.game_date DESC
LIMIT 10;Pay attention to the is_current = true filter -- without it you would get duplicate rows from old SCD2 versions.
Stop 4 -- Read an aggregate: agg_player_season
This table rolls the per-game fact into season totals. Compare its grain (one row per player-season) to the fact's grain (one row per player-game). Aggregates are pre-computed so you do not need to write GROUP BY for common roll-ups.
Stop 5 -- Explore a bridge: bridge_game_official
Look at the columns: game_id and official_id. This table exists because each game has three referees -- a one-to-many relationship that cannot live on the fact without row duplication. Try joining dim_official through the bridge to fact_player_game_traditional to answer "how many games has each referee worked?"
After that progression, pick any analytics view (e.g., analytics_player_game_complete) and compare its wide, query-ready shape to the normalized tables you just explored. Analytics views are the "fast break" option -- pre-joined and ready for dashboards.
Where to go next
| Goal | Page |
|---|---|
| Practice SQL against sample data | SQL Playground |
| Run queries on the real warehouse | Analytics Quickstart |
| Look up a specific stat definition | Data Dictionary |
| Explore the full table inventory | Schema Reference |
Keep moving
Stay in the same possession
Keep the mental model warm with adjacent pages, section hubs, and search-friendly routes into the same topic cluster.
Analytics Quickstart
Land quick wins fast and move from setup to analysis with intent.
SQL Playground
Rehearse DuckDB syntax and query structure in the browser before touching the full warehouse.
DuckDB Query Examples
Move from the browser sandbox into real warehouse query patterns and analyst-ready examples.
