Stat Legend
Data Dictionary
Scorer's-table lexicon for column meanings, naming patterns, and generated field inventories
Think of this section as the scorer's table for the warehouse: the place you go when a stat abbreviation, column suffix, or layer name looks familiar but not familiar enough.
The split in this section is intentional. The hand-authored pages explain meaning, naming habits, and reading strategy. The generated pages answer the contract question: which exact fields exist on a given tier right now.
Use the curated pages for interpretation and navigation. Use the generated tier pages when you need the exact field inventory for a specific schema-backed layer. If you are deciding what a field means, start curated. If you are checking whether a field exists, go generated.
Choose the decoder
Glossary
Start with Glossary when the problem is a metric, acronym, formula, or box-score shorthand such as TS%, PIE, or PPP.
Field Reference
Start with Field Reference when the question is really about naming habits: keys, suffixes, home/visitor labels, rating names, or row discriminators.
Tier inventories
Go to Raw, Staging, or Star when the question is whether a field exists on a given tier right now.
Ownership trail
Jump to Lineage or Schema Reference when the real blocker is not meaning but ownership, join path, or transform lineage.
If you only remember one lookup order, use this one.
flowchart LR
Term["Stat term or acronym"] --> Glossary["Glossary"]
Name["Column name or suffix"] --> FieldRef["Field Reference"]
Exists["Need to know whether a field exists"] --> Tier["Raw / Staging / Star inventory"]
Ownership["Need to know where the field came from"] --> Lineage["Lineage / Schema Reference"]
style Glossary fill:#e8f5e9
style FieldRef fill:#fff3e0
style Tier fill:#e1f5fe
style Lineage fill:#fce4ec
Text fallback: use Glossary for meaning, Field Reference for naming patterns, the generated tier pages for exact field inventories, and Lineage or Schema Reference when you need the field's owner or dependency chain.
Fastest lookup by what you have in hand
| If you have... | Start here | Why this is the fastest lane |
|---|---|---|
| A stat term, acronym, or formula | Glossary | It decodes basketball analytics language before you worry about table ownership |
| A column name that looks familiar but unclear | Field Reference | It explains how keys, suffixes, split labels, and row types usually behave |
| A raw endpoint-shaped field | Raw | That generated page reflects the source-near inventory |
| A normalized staging column | Staging | That generated page reflects the warehouse-ready staging inventory |
| A public warehouse field | Star | That generated page reflects the analytics-facing contract surface |
| No clear starting point | Glossary, then Field Reference, then the matching generated tier page | Meaning first, pattern second, inventory last |
Curated vs generated boundary
| Surface | Owns | Best for | Maintenance path |
|---|---|---|---|
| This index, Glossary, and Field Reference | Interpretation | Meaning, naming habits, common traps, and route-finding | Hand-authored |
| Raw, Staging, and Star | Inventory | Exact field presence for each schema-backed layer | Regenerate, do not hand-edit |
Layer and prefix guide
| Prefix | Layer | Example | What it usually means |
|---|---|---|---|
| raw_ | Raw schema/reference object | raw_boxscoretraditionalv3 | Endpoint-shaped payload contract closest to source naming |
| stg_ | DuckDB staging target | stg_boxscoretraditionalv3__player_stats | Cleaned, typed, warehouse-ready input layer |
| dim_ | Dimension | dim_player, dim_team, dim_game | Context and controlled vocabulary around the facts |
| fact_ | Fact | fact_play_by_play, fact_team_game | Measured events, game lines, dashboards, or specialty outputs |
| bridge_ | Bridge | bridge_game_official, bridge_play_player | Many-to-many helper that prevents repeated columns or duplicate joins |
| agg_ | Aggregate | agg_player_season, agg_team_season | Reusable rollups for common analytical questions |
| analytics_ | Analytics view/table | analytics_player_game_complete | Pre-joined shortcut for everyday analyst workflows |
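
The prefix table above can be read as a simple lookup. The following is a minimal sketch, not part of the project: the `LAYER_BY_PREFIX` mapping and the `layer_for` helper are hypothetical names introduced here for illustration, and the labels are copied from the Layer column.

```python
from typing import Optional

# Prefix-to-layer labels copied from the table above.
# The mapping and helper names are illustrative, not part of nbadb.
LAYER_BY_PREFIX = {
    "raw_": "Raw schema/reference object",
    "stg_": "DuckDB staging target",
    "dim_": "Dimension",
    "fact_": "Fact",
    "bridge_": "Bridge",
    "agg_": "Aggregate",
    "analytics_": "Analytics view/table",
}


def layer_for(table_name: str) -> Optional[str]:
    """Return the layer label for a table name, or None if no prefix matches."""
    for prefix, layer in LAYER_BY_PREFIX.items():
        if table_name.startswith(prefix):
            return layer
    return None


if __name__ == "__main__":
    for name in ("stg_boxscoretraditionalv3__player_stats", "dim_player", "bridge_game_official"):
        print(name, "->", layer_for(name))
```

A name that matches no prefix returns `None`, which is a cue to check the generated tier pages rather than guess the layer.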
Read a column in this order
- Find the grain anchor: player_id, team_id, game_id, season_year, or lineup/group keys.
- Check the row-type columns: split_type, detail_type, summary_type, tracking_type, or similar.
- Only then read the measures: percentages, totals, ratings, ranks, and rolling windows.
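
The reading order above can be sketched as a small column sorter. This is an illustrative helper under stated assumptions: `reading_order`, `GRAIN_ANCHORS`, and `ROW_TYPE_COLUMNS` are hypothetical names, and the sets hold only the columns named in the list, so lineup/group keys and other "similar" row-type columns would need to be added by hand.

```python
from typing import Iterable, List

# Only the columns named in the list above; lineup/group keys and other
# row-type columns are not covered by this sketch.
GRAIN_ANCHORS = {"player_id", "team_id", "game_id", "season_year"}
ROW_TYPE_COLUMNS = {"split_type", "detail_type", "summary_type", "tracking_type"}


def reading_order(columns: Iterable[str]) -> List[str]:
    """Reorder columns: grain anchors first, row-type columns second, measures last."""
    cols = list(columns)
    anchors = [c for c in cols if c in GRAIN_ANCHORS]
    row_types = [c for c in cols if c in ROW_TYPE_COLUMNS]
    measures = [c for c in cols if c not in GRAIN_ANCHORS and c not in ROW_TYPE_COLUMNS]
    return anchors + row_types + measures


if __name__ == "__main__":
    print(reading_order(["fg_pct", "split_type", "player_id", "total_pts", "game_id"]))
    # → ['player_id', 'game_id', 'split_type', 'fg_pct', 'total_pts']
```

Within each group the original column order is preserved, so the output still resembles the table you started from.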
High-signal naming patterns
| Pattern | Usually means... | Example |
|---|---|---|
| <entity>_id | business key | player_id, team_id, game_id |
| <entity>_sk | history-aware surrogate key | player_sk |
| _pct | decimal percentage | fg_pct, ts_pct |
| total_ / avg_ / _rank | totals, averages, and rankings | total_pts, avg_ast, reb_rank |
| is_ | boolean flag | is_current, is_weekend |
| split_type / detail_type / summary_type / tracking_type | row meaning discriminator | tells you what kind of row you are reading |
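
As a rough illustration of the patterns above, the sketch below labels a column name by prefix or suffix. The `describe_column` helper is hypothetical and the labels are taken from the table. Because these patterns describe what a name "usually means", treat the result as a first guess and confirm it against Field Reference.

```python
# Row discriminator columns named in the table above.
ROW_DISCRIMINATORS = {"split_type", "detail_type", "summary_type", "tracking_type"}


def describe_column(name: str) -> str:
    """Return the usual meaning of a column name based on its prefix or suffix."""
    if name in ROW_DISCRIMINATORS:
        return "row meaning discriminator"
    if name.endswith("_id"):
        return "business key"
    if name.endswith("_sk"):
        return "history-aware surrogate key"
    if name.endswith("_pct"):
        return "decimal percentage"
    if name.startswith("is_"):
        return "boolean flag"
    if name.startswith("total_"):
        return "total"
    if name.startswith("avg_"):
        return "average"
    if name.endswith("_rank"):
        return "ranking"
    return "no pattern matched"


if __name__ == "__main__":
    for col in ("player_sk", "ts_pct", "is_current", "reb_rank", "split_type"):
        print(col, "->", describe_column(col))
```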
Generated pages
Refresh the generated tier pages in this section with:
uv run nbadb docs-autogen --docs-root docs/content/docs
That command regenerates:
- data-dictionary/raw.mdx
- data-dictionary/staging.mdx
- data-dictionary/star.mdx
It also refreshes the matching schema reference pages plus ER and lineage artifacts. Keep this index, the glossary, and the field reference hand-authored.
Related docs
- Schema Reference — curated family guides for dimensions, facts, bridges, aggregations, and analytics views
- Dimensions — identity, history, calendar, and lookup context tables
- Facts & Bridges — measurement tables and many-to-many connectors
- Analytics Views — 12 pre-joined convenience surfaces
- Relationships — join playbook with SQL examples
- Lineage — transform dependency chain and data flow
