Stat Legend
Data Dictionary
Scorer's-table lexicon for column meanings, naming patterns, and generated field inventories
Think of this section as the scorer's table for the warehouse: the place you go when a stat abbreviation, column suffix, or layer name looks familiar but not familiar enough.
The split in this section is intentional. The hand-authored pages explain meaning, naming habits, and reading strategy. The generated pages answer the contract question: which exact fields exist on a given tier right now.
Use the curated pages for interpretation and navigation. Use the generated tier pages when you need the exact field inventory for a specific schema-backed layer. If you are deciding what a field means, start curated. If you are checking whether a field exists, go generated.
Start by question
If you only have one minute, pick the lane first and the page second.
Decode a metric, acronym, or formula
Start with Glossary when the question is about basketball shorthand such as TS%, PIE, PPP, or box-score abbreviations.
Read a dense schema page faster
Open Field Reference when the question is really about field patterns: keys, suffixes, home/visitor labels, rating names, or discriminator columns.
Verify raw-layer field names
Use Raw for endpoint-shaped field inventories closest to source naming and payload structure.
Verify normalized staging columns
Use Staging when you need the cleaned, typed, warehouse-ready field list feeding transforms.
Verify public warehouse columns
Use Star for the final schema-backed surface exposed to analysis across dimensions, facts, bridges, aggregates, and analytics outputs.
Need the shortest possible route?
If you do not know where to start, check Glossary for meaning, Field Reference for patterns, then drop into the generated tier page that matches the layer you are touching.
Fast path by situation
Choose the right lookup sheet
| If you need to answer... | Start here | Why |
|---|---|---|
| What does this metric or shorthand mean? | Glossary | It decodes basketball analytics terms, formulas, and common abbreviations |
| How should I read key names, suffixes, and recurring column patterns? | Field Reference | It explains the warehouse's high-signal field families and naming habits |
| Which exact fields exist in the raw tier? | Raw | Generated inventory for extraction-shaped schemas |
| Which exact fields exist after normalization? | Staging | Generated inventory for DuckDB-ready staging schemas |
| Which exact fields exist on the public schema-backed surface? | Star | Generated inventory for final star-tier schemas |
Curated pages vs generated pages
Curated pages: how to read the warehouse
These are the hand-authored pages in this section:
- Glossary for metric meaning, formulas, and stat-family shorthand
- Field Reference for keys, suffixes, role labels, and discriminator columns
- this index page for route-finding, section boundaries, and maintenance expectations
Generated pages: what the schema exposes
These are the command-owned pages in this section:
- Raw for source-shaped extraction fields
- Staging for cleaned, typed normalization fields
- Star for the final analytics-facing surface
Layer and prefix guide
| Prefix | Layer | Example | What it usually means |
|---|---|---|---|
| `raw_` | Raw schema/reference object | `raw_boxscoretraditionalv3` | Endpoint-shaped payload contract closest to source naming |
| `stg_` | DuckDB staging target | `stg_boxscoretraditionalv3__player_stats` | Cleaned, typed, warehouse-ready input layer |
| `dim_` | Dimension | `dim_player`, `dim_team`, `dim_game` | Context and controlled vocabulary around the facts |
| `fact_` | Fact | `fact_play_by_play`, `fact_team_game` | Measured events, game lines, dashboards, or specialty outputs |
| `bridge_` | Bridge | `bridge_game_official`, `bridge_play_player` | Many-to-many helper that prevents repeated columns or duplicate joins |
| `agg_` | Aggregate | `agg_player_season`, `agg_team_season` | Reusable rollups for common analytical questions |
| `analytics_` | Analytics view/table | `analytics_player_game_complete` | Pre-joined shortcut for everyday analyst workflows |
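The prefix guide above can be applied mechanically. The following is a minimal sketch, not part of nbadb: the `LAYER_BY_PREFIX` mapping and the `layer_for` helper are hypothetical names introduced here for illustration, and the labels are copied from the table.

```python
from typing import Optional

# Hypothetical lookup built from the prefix table above; not an nbadb API.
LAYER_BY_PREFIX = {
    "raw_": "Raw schema/reference object",
    "stg_": "DuckDB staging target",
    "dim_": "Dimension",
    "fact_": "Fact",
    "bridge_": "Bridge",
    "agg_": "Aggregate",
    "analytics_": "Analytics view/table",
}


def layer_for(object_name: str) -> Optional[str]:
    """Return the layer label for a warehouse object name, or None if no prefix matches."""
    for prefix, layer in LAYER_BY_PREFIX.items():
        if object_name.startswith(prefix):
            return layer
    return None


if __name__ == "__main__":
    for name in ("dim_player", "stg_boxscoretraditionalv3__player_stats", "players"):
        print(name, "->", layer_for(name))
```

Returning `None` for an unmatched name, rather than guessing, keeps the helper honest about objects that fall outside the documented prefixes.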
Column naming conventions
The short version
- Business keys: `<entity>_id`, such as `player_id`, `team_id`, and `game_id`
- History-aware surrogate keys: `<entity>_sk` where SCD Type 2 handling matters
- Percentages and rates: `_pct` suffix for decimal percentages, plus names like `pace`, `pie`, or `ppp` for established metrics
- Aggregations: `total_` for totals, `avg_` for averages, `_rank` for rank fields, and rolling-window names such as `pts_roll5`
- Flags: `is_` prefixes for booleans like `is_current` and `is_weekend`
- Context labels: discriminator columns such as `split_type`, `detail_type`, `summary_type`, and `tracking_type` often define the meaning of a row
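As a rough illustration of these conventions, the sketch below sorts a column name into one of the families listed above. The `classify_column` function and its return labels are hypothetical, and `fg_pct` in the usage lines is an assumed example name, not a field confirmed by this page. Established metric names such as `pace`, `pie`, or `ppp` carry no suffix, so they fall through to `"other"` here.

```python
import re

# Discriminator columns named in the conventions list above.
DISCRIMINATORS = {"split_type", "detail_type", "summary_type", "tracking_type"}


def classify_column(name: str) -> str:
    """Map a column name to a naming family; hypothetical helper, not an nbadb API."""
    if name in DISCRIMINATORS:
        return "context label"
    if name.endswith("_sk"):
        return "surrogate key"
    if name.endswith("_id"):
        return "business key"
    if name.startswith("is_"):
        return "flag"
    if name.endswith("_pct"):
        return "percentage"
    if name.endswith("_rank"):
        return "rank"
    if name.startswith("total_"):
        return "total"
    if name.startswith("avg_"):
        return "average"
    if re.search(r"_roll\d+$", name):
        return "rolling window"
    return "other"


if __name__ == "__main__":
    for col in ("player_id", "player_sk", "is_current", "fg_pct", "pts_roll5", "pace"):
        print(col, "->", classify_column(col))
```

The checks run from most specific to least specific, so a discriminator such as `split_type` is labeled before any suffix rule can claim it.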
The 2am reading order
- Find the grain anchor: `player_id`, `team_id`, `game_id`, `season_year`, or lineup/group keys.
- Check the row-type columns: `split_type`, `detail_type`, `summary_type`, `tracking_type`, or similar.
- Only then read the measures: percentages, totals, ratings, ranks, and rolling windows.
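The reading order above can be sketched as a simple column sort: grain keys first, row-type columns second, everything else last. The `reading_order` helper is hypothetical, and it covers only the keys and discriminators named on this page; lineup/group keys and any "similar" row-type columns would need to be added by hand. `fg_pct` is an assumed example name.

```python
# Grain anchors and row-type columns named in the reading order above.
GRAIN_KEYS = ("player_id", "team_id", "game_id", "season_year")
ROW_TYPE_COLUMNS = ("split_type", "detail_type", "summary_type", "tracking_type")


def reading_order(columns):
    """Reorder column names: grain keys, then row-type columns, then measures.

    Hypothetical helper for illustration; relative order within each group
    follows the input order.
    """
    grain = [c for c in columns if c in GRAIN_KEYS]
    row_types = [c for c in columns if c in ROW_TYPE_COLUMNS]
    seen = set(grain) | set(row_types)
    measures = [c for c in columns if c not in seen]
    return grain + row_types + measures


if __name__ == "__main__":
    cols = ["fg_pct", "split_type", "pts_roll5", "player_id", "season_year"]
    print(reading_order(cols))
    # ['player_id', 'season_year', 'split_type', 'fg_pct', 'pts_roll5']
```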
Generated pages
The generated tier pages in this section come from schema metadata and should be refreshed with:
`uv run nbadb docs-autogen --docs-root docs/content/docs`

That command regenerates:

- `data-dictionary/raw.mdx`
- `data-dictionary/staging.mdx`
- `data-dictionary/star.mdx`
It also refreshes the matching schema reference pages plus ER and lineage artifacts. Keep this index, the glossary, and the field reference hand-authored; treat the generated tier pages as command-owned outputs with a different job and a different maintenance path.