Data Dictionary

Think of this section as the scorer's table for the warehouse: the place you go when a stat abbreviation, column suffix, or layer name looks familiar but not familiar enough.

The split in this section is intentional. The hand-authored pages explain meaning, naming habits, and reading strategy. The generated pages answer the contract question: which exact fields exist on a given tier right now.

Best first read: Curated. Start with the glossary or field reference before diving into generated inventories.

Command-owned pages: raw, staging, and star field inventories are generated from schema metadata.

Core question: Meaning + owner. What does this field mean, and which tier owns it?

Use the curated pages for interpretation and navigation. Use the generated tier pages when you need the exact field inventory for a specific schema-backed layer. If you are deciding what a field means, start curated. If you are checking whether a field exists, go generated.

Start at the scorer's table

Choose the decoder

Meaning first

Glossary

Start with Glossary when the problem is a metric, acronym, formula, or box-score shorthand such as TS%, PIE, or PPP.

Pattern first

Field Reference

Start with Field Reference when the question is really about naming habits: keys, suffixes, home/visitor labels, rating names, or row discriminators.

Contract first

Tier inventories

Start with Raw, Staging, or Star when you need to verify whether a field exists on a specific tier right now.

Where it came from

Ownership trail

Jump to Lineage or Schema Reference when the real blocker is not meaning but ownership, join path, or transform lineage.

If you only remember one lookup order, use this one.

flowchart LR
    Term["Stat term or acronym"] --> Glossary["Glossary"]
    Name["Column name or suffix"] --> FieldRef["Field Reference"]
    Exists["Need to know whether a field exists"] --> Tier["Raw / Staging / Star inventory"]
    Ownership["Need to know where the field came from"] --> Lineage["Lineage / Schema Reference"]
    style Glossary fill:#e8f5e9
    style FieldRef fill:#fff3e0
    style Tier fill:#e1f5fe
    style Lineage fill:#fce4ec

Text fallback: use Glossary for meaning, Field Reference for naming patterns, the generated tier pages for exact field inventories, and Lineage or Schema Reference when you need the field's owner or dependency chain.

Fastest lookup by what you have in hand

If you have...	Start here	Why this is the fastest lane
A stat term, acronym, or formula	Glossary	It decodes basketball analytics language before you worry about table ownership
A column name that looks familiar but is still unclear	Field Reference	It explains how keys, suffixes, split labels, and row types usually behave
A raw endpoint-shaped field	Raw	That generated page reflects the source-near inventory
A normalized staging column	Staging	That generated page reflects the warehouse-ready staging inventory
A public warehouse field	Star	That generated page reflects the analytics-facing contract surface
No clear starting point	Glossary, then Field Reference, then the matching generated tier page	Meaning first, pattern second, inventory last

Curated vs generated boundary

Surface	Owns	Best for	Maintenance path
This index, Glossary, and Field Reference	Interpretation	Meaning, naming habits, common traps, and route-finding	Hand-authored
Raw, Staging, and Star	Inventory	Exact field presence for each schema-backed layer	Regenerate, do not hand-edit

Read the layer labels

Layer and prefix guide

Prefix	Layer	Example	What it usually means
`raw_`	Raw schema/reference object	`raw_boxscoretraditionalv3`	Endpoint-shaped payload contract closest to source naming
`stg_`	DuckDB staging target	`stg_boxscoretraditionalv3__player_stats`	Cleaned, typed, warehouse-ready input layer
`dim_`	Dimension	`dim_player`, `dim_team`, `dim_game`	Context and controlled vocabulary around the facts
`fact_`	Fact	`fact_play_by_play`, `fact_team_game`	Measured events, game lines, dashboards, or specialty outputs
`bridge_`	Bridge	`bridge_game_official`, `bridge_play_player`	Many-to-many helper that prevents repeated columns or duplicate joins
`agg_`	Aggregate	`agg_player_season`, `agg_team_season`	Reusable rollups for common analytical questions
`analytics_`	Analytics view/table	`analytics_player_game_complete`	Pre-joined shortcut for everyday analyst workflows
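The prefix table above can be expressed as a small lookup. This is a minimal sketch rather than anything shipped with nbadb: `LAYER_BY_PREFIX` and `layer_for` are hypothetical names, and the mapping only mirrors the rows of the table.

```python
from typing import Optional

# Hypothetical helper: mirrors the prefix table above; not an nbadb API.
LAYER_BY_PREFIX = {
    "raw_": "Raw schema/reference object",
    "stg_": "DuckDB staging target",
    "dim_": "Dimension",
    "fact_": "Fact",
    "bridge_": "Bridge",
    "agg_": "Aggregate",
    "analytics_": "Analytics view/table",
}


def layer_for(object_name: str) -> Optional[str]:
    """Return the layer label implied by the object's prefix, or None."""
    for prefix, layer in LAYER_BY_PREFIX.items():
        if object_name.startswith(prefix):
            return layer
    return None  # no known prefix, e.g. a column name rather than a table


print(layer_for("stg_boxscoretraditionalv3__player_stats"))  # DuckDB staging target
print(layer_for("bridge_game_official"))                     # Bridge
print(layer_for("player_id"))                                # None
```

A prefix match only tells you the layer a name claims to belong to. Whether the object actually exists is still a question for the generated tier pages.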

Read a column in this order

1. Find the grain anchor: `player_id`, `team_id`, `game_id`, `season_year`, or lineup/group keys.
2. Check the row-type columns: `split_type`, `detail_type`, `summary_type`, `tracking_type`, or similar.
3. Only then read the measures: percentages, totals, ratings, ranks, and rolling windows.
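The reading order above can be sketched as a function that sorts a table's columns into three buckets. Treat it as an illustration under stated assumptions: `reading_order` is a hypothetical helper, the example column list is made up, and the sets cover only the names listed above, not lineup/group keys or other "similar" row-type columns.

```python
# Hypothetical helper: groups column names in the reading order described above.
GRAIN_ANCHORS = {"player_id", "team_id", "game_id", "season_year"}
ROW_TYPE_COLUMNS = {"split_type", "detail_type", "summary_type", "tracking_type"}


def reading_order(columns):
    """Split columns into grain anchors, row-type columns, then measures."""
    grain = [c for c in columns if c in GRAIN_ANCHORS]
    row_type = [c for c in columns if c in ROW_TYPE_COLUMNS]
    measures = [
        c for c in columns
        if c not in GRAIN_ANCHORS and c not in ROW_TYPE_COLUMNS
    ]
    return {"grain": grain, "row_type": row_type, "measures": measures}


# Illustrative column list, not taken from a real table.
example = ["fg_pct", "split_type", "player_id", "total_pts", "season_year"]
print(reading_order(example))
```

Anything that is not a recognized anchor or discriminator falls into the measures bucket, so a table with unfamiliar keys needs those keys added to the sets first.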

High-signal naming patterns

Pattern	Usually means...	Example
`<entity>_id`	business key	`player_id`, `team_id`, `game_id`
`<entity>_sk`	history-aware surrogate key	`player_sk`
`_pct`	decimal percentage	`fg_pct`, `ts_pct`
`total_` / `avg_` / `_rank`	totals, averages, and rankings	`total_pts`, `avg_ast`, `reb_rank`
`is_`	boolean flag	`is_current`, `is_weekend`
`split_type` / `detail_type` / `summary_type` / `tracking_type`	row meaning discriminator	tells you what kind of row you are reading
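The naming patterns above lend themselves to a rule-based classifier. The sketch below is hypothetical (`describe` and `PATTERNS` are illustrative names, not part of nbadb) and encodes only the rows of the table. As the "Usually means..." header suggests, a match is a hint, not a guarantee.

```python
import re

# Hypothetical classifier: rules follow the naming-pattern table above.
DISCRIMINATORS = {"split_type", "detail_type", "summary_type", "tracking_type"}

PATTERNS = [
    (re.compile(r"^is_"), "boolean flag"),
    (re.compile(r"_id$"), "business key"),
    (re.compile(r"_sk$"), "history-aware surrogate key"),
    (re.compile(r"_pct$"), "decimal percentage"),
    (re.compile(r"_rank$"), "ranking"),
    (re.compile(r"^total_"), "total"),
    (re.compile(r"^avg_"), "average"),
]


def describe(column: str) -> str:
    """Return the usual meaning of a column name, based on its pattern."""
    if column in DISCRIMINATORS:
        return "row meaning discriminator"
    for pattern, meaning in PATTERNS:
        if pattern.search(column):
            return meaning
    return "no pattern matched"


for name in ["player_sk", "fg_pct", "is_current", "reb_rank", "split_type", "pts"]:
    print(f"{name}: {describe(name)}")
```

The first matching rule wins, so the order of `PATTERNS` matters for a name that fits more than one pattern.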

Command-owned inventories

Generated pages

Refresh the generated tier pages in this section with:

uv run nbadb docs-autogen --docs-root docs/content/docs

That command regenerates:

data-dictionary/raw.mdx
data-dictionary/staging.mdx
data-dictionary/star.mdx

It also refreshes the matching schema reference pages plus ER and lineage artifacts. Keep this index, the glossary, and the field reference hand-authored.
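If the refresh runs from a script, a post-run check can confirm that the three tier pages are present. The following is a minimal sketch: the helper names are hypothetical, and it assumes the listed paths are relative to the `--docs-root` directory, which the text above implies but does not state outright.

```python
import subprocess
from pathlib import Path

DOCS_ROOT = Path("docs/content/docs")

# The documented refresh command, split into argv form.
REGEN_COMMAND = [
    "uv", "run", "nbadb", "docs-autogen", "--docs-root", DOCS_ROOT.as_posix(),
]

# Generated tier pages named above; assumed to live under the docs root.
GENERATED_PAGES = [
    "data-dictionary/raw.mdx",
    "data-dictionary/staging.mdx",
    "data-dictionary/star.mdx",
]


def missing_generated_pages(docs_root: Path = DOCS_ROOT):
    """Return the generated tier pages that do not exist under docs_root."""
    return [page for page in GENERATED_PAGES if not (docs_root / page).is_file()]


def regenerate(docs_root: Path = DOCS_ROOT):
    """Run the refresh command, then report any tier page still missing."""
    subprocess.run(REGEN_COMMAND, check=True)  # raises if the command fails
    return missing_generated_pages(docs_root)
```

Calling `regenerate()` from the repository root should return an empty list when all three pages were written. The check only confirms that the files exist, not that their contents are current.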

Stay in the same possession

Schema Reference: curated family guides for dimensions, facts, bridges, aggregations, and analytics views
Dimensions: identity, history, calendar, and lookup context tables
Facts & Bridges: measurement tables and many-to-many connectors
Analytics Views: 12 pre-joined convenience surfaces
Relationships: join playbook with SQL examples
Lineage: transform dependency chain and data flow
