
Ball Movement

Column Lineage

Column-level lineage examples showing field transformations across pipeline stages

This page traces individual columns through the pipeline, from their NBA API source field through the raw, staging, and star schema layers. Use it like replay review for one touch in the possession: the exact field that drifted, changed names, or started failing validation.

Start here when the bug is field-shaped: a wrong percentage, a renamed key, a surprising nullability change, or a foreign key that no longer lands where you expect.

Quick navigation

Entry surface

Trace identity fields

Start with Player identity lineage when the issue is a key, natural identifier, or rename across layers.

Check metric math

Use Shooting stats lineage or Advanced metrics lineage when a percentage or rating changed unexpectedly.

Follow context keys

Jump to Game context lineage or Team lineage when the breakage is about joins rather than metric math.

Under the hood

Inspect metadata sources

Go to Lineage metadata in code when you need to confirm how schema metadata and transformer dependencies encode the replay.

Scan modes

| If the issue looks like… | Start here | Why |
|---|---|---|
| A renamed or drifting identifier | Player identity lineage | Keys usually reveal where naming changed between API, staging, and star layers |
| A wrong percentage or derived metric | Shooting stats lineage | Metric examples show where values are passed through versus recomputed |
| A join or season-context mismatch | Game context lineage | Shared keys like game_id and season_year explain most warehouse joins |
| A denormalization or dimension split | Shot chart lineage | These examples show how raw fields become normalized dimensions and foreign keys |
| A code-generation question | Lineage metadata in code | The schema metadata and depends_on declarations power the generated lineage surface |

How Column Lineage Works

Each column passes through up to four stages:

```mermaid
flowchart LR
    A["NBA API Field<br/>(UPPER_CASE)"] --> B["Raw Schema<br/>(UPPER_CASE)"]
    B --> C["Staging Schema<br/>(snake_case)"]
    C --> D["Star Schema<br/>(snake_case + metadata)"]
```

The source metadata key on staging schemas, together with the description and fk_ref metadata keys on star schemas, encodes this lineage.

| Frame | Typical change | What to watch for |
|---|---|---|
| API → Raw | Usually a straight pass-through | Nullable or mixed-type payloads |
| Raw → Staging | Renames to snake_case + validation | Contract tightening, parsed types, and nullability changes |
| Staging → Star | Modeling decisions and FK wiring | Surrogate keys, dimension resolution, and derived fields |
| Star → Analytics/Aggs | Convenience joins or recomputation | Semantic renames and metric rollups |

Field-level replay

Player Identity Lineage

player_id

```mermaid
flowchart LR
    A["CommonPlayerInfo<br/>PERSON_ID"] --> B["raw_player_info<br/>PERSON_ID (int|None)"]
    B --> C["stg_player_info<br/>person_id (int, not null, gt=0)"]
    C --> D["dim_player<br/>player_id (int, not null)<br/>+ player_sk (surrogate)"]
    D --> E["fact_box_score_player<br/>player_id (FK: dim_player)"]
```

| Stage | Column name | Type | Constraints |
|---|---|---|---|
| API response | PERSON_ID | varies | none |
| Raw | PERSON_ID | int \| None | nullable |
| Staging | person_id | int | not null, gt=0 |
| Star (dim) | player_id | int | not null, gt=0, natural key (NK) |
| Star (fact) | player_id | int | not null, FK: dim_player.player_id |

Key transformation: Raw PERSON_ID is renamed to person_id in staging, then to player_id in dim_player, where it serves as the natural key alongside the generated player_sk surrogate key. SCD Type 2 logic creates a new dim_player row whenever a player's team, position, or jersey number changes, so one player_id can map to several player_sk values.
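To make the SCD2 behavior concrete, here is a minimal sketch in plain Python. The validity columns (valid_from, valid_to, is_current) and the apply_scd2 helper are hypothetical illustrations, not the actual dim_player transformer; only player_id, player_sk, and the team/position/jersey trigger come from this page.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimPlayerRow:
    """Hypothetical dim_player version row; the validity columns are assumptions."""
    player_sk: int              # surrogate key: unique per version
    player_id: int              # natural key: repeats across versions
    team_id: int
    position: str
    jersey: str
    valid_from: date
    valid_to: Optional[date]    # None while the version is current
    is_current: bool


TRACKED = ("team_id", "position", "jersey")  # a change here opens a new version


def apply_scd2(rows: List[DimPlayerRow], player_id: int, team_id: int,
               position: str, jersey: str, as_of: date) -> List[DimPlayerRow]:
    """Close the player's current version and append a new one if a tracked field changed."""
    incoming = {"team_id": team_id, "position": position, "jersey": jersey}
    current = next((r for r in rows if r.player_id == player_id and r.is_current), None)
    if current is not None and all(getattr(current, f) == incoming[f] for f in TRACKED):
        return rows  # no tracked change: keep the existing current row
    next_sk = max((r.player_sk for r in rows), default=0) + 1
    updated = [replace(r, valid_to=as_of, is_current=False) if r is current else r
               for r in rows]
    updated.append(DimPlayerRow(next_sk, player_id, team_id, position, jersey,
                                valid_from=as_of, valid_to=None, is_current=True))
    return updated


# Illustrative IDs only.
history: List[DimPlayerRow] = []
history = apply_scd2(history, 101, 7, "G", "3", date(2023, 10, 1))
history = apply_scd2(history, 101, 7, "G", "3", date(2023, 12, 1))  # unchanged
history = apply_scd2(history, 101, 9, "G", "3", date(2024, 2, 8))   # team change
for row in history:
    print(row.player_sk, row.player_id, row.team_id, row.is_current)
```

The second call adds nothing because no tracked field changed, while the team change closes version 1 and opens version 2 under the same player_id.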

player_name

```mermaid
flowchart LR
    A["CommonPlayerInfo<br/>DISPLAY_FIRST_LAST"] --> B["raw_player_info<br/>DISPLAY_FIRST_LAST"]
    B --> C["stg_player_info<br/>display_first_last"]
    C --> D["dim_player<br/>full_name"]
    D --> E["analytics_player_game_complete<br/>player_name"]
```

Key transformation: The column is renamed at each modeled layer: display_first_last in staging (snake_case), full_name in dim_player, and player_name in the current analytics_* outputs for user-friendly querying.
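A column trace like this one is a walk over (table, column) edges. The sketch below hard-codes the player_name chain from the diagram in a hypothetical UPSTREAM map; in nbadb the edges would be derived from schema metadata, and trace_upstream is an illustrative helper rather than a project function.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (table, column)

# Hypothetical edge map: each (table, column) points at its upstream (table, column).
UPSTREAM: Dict[Node, Node] = {
    ("analytics_player_game_complete", "player_name"): ("dim_player", "full_name"),
    ("dim_player", "full_name"): ("stg_player_info", "display_first_last"),
    ("stg_player_info", "display_first_last"): ("raw_player_info", "DISPLAY_FIRST_LAST"),
    ("raw_player_info", "DISPLAY_FIRST_LAST"): ("CommonPlayerInfo", "DISPLAY_FIRST_LAST"),
}


def trace_upstream(table: str, column: str) -> List[str]:
    """Follow UPSTREAM from a column back to the first node with no recorded parent."""
    node: Node = (table, column)
    chain = [f"{table}.{column}"]
    seen = {node}
    while node in UPSTREAM:
        node = UPSTREAM[node]
        if node in seen:  # guard against an accidental cycle in the map
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        chain.append(f"{node[0]}.{node[1]}")
    return chain


print(" <- ".join(trace_upstream("analytics_player_game_complete", "player_name")))
```

Walking upstream from the analytics column lands on the API field in four hops, which is the rename path described above.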

Metric math

Shooting Stats Lineage

fg_pct (Field Goal Percentage)

```mermaid
flowchart LR
    A["BoxScoreTraditionalV3<br/>FG_PCT"] --> B["raw_box_score_traditional<br/>FG_PCT (float|None)"]
    B --> C["stg_box_score_traditional<br/>fg_pct (float, ge=0, le=1)"]
    C --> D["fact_player_game_traditional<br/>fg_pct (float)"]
    D --> E["agg_player_season<br/>fg_pct = SUM(fgm)/SUM(fga)"]
```

| Stage | Column | Notes |
|---|---|---|
| API | FG_PCT | Pre-computed by NBA |
| Raw | FG_PCT | Passed through |
| Staging | fg_pct | Validated to the range 0.0 to 1.0 |
| Fact | fg_pct | Per-game value |
| Aggregate | fg_pct | Recomputed from season totals for accuracy |

Key transformation: In agg_player_season, the season fg_pct is recomputed as SUM(fgm) / SUM(fga) rather than averaging per-game percentages, which would be statistically incorrect.
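A small worked example shows why the ratio of sums differs from the mean of per-game percentages: the mean gives a 1-for-4 night the same weight as a 15-for-20 night. The shooting lines are made up, and returning None for a season with zero attempts is an assumption, not documented agg_player_season behavior.

```python
from typing import List, Optional, Tuple

# Hypothetical per-game shooting lines for one player: (fgm, fga).
games: List[Tuple[int, int]] = [(1, 4), (15, 20), (0, 0)]


def season_fg_pct(lines: List[Tuple[int, int]]) -> Optional[float]:
    """Season FG% from totals, i.e. SUM(fgm) / SUM(fga)."""
    fgm = sum(m for m, _ in lines)
    fga = sum(a for _, a in lines)
    return fgm / fga if fga else None  # assumption: no attempts -> undefined, not 0.0


def mean_of_game_pcts(lines: List[Tuple[int, int]]) -> Optional[float]:
    """Mean of per-game FG% values: the shortcut the aggregate avoids."""
    pcts = [m / a for m, a in lines if a]  # a game with no attempts has no FG%
    return sum(pcts) / len(pcts) if pcts else None


print(f"from totals: {season_fg_pct(games):.3f}")        # from totals: 0.667
print(f"mean of games: {mean_of_game_pcts(games):.3f}")  # mean of games: 0.500
```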

ts_pct (True Shooting Percentage)

```mermaid
flowchart LR
    A["BoxScoreAdvancedV3<br/>TS_PCT"] --> B["raw_box_score_advanced<br/>TS_PCT"]
    B --> C["stg_box_score_advanced<br/>ts_pct (ge=0, le=1)"]
    C --> D["fact_player_game_advanced<br/>ts_pct"]
    D --> E["agg_player_season<br/>avg_ts_pct = AVG(ts_pct)"]
```

Key transformation: In the current implementation, season-level avg_ts_pct is a simple average of per-game values. For a more accurate season figure, recompute from season totals: SUM(PTS) / (2 * (SUM(FGA) + 0.44 * SUM(FTA))).
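The gap between the two approaches is easy to see with two made-up games: an efficient low-volume game pulls the per-game mean well above the efficiency implied by season totals. The ts_pct helper is a hypothetical sketch of the standard formula quoted above, not code from the project.

```python
from typing import List, Optional, Tuple


def ts_pct(pts: float, fga: float, fta: float) -> Optional[float]:
    """True shooting percentage: PTS / (2 * (FGA + 0.44 * FTA))."""
    tsa = fga + 0.44 * fta  # "true shooting attempts"
    return pts / (2 * tsa) if tsa else None


# Hypothetical per-game lines for one player: (pts, fga, fta).
games: List[Tuple[int, int, int]] = [(30, 20, 10), (4, 2, 0)]

per_game = [v for v in (ts_pct(*g) for g in games) if v is not None]
avg_ts = sum(per_game) / len(per_game)            # current avg_ts_pct style
totals = [sum(column) for column in zip(*games)]  # season [pts, fga, fta]
season_ts = ts_pct(*totals)                       # recomputed from totals

print(f"mean of games: {avg_ts:.3f}, from totals: {season_ts:.3f}")
# mean of games: 0.807, from totals: 0.644
```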

Shared context keys

Game Context Lineage

game_id

```mermaid
flowchart LR
    A["Multiple Endpoints<br/>GAME_ID"] --> B["raw_* tables<br/>GAME_ID (str)"]
    B --> C["stg_* tables<br/>game_id (str, not null)"]
    C --> D["dim_game<br/>game_id (PK)"]
    D --> E["All fact tables<br/>game_id (FK: dim_game)"]
```

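game_id is the most widely referenced key in the schema. Its value flows unchanged through every stage as a string (only the column name changes from GAME_ID to game_id); in the star layer it becomes the dim_game primary key and a foreign key on every fact table.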
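Keeping game_id as a string matters because NBA game IDs are zero-padded, as in 0022400001. The sketch shows what an integer cast does to such a key; the normalize_game_id guard and its 10-character assumption are hypothetical, not part of the nbadb staging contract.

```python
from typing import Union

raw_game_id = "0022400001"  # illustrative 10-character NBA-style game ID

as_int = int(raw_game_id)   # 22400001: the leading zeros are gone
print(str(as_int) == raw_game_id)  # False: the key no longer matches upstream rows


def normalize_game_id(value: Union[str, int]) -> str:
    """Hypothetical guard: restore a zero-padded, 10-character game_id."""
    text = str(value).strip()
    if not text.isdecimal() or len(text) > 10:
        raise ValueError(f"unexpected game_id: {value!r}")
    return text.zfill(10)


print(normalize_game_id(22400001))  # 0022400001
```

Repairing IDs after the fact works only if the width is known, so the safer design is the one the pipeline already uses: never cast the key to an integer.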

season_year

```mermaid
flowchart LR
    A["ScheduleLeagueV2<br/>SEASON_ID"] --> B["raw_schedule<br/>SEASON_ID"]
    B --> C["stg_schedule<br/>season_id"]
    C --> D["dim_game<br/>season_year (int)"]
    D --> E["analytics_player_game_complete<br/>season_year"]
```

Key transformation: The API returns SEASON_ID as a string such as "22024": a one-digit season-type prefix followed by the four-digit year. The staging layer parses out the integer year, and dim_game stores it as season_year (int).
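The parsing step can be sketched as a split of the first digit from the remaining four. parse_season_id is a hypothetical helper, and treating anything other than five decimal digits as an error is an assumption; the real staging parser may be more or less strict.

```python
from typing import Tuple


def parse_season_id(season_id: str) -> Tuple[int, int]:
    """Split a SEASON_ID such as "22024" into (season_type_prefix, season_year).

    Hypothetical helper: the real staging parser may differ.
    """
    text = season_id.strip()
    if len(text) != 5 or not text.isdecimal():
        raise ValueError(f"unexpected SEASON_ID: {season_id!r}")
    return int(text[0]), int(text[1:])  # prefix digit, four-digit year


prefix, season_year = parse_season_id("22024")
print(prefix, season_year)  # 2 2024
```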

Team Lineage

team_id (in game context)

```mermaid
flowchart LR
    A["BoxScoreTraditionalV3<br/>TEAM_ID"] --> B["raw_box_score_traditional<br/>TEAM_ID"]
    B --> C["stg_box_score_traditional<br/>team_id"]
    C --> D["fact_player_game_traditional<br/>team_id"]
    D --> E["analytics_player_game_complete<br/>team_id"]
```

Key transformation: Player game rows carry team_id directly from the box score feed into fact_player_game_traditional, and analytics_player_game_complete preserves that team context alongside season and date metadata.
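When a team join starts dropping rows, a quick anti-join shows which team_id values no longer land in dim_team. This sketch assumes team_id carries an fk_ref to dim_team.team_id and is nullable, like the star-schema field example under Lineage metadata in code; the rows and the orphan_team_ids helper are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical fact_player_game_traditional rows: (game_id, player_id, team_id).
FactRow = Tuple[str, int, Optional[int]]
fact_rows: List[FactRow] = [
    ("0022400001", 101, 1),
    ("0022400001", 102, 2),
    ("0022400002", 103, 3),
    ("0022400002", 104, None),  # a null team_id is allowed, so it is not flagged
]
dim_team_ids: Set[int] = {1, 2}  # team_id values present in dim_team


def orphan_team_ids(rows: Iterable[FactRow], known_ids: Set[int]) -> List[int]:
    """Anti-join: non-null team_id values with no matching dim_team row."""
    return sorted({team_id for _, _, team_id in rows if team_id is not None} - known_ids)


print(orphan_team_ids(fact_rows, dim_team_ids))  # [3]
```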

Location and dimension resolution

Shot Chart Lineage

loc_x, loc_y (Court Coordinates)

```mermaid
flowchart LR
    A["ShotChartDetail<br/>LOC_X, LOC_Y"] --> B["raw_shot_chart<br/>LOC_X, LOC_Y"]
    B --> C["stg_shot_chart<br/>loc_x, loc_y (float)"]
    C --> D["fact_shot_chart<br/>loc_x, loc_y"]
```

Coordinate system: LOC_X ranges from -250 to 250 (tenths of feet from basket center, left-right). LOC_Y ranges from -50 to 890 (tenths of feet from basket, towards half-court). The basket is at (0, 0). Current analytics rollups summarize shot zones in agg_shot_zones, but the raw coordinates remain available at fact_shot_chart grain.
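Because the units are tenths of feet with the basket at the origin, the straight-line shot distance in feet is the Euclidean norm of (loc_x, loc_y) divided by 10. Both helpers below are hypothetical sketches; the range check simply encodes the bounds stated above.

```python
import math


def shot_distance_ft(loc_x: float, loc_y: float) -> float:
    """Straight-line distance from the basket in feet (inputs are tenths of feet)."""
    return math.hypot(loc_x, loc_y) / 10.0


def in_documented_range(loc_x: float, loc_y: float) -> bool:
    """Check a coordinate pair against the LOC_X / LOC_Y ranges described above."""
    return -250 <= loc_x <= 250 and -50 <= loc_y <= 890


print(shot_distance_ft(0, 0))        # 0.0 at the basket
print(shot_distance_ft(-220, 0))     # 22.0 along the baseline axis
print(shot_distance_ft(150, 200))    # 25.0
print(in_documented_range(260, 10))  # False
```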

shot_zone (Dimension Resolution)

```mermaid
flowchart LR
    A["ShotChartDetail<br/>SHOT_ZONE_BASIC<br/>SHOT_ZONE_AREA<br/>SHOT_ZONE_RANGE"] --> B["stg_shot_chart<br/>shot_zone_basic<br/>shot_zone_area<br/>shot_zone_range"]
    B --> C["dim_shot_zone<br/>zone_id (PK)<br/>zone_basic<br/>zone_area<br/>zone_range"]
```

Key transformation: The three zone fields are denormalized in the API response. The transform extracts distinct combinations into dim_shot_zone and replaces the three text columns with a single zone_id FK in the fact table.
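The extraction can be sketched as a dictionary from distinct (basic, area, range) tuples to IDs. This is an illustration under stated assumptions: the shot_id column, the zone_range dimension column, the sample zone strings, and first-seen ID assignment are all hypothetical, and the real transform may use sorted or hashed keys instead.

```python
from typing import Dict, List, Tuple

ZoneKey = Tuple[str, str, str]  # (basic, area, range)

# Hypothetical staged shot rows; zone values are illustrative.
shots = [
    {"shot_id": 1, "shot_zone_basic": "Restricted Area",
     "shot_zone_area": "Center(C)", "shot_zone_range": "Less Than 8 ft."},
    {"shot_id": 2, "shot_zone_basic": "Above the Break 3",
     "shot_zone_area": "Center(C)", "shot_zone_range": "24+ ft."},
    {"shot_id": 3, "shot_zone_basic": "Restricted Area",
     "shot_zone_area": "Center(C)", "shot_zone_range": "Less Than 8 ft."},
]


def split_shot_zones(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Extract distinct zone combinations into a dimension and key the facts by zone_id."""
    zone_ids: Dict[ZoneKey, int] = {}
    facts = []
    for row in rows:
        key = (row["shot_zone_basic"], row["shot_zone_area"], row["shot_zone_range"])
        zone_id = zone_ids.setdefault(key, len(zone_ids) + 1)  # IDs in first-seen order
        facts.append({"shot_id": row["shot_id"], "zone_id": zone_id})
    dim = [{"zone_id": zid, "zone_basic": b, "zone_area": a, "zone_range": r}
           for (b, a, r), zid in zone_ids.items()]
    return dim, facts


dim_shot_zone, fact_rows = split_shot_zones(shots)
print(len(dim_shot_zone), [f["zone_id"] for f in fact_rows])  # 2 [1, 2, 1]
```

First-seen IDs are simple but depend on input order; a stable sort or hash of the three fields would keep zone_id values reproducible across reloads.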

Ratings and rollups

Advanced Metrics Lineage

off_rating / def_rating / net_rating

```mermaid
flowchart LR
    A["BoxScoreAdvancedV3<br/>OFF_RATING / DEF_RATING / NET_RATING"] --> B["stg_box_score_advanced"]
    B --> C["fact_player_game_advanced<br/>off_rating, def_rating, net_rating"]
    C --> D["agg_player_season<br/>avg_off_rating, avg_def_rating, avg_net_rating"]
    C --> E["agg_team_pace_and_efficiency<br/>avg_ortg, avg_drtg, avg_net_rtg"]
```

Key transformation: Per-game ratings flow directly to the fact table. Aggregate tables compute player-season averages in agg_player_season and team-level pace/efficiency summaries in agg_team_pace_and_efficiency.
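Net rating is offensive rating minus defensive rating, and a plain mean is linear, so averaged columns should satisfy the same identity. The sketch assumes the player-season averages are unweighted per-game means (as avg_ts_pct is), uses made-up ratings, and picks an abs_tol for per-game rounding that is itself an assumption.

```python
import math
from statistics import mean

# Hypothetical per-game ratings for one player-season: (off_rating, def_rating, net_rating).
games = [(118.0, 110.0, 8.0), (104.5, 112.5, -8.0), (121.0, 109.0, 12.0)]

avg_off = mean(g[0] for g in games)
avg_def = mean(g[1] for g in games)
avg_net = mean(g[2] for g in games)

# net_rating = off_rating - def_rating per game; with unweighted means the
# averages should agree too. abs_tol leaves room for rounding in the source.
consistent = math.isclose(avg_net, avg_off - avg_def, abs_tol=0.1)
print(avg_off, avg_def, avg_net, consistent)  # 114.5 110.5 4.0 True
```

A failed check on real aggregates would point to mismatched row sets between the three columns or to a weighting scheme other than a simple mean.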

How the replay gets encoded

Lineage Metadata in Code

Staging: source metadata

Staging schemas track the original API column name:

```python
# Declared on a staging schema class: "source" records the original API column name.
person_id: int = pa.Field(
    nullable=False,
    gt=0,
    metadata={"source": "PERSON_ID"},
)
```

Star: fk_ref metadata

Star schemas track foreign key relationships:

```python
# Declared on a star schema class: "fk_ref" names the referenced table.column.
team_id: int | None = pa.Field(
    nullable=True,
    gt=0,
    metadata={
        "description": "Team identifier",
        "fk_ref": "dim_team.team_id",
    },
)
```

Transform: depends_on class variable

Transformers declare their upstream dependencies:

```python
# output_table is what the transformer writes; depends_on lists the tables it reads.
class AggPlayerSeasonTransformer(BaseTransformer):
    output_table = "agg_player_season"
    depends_on = [
        "fact_player_game_traditional",
        "fact_player_game_advanced",
        "fact_player_game_misc",
    ]
```

Together, these three metadata sources (source, fk_ref, depends_on) enable fully automated lineage generation via nbadb.docs_gen.lineage.
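As a rough mental model, each metadata source contributes one kind of edge: source links an API field to a staging column, fk_ref links a dimension key to a star column, and depends_on links tables. The sketch below uses hand-written dictionaries as stand-ins for metadata read from the real schemas; it does not reproduce how nbadb.docs_gen.lineage is implemented, and pairing the fk_ref example with fact_player_game_traditional is an assumption.

```python
from typing import Dict, List, NamedTuple


class Edge(NamedTuple):
    kind: str        # "source", "fk_ref", or "depends_on"
    upstream: str
    downstream: str


# Hypothetical stand-ins for metadata collected from schemas and transformers.
STAGING_SOURCE: Dict[str, Dict[str, str]] = {
    "stg_player_info": {"person_id": "PERSON_ID"},
}
STAR_FK_REF: Dict[str, Dict[str, str]] = {
    "fact_player_game_traditional": {"team_id": "dim_team.team_id"},
}
DEPENDS_ON: Dict[str, List[str]] = {
    "agg_player_season": [
        "fact_player_game_traditional",
        "fact_player_game_advanced",
        "fact_player_game_misc",
    ],
}


def build_edges() -> List[Edge]:
    edges: List[Edge] = []
    # Column level: original API field name -> staging column (from "source").
    for table, columns in STAGING_SOURCE.items():
        for column, source in columns.items():
            edges.append(Edge("source", source, f"{table}.{column}"))
    # Column level: referenced dimension key -> referencing star column (from "fk_ref").
    for table, columns in STAR_FK_REF.items():
        for column, ref in columns.items():
            edges.append(Edge("fk_ref", ref, f"{table}.{column}"))
    # Table level: upstream table -> transformer output table (from depends_on).
    for output, upstreams in DEPENDS_ON.items():
        for upstream in upstreams:
            edges.append(Edge("depends_on", upstream, output))
    return edges


for edge in build_edges():
    print(f"{edge.kind:<10} {edge.upstream} -> {edge.downstream}")
```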

Next replay angle

Next steps from column lineage

Next stop

Zoom back out to table-level movement

Continue to Table Lineage when the issue has spread beyond one field and you need the full upstream/downstream dependency chain.

Check naming and semantic intent

Use Field Reference or the Glossary when the lineage is clear but the meaning of the metric, suffix, or field family still is not.

Verify the exact generated contract

Open Staging Reference or Star Reference when you need the current schema-backed type, nullability, and constraint details for the field you just traced.
