Reference

Advanced Mapping Patterns

This guide covers advanced mapping techniques for production data integration scenarios. All patterns are fully supported and tested.


Per-Attribute Ingestion Strategy Override

Individual attributes can override the table-level ingestion_strategy. This is the most common advanced pattern — it enables efficient INCREMENTAL processing for most attributes while giving specific attributes access to the full event history.

The INCREMENTAL + FULL_LOG Hybrid

Set the table to INCREMENTAL for performance, then override specific attributes to FULL_LOG when they need full history:

tables:
  - connection: dev
    table: stage.member_events
    primary_keys: [member_id]
    ingestion_strategy: INCREMENTAL
    batch_expression: ingest_ts
    entity_effective_timestamp_expression: event_ts
    attributes:
      # Inherits INCREMENTAL — only reads new events since last watermark
      - id: member_id_attr
        transformation_expression: member_id

      # Override to FULL_LOG — reads ALL events every run
      # Needed because the window function requires full history
      - id: member_status_attr
        ingestion_strategy: FULL_LOG
        transformation_expression: >
          MAX(CASE WHEN state = 'ACTIVE' THEN 1 ELSE 0 END)
          OVER (PARTITION BY member_id ORDER BY event_ts
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        attribute_effective_timestamp_expression: event_ts

How it works:

  • INCREMENTAL attributes use the batch filter (ingest_ts > watermark AND ingest_ts <= now). Only new data is read. The watermark advances with each run.
  • FULL_LOG attributes clear the batch filter to 1=1. All source data is read every run. Truedelta detects which values actually changed and only inserts new rows.

Each attribute gets its own pipeline with a unique DATA_KEY, so they operate independently within the same target table.

When to Use Per-Attribute Override

ScenarioTable StrategyAttribute OverrideWhy
Running aggregate over all eventsINCREMENTALFULL_LOGWindow function needs full history
Activation detection (has entity ever been X?)INCREMENTALFULL_LOGMust scan all events, not just new ones
Forward-fill with FIRST_VALUE IGNORE NULLSINCREMENTALFULL_LOGNULL gaps span beyond the batch window
Simple column value from latest eventINCREMENTAL(inherit)Only the latest value matters — INCREMENTAL is efficient

Rules

  • All strategy combinations are valid. There is no compiler validation on which strategies can be combined. FULL_LOG on an INCREMENTAL table is the most common pattern, but INCREMENTAL on a FULL table or any other combination also works.
  • attribute_effective_timestamp_expression is recommended but not required for FULL_LOG overrides. If omitted, the table-level entity_effective_timestamp_expression is used.
  • Deploy and execute both respect the override. The per-attribute strategy determines the SQL template, batch filtering, proc_key, and data_key for that attribute's pipeline.

Activation Capture

Detect whether an entity has ever reached a specific state, even if it has since changed:

- id: has_ever_been_active_attr
  ingestion_strategy: FULL_LOG
  transformation_expression: >
    MAX(CASE WHEN status = 'ACTIVE' THEN 1 ELSE 0 END)
    OVER (PARTITION BY entity_id ORDER BY event_ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  attribute_effective_timestamp_expression: event_ts

This produces 1 once the entity has ever been ACTIVE, and stays 1 even if the entity later becomes INACTIVE. The MAX window function acts as a monotonic latch.


High-Water Mark Accumulator

Track a value that only increases (e.g., highest tier reached):

- id: highest_tier_attr
  ingestion_strategy: FULL_LOG
  transformation_expression: >
    MAX(tier_level)
    OVER (PARTITION BY entity_id ORDER BY event_ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  attribute_effective_timestamp_expression: event_ts

Forward-Fill with FIRST_VALUE

Fill NULL gaps by carrying forward the last known value:

- id: last_known_region_attr
  ingestion_strategy: FULL_LOG
  transformation_expression: >
    FIRST_VALUE(region IGNORE NULLS)
    OVER (PARTITION BY entity_id ORDER BY event_ts DESC
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
  attribute_effective_timestamp_expression: event_ts

Sequential Counter

Count occurrences using DENSE_RANK:

- id: event_sequence_attr
  ingestion_strategy: FULL_LOG
  transformation_expression: >
    DENSE_RANK() OVER (PARTITION BY entity_id ORDER BY event_ts)
  attribute_effective_timestamp_expression: event_ts

Per-Attribute WHERE Filtering

Different attributes can read from different subsets of the same source table:

attributes:
  - id: preferred_language_attr
    transformation_expression: pref_value
    where: "pref_key = 'LANGUAGE'"

  - id: notification_opt_in_attr
    transformation_expression: pref_value
    where: "pref_key = 'NOTIFICATIONS'"

Each attribute's where clause filters which source rows provide values for that attribute. Useful for EAV (Entity-Attribute-Value) tables where different rows represent different attributes.


Batch Expression

The batch_expression field controls how INCREMENTAL and TRANSACTIONAL strategies filter source data. It supports three levels of configuration:

Resolution Order

  1. Table-level batch_expression — highest priority, per-table override
  2. Workflow-level workflow.advanced.batch_expression — default for all tables
  3. No expression — defaults to 1=1 (reads everything)

Simple Column

When batch_expression is a simple column name, the framework auto-generates the comparison filter:

batch_expression: ingest_ts
# Generates: ingest_ts > '${BATCH_START}' AND ingest_ts <= '${BATCH_END}'

Raw SQL

When batch_expression contains ${BATCH_START} or ${BATCH_END} placeholders, it's treated as raw SQL and used as-is:

batch_expression: "batch_id IN (SELECT id FROM batch_mgmt WHERE ts > '${BATCH_START}')"

Watermark Precision

Watermarks use full RFC3339 timestamp precision (e.g., 2026-04-21T14:28:55Z), enabling multiple runs per day. Each run reads only data ingested since the previous run's exact timestamp — no gaps, no overlap.

BigQuery DATE Columns

For BigQuery, the batch filter automatically handles both DATE and TIMESTAMP column types. The framework wraps comparisons in CAST(column AS TIMESTAMP) > TIMESTAMP(value) — you don't need to worry about column type compatibility in your batch_expression.


Multi-Source Entity Mapping

When the same entity's data comes from multiple source systems, use multiple mapping groups. Each group gets its own pipelines with unique DATA_KEY values, ensuring truedelta isolation. See How DATA_KEY Enables Multi-Source Pipelines for the full explanation, and Relationship Mappings for relationship-specific multi-source patterns.

entity_id: CUSTOMER

mapping_groups:
  - name: crm_data
    tables:
      - connection: crm
        table: crm.customers
        primary_keys: [customer_id]
        ingestion_strategy: FULL
        attributes:
          - id: customer_name_attr
            transformation_expression: full_name

  - name: billing_data
    tables:
      - connection: billing
        table: billing.accounts
        primary_keys: [account_id]
        ingestion_strategy: INCREMENTAL
        batch_expression: updated_at
        attributes:
          - id: customer_billing_status_attr
            transformation_expression: status

Both groups write to the same customer_entity_desc table, but each group's truedelta operates independently (filtered by DATA_KEY). The semantic view merges both contributions transparently.

Previous
Glossary