Reference
Advanced Mapping Patterns
This guide covers advanced mapping techniques for production data integration scenarios. All patterns are fully supported and tested.
Per-Attribute Ingestion Strategy Override
Individual attributes can override the table-level ingestion_strategy. This is the most common advanced pattern — it enables efficient INCREMENTAL processing for most attributes while giving specific attributes access to the full event history.
The INCREMENTAL + FULL_LOG Hybrid
Set the table to INCREMENTAL for performance, then override specific attributes to FULL_LOG when they need full history:
tables:
- connection: dev
table: stage.member_events
primary_keys: [member_id]
ingestion_strategy: INCREMENTAL
batch_expression: ingest_ts
entity_effective_timestamp_expression: event_ts
attributes:
# Inherits INCREMENTAL — only reads new events since last watermark
- id: member_id_attr
transformation_expression: member_id
# Override to FULL_LOG — reads ALL events every run
# Needed because the window function requires full history
- id: member_status_attr
ingestion_strategy: FULL_LOG
transformation_expression: >
MAX(CASE WHEN state = 'ACTIVE' THEN 1 ELSE 0 END)
OVER (PARTITION BY member_id ORDER BY event_ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
attribute_effective_timestamp_expression: event_ts
How it works:
- INCREMENTAL attributes use the batch filter (
ingest_ts > watermark AND ingest_ts <= now). Only new data is read. The watermark advances with each run. - FULL_LOG attributes clear the batch filter to
1=1. All source data is read every run. Truedelta detects which values actually changed and only inserts new rows.
Each attribute gets its own pipeline with a unique DATA_KEY, so they operate independently within the same target table.
When to Use Per-Attribute Override
| Scenario | Table Strategy | Attribute Override | Why |
|---|---|---|---|
| Running aggregate over all events | INCREMENTAL | FULL_LOG | Window function needs full history |
| Activation detection (has entity ever been X?) | INCREMENTAL | FULL_LOG | Must scan all events, not just new ones |
| Forward-fill with FIRST_VALUE IGNORE NULLS | INCREMENTAL | FULL_LOG | NULL gaps span beyond the batch window |
| Simple column value from latest event | INCREMENTAL | (inherit) | Only the latest value matters — INCREMENTAL is efficient |
Rules
- All strategy combinations are valid. There is no compiler validation on which strategies can be combined. FULL_LOG on an INCREMENTAL table is the most common pattern, but INCREMENTAL on a FULL table or any other combination also works.
attribute_effective_timestamp_expressionis recommended but not required for FULL_LOG overrides. If omitted, the table-levelentity_effective_timestamp_expressionis used.- Deploy and execute both respect the override. The per-attribute strategy determines the SQL template, batch filtering, proc_key, and data_key for that attribute's pipeline.
Activation Capture
Detect whether an entity has ever reached a specific state, even if it has since changed:
- id: has_ever_been_active_attr
ingestion_strategy: FULL_LOG
transformation_expression: >
MAX(CASE WHEN status = 'ACTIVE' THEN 1 ELSE 0 END)
OVER (PARTITION BY entity_id ORDER BY event_ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
attribute_effective_timestamp_expression: event_ts
This produces 1 once the entity has ever been ACTIVE, and stays 1 even if the entity later becomes INACTIVE. The MAX window function acts as a monotonic latch.
High-Water Mark Accumulator
Track a value that only increases (e.g., highest tier reached):
- id: highest_tier_attr
ingestion_strategy: FULL_LOG
transformation_expression: >
MAX(tier_level)
OVER (PARTITION BY entity_id ORDER BY event_ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
attribute_effective_timestamp_expression: event_ts
Forward-Fill with FIRST_VALUE
Fill NULL gaps by carrying forward the last known value:
- id: last_known_region_attr
ingestion_strategy: FULL_LOG
transformation_expression: >
FIRST_VALUE(region IGNORE NULLS)
OVER (PARTITION BY entity_id ORDER BY event_ts DESC
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
attribute_effective_timestamp_expression: event_ts
Sequential Counter
Count occurrences using DENSE_RANK:
- id: event_sequence_attr
ingestion_strategy: FULL_LOG
transformation_expression: >
DENSE_RANK() OVER (PARTITION BY entity_id ORDER BY event_ts)
attribute_effective_timestamp_expression: event_ts
Per-Attribute WHERE Filtering
Different attributes can read from different subsets of the same source table:
attributes:
- id: preferred_language_attr
transformation_expression: pref_value
where: "pref_key = 'LANGUAGE'"
- id: notification_opt_in_attr
transformation_expression: pref_value
where: "pref_key = 'NOTIFICATIONS'"
Each attribute's where clause filters which source rows provide values for that attribute. Useful for EAV (Entity-Attribute-Value) tables where different rows represent different attributes.
Batch Expression
The batch_expression field controls how INCREMENTAL and TRANSACTIONAL strategies filter source data. It supports three levels of configuration:
Resolution Order
- Table-level
batch_expression— highest priority, per-table override - Workflow-level
workflow.advanced.batch_expression— default for all tables - No expression — defaults to
1=1(reads everything)
Simple Column
When batch_expression is a simple column name, the framework auto-generates the comparison filter:
batch_expression: ingest_ts
# Generates: ingest_ts > '${BATCH_START}' AND ingest_ts <= '${BATCH_END}'
Raw SQL
When batch_expression contains ${BATCH_START} or ${BATCH_END} placeholders, it's treated as raw SQL and used as-is:
batch_expression: "batch_id IN (SELECT id FROM batch_mgmt WHERE ts > '${BATCH_START}')"
Watermark Precision
Watermarks use full RFC3339 timestamp precision (e.g., 2026-04-21T14:28:55Z), enabling multiple runs per day. Each run reads only data ingested since the previous run's exact timestamp — no gaps, no overlap.
BigQuery DATE Columns
For BigQuery, the batch filter automatically handles both DATE and TIMESTAMP column types. The framework wraps comparisons in CAST(column AS TIMESTAMP) > TIMESTAMP(value) — you don't need to worry about column type compatibility in your batch_expression.
Multi-Source Entity Mapping
When the same entity's data comes from multiple source systems, use multiple mapping groups. Each group gets its own pipelines with unique DATA_KEY values, ensuring truedelta isolation. See How DATA_KEY Enables Multi-Source Pipelines for the full explanation, and Relationship Mappings for relationship-specific multi-source patterns.
entity_id: CUSTOMER
mapping_groups:
- name: crm_data
tables:
- connection: crm
table: crm.customers
primary_keys: [customer_id]
ingestion_strategy: FULL
attributes:
- id: customer_name_attr
transformation_expression: full_name
- name: billing_data
tables:
- connection: billing
table: billing.accounts
primary_keys: [account_id]
ingestion_strategy: INCREMENTAL
batch_expression: updated_at
attributes:
- id: customer_billing_status_attr
transformation_expression: status
Both groups write to the same customer_entity_desc table, but each group's truedelta operates independently (filtered by DATA_KEY). The semantic view merges both contributions transparently.