DMDL Reference

Mapping

Mappings define how source data from your operational databases maps to your Daana data model.

Mapping Structure

A mapping file connects source tables to a single entity in your model. It has four main components:

  1. Entity target - Which entity this mapping populates
  2. Mapping groups - Container for table mappings (typically one group)
  3. Tables - Source tables with column-to-attribute mappings
  4. Relationships - How this entity connects to other entities (foreign keys)
entity_id: CUSTOMER

mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: false

    tables:
      - connection: dev
        table: public.customers

        primary_keys:
          - customer_id

        ingestion_strategy: FULL

        # Default timestamp expression for all attributes with effective_timestamp: true
        entity_effective_timestamp_expression: CURRENT_TIMESTAMP

        attributes:
          - id: customer_id
            transformation_expression: customer_id
            where: customer_id IS NOT NULL

          - id: customer_name
            transformation_expression: customer_name
            where: customer_name IS NOT NULL

Casing convention. Throughout the mapping examples, attribute ids use lower_snake_case so they line up with the lowercase columns in the generated view_<entity> views, while entity ids, relationship ids, and atomic-context values use UPPER_SNAKE_CASE.

Tip: Always use schema.table format (e.g., public.customers, stage.raw_data) to avoid ambiguity across database schemas.

Composite and Computed Primary Keys

primary_keys accepts two forms. The simple form is a list of one or more source column names; that is what the generator emits and what most mappings use:

primary_keys:
  - customer_id

When the source has no single column that uniquely identifies a row, you can use a SQL expression directly. The expression is evaluated against the source table on every read and its result becomes the natural-key string Daana hashes into the surrogate key:

# Composite key: order_line_id only unique within an order
primary_keys:
  - CONCAT(CAST(order_id AS TEXT), '-', CAST(order_line_id AS TEXT))

Tutorial Chapter 5 walks through this for ORDER_LINE (Step 5: Edit the Order Line Mapping).

Stability matters. Daana derives <entity>_key deterministically from the natural-key value the expression returns for each row. Two expressions that produce the same string for the same row (for example CONCAT(a, '-', b)) yield the same surrogate keys. Changing the expression so that it returns a different string (for example dropping a separator, or changing the order of columns) does change every surrogate key, so treat any change to the value shape the same way you would treat changing a primary-key column.

Ingestion Strategies

The ingestion_strategy field determines how Daana loads data from your source table. Choose based on your data characteristics:

StrategyBest ForDescription
FULLSource has current state only (1 row per entity), small enough to scan each runComplete snapshot of all data for each delivery
FULL_LOGSource already contains history (N rows per entity), small enough to scan each runComplete history of all changes, with multiple rows per instance
INCREMENTALSource is mutable (rows can be updated in place) and large enough that watermarked reads pay offOnly changed or new data since last load (requires watermark column)
TRANSACTIONALSource is an append-only event log (rows are written once and never updated)Append-only data where each instance is delivered exactly once

For narrative coverage of when to pick each strategy, walked through with a real source for every choice, see Tutorial Chapter 4: Loading Strategies. For per-attribute overrides (e.g., INCREMENTAL table with a FULL_LOG attribute), see Advanced Mapping Patterns.

Effective Timestamps

When your model has attributes with effective_timestamp: true, Daana tracks when values change over time. Instead of manually creating timestamp entries in your mapping, use the DRY expression fields.

entity_effective_timestamp_expression

Set a default timestamp expression for all attributes with effective_timestamp: true at the table level:

tables:
  - connection: dev
    table: public.customers
    primary_keys:
      - customer_id
    ingestion_strategy: FULL

    # Default for ALL attributes with effective_timestamp: true
    entity_effective_timestamp_expression: CURRENT_TIMESTAMP

    attributes:
      - id: customer_name
        transformation_expression: customer_name
      - id: customer_email
        transformation_expression: email

With this setup, any attribute that has effective_timestamp: true in the model will automatically use CURRENT_TIMESTAMP for its effective timestamp.

attribute_effective_timestamp_expression

Override the table default for specific attributes that need a different timestamp source:

tables:
  - connection: dev
    table: public.customers
    primary_keys:
      - customer_id
    ingestion_strategy: FULL

    # Default timestamp expression
    entity_effective_timestamp_expression: CURRENT_TIMESTAMP

    attributes:
      - id: customer_name
        transformation_expression: customer_name
        # Uses entity default: CURRENT_TIMESTAMP

      - id: customer_address
        transformation_expression: address
        # Override: use the actual update timestamp from the source
        attribute_effective_timestamp_expression: updated_at

When to use each

ScenarioUse
Same timestamp for all attributesentity_effective_timestamp_expression only
Most attributes share a timestamp, some differBoth (entity default + attribute overrides)
Each attribute has a unique timestampattribute_effective_timestamp_expression on each

Transformation Expressions

Transformation expressions define how to extract and transform data from source columns. These expressions use the SQL dialect of your source database, so syntax varies by platform (PostgreSQL, Snowflake, BigQuery, etc.).

Note: The examples below use PostgreSQL syntax. Consult your database documentation for platform-specific functions and operators.

Direct Column Mapping

- id: customer_name
  transformation_expression: customer_name

SQL Functions

- id: customer_name
  transformation_expression: UPPER(TRIM(customer_name))

- id: created_date
  transformation_expression: TO_TIMESTAMP(created_at, 'YYYY-MM-DD HH24:MI:SS')

- id: total_amount
  transformation_expression: CAST(amount AS DECIMAL(10,2))

Concatenation

- id: full_name
  transformation_expression: first_name || ' ' || last_name

Multiline Expressions

For complex SQL expressions like CASE statements, subqueries, or nested function calls, use YAML's multiline syntax. This makes your mappings much more readable.

Folded Style (>)

The folded style (>) joins lines with spaces, ideal for SQL that would normally be written on one line:

- id: discount_amount
  transformation_expression: >
    CASE
      WHEN total > 1000 THEN total * 0.15
      WHEN total > 500 THEN total * 0.10
      WHEN total > 100 THEN total * 0.05
      ELSE 0
    END

Literal Style (|)

The literal style (|) preserves line breaks exactly, useful when formatting matters:

- id: category_name
  transformation_expression: |
    (SELECT category_name
     FROM categories c
     WHERE c.id = products.category_id)

Multiline with Modifiers

You can add modifiers to control trailing newlines:

IndicatorDescription
> or |Default: single trailing newline
>- or |-Strip: no trailing newline
>+ or |+Keep: preserve all trailing newlines
# Strip trailing newline with >-
- id: status_code
  transformation_expression: >-
    COALESCE(
      status,
      'UNKNOWN'
    )

Practical Examples

Complex CASE expression:

- id: order_priority
  transformation_expression: >
    CASE
      WHEN is_expedited = true THEN 'HIGH'
      WHEN order_date < CURRENT_DATE - INTERVAL '30 days' THEN 'HIGH'
      WHEN total_amount > 10000 THEN 'MEDIUM'
      ELSE 'LOW'
    END

Nested function calls:

- id: formatted_address
  transformation_expression: >
    CONCAT_WS(', ',
      NULLIF(TRIM(street), ''),
      NULLIF(TRIM(city), ''),
      NULLIF(TRIM(state), ''),
      NULLIF(TRIM(postal_code), '')
    )

Correlated subquery:

- id: latest_order_date
  transformation_expression: |
    (SELECT MAX(order_date)
     FROM orders o
     WHERE o.customer_id = customers.customer_id)

Tip: Use folded style (>) for most SQL expressions. Use literal style (|) when you need exact line breaks preserved (rare).

Where Clauses

Filter conditions to select relevant data:

- id: email
  transformation_expression: email
  where: email IS NOT NULL AND email LIKE '%@%'

- id: active_status
  transformation_expression: is_active
  where: deleted_at IS NULL

Relationship Mappings

If your model defines relationships between entities, you map them by specifying how to extract the foreign key from your source data. Relationships are defined at the mapping group level (not inside tables).

Basic Relationship Mapping

Suppose your model has an ORDER_ITEM entity with a relationship to ORDER, defined with name: BELONGS_TO, source_entity_id: ORDER_ITEM, and target_entity_id: ORDER. Daana derives the relationship id ORDER_ITEM_BELONGS_TO_ORDER. In the order item mapping you reference that id and specify how to find the order ID:

entity_id: ORDER_ITEM

mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: false

    tables:
      - connection: dev
        table: stage.order_items
        primary_keys:
          - order_id || ' ' || order_item_id
        ingestion_strategy: FULL

        attributes:
          - id: order_item_id
            transformation_expression: order_item_id
          - id: price
            transformation_expression: price

    # Relationships are defined at the mapping group level
    relationships:
      - id: ORDER_ITEM_BELONGS_TO_ORDER
        source_table: stage.order_items
        target_transformation_expression: order_id

Relationship Fields

FieldDescription
idThe relationship's id, derived by Daana from the model definition following the pattern {SOURCE_ENTITY_ID}_{RELATIONSHIP_NAME}_{TARGET_ENTITY_ID} (e.g., ORDER_IS_PLACED_BY_CUSTOMER)
source_tableThe table containing the foreign key data (use schema.table format)
target_transformation_expressionSQL expression that returns the target entity's identifier

Finding the correct id value: id follows a fixed pattern: a model-relationship with name: "IS_PLACED_BY" between source entity ORDER and target entity CUSTOMER gives the mapping-relationship id ORDER_IS_PLACED_BY_CUSTOMER. The easiest way to get the exact value is to run daana-cli generate mapping, which pre-fills it for each relationship.

Multi-Source Relationships

When a relationship's data comes from multiple sources, or from a different table than the entity's attributes, you have several patterns available.

Multi-Group Pattern

Two mapping groups can drive the same relationship independently. Each group generates its own pipeline with a unique DATA_KEY, and truedelta operates independently per group:

entity_id: ORDER

mapping_groups:
  - name: online_orders
    tables:
      - connection: dev
        table: online.orders
        primary_keys: [order_id]
        ingestion_strategy: INCREMENTAL
        batch_expression: created_at
        attributes:
          - id: order_total_attr
            transformation_expression: total_amount
    relationships:
      - id: ORDER_PLACED_BY_CUSTOMER
        source_table: online.orders
        target_transformation_expression: customer_id

  - name: retail_orders
    tables:
      - connection: dev
        table: retail.pos_transactions
        primary_keys: [transaction_id]
        ingestion_strategy: FULL
        attributes:
          - id: order_total_attr
            transformation_expression: amount
    relationships:
      - id: ORDER_PLACED_BY_CUSTOMER
        source_table: retail.pos_transactions
        target_transformation_expression: loyalty_customer_id

Both groups populate the same relationship (ORDER_PLACED_BY_CUSTOMER) from different source tables. The DATA_KEY isolation ensures each group's truedelta operates independently: online order relationships never interfere with retail order relationships.

Relationship-Only Source Tables

A table entry can have empty attributes (attributes: []) and exist solely to provide primary_keys, ingestion_strategy, and batch_expression for a relationship's source_table. This is useful when a relationship's source data comes from a different table than the entity's attributes.

entity_id: CONTENT

mapping_groups:
  - name: content_group
    tables:
      # Primary source: provides attributes
      - connection: dev
        table: catalog.content_items
        primary_keys: [content_id]
        ingestion_strategy: FULL
        attributes:
          - id: content_title_attr
            transformation_expression: title

      # Relationship-only source: no attributes, provides key derivation + batch config
      - connection: dev
        table: catalog.content_contributors
        primary_keys: [content_id]
        ingestion_strategy: INCREMENTAL
        batch_expression: updated_at
        attributes: []

    relationships:
      - id: CONTENT_HAS_CONTRIBUTOR
        source_table: catalog.content_contributors
        target_transformation_expression: contributor_id

The relationship pipeline uses catalog.content_contributors as its source, deriving entity keys from that table's primary_keys and applying batch filtering from its ingestion_strategy and batch_expression. No TYPE pipelines are generated for this table since it has no attributes.

Both daana-cli check and daana-cli deploy accept this pattern.

source_table Resolution

The source_table field must exactly match one of the tables in the same mapping group's tables list. If no match is found, the relationship is silently skipped (no error is produced). Double-check for typos if a relationship isn't appearing in your output.

The relationship inherits these properties from the matched source table:

  • primary_keys: used to derive the entity key
  • ingestion_strategy: determines the SQL template and batch filtering
  • batch_expression: controls watermark-based incremental reads

Multi-Table Mappings

Map a single entity from multiple source tables:

entity_id: CUSTOMER

mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: false

    tables:
      - connection: dev
        table: archive.legacy_customers
        primary_keys:
          - old_customer_id
        ingestion_strategy: FULL
        entity_effective_timestamp_expression: migrated_at
        attributes:
          - id: customer_id
            transformation_expression: old_customer_id

      - connection: dev
        table: public.new_customers
        primary_keys:
          - customer_uuid
        ingestion_strategy: FULL
        entity_effective_timestamp_expression: CURRENT_TIMESTAMP
        attributes:
          - id: customer_id
            transformation_expression: customer_uuid

Multiple Identifiers

By default, each entity has a single identifier (primary key). The allow_multiple_identifiers setting is an advanced feature for rare scenarios involving multi-source data integration.

When Multiple Identifiers Are Needed

This feature is designed for multi-source data integration scenarios where the same real-world entity exists in different source systems with different identifiers, and you need to preserve both:

ScenarioExample
Multi-source integrationSame customer exists as CRM_ID: 12345 in Salesforce and LEGACY_ID: A789 in your legacy ERP - both IDs must remain queryable
M&A data consolidationAfter acquiring a company, customer ACME_ID: 100 must coexist with your CORP_ID: 5000 for the same person
Composite business keysOrder line items identified by both order_id + line_number as a composite key from source

When NOT to Use Multiple Identifiers

For most use cases, a single surrogate key is preferred:

  • Customer lookup by email/phone - Use attributes with indexes, not multiple identifiers
  • Alternative lookup keys - Model these as regular attributes, not identifiers
  • Product SKU vs barcode - One should be the identifier, the other an attribute

Industry best practice: Data warehouses typically use surrogate keys rather than composite business keys. Multiple identifiers add complexity and are only justified when you must preserve source system identity relationships.

Enabling Multiple Identifiers

Set allow_multiple_identifiers: true in your mapping group:

entity_id: CUSTOMER

mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: true  # Enable multiple identifiers

    tables:
      # Source 1: CRM system. Single identifier (customer_id).
      - connection: crm_db
        table: crm.customers
        primary_keys:
          - customer_id
        ingestion_strategy: FULL
        attributes:
          - id: customer_id
            transformation_expression: customer_id

      # Source 2: Legacy ERP. Carries its own identifier plus the CRM
      # customer_id as a cross-reference (the overlapping identifier that
      # binds the two sources together).
      - connection: legacy_db
        table: erp.customers
        primary_keys:
          - legacy_id
          - customer_id
        ingestion_strategy: FULL
        attributes:
          - id: customer_id
            transformation_expression: customer_id
          - id: legacy_id
            transformation_expression: legacy_id

The overlap is intentional: every legacy row must carry the customer_id value of the same customer in the CRM, so that resolution at INTEGRATION time matches the existing IDFR row and reuses the CRM-assigned surrogate. The CRM source does not need to know about legacy_id. The overlap on customer_id is sufficient for the legacy system's identifiers to attach to the existing entity.

How resolution works

Daana supports up to 5 identifier types per entity, stored in slots IDFR_01 through IDFR_05. When allow_multiple_identifiers: true, each source row carries one identifier value per slot (drawn from the mapping group's primary_keys), and the pipeline resolves the row to an entity by co-occurrence:

  1. READER populates the IDFR slots with the identifier values from the source row.
  2. INTEGRATION looks each slot value up against the existing <entity>_idfr table. The lookup is a LEFT JOIN across all five slots; any match returns the entity's surrogate key.
  3. TRUEDELTA classifies the row:
    • If at least one slot value matches an existing identifier, Daana reuses that surrogate key for all identifiers on the row. The row carries an existing entity.
    • If no slot value matches anything, Daana generates a fresh UUID. The row carries a new entity.
  4. INSERT writes only the new identifier values to the <entity>_idfr table. Existing identifiers are not re-inserted.

This is the mechanism that binds different source systems together. Every time a source row carries at least one identifier already known to the system, the other identifiers it carries also become attached to the same entity.

Example. A CRM source emits a row for one customer with customer_id = C-12345. Daana finds no match in IDFR and assigns a new surrogate UUID-AAA. The IDFR table now has one row:

customer_idfr | customer_key | row_st
C-12345       | UUID-AAA     | Y

A legacy ERP source later emits a row for the same customer carrying legacy_id = L-A789 and customer_id = C-12345 together. Daana's lookup finds C-12345 → UUID-AAA, classifies the row as an existing entity, and inserts only the new identifier:

customer_idfr | customer_key | row_st
C-12345       | UUID-AAA     | Y
L-A789        | UUID-AAA     | Y

Both identifiers resolve to the same surrogate. Downstream queries on customer_key = UUID-AAA return both source-side identifiers as different facets of one entity.

Two invariants of resolution worth stating explicitly:

  • Resolution matches values, not column names. The IDFR table stores identifier values in a single column (<entity>_idfr); the source column the value originated from is not preserved. If the legacy source's cross-reference column were named crm_ref instead of customer_id, resolution would still work as long as the value C-12345 appears in some row of the legacy source. What binds sources together is the overlap of values, not the overlap of column names.
  • The order of primary_keys in the mapping does not affect resolution. The pipeline checks every slot against the IDFR table; any match returns the existing surrogate. Writing primary_keys: [legacy_id, customer_id] versus primary_keys: [customer_id, legacy_id] produces identical behavior. Slot indices are structural (they keep up to 5 identifier values on one source row distinct inside the pipeline); they are not priority ranks.

Identifier transformations

The primary_keys field accepts SQL expressions, not just column names (Composite and Computed Primary Keys covers the syntax). For multi-identifier setups, this matters: each source's primary_keys is what lands in the IDFR table, and a normalization expression lets two sources that store the same logical identifier in different shapes still produce matching values.

For example, if the CRM stores customer IDs as plain strings (12345) but the legacy source prefixes them (CRM-12345), the CRM mapping can normalize on read:

      - connection: crm_db
        primary_keys:
          - CONCAT('CRM-', CAST(customer_id AS TEXT))
        ...
      - connection: legacy_db
        primary_keys:
          - external_ref      # already prefixed 'CRM-...' in legacy
          - legacy_id
        ...

Both sources now write CRM-12345 to IDFR for the same customer, and resolution proceeds normally. The same pattern handles composite natural keys (one source carries tenant_id + customer_id, the other carries a pre-combined global_customer_id): a CONCAT expression collapses the composite into a single value that can overlap with the other source.

The overlap requirement still applies to the transformed values. If the expression on each side produces different strings for the same real-world entity, the result is identifier orphaning.

Identifier orphaning

The resolution above requires that overlapping sources share at least one identifier value per row. If they don't, Daana cannot tell that two source rows describe the same real-world entity and creates two separate surrogates.

Concrete failure mode. Assume the legacy ERP source arrives without the customer_id cross-reference (carrying only legacy_id = L-A789):

customer_idfr | customer_key
L-A789        | UUID-XXX

The CRM source then arrives with customer_id = C-12345 and no overlapping identifier. Daana finds no match and assigns a fresh surrogate:

customer_idfr | customer_key
L-A789        | UUID-XXX
C-12345       | UUID-YYY

Two surrogate keys now exist for one real-world customer. Downstream tables keyed by customer_key treat them as separate entities. The warehouse silently carries a duplicate.

Operational requirement. For multi-identifier resolution to work, every legacy row must carry the corresponding customer_id value (or whichever identifier is the cross-reference). In the example above, the working configuration has the legacy ERP source emit both legacy_id and customer_id on every row; the failing configuration has it emit only legacy_id.

If a source system emits rows with only its own identifiers and no overlap with any other source, multi-identifier resolution is not the right tool. Solve the integration upstream (data cleaning, ID-mapping table, or a master-data-management system) before the data reaches Daana.

Adding identifiers later

Within the 5-slot ceiling, identifier evolution is rowwise, not schema-wise:

  • New identifier types from existing sources flow as new rows in <entity>_idfr without any DDL change. If a CRM row that previously carried only customer_id starts also carrying customer_email (a new identifier type), the new value lands in an unused slot, becomes a new IDFR row, and resolves like any other identifier.
  • New source systems that emit identifier types overlapping with existing sources attach via the same mechanism. They contribute new rows that resolve via co-occurrence; existing rows are unchanged.
  • No schema migrations are required for either case. The IDFR table's column shape is fixed at install time.

Warning: This setting is irreversible. Once you enable multiple identifiers for an entity and materialize the mapping, you cannot go back to single-identifier mode. This is a permanent structural change to how the entity is stored.

Why irreversible. When allow_multiple_identifiers: true, the IDFR table becomes the authoritative source of entity identity: surrogate keys are generated by co-occurrence resolution in IDFR and propagated to every <entity>_desc, <entity>_focal, and relationship _x row. Disabling multi-identifier mode requires regenerating every surrogate from a different path (the f_<entity>() derivation used in single-identifier mode), which cascades to every downstream row that holds the old surrogate. There is no in-place migration. Recovery requires dropping the entity's four physical tables (_idfr, _focal, _desc, and each _x), re-running install and deploy, and re-executing the workflow to rebuild from source. Downstream consumers (Layer 3 views, BI dashboards) see new surrogate keys for every entity during the rebuild.

Key Considerations

  1. Exhaust alternatives first - Can you solve this with a single identifier and additional attributes?
  2. Plan ahead - Decide whether you need multiple identifiers before your first deployment
  3. Performance impact - Multiple identifiers increase storage and may affect query performance
  4. No rollback - If you enable this by mistake, you'll need to recreate the entity from scratch

Complete Example

entity_id: ORDER

mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: false

    tables:
      - connection: dev
        table: public.orders

        primary_keys:
          - order_id

        ingestion_strategy: FULL

        # Default timestamp for most attributes
        entity_effective_timestamp_expression: CURRENT_TIMESTAMP

        attributes:
          # Identifier
          - id: order_id
            transformation_expression: order_id
            where: order_id IS NOT NULL

          # Order date
          - id: order_date
            transformation_expression: order_date
            where: order_date IS NOT NULL

          # Total amount - uses source's updated_at instead of default
          - id: total_amount
            transformation_expression: CAST(total_amount AS DECIMAL(10,2))
            where: total_amount > 0
            attribute_effective_timestamp_expression: updated_at

          # Status - uses source's status_changed_at
          - id: status
            transformation_expression: UPPER(status)
            where: status IN ('pending', 'completed', 'cancelled')
            attribute_effective_timestamp_expression: status_changed_at

    # Relationships are defined at the mapping group level
    relationships:
      - id: ORDER_IS_PLACED_BY_CUSTOMER
        source_table: public.orders
        target_transformation_expression: customer_id

Quick Reference

Looking up a specific field? Here's the complete reference for all fields in mapping YAML files.

Mapping Root Fields

FieldTypeRequiredDescription
entity_idstring
Target entity ID from your model
mapping_groupslist of MappingGroup
List of mapping groups for this entity

✓ = required, ○ = optional

Mapping Group Fields

FieldTypeRequiredDescription
namestring
Name of the mapping group
allow_multiple_identifiersbool
Allow multiple identifier values per entity
This setting is irreversible - once enabled, you cannot go back to single identifier modeOnly enable if you need multiple primary keys per entity
tableslist of Table
List of source tables in this mapping group
relationshipslist of RelationshipMapping
Relationship mappings at the group level

✓ = required, ○ = optional

Table Fields

FieldTypeRequiredDescription
connectionstring
Connection profile name
tablestring
Source table name with schema
primary_keyslist of string
Primary key columns or SQL expression for the source table
A single entry containing SQL operators or function calls is evaluated as an expression against the source tableMultiple entries are treated as column names and combined into the natural key in the listed order
ingestion_strategyIngestionStrategy
How to load data from this table
wherestring
SQL WHERE clause to filter source data
Syntax is database-specific - consult your platform's SQL documentation
entity_effective_timestamp_expressionstring
Default timestamp expression for all attributes
Syntax is database-specific - consult your platform's SQL documentation
Used as default for attributes with effective_timestamp: true in the model
batch_expressionstring
Batch filtering expression (overrides workflow default)
Only applies to INCREMENTAL and TRANSACTIONAL ingestion strategiesIf not set, falls back to the workflow-level batch_expressionIf value contains ${BATCH_START} or ${BATCH_END}, it's used as raw SQL
attributeslist of AttributeMapping
List of attribute mappings

✓ = required, ○ = optional

Attribute Mapping Fields

FieldTypeRequiredDescription
idstring
Attribute ID from the model
transformation_expressionstring
SQL expression to extract/transform the value
Syntax is database-specific - consult your platform's SQL documentation
ingestion_strategyIngestionStrategy
Override ingestion strategy for this attribute
wherestring
SQL WHERE clause for this attribute
Syntax is database-specific - consult your platform's SQL documentation
attribute_effective_timestamp_expressionstring
Override timestamp expression for this attribute
Syntax is database-specific - consult your platform's SQL documentation
Only needed when this attribute has a different timestamp source than other attributes

✓ = required, ○ = optional

Relationship Mapping Fields

FieldTypeRequiredDescription
idstring
Relationship ID from the model
Do not add a separate `atomic_context:` line. The mapping schema only defines `id` for relationships; an extra `atomic_context:` line is silently ignored by the YAML parser, so it looks like it works but contributes nothing
source_tablestring
Source table name with schema
target_transformation_expressionstring
Expression to get the target entity identifier
Syntax is database-specific - consult your platform's SQL documentation

✓ = required, ○ = optional

Best Practices

  1. Generate templates first - Start with generated templates and customize as needed
  2. Always use schema.table format - Specify schema.table (e.g., public.customers, stage.raw_data) to avoid ambiguity
  3. Use descriptive expressions - Make transformation logic clear and maintainable
  4. Add where clauses - Filter out invalid or incomplete data at the source
  5. Use entity_effective_timestamp_expression - Set a DRY default at the table level
  6. Override when needed - Use attribute_effective_timestamp_expression for attributes with different timestamp sources
  7. Validate data types - Use CAST functions to ensure correct data types
  8. Choose the right ingestion strategy - Match the strategy to the shape of the source: FULL for small dimensions with one row per entity, FULL_LOG when the source already carries history, INCREMENTAL for large mutable tables (use a staging clock for batch_expression), TRANSACTIONAL for append-only event logs

Next Steps

Previous
Model