Tutorial

Chapter 5 - Mastering DMDL

Goal: Expand your model with new entities (PRODUCT, ORDER_LINE) to master Relationships and advanced Attribute Types.

Prerequisites: You must have completed Chapter 4: Loading Strategies.


The Scenario

Marketing comes to you with a request: "We need to know which Books are selling the best."

Currently, your model only has CUSTOMER and ORDER. We need to add PRODUCT and the link between them (ORDER_LINE).

Step 1: Define the New Entities

Edit model.yaml and add these two new entities after your existing CUSTOMER and ORDER:

    # ... existing entities (CUSTOMER, ORDER) ...

    - id: PRODUCT
      name: PRODUCT
      definition: Items available for sale
      description: Book catalog
      attributes:
        - id: product_id
          name: product_id
          definition: Unique product identifier
          type: STRING

        - id: product_title
          name: product_title
          definition: Title of the book
          type: STRING
          effective_timestamp: true

        - id: product_format
          name: product_format
          definition: Format of the book (Hardcover, Paperback)
          type: STRING
          effective_timestamp: true

    - id: ORDER_LINE
      name: ORDER_LINE
      definition: Individual item within an order
      description: Link table between Orders and Products
      attributes:
        - id: order_line_id
          name: order_line_id
          definition: Unique ID for the line item
          type: STRING

        - id: quantity
          name: quantity
          definition: Number of units purchased
          type: NUMBER
          effective_timestamp: true

        - id: line_amount
          name: line_amount
          definition: Total value of this line item
          type: NUMBER
          effective_timestamp: true

Step 2: Add the Relationships

Add these relationships to your existing relationships section:

  # ... existing relationship

    # NEW: Order line references a product
    - name: IS_FOR_PRODUCT
      definition: Line item references a product
      source_entity_id: ORDER_LINE
      target_entity_id: PRODUCT

    # NEW: Order contains order lines
    - name: BELONGS_TO
      definition: Order line belongs to an order
      source_entity_id: ORDER_LINE
      target_entity_id: ORDER

Key Insight: Relationships are defined from the "many" side to the "one" side. An ORDER_LINE belongs to exactly one ORDER, but an ORDER can have many ORDER_LINEs.

Step 3: Generate Mappings for the New Entities

Generate mapping templates for all entities that don't have mappings yet:

daana-cli generate mapping

This command scans your model and generates templates only for new entities. Existing mappings are safely skipped:

Generating mapping templates...

Model: model.yaml
Output: mappings/

✓ Generated 2 mapping template(s)
  Generated PRODUCT: product-mapping.yaml
  Generated ORDER_LINE: order_line-mapping.yaml

⊘ Skipped 2 existing mapping(s)
  Skipped CUSTOMER: customer-mapping.yaml already exists
  Skipped ORDER: order-mapping.yaml already exists

To update existing mappings, use: daana-cli merge mapping

Note: If you need to update existing mappings after model changes, use daana-cli merge mapping instead - it preserves your transformation expressions while adding new attributes.

Step 4: Edit the Product Mapping

Edit mappings/product-mapping.yaml to configure the source table:

entity_id: PRODUCT

mapping_groups:
  - name: default_mapping_group
    tables:
      - connection: dev
        table: stage.products
        primary_keys:
          - product_id
        ingestion_strategy: FULL
        entity_effective_timestamp_expression: updated_at
        attributes:
          - id: product_id
            transformation_expression: product_id
          - id: product_title
            transformation_expression: title
          - id: product_format
            transformation_expression: format

Run daana-cli check mapping to see progress. The check is supposed to fail partially at this point: you've only updated the product mapping; order_line-mapping.yaml still points at a non-existent table until you edit it in Step 5. Use the output to confirm which files pass (including product-mapping.yaml) and that the remaining error is only the order line schema issue shown below.

daana-cli check mapping
Checking 4 mapping(s) in directory: mappings

→ customer-mapping.yaml ✓
→ order-mapping.yaml ✓
→ order_line-mapping.yaml ✘
→ product-mapping.yaml ✓

✘ order_line-mapping.yaml:
    • [schema] Table 'public.order_lines' not found in database

Summary:
  Total: 4
  Valid: 3
  Invalid: 1

The product mapping is now valid. Fix the order line mapping next.

Step 5: Edit the Order Line Mapping

Edit mappings/order_line-mapping.yaml:

entity_id: ORDER_LINE

mapping_groups:
  - name: default_mapping_group
    tables:
      - connection: dev
        table: stage.sales_order_lines
        primary_keys:
          - CONCAT(CAST(order_id AS TEXT), '-', CAST(order_line_id AS TEXT))
        ingestion_strategy: FULL
        entity_effective_timestamp_expression: CURRENT_TIMESTAMP
        attributes:
          - id: order_line_id
            transformation_expression: CONCAT(CAST(order_id AS TEXT), '-', CAST(order_line_id AS TEXT))
          - id: quantity
            transformation_expression: quantity
          - id: line_amount
            transformation_expression: (quantity * unit_price) - discount_amount

    relationships:
      # ORDER_LINE -> PRODUCT relationship
      - id: ORDER_LINE_IS_FOR_PRODUCT_PRODUCT
        source_table: stage.sales_order_lines
        target_transformation_expression: product_id

      # ORDER_LINE -> ORDER relationship
      - id: ORDER_LINE_BELONGS_TO_ORDER
        source_table: stage.sales_order_lines
        target_transformation_expression: order_id

Important:

  • The relationship id in mappings follows the pattern: SOURCE_ENTITY_RELATIONSHIP_NAME_TARGET_ENTITY.
  • Composite Primary Keys: In real-world scenarios, order_line_id typically restarts for each order (1, 2, 3...). To create a unique identifier, we use a CONCAT expression in primary_keys combining order_id and order_line_id.

See Relationship Mappings for the full field reference.

Validate all mappings are now correct:

daana-cli check mapping
Checking 4 mapping(s) in directory: mappings

→ customer-mapping.yaml ✓
→ order-mapping.yaml ✓
→ order_line-mapping.yaml ✓
→ product-mapping.yaml ✓

Summary:
  Total: 4
  Valid: 4
  Invalid: 0

All mappings are valid. Deploy them next.

Step 6: Update Workflow & Deploy

Add the new mappings to workflow.yaml:

  mappings:
    - mappings/customer-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/product-mapping.yaml       # NEW
    - mappings/order_line-mapping.yaml    # NEW

Validate your workflow:

daana-cli check workflow
Checking workflow: workflow.yaml

Workflow: BOOK_RETAILER_WORKFLOW
  Workflow ID: 1424339281
  Model: ✓

Entity Coverage:
  - CUSTOMER: mapped ✓
  - ORDER: mapped ✓
  - PRODUCT: mapped ✓
  - ORDER_LINE: mapped ✓

✓ Workflow valid

All four entities are now mapped. Deploy and execute:

daana-cli deploy
daana-cli execute

Step 7: Query Your Complete Model

Open a psql shell against the local container so you can run the upcoming SQL queries:

docker exec -it daana-customerdb psql -U dev -d customerdb

Now you can answer Marketing's question:

-- Which books are selling the best?
SELECT
  product_title,
  product_format,
  SUM(quantity)::int                  AS total_units_sold,
  ROUND(SUM(line_amount)::numeric, 2) AS total_revenue
FROM daana_dw.view_product_with_rel
WHERE order_line_key IS NOT NULL
GROUP BY product_title, product_format
ORDER BY total_revenue DESC
LIMIT 10;

Daana stores NUMBER attributes as full-precision NUMERIC, so SUM(quantity) returns 3.00000000 and SUM(line_amount) returns 98.97000000. The ::int cast and ROUND(..., 2) are pure cosmetic formatting for the report.

Why view_product_with_rel and not view_order_line_with_rel?

Daana materializes related-entity columns on the target side of each relationship. ORDER_LINE is the source side of both its relationships (to PRODUCT and to ORDER), so view_order_line_with_rel carries only ORDER_LINE's own attributes, no joined product or order columns. The symmetrical target-side view (here view_product_with_rel) is where the order line attributes live alongside the product attributes.

Result:

          product_title           | product_format | total_units_sold | total_revenue
----------------------------------+----------------+------------------+---------------
 The Way of Kings                 | Hardcover      |                3 |         98.97
 Children of Blood and Bone       | Hardcover      |                4 |         68.96
 Words of Radiance                | Hardcover      |                2 |         58.98
 Becoming                         | Hardcover      |                2 |         56.98
 The Testaments                   | Hardcover      |                2 |         49.98
 It Ends with Us                  | Paperback      |                3 |         49.97
 American Gods                    | Hardcover      |                2 |         45.98
 Norwegian Wood                   | Paperback      |                3 |         39.17
 The Ocean at the End of the Lane | Hardcover      |                2 |         31.98
 Americanah                       | Paperback      |                2 |         28.48
(10 rows)

Order lines next to product info, no JOIN

The point of the target-side _with_rel views is to eliminate joins for the common case. view_product_with_rel already pairs each product with the order lines that reference it, so reading order line data alongside product data is a single-table query:

SELECT
  order_line_id,
  quantity::int                  AS quantity,
  product_title,
  product_format,
  ROUND(line_amount::numeric, 2) AS line_amount
FROM daana_dw.view_product_with_rel
WHERE order_line_id LIKE '1-%'
ORDER BY order_line_id;
 order_line_id | quantity |    product_title    | product_format | line_amount
---------------+----------+---------------------+----------------+-------------
 1-1           |        1 | The Handmaid's Tale | Hardcover      |       16.99
 1-2           |        1 | American Gods       | Hardcover      |       22.99
 1-3           |        1 | It Ends with Us     | Paperback      |       15.99
(3 rows)

The same applies to view_order_with_rel (carries order line attributes alongside ORDER) and view_customer_with_rel (carries ORDER attributes alongside CUSTOMER). The rule is simple: query the target-side view of the relationship you care about and the joined columns are already there.

Traversing the full chain

The _with_rel views compose. To answer "which customer ordered which book on which date", chain the three target-side views together: each one adds the next step. Two JOINs cover the full customer -> order -> order_line -> product chain.

SELECT
  c.customer_name,
  c.order_id,
  c.order_purchase_ts,
  p.product_title,
  p.quantity::int                  AS quantity,
  ROUND(p.line_amount::numeric, 2) AS line_amount
FROM daana_dw.view_customer_with_rel c
JOIN daana_dw.view_order_with_rel    o
  ON c.order_key = o.order_key
JOIN daana_dw.view_product_with_rel  p
  ON o.order_line_key = p.order_line_key
ORDER BY c.order_id::int, p.order_line_id
LIMIT 8;
 customer_name | order_id |  order_purchase_ts  |    product_title    | quantity | line_amount
---------------+----------+---------------------+---------------------+----------+-------------
 John Smith    | 1        | 2024-11-01 10:30:00 | The Handmaid's Tale |        1 |       16.99
 John Smith    | 1        | 2024-11-01 10:30:00 | American Gods       |        1 |       22.99
 John Smith    | 1        | 2024-11-01 10:30:00 | It Ends with Us     |        1 |       15.99
 Emily Johnson | 2        | 2024-11-02 14:15:00 | Sapiens             |        1 |       22.49
 Michael Brown | 3        | 2024-11-03 09:20:00 | Americanah          |        1 |       13.49
 Michael Brown | 3        | 2024-11-03 09:20:00 | Norwegian Wood      |        2 |       25.18
 Michael Brown | 3        | 2024-11-03 09:20:00 | Becoming            |        1 |       26.99
 Sarah Davis   | 4        | 2024-11-04 16:45:00 | The Way of Kings    |        1 |       32.99
(8 rows)

Each JOIN spans one relationship by its shared key (order_key, then order_line_key). No junction views, no manual key resolution. The _with_rel view of the entity at each end of the chain carries the entity's own attributes; the views in the middle (here view_order_with_rel) connect adjacent relationships and also expose useful columns of their own.

The per-relationship junction views (e.g. view_order_line_is_for_product_product) expose the related entity keys plus their activation timestamps and nothing else. Reach for them when you need only the keys, for example when joining Daana's output to an external table that does not live in the model.


Summary of Key DMDL Concepts

Attribute Types

  • STRING - Text values
  • NUMBER - Numeric values (integers, decimals)
  • START_TIMESTAMP - When an entity lifecycle begins
  • END_TIMESTAMP - When an entity lifecycle ends
  • UNIT - Measurement units (currency, weight units, etc.)

Ingestion Strategies

  • FULL - Complete table refresh
  • INCREMENTAL - Delta processing with watermark
  • FULL_LOG - Snapshot series for attribute-level history
  • TRANSACTIONAL - Append-only events

Relationship Direction

Relationships are defined from the child (many) to the parent (one):

  • ORDER_LINE → PRODUCT (many lines can reference one product)
  • ORDER_LINE → ORDER (many lines belong to one order)
  • ORDER → CUSTOMER (many orders belong to one customer)

Group Attributes

  • Bundle related attributes (Amount + Currency)
  • Enforce data quality at the model level
  • Semantic clarity in your data warehouse

Next: Chapter 6 - Going to Production