Tutorial
1. Jump Start

Chapter 1: Jump Start

Goal: See how Daana handles complex real-world scenarios automatically.

What you'll build: Merge customer data from two sources (customers + address snapshots) with history tracking, then query any point in time.

Time: 10 minutes

Before starting: Complete the Quickstart Tutorial.


The Challenge

You have a Book Retailer with:

  • Customer data (stage.customers) - core customer information
  • Address snapshots (stage.customer_addresses) - daily snapshots of customer addresses

Your Source Data

Customers (stage.customers):

+-------------+--------------+------------------+---------------+----------------+---------------------+
| customer_id | first_name   | last_name        | email         | account_status | updated_at          |
+-------------+--------------+------------------+---------------+----------------+---------------------+
| 1           | John         | Smith            | john.smith@.. | Active         | 2024-11-09 17:44:59 |
| 2           | Emily        | Johnson          | emily.john@.. | Active         | 2024-11-04 18:24:50 |
| 3           | Michael      | Brown            | michael.br@.. | Active         | 2024-10-16 10:37:12 |
| ...         | ...          | ...              | ...           | ...            | ...                 |
| 1           | John         | Smith            | john.smith@.. | Inactive       | 2024-11-25 14:30:22 | ← Status changed
| 3           | Michael      | Brown            | michael.br@.. | Suspended      | 2024-11-27 16:45:18 | ← Status changed
+-------------+--------------+------------------+---------------+----------------+---------------------+

Customer Address Snapshots (stage.customer_addresses):

+-------------+---------------+------------------+--------------+-------+---------------+
| customer_id | address_type  | address_line1    | city         | state | snapshot_date |
+-------------+---------------+------------------+--------------+-------+---------------+
| 3           | Shipping      | 789 Pine Road    | Brooklyn     | NY    | 2024-10-17    |
| 2           | Shipping      | 456 Oak Avenue   | Boston       | MA    | 2024-11-05    |
| 1           | Shipping      | 123 Main Street  | New York     | NY    | 2024-11-10    |
| ...         | ...           | ...              | ...          | ...   | ...           |
| 2           | Shipping      | 789 Market Street| Philadelphia | PA    | 2024-11-29    | ← Emily moved!
+-------------+---------------+------------------+--------------+-------+---------------+

Sales Orders (stage.sales_orders):

+----------+-----------------+-------------+---------------------+--------------+--------------+----------+
| order_id | order_number    | customer_id | order_date          | order_status | total_amount | currency |
+----------+-----------------+-------------+---------------------+--------------+--------------+----------+
| 1        | ORD-2024-000001 | 1           | 2024-11-01 10:30:00 | Delivered    | 60.33        | USD      |
| 2        | ORD-2024-000002 | 2           | 2024-11-02 14:15:00 | Delivered    | 24.49        | USD      |
| 3        | ORD-2024-000003 | 3           | 2024-11-03 09:20:00 | Shipped      | 71.39        | USD      |
+----------+-----------------+-------------+---------------------+--------------+--------------+----------+

Business Requirements

  • One unified CUSTOMER entity combining customers and address data
  • Time-travel queries: "Where was Emily living on November 27th?"
  • Automatic duplicate handling (only real attribute changes create new rows)
  • History tracking for every attribute change

In dbt/SQL: This requires hours to days of work with complex merge logic, SCD2 tables, conflict resolution, and manual history tracking.

In Daana: A few declarations, and it's done.


Step 1: Define Your Model (2 min)

Note: If you used --example quickstart, model and mappings are already created and deployed. Skip to Step 4.

Edit model.yaml with the following content:

model:
  id: "BOOK_RETAILER_MODEL"
  name: "BookRetailerModel"
  definition: "Unified customer and order data from multiple sources"
  description: "Merges customer data with address snapshots for time-travel analytics"
 
  entities:
    - id: "CUSTOMER"
      name: "CUSTOMER"
      definition: "Customer who places orders"
      description: "Unified customer data from customers table and address snapshots"
      attributes:
        - id: "CUSTOMER_ID"
          name: "CUSTOMER_ID"
          definition: "Unique customer identifier"
          type: "STRING"
          effective_timestamp: true
 
        - id: "CUSTOMER_NAME"
          name: "CUSTOMER_NAME"
          definition: "Full name of the customer"
          type: "STRING"
          effective_timestamp: true
 
        - id: "CUSTOMER_STATUS"
          name: "CUSTOMER_STATUS"
          definition: "Account status (Active, Inactive, Suspended)"
          type: "STRING"
          effective_timestamp: true
 
        - id: "CUSTOMER_CITY"
          name: "CUSTOMER_CITY"
          definition: "City where customer is located"
          type: "STRING"
          effective_timestamp: true
 
        - id: "CUSTOMER_STATE"
          name: "CUSTOMER_STATE"
          definition: "State where customer is located"
          type: "STRING"
          effective_timestamp: true
 
    - id: "ORDER"
      name: "ORDER"
      definition: "Purchase order"
      description: "Orders placed by customers"
      attributes:
        - id: "ORDER_ID"
          name: "ORDER_ID"
          definition: "Unique order identifier"
          type: "STRING"
          effective_timestamp: true
 
        - id: "ORDER_STATUS"
          name: "ORDER_STATUS"
          definition: "Current order status"
          type: "STRING"
          effective_timestamp: true
 
        - id: "ORDER_PURCHASE_TS"
          name: "ORDER_PURCHASE_TS"
          definition: "When order was placed"
          type: "START_TIMESTAMP"
 
        - id: "ORDER_TOTAL_AMOUNT"
          name: "ORDER_TOTAL_AMOUNT"
          definition: "Total monetary value of the order"
          type: "NUMBER"
          effective_timestamp: true
 
  relationships:
    - name: "IS_PLACED_BY"
      definition: "Order is placed by a customer"
      source_entity_id: "ORDER"
      target_entity_id: "CUSTOMER"

Validate the model:

daana-cli check model model.yaml

Step 2: Create Mappings (3 min)

Note: If you used --example quickstart, mappings are already created. Skip to Step 4.

Why Mappings?

A model defines what you want (CUSTOMER entity with name, city, state). A mapping maps the model to source data, hence it defines where the data comes from.

Understanding Effective Timestamps

When you set effective_timestamp: true in your model, Daana tracks when each attribute changed. This enables:

  • Time-travel queries ("Where was Emily living in November?")
  • Only storing rows when attributes actually change
  • Full SCD2 history tracking

In your mapping, you specify the entity_effective_timestamp_expression to tell Daana which source column provides the timestamp for that table's data.

Generate Mapping Templates

Generate mapping templates for all entities:

daana-cli generate mapping

This creates mappings/customer-mapping.yaml and mappings/order-mapping.yaml.

Edit Customer Mapping

Edit mappings/customer-mapping.yaml to map both sources:

entity_id: CUSTOMER
 
mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: false
 
    tables:
      # Source 1: Customer core data (includes change history)
      - connection: dev
        table: stage.customers
        primary_keys:
          - customer_id
        ingestion_strategy: FULL_LOG
        entity_effective_timestamp_expression: updated_at
        attributes:
          - id: CUSTOMER_ID
            transformation_expression: customer_id
          - id: CUSTOMER_NAME
            transformation_expression: CONCAT(first_name, ' ', last_name)
          - id: CUSTOMER_STATUS
            transformation_expression: account_status
 
      # Source 2: Address snapshots - automatically merged with customers above
      - connection: dev
        table: stage.customer_addresses
        primary_keys:
          - customer_id
        ingestion_strategy: FULL_LOG
        entity_effective_timestamp_expression: snapshot_date
        attributes:
          - id: CUSTOMER_CITY
            transformation_expression: city
          - id: CUSTOMER_STATE
            transformation_expression: state

Key points:

  • Two tables in one mapping group = automatic merge
  • FULL_LOG = source contains multiple rows per entity (complete history)
  • entity_effective_timestamp_expression tells Daana when each version was valid

Edit Order Mapping

Edit mappings/order-mapping.yaml:

entity_id: ORDER
 
mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: false
 
    tables:
      - connection: dev
        table: stage.sales_orders
        primary_keys:
          - order_id
        ingestion_strategy: FULL
        entity_effective_timestamp_expression: order_date
        attributes:
          - id: ORDER_ID
            transformation_expression: order_id
          - id: ORDER_STATUS
            transformation_expression: order_status
          - id: ORDER_PURCHASE_TS
            transformation_expression: order_date
          - id: ORDER_TOTAL_AMOUNT
            transformation_expression: total_amount
 
    relationships:
      - id: ORDER_IS_PLACED_BY_CUSTOMER
        atomic_context: ORDER_IS_PLACED_BY_CUSTOMER
        source_table: stage.sales_orders
        target_transformation_expression: customer_id

Step 3: Deploy and Execute (2 min)

Deploy and execute:

# Deploy the workflow
daana-cli deploy
 
# Execute the transformation
daana-cli execute

Step 4: See What Daana Built (2 min)

Connect to the database:

docker exec -it daana-customerdb psql -U dev -d customerdb

Query Current State

-- Simple query: current customer data
SELECT
  customer_key,
  "CUSTOMER_NAME",
  "CUSTOMER_CITY",
  "CUSTOMER_STATE",
  "CUSTOMER_STATUS"
FROM daana_dw.view_customer
WHERE customer_key = '2';

Result:

 customer_key | CUSTOMER_NAME | CUSTOMER_CITY | CUSTOMER_STATE | CUSTOMER_STATUS
--------------+---------------+---------------+----------------+-----------------
 2            | Emily Johnson | Philadelphia  | PA             | Active

What happened automatically:

  • Customers and addresses merged into one entity
  • Duplicates handled (only real changes create new rows)
  • Both sources contribute attributes

Query Historical Data (Time-Travel)

-- Where was Emily living on November 27th (before she moved)?
SELECT
  customer_key,
  customer_name,
  customer_city,
  customer_state,
  eff_tmstp
FROM daana_dw.view_customer_hist
WHERE customer_key = '2'
  AND eff_tmstp <= '2024-11-27'
ORDER BY eff_tmstp DESC
LIMIT 1;

Result:

 customer_key | customer_name | customer_city | customer_state |       eff_tmstp
--------------+---------------+---------------+----------------+------------------------
 2            | Emily Johnson | Boston        | MA             | 2024-11-05 00:00:00
-- Where is Emily living now (after she moved)?
SELECT
  customer_key,
  customer_name,
  customer_city,
  customer_state,
  eff_tmstp
FROM daana_dw.view_customer_hist
WHERE customer_key = '2'
  AND eff_tmstp <= '2024-11-29'
ORDER BY eff_tmstp DESC
LIMIT 1;

Result:

 customer_key | customer_name | customer_city | customer_state |       eff_tmstp
--------------+---------------+---------------+----------------+------------------------
 2            | Emily Johnson | Philadelphia  | PA             | 2024-11-29 00:00:00

Daana natively handles the temporal awareness, keeping track of where Emily was living on any date - without writing complex SQL!

What happened automatically:

  • SCD2 history tracking (from effective_timestamp: true)
  • Snapshot series processed (from FULL_LOG)
  • Time-travel queries work without manual joins

Query with Relationship

Important: To query relationships, use the _with_rel views. These include related entity attributes automatically.

-- Orders with customer info (using pre-built relationship view)
SELECT
  order_key,
  "ORDER_ID",
  "ORDER_STATUS",
  "ORDER_TOTAL_AMOUNT",
  "CUSTOMER_NAME",
  "CUSTOMER_CITY"
FROM daana_dw.view_order_with_rel
LIMIT 10;

Result:

 order_key | ORDER_ID | ORDER_STATUS | ORDER_TOTAL_AMOUNT |   CUSTOMER_NAME   | CUSTOMER_CITY
-----------+----------+--------------+--------------------+-------------------+---------------
 1         | 1        | Delivered    |              60.33 | John Smith        | New York
 10        | 10       | Processing   |              65.77 | Maria Garcia      | Boston
 2         | 2        | Delivered    |              24.49 | Emily Johnson     | Philadelphia
 3         | 3        | Shipped      |              71.39 | Michael Brown     | Brooklyn
 4         | 4        | Processing   |              44.03 | Sarah Davis       | Cambridge
 5         | 5        | Delivered    |              19.52 | David Wilson      | Queens
 6         | 6        | Delivered    |              92.38 | Jennifer Martinez | New York
 7         | 7        | Shipped      |              57.07 | Robert Anderson   | San Francisco
 8         | 8        | Delivered    |              72.50 | Lisa Taylor       | San Francisco
 9         | 9        | Delivered    |              38.60 | James Thomas      | Los Angeles

What Just Happened?

The Complexity You Avoided

In SQL (dbt, Dataform etc.) building this would require:

  1. Merging two sources: Write complex FULL OUTER JOINs with conflict resolution logic
  2. Building SCD2 logic: Manually create valid_from/valid_to columns, handle gaps in snapshots
  3. Handling snapshots: Process series of snapshots, deduplicate, merge overlapping time periods
  4. Time-travel queries: Write complex WHERE clauses with valid_from/valid_to logic

Total effort: Weeks of work, multiple SQL files, extensive testing, ongoing maintenance

What Daana Does Automatically

In Daana, you simply:

  1. Declare your model (entities, attributes, relationships)
  2. Map both sources (add both tables to one mapping group)
  3. Set FULL_LOG (snapshot series handling)
  4. Set effective_timestamp: true (SCD2 automatic)

Total effort: Minutes, no SQL, handles everything automatically


Key Takeaways

  1. Multi-source merging: Add multiple tables to one mapping group = automatic merge
  2. Snapshot handling: FULL_LOG + effective_timestamp: true = automatic SCD2
  3. Time-travel queries: Query any point in time without manual history tables
  4. Duplicate handling: Only real attribute changes create new rows
  5. No SQL required: Declare what you want, Daana generates the complex logic

Next Steps

You've seen the magic. Now let's slow down and understand how the pieces actually fit together.

Want to practice? Try deleting my-tutorial/ and running daana-cli init my-tutorial again (without --example quickstart) to manually create the model and generate mappings yourself.

Next: Chapter 2 - The Building Blocks