Tutorial
Chapter 1 — Jump Start
Goal: See how Daana handles complex real-world scenarios automatically.
What you'll build: Merge customer data from two sources (customers + address snapshots) with history tracking, then query any point in time.
Time: 10 minutes
Before starting: Complete the Quickstart Tutorial.
The Challenge
You have a Book Retailer with:
- Customer data (
stage.customers) - core customer information - Address snapshots (
stage.customer_addresses) - daily snapshots of customer addresses
Your Source Data
Customers (stage.customers):
+-------------+--------------+------------------+---------------+----------------+---------------------+
| customer_id | first_name | last_name | email | account_status | updated_at |
+-------------+--------------+------------------+---------------+----------------+---------------------+
| 1 | John | Smith | john.smith@.. | Active | 2024-11-09 17:44:59 |
| 2 | Emily | Johnson | emily.john@.. | Active | 2024-11-04 18:24:50 |
| 3 | Michael | Brown | michael.br@.. | Active | 2024-10-16 10:37:12 |
| ... | ... | ... | ... | ... | ... |
| 1 | John | Smith | john.smith@.. | Inactive | 2024-11-25 14:30:22 | ← Status changed
| 3 | Michael | Brown | michael.br@.. | Suspended | 2024-11-27 16:45:18 | ← Status changed
+-------------+--------------+------------------+---------------+----------------+---------------------+
Customer Address Snapshots (stage.customer_addresses):
+-------------+---------------+------------------+--------------+-------+---------------+
| customer_id | address_type | address_line1 | city | state | snapshot_date |
+-------------+---------------+------------------+--------------+-------+---------------+
| 3 | Shipping | 789 Pine Road | Brooklyn | NY | 2024-10-17 |
| 2 | Shipping | 456 Oak Avenue | Boston | MA | 2024-11-05 |
| 1 | Shipping | 123 Main Street | New York | NY | 2024-11-10 |
| ... | ... | ... | ... | ... | ... |
| 2 | Shipping | 789 Market Street| Philadelphia | PA | 2024-11-29 | ← Emily moved!
+-------------+---------------+------------------+--------------+-------+---------------+
Sales Orders (stage.sales_orders):
+----------+-----------------+-------------+---------------------+--------------+--------------+----------+
| order_id | order_number | customer_id | order_date | order_status | total_amount | currency |
+----------+-----------------+-------------+---------------------+--------------+--------------+----------+
| 1 | ORD-2024-000001 | 1 | 2024-11-01 10:30:00 | Delivered | 60.33 | USD |
| 2 | ORD-2024-000002 | 2 | 2024-11-02 14:15:00 | Delivered | 24.49 | USD |
| 3 | ORD-2024-000003 | 3 | 2024-11-03 09:20:00 | Shipped | 71.39 | USD |
+----------+-----------------+-------------+---------------------+--------------+--------------+----------+
Business Requirements
- One unified CUSTOMER entity combining customers and address data
- Time-travel queries: "Where was Emily living on November 27th?"
- Automatic duplicate handling (only real attribute changes create new rows)
- History tracking for every attribute change
In dbt/SQL: This requires hours to days of work with complex merge logic, SCD2 tables, conflict resolution, and manual history tracking.
In Daana: A few declarations, and it's done.
Step 1: Define Your Model (2 min)
A model defines what your business is about (customers, orders, products etc).
Note: To spare you from copy/pasting during this tutorial, we have already created this
model.yamlin yourmy-tutorialproject.
Edit model.yaml with the following content:
model:
id: BOOK_RETAILER_MODEL
name: BookRetailerModel
definition: Unified customer and order data from multiple sources
description: Merges customer data with address snapshots for time-travel analytics
entities:
- id: CUSTOMER
name: CUSTOMER
definition: Customer who places orders
description: Unified customer data from customers table and address snapshots
attributes:
- id: CUSTOMER_ID
name: CUSTOMER_ID
definition: Unique customer identifier
type: STRING
effective_timestamp: true # effective_timestamp tracks when an attribute changes — enabling time-travelling
- id: CUSTOMER_NAME
name: CUSTOMER_NAME
definition: Full name of the customer
type: STRING
effective_timestamp: true
- id: CUSTOMER_STATUS
name: CUSTOMER_STATUS
definition: Account status (Active, Inactive, Suspended)
type: STRING
effective_timestamp: true
- id: CUSTOMER_CITY
name: CUSTOMER_CITY
definition: City where customer is located
type: STRING
effective_timestamp: true
- id: CUSTOMER_STATE
name: CUSTOMER_STATE
definition: State where customer is located
type: STRING
effective_timestamp: true
- id: ORDER
name: ORDER
definition: Purchase order
description: Orders placed by customers
attributes:
- id: ORDER_ID
name: ORDER_ID
definition: Unique order identifier
type: STRING
effective_timestamp: true
- id: ORDER_STATUS
name: ORDER_STATUS
definition: Current order status
type: STRING
effective_timestamp: true
- id: ORDER_PURCHASE_TS
name: ORDER_PURCHASE_TS
definition: When order was placed
type: START_TIMESTAMP
- id: ORDER_TOTAL_AMOUNT
name: ORDER_TOTAL_AMOUNT
definition: Total monetary value of the order
type: NUMBER
effective_timestamp: true
relationships:
- name: IS_PLACED_BY
definition: Order is placed by a customer
source_entity_id: ORDER
target_entity_id: CUSTOMER
See Effective Timestamps for a deeper explanation of effective_timestamp.
Validate the model:
daana-cli check workflow
Step 2: Create Mappings (3 min)
Why Mappings?
A mapping maps the model to source data, hence it defines where the data comes from.
Understanding Effective Timestamps
When you set effective_timestamp: true in your model, Daana tracks when each attribute changed. This enables:
- Time-travel queries ("Where was Emily living in November?")
- Only storing rows when attributes actually change
- Full SCD2 history tracking
In your mapping, you specify the entity_effective_timestamp_expression to tell Daana which source column provides the timestamp for that table's data.
Generate Mapping Templates
Note: To spare you from copy/pasting during this tutorial, we have already created
customer-mapping.yamlandorder-mapping.yamlin yourmy-tutorialproject, so no need to generate mapping or edit those files.
Generate mapping templates for all entities:
daana-cli generate mapping
This creates mappings/customer-mapping.yaml and mappings/order-mapping.yaml.
Edit Customer Mapping
Edit mappings/customer-mapping.yaml to map both sources:
entity_id: CUSTOMER
mapping_groups:
- name: default_mapping_group
tables:
# Source 1: Customer core data (includes change history)
- connection: dev
table: stage.customers
primary_keys:
- customer_id
ingestion_strategy: FULL_LOG # ingestion_strategy define how data is integrated. FULL_LOG keeps full history, each row is a version.
entity_effective_timestamp_expression: updated_at # which column tells Daana when this version was valid
attributes:
- id: CUSTOMER_ID
transformation_expression: customer_id
- id: CUSTOMER_NAME
transformation_expression: CONCAT(first_name, ' ', last_name)
- id: CUSTOMER_STATUS
transformation_expression: account_status
# Source 2: Address snapshots - automatically merged with customers above
- connection: dev
table: stage.customer_addresses
primary_keys:
- customer_id
ingestion_strategy: FULL_LOG
entity_effective_timestamp_expression: snapshot_date
attributes:
- id: CUSTOMER_CITY
transformation_expression: city
- id: CUSTOMER_STATE
transformation_expression: state
Key points:
- Two tables in one mapping group = automatic merge
FULL_LOG= source contains multiple rows per entity (complete history)entity_effective_timestamp_expressiontells Daana when each version was valid
See Ingestion Strategies for a deeper explanation of ingestion_strategy.
Edit Order Mapping
Edit mappings/order-mapping.yaml:
entity_id: ORDER
mapping_groups:
- name: default_mapping_group
tables:
- connection: dev
table: stage.sales_orders
primary_keys:
- order_id
ingestion_strategy: FULL
entity_effective_timestamp_expression: order_date
attributes:
- id: ORDER_ID
transformation_expression: order_id
- id: ORDER_STATUS
transformation_expression: order_status
- id: ORDER_PURCHASE_TS
transformation_expression: order_date
- id: ORDER_TOTAL_AMOUNT
transformation_expression: total_amount
relationships:
- id: ORDER_IS_PLACED_BY_CUSTOMER
atomic_context: ORDER_IS_PLACED_BY_CUSTOMER
source_table: stage.sales_orders
target_transformation_expression: customer_id
Step 3: Deploy and Execute (2 min)
Deploy and execute:
# Deploy the workflow
daana-cli deploy
# Execute the transformation
daana-cli execute
Step 4: See What Daana Built (2 min)
Connect to the database:
docker exec -it daana-customerdb psql -U dev -d customerdb
For each entity you define, Daana generates a set of views in the daana_dw schema:
| View | Description |
|---|---|
view_<entity> | Current state — one row per entity, always up to date |
view_<entity>_hist | Full change history — one row per tracked version, timestamped with eff_tmstp |
view_<entity>_with_rel | Current state enriched with related entity attributes |
These views are the building blocks for your analytics layer. Use them directly for ad-hoc analytics, or as a foundation for dimensional mart modeling.
Query Current State
-- Simple query: current customer data
SELECT
customer_key,
"CUSTOMER_NAME",
"CUSTOMER_CITY",
"CUSTOMER_STATE",
"CUSTOMER_STATUS"
FROM daana_dw.view_customer
WHERE customer_key = '2';
Result:
customer_key | CUSTOMER_NAME | CUSTOMER_CITY | CUSTOMER_STATE | CUSTOMER_STATUS
--------------+---------------+---------------+----------------+-----------------
2 | Emily Johnson | Philadelphia | PA | Active
What happened automatically:
- Customers and addresses merged into one entity
- Duplicates handled (only real changes create new rows)
- Both sources contribute attributes
Query Historical Data (Time-Travel)
Note: The
_histviews contain the full change history — one row per tracked version, witheff_tmstprecording when each version became effective. They are the foundation for SCD2 dimensions and point-in-time queries.
-- Where was Emily living on November 27th (before she moved)?
SELECT
customer_key,
customer_name,
customer_city,
customer_state,
eff_tmstp
FROM daana_dw.view_customer_hist
WHERE customer_key = '2'
AND eff_tmstp <= '2024-11-27'
ORDER BY eff_tmstp DESC
LIMIT 1;
Result:
customer_key | customer_name | customer_city | customer_state | eff_tmstp
--------------+---------------+---------------+----------------+------------------------
2 | Emily Johnson | Boston | MA | 2024-11-05 00:00:00
Daana natively handles the temporal awareness, keeping track of where Emily was living on any date - without writing complex SQL!
What happened automatically:
- SCD2 history tracking (from
effective_timestamp: true) - Snapshot series processed (from
FULL_LOG) - Time-travel queries work without manual joins
Query with Relationship
Important: To query relationships, use the
_with_relviews. These include related entity attributes automatically.
-- Orders with customer info (using pre-built relationship view)
SELECT
order_key,
"ORDER_ID",
"ORDER_STATUS",
"ORDER_TOTAL_AMOUNT",
"CUSTOMER_NAME",
"CUSTOMER_CITY"
FROM daana_dw.view_order_with_rel
LIMIT 10;
Result:
order_key | ORDER_ID | ORDER_STATUS | ORDER_TOTAL_AMOUNT | CUSTOMER_NAME | CUSTOMER_CITY
-----------+----------+--------------+--------------------+-------------------+---------------
1 | 1 | Delivered | 60.33 | John Smith | New York
10 | 10 | Processing | 65.77 | Maria Garcia | Boston
2 | 2 | Delivered | 24.49 | Emily Johnson | Philadelphia
3 | 3 | Shipped | 71.39 | Michael Brown | Brooklyn
4 | 4 | Processing | 44.03 | Sarah Davis | Cambridge
5 | 5 | Delivered | 19.52 | David Wilson | Queens
6 | 6 | Delivered | 92.38 | Jennifer Martinez | New York
7 | 7 | Shipped | 57.07 | Robert Anderson | San Francisco
8 | 8 | Delivered | 72.50 | Lisa Taylor | San Francisco
9 | 9 | Delivered | 38.60 | James Thomas | Los Angeles
What Just Happened?
The Complexity You Avoided
In SQL (dbt, Dataform etc.) building this would require:
- Merging two sources: Write complex FULL OUTER JOINs with conflict resolution logic
- Building SCD2 logic: Manually create valid_from/valid_to columns, handle gaps in snapshots
- Handling snapshots: Process series of snapshots, deduplicate, merge overlapping time periods
- Time-travel queries: Write complex WHERE clauses with valid_from/valid_to logic
Total effort: Weeks of work, multiple SQL files, extensive testing, ongoing maintenance
What Daana Does Automatically
In Daana, you simply:
- Declare your model (entities, attributes, relationships)
- Map both sources (add both tables to one mapping group)
- Set
FULL_LOG(snapshot series handling) - Set
effective_timestamp: true(SCD2 automatic)
Total effort: Minutes, no SQL, handles everything automatically
Key Takeaways
- Multi-source merging: Add multiple tables to one mapping group = automatic merge
- Snapshot handling:
FULL_LOG+effective_timestamp: true= automatic SCD2 - Time-travel queries: Query any point in time without manual history tables
- Duplicate handling: Only real attribute changes create new rows
- No SQL required: Declare what you want, Daana generates the complex logic
Next Steps
You've seen the results. Now let's slow down and understand how the pieces actually fit together.
Want to practice? Try deleting my-tutorial/ and running daana-cli init my-tutorial --example quickstart again, then clear model.yaml and the mappings/ files to build them from scratch following the steps above.