Chapter 1: Jump Start
Goal: See how Daana handles complex real-world scenarios automatically.
What you'll build: Merge customer data from two sources (customers + address snapshots) with history tracking, then query any point in time.
Time: 10 minutes
Before starting: Complete the Quickstart Tutorial.
The Challenge
You have a Book Retailer with:
- Customer data (
stage.customers) - core customer information - Address snapshots (
stage.customer_addresses) - daily snapshots of customer addresses
Your Source Data
Customers (stage.customers):
+-------------+--------------+------------------+---------------+----------------+---------------------+
| customer_id | first_name | last_name | email | account_status | updated_at |
+-------------+--------------+------------------+---------------+----------------+---------------------+
| 1 | John | Smith | john.smith@.. | Active | 2024-11-09 17:44:59 |
| 2 | Emily | Johnson | emily.john@.. | Active | 2024-11-04 18:24:50 |
| 3 | Michael | Brown | michael.br@.. | Active | 2024-10-16 10:37:12 |
| ... | ... | ... | ... | ... | ... |
| 1 | John | Smith | john.smith@.. | Inactive | 2024-11-25 14:30:22 | ← Status changed
| 3 | Michael | Brown | michael.br@.. | Suspended | 2024-11-27 16:45:18 | ← Status changed
+-------------+--------------+------------------+---------------+----------------+---------------------+Customer Address Snapshots (stage.customer_addresses):
+-------------+---------------+------------------+--------------+-------+---------------+
| customer_id | address_type | address_line1 | city | state | snapshot_date |
+-------------+---------------+------------------+--------------+-------+---------------+
| 3 | Shipping | 789 Pine Road | Brooklyn | NY | 2024-10-17 |
| 2 | Shipping | 456 Oak Avenue | Boston | MA | 2024-11-05 |
| 1 | Shipping | 123 Main Street | New York | NY | 2024-11-10 |
| ... | ... | ... | ... | ... | ... |
| 2 | Shipping | 789 Market Street| Philadelphia | PA | 2024-11-29 | ← Emily moved!
+-------------+---------------+------------------+--------------+-------+---------------+Sales Orders (stage.sales_orders):
+----------+-----------------+-------------+---------------------+--------------+--------------+----------+
| order_id | order_number | customer_id | order_date | order_status | total_amount | currency |
+----------+-----------------+-------------+---------------------+--------------+--------------+----------+
| 1 | ORD-2024-000001 | 1 | 2024-11-01 10:30:00 | Delivered | 60.33 | USD |
| 2 | ORD-2024-000002 | 2 | 2024-11-02 14:15:00 | Delivered | 24.49 | USD |
| 3 | ORD-2024-000003 | 3 | 2024-11-03 09:20:00 | Shipped | 71.39 | USD |
+----------+-----------------+-------------+---------------------+--------------+--------------+----------+Business Requirements
- One unified CUSTOMER entity combining customers and address data
- Time-travel queries: "Where was Emily living on November 27th?"
- Automatic duplicate handling (only real attribute changes create new rows)
- History tracking for every attribute change
In dbt/SQL: This requires hours to days of work with complex merge logic, SCD2 tables, conflict resolution, and manual history tracking.
In Daana: A few declarations, and it's done.
Step 1: Define Your Model (2 min)
Note: If you used
--example quickstart, model and mappings are already created and deployed. Skip to Step 4.
Edit model.yaml with the following content:
model:
id: "BOOK_RETAILER_MODEL"
name: "BookRetailerModel"
definition: "Unified customer and order data from multiple sources"
description: "Merges customer data with address snapshots for time-travel analytics"
entities:
- id: "CUSTOMER"
name: "CUSTOMER"
definition: "Customer who places orders"
description: "Unified customer data from customers table and address snapshots"
attributes:
- id: "CUSTOMER_ID"
name: "CUSTOMER_ID"
definition: "Unique customer identifier"
type: "STRING"
effective_timestamp: true
- id: "CUSTOMER_NAME"
name: "CUSTOMER_NAME"
definition: "Full name of the customer"
type: "STRING"
effective_timestamp: true
- id: "CUSTOMER_STATUS"
name: "CUSTOMER_STATUS"
definition: "Account status (Active, Inactive, Suspended)"
type: "STRING"
effective_timestamp: true
- id: "CUSTOMER_CITY"
name: "CUSTOMER_CITY"
definition: "City where customer is located"
type: "STRING"
effective_timestamp: true
- id: "CUSTOMER_STATE"
name: "CUSTOMER_STATE"
definition: "State where customer is located"
type: "STRING"
effective_timestamp: true
- id: "ORDER"
name: "ORDER"
definition: "Purchase order"
description: "Orders placed by customers"
attributes:
- id: "ORDER_ID"
name: "ORDER_ID"
definition: "Unique order identifier"
type: "STRING"
effective_timestamp: true
- id: "ORDER_STATUS"
name: "ORDER_STATUS"
definition: "Current order status"
type: "STRING"
effective_timestamp: true
- id: "ORDER_PURCHASE_TS"
name: "ORDER_PURCHASE_TS"
definition: "When order was placed"
type: "START_TIMESTAMP"
- id: "ORDER_TOTAL_AMOUNT"
name: "ORDER_TOTAL_AMOUNT"
definition: "Total monetary value of the order"
type: "NUMBER"
effective_timestamp: true
relationships:
- name: "IS_PLACED_BY"
definition: "Order is placed by a customer"
source_entity_id: "ORDER"
target_entity_id: "CUSTOMER"Validate the model:
daana-cli check model model.yamlStep 2: Create Mappings (3 min)
Note: If you used
--example quickstart, mappings are already created. Skip to Step 4.
Why Mappings?
A model defines what you want (CUSTOMER entity with name, city, state). A mapping maps the model to source data, hence it defines where the data comes from.
Understanding Effective Timestamps
When you set effective_timestamp: true in your model, Daana tracks when each attribute changed. This enables:
- Time-travel queries ("Where was Emily living in November?")
- Only storing rows when attributes actually change
- Full SCD2 history tracking
In your mapping, you specify the entity_effective_timestamp_expression to tell Daana which source column provides the timestamp for that table's data.
Generate Mapping Templates
Generate mapping templates for all entities:
daana-cli generate mappingThis creates mappings/customer-mapping.yaml and mappings/order-mapping.yaml.
Edit Customer Mapping
Edit mappings/customer-mapping.yaml to map both sources:
entity_id: CUSTOMER
mapping_groups:
- name: default_mapping_group
allow_multiple_identifiers: false
tables:
# Source 1: Customer core data (includes change history)
- connection: dev
table: stage.customers
primary_keys:
- customer_id
ingestion_strategy: FULL_LOG
entity_effective_timestamp_expression: updated_at
attributes:
- id: CUSTOMER_ID
transformation_expression: customer_id
- id: CUSTOMER_NAME
transformation_expression: CONCAT(first_name, ' ', last_name)
- id: CUSTOMER_STATUS
transformation_expression: account_status
# Source 2: Address snapshots - automatically merged with customers above
- connection: dev
table: stage.customer_addresses
primary_keys:
- customer_id
ingestion_strategy: FULL_LOG
entity_effective_timestamp_expression: snapshot_date
attributes:
- id: CUSTOMER_CITY
transformation_expression: city
- id: CUSTOMER_STATE
transformation_expression: stateKey points:
- Two tables in one mapping group = automatic merge
FULL_LOG= source contains multiple rows per entity (complete history)entity_effective_timestamp_expressiontells Daana when each version was valid
Edit Order Mapping
Edit mappings/order-mapping.yaml:
entity_id: ORDER
mapping_groups:
- name: default_mapping_group
allow_multiple_identifiers: false
tables:
- connection: dev
table: stage.sales_orders
primary_keys:
- order_id
ingestion_strategy: FULL
entity_effective_timestamp_expression: order_date
attributes:
- id: ORDER_ID
transformation_expression: order_id
- id: ORDER_STATUS
transformation_expression: order_status
- id: ORDER_PURCHASE_TS
transformation_expression: order_date
- id: ORDER_TOTAL_AMOUNT
transformation_expression: total_amount
relationships:
- id: ORDER_IS_PLACED_BY_CUSTOMER
atomic_context: ORDER_IS_PLACED_BY_CUSTOMER
source_table: stage.sales_orders
target_transformation_expression: customer_idStep 3: Deploy and Execute (2 min)
Deploy and execute:
# Deploy the workflow
daana-cli deploy
# Execute the transformation
daana-cli executeStep 4: See What Daana Built (2 min)
Connect to the database:
docker exec -it daana-customerdb psql -U dev -d customerdbQuery Current State
-- Simple query: current customer data
SELECT
customer_key,
"CUSTOMER_NAME",
"CUSTOMER_CITY",
"CUSTOMER_STATE",
"CUSTOMER_STATUS"
FROM daana_dw.view_customer
WHERE customer_key = '2';Result:
customer_key | CUSTOMER_NAME | CUSTOMER_CITY | CUSTOMER_STATE | CUSTOMER_STATUS
--------------+---------------+---------------+----------------+-----------------
2 | Emily Johnson | Philadelphia | PA | ActiveWhat happened automatically:
- Customers and addresses merged into one entity
- Duplicates handled (only real changes create new rows)
- Both sources contribute attributes
Query Historical Data (Time-Travel)
-- Where was Emily living on November 27th (before she moved)?
SELECT
customer_key,
customer_name,
customer_city,
customer_state,
eff_tmstp
FROM daana_dw.view_customer_hist
WHERE customer_key = '2'
AND eff_tmstp <= '2024-11-27'
ORDER BY eff_tmstp DESC
LIMIT 1;Result:
customer_key | customer_name | customer_city | customer_state | eff_tmstp
--------------+---------------+---------------+----------------+------------------------
2 | Emily Johnson | Boston | MA | 2024-11-05 00:00:00-- Where is Emily living now (after she moved)?
SELECT
customer_key,
customer_name,
customer_city,
customer_state,
eff_tmstp
FROM daana_dw.view_customer_hist
WHERE customer_key = '2'
AND eff_tmstp <= '2024-11-29'
ORDER BY eff_tmstp DESC
LIMIT 1;Result:
customer_key | customer_name | customer_city | customer_state | eff_tmstp
--------------+---------------+---------------+----------------+------------------------
2 | Emily Johnson | Philadelphia | PA | 2024-11-29 00:00:00Daana natively handles the temporal awareness, keeping track of where Emily was living on any date - without writing complex SQL!
What happened automatically:
- SCD2 history tracking (from
effective_timestamp: true) - Snapshot series processed (from
FULL_LOG) - Time-travel queries work without manual joins
Query with Relationship
Important: To query relationships, use the
_with_relviews. These include related entity attributes automatically.
-- Orders with customer info (using pre-built relationship view)
SELECT
order_key,
"ORDER_ID",
"ORDER_STATUS",
"ORDER_TOTAL_AMOUNT",
"CUSTOMER_NAME",
"CUSTOMER_CITY"
FROM daana_dw.view_order_with_rel
LIMIT 10;Result:
order_key | ORDER_ID | ORDER_STATUS | ORDER_TOTAL_AMOUNT | CUSTOMER_NAME | CUSTOMER_CITY
-----------+----------+--------------+--------------------+-------------------+---------------
1 | 1 | Delivered | 60.33 | John Smith | New York
10 | 10 | Processing | 65.77 | Maria Garcia | Boston
2 | 2 | Delivered | 24.49 | Emily Johnson | Philadelphia
3 | 3 | Shipped | 71.39 | Michael Brown | Brooklyn
4 | 4 | Processing | 44.03 | Sarah Davis | Cambridge
5 | 5 | Delivered | 19.52 | David Wilson | Queens
6 | 6 | Delivered | 92.38 | Jennifer Martinez | New York
7 | 7 | Shipped | 57.07 | Robert Anderson | San Francisco
8 | 8 | Delivered | 72.50 | Lisa Taylor | San Francisco
9 | 9 | Delivered | 38.60 | James Thomas | Los AngelesWhat Just Happened?
The Complexity You Avoided
In SQL (dbt, Dataform etc.) building this would require:
- Merging two sources: Write complex FULL OUTER JOINs with conflict resolution logic
- Building SCD2 logic: Manually create valid_from/valid_to columns, handle gaps in snapshots
- Handling snapshots: Process series of snapshots, deduplicate, merge overlapping time periods
- Time-travel queries: Write complex WHERE clauses with valid_from/valid_to logic
Total effort: Weeks of work, multiple SQL files, extensive testing, ongoing maintenance
What Daana Does Automatically
In Daana, you simply:
- Declare your model (entities, attributes, relationships)
- Map both sources (add both tables to one mapping group)
- Set
FULL_LOG(snapshot series handling) - Set
effective_timestamp: true(SCD2 automatic)
Total effort: Minutes, no SQL, handles everything automatically
Key Takeaways
- Multi-source merging: Add multiple tables to one mapping group = automatic merge
- Snapshot handling:
FULL_LOG+effective_timestamp: true= automatic SCD2 - Time-travel queries: Query any point in time without manual history tables
- Duplicate handling: Only real attribute changes create new rows
- No SQL required: Declare what you want, Daana generates the complex logic
Next Steps
You've seen the magic. Now let's slow down and understand how the pieces actually fit together.
Want to practice? Try deleting my-tutorial/ and running daana-cli init my-tutorial again (without --example quickstart) to manually create the model and generate mappings yourself.