Tutorial
Chapter 1 - Jump Start
Goal: Observe how Daana generates an integrated, time-aware data warehouse from four different source-data shapes, using only YAML declarations and three CLI commands.
What you'll see: Customer data merged from three independent source systems and full change history tracked automatically, all available for query immediately.
Time: typically 10-15 minutes
Before starting: Complete the Quickstart Tutorial to set up your project and databases.
The Source Data
The Book Retailer's customer data resides in four independent source tables. The columns differ between tables, and the shape of each table differs as well. Daana represents each shape with one value of a single mapping field, ingestion_strategy. The four shapes used in this tutorial are described below.
stage.customers (shape: change log). One row is written each time CRM data changes, timestamped with updated_at. Customer 1 appears twice because their status changed:
customer_id | first_name | last_name | account_status | updated_at
-------------+------------+-----------+----------------+---------------------
1 | John | Smith | Active | 2024-11-09 17:44:59
2 | Emily | Johnson | Active | 2024-11-04 18:24:50
3 | Michael | Brown | Active | 2024-10-16 10:37:12
...
1 | John | Smith | Inactive | 2024-11-25 14:30:22 <- status changed
3 | Michael | Brown | Suspended | 2024-11-27 16:45:18 <- status changed
stage.customer_addresses (shape: daily snapshot). The source writes one row per customer per night, regardless of whether the address changed. Most rows are duplicates of the previous day's row (275 in total, one per customer per day):
customer_id | city | state | snapshot_date
-------------+--------------+-------+---------------
2 | Boston | MA | 2024-11-05
2 | Boston | MA | 2024-11-28
2 | Philadelphia | PA | 2024-11-29 <- address changed
...
stage.loyalty_memberships (shape: current-state lookup). One row per active member, updated in place when the tier changes. The source system stores no history of its own; only the current state is available:
customer_id | tier_level | join_date
-------------+------------+------------
1 | Gold | 2023-01-10
2 | Silver | 2023-02-14
4 | Gold | 2023-04-20
6 | Platinum | 2023-06-10
...
When a tier value changes between runs, Daana writes a new version instead of overwriting; chapter 4 covers how.
stage.sales_orders (shape: transactional). Immutable order events. Each row is written once when the order is placed and never updated, and the row carries both a monetary amount and its currency code:
order_id | customer_id | order_status | total_amount | currency | order_date
----------+-------------+--------------+--------------+----------+---------------------
1 | 1 | Delivered | 60.33 | USD | 2024-11-01 10:30:00
2 | 2 | Delivered | 24.49 | USD | 2024-11-02 14:15:00
6 | 6 | Delivered | 92.38 | USD | 2024-11-06 13:20:00
Why this is hard without Daana
Producing a single unified, time-aware customer view from these four shapes the traditional way requires four different ingestion patterns inside one pipeline:
- CDC merge for the change log, picking the latest version per
updated_at. - Snapshot deduplication for the daily address dumps, reducing 275 rows to one row per actual address change.
- SCD2 reconstruction for loyalty memberships, because the source overwrites in place and history has to be derived downstream.
- Append-only inserts with deduplication for the orders, plus a separate decision about how to keep
total_amountandcurrencyaligned if either ever changes.
Then a FULL OUTER JOIN across the four streams, key reconciliation, and valid_from/valid_to columns so analysts can answer "what did this customer look like on 27 November?". The result is a custom ETL implementation that has to be re-checked whenever the source schema drifts.
In Daana, each shape maps to one ingestion_strategy. This chapter uses three: FULL_LOG for both the change log and the daily snapshot, FULL for the loyalty lookup, and TRANSACTIONAL for the orders. This configuration is enough for Daana to generate the merging, attribute-level history, and snapshot deduplication automatically. Chapter 4 walks through each strategy in detail.
Author Your Model and Mappings
You've already run daana-cli init my-tutorial --example quickstart, which set up the data, Docker, and a workflow scaffold. Two pieces remain for you to author by hand:
model.yamlat the project root, which declares the business entities, attributes, and relationships. The file ships with a placeholder so this chapter can replace it directly.- Two mapping files in
mappings/, which connect each entity to its source tables. The directory ships empty; you generate the skeletons withdaana-cli generate mappingand complete them.
This is the same authoring loop used on real projects. The remainder of this chapter walks through each file.
Author the Model
Start with the model file. Open model.yaml in your editor and replace its contents with the following:
model:
id: BOOK_RETAILER_MODEL
name: BookRetailerModel
definition: Unified customer and order data from multiple sources
description: Merges customer data with address snapshots for time-travel analytics
entities:
- id: CUSTOMER
name: CUSTOMER
definition: Customer who places orders
description: Unified customer data from customers table and address snapshots
attributes:
- id: customer_id
name: customer_id
definition: Unique customer identifier
type: STRING
- id: customer_name
name: customer_name
definition: Full name of the customer
type: STRING
effective_timestamp: true
- id: customer_status
name: customer_status
definition: Account status (Active, Inactive, Suspended)
type: STRING
effective_timestamp: true
- id: customer_city
name: customer_city
definition: City where customer is located
type: STRING
effective_timestamp: true
- id: customer_state
name: customer_state
definition: State where customer is located
type: STRING
effective_timestamp: true
- id: customer_loyalty_tier
name: customer_loyalty_tier
definition: Loyalty program tier level (Bronze, Silver, Gold, Platinum)
type: STRING
effective_timestamp: true
- id: ORDER
name: ORDER
definition: Purchase order
description: Orders placed by customers
attributes:
- id: order_id
name: order_id
definition: Unique order identifier
type: STRING
- id: order_status
name: order_status
definition: Current order status
type: STRING
effective_timestamp: true
- id: order_purchase_ts
name: order_purchase_ts
definition: When order was placed
type: START_TIMESTAMP
- id: order_value
name: order_value
definition: Total monetary value of the order
description: Amount and currency bundled as an atomic concept
effective_timestamp: true
group:
- id: order_value_amount
name: order_value_amount
definition: Monetary amount
type: NUMBER
- id: order_value_currency
name: order_value_currency
definition: Currency code (ISO 4217, e.g. USD, EUR)
type: UNIT
relationships:
- name: IS_PLACED_BY
definition: Order is placed by a customer
source_entity_id: ORDER
target_entity_id: CUSTOMER
Validate the file:
daana-cli check model
The model declares two entities (CUSTOMER and ORDER) and the IS_PLACED_BY relationship between them. Most attributes carry effective_timestamp: true so Daana tracks their change history independently. Chapter 2 walks through that and the other field types in detail. One type is worth flagging now because it explains an asymmetry between the two mappings you are about to write:
type: START_TIMESTAMPmarksorder_purchase_tsas the event timestamp that places the order on the timeline. Daana uses it as the entity's effective timestamp for the order, so the order mapping does not need to setentity_effective_timestamp_expressionfor this attribute. See START_TIMESTAMP.
Generate Mapping Skeletons
Bootstrap one mapping file per entity:
daana-cli generate mapping
Output:
Generating mapping templates...
Model: model.yaml
Output: mappings/
Generated 2 mapping template(s)
Generated CUSTOMER: customer-mapping.yaml
Generated ORDER: order-mapping.yaml
This writes mappings/customer-mapping.yaml and mappings/order-mapping.yaml. Each skeleton lists every entity attribute once with a placeholder transformation_expression, sets primary_keys to the natural-key attribute id (customer_id, order_id), defaults ingestion_strategy to FULL_LOG, and pre-fills connection, table, and (where applicable) entity_effective_timestamp_expression with placeholder values you replace. The order skeleton also lists each model relationship once with the correct id and a placeholder target_transformation_expression. None of the placeholders are validated against the source schema; the next two sections replace them with the real expressions.
Each skeleton starts with a single source table block. CUSTOMER is sourced from three systems in this tutorial, so the next section adds two more table blocks under tables:; ORDER is sourced from a single table and stays one block.
Complete the Customer Mapping
Open mappings/customer-mapping.yaml and replace its contents with the following:
entity_id: CUSTOMER
mapping_groups:
- name: default_mapping_group
tables:
# Source 1: CRM data, core customer records with full status change history
- connection: dev
table: stage.customers
primary_keys:
- customer_id
ingestion_strategy: FULL_LOG
entity_effective_timestamp_expression: updated_at
attributes:
- id: customer_id
transformation_expression: customer_id
- id: customer_name
transformation_expression: CONCAT(first_name, ' ', last_name)
- id: customer_status
transformation_expression: account_status
# Source 2: Address snapshots, merged automatically with Source 1
- connection: dev
table: stage.customer_addresses
primary_keys:
- customer_id
ingestion_strategy: FULL_LOG
entity_effective_timestamp_expression: snapshot_date
attributes:
- id: customer_city
transformation_expression: city
- id: customer_state
transformation_expression: state
# Source 3: Loyalty tier, merged automatically with Sources 1 and 2
- connection: dev
table: stage.loyalty_memberships
primary_keys:
- customer_id
ingestion_strategy: FULL
entity_effective_timestamp_expression: join_date
attributes:
- id: customer_loyalty_tier
transformation_expression: tier_level
The mapping assembles CUSTOMER from three source tables in a single mapping group: the CRM change log (stage.customers, FULL_LOG), the daily address snapshots (stage.customer_addresses, FULL_LOG), and the current-state loyalty roster (stage.loyalty_memberships, FULL). Each table block specifies the connection, the source table, the primary key column, the ingestion strategy, the column that supplies the effective timestamp, and the transformation expression for each attribute. Merging across tables is generated by Daana when the entity is materialized.
connection: dev resolves against the dev profile in connections.yaml, which --example quickstart set to point at the local PostgreSQL container. See Connection Profiles for the full reference.
The mapping_groups list has a single entry in this chapter; Chapter 2 covers when an entity needs more than one.
Complete the Order Mapping
Open mappings/order-mapping.yaml and replace its contents with the following:
entity_id: ORDER
mapping_groups:
- name: default_mapping_group
tables:
- connection: dev
table: stage.sales_orders
primary_keys:
- order_id
ingestion_strategy: TRANSACTIONAL
batch_expression: ingested_at
entity_effective_timestamp_expression: order_date
attributes:
- id: order_id
transformation_expression: order_id
- id: order_status
transformation_expression: order_status
- id: order_purchase_ts
transformation_expression: order_date
- id: order_value_amount # child attribute of the order_value group
transformation_expression: total_amount
- id: order_value_currency # child attribute of the order_value group
transformation_expression: currency
relationships:
- id: ORDER_IS_PLACED_BY_CUSTOMER
source_table: stage.sales_orders
target_transformation_expression: customer_id
The order is sourced from a single table (stage.sales_orders, TRANSACTIONAL). The two child attributes of the order_value group are mapped individually (order_value_amount and order_value_currency), and the relationships block resolves IS_PLACED_BY by pointing at the customer_id column on the source table. daana-cli generate mapping pre-filled the relationship id. Chapter 2 walks through the naming rule.
Validate
daana-cli check workflow
A successful check confirms the model and both mappings parse, the entity and attribute references resolve, and the mappings cover every attribute declared on the model. deploy and execute invoke check internally, so this step is optional, but running it once after authoring catches typos before any database changes happen.
The check also surfaces three warnings about the dev connection profile: the password and user fields are hardcoded rather than read from environment variables, and sslmode is disable. These are intentional for the local sandbox so the tutorial runs without extra setup. Chapter 6 covers the production hardening (secrets, TLS, separate profiles).
Deploy and Execute
You already ran daana-cli install during the tutorial setup. Install is environment-level and runs once per database. From here, deploy the workflow and execute it:
daana-cli deploy
daana-cli execute
Explore What Daana Built
For each entity in your model, Daana generates three types of views in the daana_dw schema:
| View | Contents |
|---|---|
view_<entity> | Current state, one row per entity |
view_<entity>_hist | Full change history, one row per tracked version |
view_<entity>_with_rel | Current state with all related entities joined |
Daana also generates per-relationship junction views (such as view_order_is_placed_by_customer) and internal v_<entity>_desc views (here v_customer_desc and v_order_desc) over the raw EAV-style storage (Entity-Attribute-Value: each attribute lives in its own row keyed by entity id and attribute name, instead of as a wide table column); Chapter 2 lists the full set and explains the underlying tables.
These views form a complete, historized business layer. They can be queried directly or used as building blocks for a traditional dimensional mart (dimensions and fact tables). Chapter 7 walks through that process. Future versions of Daana will generate dimensional marts automatically.
customer_key is each customer's surrogate key (always typed character varying); the ::integer cast in ORDER BY clauses sorts numerically. Chapter 2 explains how Daana derives it.
Open a psql shell against the local container so you can run the upcoming SQL queries:
docker exec -it daana-customerdb psql -U dev -d customerdb
Query 1: Three sources, one unified view
All three source systems are merged into a single row per customer:
SELECT
customer_key,
customer_name,
customer_status,
customer_city,
customer_state,
customer_loyalty_tier
FROM daana_dw.view_customer
ORDER BY customer_key::integer;
Result:
customer_key | customer_name | customer_status | customer_city | customer_state | customer_loyalty_tier
--------------+---------------------+-----------------+---------------+----------------+-----------------------
1 | John Smith | Inactive | New York | NY | Gold
2 | Emily Johnson | Active | Philadelphia | PA | Silver
3 | Michael Brown | Suspended | Brooklyn | NY | Bronze
4 | Sarah Davis | Active | Cambridge | MA | Gold
5 | David Wilson | Active | Queens | NY | Silver
6 | Jennifer Martinez | Active | New York | NY | Platinum
7 | Robert Anderson | Active | San Francisco | CA |
8 | Lisa Taylor | Active | San Francisco | CA | Bronze
9 | James Thomas | Active | Los Angeles | CA | Silver
10 | Maria Garcia | Active | Boston | MA | Bronze
(10 rows)
One row per customer, assembled from three independent source tables. Robert Anderson (customer 7) has no loyalty membership record; his customer_loyalty_tier is empty. Daana handles missing data across sources without additional configuration.
The 12 rows in stage.customers reduce to 10 here because customers 1 and 3 each carry an additional status-change row (Active -> Inactive, Active -> Suspended); FULL_LOG ingestion preserves those as historical versions rather than duplicate customers, and the current-state view shows only the latest version per entity.
Query 2: The full change history for one customer
Emily (customer 2) has dozens of rows in stage.customer_addresses and two rows in stage.customers. Daana deduplicates the snapshots, detects only actual changes, and stores one row per distinct version:
SELECT
customer_name,
customer_city,
customer_state,
customer_status,
customer_loyalty_tier,
eff_tmstp
FROM daana_dw.view_customer_hist
WHERE customer_key = '2'
ORDER BY eff_tmstp;
Result:
customer_name | customer_city | customer_state | customer_status | customer_loyalty_tier | eff_tmstp
---------------+---------------+----------------+-----------------+-----------------------+---------------------
| | | | Silver | 2023-02-14 00:00:00
Emily Johnson | | | Active | Silver | 2024-11-04 18:24:50
Emily Johnson | Boston | MA | Active | Silver | 2024-11-05 00:00:00
Emily Johnson | Philadelphia | PA | Active | Silver | 2024-11-29 00:00:00
(4 rows)
The first row (2023-02-14) originates from stage.loyalty_memberships. On that date Emily's loyalty tier (Silver) is the only thing the business knows about her; the CRM has not yet created a record and no address snapshot exists, so name, city, state, and status remain empty. Each effective_timestamp: true attribute is historized independently, so a single timestamp can carry a partial picture without invalidating the others. Subsequent rows fill in CRM and address attributes as they arrive. Emily's 25 daily address snapshots reduce to 2 actual change events (Boston, then Philadelphia).
Query 3: Point-in-time query
Query Emily's address on 27 November 2024, prior to her relocation:
SELECT
customer_name,
customer_city,
customer_state,
eff_tmstp
FROM daana_dw.view_customer_hist
WHERE customer_key = '2'
AND eff_tmstp <= '2024-11-27'
ORDER BY eff_tmstp DESC
LIMIT 1;
Result:
customer_name | customer_city | customer_state | eff_tmstp
---------------+---------------+----------------+------------------------
Emily Johnson | Boston | MA | 2024-11-05 00:00:00
No additional query structure is required. The history is stored and timestamped by Daana automatically.
Daana also exposes a parameterized helper, daana_dw.f_customer(p_eff_tmstp), that returns the same point-in-time snapshot in one call. The function applies to every customer at once, where the view_customer_hist query above returned only one row:
SELECT
customer_key,
customer_name,
customer_city,
customer_state,
customer_status
FROM daana_dw.f_customer('2024-11-27')
WHERE customer_key = '2';
Result:
customer_key | customer_name | customer_city | customer_state | customer_status
--------------+---------------+---------------+----------------+-----------------
2 | Emily Johnson | Boston | MA | Active
Drop the WHERE clause to get the full per-customer snapshot. See Chapter 2: Time-travel queries for the full reference.
Query 4: Relationship view
The _with_rel view resolves the IS_PLACED_BY relationship by attaching each customer's order columns alongside the customer's own attributes, so a single query returns customer state and the order they placed:
SELECT
order_purchase_ts,
customer_key,
customer_name,
customer_loyalty_tier,
order_id,
order_status,
order_value_amount,
order_value_currency
FROM daana_dw.view_customer_with_rel
ORDER BY order_purchase_ts DESC;
Result:
order_purchase_ts | customer_key | customer_name | customer_loyalty_tier | order_id | order_status | order_value_amount | order_value_currency
---------------------+--------------+-------------------+-----------------------+----------+--------------+--------------------+----------------------
2024-11-25 16:45:00 | 6 | Jennifer Martinez | Platinum | 13 | Delivered | 56.53000000 | USD
2024-11-20 11:00:00 | 2 | Emily Johnson | Silver | 12 | Delivered | 35.88000000 | USD
2024-11-15 13:30:00 | 1 | John Smith | Gold | 11 | Delivered | 48.38000000 | USD
2024-11-10 14:20:00 | 10 | Maria Garcia | Bronze | 10 | Processing | 65.77000000 | USD
2024-11-09 09:45:00 | 9 | James Thomas | Silver | 9 | Delivered | 38.60000000 | USD
2024-11-08 15:30:00 | 8 | Lisa Taylor | Bronze | 8 | Delivered | 72.50000000 | USD
2024-11-07 10:00:00 | 7 | Robert Anderson | | 7 | Shipped | 57.07000000 | USD
2024-11-06 13:20:00 | 6 | Jennifer Martinez | Platinum | 6 | Delivered | 92.38000000 | USD
2024-11-05 11:30:00 | 5 | David Wilson | Silver | 5 | Delivered | 19.52000000 | USD
2024-11-04 16:45:00 | 4 | Sarah Davis | Gold | 4 | Processing | 44.03000000 | USD
2024-11-03 09:20:00 | 3 | Michael Brown | Bronze | 3 | Shipped | 71.39000000 | USD
2024-11-02 14:15:00 | 2 | Emily Johnson | Silver | 2 | Delivered | 24.49000000 | USD
2024-11-01 10:30:00 | 1 | John Smith | Gold | 1 | Delivered | 60.33000000 | USD
(13 rows)
The IS_PLACED_BY relationship was declared as ORDER -> CUSTOMER and Daana materializes it on both sides: view_order_with_rel returns each order with the customer's columns attached, and view_customer_with_rel (queried above) returns each customer with the order columns attached. The relationship is many-orders-per-customer: customers 1, 2, and 6 each appear twice in the result above (a recent November order at the top and an earlier order further down). An aggregate query (e.g. COUNT(order_id) GROUP BY customer_key) rolls these back up to one row per customer. No manual JOIN logic is required at query time.
What Happened Automatically
Four capabilities were applied without any custom code:
Multi-source merge. Three independent tables (stage.customers, stage.customer_addresses, stage.loyalty_memberships) were consolidated into a single CUSTOMER entity. Daana matched rows by customer_id across all three sources, assembled the attributes, and handled customers with no loyalty membership (such as Robert Anderson) without special-case logic.
Snapshot deduplication and attribute-level history. Emily's 25 address snapshot rows (out of 275 in the dataset) contained mostly identical records; most rows reflected the same address from day to day. Daana identified only the genuine changes and stored four historical versions: loyalty registration, first customer record, first address, and the move to Philadelphia. Every attribute with effective_timestamp: true receives this treatment automatically.
Grouped monetary attribute. order_value_amount and order_value_currency were declared as a single concept (order_value). They are stored together and will always change together in the historical record; an amount value cannot exist without its corresponding currency. Chapter 3 shows where the grouping lives in storage and why it matters.
Relationship views. The IS_PLACED_BY relationship between ORDER and CUSTOMER was resolved into pre-joined views on both sides: view_order_with_rel (each order with customer columns) and view_customer_with_rel (each customer with order columns). The customer-side view returns one row per order, as Query 4 showed for John, Emily, and Jennifer; aggregate (COUNT, SUM, etc.) when you need one row per customer. No manual JOIN logic is required at query time.
What You Avoided
To implement this manually in SQL (dbt, Dataform, or stored procedures), the following work would be required:
| Requirement | Manual implementation |
|---|---|
| Three-source merge | FULL OUTER JOIN across three tables with COALESCE conflict resolution |
| Address snapshot deduplication | Window function change detection using LAG() across 275 rows |
| SCD2 history table | valid_from/valid_to columns, gap detection, row hashing |
| Point-in-time queries | WHERE valid_from <= :date AND valid_to > :date on every query |
| Grouped monetary attribute | Convention enforcement in application code or separate constraint tables |
| Order-customer join view | Manual CREATE VIEW with JOIN on surrogate keys |
Manual effort: Multiple SQL files, significant development time, and ongoing maintenance whenever a source schema changes.
In Daana: Six attributes in a model file and three source table entries in a mapping.
Next Steps
This chapter demonstrated the end result. The following chapters unpack how it was built:
- Chapter 2: The Building Blocks: Open
model.yaml,customer-mapping.yaml, andworkflow.yaml. Understand what each line means and how the three-command lifecycle works. - Chapter 7: Building Your Analytics Layer: Build traditional dimensions and fact tables from Daana's output views, including point-in-time joins for historical accuracy.