Tutorial
Chapter 2 - The Building Blocks
Goal: Understand the core components of a Daana project and how they work together.
Prerequisites: You should have completed Chapter 1: Jump Start.
Understanding How It Works
In Chapter 1, you ran three commands and produced a fully merged, history-tracking data warehouse. This chapter opens the configuration files and explains how each piece contributes to that result.
Every Daana project is built from three components:
- The Model (What)
- The Mapping (Where)
- The Workflow (How)
1. The Model (What)
The Model is your business representation. It describes the entities and attributes of your business domain, independent of any source system.
In Chapter 1, the CUSTOMER entity was defined with six attributes sourced from three independent tables:
entities:
- id: CUSTOMER
attributes:
- id: customer_name
type: STRING
effective_timestamp: true
- id: customer_loyalty_tier
type: STRING
effective_timestamp: true
# ... customer_status, customer_city, customer_state, customer_id
The ORDER entity included a grouped attribute that bundles amount and currency as one atomic concept:
- id: ORDER
attributes:
- id: order_value
effective_timestamp: true
group:
- id: order_value_amount
type: NUMBER
- id: order_value_currency
type: UNIT
Key observations:
- Entities are the nouns of your business (Customer, Order, Product).
- Attributes are the descriptive properties (Name, City, Status).
- Relationships are the verbs (Customer places Order).
- Grouped attributes bundle properties that are semantically inseparable (amount + currency).
Models are declarative. You don't specify CREATE TABLE with VARCHAR(255). You declare "I have a Customer Name", and Daana determines the optimal physical storage for your database platform.
Field-level conventions
A few patterns repeat across every entity and attribute in the model file you authored in Chapter 1:
idandnameare repeated on every entity and attribute.idis the internal identifier used for cross-references (in mappings and relationships);nameis the display name. They are usually identical. See Naming Convention.definitionis the concise, single-line statement used in metadata.descriptionis the optional longer-form business context. See Definition vs Description.effective_timestamp: trueopts an attribute into independent attribute-level history tracking when Daana materializes the entity. It is omitted on the natural-key attributes (customer_id,order_id) because their values do not change, and onorder_purchase_tsbecause aSTART_TIMESTAMPattribute is itself the anchor in time.type: UNITmarksorder_value_currencyas the qualifier of a measurement (here, the currency of an amount). The groupedorder_valuebundles amount and currency as an atomic concept that always change together. Chapter 3 covers grouped attributes in depth; see also UNIT and Group Attributes.- Relationship
idfollows a fixed pattern,<SOURCE_ENTITY>_<RELATIONSHIP_NAME>_<TARGET_ENTITY>. TheIS_PLACED_BYrelationship between sourceORDERand targetCUSTOMERbecomesORDER_IS_PLACED_BY_CUSTOMERin the order mapping. Daana writes this id into the relationship table for every materialized row, so each row can be traced back to its model definition. When a model declares multiple relationships of the same shape between the same entities (e.g.ORDER_PLACED_BY_CUSTOMERandORDER_RETURNED_BY_CUSTOMER), distinctidvalues keep them apart.daana-cli generate mappingpre-fills the relationshipid. Chapter 5 covers richer relationship topologies; see Relationship Mappings for the full field reference. <entity>_keyis the warehouse surrogate key Daana exposes on every view, always typedcharacter varying. It is built by casting the source primary key (the attribute markedprimary_keysin the mapping) to text, so sourcecustomer_id = 2becomes warehousecustomer_key = '2'. Numeric ordering therefore needs an explicit::integercast inORDER BY, as you saw in chapter 1's queries.
2. The Mapping (Where)
The Mapping connects your clean Model to your source data.
In Chapter 1, the CUSTOMER entity was mapped from three tables within a single mapping group:
entity_id: CUSTOMER
mapping_groups:
- name: default_mapping_group
tables:
- connection: dev
table: stage.customers
primary_keys:
- customer_id
ingestion_strategy: FULL_LOG
entity_effective_timestamp_expression: updated_at
attributes:
- id: customer_name
transformation_expression: CONCAT(first_name, ' ', last_name)
- table: stage.customer_addresses
# connection, primary_keys, ingestion_strategy, entity_effective_timestamp_expression
# repeat per table; full file is in chapter 1.
attributes:
- id: customer_city
transformation_expression: city
- table: stage.loyalty_memberships
attributes:
- id: customer_loyalty_tier
transformation_expression: tier_level
Key observations:
connectionnames the profile inconnections.yamlused to read the source table. The same mapping file can reach into different databases by listing tables under different connections.primary_keysidentifies the column (or columns) that uniquely keys each source row. Daana uses it to derive the surrogate<entity>_keyand to deduplicate snapshots across runs.entity_effective_timestamp_expressionis the column or SQL expression that places each source row on the timeline. Attribute-level history versioning uses this value as the row'seff_tmstpunless an attribute carries its owneffective_timestamp_expressionoverride.- Transformation expressions on each attribute contain SQL logic that cleans or combines source columns before data enters the warehouse.
- Mapping groups tell Daana that one or more tables describe the same entity within a single coherent rule set. All three tables above contribute attributes to CUSTOMER, matched by
customer_id, so they live in one group calleddefault_mapping_group. An entity uses more than one group when different sources need distinct settings, such as separate primary key strategies or independent relationship resolution; Chapter 5 covers the multi-group case. - Ingestion strategies (FULL_LOG, FULL, TRANSACTIONAL, INCREMENTAL) tell Daana how each source table delivers its data. Chapter 4 covers these in detail.
3. The Workflow (How)
The Workflow orchestrates the execution:
workflow:
id: BOOK_RETAILER_WORKFLOW
name: BookRetailerWorkflow
definition: Merge customer and address data with orders
description: Demonstrates multi-source merging and time-travel
model: model.yaml
mappings:
- mappings/customer-mapping.yaml
- mappings/order-mapping.yaml
connection: dev
The workflow header carries the same id, name, definition, and description fields the model uses (see Field-level conventions above). The body ties the abstract Model and Mappings to a concrete environment via the connection profile, so the same logic can be deployed to Dev, Staging, and Production by changing only the profile.
The workflow's connection is the target warehouse profile (where Daana writes); each tables block in the mapping declares its own source connection (where Daana reads from). They happen to point at the same dev profile in this tutorial, but a real project often pulls source data from one or more upstream databases into a separate warehouse.
The Lifecycle: Init, Install, Check, Deploy, Execute
A Daana project moves through five command-level stages. You exercised most of them in Chapter 1. Here is what each one does and when it runs:
1. Init (daana-cli init)
Input: Project name and (optional) example template. Action: Scaffolds a base project. Without --example, it creates a placeholder model.yaml with example entities, a workflow.yaml scaffold, connections.yaml, config.yaml, and an empty mappings/ directory. Run once when you start a project. The --example quickstart flag overlays the tutorial dataset (data/*.sql, docker-compose.yml, tutorial_setup.sh) and a workflow.yaml referencing the two mappings; model.yaml ships as a placeholder that Chapter 1 walks through replacing, and the mappings/ directory ships empty so the reader runs through the same authoring loop used on real projects.
2. Install (daana-cli install)
Input: Connection profile. Action: Creates the internal control schemas (daana_metadata, daana_stage) and registers the framework in the target database. Run once per environment, not on every deploy.
3. Check (daana-cli check)
Input: Your YAML files. Action: Validates syntax, resolves entity and attribute references, and verifies that the model and mappings are consistent. This is the equivalent of a compiler catching errors before runtime. deploy and execute invoke check internally, so you only need to call it explicitly when you want to validate without doing any work in the warehouse (for example, in CI).
4. Deploy (daana-cli deploy)
Input: Your workflow file. Output: Physical tables and views in your data warehouse. Action: Generates and executes CREATE TABLE and CREATE VIEW statements. Prepares the infrastructure for data loading.
5. Execute (daana-cli execute)
Action: Loads and transforms the data. Runs the INSERT, MERGE, and SELECT statements generated during deployment.
In Chapter 1 the explicit sequence was init -> install -> deploy -> execute; the check step ran implicitly inside deploy and execute.
Try It Yourself
Exercise: Break the Model-Mapping Contract
The Model and Mapping must stay in sync. To see this in action:
In
model.yaml, comment out thecustomer_statusattribute block:# - id: customer_status # name: customer_status # definition: Account status (Active, Inactive, Suspended) # type: STRING # effective_timestamp: trueRun validation:
daana-cli check workflowObserve the output. Two findings appear:
- An error:
Attribute 'customer_status' not found in entity 'CUSTOMER' (Did you mean: customer_state, customer_city, customer_name?). The mapping references an attribute that no longer exists in the model. The "Did you mean" suggestion is a fuzzy-match Daana surfaces whenever an id is close to a known one, useful for catching typos on real projects. - A follow-up warning that
Entity 'CUSTOMER' defined in model but has no mapping. The mapping file is still on disk; this warning fires because the failed mapping was dropped from coverage tracking. Restoring the attribute clears both findings.
- An error:
This demonstrates the tight coupling between Model and Mapping. Changes to one require corresponding updates to the other.
Restore the attribute by uncommenting the block before continuing.
Explore What Daana Created
Open a psql shell against the local container so you can run the upcoming SQL queries:
docker exec -it daana-customerdb psql -U dev -d customerdb
List the views and tables Daana generated in the daana_dw schema:
\dt daana_dw.*
\dv daana_dw.*
Tables (internal storage):
Schema | Name | Type | Owner
----------+------------------+-------+-------
daana_dw | customer_desc | table | dev
daana_dw | customer_focal | table | dev
daana_dw | customer_idfr | table | dev
daana_dw | order_customer_x | table | dev
daana_dw | order_desc | table | dev
daana_dw | order_focal | table | dev
daana_dw | order_idfr | table | dev
(7 rows)
Views (your query interface):
Schema | Name | Type | Owner
----------+----------------------------------+------+-------
daana_dw | v_customer_desc | view | dev
daana_dw | v_order_desc | view | dev
daana_dw | v_order_is_placed_by_customer | view | dev
daana_dw | view_customer | view | dev
daana_dw | view_customer_hist | view | dev
daana_dw | view_customer_with_rel | view | dev
daana_dw | view_order | view | dev
daana_dw | view_order_hist | view | dev
daana_dw | view_order_is_placed_by_customer | view | dev
daana_dw | view_order_with_rel | view | dev
(10 rows)
The 10 views fall into three families: the user-facing view_<entity>[_hist|_with_rel] views you saw in Chapter 1, per-relationship junction views (view_order_is_placed_by_customer) that surface the relationship table, and internal v_* views over Daana's raw EAV-style storage that the user-facing views are built on.
Physical layer
Daana stores entity data in an EAV-style layout (Entity-Attribute-Value: each attribute lives in its own row keyed by the entity's surrogate key and an internal attribute identifier, instead of as a wide table column). The user-facing views pivot that layout back into the columnar shape you query.
If you SELECT * FROM daana_dw.customer_desc, you'll see additional bigint columns (type_key, data_key, inst_key, inst_row_key) and a popln_tmstp column. These are internal bookkeeping the views use to assemble the user-facing shape; treat them as opaque and stick to the view_* family for queries.
Each entity produces a small set of physical tables, and each relationship produces an _x table:
<entity>_focal: empty in this tutorial. With the strategies the chapter uses (FULL,FULL_LOG,TRANSACTIONAL), surrogate keys are derived directly from_descrows viaf_<entity>(), so_focalis not materialized. Daana populates it only when a runtime configuration emits explicit entity-lifecycle events; see Understanding Daana Output for the architectural picture.<entity>_desc: the descriptor table holding attribute values plus attribute-level history columns (eff_tmstp,end_tmstp). One row per(entity, attribute, version). The matchingv_<entity>_descview adds anamecolumn that identifies which attribute each row belongs to; the name carries the entity prefix (for exampleORDER_order_id,ORDER_order_status). These views are intended for internal inspection of the EAV layer, not for end-user queries.<entity>_idfr: the alternate-identifier table, populated only when an entity declares additional natural keys beyond its primary one. In this tutorial the customer entity has only a single natural key (customer_id), socustomer_idfrexists but stays empty.<rel_source>_<rel_target>_x: the relationship table between two focal entities (here,order_customer_xforORDER -> CUSTOMER).
The views are what you usually query directly. As described in Chapter 1, these views can be queried directly or used as building blocks for a traditional dimensional mart (dimensions and fact tables).
Time-travel queries
Each entity also gets a parameterised f_<entity>(p_eff_tmstp) function that returns the entity's full state as of the given timestamp. Chapter 1, Query 3 called it with an explicit date. Called with no argument, it returns the latest snapshot, because p_eff_tmstp defaults to the sentinel '9999-12-31':
SELECT customer_key, customer_name, customer_city, customer_status
FROM daana_dw.f_customer()
WHERE customer_key = '2';
customer_key | customer_name | customer_city | customer_status
--------------+---------------+---------------+-----------------
2 | Emily Johnson | Philadelphia | Active
Drop the WHERE clause to get one row per customer at that point in time. This is the recommended way to do point-in-time queries: the function call is a one-liner and works across the entire entity at once.
Learn more: See Understanding Daana Output for deeper coverage of the
_focal,_desc,_idfr, and_xtables.
Summary
- Models define the business entities, attributes, and relationships.
- Mappings define how source data feeds into the model.
- Workflows define the execution context and connection profile.
These three components, combined with the check/deploy/execute lifecycle, are the foundation of every Daana project.