Tutorial

Chapter 2 - The Building Blocks

Goal: Understand the core components of a Daana project and how they work together.

Prerequisites: You should have completed Chapter 1: Jump Start.


Understanding How It Works

In Chapter 1, you ran three commands and produced a fully merged, history-tracking data warehouse. This chapter opens the configuration files and explains how each piece contributes to that result.

Every Daana project is built from three components:

  1. The Model (What)
  2. The Mapping (Where)
  3. The Workflow (How)

1. The Model (What)

The Model is your business representation. It describes the entities and attributes of your business domain, independent of any source system.

In Chapter 1, the CUSTOMER entity was defined with six attributes sourced from three independent tables:

entities:
  - id: CUSTOMER
    attributes:
      - id: customer_name
        type: STRING
        effective_timestamp: true
      - id: customer_loyalty_tier
        type: STRING
        effective_timestamp: true
      # ... customer_status, customer_city, customer_state, customer_id

The ORDER entity included a grouped attribute that bundles amount and currency as one atomic concept:

  - id: ORDER
    attributes:
      - id: order_value
        effective_timestamp: true
        group:
          - id: order_value_amount
            type: NUMBER
          - id: order_value_currency
            type: UNIT

Key observations:

  • Entities are the nouns of your business (Customer, Order, Product).
  • Attributes are the descriptive properties (Name, City, Status).
  • Relationships are the verbs (Customer places Order).
  • Grouped attributes bundle properties that are semantically inseparable (amount + currency).

Models are declarative. You don't specify CREATE TABLE with VARCHAR(255). You declare "I have a Customer Name", and Daana determines the optimal physical storage for your database platform.

Field-level conventions

A few patterns repeat across every entity and attribute in the model file you authored in Chapter 1:

  • id and name are repeated on every entity and attribute. id is the internal identifier used for cross-references (in mappings and relationships); name is the display name. They are usually identical. See Naming Convention.
  • definition is the concise, single-line statement used in metadata. description is the optional longer-form business context. See Definition vs Description.
  • effective_timestamp: true opts an attribute into independent attribute-level history tracking when Daana materializes the entity. It is omitted on the natural-key attributes (customer_id, order_id) because their values do not change, and on order_purchase_ts because a START_TIMESTAMP attribute is itself the anchor in time.
  • type: UNIT marks order_value_currency as the qualifier of a measurement (here, the currency of an amount). The grouped order_value bundles amount and currency as an atomic concept that always change together. Chapter 3 covers grouped attributes in depth; see also UNIT and Group Attributes.
  • Relationship id follows a fixed pattern, <SOURCE_ENTITY>_<RELATIONSHIP_NAME>_<TARGET_ENTITY>. The IS_PLACED_BY relationship between source ORDER and target CUSTOMER becomes ORDER_IS_PLACED_BY_CUSTOMER in the order mapping. Daana writes this id into the relationship table for every materialized row, so each row can be traced back to its model definition. When a model declares multiple relationships of the same shape between the same entities (e.g. ORDER_PLACED_BY_CUSTOMER and ORDER_RETURNED_BY_CUSTOMER), distinct id values keep them apart. daana-cli generate mapping pre-fills the relationship id. Chapter 5 covers richer relationship topologies; see Relationship Mappings for the full field reference.
  • <entity>_key is the warehouse surrogate key Daana exposes on every view, always typed character varying. It is built by casting the source primary key (the attribute marked primary_keys in the mapping) to text, so source customer_id = 2 becomes warehouse customer_key = '2'. Numeric ordering therefore needs an explicit ::integer cast in ORDER BY, as you saw in chapter 1's queries.

2. The Mapping (Where)

The Mapping connects your clean Model to your source data.

In Chapter 1, the CUSTOMER entity was mapped from three tables within a single mapping group:

entity_id: CUSTOMER

mapping_groups:
  - name: default_mapping_group
    tables:
      - connection: dev
        table: stage.customers
        primary_keys:
          - customer_id
        ingestion_strategy: FULL_LOG
        entity_effective_timestamp_expression: updated_at
        attributes:
          - id: customer_name
            transformation_expression: CONCAT(first_name, ' ', last_name)

      - table: stage.customer_addresses
        # connection, primary_keys, ingestion_strategy, entity_effective_timestamp_expression
        # repeat per table; full file is in chapter 1.
        attributes:
          - id: customer_city
            transformation_expression: city

      - table: stage.loyalty_memberships
        attributes:
          - id: customer_loyalty_tier
            transformation_expression: tier_level

Key observations:

  • connection names the profile in connections.yaml used to read the source table. The same mapping file can reach into different databases by listing tables under different connections.
  • primary_keys identifies the column (or columns) that uniquely keys each source row. Daana uses it to derive the surrogate <entity>_key and to deduplicate snapshots across runs.
  • entity_effective_timestamp_expression is the column or SQL expression that places each source row on the timeline. Attribute-level history versioning uses this value as the row's eff_tmstp unless an attribute carries its own effective_timestamp_expression override.
  • Transformation expressions on each attribute contain SQL logic that cleans or combines source columns before data enters the warehouse.
  • Mapping groups tell Daana that one or more tables describe the same entity within a single coherent rule set. All three tables above contribute attributes to CUSTOMER, matched by customer_id, so they live in one group called default_mapping_group. An entity uses more than one group when different sources need distinct settings, such as separate primary key strategies or independent relationship resolution; Chapter 5 covers the multi-group case.
  • Ingestion strategies (FULL_LOG, FULL, TRANSACTIONAL, INCREMENTAL) tell Daana how each source table delivers its data. Chapter 4 covers these in detail.

3. The Workflow (How)

The Workflow orchestrates the execution:

workflow:
  id: BOOK_RETAILER_WORKFLOW
  name: BookRetailerWorkflow
  definition: Merge customer and address data with orders
  description: Demonstrates multi-source merging and time-travel

  model: model.yaml

  mappings:
    - mappings/customer-mapping.yaml
    - mappings/order-mapping.yaml

  connection: dev

The workflow header carries the same id, name, definition, and description fields the model uses (see Field-level conventions above). The body ties the abstract Model and Mappings to a concrete environment via the connection profile, so the same logic can be deployed to Dev, Staging, and Production by changing only the profile.

The workflow's connection is the target warehouse profile (where Daana writes); each tables block in the mapping declares its own source connection (where Daana reads from). They happen to point at the same dev profile in this tutorial, but a real project often pulls source data from one or more upstream databases into a separate warehouse.


The Lifecycle: Init, Install, Check, Deploy, Execute

A Daana project moves through five command-level stages. You exercised most of them in Chapter 1. Here is what each one does and when it runs:

1. Init (daana-cli init)

Input: Project name and (optional) example template. Action: Scaffolds a base project. Without --example, it creates a placeholder model.yaml with example entities, a workflow.yaml scaffold, connections.yaml, config.yaml, and an empty mappings/ directory. Run once when you start a project. The --example quickstart flag overlays the tutorial dataset (data/*.sql, docker-compose.yml, tutorial_setup.sh) and a workflow.yaml referencing the two mappings; model.yaml ships as a placeholder that Chapter 1 walks through replacing, and the mappings/ directory ships empty so the reader runs through the same authoring loop used on real projects.

2. Install (daana-cli install)

Input: Connection profile. Action: Creates the internal control schemas (daana_metadata, daana_stage) and registers the framework in the target database. Run once per environment, not on every deploy.

3. Check (daana-cli check)

Input: Your YAML files. Action: Validates syntax, resolves entity and attribute references, and verifies that the model and mappings are consistent. This is the equivalent of a compiler catching errors before runtime. deploy and execute invoke check internally, so you only need to call it explicitly when you want to validate without doing any work in the warehouse (for example, in CI).

4. Deploy (daana-cli deploy)

Input: Your workflow file. Output: Physical tables and views in your data warehouse. Action: Generates and executes CREATE TABLE and CREATE VIEW statements. Prepares the infrastructure for data loading.

5. Execute (daana-cli execute)

Action: Loads and transforms the data. Runs the INSERT, MERGE, and SELECT statements generated during deployment.

In Chapter 1 the explicit sequence was init -> install -> deploy -> execute; the check step ran implicitly inside deploy and execute.


Try It Yourself

Exercise: Break the Model-Mapping Contract

The Model and Mapping must stay in sync. To see this in action:

  1. In model.yaml, comment out the customer_status attribute block:

    # - id: customer_status
    #   name: customer_status
    #   definition: Account status (Active, Inactive, Suspended)
    #   type: STRING
    #   effective_timestamp: true
    
  2. Run validation:

    daana-cli check workflow
    
  3. Observe the output. Two findings appear:

    • An error: Attribute 'customer_status' not found in entity 'CUSTOMER' (Did you mean: customer_state, customer_city, customer_name?). The mapping references an attribute that no longer exists in the model. The "Did you mean" suggestion is a fuzzy-match Daana surfaces whenever an id is close to a known one, useful for catching typos on real projects.
    • A follow-up warning that Entity 'CUSTOMER' defined in model but has no mapping. The mapping file is still on disk; this warning fires because the failed mapping was dropped from coverage tracking. Restoring the attribute clears both findings.

This demonstrates the tight coupling between Model and Mapping. Changes to one require corresponding updates to the other.

Restore the attribute by uncommenting the block before continuing.

Explore What Daana Created

Open a psql shell against the local container so you can run the upcoming SQL queries:

docker exec -it daana-customerdb psql -U dev -d customerdb

List the views and tables Daana generated in the daana_dw schema:

\dt daana_dw.*
\dv daana_dw.*

Tables (internal storage):

  Schema  |       Name       | Type  | Owner
----------+------------------+-------+-------
 daana_dw | customer_desc    | table | dev
 daana_dw | customer_focal   | table | dev
 daana_dw | customer_idfr    | table | dev
 daana_dw | order_customer_x | table | dev
 daana_dw | order_desc       | table | dev
 daana_dw | order_focal      | table | dev
 daana_dw | order_idfr       | table | dev
(7 rows)

Views (your query interface):

  Schema  |               Name               | Type | Owner
----------+----------------------------------+------+-------
 daana_dw | v_customer_desc                  | view | dev
 daana_dw | v_order_desc                     | view | dev
 daana_dw | v_order_is_placed_by_customer    | view | dev
 daana_dw | view_customer                    | view | dev
 daana_dw | view_customer_hist               | view | dev
 daana_dw | view_customer_with_rel           | view | dev
 daana_dw | view_order                       | view | dev
 daana_dw | view_order_hist                  | view | dev
 daana_dw | view_order_is_placed_by_customer | view | dev
 daana_dw | view_order_with_rel              | view | dev
(10 rows)

The 10 views fall into three families: the user-facing view_<entity>[_hist|_with_rel] views you saw in Chapter 1, per-relationship junction views (view_order_is_placed_by_customer) that surface the relationship table, and internal v_* views over Daana's raw EAV-style storage that the user-facing views are built on.

Physical layer

Daana stores entity data in an EAV-style layout (Entity-Attribute-Value: each attribute lives in its own row keyed by the entity's surrogate key and an internal attribute identifier, instead of as a wide table column). The user-facing views pivot that layout back into the columnar shape you query.

If you SELECT * FROM daana_dw.customer_desc, you'll see additional bigint columns (type_key, data_key, inst_key, inst_row_key) and a popln_tmstp column. These are internal bookkeeping the views use to assemble the user-facing shape; treat them as opaque and stick to the view_* family for queries.

Each entity produces a small set of physical tables, and each relationship produces an _x table:

  • <entity>_focal: empty in this tutorial. With the strategies the chapter uses (FULL, FULL_LOG, TRANSACTIONAL), surrogate keys are derived directly from _desc rows via f_<entity>(), so _focal is not materialized. Daana populates it only when a runtime configuration emits explicit entity-lifecycle events; see Understanding Daana Output for the architectural picture.
  • <entity>_desc: the descriptor table holding attribute values plus attribute-level history columns (eff_tmstp, end_tmstp). One row per (entity, attribute, version). The matching v_<entity>_desc view adds a name column that identifies which attribute each row belongs to; the name carries the entity prefix (for example ORDER_order_id, ORDER_order_status). These views are intended for internal inspection of the EAV layer, not for end-user queries.
  • <entity>_idfr: the alternate-identifier table, populated only when an entity declares additional natural keys beyond its primary one. In this tutorial the customer entity has only a single natural key (customer_id), so customer_idfr exists but stays empty.
  • <rel_source>_<rel_target>_x: the relationship table between two focal entities (here, order_customer_x for ORDER -> CUSTOMER).

The views are what you usually query directly. As described in Chapter 1, these views can be queried directly or used as building blocks for a traditional dimensional mart (dimensions and fact tables).

Time-travel queries

Each entity also gets a parameterised f_<entity>(p_eff_tmstp) function that returns the entity's full state as of the given timestamp. Chapter 1, Query 3 called it with an explicit date. Called with no argument, it returns the latest snapshot, because p_eff_tmstp defaults to the sentinel '9999-12-31':

SELECT customer_key, customer_name, customer_city, customer_status
FROM daana_dw.f_customer()
WHERE customer_key = '2';
 customer_key | customer_name | customer_city | customer_status
--------------+---------------+---------------+-----------------
 2            | Emily Johnson | Philadelphia  | Active

Drop the WHERE clause to get one row per customer at that point in time. This is the recommended way to do point-in-time queries: the function call is a one-liner and works across the entire entity at once.

Learn more: See Understanding Daana Output for deeper coverage of the _focal, _desc, _idfr, and _x tables.


Summary

  • Models define the business entities, attributes, and relationships.
  • Mappings define how source data feeds into the model.
  • Workflows define the execution context and connection profile.

These three components, combined with the check/deploy/execute lifecycle, are the foundation of every Daana project.

Next: Chapter 3 - Structuring Business Logic