Tutorial

Chapter 7 - Building Your Analytics Layer

Goal: Build dimensional views (dimensions and facts) from Daana's output views, creating a familiar analytics consumption layer.

Prerequisites: You must have completed Chapter 5: Mastering DMDL.


The Three-Layer Architecture

Daana operates within a three-layer data architecture:

Layer 1: Raw Ingestion (DAS)     Your source systems, staging tables


Layer 2: Business Entities (DAB)  Daana creates this layer


Layer 3: Analytics (DAR)          You build this from Daana's output

Layer 2 is what Daana generates: conformed, historized entities with views like view_customer, view_customer_hist, and view_order_with_rel. These views contain the full truth, but their structure reflects the underlying ensemble model rather than a traditional star schema.

Layer 3 is the analytics consumption layer: dimensions and facts that BI tools, dashboards, and analysts expect. You build Layer 3 from Layer 2 using standard SQL.

Daana does not yet generate Layer 3 views. This chapter shows how to build those views from Layer 2 by hand.


Column names in Layer 2

Daana's output views use lowercase column names for attributes (matching attribute IDs in the model). You can reference them in SQL without double quotes.


Exercise 1: Type 1 Dimension (Current State)

A Type 1 dimension shows only the current state of each entity. Previous values are overwritten when attributes change.

Open a psql shell against the local container so you can run the upcoming SQL queries:

docker exec -it daana-customerdb psql -U dev -d customerdb

First create the datamart schema that will hold all of your Layer 3 views:

CREATE SCHEMA IF NOT EXISTS datamart;

Daana's view_customer is already a Type 1 dimension. Wrapping it in a named view in the datamart schema makes it discoverable for BI tools and gives you a stable consumption layer:

CREATE OR REPLACE VIEW datamart.dim_customer_current AS
SELECT
  customer_key,
  customer_name,
  customer_status,
  customer_city,
  customer_state,
  customer_loyalty_tier
FROM daana_dw.view_customer;

Note: view_customer only carries the current snapshot, so it does not expose eff_tmstp. If you need a "last changed" timestamp on the dimension, derive it from view_customer_hist (see Exercise 2).

Query it:

SELECT * FROM datamart.dim_customer_current ORDER BY customer_key::int LIMIT 3;
 customer_key | customer_name | customer_status | customer_city | customer_state | customer_loyalty_tier
--------------+---------------+-----------------+---------------+----------------+-----------------------
 1            | John Smith    | Inactive        | New York      | NY             | Gold
 2            | Emily Johnson | Active          | Philadelphia  | PA             | Silver
 3            | Michael Brown | Suspended       | Brooklyn      | NY             | Bronze
(3 rows)

This is the simplest pattern. In many cases, using view_customer directly is sufficient. The datamart wrapper is useful when you want to rename columns, apply filters, or add calculated fields for a specific consumption use case.

The customer_registration_date attribute you added in Chapter 3 is a natural extension here. Adding it to dim_customer_current and bucketing on it (DATE_TRUNC('month', customer_registration_date)) gives you a month-of-acquisition cohort key on the dimension, ready to slice every fact table built on top.


Exercise 2: Type 2 Dimension (Full History)

A Type 2 dimension tracks the complete change history with explicit valid_from and valid_to date ranges. This enables point-in-time joins: given a timestamp, you can determine what the customer's attributes were at that moment.

Daana's view_customer_hist provides one row per change event. The LEAD() window function adds the valid_to boundary:

CREATE OR REPLACE VIEW datamart.dim_customer AS
SELECT
  customer_key,
  customer_name,
  customer_status,
  customer_city,
  customer_state,
  customer_loyalty_tier,
  eff_tmstp AS valid_from,
  COALESCE(
    LEAD(eff_tmstp) OVER (PARTITION BY customer_key ORDER BY eff_tmstp) - INTERVAL '1 millisecond',
    TIMESTAMP '9999-12-31 23:59:59'
  ) AS valid_to,
  CASE
    WHEN LEAD(eff_tmstp) OVER (PARTITION BY customer_key ORDER BY eff_tmstp) IS NULL
    THEN TRUE ELSE FALSE
  END AS is_current
FROM daana_dw.view_customer_hist;

Query Emily's dimension history:

SELECT
  customer_key,
  customer_name,
  customer_city,
  valid_from,
  valid_to,
  is_current
FROM datamart.dim_customer
WHERE customer_key = '2'
ORDER BY valid_from;
 customer_key | customer_name | customer_city | valid_from              | valid_to                    | is_current
--------------+---------------+---------------+-------------------------+-----------------------------+------------
 2            |               |               | 2023-02-14 00:00:00     | 2024-11-04 18:24:49.999     | false
 2            | Emily Johnson |               | 2024-11-04 18:24:50     | 2024-11-04 23:59:59.999     | false
 2            | Emily Johnson | Boston        | 2024-11-05 00:00:00     | 2024-11-28 23:59:59.999     | false
 2            | Emily Johnson | Philadelphia  | 2024-11-29 00:00:00     | 9999-12-31 23:59:59         | true
(4 rows)

Each row covers a non-overlapping time range. The valid_to of one row ends 1 millisecond before the valid_from of the next. The current version has valid_to = 9999-12-31 and is_current = true.

The output above assumes you haven't run Chapter 4, Exercise 2, which inserts a San Francisco snapshot for Emily on 2024-11-30. If you did run it, expect a fifth row here ending at 9999-12-31 for San Francisco, with the Philadelphia row's valid_to shortened to end one millisecond before.

Point-in-time query: Where was Emily on 27 November 2024?

SELECT customer_name, customer_city
FROM datamart.dim_customer
WHERE customer_key = '2'
  AND TIMESTAMP '2024-11-27' BETWEEN valid_from AND valid_to;
 customer_name | customer_city
---------------+---------------
 Emily Johnson | Boston

Exercise 3: Transaction Fact Table

A transaction fact table records one row per business event with dimension keys and numeric measures. Point-in-time joins to the Type 2 dimension capture the state of each dimension at the time of the event.

Build fact_orders from view_order plus the relationship-aware view_customer_with_rel (which exposes the order_key -> customer_key link), joining to dim_customer to capture the customer's name and loyalty tier at order time:

CREATE OR REPLACE VIEW datamart.fact_orders AS
SELECT
  o.order_key,
  o.order_id,
  o.order_status,
  o.order_purchase_ts                   AS order_date,

  o.order_value_amount::numeric(12,2)   AS order_amount,
  o.order_value_currency                AS order_currency,

  rel.customer_key,
  c.customer_name                       AS customer_name_at_order,
  c.customer_loyalty_tier               AS customer_tier_at_order

FROM daana_dw.view_order o
LEFT JOIN daana_dw.view_customer_with_rel rel
  ON o.order_key = rel.order_key
LEFT JOIN datamart.dim_customer c
  ON rel.customer_key = c.customer_key
  AND o.order_purchase_ts BETWEEN c.valid_from AND c.valid_to;

Why pull customer_key from view_customer_with_rel? Daana materializes related-entity columns on the target side of each relationship. IS_PLACED_BY runs from ORDER (source) to CUSTOMER (target), so:

                    IS_PLACED_BY
ORDER (source)  ─────────────────►  CUSTOMER (target)
     │                                     │
     │                                     │
     ▼                                     ▼
view_order_with_rel               view_customer_with_rel
    = ORDER attrs only            = CUSTOMER attrs + order_key
      (source side, no            + the order's own columns
       joined columns)              (target side carries the join)

A first instinct is to look for customer_key on view_order_with_rel. It is not there. The join key lives on the target side, which is view_customer_with_rel. This rule applies to every relationship in your model: if you can't find a foreign key on the side you started from, it is on the other side. Run \d daana_dw.view_<entity>_with_rel to see what each side actually carries.

Query the fact table:

SELECT
  order_id,
  order_date,
  order_amount,
  order_currency,
  customer_name_at_order,
  customer_tier_at_order
FROM datamart.fact_orders
ORDER BY order_id::integer;
 order_id |     order_date      | order_amount | order_currency | customer_name_at_order | customer_tier_at_order
----------+---------------------+--------------+----------------+------------------------+------------------------
 1        | 2024-11-01 10:30:00 |        60.33 | USD            |                        | Gold
 2        | 2024-11-02 14:15:00 |        24.49 | USD            |                        | Silver
 3        | 2024-11-03 09:20:00 |        71.39 | USD            | Michael Brown          | Bronze
 4        | 2024-11-04 16:45:00 |        44.03 | USD            | Sarah Davis            | Gold
 5        | 2024-11-05 11:30:00 |        19.52 | USD            |                        | Silver
 6        | 2024-11-06 13:20:00 |        92.38 | USD            |                        | Platinum
 7        | 2024-11-07 10:00:00 |        57.07 | USD            | Robert Anderson        |
 8        | 2024-11-08 15:30:00 |        72.50 | USD            |                        | Bronze
 9        | 2024-11-09 09:45:00 |        38.60 | USD            | James Thomas           | Silver
 10       | 2024-11-10 14:20:00 |        65.77 | USD            | Maria Garcia           | Bronze
 11       | 2024-11-15 13:30:00 |        48.38 | USD            | John Smith             | Gold
 12       | 2024-11-20 11:00:00 |        35.88 | USD            | Emily Johnson          | Silver
 13       | 2024-11-25 16:45:00 |        56.53 | USD            | Jennifer Martinez      | Platinum
(13 rows)

This is the point-in-time join doing its job. Two patterns to notice:

  • Tier was always known. Loyalty memberships were ingested with their join_date as the effective timestamp (most customers joined in early 2023), so every order on the November dates above falls inside a tier-bearing version of the customer history. Robert Anderson (order 7) is the exception: he never enrolled in the loyalty program, so his customer_tier_at_order is empty.
  • Name was sometimes not yet known. customer_name arrives from stage.customers, whose updated_at for several customers is later than their first order in November. For orders 1, 2, 5, 6, and 8 the point-in-time lookup correctly returns NULL. The same customers' second orders (11 for John, 12 for Emily, 13 for Jennifer) fall after their CRM updated_at and the name is populated. The fact row still records each order accurately; the blanks are not bugs in the join, they are the true historical state.

If you want to fall back to the customer's earliest known name when none was effective at order time, you can COALESCE against dim_customer_current.customer_name. That choice is a business decision, not a technical one.


Business Query: Revenue by Loyalty Tier at Order Time

With the fact table in place, standard analytical queries work as expected:

SELECT
  COALESCE(customer_tier_at_order, 'No Membership') AS tier,
  COUNT(*) AS order_count,
  SUM(order_amount) AS total_revenue,
  ROUND(AVG(order_amount), 2) AS avg_order_value
FROM datamart.fact_orders
GROUP BY customer_tier_at_order
ORDER BY total_revenue DESC;
     tier      | order_count | total_revenue | avg_order_value
---------------+-------------+---------------+-----------------
 Bronze        |           3 |        209.66 |           69.89
 Gold          |           3 |        152.74 |           50.91
 Platinum      |           2 |        148.91 |           74.46
 Silver        |           4 |        118.49 |           29.62
 No Membership |           1 |         57.07 |           57.07
(5 rows)

This query answers a question that requires both the order data and the customer's state at the time of the order. Without the Type 2 dimension and point-in-time join, the answer would reflect current tier assignments rather than the tier that was in effect when each order was placed.


Why Point-in-Time Joins Matter

Consider a customer who was Bronze when they placed an order in November, but has since been upgraded to Gold. A join to a Type 1 dimension (current state) would incorrectly attribute the order to the Gold tier. The Type 2 dimension with BETWEEN valid_from AND valid_to preserves the correct historical context.

This is the canonical pattern for all temporal joins against Daana's output:

-- Correct: captures state at event time
JOIN datamart.dim_customer c
  ON o.customer_key = c.customer_key
  AND o.order_purchase_ts BETWEEN c.valid_from AND c.valid_to

-- Incorrect: captures current state regardless of when the event occurred
JOIN daana_dw.view_customer c
  ON o.customer_key = c.customer_key

Summary

LayerSourcePattern
Type 1 Dimensionview_<entity>Wrap or select for BI consumption
Type 2 Dimensionview_<entity>_histAdd valid_from/valid_to with LEAD()
Transaction Factview_<event> + relationship view + Type 2 dimsPoint-in-time join with BETWEEN

The relationship view is whichever side exposes the foreign key. For an ORDER -> CUSTOMER relationship, the join key (customer_key) lives on view_customer_with_rel, not view_order_with_rel. Always check \d on a view before assuming it carries a related key.

Daana generates and maintains Layer 2 (business entities with full history). Layer 3 (dimensions and facts) is standard SQL built from Daana's views. The datamart views provide a clean interface for BI tools and analysts.

The datamart.* objects in this chapter are plain SQL views, so each query against them re-evaluates the underlying joins and window functions in Layer 2. That is fine for the tutorial dataset and for low-volume reporting, but at scale you'll want to materialize Layer 3 (PostgreSQL MATERIALIZED VIEW with a refresh schedule, or a downstream tool such as dbt that compiles these definitions into managed tables). Daana's output views remain the source of truth either way.

See Understanding Daana Output for additional consumption patterns including factless facts and periodic snapshot facts.

Congratulations! You've completed the Daana CLI Tutorial Series.

Ready for more? Check out the Command Reference or explore the DMDL documentation.