Understanding Daana's Output

Daana CLI Output Artifacts and Usage

Overview

When you run daana-cli execute, Daana generates a complete ensemble model implementation in your data warehouse. This documentation explains:

  1. What you'll query - The views Daana creates
  2. How to build your consumption layer - Patterns for dimensions and facts
  3. Physical implementation - What's under the hood (optional deep dive)

Understanding the Three-Layer Architecture

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Raw Ingestion (DAS - Data As System sees it)       │
│ Your source systems, staging tables, raw data               │
│ This layer can be thought of as a bronze or raw layer       │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Business Entities (DAB - Data As Business sees it) │ ← DAANA CREATES THIS LAYER
│ Conformed, historized entities in ensemble model pattern    │
│ This layer can be thought of as a silver, model, or         │
│ intermediate layer                                          │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Analytics Consumption (DAR - Data As Requirements) │ ← YOU BUILD THIS FROM DAANA'S OUTPUT
│ Dimensions, facts, marts tailored to analytics needs        │
│ This layer can be thought of as a gold or mart layer        │
└─────────────────────────────────────────────────────────────┘

Daana's role: Generate and maintain Layer 2, providing clean, queryable views that serve as building blocks for Layer 3.

Your role: Use Daana's views to build consumption-layer dimensions and facts for your analytics requirements.


How to Use This Guide

This documentation is designed for selective reading. Choose your path based on your immediate needs:

Getting started (15-30 minutes):

  • Read Part 1 to understand the views Daana creates
  • Skim Part 2 pattern descriptions to see what's possible
  • Start building with Pattern 1 (Type 1 Dimensions)

Building your consumption layer (progressive approach):

  • Patterns 1-2: Start here for dimensions (Type 1 and Type 2)
  • Patterns 3-4: Read when you need fact tables
  • Pattern 5: Read when you need periodic snapshots
  • Each pattern is self-contained - read only what you need right now

Reference and deep dives:

  • Part 3 (Query Guidance): Quick reference for what to query vs. avoid
  • Part 4 (Physical Implementation): Optional - for troubleshooting or curiosity about what's under the hood
  • Focal Framework: Deep dive for architects and framework evaluators

Time-based reading paths:

  • 30 minutes: Part 1 + Pattern 1 → start building Type 1 dimensions
  • 2 hours: Part 1 + all of Part 2 → comprehensive understanding of consumption patterns
  • Deep dive: Add Part 4 + Focal Framework Guide → full architectural understanding

By role:

  • Data/Analytics Engineer building the consumption layer: Parts 1-2; reference Part 3 as needed
  • Troubleshooting/debugging: Part 1, Part 3, skim Part 4 for column explanations
  • Architect evaluating Daana: All parts + Focal Framework Guide
  • BI analyst exploring data: Part 1, Part 3

Part 1: What You'll Query - The Views

For each entity defined in your model.yaml (e.g., CUSTOMER), Daana creates three views in the DAANA_DW schema. These views transpose the physical key-value structure into user-friendly table schemas.

Note on examples: The attribute names used in examples throughout this document (FIRST_NAME, EMAIL, CUSTOMER_SEGMENT, CUSTOMER_STATUS_STARTED_DATE, etc.) are illustrative. The actual attributes in your views depend on what you define in your model. The patterns and techniques remain the same regardless of your specific attribute names.


VIEW_{ENTITY} - Current State (Type 1)

Purpose: Current state of each entity. One row per entity key, showing the most recent attribute values.

When to use:

  • Dashboards showing current state
  • Operational reports
  • "What is the customer's current email?"

Structure:

VIEW_CUSTOMER
├── CUSTOMER_KEY 
├── EFF_TMSTP          -- When this version became effective (business time)
├── FIRST_NAME
├── LAST_NAME
├── EMAIL
├── CUSTOMER_SEGMENT
└── ... (all attributes from your model)

Example query:

SELECT
  c.CUSTOMER_KEY,
  c.FIRST_NAME,
  c.LAST_NAME,
  c.EMAIL,
  c.CUSTOMER_SEGMENT
FROM DAANA_DW.VIEW_CUSTOMER c
WHERE c.CUSTOMER_SEGMENT = 'premium'

VIEW_{ENTITY}_HIST - Full History (Type 2)

Purpose: Complete change history for each entity. Multiple rows per entity key, one row per change.

When to use:

  • Historical trend analysis
  • Point-in-time reporting
  • "What was the customer's email on 2024-06-15?"
  • Auditing and compliance

Structure:

VIEW_CUSTOMER_HIST
├── CUSTOMER_KEY
├── EFF_TMSTP          -- When this version became effective (business time)
├── FIRST_NAME
├── LAST_NAME
├── EMAIL
├── CUSTOMER_SEGMENT
└── ... (all attributes from your model)

Example query:

-- Find customer state at a specific point in time
SELECT
  c.CUSTOMER_KEY,
  c.FIRST_NAME,
  c.EMAIL,
  c.CUSTOMER_SEGMENT
FROM DAANA_DW.VIEW_CUSTOMER_HIST c
WHERE c.CUSTOMER_KEY = 'CUST_12345'
  AND c.EFF_TMSTP <= '2024-06-15 00:00:00'
-- Keep the latest version effective on or before the target date
QUALIFY ROW_NUMBER() OVER (PARTITION BY c.CUSTOMER_KEY ORDER BY c.EFF_TMSTP DESC) = 1

To use this history as a proper Type 2 dimension with valid_from/valid_to columns, see Pattern 2 below.


VIEW_{ENTITY}_WITH_REL - Current State with Relationships

Purpose: Current state of entity with all relationships already joined.

When to use:

  • Simplified querying when you need entity + related entities
  • Exploratory analysis across related entities

Structure:

VIEW_CUSTOMER_WITH_REL
├── CUSTOMER_KEY
├── FIRST_NAME
├── LAST_NAME
├── EMAIL
├── ... (all customer attributes)
├── ORDER_KEY          -- Related ORDER entities
├── ... (all order attributes)
├── ACCOUNT_KEY        -- Related ACCOUNT entities
├── ... (all account attributes)
└── ... (all relationship keys and attributes from your model related to customer)

Example query:

SELECT
  c.CUSTOMER_KEY,
  c.FIRST_NAME,
  c.ORDER_KEY,
  c.ACCOUNT_KEY
FROM DAANA_DW.VIEW_CUSTOMER_WITH_REL c
WHERE c.CUSTOMER_SEGMENT = 'premium'

Relationship/Bridge Tables: {ENTITY1}_{ENTITY2}_X

Purpose: Track many-to-many relationships between entities over time.

Naming convention: The _X suffix denotes a relationship/bridge table.

Structure:

ORDER_CUSTOMER_X
├── ORDER_KEY          -- Links to ORDER entity
├── CUSTOMER_KEY       -- Links to CUSTOMER entity
├── TYPE_KEY           -- Relationship type (if multiple types exist)
├── EFF_TMSTP          -- When this relationship became effective
├── VER_TMSTP          -- When this relationship was recorded
├── ROW_ST             -- Row status
└── POPLN_TMSTP        -- When this data was loaded

When to use: Bridge tables are used in advanced consumption patterns (see Pattern 3: Factless Facts).


Part 2: Building Your Consumption Layer

Why Layer 3 Exists

Different consumption needs require different structures. A BI dashboard optimized for speed needs a different structure than a data science feature store or a CRM export.

Layer 2 (Daana) handles:

  • Data conformance (consistent business keys)
  • Historization (tracking changes over time)
  • Data quality (validation and testing)
  • Lineage (what came from where)

Layer 3 (you) optimizes for:

  • Specific use cases (BI, ML, exports)
  • Query performance (denormalization, aggregation)
  • User experience (familiar schemas, simple joins)

Be Pragmatic - The Heavy Lifting Is Done

Daana has already done the hard governance work: your data is conformed, historized, well-defined, and quality-checked. Now be pragmatic.

Need a denormalized wide table for a dashboard? Build it. Need a different grain for analysis? Do it. Need a flat file export for a CRM system? Serve it.

The point of Layer 3 is to optimize for consumption without worrying about breaking the source of truth. Layer 2 remains clean and consistent while Layer 3 adapts to how your organization actually uses data.


Common Consumption Patterns

The following patterns show how to build Layer 3 structures from Daana's Layer 2 views:

  1. Type 1 Dimensions - Use VIEW_{ENTITY} directly
  2. Type 2 Dimensions - Add valid_from/valid_to to VIEW_{ENTITY}_HIST
  3. Factless Facts (Twine of Keys) - Weave relationship timelines together
  4. Transaction Facts - Combine entity views with measures
  5. Periodic Snapshot Facts - Daily/weekly/monthly entity state

Pattern 1: Type 1 Dimensions (Current State)

What it is: Dimension showing only current state. Updates overwrite previous values.

When to use:

  • Real-time dashboards
  • Operational reporting
  • Analyses that don't need historical context

How to build:

-- Option 1: Use Daana's view directly
CREATE VIEW datamart.dim_customer AS
SELECT
  CUSTOMER_KEY AS customer_id,
  FIRST_NAME AS first_name,
  LAST_NAME AS last_name,
  EMAIL AS email,
  CUSTOMER_SEGMENT AS customer_segment,
  EFF_TMSTP AS last_updated
FROM DAANA_DW.VIEW_CUSTOMER

That's it. Daana's VIEW_{ENTITY} views are already Type 1 dimensions.


Pattern 2: Type 2 Dimensions (Full History)

What it is: Dimension tracking complete change history with valid_from/valid_to date ranges and a current version flag.

When to use:

  • Historical trend analysis
  • Point-in-time reporting
  • Regulatory compliance and auditing
  • "What did the data look like on X date?"

The pattern: Add valid_from and valid_to columns to VIEW_{ENTITY}_HIST using window functions.

How to build:

CREATE OR REPLACE TABLE datamart.dim_customer AS (
  SELECT
    CUSTOMER_KEY AS customer_id,
 
    -- Add Type 2 SCD columns
    EFF_TMSTP AS valid_from,
    LEAD(TIMESTAMP_SUB(EFF_TMSTP, INTERVAL 1 MILLISECOND), 1, TIMESTAMP('9999-12-31 23:59:59'))
      OVER (PARTITION BY CUSTOMER_KEY ORDER BY EFF_TMSTP) AS valid_to,
 
    -- Your business attributes
    FIRST_NAME,
    LAST_NAME,
    EMAIL,
    CUSTOMER_SEGMENT,
    CUSTOMER_STATUS
 
  FROM DAANA_DW.VIEW_CUSTOMER_HIST
)

How it works:

  1. valid_from: Effective timestamp when this version became active
  2. valid_to: Look ahead to the next version's effective timestamp minus 1 millisecond (using LEAD)
  3. '9999-12-31 23:59:59': Default for current version = "valid until further notice"

This pattern ensures non-overlapping validity ranges. When the next version starts at timestamp T, the current version ends at T - 1 millisecond, preventing any ambiguity in point-in-time queries.
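
For illustration, a single customer's history might then look like this (attribute values are examples, consistent with the timeline used later in Pattern 4):

customer_id | valid_from          | valid_to                | customer_segment
CUST_123    | 2024-01-01 00:00:00 | 2024-06-30 23:59:59.999 | standard
CUST_123    | 2024-07-01 00:00:00 | 9999-12-31 23:59:59     | premium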

Querying for point-in-time state:

-- Get customer state on 2024-06-15
SELECT
  customer_id,
  customer_segment,
  customer_status
FROM datamart.dim_customer
WHERE customer_id = 'CUST_12345'
  AND '2024-06-15' BETWEEN valid_from AND valid_to

Future Daana versions will generate these Type 2 dimensions automatically.

Applying this pattern to other entities: The same LEAD window technique works for any entity. To create dim_product, dim_order, or any other Type 2 dimension, apply this pattern to the respective VIEW_{ENTITY}_HIST view.
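
For example, a minimal sketch of dim_product, assuming your model defines a PRODUCT entity with PRODUCT_NAME and PRODUCT_COST attributes (illustrative names; this dimension is joined later in Pattern 4):

CREATE OR REPLACE TABLE datamart.dim_product AS (
  SELECT
    PRODUCT_KEY AS product_id,

    -- Same Type 2 SCD columns as in dim_customer
    EFF_TMSTP AS valid_from,
    LEAD(TIMESTAMP_SUB(EFF_TMSTP, INTERVAL 1 MILLISECOND), 1, TIMESTAMP('9999-12-31 23:59:59'))
      OVER (PARTITION BY PRODUCT_KEY ORDER BY EFF_TMSTP) AS valid_to,

    -- Your business attributes
    PRODUCT_NAME,
    PRODUCT_COST

  FROM DAANA_DW.VIEW_PRODUCT_HIST
)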


Pattern 3: Factless Facts (Twine of Keys)

What it is: A fact table that captures relationships or events without numeric measures. Tracks which dimensions were related at which points in time.

When to use:

  • Tracking relationships that change over time
  • Event logging (who did what when)
  • Relationship history when an entity relates to multiple other entities that can change independently

When NOT to use:

  • If you only have one relationship per entity (e.g., ORDER always has exactly one CUSTOMER), a simple JOIN suffices

The challenge: An ORDER might relate to different CUSTOMER and PRODUCT entities over time (reassignments, corrections). Each relationship can change independently. How do you weave these timelines together?

The "twine of keys" pattern: Union all relationship tables, track each timeline separately, then pivot back to wide format.

How to build:

CREATE OR REPLACE TABLE datamart.fact_order_relationships AS (
 
-- Step 1: Union all relationships into tall format
WITH twine AS (
  -- ORDER → CUSTOMER
  SELECT
    ORDER_KEY,
    CUSTOMER_KEY,
    CAST(NULL AS STRING) AS PRODUCT_KEY,
    EFF_TMSTP,
    'ORDER_CUSTOMER' AS rel_type
  FROM DAANA_DW.ORDER_CUSTOMER_X
 
  UNION ALL
 
  -- ORDER → PRODUCT
  SELECT
    ORDER_KEY,
    CAST(NULL AS STRING) AS CUSTOMER_KEY,
    PRODUCT_KEY,
    EFF_TMSTP,
    'ORDER_PRODUCT' AS rel_type
  FROM DAANA_DW.ORDER_PRODUCT_X
),
 
-- Step 2: Track the most recent effective timestamp for each relationship type
in_effect AS (
  SELECT
    ORDER_KEY,
    EFF_TMSTP,
    rel_type,
    CUSTOMER_KEY,
    PRODUCT_KEY,
 
    -- Carry forward the last known effective timestamp for each relationship
    MAX(CASE WHEN rel_type = 'ORDER_CUSTOMER' THEN EFF_TMSTP END)
      OVER (PARTITION BY ORDER_KEY ORDER BY EFF_TMSTP
            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS eff_customer,
    MAX(CASE WHEN rel_type = 'ORDER_PRODUCT' THEN EFF_TMSTP END)
      OVER (PARTITION BY ORDER_KEY ORDER BY EFF_TMSTP
            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS eff_product
  FROM twine
),
 
-- Step 3: Separate CTEs for each relationship type
cte_customer AS (
  SELECT ORDER_KEY, EFF_TMSTP AS eff_customer_ts, CUSTOMER_KEY
  FROM in_effect WHERE rel_type = 'ORDER_CUSTOMER'
),
cte_product AS (
  SELECT ORDER_KEY, EFF_TMSTP AS eff_product_ts, PRODUCT_KEY
  FROM in_effect WHERE rel_type = 'ORDER_PRODUCT'
)
 
-- Step 4: Join everything back together (pivot to wide format)
SELECT DISTINCT
  ie.ORDER_KEY,
  ie.EFF_TMSTP,
  cust.CUSTOMER_KEY,
  prod.PRODUCT_KEY
FROM in_effect ie
LEFT JOIN cte_customer cust
  ON ie.ORDER_KEY = cust.ORDER_KEY
  AND ie.eff_customer = cust.eff_customer_ts
LEFT JOIN cte_product prod
  ON ie.ORDER_KEY = prod.ORDER_KEY
  AND ie.eff_product = prod.eff_product_ts
)

How it works:

  1. Union relationships into tall format - Each relationship type becomes a row with sparse columns
  2. Track timelines separately - Window function carries forward last known EFF_TMSTP for each relationship
  3. Pivot to wide format - Join CTEs to reconstruct one row per ORDER_KEY + EFF_TMSTP with all dimension keys

Result grain: One row per ORDER_KEY per point in time when any relationship changed.
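
For illustration, one order whose relationships change independently might produce rows like these (keys and timestamps are examples):

ORDER_KEY | EFF_TMSTP           | CUSTOMER_KEY | PRODUCT_KEY
ORD_1     | 2024-01-05 10:00:00 | CUST_123     | NULL         (customer assigned first)
ORD_1     | 2024-01-05 10:05:00 | CUST_123     | PROD_9       (product added; customer carried forward)
ORD_1     | 2024-02-01 08:00:00 | CUST_456     | PROD_9       (order reassigned; product carried forward)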

Why "twine of keys"? Like weaving threads together - each relationship has its own timeline, woven into one coherent history showing "what was in effect when."

Future Daana versions will generate these factless facts automatically.


Pattern 4: Transaction Facts

What it is: Fact table with one row per business event or transaction, containing dimension keys, numeric measures, and calculated metrics that span multiple entities.

When to use:

  • Order analysis requiring customer/product context at time of order
  • Metrics that combine attributes from multiple entities
  • "What was the customer's segment when they placed this order?"
  • "What was the product cost at time of purchase?" (for margin calculation)

The foundation: Transaction facts build on factless facts (Pattern 3). The factless fact gives you dimension keys at each point in time. Now join to Type 2 dimensions to get their attributes as they were at that moment.

How to build:

CREATE TABLE datamart.fact_orders AS
SELECT
  /* Grain: One row per order */
  ff.ORDER_KEY AS order_id,
  ff.EFF_TMSTP AS order_timestamp,
 
  /* Dimension Keys */
  ff.CUSTOMER_KEY AS customer_id,
  ff.PRODUCT_KEY AS product_id,
 
  /* Metrics from ORDER entity */
  o.ORDER_DATE,
  o.ORDER_QUANTITY,
  o.ORDER_AMOUNT,
 
  /* Calculated measures spanning entities */
  -- Get product cost at point in time for margin calculation
  o.ORDER_AMOUNT - (o.ORDER_QUANTITY * p.PRODUCT_COST) AS order_margin,
 
  -- Calculate loyalty points based on customer segment at time of order
  CASE
    WHEN c.CUSTOMER_SEGMENT = 'premium' THEN o.ORDER_AMOUNT * 0.1
    ELSE 0
  END AS loyalty_points_earned
 
FROM datamart.fact_order_relationships ff  -- Start with factless fact!
 
-- Get ORDER attributes (these don't change over time for a given order)
JOIN DAANA_DW.VIEW_ORDER o
  ON ff.ORDER_KEY = o.ORDER_KEY
 
-- Point-in-time join to CUSTOMER Type 2 dimension
-- Join to get customer state at time of order for calculations
JOIN datamart.dim_customer c
  ON ff.CUSTOMER_KEY = c.customer_id
  AND ff.EFF_TMSTP BETWEEN c.valid_from AND c.valid_to
 
-- Point-in-time join to PRODUCT Type 2 dimension
-- Join to get product cost at time of order for margin calculation
JOIN datamart.dim_product p
  ON ff.PRODUCT_KEY = p.product_id
  AND ff.EFF_TMSTP BETWEEN p.valid_from AND p.valid_to

This fact table follows Kimball methodology by storing only keys, timestamps, and numeric measures. Dimensional attributes (like customer segment, product category) are accessed through joins to dimension tables rather than denormalized into the fact table. This keeps the fact table lean and flexible.

⚠️ Note on BI tool compatibility: Some BI tools struggle with joins that use both a key and a timestamp range (BETWEEN valid_from AND valid_to). If you encounter performance or compatibility issues, consider creating a surrogate version key in your dimension tables (e.g., customer_version_key as a concatenation of customer_key and valid_from timestamp) to simplify joins to a single equality condition.
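
A minimal sketch of that approach, building on the Pattern 2 dimension (dim_customer_versioned and customer_version_key are hypothetical names):

-- Add an equality-friendly key: one value per dimension version
CREATE OR REPLACE TABLE datamart.dim_customer_versioned AS (
  SELECT
    CONCAT(d.customer_id, '|', CAST(d.valid_from AS STRING)) AS customer_version_key,
    d.*
  FROM datamart.dim_customer d
)

-- Resolve the version key once, at fact load time, via the range join;
-- downstream BI joins then need only a single equality condition
SELECT
  ff.ORDER_KEY AS order_id,
  CONCAT(c.customer_id, '|', CAST(c.valid_from AS STRING)) AS customer_version_key
FROM datamart.fact_order_relationships ff
JOIN datamart.dim_customer c
  ON ff.CUSTOMER_KEY = c.customer_id
  AND ff.EFF_TMSTP BETWEEN c.valid_from AND c.valid_to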

How it works:

  1. Start with factless fact - Gives you ORDER_KEY + EFF_TMSTP + all related dimension keys
  2. Join to ORDER view - Get order-level measures (quantity, amount)
  3. Point-in-time join to Type 2 dimensions - Get customer/product attributes as they were at EFF_TMSTP
  4. Calculate cross-entity metrics - Order margin, customer lifetime value, segment-based calculations

Why point-in-time joins matter:

-- Customer segment might have changed since order was placed
Customer 'CUST_123' timeline:
2024-01-01 to 2024-06-30: segment = 'standard'
2024-07-01 to 9999-12-31: segment = 'premium'
 
Order placed on 2024-05-15:
- Point-in-time join: customer_segment_at_order = 'standard'
- Current state join: customer_segment_at_order = 'premium' ✗ (wrong!)

The canonical Daana temporal join pattern: When joining to Type 2 dimensions, always use timestamp BETWEEN valid_from AND valid_to. This pattern appears throughout Daana consumption layers and ensures you get the dimension state as it was at the relevant point in time.

Best practices:

  • Build factless fact first (Pattern 3), then add measures
  • Use point-in-time joins (BETWEEN valid_from AND valid_to) for Type 2 dimensions
  • Keep measures atomic when possible (store quantity and price separately)
  • Calculate derived metrics (margin, points) in the fact table for query performance

Pattern 5: Periodic Snapshot Facts

What it is: Fact table with one row per entity per time period (daily, weekly, monthly), capturing state and metrics at regular intervals.

When to use:

  • Daily customer status snapshots
  • Weekly inventory levels
  • Monthly account balances
  • "What were ALL customers' states on 2024-06-15?"

The pattern: Expand temporal ranges into individual time period rows using date arrays, then join aggregated metrics.

How to build:

CREATE OR REPLACE TABLE datamart.fact_customer_daily_snapshot AS (
 
-- Step 1: Get customer status ranges from historized view
WITH customer_status_periods AS (
  SELECT
    c.CUSTOMER_KEY AS customer_id,
    c.CUSTOMER_SEGMENT,
    c.CUSTOMER_STATUS,
    c.CUSTOMER_STATUS_STARTED_DATE,
    CASE
      -- If status is still current, use today as end
      WHEN c.CUSTOMER_STATUS_ENDED_DATE = DATE('9999-12-31')
        THEN CURRENT_DATE()
      ELSE c.CUSTOMER_STATUS_ENDED_DATE
    END AS customer_status_ended_date,
    c.CUSTOMER_REGISTRATION_DATE
  FROM DAANA_DW.VIEW_CUSTOMER_HIST c
),
 
-- Step 2: Aggregate event-level metrics by date
customer_orders_daily AS (
  SELECT
    DATE(o.ORDER_PURCHASE_TS) AS snapshot_date,
    o.CUSTOMER_KEY AS customer_id,
    COUNT(DISTINCT o.ORDER_KEY) AS number_of_orders,
    SUM(o.ORDER_AMOUNT) AS total_order_amount
  FROM DAANA_DW.VIEW_ORDER o
  GROUP BY DATE(o.ORDER_PURCHASE_TS), o.CUSTOMER_KEY
),
 
-- Step 3: Expand status ranges into daily rows
customer_status_expanded AS (
  SELECT
    snapshot_date,
    customer_id,
    CUSTOMER_SEGMENT,
    CUSTOMER_STATUS,
    DATE_DIFF(snapshot_date, CUSTOMER_STATUS_STARTED_DATE, DAY)
      AS number_of_consecutive_status_days,
    CASE
      WHEN CUSTOMER_REGISTRATION_DATE = snapshot_date THEN 1
      ELSE 0
    END AS customer_registration_flag,
    CASE
      WHEN RANK() OVER (PARTITION BY customer_id, CUSTOMER_STATUS
                        ORDER BY snapshot_date) = 1
      THEN 1
      ELSE 0
    END AS status_change_flag
  FROM customer_status_periods
 
  -- Key technique: Explode date range into individual rows
  CROSS JOIN UNNEST(
    GENERATE_DATE_ARRAY(
      CUSTOMER_STATUS_STARTED_DATE,
      customer_status_ended_date
    )
  ) AS snapshot_date
)
 
-- Step 4: Combine dimensional state with aggregated metrics
SELECT
  /* Primary Key */
  cse.snapshot_date,
  cse.customer_id,
 
  /* Dimensional Attributes */
  cse.CUSTOMER_SEGMENT,
  cse.CUSTOMER_STATUS,
  cse.number_of_consecutive_status_days,
 
  /* Event Flags */
  cse.customer_registration_flag,
  cse.status_change_flag,
 
  /* Metrics */
  COALESCE(ord.number_of_orders, 0) AS number_of_orders,
  COALESCE(ord.total_order_amount, 0) AS total_order_amount
 
FROM customer_status_expanded cse
LEFT JOIN customer_orders_daily ord
  ON cse.customer_id = ord.customer_id
  AND cse.snapshot_date = ord.snapshot_date
)

How it works:

  1. Get temporal ranges from VIEW_CUSTOMER_HIST (status periods)
  2. Aggregate event metrics by date (orders per day)
  3. Expand ranges into daily rows using GENERATE_DATE_ARRAY + CROSS JOIN UNNEST
  4. Join dimensional state with metrics to create final snapshot

Result grain: One row per customer per day with segment, status, flags, and order metrics.
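
For illustration, a few days for one customer might look like this (values are examples; some columns omitted). Note the zero-order days produced by the range expansion:

snapshot_date | customer_id | CUSTOMER_STATUS | status_change_flag | number_of_orders
2024-06-14    | CUST_123    | active          | 1                  | 0
2024-06-15    | CUST_123    | active          | 0                  | 2
2024-06-16    | CUST_123    | active          | 0                  | 0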

Query pattern:

-- All premium customers on a specific date with their orders
SELECT
  customer_id,
  customer_segment,
  customer_status,
  number_of_orders,
  total_order_amount
FROM datamart.fact_customer_daily_snapshot
WHERE snapshot_date = '2024-06-15'
  AND customer_segment = 'premium'

When to use:

  • Regular reporting cadence (daily, weekly, monthly)
  • Trend analysis over time
  • State-based aggregations ("how many active users per day")

Part 3: Query Guidance

Quick Reference

Object                   Query?     Purpose
VIEW_{ENTITY}            ✅ Yes     Current state - one row per entity
VIEW_{ENTITY}_HIST       ✅ Yes     Full history - point-in-time queries
VIEW_{ENTITY}_WITH_REL   ✅ Yes     Current state with relationships pre-joined
Your consumption layer   ✅ Yes     Dimensions/facts you build from above
{ENTITY}_FOCAL           ❌ Avoid   Internal - one row per surrogate key
{ENTITY}_IDFR            ❌ Avoid   Internal - business ID to surrogate mapping
{ENTITY}_DESC            ❌ Avoid   Internal - key-value attribute storage
{ENTITY1}_{ENTITY2}_X    ❌ Avoid   Internal - use VIEW_WITH_REL or build factless facts

Why Views Over Physical Tables?

Daana's views provide:

  1. Transposition - Key-value pairs become familiar table columns
  2. Abstraction - Physical implementation can change without breaking queries
  3. Performance - Views are optimized for common query patterns
  4. Simplicity - No need to understand TYPE_KEY mapping or join logic

The principle: You shouldn't need to learn ensemble modeling to use Daana's output. The views hide that complexity.


Part 4: Physical Implementation (Optional Deep Dive)

You don't need to understand ensemble modeling to use Daana's views. But if you're curious about what's under the hood, here's how it works.

For each entity, Daana creates up to four physical tables following the Focal Framework pattern. These tables store data in a generic, flexible structure that the views transpose into user-friendly schemas.

For a comprehensive understanding of the Focal Framework and ensemble modeling principles, see the Focal Framework Guide.


Physical Tables Created

1. FOCAL Table: {ENTITY}_FOCAL

Purpose: Maintains exactly one row per entity instance, identified by the surrogate key. While the IDFR table may contain multiple rows for the same entity (one per business identifier), the FOCAL table consolidates these into a single authoritative row.

When you'll see it: Rarely queried directly - the views handle the complexity.

Structure:

CUSTOMER_FOCAL
├── CUSTOMER_KEY       -- Primary surrogate key (unique per entity instance)
├── INST_KEY           -- Batch identifier (which pipeline loaded this row)
├── INST_ROW_KEY       -- Workflow identifier (which workflow created the metadata)
└── POPLN_TMSTP        -- Population timestamp (when row was loaded)

2. IDFR Table: {ENTITY}_IDFR (Conditional)

Created when: Entity has multi_idfr type in model definition.

Purpose: Manages key integration by mapping multiple business identifiers to a single stable surrogate key. Real entities often have multiple IDs (internal ID, SSN, loyalty number, partner ID) - this table maps all of them to one CUSTOMER_KEY.

Why it matters: Enables the system to recognize the same entity regardless of which identifier is used in source systems.

Structure:

CUSTOMER_IDFR
├── CUSTOMER_IDFR      -- Business identifier value (email, SSN, etc.)
├── EFF_TMSTP          -- Effective timestamp (when became valid)
├── VER_TMSTP          -- Version timestamp (for bi-temporal support)
├── ROW_ST             -- Row status (Y=Active, N=Inactive)
├── CUSTOMER_KEY       -- Links to FOCAL table
├── INST_KEY           -- Batch identifier
├── INST_ROW_KEY       -- Workflow identifier
└── POPLN_TMSTP        -- Population timestamp
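
For illustration, two business identifiers resolving to the same surrogate key might look like this (identifier values are examples):

CUSTOMER_IDFR      | ROW_ST | CUSTOMER_KEY
jane@example.com   | Y      | CUST_12345
LOYALTY-998877     | Y      | CUST_12345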

3. DESC Table: {ENTITY}_DESC

Purpose: Stores all descriptive attributes in a flexible, typed structure using generic value columns. Instead of creating fixed columns for each attribute, it uses TYPE_KEY to identify what data is stored in each row.

Why this design: Enables non-destructive development. Adding new attributes requires only new TYPE_KEY definitions and new rows - no schema changes, no ALTER TABLE statements.

How views use it: Views join on TYPE_KEY to transpose these key-value rows into familiar table columns.

Structure:

CUSTOMER_DESC
├── CUSTOMER_KEY       -- Links to FOCAL table
├── TYPE_KEY           -- Semantic key identifying which attribute this row stores
├── SEQ_NBR            -- Sequence number (not used in Daana implementation)
├── EFF_TMSTP          -- Effective timestamp
├── VER_TMSTP          -- Version timestamp
├── ROW_ST             -- Row status (Y=Active, N=Inactive)
├── STA_TMSTP          -- Mappable column for timestamp data (start/from)
├── END_TMSTP          -- Mappable column for timestamp data (end/to)
├── VAL_STR            -- Mappable column for string data
├── VAL_NUM            -- Mappable column for numeric data
├── UOM                -- Unit of measure (classifies VAL_STR or VAL_NUM)
├── UOM_TYPE           -- Unit of measure type
├── DATA_KEY           -- Data key column (not used in Daana implementation)
├── INST_KEY           -- Batch identifier
├── INST_ROW_KEY       -- Workflow identifier
└── POPLN_TMSTP        -- Population timestamp

Example: A single CUSTOMER_KEY might have multiple rows in DESC:

  • One row with TYPE_KEY=101 storing FIRST_NAME in VAL_STR
  • Another row with TYPE_KEY=102 storing EMAIL in VAL_STR
  • Another row with TYPE_KEY=103 storing REGISTRATION_DATE in STA_TMSTP

The view joins these rows and pivots them into columns: FIRST_NAME, EMAIL, REGISTRATION_DATE.
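
Conceptually, that transposition resembles the sketch below. This is a simplification, not the actual generated view SQL (which also handles effective/version timestamps); the TYPE_KEY values are the illustrative ones from the example above:

SELECT
  d.CUSTOMER_KEY,
  MAX(CASE WHEN d.TYPE_KEY = 101 THEN d.VAL_STR END) AS FIRST_NAME,
  MAX(CASE WHEN d.TYPE_KEY = 102 THEN d.VAL_STR END) AS EMAIL,
  MAX(CASE WHEN d.TYPE_KEY = 103 THEN d.STA_TMSTP END) AS REGISTRATION_DATE
FROM DAANA_DW.CUSTOMER_DESC d
WHERE d.ROW_ST = 'Y'  -- active rows only
GROUP BY d.CUSTOMER_KEY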


4. Relationship Table: {ENTITY1}_{ENTITY2}_X

Purpose: Captures temporal many-to-many relationships between entities. A single relationship table can store multiple relationship types using TYPE_KEY.

Example: ORDER_CUSTOMER_X might contain different relationship types:

  • PLACED_BY (customer who placed the order)
  • SHIPPED_TO (customer receiving shipment)
  • RETURNED_BY (customer who returned the order)

Structure:

ORDER_CUSTOMER_X
├── ORDER_KEY          -- Surrogate key for driving entity
├── CUSTOMER_KEY       -- Surrogate key for related entity
├── TYPE_KEY           -- Identifies relationship type
├── EFF_TMSTP          -- Effective timestamp
├── VER_TMSTP          -- Version timestamp
├── ROW_ST             -- Row status
├── INST_KEY           -- Batch identifier
├── INST_ROW_KEY       -- Workflow identifier
└── POPLN_TMSTP        -- Population timestamp

Understanding Key Columns

TYPE_KEY - The Semantic Bridge

What it is: A foreign key to the metadata model that identifies which business attribute (Atomic Context) is stored in this row. It's the bridge connecting physical data to logical meaning.

Why it matters: This enables views to transpose generic physical rows into meaningful business attributes. When you query VIEW_CUSTOMER, the view joins on TYPE_KEY to translate data into familiar column format.

In practice: You'll never query on TYPE_KEY directly - the views handle this for you. But understanding it helps when debugging or exploring the physical layer.


Three Timestamp Types

Daana tracks three types of timestamps for different purposes:

EFF_TMSTP (Effective Timestamp): Business time - when this version became valid in the real world. Used for point-in-time queries ("What was the customer's segment on 2024-06-15?").

VER_TMSTP (Version Timestamp): System time - when this version was created in the warehouse. Supports bi-temporal tracking (multiple versions at the same effective time). Used for audit trails and understanding when your knowledge of the data changed.

POPLN_TMSTP (Population Timestamp): Load time - when this row was physically inserted into the table. Used for operational monitoring and pipeline debugging.

Why three timestamps matter: They enable you to answer different questions:

  • "What was true at this business date?" → EFF_TMSTP
  • "What did we know at this system date?" → VER_TMSTP
  • "When did we load this data?" → POPLN_TMSTP

Audit and Metadata Columns

INST_KEY (Instance Key): Batch identifier indicating which pipeline execution loaded this row. Join to DAANA_METADATA.PROCINST_DESC to see the exact SQL that loaded this data - valuable for debugging.

INST_ROW_KEY (Instance Row Key): Workflow identifier indicating which workflow created the metadata for this row.

ROW_ST (Row Status): Active/Inactive indicator (Y=Active, N=Inactive). Combined with temporal columns, determines if the row is currently valid.

SEQ_NBR (Sequence Number): Sequence identifier (not used in Daana implementation).

DATA_KEY: Data key column (not used in Daana implementation).

UOM (Unit of Measure): Classifies what kind of data resides in VAL_STR or VAL_NUM (e.g., "USD", "EUR", "kg", "meters").


Operational Lineage

Every physical table includes INST_KEY, enabling you to trace which pipeline loaded any specific row:

-- Find the SQL that loaded a specific customer row
SELECT DISTINCT pd.VAL_STR
FROM DAANA_METADATA.PROCINST_DESC pd
INNER JOIN DAANA_DW.CUSTOMER_DESC cd
  ON cd.INST_KEY = pd.PROCINST_KEY
WHERE cd.CUSTOMER_KEY = 'CUST_12345'

This operational lineage is invaluable for debugging and understanding data provenance.


Next Steps

  1. Explore the views: After running daana-cli execute, query SELECT * FROM DAANA_DW.VIEW_{ENTITY} to see your entities
  2. Build Type 1 dimensions: Use VIEW_{ENTITY} directly or with light transformation
  3. Build Type 2 dimensions: Apply the LEAD window pattern to VIEW_{ENTITY}_HIST
  4. Build facts: Start with transaction facts from entity views + measures
  5. Advanced patterns: Factless facts and periodic snapshots when needed

Related Documentation