Daana CLI Output Artifacts and Usage
Overview
When you run daana-cli execute, Daana generates a complete ensemble model implementation in your data warehouse. This documentation explains:
- What you'll query - The views Daana creates
- How to build your consumption layer - Patterns for dimensions and facts
- Physical implementation - What's under the hood (optional deep dive)
Understanding the Three-Layer Architecture
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Raw Ingestion (DAS - Data As System sees it) │
│ Your source systems, staging tables, raw data │
│ This layer can be thought of as a bronze or raw layer       │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Business Entities (DAB - Data As Business sees it) │ ← DAANA CREATES THIS LAYER
│ Conformed, historized entities in ensemble model pattern │
│ This layer can be thought of as a silver, model, or         │
│ intermediate layer                                           │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Analytics Consumption (DAR - Data As Requirements) │ ← YOU BUILD THIS FROM DAANA'S OUTPUT
│ Dimensions, facts, marts tailored to analytics needs │
│ This layer can be thought of as a gold or mart layer        │
└─────────────────────────────────────────────────────────────┘
Daana's role: Generate and maintain Layer 2. Provide clean, queryable views that serve as building blocks for Layer 3.
Your role: Use Daana's views to build consumption-layer dimensions and facts for your analytics requirements.
How to Use This Guide
This documentation is designed for selective reading. Choose your path based on your immediate needs:
Getting started (15-30 minutes):
- Read Part 1 to understand the views Daana creates
- Skim Part 2 pattern descriptions to see what's possible
- Start building with Pattern 1 (Type 1 Dimensions)
Building your consumption layer (progressive approach):
- Patterns 1-2: Start here for dimensions (Type 1 and Type 2)
- Patterns 3-4: Read when you need fact tables
- Pattern 5: Read when you need periodic snapshots
- Each pattern is self-contained - read only what you need right now
Reference and deep dives:
- Part 3 (Query Guidance): Quick reference for what to query vs. avoid
- Part 4 (Physical Implementation): Optional - for troubleshooting or curiosity about what's under the hood
- Focal Framework: Deep dive for architects and framework evaluators
Time-based reading paths:
- 30 minutes: Part 1 + Pattern 1 → start building Type 1 dimensions
- 2 hours: Part 1 + all of Part 2 → comprehensive understanding of consumption patterns
- Deep dive: Add Part 4 + Focal Framework Guide → full architectural understanding
By role:
- Data/Analytics Engineer building the consumption layer: Parts 1-2, reference Part 3 as needed
- Troubleshooting/debugging: Part 1, Part 3, skim Part 4 for column explanations
- Architect evaluating Daana: All parts + Focal Framework Guide
- BI analyst exploring data: Part 1, Part 3
Part 1: What You'll Query - The Views
For each entity defined in your model.yaml (e.g., CUSTOMER), Daana creates three views in the DAANA_DW schema. These views transpose the physical key-value structure into user-friendly table schemas.
Note on examples: The attribute names used in examples throughout this document (FIRST_NAME, EMAIL, CUSTOMER_SEGMENT, CUSTOMER_STATUS_STARTED_DATE, etc.) are illustrative. The actual attributes in your views depend on what you define in your model. The patterns and techniques remain the same regardless of your specific attribute names.
VIEW_{ENTITY} - Current State (Type 1)
Purpose: Current state of each entity. One row per entity key, showing the most recent attribute values.
When to use:
- Dashboards showing current state
- Operational reports
- "What is the customer's current email?"
Structure:
VIEW_CUSTOMER
├── CUSTOMER_KEY
├── EFF_TMSTP -- When this version became effective (business time)
├── FIRST_NAME
├── LAST_NAME
├── EMAIL
├── CUSTOMER_SEGMENT
└── ... (all attributes from your model)
Example query:
SELECT
c.CUSTOMER_KEY,
c.FIRST_NAME,
c.LAST_NAME,
c.EMAIL,
c.CUSTOMER_SEGMENT
FROM DAANA_DW.VIEW_CUSTOMER c
WHERE c.CUSTOMER_SEGMENT = 'premium'
VIEW_{ENTITY}_HIST - Full History (Type 2)
Purpose: Complete change history for each entity. Multiple rows per entity key, one row per change.
When to use:
- Historical trend analysis
- Point-in-time reporting
- "What was the customer's email on 2024-06-15?"
- Auditing and compliance
Structure:
VIEW_CUSTOMER_HIST
├── CUSTOMER_KEY
├── EFF_TMSTP -- When this version became effective (business time)
├── VER_TMSTP -- When this version was recorded (system time)
├── FIRST_NAME
├── LAST_NAME
├── EMAIL
├── CUSTOMER_SEGMENT
└── ... (all attributes from your model)
Example query:
-- Find customer state at a specific point in time
SELECT
c.CUSTOMER_KEY,
c.FIRST_NAME,
c.EMAIL,
c.CUSTOMER_SEGMENT
FROM DAANA_DW.VIEW_CUSTOMER_HIST c
WHERE c.CUSTOMER_KEY = 'CUST_12345'
AND c.EFF_TMSTP <= '2024-06-15 00:00:00'
AND (c.VER_TMSTP > '2024-06-15 00:00:00' OR c.VER_TMSTP = TIMESTAMP('9999-12-31 23:59:59'))
To use as a proper Type 2 dimension with valid_from/valid_to columns, see Pattern 2 below.
VIEW_{ENTITY}_WITH_REL - Current State with Relationships
Purpose: Current state of entity with all relationships already joined.
When to use:
- Simplified querying when you need entity + related entities
- Exploratory analysis across related entities
Structure:
VIEW_CUSTOMER_WITH_REL
├── CUSTOMER_KEY
├── FIRST_NAME
├── LAST_NAME
├── EMAIL
├── ... (all customer attributes)
├── ORDER_KEY -- Related ORDER entities
├── ... (all order attributes)
├── ACCOUNT_KEY -- Related ACCOUNT entities
├── ... (all account attributes)
└── ... (all relationship keys and attributes from your model related to customer)
Example query:
SELECT
c.CUSTOMER_KEY,
c.FIRST_NAME,
c.ORDER_KEY,
c.ACCOUNT_KEY
FROM DAANA_DW.VIEW_CUSTOMER_WITH_REL c
WHERE c.CUSTOMER_SEGMENT = 'premium'
Relationship/Bridge Tables: {ENTITY1}_{ENTITY2}_X
Purpose: Track many-to-many relationships between entities over time.
Naming convention: The _X suffix denotes a relationship/bridge table.
Structure:
ORDER_CUSTOMER_X
├── ORDER_KEY -- Links to ORDER entity
├── CUSTOMER_KEY -- Links to CUSTOMER entity
├── TYPE_KEY -- Relationship type (if multiple types exist)
├── EFF_TMSTP -- When this relationship became effective
├── VER_TMSTP -- When this relationship was recorded
├── ROW_ST -- Row status
└── POPLN_TMSTP -- When this data was loaded
When to use: Bridge tables are used in advanced consumption patterns (see Pattern 3: Factless Facts).
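If you peek at a bridge table while developing a factless fact, here is a minimal sketch listing active relationship rows, assuming ROW_ST = 'Y' marks active rows:
SELECT ORDER_KEY, CUSTOMER_KEY, TYPE_KEY, EFF_TMSTP
FROM DAANA_DW.ORDER_CUSTOMER_X
WHERE ROW_ST = 'Y'
ORDER BY ORDER_KEY, EFF_TMSTP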
Part 2: Building Your Consumption Layer
Why Layer 3 Exists
Different consumption needs require different structures. A BI dashboard optimized for speed needs a different structure than a data science feature store or a CRM export.
Layer 2 (Daana) handles:
- Data conformance (consistent business keys)
- Historization (tracking changes over time)
- Data quality (validation and testing)
- Lineage (what came from where)
Layer 3 (you) optimizes for:
- Specific use cases (BI, ML, exports)
- Query performance (denormalization, aggregation)
- User experience (familiar schemas, simple joins)
Be Pragmatic - The Heavy Lifting is Done
Daana has already done the hard governance work: your data is conformed, historized, well-defined, and quality-checked. Now be pragmatic.
Need a denormalized wide table for a dashboard? Build it. Need a different grain for analysis? Do it. Need a flat file export for a CRM system? Serve it.
The point of Layer 3 is to optimize for consumption without worrying about breaking the source of truth. Layer 2 remains clean and consistent while Layer 3 adapts to how your organization actually uses data.
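For example, a denormalized wide table for a premium-customer dashboard can be served straight from the relationship view. A minimal sketch (the table name and filter are illustrative):
CREATE OR REPLACE TABLE datamart.wide_premium_customers AS
SELECT
  CUSTOMER_KEY,
  FIRST_NAME,
  LAST_NAME,
  EMAIL,
  ORDER_KEY,
  ACCOUNT_KEY
FROM DAANA_DW.VIEW_CUSTOMER_WITH_REL
WHERE CUSTOMER_SEGMENT = 'premium'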
Common Consumption Patterns
The following patterns show how to build Layer 3 structures from Daana's Layer 2 views:
- Type 1 Dimensions - Use VIEW_{ENTITY} directly
- Type 2 Dimensions - Add valid_from/valid_to to VIEW_{ENTITY}_HIST
- Factless Facts (Twine of Keys) - Weave relationship timelines together
- Transaction Facts - Combine entity views with measures
- Periodic Snapshot Facts - Daily/weekly/monthly entity state
Pattern 1: Type 1 Dimensions (Current State)
What it is: Dimension showing only current state. Updates overwrite previous values.
When to use:
- Real-time dashboards
- Operational reporting
- Analyses that don't need historical context
How to build:
-- Option 1: Use Daana's view directly
CREATE VIEW datamart.dim_customer AS
SELECT
CUSTOMER_KEY AS customer_id,
FIRST_NAME AS first_name,
LAST_NAME AS last_name,
EMAIL AS email,
CUSTOMER_SEGMENT AS customer_segment,
EFF_TMSTP AS last_updated
FROM DAANA_DW.VIEW_CUSTOMER
That's it. Daana's VIEW_{ENTITY} views are already Type 1 dimensions.
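If dashboard query volume makes a physical table preferable, the same SELECT can be materialized instead of exposed as a view. A minimal sketch, assuming your pipeline rebuilds it on a schedule:
-- Option 2: Materialize as a table for heavy read workloads
CREATE OR REPLACE TABLE datamart.dim_customer AS
SELECT
  CUSTOMER_KEY AS customer_id,
  FIRST_NAME AS first_name,
  LAST_NAME AS last_name,
  EMAIL AS email,
  CUSTOMER_SEGMENT AS customer_segment,
  EFF_TMSTP AS last_updated
FROM DAANA_DW.VIEW_CUSTOMER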
Pattern 2: Type 2 Dimensions (Full History)
What it is: Dimension tracking complete change history with valid_from/valid_to date ranges and a current version flag.
When to use:
- Historical trend analysis
- Point-in-time reporting
- Regulatory compliance and auditing
- "What did the data look like on X date?"
The pattern: Add valid_from and valid_to columns to VIEW_{ENTITY}_HIST using window functions.
How to build:
CREATE OR REPLACE TABLE datamart.dim_customer AS (
SELECT
CUSTOMER_KEY AS customer_id,
-- Add Type 2 SCD columns
EFF_TMSTP AS valid_from,
LEAD(TIMESTAMP_SUB(EFF_TMSTP, INTERVAL 1 MILLISECOND), 1, TIMESTAMP('9999-12-31 23:59:59'))
OVER (PARTITION BY CUSTOMER_KEY ORDER BY EFF_TMSTP) AS valid_to,
-- Current-version flag: TRUE when no later version exists
LEAD(EFF_TMSTP) OVER (PARTITION BY CUSTOMER_KEY ORDER BY EFF_TMSTP) IS NULL AS is_current,
-- Your business attributes
FIRST_NAME,
LAST_NAME,
EMAIL,
CUSTOMER_SEGMENT,
CUSTOMER_STATUS
FROM DAANA_DW.VIEW_CUSTOMER_HIST
)
How it works:
- valid_from: Effective timestamp when this version became active
- valid_to: Look ahead to the next version's effective timestamp minus 1 millisecond (using LEAD)
- '9999-12-31 23:59:59': Default for current version = "valid until further notice"
- is_current: TRUE for the open-ended current version (the row with no successor)
This pattern ensures non-overlapping validity ranges. When the next version starts at timestamp T, the current version ends at T - 1 millisecond, preventing any ambiguity in point-in-time queries.
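For example, if a customer has versions effective at 2024-01-01 00:00:00 and 2024-07-01 00:00:00, the pattern produces:
valid_from: 2024-01-01 00:00:00, valid_to: 2024-06-30 23:59:59.999
valid_from: 2024-07-01 00:00:00, valid_to: 9999-12-31 23:59:59 (current)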
Querying for point-in-time state:
-- Get customer state on 2024-06-15
SELECT
customer_id,
customer_segment,
customer_status
FROM datamart.dim_customer
WHERE customer_id = 'CUST_12345'
AND TIMESTAMP('2024-06-15') BETWEEN valid_from AND valid_to
Future Daana versions will generate these Type 2 dimensions automatically.
Applying this pattern to other entities: The same LEAD window technique works for any entity. To create dim_product, dim_order, or any other Type 2 dimension, apply this pattern to the respective VIEW_{ENTITY}_HIST view.
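For instance, Pattern 4 below joins to datamart.dim_product. A minimal sketch of building it, assuming PRODUCT_NAME and PRODUCT_COST are attributes defined in your model:
CREATE OR REPLACE TABLE datamart.dim_product AS (
SELECT
  PRODUCT_KEY AS product_id,
  EFF_TMSTP AS valid_from,
  LEAD(TIMESTAMP_SUB(EFF_TMSTP, INTERVAL 1 MILLISECOND), 1, TIMESTAMP('9999-12-31 23:59:59'))
    OVER (PARTITION BY PRODUCT_KEY ORDER BY EFF_TMSTP) AS valid_to,
  PRODUCT_NAME,
  PRODUCT_COST
FROM DAANA_DW.VIEW_PRODUCT_HIST
)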
Pattern 3: Factless Facts (Twine of Keys)
What it is: A fact table that captures relationships or events without numeric measures. Tracks which dimensions were related at which points in time.
When to use:
- Tracking relationships that change over time
- Event logging (who did what when)
- Relationship history when an entity relates to multiple other entities that can change independently
When NOT to use:
- If you only have one relationship per entity (e.g., ORDER always has exactly one CUSTOMER), a simple JOIN suffices
The challenge: An ORDER might relate to different CUSTOMER and PRODUCT entities over time (reassignments, corrections). Each relationship can change independently. How do you weave these timelines together?
The "twine of keys" pattern: Union all relationship tables, track each timeline separately, then pivot back to wide format.
How to build:
CREATE OR REPLACE TABLE datamart.fact_order_relationships AS (
-- Step 1: Union all relationships into tall format
WITH twine AS (
-- ORDER → CUSTOMER
SELECT
ORDER_KEY,
CUSTOMER_KEY,
CAST(NULL AS STRING) AS PRODUCT_KEY,
EFF_TMSTP,
'ORDER_CUSTOMER' AS rel_type
FROM DAANA_DW.ORDER_CUSTOMER_X
UNION ALL
-- ORDER → PRODUCT
SELECT
ORDER_KEY,
CAST(NULL AS STRING) AS CUSTOMER_KEY,
PRODUCT_KEY,
EFF_TMSTP,
'ORDER_PRODUCT' AS rel_type
FROM DAANA_DW.ORDER_PRODUCT_X
),
-- Step 2: Track the most recent effective timestamp for each relationship type
in_effect AS (
SELECT
ORDER_KEY,
EFF_TMSTP,
rel_type,
CUSTOMER_KEY,
PRODUCT_KEY,
-- Carry forward the last known effective timestamp for each relationship
MAX(CASE WHEN rel_type = 'ORDER_CUSTOMER' THEN EFF_TMSTP END)
OVER (PARTITION BY ORDER_KEY ORDER BY EFF_TMSTP
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS eff_customer,
MAX(CASE WHEN rel_type = 'ORDER_PRODUCT' THEN EFF_TMSTP END)
OVER (PARTITION BY ORDER_KEY ORDER BY EFF_TMSTP
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS eff_product
FROM twine
QUALIFY RANK() OVER (PARTITION BY ORDER_KEY, EFF_TMSTP ORDER BY EFF_TMSTP DESC) = 1
),
-- Step 3: Separate CTEs for each relationship type
cte_customer AS (
SELECT ORDER_KEY, EFF_TMSTP AS eff_customer_ts, CUSTOMER_KEY
FROM in_effect WHERE rel_type = 'ORDER_CUSTOMER'
),
cte_product AS (
SELECT ORDER_KEY, EFF_TMSTP AS eff_product_ts, PRODUCT_KEY
FROM in_effect WHERE rel_type = 'ORDER_PRODUCT'
)
-- Step 4: Join everything back together (pivot to wide format)
SELECT DISTINCT
ie.ORDER_KEY,
ie.EFF_TMSTP,
cust.CUSTOMER_KEY,
prod.PRODUCT_KEY
FROM in_effect ie
LEFT JOIN cte_customer cust
ON ie.ORDER_KEY = cust.ORDER_KEY
AND ie.eff_customer = cust.eff_customer_ts
LEFT JOIN cte_product prod
ON ie.ORDER_KEY = prod.ORDER_KEY
AND ie.eff_product = prod.eff_product_ts
)
How it works:
- Union relationships into tall format - Each relationship type becomes a row with sparse columns
- Track timelines separately - Window function carries forward last known EFF_TMSTP for each relationship
- Pivot to wide format - Join CTEs to reconstruct one row per ORDER_KEY + EFF_TMSTP with all dimension keys
Result grain: One row per ORDER_KEY per point in time when any relationship changed.
Why "twine of keys"? Like weaving threads together - each relationship has its own timeline, woven into one coherent history showing "what was in effect when."
Future Daana versions will generate these factless facts automatically.
Pattern 4: Transaction Facts
What it is: Fact table with one row per business event or transaction, containing dimension keys, numeric measures, and calculated metrics that span multiple entities.
When to use:
- Order analysis requiring customer/product context at time of order
- Metrics that combine attributes from multiple entities
- "What was the customer's segment when they placed this order?"
- "What was the product cost at time of purchase?" (for margin calculation)
The foundation: Transaction facts build on factless facts (Pattern 3). The factless fact gives you dimension keys at each point in time. Now join to Type 2 dimensions to get their attributes as they were at that moment.
How to build:
CREATE TABLE datamart.fact_orders AS
SELECT
/* Grain: One row per order */
ff.ORDER_KEY AS order_id,
ff.EFF_TMSTP AS order_timestamp,
/* Dimension Keys */
ff.CUSTOMER_KEY AS customer_id,
ff.PRODUCT_KEY AS product_id,
/* Metrics from ORDER entity */
o.ORDER_DATE,
o.ORDER_QUANTITY,
o.ORDER_AMOUNT,
/* Calculated measures spanning entities */
-- Get product cost at point in time for margin calculation
o.ORDER_AMOUNT - (o.ORDER_QUANTITY * p.PRODUCT_COST) AS order_margin,
-- Calculate loyalty points based on customer segment at time of order
CASE
WHEN c.CUSTOMER_SEGMENT = 'premium' THEN o.ORDER_AMOUNT * 0.1
ELSE 0
END AS loyalty_points_earned
FROM datamart.fact_order_relationships ff -- Start with factless fact!
-- Get ORDER attributes (these don't change over time for a given order)
JOIN DAANA_DW.VIEW_ORDER o
ON ff.ORDER_KEY = o.ORDER_KEY
-- Point-in-time join to CUSTOMER Type 2 dimension
-- Join to get customer state at time of order for calculations
JOIN datamart.dim_customer c
ON ff.CUSTOMER_KEY = c.customer_id
AND ff.EFF_TMSTP BETWEEN c.valid_from AND c.valid_to
-- Point-in-time join to PRODUCT Type 2 dimension
-- Join to get product cost at time of order for margin calculation
JOIN datamart.dim_product p
ON ff.PRODUCT_KEY = p.product_id
AND ff.EFF_TMSTP BETWEEN p.valid_from AND p.valid_to
This fact table follows Kimball methodology by storing only keys, timestamps, and numeric measures. Dimensional attributes (like customer segment, product category) are accessed through joins to dimension tables rather than denormalized into the fact table. This keeps the fact table lean and flexible.
Note on BI tool compatibility: Some BI tools struggle with joins that use both a key and a timestamp range (BETWEEN valid_from AND valid_to). If you encounter performance or compatibility issues, consider creating a surrogate version key in your dimension tables (e.g., customer_version_key as a concatenation of customer_key and valid_from timestamp) to simplify joins to a single equality condition.
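A minimal sketch of that workaround; the customer_version_key name and pipe delimiter are illustrative:
CREATE OR REPLACE TABLE datamart.dim_customer_versioned AS
SELECT
  -- Surrogate version key: enables a single-equality join instead of key + BETWEEN
  CONCAT(d.customer_id, '|', CAST(d.valid_from AS STRING)) AS customer_version_key,
  d.*
FROM datamart.dim_customer d
Store the same concatenation in the fact table (computed once via the point-in-time join at build time), and downstream joins reduce to a single equality on customer_version_key.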
How it works:
- Start with factless fact - Gives you ORDER_KEY + EFF_TMSTP + all related dimension keys
- Join to ORDER view - Get order-level measures (quantity, amount)
- Point-in-time join to Type 2 dimensions - Get customer/product attributes as they were at EFF_TMSTP
- Calculate cross-entity metrics - Order margin, customer lifetime value, segment-based calculations
Why point-in-time joins matter:
-- Customer segment might have changed since order was placed
Customer 'CUST_123' timeline:
2024-01-01 to 2024-06-30: segment = 'standard'
2024-07-01 to 9999-12-31: segment = 'premium'
Order placed on 2024-05-15:
- Point-in-time join: customer_segment_at_order = 'standard' ✓
- Current state join: customer_segment_at_order = 'premium' ✗ (wrong!)
The canonical Daana temporal join pattern: When joining to Type 2 dimensions, always use timestamp BETWEEN valid_from AND valid_to. This pattern appears throughout Daana consumption layers and ensures you get the dimension state as it was at the relevant point in time.
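A minimal sketch contrasting the two joins (for illustration only - production facts should store the point-in-time result, as in Pattern 4):
SELECT
  ff.ORDER_KEY,
  hist.CUSTOMER_SEGMENT AS segment_at_order, -- point-in-time: correct
  cur.CUSTOMER_SEGMENT AS segment_now        -- current state: wrong for history
FROM datamart.fact_order_relationships ff
JOIN datamart.dim_customer hist
  ON ff.CUSTOMER_KEY = hist.customer_id
  AND ff.EFF_TMSTP BETWEEN hist.valid_from AND hist.valid_to
JOIN DAANA_DW.VIEW_CUSTOMER cur
  ON ff.CUSTOMER_KEY = cur.CUSTOMER_KEY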
Best practices:
- Build factless fact first (Pattern 3), then add measures
- Use point-in-time joins (BETWEEN valid_from AND valid_to) for Type 2 dimensions
- Keep measures atomic when possible (store quantity and price separately)
- Calculate derived metrics (margin, points) in the fact table for query performance
Pattern 5: Periodic Snapshot Facts
What it is: Fact table with one row per entity per time period (daily, weekly, monthly), capturing state and metrics at regular intervals.
When to use:
- Daily customer status snapshots
- Weekly inventory levels
- Monthly account balances
- "What were ALL customers' states on 2024-06-15?"
The pattern: Expand temporal ranges into individual time period rows using date arrays, then join aggregated metrics.
How to build:
CREATE OR REPLACE TABLE datamart.fact_customer_daily_snapshot AS (
-- Step 1: Get customer status ranges from historized view
WITH customer_status_periods AS (
SELECT
c.CUSTOMER_KEY AS customer_id,
c.CUSTOMER_SEGMENT,
c.CUSTOMER_STATUS,
c.CUSTOMER_STATUS_STARTED_DATE,
CASE
-- If status is still current, use today as end
WHEN c.CUSTOMER_STATUS_ENDED_DATE = DATE('9999-12-31')
THEN CURRENT_DATE()
ELSE c.CUSTOMER_STATUS_ENDED_DATE
END AS customer_status_ended_date,
c.CUSTOMER_REGISTRATION_DATE
FROM DAANA_DW.VIEW_CUSTOMER_HIST c
),
-- Step 2: Aggregate event-level metrics by date
customer_orders_daily AS (
SELECT
DATE(o.ORDER_PURCHASE_TS) AS snapshot_date,
o.CUSTOMER_KEY AS customer_id,
COUNT(DISTINCT o.ORDER_KEY) AS number_of_orders,
SUM(o.ORDER_AMOUNT) AS total_order_amount
FROM DAANA_DW.VIEW_ORDER o
GROUP BY DATE(o.ORDER_PURCHASE_TS), o.CUSTOMER_KEY
),
-- Step 3: Expand status ranges into daily rows
customer_status_expanded AS (
SELECT
snapshot_date,
customer_id,
CUSTOMER_SEGMENT,
CUSTOMER_STATUS,
DATE_DIFF(snapshot_date, CUSTOMER_STATUS_STARTED_DATE, DAY)
AS number_of_consecutive_status_days,
CASE
WHEN CUSTOMER_REGISTRATION_DATE = snapshot_date THEN 1
ELSE 0
END AS customer_registration_flag,
CASE
WHEN RANK() OVER (PARTITION BY customer_id, CUSTOMER_STATUS
ORDER BY snapshot_date) = 1
THEN 1
ELSE 0
END AS status_change_flag
FROM customer_status_periods
-- Key technique: Explode date range into individual rows
CROSS JOIN UNNEST(
GENERATE_DATE_ARRAY(
CUSTOMER_STATUS_STARTED_DATE,
customer_status_ended_date
)
) AS snapshot_date
)
-- Step 4: Combine dimensional state with aggregated metrics
SELECT
/* Primary Key */
cse.snapshot_date,
cse.customer_id,
/* Dimensional Attributes */
cse.CUSTOMER_SEGMENT,
cse.CUSTOMER_STATUS,
cse.number_of_consecutive_status_days,
/* Event Flags */
cse.customer_registration_flag,
cse.status_change_flag,
/* Metrics */
COALESCE(ord.number_of_orders, 0) AS number_of_orders,
COALESCE(ord.total_order_amount, 0) AS total_order_amount
FROM customer_status_expanded cse
LEFT JOIN customer_orders_daily ord
ON cse.customer_id = ord.customer_id
AND cse.snapshot_date = ord.snapshot_date
)
How it works:
- Get temporal ranges from VIEW_CUSTOMER_HIST (status periods)
- Aggregate event metrics by date (orders per day)
- Expand ranges into daily rows using GENERATE_DATE_ARRAY + CROSS JOIN UNNEST
- Join dimensional state with metrics to create final snapshot
Result grain: One row per customer per day with segment, status, flags, and order metrics.
Query pattern:
-- All premium customers on a specific date with their orders
SELECT
customer_id,
customer_segment,
customer_status,
number_of_orders,
total_order_amount
FROM datamart.fact_customer_daily_snapshot
WHERE snapshot_date = '2024-06-15'
AND customer_segment = 'premium'
Best suited for:
- Regular reporting cadence (daily, weekly, monthly)
- Trend analysis over time
- State-based aggregations ("how many active users per day")
Part 3: Query Guidance
Quick Reference
| Object | Query? | Purpose |
|---|---|---|
| VIEW_{ENTITY} | ✅ Yes | Current state - one row per entity |
| VIEW_{ENTITY}_HIST | ✅ Yes | Full history - point-in-time queries |
| VIEW_{ENTITY}_WITH_REL | ✅ Yes | Current state with relationships pre-joined |
| Your consumption layer | ✅ Yes | Dimensions/facts you build from above |
| {ENTITY}_FOCAL | ❌ Avoid | Internal - one row per surrogate key |
| {ENTITY}_IDFR | ❌ Avoid | Internal - business ID to surrogate mapping |
| {ENTITY}_DESC | ❌ Avoid | Internal - key-value attribute storage |
| {ENTITY1}_{ENTITY2}_X | ❌ Avoid | Internal - use VIEW_{ENTITY}_WITH_REL or build factless facts |
Why Views Over Physical Tables?
Daana's views provide:
- Transposition - Key-value pairs become familiar table columns
- Abstraction - Physical implementation can change without breaking queries
- Performance - Views are optimized for common query patterns
- Simplicity - No need to understand TYPE_KEY mapping or join logic
The principle: You shouldn't need to learn ensemble modeling to use Daana's output. The views hide that complexity.
Part 4: Physical Implementation (Optional Deep Dive)
You don't need to understand ensemble modeling to use Daana's views. But if you're curious about what's under the hood, here's how it works.
For each entity, Daana creates up to four physical tables following the Focal Framework pattern. These tables store data in a generic, flexible structure that the views transpose into user-friendly schemas.
Physical Tables Created
1. FOCAL Table: {ENTITY}_FOCAL
Purpose: Maintains exactly one row per entity instance, identified by the surrogate key. While the IDFR table may contain multiple rows for the same entity (one per business identifier), the FOCAL table consolidates these into a single authoritative row.
When you'll see it: Rarely queried directly - the views handle the complexity.
Structure:
CUSTOMER_FOCAL
├── CUSTOMER_KEY -- Primary surrogate key (unique per entity instance)
├── INST_KEY -- Batch identifier (which pipeline loaded this row)
├── INST_ROW_KEY -- Workflow identifier (which workflow created the metadata)
└── POPLN_TMSTP -- Population timestamp (when row was loaded)
2. IDFR Table: {ENTITY}_IDFR (Conditional)
Created when: Entity has multi_idfr type in model definition.
Purpose: Manages key integration by mapping multiple business identifiers to a single stable surrogate key. Real entities often have multiple IDs (internal ID, SSN, loyalty number, partner ID) - this table maps all of them to one CUSTOMER_KEY.
Why it matters: Enables the system to recognize the same entity regardless of which identifier is used in source systems.
Structure:
CUSTOMER_IDFR
├── CUSTOMER_IDFR -- Business identifier value (email, SSN, etc.)
├── EFF_TMSTP -- Effective timestamp (when became valid)
├── VER_TMSTP -- Version timestamp (for bi-temporal support)
├── ROW_ST -- Row status (Y=Active, N=Inactive)
├── CUSTOMER_KEY -- Links to FOCAL table
├── INST_KEY -- Batch identifier
├── INST_ROW_KEY -- Workflow identifier
└── POPLN_TMSTP -- Population timestamp
3. DESC Table: {ENTITY}_DESC
Purpose: Stores all descriptive attributes in a flexible, typed structure using generic value columns. Instead of creating fixed columns for each attribute, it uses TYPE_KEY to identify what data is stored in each row.
Why this design: Enables non-destructive development. Adding new attributes requires only new TYPE_KEY definitions and new rows - no schema changes, no ALTER TABLE statements.
How views use it: Views join on TYPE_KEY to transpose these key-value rows into familiar table columns.
Structure:
CUSTOMER_DESC
├── CUSTOMER_KEY -- Links to FOCAL table
├── TYPE_KEY -- Semantic key identifying which attribute this row stores
├── SEQ_NBR -- Sequence number (not used in Daana implementation)
├── EFF_TMSTP -- Effective timestamp
├── VER_TMSTP -- Version timestamp
├── ROW_ST -- Row status (Y=Active, N=Inactive)
├── STA_TMSTP -- Mappable column for timestamp data (start/from)
├── END_TMSTP -- Mappable column for timestamp data (end/to)
├── VAL_STR -- Mappable column for string data
├── VAL_NUM -- Mappable column for numeric data
├── UOM -- Unit of measure (classifies VAL_STR or VAL_NUM)
├── UOM_TYPE -- Unit of measure type
├── DATA_KEY -- Data key column (not used in Daana implementation)
├── INST_KEY -- Batch identifier
├── INST_ROW_KEY -- Workflow identifier
└── POPLN_TMSTP -- Population timestamp
Example: A single CUSTOMER_KEY might have multiple rows in DESC:
- One row with TYPE_KEY=101 storing FIRST_NAME in VAL_STR
- Another row with TYPE_KEY=102 storing EMAIL in VAL_STR
- Another row with TYPE_KEY=103 storing REGISTRATION_DATE in STA_TMSTP
The view joins these rows and pivots them into columns: FIRST_NAME, EMAIL, REGISTRATION_DATE.
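For the curious, an illustrative sketch of that transposition (the TYPE_KEY values are hypothetical, and the generated views already do this for you):
SELECT
  CUSTOMER_KEY,
  MAX(CASE WHEN TYPE_KEY = 101 THEN VAL_STR END) AS FIRST_NAME,
  MAX(CASE WHEN TYPE_KEY = 102 THEN VAL_STR END) AS EMAIL,
  MAX(CASE WHEN TYPE_KEY = 103 THEN STA_TMSTP END) AS REGISTRATION_DATE
FROM DAANA_DW.CUSTOMER_DESC
WHERE ROW_ST = 'Y'
GROUP BY CUSTOMER_KEY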
4. Relationship Table: {ENTITY1}_{ENTITY2}_X
Purpose: Captures temporal many-to-many relationships between entities. A single relationship table can store multiple relationship types using TYPE_KEY.
Example: ORDER_CUSTOMER_X might contain different relationship types:
- PLACED_BY (customer who placed the order)
- SHIPPED_TO (customer receiving shipment)
- RETURNED_BY (customer who returned the order)
Structure:
ORDER_CUSTOMER_X
├── ORDER_KEY -- Surrogate key for driving entity
├── CUSTOMER_KEY -- Surrogate key for related entity
├── TYPE_KEY -- Identifies relationship type
├── EFF_TMSTP -- Effective timestamp
├── VER_TMSTP -- Version timestamp
├── ROW_ST -- Row status
├── INST_KEY -- Batch identifier
├── INST_ROW_KEY -- Workflow identifier
└── POPLN_TMSTP -- Population timestamp
Understanding Key Columns
TYPE_KEY - The Semantic Bridge
What it is: A foreign key to the metadata model that identifies which business attribute (Atomic Context) is stored in this row. It's the bridge connecting physical data to logical meaning.
Why it matters: This enables views to transpose generic physical rows into meaningful business attributes. When you query VIEW_CUSTOMER, the view joins on TYPE_KEY to translate data into familiar column format.
In practice: You'll never query on TYPE_KEY directly - the views handle this for you. But understanding it helps when debugging or exploring the physical layer.
Three Timestamp Types
Daana tracks three types of timestamps for different purposes:
EFF_TMSTP (Effective Timestamp): Business time - when this version became valid in the real world. Used for point-in-time queries ("What was the customer's segment on 2024-06-15?").
VER_TMSTP (Version Timestamp): System time - when this version was created in the warehouse. Supports bi-temporal tracking (multiple versions at the same effective time). Used for audit trails and understanding when your knowledge of the data changed.
POPLN_TMSTP (Population Timestamp): Load time - when this row was physically inserted into the table. Used for operational monitoring and pipeline debugging.
Why three timestamps matter: They enable you to answer different questions:
- "What was true at this business date?" → EFF_TMSTP
- "What did we know at this system date?" → VER_TMSTP
- "When did we load this data?" → POPLN_TMSTP
Audit and Metadata Columns
INST_KEY (Instance Key):
Batch identifier indicating which pipeline execution loaded this row. Join to DAANA_METADATA.PROCINST_DESC to see the exact SQL that loaded this data - valuable for debugging.
INST_ROW_KEY (Instance Row Key): Workflow identifier indicating which workflow created the metadata for this row.
ROW_ST (Row Status): Active/Inactive indicator (Y=Active, N=Inactive). Combined with temporal columns, determines if the row is currently valid.
SEQ_NBR (Sequence Number): Sequence identifier (not used in Daana implementation).
DATA_KEY: Data key column (not used in Daana implementation).
UOM (Unit of Measure): Classifies what kind of data resides in VAL_STR or VAL_NUM (e.g., "USD", "EUR", "kg", "meters").
Operational Lineage
Every physical table includes INST_KEY, enabling you to trace which pipeline loaded any specific row:
-- Find the SQL that loaded a specific customer row
SELECT DISTINCT pd.VAL_STR
FROM DAANA_METADATA.PROCINST_DESC pd
INNER JOIN DAANA_DW.CUSTOMER_DESC cd
ON cd.INST_KEY = pd.PROCINST_KEY
WHERE cd.CUSTOMER_KEY = 'CUST_12345'
This operational lineage is invaluable for debugging and understanding data provenance.
Next Steps
- Explore the views: After running daana-cli execute, query SELECT * FROM DAANA_DW.VIEW_{ENTITY} to see your entities
- Build Type 1 dimensions: Use VIEW_{ENTITY} directly or with light transformation
- Build Type 2 dimensions: Apply the LEAD window pattern to VIEW_{ENTITY}_HIST
- Build facts: Start with transaction facts from entity views + measures
- Advanced patterns: Factless facts and periodic snapshots when needed
Related Documentation
- DMDL: Model Definition - How to define entities
- DMDL: Mapping - How to map source data to entities
- Commands - All available commands
- Focal Framework - Deep dive into the Focal Framework