Guides
Understanding Daana Output
Overview
When you run daana-cli execute, Daana generates a complete ensemble model implementation in your data warehouse. This documentation explains:
- What you'll query - The views Daana creates
- How to build your consumption layer - Patterns for dimensions and facts
- Physical implementation - What's under the hood (optional deep dive)
Understanding the Three-Layer Architecture
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Raw Ingestion (DAS - Data As System sees it) │
│ Your source systems, staging tables, raw data │
│ This layer can be thought of as a bronze- or raw layer │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Business Entities (DAB - Data As Business sees it) │ ← DAANA CREATES THIS LAYER
│ Conformed, historized entities in ensemble model pattern │
│ This layer can be thought of as a silver-, model- or │
│ intermediate layer │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Analytics Consumption (DAR - Data As Requirements) │ ← YOU BUILD THIS FROM DAANA'S OUTPUT
│ Dimensions, facts, marts tailored to analytics needs │
│ This layer can be thought of as a gold- or mart layer │
└─────────────────────────────────────────────────────────────┘
Daana's role: Generate and maintain Layer 2. Provide clean, queryable views that serve as building blocks for Layer 3.
Your role: Use Daana's views to build consumption-layer dimensions and facts for your analytics requirements.
How to Use This Guide
This documentation is designed for selective reading. Choose your path based on your immediate needs:
Getting started (15-30 minutes):
- Read Part 1 to understand the views Daana creates
- Skim Part 2 pattern descriptions to see what's possible
- Start building with Pattern 1 (Type 1 Dimensions)
Building your consumption layer (progressive approach):
- Patterns 1-2: Start here for dimensions (Type 1 and Type 2)
- Patterns 3-4: Read when you need fact tables
- Pattern 5: Read when you need periodic snapshots
- Each pattern is self-contained - read only what you need right now
Reference and deep dives:
- Part 3 (Query Guidance): Quick reference for what to query vs. avoid
- Part 4 (Physical Implementation): Optional - for troubleshooting or curiosity about what's under the hood
- Focal Framework: Deep dive for architects and framework evaluators
Time-based reading paths:
- 30 minutes: Part 1 + Pattern 1 → start building Type 1 dimensions
- 2 hours: Part 1 + all of Part 2 → comprehensive understanding of consumption patterns
- Deep dive: Add Part 4 + Focal Framework Guide → full architectural understanding
By role:
- Data/Analytics Engineer building consumption layer: Part 1-2, reference Part 3 as needed
- Troubleshooting/debugging: Part 1, Part 3, skim Part 4 for column explanations
- Architect evaluating Daana: All parts + Focal Framework Guide
- BI analyst exploring data: Part 1, Part 3
Part 1: What You'll Query - The Views
For each entity defined in your model.yaml (e.g., CUSTOMER), Daana creates three views in the daana_dw schema. These views transpose the physical key-value structure into user-friendly table schemas.
Note on examples: The attribute names used in examples throughout this document (first_name, email, customer_segment, customer_status_started_date, etc.) are illustrative. The actual attributes in your views depend on what you define in your model. The patterns and techniques remain the same regardless of your specific attribute names.
SQL dialect and identifier casing. Patterns 1 and 2 (Type 1 and Type 2 dimensions) are written in dialect-neutral SQL that runs on PostgreSQL out of the tutorial box. Patterns 3, 4, and 5 use BigQuery-specific syntax (TIMESTAMP_SUB, GENERATE_DATE_ARRAY, CROSS JOIN UNNEST, QUALIFY); each is called out in its section with a Postgres adaptation note. Identifiers are shown lowercase to match what \d daana_dw.view_<entity> reports on PostgreSQL; on Snowflake or Oracle they fold to uppercase, but Daana itself is case-insensitive on the wire.
view_{entity} - Current State (Type 1)
Purpose: Current state of each entity. One row per entity key, showing the most recent attribute values.
When to use:
- Dashboards showing current state
- Operational reports
- "What is the customer's current email?"
Structure:
view_customer
├── customer_key
├── eff_tmstp -- When this version became effective (business time)
├── first_name
├── last_name
├── email
├── customer_segment
└── ... (all attributes from your model)
Example query:
SELECT
c.customer_key,
c.first_name,
c.last_name,
c.email,
c.customer_segment
FROM daana_dw.view_customer c
WHERE c.customer_segment = 'premium'
view_{entity}_hist - Full History (Type 2)
Purpose: Complete change history for each entity. Multiple rows per entity key, one row per change.
When to use:
- Historical trend analysis
- Point-in-time reporting
- "What was the customer's email on 2024-06-15?"
- Auditing and compliance
Structure:
view_customer_hist
├── customer_key
├── eff_tmstp -- When this version became effective (business time)
├── first_name
├── last_name
├── email
├── customer_segment
└── ... (all attributes from your model)
Example query (PostgreSQL):
-- Find customer state at a specific point in time
SELECT
c.customer_key,
c.first_name,
c.email,
c.customer_segment
FROM daana_dw.view_customer_hist c
WHERE c.customer_key = 'CUST_12345'
AND c.eff_tmstp <= TIMESTAMP '2024-06-15 00:00:00'
ORDER BY c.eff_tmstp DESC
LIMIT 1;
For point-in-time queries across all customers at once, prefer the f_customer(p_eff_tmstp) function shown below; it wraps the same predicate.
To use as a proper Type 2 dimension with valid_from/valid_to columns, see Pattern 2 below.
Value-run compaction
view_{entity}_hist returns one row per value-run. The view returns a single row for consecutive observations carrying the same full attribute tuple, anchored at the earliest known eff_tmstp. If an entity is observed five times with customer_email = 'a@example.com' and then changes to b@example.com, the view returns two rows: one per distinct email value.
For BI tools that compute aggregates, this means COUNT(*), AVG(val_num), and SUM(val_num) over the view return one row per distinct state, not one per ingest event. Time-weighted aggregates also remain correct because each row anchors at the earliest eff_tmstp of its run.
For every observation as recorded, including same-value rows from repeated ingest events, query the underlying <entity>_desc table:
SELECT customer_key, type_key, eff_tmstp, val_str, val_num, row_st
FROM daana_dw.customer_desc
WHERE customer_key = 'CUST_12345'
ORDER BY type_key, eff_tmstp;
The view is the consumer access path. The customer_desc table is the audit-trail access path.
view_{entity}_with_rel - Current State with Relationships
Purpose: Current state of entity with all relationships already joined.
When to use:
- Simplified querying when you need entity + related entities
- Exploratory analysis across related entities
Structure:
view_customer_with_rel
├── customer_key
├── first_name
├── last_name
├── email
├── ... (all customer attributes)
├── order_key -- Related ORDER entities
├── ... (all order attributes)
├── account_key -- Related ACCOUNT entities
├── ... (all account attributes)
└── ... (all relationship keys and attributes from your model related to customer)
Example query:
SELECT
c.customer_key,
c.first_name,
c.order_key,
c.account_key
FROM daana_dw.view_customer_with_rel c
WHERE c.customer_segment = 'premium'
The join key lives on the target side of the relationship. Daana materializes related-entity columns on the target of each relationship, not the source. For an
ORDER -> CUSTOMERrelationship (sourceORDER, targetCUSTOMER), the foreign keycustomer_keyand the customer's columns appear onview_customer_with_rel, not onview_order_with_rel. The order side carries only ORDER's own attributes. Rule of thumb: if the foreign key is missing where you started, look at the_with_relview of the entity on the other end of the relationship. Run\d daana_dw.view_<entity>_with_relto see exactly which columns each side carries.
Internal EAV view: v_{entity}_desc
Purpose: A debugging view over the raw {entity}_desc table that joins to atom_contx_desc to surface a human-readable name column alongside val_str, val_num, uom, and the temporal columns.
When to use: Only when you want to inspect the EAV layout, for example to confirm that a grouped attribute ends up on a single physical row carrying both val_num and uom.
Note on the
namecolumn. Thenameisatom_contx_desc.val_str. By convention that value carries the entity prefix (for exampleORDER_order_id,ORDER_order_status). The user-facingview_{entity}family is a separate view layer that exposes each attribute as its own column named by the attribute'snamefield from the model YAML. The two view families show different naming layers from the metadata model; one is not derived from the other.
Time-travel function: f_{entity}(p_eff_tmstp)
Purpose: Returns each entity's full state as of the given timestamp, in the same column shape as view_{entity}.
Default argument: p_eff_tmstp defaults to the sentinel '9999-12-31', so calling f_{entity}() with no argument returns the latest snapshot. This is the recommended one-liner for current-state queries when you want the function form (e.g., for a parameterised report).
Example:
SELECT customer_key, customer_name, customer_city, customer_status
FROM daana_dw.f_customer('2024-11-27');
-- No argument => latest state, equivalent to selecting from view_customer
SELECT customer_key, customer_status
FROM daana_dw.f_customer()
WHERE customer_key = '2';
This is the same function the tutorial uses in Chapter 1, Query 3 and Chapter 2, Time-travel queries. For BI tools that need explicit valid_from/valid_to ranges, build a Type 2 dimension on top of view_{entity}_hist (Pattern 2 below).
Relationship Tables: {entity1}_{entity2}_x
Purpose: Track many-to-many relationships between entities over time.
Naming convention: The _x suffix denotes a relationship table.
Structure:
order_customer_x
├── order_key -- Links to ORDER entity
├── customer_key -- Links to CUSTOMER entity
├── type_key -- Relationship type (if multiple types exist)
├── eff_tmstp -- When this relationship became effective
├── ver_tmstp -- When this relationship was recorded
├── row_st -- Row status
└── popln_tmstp -- When this data was loaded
What you'll see in the relationship table over time
The relationship table records every observed set of focal02 members for one focal01_key at each eff_tmstp, scoped to one type_key. One row exists per (focal01, focal02) pair in the set at that timestamp, with row_st = 'Y' at the time the set was observed.
Suppose PRODUCT_001 belongs to three product groups at first load:
focal01_key | focal02_key | type_key | row_st | eff_tmstp
PRODUCT_001 | PRODUCT_GROUP_1 | 65 | Y | 2020-01-01 09:15:42.123456
PRODUCT_001 | PRODUCT_GROUP_2 | 65 | Y | 2020-01-01 09:15:42.123456
PRODUCT_001 | PRODUCT_GROUP_3 | 65 | Y | 2020-01-01 09:15:42.123456
The source later emits a new group membership for PRODUCT_001 at 2020-01-03 14:23:17.123456 that drops PRODUCT_GROUP_3 and adds PRODUCT_GROUP_4. Daana writes one batch of rows at the new timestamp:
PRODUCT_001 | PRODUCT_GROUP_1 | 65 | Y | 2020-01-03 14:23:17.123456
PRODUCT_001 | PRODUCT_GROUP_2 | 65 | Y | 2020-01-03 14:23:17.123456
PRODUCT_001 | PRODUCT_GROUP_4 | 65 | Y | 2020-01-03 14:23:17.123456
The relationship table now contains six rows for PRODUCT_001: the original three at the 2020-01-01 timestamp, and the new three at the 2020-01-03 timestamp. All carry row_st = 'Y'. The reader writes no closer rows for set transitions. Set membership at any past point is reconstructable from these rows by selecting the largest eff_tmstp not exceeding the query time per (focal01_key, type_key).
Querying the active set
The view view_{focal01}_has_{focal02} returns the currently-active set per focal01. It wraps the function f_{focal01}_{focal02}_x() which applies the following query against the _x table:
SELECT focal01_key, focal02_key, type_key, eff_tmstp, ver_tmstp
FROM (
SELECT *,
RANK() OVER (
PARTITION BY focal01_key, type_key
ORDER BY eff_tmstp DESC, ver_tmstp DESC
) AS rnk
FROM order_customer_x
) ranked
WHERE rnk = 1 AND row_st = 'Y';
The RANK() partitions by (focal01_key, type_key). Within each partition it returns the rows tied on the largest eff_tmstp (with ver_tmstp as a tie-breaker). The outer WHERE rnk = 1 AND row_st = 'Y' keeps only those whose latest observation is active.
For PRODUCT_001 post-update, the view returns {PRODUCT_GROUP_1, PRODUCT_GROUP_2, PRODUCT_GROUP_4} (the 2020-01-03 set). The 2020-01-01 rows sit at rnk = 2 and are not returned.
For point-in-time queries, call the function directly: f_product_product_group_x(TIMESTAMP '2020-01-01 12:00:00') returns the membership as known at that time (the 2020-01-01 set).
Source-system prerequisite: complete groups at one timestamp
The set-change detection above depends on the source system emitting the complete group for (focal01_key, type_key) in one update, with all member rows sharing a single database timestamp. Daana detects a set change by comparing the fingerprint of all focal02_key values at one eff_tmstp against the fingerprint at the previous eff_tmstp. The reader writes the new set's rows whenever the fingerprint differs. Source updates that emit the set one pair at a time, each with its own timestamp, defeat the fingerprint and produce one apparent set change per row.
Two cases follow:
- The source emits the full set in one update. All member rows share one source timestamp. Daana sees the change and writes the closer batch plus the new batch correctly. This is the supported pattern.
- The source emits the set one pair at a time, each with its own timestamp. Daana compares each single-row "set" against the previous single-row set, so the fingerprint always differs by one element. The result is insert-only behavior with no set-change detection, and historical sets cannot be reconstructed accurately.
Source systems that produce grouped state changes at a single transactional timestamp satisfy this contract. A Kafka event carrying the full membership tuple as one payload satisfies it; a separate Kafka event per pair, each with its own commit timestamp, does not.
Snapshot and collection sources
The relationship-table prerequisite is that each source update carry the complete set of focal02 members for one focal01 at a single eff_tmstp. Two source shapes meet this prerequisite directly:
- Event-shaped sources. A Kafka payload carrying the full membership tuple at the moment of state change. Every payload is one update.
- Snapshot-shaped sources. A table with one row per
(parent, member, snapshot_timestamp). Every snapshot contains every currently-active pair. The complete set at anysnapshot_timestampis the rows that share that timestamp.
Daana's reader takes either shape as input without upstream pre-aggregation. Two deduplication steps run at execution time:
- Row-level deduplication. The reader applies
SELECT DISTINCTto the source read. Exact-duplicate input rows are eliminated before fingerprint computation. - Set-level change detection. The reader computes the fingerprint per
(focal01_idfr, eff_tmstp)group viaSTRING_AGG(focal02_idfr ORDER BY focal02_idfr), compares each timestamp's fingerprint to the previous timestamp's fingerprint perfocal01viaLAG, and emits rows for the new set only at change points. Adjacent snapshots with identical fingerprints produce no relationship-table rows.
Snapshot sources benefit from both: identical rows within a snapshot are deduplicated at step 1, and unchanged snapshots between change points are dropped at step 2.
The source shape that needs upstream work is collection-shaped: a table where each row carries the full set as a structured value (a JSON array, a Postgres array column, a comma-separated string). The reader compares fingerprints of row-level values and does not parse arrays. Normalize collection-shaped sources to row shape before the data reaches Daana.
-- Source: post_staging.product_groups_snapshot
-- snapshot_ts | product_id | product_groups
-- 2024-01-01 | PROD_001 | ["G1","G2","G3"]
-- 2024-01-02 | PROD_001 | ["G1","G2","G3"] (unchanged)
-- 2024-01-03 | PROD_001 | ["G1","G2","G4"] (changed)
CREATE VIEW post_staging.product_group_memberships AS
SELECT snapshot_ts, product_id, group_id
FROM post_staging.product_groups_snapshot,
UNNEST(product_groups) AS group_id;
The view emits three rows at each snapshot_ts, one per group, in the shape Daana's reader expects. Daana reads all snapshot rows but emits no relationship-table rows for the unchanged 2024-01-02 snapshot.
Failure mode: first-seen union of pairs
A common upstream pattern accumulates relationship pairs as they are observed and writes each (parent, member) row at the timestamp it was first seen. The resulting table looks correct row-by-row but violates the relationship-table prerequisite for M:N relationships.
Where first-seen is valid. In a 1:M relationship, each parent has at most one active member at any time. A new (parent, member) row at a new eff_tmstp is by definition a state change: the previous member is no longer active, the new one is. First-seen and complete-set are the same shape when the set size is one.
Where first-seen breaks. M:N relationships allow a parent to have multiple concurrently-active members. The reader treats each (focal01_idfr, eff_tmstp) group as the full membership set for that parent at that timestamp. A first-seen feed carries one new pair per timestamp, not the complete set.
Consequence. The fingerprint at every observed timestamp differs from the previous fingerprint by exactly one element. The reader interprets each row as a set change. The relationship table fills with spurious one-row "sets" at each successive eff_tmstp. view_{focal01}_has_{focal02} returns one active member per parent at any given time, never the full membership.
Recovery. Daana cannot reconstruct the complete set from a first-seen feed. The reader sees one pair per timestamp and has no signal distinguishing "this is the new member added to an existing set" from "the set is now exactly this one member." Recovery requires adding group-membership state upstream of Daana so the complete set at each change point is recoverable.
The most direct option is to materialize a snapshot table downstream of the first-seen feed: one row per (parent, member, snapshot_ts), with every active pair present at every snapshot. Daana reads the snapshot. The complete set at any snapshot_ts is the rows sharing that timestamp.
Whichever shape the state takes (snapshot table, stateful stream processor that re-emits the full set on each change, materialized view over an append-only history), the constraint is the same: something upstream of Daana must know what the current group is before Daana can detect when it changes.
Empty sets and removal semantics
The relationship-table reader detects set changes by comparing fingerprints across eff_tmstp values within the source read. The comparison is timestamp-against-timestamp, not source-against-target. A focal01 that previously had members and is no longer mentioned by the source produces no rows at the silent timestamp. What happens to the active set going forward depends on the ingestion strategy:
- INCREMENTAL. The strategy does not infer removal from absence. The last observed set remains active indefinitely in the relationship table. To remove members, the source must emit an explicit empty-set update (zero member rows for that
focal01) at a neweff_tmstp. The reader emits nothing for an empty observation; the set-oriented view continues to return the last non-empty set. - FULL_LOG. The strategy compares every batch's source set against target state. If a
(focal01, focal02)pair that previously existed in target is absent from the current batch's set for thatfocal01, the truedelta's soft-delete branch writes arow_st = 'N'row for the pair at the batch'seff_tmstp. The view's outerWHERE row_st = 'Y'filter then excludes the closed pair fromview_{focal01}_has_{focal02}going forward. Under FULL_LOG, removal is inferred from absence.
The most reliable cross-strategy pattern is for the source to emit the complete set at every change timestamp, including empty sets. INCREMENTAL requires the explicit empty-set emission to record removal. FULL_LOG works correctly with implicit absence but also handles explicit empty sets correctly.
Sources whose business semantic is "the parent no longer has members" should materialize that statement as an emission rather than rely on silence. The fingerprint of an empty set is the empty string, and the reader treats a fingerprint transition from non-empty to empty as a real set change.
Reading a populated relationship table
A relationship table for an actively-changing M:N relationship accumulates rows over time. Each set change writes one row per member of the new set at the change timestamp. A focal01 with 20 members and 100 change events over a year produces roughly 2000 rows. The table is correct; the row count is a function of activity, not noise.
When inspecting the table directly, the patterns below distinguish expected output from upstream problems.
Expected: only row_st = 'Y' rows from set transitions. A set change writes new Y rows at the new eff_tmstp. The previous set's Y rows from earlier eff_tmstp values remain in the table unchanged. The reader writes no closer (row_st = 'N') rows for set transitions. The set-oriented view uses RANK() over the partition (focal01_key, type_key) ordered by eff_tmstp DESC, ver_tmstp DESC, and the outer WHERE rnk = 1 AND row_st = 'Y' filter handles which observation is the active set.
Expected: under FULL_LOG, isolated row_st = 'N' rows for pairs that disappeared from source. A previously-active pair that is no longer in the source's current batch produces a closer row at the batch eff_tmstp from the FULL_LOG truedelta's soft-delete branch. The view excludes that pair from the active set going forward. These N rows do not pair with later Y rows for the same (focal01, focal02) at the same eff_tmstp; they are individual closures, not part of a set-transition pair.
Expected: under INCREMENTAL, no row_st = 'N' rows. INCREMENTAL is forward-only and produces no N rows under any normal source pattern. If the source explicitly emits an empty set for a focal01 at a new eff_tmstp, the reader produces no rows at all for that observation (the view's behavior on the now-empty active set falls through to the previous non-empty eff_tmstp).
Investigation signal: more distinct eff_tmstp values per focal01_idfr than known source updates. Count distinct eff_tmstp values per focal01_idfr. If the count is roughly equal to the source update count, the reader is detecting real changes. If the count is dramatically higher, suspect the first-seen anti-pattern: every individual pair appears as its own change event (see "Failure mode: first-seen union of pairs" above).
Note: legacy row_st = 'N' rows from pre-DAANA-131 deployments. Customers who deployed the relationship reader before the DAANA-131 release may have N closer rows in their _x table from set transitions, recorded under the previous reader's emit-closers-on-set-change behavior. Those rows are correctly eclipsed by the set-oriented view's outer WHERE row_st = 'Y' filter and produce no consumer-visible incorrectness. They can be left in place or cleaned up via daana-cli execute --full-refresh --entity <FOCAL01_ENTITY> --yes.
For most consumption work, query view_{focal01}_has_{focal02} (or call f_{focal01}_{focal02}_x(p_eff_tmstp) for point-in-time queries) and filter by current time rather than inspecting the relationship table directly. The view applies the set-oriented filter and returns one row per active member. Direct inspection of the relationship table is for debugging the source feed or auditing history, not for downstream analytics.
Recovering from a misshapen source
After fixing a source-shape problem (the first-seen anti-pattern reshaped to emit complete sets, a collection-shaped feed normalized to row shape, source updates re-issued at a single eff_tmstp per change), the relationship table does not self-correct. Daana writes correct rows from the moment the corrected source is deployed forward; rows already in the table reflect the historical (incorrect) interpretation.
To clear the historical rows and reload from the corrected source, use Full refresh scoped to the affected entity:
daana-cli execute --full-refresh --entity <FOCAL01_ENTITY> --yes
The operation clears the entity's relationship-table rows along with its data and re-executes the pipeline against the current source. After the corrected source has been deployed, a single full refresh reproduces the relationship table from a clean state.
The three full-refresh scopes (model, entity, attribute) apply to relationship-table recovery the same way they apply to entity-attribute recovery. Choose the narrowest scope that contains the issue: one entity if only that entity's relationships are affected, the whole model if multiple relationship tables were produced by the same misshapen feed.
Full refresh does not fix gaps in what the source emits. Under INCREMENTAL, if the source never sends an empty-set signal, "set is now empty" remains undetectable before and after a full refresh. The fix in that case is at the source: add the emission, then refresh. Under FULL_LOG, source-side absence is enough — the truedelta closes pairs that disappear from a subsequent batch.
When to use: Prefer view_{focal01}_has_{focal02} (which calls f_{focal01}_{focal02}_x()) for normal querying. The view applies the set-oriented filter and returns one row per active member. Relationship tables are useful in advanced consumption patterns (see Pattern 3: Factless Facts) and when joining Daana's output to external tables that don't live in the model. The view_{entity}_with_rel view is the joined "entity plus its relationships as columns" shape; that view shows entity attributes alongside the active-set members.
Part 2: Building Your Consumption Layer
Why Layer 3 Exists
Different consumption needs require different structures. A BI dashboard optimized for speed needs different structure than a data science feature store or a CRM export.
Layer 2 (Daana) handles:
- Data conformance (consistent business keys)
- Historization (tracking changes over time)
- Data quality (validation and testing)
- Lineage (what came from where)
Layer 3 (you) optimizes for:
- Specific use cases (BI, ML, exports)
- Query performance (denormalization, aggregation)
- User experience (familiar schemas, simple joins)
Be Pragmatic - The Heavy Lifting is Done
Daana has already done the hard governance work: your data is conformed, historized, well-defined, and quality-checked. Now be pragmatic.
Need a denormalized wide table for a dashboard? Build it. Need a different grain for analysis? Do it. Need a flat file export for a CRM system? Serve it.
The point of Layer 3 is to optimize for consumption without worrying about breaking the source of truth. Layer 2 remains clean and consistent while Layer 3 adapts to how your organization actually uses data.
Common Consumption Patterns
The following patterns show how to build Layer 3 structures from Daana's Layer 2 views:
- Type 1 Dimensions - Use
view_{entity}directly - Type 2 Dimensions - Add
valid_from/valid_totoview_{entity}_hist - Factless Facts (Twine of Keys) - Weave relationship timelines together
- Transaction Facts - Combine entity views with measures
- Periodic Snapshot Facts - Daily/weekly/monthly entity state
Pattern 1: Type 1 Dimensions (Current State)
What it is: Dimension showing only current state. Updates overwrite previous values.
When to use:
- Real-time dashboards
- Operational reporting
- Analyses that don't need historical context
How to build:
-- Option 1: Use Daana's view directly
CREATE VIEW datamart.dim_customer AS
SELECT
customer_key AS customer_id,
first_name,
last_name,
email,
customer_segment,
eff_tmstp AS last_updated
FROM daana_dw.view_customer;
That's it. Daana's view_{entity} views are already Type 1 dimensions. Note that view_customer only carries the current snapshot, so eff_tmstp is the timestamp of the latest version; if you need a true "last changed" stamp across attributes, derive it from view_customer_hist.
Pattern 2: Type 2 Dimensions (Full History)
What it is: Dimension tracking complete change history with valid_from/valid_to date ranges and a current version flag.
When to use:
- Historical trend analysis
- Point-in-time reporting
- Regulatory compliance and auditing
- "What did the data look like on X date?"
The pattern: Add valid_from and valid_to columns to view_{entity}_hist using window functions.
How to build (PostgreSQL):
CREATE OR REPLACE VIEW datamart.dim_customer AS
SELECT
customer_key AS customer_id,
-- Type 2 SCD validity range
eff_tmstp AS valid_from,
COALESCE(
LEAD(eff_tmstp) OVER (PARTITION BY customer_key ORDER BY eff_tmstp) - INTERVAL '1 millisecond',
TIMESTAMP '9999-12-31 23:59:59'
) AS valid_to,
CASE
WHEN LEAD(eff_tmstp) OVER (PARTITION BY customer_key ORDER BY eff_tmstp) IS NULL
THEN TRUE ELSE FALSE
END AS is_current,
-- Business attributes
first_name,
last_name,
email,
customer_segment,
customer_status
FROM daana_dw.view_customer_hist;
BigQuery variant: replace the LEAD(...) - INTERVAL '1 millisecond' with LEAD(TIMESTAMP_SUB(eff_tmstp, INTERVAL 1 MILLISECOND), 1, TIMESTAMP('9999-12-31 23:59:59')) OVER (...). The semantics are identical.
How it works:
- valid_from: effective timestamp when this version became active.
- valid_to: look ahead to the next version's effective timestamp minus one millisecond (using
LEAD). 9999-12-31 23:59:59: default for the current version, "valid until further notice".
This pattern ensures non-overlapping validity ranges. When the next version starts at timestamp T, the current version ends at T - 1 millisecond, preventing any ambiguity in point-in-time queries.
Querying for point-in-time state:
-- Get customer state on 2024-06-15
SELECT
customer_id,
customer_segment,
customer_status
FROM datamart.dim_customer
WHERE customer_id = 'CUST_12345'
AND TIMESTAMP '2024-06-15' BETWEEN valid_from AND valid_to;
Future Daana versions will generate these Type 2 dimensions automatically.
Applying this pattern to other entities: The same LEAD window technique works for any entity. To create dim_product, dim_order, or any other Type 2 dimension, apply this pattern to the respective view_{entity}_hist view.
Pattern 3: Factless Facts (Twine of Keys)
What it is: A fact table that captures relationships or events without numeric measures. Tracks which dimensions were related at which points in time.
When to use:
- Tracking relationships that change over time
- Event logging (who did what when)
- Relationship history when an entity relates to multiple other entities that can change independently
When NOT to use:
- If you only have one relationship per entity (e.g., ORDER always has exactly one CUSTOMER), a simple JOIN suffices
The challenge: An ORDER might relate to different CUSTOMER and PRODUCT over time (reassignments, corrections). Each relationship can change independently. How do you weave these timelines together?
The "twine of keys" pattern: Union all relationship tables, track each timeline separately, then pivot back to wide format.
Dialect: the example below uses BigQuery syntax (
QUALIFY,RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWover aMAX(...) OVER ..., plusCAST(NULL AS STRING)). On PostgreSQL, replaceQUALIFY ... = 1with aWHERE rn = 1filter on aROW_NUMBER()CTE and useCAST(NULL AS varchar)for the typedNULLs.
How to build (BigQuery):
CREATE OR REPLACE TABLE datamart.fact_order_relationships AS (
-- Step 1: Union all relationships into tall format
WITH twine AS (
-- ORDER → CUSTOMER
SELECT
ORDER_KEY,
CUSTOMER_KEY,
CAST(NULL AS STRING) AS PRODUCT_KEY,
EFF_TMSTP,
'ORDER_CUSTOMER' AS rel_type
FROM DAANA_DW.ORDER_CUSTOMER_X
UNION ALL
-- ORDER → PRODUCT
SELECT
ORDER_KEY,
CAST(NULL AS STRING) AS CUSTOMER_KEY,
PRODUCT_KEY,
EFF_TMSTP,
'ORDER_PRODUCT' AS rel_type
FROM DAANA_DW.ORDER_PRODUCT_X
),
-- Step 2: Track the most recent effective timestamp for each relationship type
in_effect AS (
SELECT
ORDER_KEY,
EFF_TMSTP,
rel_type,
CUSTOMER_KEY,
PRODUCT_KEY,
-- Carry forward the last known effective timestamp for each relationship
MAX(CASE WHEN rel_type = 'ORDER_CUSTOMER' THEN EFF_TMSTP END)
OVER (PARTITION BY ORDER_KEY ORDER BY EFF_TMSTP
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS eff_customer,
MAX(CASE WHEN rel_type = 'ORDER_PRODUCT' THEN EFF_TMSTP END)
OVER (PARTITION BY ORDER_KEY ORDER BY EFF_TMSTP
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS eff_product
FROM twine
QUALIFY RANK() OVER (PARTITION BY ORDER_KEY, EFF_TMSTP ORDER BY EFF_TMSTP DESC) = 1
),
-- Step 3: Separate CTEs for each relationship type
cte_customer AS (
SELECT ORDER_KEY, EFF_TMSTP AS eff_customer_ts, CUSTOMER_KEY
FROM in_effect WHERE rel_type = 'ORDER_CUSTOMER'
),
cte_product AS (
SELECT ORDER_KEY, EFF_TMSTP AS eff_product_ts, PRODUCT_KEY
FROM in_effect WHERE rel_type = 'ORDER_PRODUCT'
)
-- Step 4: Join everything back together (pivot to wide format)
SELECT DISTINCT
ie.ORDER_KEY,
ie.EFF_TMSTP,
cust.CUSTOMER_KEY,
prod.PRODUCT_KEY
FROM in_effect ie
LEFT JOIN cte_customer cust
ON ie.ORDER_KEY = cust.ORDER_KEY
AND ie.eff_customer = cust.eff_customer_ts
LEFT JOIN cte_product prod
ON ie.ORDER_KEY = prod.ORDER_KEY
AND ie.eff_product = prod.eff_product_ts
)
How it works:
- Union relationships into tall format - Each relationship type becomes a row with sparse columns
- Track timelines separately - Window function carries forward last known EFF_TMSTP for each relationship
- Pivot to wide format - Join CTEs to reconstruct one row per ORDER_KEY + EFF_TMSTP with all dimension keys
Result grain: One row per ORDER_KEY per point in time when any relationship changed.
Why "twine of keys"? Like weaving threads together - each relationship has its own timeline, woven into one coherent history showing "what was in effect when."
Future Daana versions will generate these factless facts automatically.
Pattern 4: Transaction Facts
What it is: Fact table with one row per business event or transaction, containing dimension keys, numeric measures, and calculated metrics that span multiple entities.
When to use:
- Order analysis requiring customer/product context at time of order
- Metrics that combine attributes from multiple entities
- "What was the customer's segment when they placed this order?"
- "What was the product cost at time of purchase?" (for margin calculation)
The foundation: Transaction facts build on factless facts (Pattern 3). The factless fact gives you dimension keys at each point in time. Now join to Type 2 dimensions to get their attributes as they were at that moment.
Dialect: the example below assumes the BigQuery factless-fact CTE from Pattern 3. On PostgreSQL, build the equivalent factless fact first (using
LATERALjoins or window functions instead ofQUALIFY) and then run this same join shape; only the dialect of the upstream factless-fact CTE changes.
For the simpler one-relationship case (e.g., the tutorial's ORDER -> CUSTOMER), Tutorial Chapter 7 walks through a fact_orders build that joins view_order to view_customer_with_rel (target side, where customer_key lives) and to dim_customer with a BETWEEN valid_from AND valid_to predicate. Use that pattern when there is no need for a separate factless-fact step.
How to build (BigQuery):
CREATE TABLE datamart.fact_orders AS
SELECT
/* Grain: One row per order */
ff.ORDER_KEY AS order_id,
ff.EFF_TMSTP AS order_timestamp,
/* Dimension Keys */
ff.CUSTOMER_KEY AS customer_id,
ff.PRODUCT_KEY AS product_id,
/* Metrics from ORDER entity */
o.ORDER_DATE,
o.ORDER_QUANTITY,
o.ORDER_AMOUNT,
/* Calculated measures spanning entities */
-- Get product cost at point in time for margin calculation
o.ORDER_AMOUNT - (o.ORDER_QUANTITY * p.PRODUCT_COST) AS order_margin,
-- Calculate loyalty points based on customer segment at time of order
CASE
WHEN c.CUSTOMER_SEGMENT = 'premium' THEN o.ORDER_AMOUNT * 0.1
ELSE 0
END AS loyalty_points_earned
FROM datamart.fact_order_relationships ff -- Start with factless fact!
-- Get ORDER attributes (these don't change over time for a given order)
JOIN DAANA_DW.VIEW_ORDER o
ON ff.ORDER_KEY = o.ORDER_KEY
-- Point-in-time join to CUSTOMER Type 2 dimension
-- Join to get customer state at time of order for calculations
JOIN datamart.dim_customer c
ON ff.CUSTOMER_KEY = c.customer_id
AND ff.EFF_TMSTP BETWEEN c.valid_from AND c.valid_to
-- Point-in-time join to PRODUCT Type 2 dimension
-- Join to get product cost at time of order for margin calculation
JOIN datamart.dim_product p
ON ff.PRODUCT_KEY = p.product_id
AND ff.EFF_TMSTP BETWEEN p.valid_from AND p.valid_to
This fact table follows Kimball methodology by storing only keys, timestamps, and numeric measures. Dimensional attributes (like customer segment, product category) are accessed through joins to dimension tables rather than denormalized into the fact table. This keeps the fact table lean and flexible.
Note on BI tool compatibility: Some BI tools struggle with joins that use both a key and a timestamp range (BETWEEN valid_from AND valid_to). If you encounter performance or compatibility issues, consider creating a surrogate version key in your dimension tables (e.g., customer_version_key as a concatenation of customer_key and valid_from timestamp) to simplify joins to a single equality condition.
How it works:
- Start with factless fact - Gives you ORDER_KEY + EFF_TMSTP + all related dimension keys
- Join to ORDER view - Get order-level measures (quantity, amount)
- Point-in-time join to Type 2 dimensions - Get customer/product attributes as they were at EFF_TMSTP
- Calculate cross-entity metrics - Order margin, customer lifetime value, segment-based calculations
Why point-in-time joins matter:
-- Customer segment might have changed since order was placed
Customer 'CUST_123' timeline:
2024-01-01 to 2024-06-30: segment = 'standard'
2024-07-01 to 9999-12-31: segment = 'premium'
Order placed on 2024-05-15:
- Point-in-time join: customer_segment_at_order = 'standard' ✓
- Current state join: customer_segment_at_order = 'premium' ✗ (wrong!)
The canonical Daana temporal join pattern: When joining to Type 2 dimensions, always use timestamp BETWEEN valid_from AND valid_to. This pattern appears throughout Daana consumption layers and ensures you get the dimension state as it was at the relevant point in time.
Best practices:
- Build factless fact first (Pattern 3), then add measures
- Use point-in-time joins (
BETWEEN valid_from AND valid_to) for Type 2 dimensions - Keep measures atomic when possible (store quantity and price separately)
- Calculate derived metrics (margin, points) in the fact table for query performance
Pattern 5: Periodic Snapshot Facts
What it is: Fact table with one row per entity per time period (daily, weekly, monthly), capturing state and metrics at regular intervals.
When to use:
- Daily customer status snapshots
- Weekly inventory levels
- Monthly account balances
- "What were ALL customers' states on 2024-06-15?"
The pattern: Expand temporal ranges into individual time period rows using date arrays, then join aggregated metrics.
Dialect: the example below uses BigQuery's
GENERATE_DATE_ARRAYandCROSS JOIN UNNEST. The PostgreSQL equivalent isgenerate_series(start_date, end_date, INTERVAL '1 day')joined withCROSS JOIN LATERAL, returning the same one-row-per-day shape; everything downstream of the expansion CTE is portable.
How to build (BigQuery):
CREATE OR REPLACE TABLE datamart.fact_customer_daily_snapshot AS (
-- Step 1: Get customer status ranges from historized view
WITH customer_status_periods AS (
SELECT
c.CUSTOMER_KEY AS customer_id,
c.CUSTOMER_SEGMENT,
c.CUSTOMER_STATUS,
c.CUSTOMER_STATUS_STARTED_DATE,
CASE
-- If status is still current, use today as end
WHEN c.CUSTOMER_STATUS_ENDED_DATE = DATE('9999-12-31')
THEN CURRENT_DATE()
ELSE c.CUSTOMER_STATUS_ENDED_DATE
END AS customer_status_ended_date,
c.CUSTOMER_REGISTRATION_DATE
FROM DAANA_DW.VIEW_CUSTOMER_HIST c
),
-- Step 2: Aggregate event-level metrics by date
customer_orders_daily AS (
SELECT
DATE(o.ORDER_PURCHASE_TS) AS snapshot_date,
o.CUSTOMER_KEY AS customer_id,
COUNT(DISTINCT o.ORDER_KEY) AS number_of_orders,
SUM(o.ORDER_AMOUNT) AS total_order_amount
FROM DAANA_DW.VIEW_ORDER o
GROUP BY DATE(o.ORDER_PURCHASE_TS), o.CUSTOMER_KEY
),
-- Step 3: Expand status ranges into daily rows
customer_status_expanded AS (
SELECT
snapshot_date,
customer_id,
CUSTOMER_SEGMENT,
CUSTOMER_STATUS,
DATE_DIFF(snapshot_date, CUSTOMER_STATUS_STARTED_DATE, DAY)
AS number_of_consecutive_status_days,
CASE
WHEN CUSTOMER_REGISTRATION_DATE = snapshot_date THEN 1
ELSE 0
END AS customer_registration_flag,
CASE
WHEN RANK() OVER (PARTITION BY customer_id, CUSTOMER_STATUS
ORDER BY snapshot_date) = 1
THEN 1
ELSE 0
END AS status_change_flag
FROM customer_status_periods
-- Key technique: Explode date range into individual rows
CROSS JOIN UNNEST(
GENERATE_DATE_ARRAY(
CUSTOMER_STATUS_STARTED_DATE,
customer_status_ended_date
)
) AS snapshot_date
)
-- Step 4: Combine dimensional state with aggregated metrics
SELECT
/* Primary Key */
cse.snapshot_date,
cse.customer_id,
/* Dimensional Attributes */
cse.CUSTOMER_SEGMENT,
cse.CUSTOMER_STATUS,
cse.number_of_consecutive_status_days,
/* Event Flags */
cse.customer_registration_flag,
cse.status_change_flag,
/* Metrics */
COALESCE(ord.number_of_orders, 0) AS number_of_orders,
COALESCE(ord.total_order_amount, 0) AS total_order_amount
FROM customer_status_expanded cse
LEFT JOIN customer_orders_daily ord
ON cse.customer_id = ord.customer_id
AND cse.snapshot_date = ord.snapshot_date
)
How it works:
- Get temporal ranges from
VIEW_CUSTOMER_HIST(status periods) - Aggregate event metrics by date (orders per day)
- Expand ranges into daily rows using
GENERATE_DATE_ARRAY+CROSS JOIN UNNEST - Join dimensional state with metrics to create final snapshot
Result grain: One row per customer per day with segment, status, flags, and order metrics.
Query pattern:
-- All premium customers on a specific date with their orders
SELECT
customer_id,
customer_segment,
customer_status,
number_of_orders,
total_order_amount
FROM datamart.fact_customer_daily_snapshot
WHERE snapshot_date = '2024-06-15'
AND customer_segment = 'premium'
When to use:
- Regular reporting cadence (daily, weekly, monthly)
- Trend analysis over time
- State-based aggregations ("how many active users per day")
Part 3: Query Guidance
Quick Reference
| Object | Query? | Purpose |
|---|---|---|
view_{entity} | Yes | Current state, one row per entity |
view_{entity}_hist | Yes | Full history, point-in-time queries |
view_{entity}_with_rel | Yes | Current state with relationships pre-joined |
f_{entity}(p_eff_tmstp) | Yes | Parametric point-in-time function (default: latest) |
| Your consumption layer | Yes | Dimensions and facts you build from the above |
{entity}_focal | Avoid | Internal, one row per surrogate key (often empty) |
{entity}_idfr | Avoid | Internal, business id to surrogate mapping (empty without allow_multiple_identifiers) |
{entity}_desc | Avoid in queries; OK for debugging | Internal, key-value attribute storage |
v_{entity}_desc | OK for debugging | Debugging view over _desc with attribute names (prefixed) |
{entity1}_{entity2}_x | Avoid | Internal, use view_with_rel or build factless facts |
Why Views Over Physical Tables?
Daana's views provide:
- Transposition - Key-value pairs become familiar table columns
- Abstraction - Physical implementation can change without breaking queries
- Performance - Views are optimized for common query patterns
- Simplicity - No need to understand TYPE_KEY mapping or join logic
The principle: You shouldn't need to learn ensemble modeling to use Daana's output. The views hide that complexity.
Part 4: Physical Implementation (Optional Deep Dive)
You don't need to understand ensemble modeling to use Daana's views. But if you're curious about what's under the hood, here's how it works.
For each entity, Daana creates up to four physical tables following the Focal Framework pattern. These tables store data in a generic, flexible structure that the views transpose into user-friendly schemas.
Physical Tables Created
1. FOCAL Table: {entity}_focal
Purpose: Maintains exactly one row per entity instance, identified by the surrogate key. While the IDFR table may contain multiple rows for the same entity (one per business identifier), the FOCAL table consolidates these into a single authoritative row.
When you'll see it: Rarely queried directly; the views handle the complexity.
When it is populated. With the standard ingestion strategies (FULL, FULL_LOG, TRANSACTIONAL), surrogate keys are derived directly from descriptor rows via f_<entity>(), and the focal table stays empty. INCREMENTAL pipelines, IDFR-driven entities (allow_multiple_identifiers: true), and explicit entity-lifecycle events populate the focal table; querying it on the tutorial's PostgreSQL warehouse will return zero rows by design.
Structure:
customer_focal
├── customer_key -- Primary surrogate key (unique per entity instance)
├── inst_key
├── inst_row_key
└── popln_tmstp
2. IDFR Table: {entity}_idfr
Always created. The table exists for every entity, but it is only populated when a mapping group sets allow_multiple_identifiers: true (see Multiple Identifiers). With the default single-identifier setup, the IDFR table stays empty and Daana derives surrogate keys directly from the descriptor table via the f_<entity>() function.
Purpose: Manages key integration by mapping multiple business identifiers to a single stable surrogate key. Real entities often have multiple IDs (internal ID, SSN, loyalty number, partner ID) - this table maps all of them to one customer_key.
Why it matters: Enables the system to recognize the same entity regardless of which identifier is used in source systems.
Structure:
customer_idfr
├── customer_idfr -- Business identifier value (email, SSN, etc.)
├── eff_tmstp -- Effective timestamp (when became valid)
├── ver_tmstp -- Version timestamp (for bi-temporal support)
├── row_st -- Row status (Y=Active, N=Inactive)
├── customer_key -- Links to FOCAL table
├── inst_key
├── inst_row_key
└── popln_tmstp
When populated, the IDFR table holds one row per identifier value, all rows for the same real-world entity sharing one customer_key. A query against a customer that integrates two source systems looks like:
SELECT customer_idfr, customer_key, row_st
FROM daana_dw.customer_idfr
WHERE customer_key = 'UUID-AAA'
ORDER BY customer_idfr;
customer_idfr | customer_key | row_st
-------------------+--------------+--------
C-12345 | UUID-AAA | Y
L-A789 | UUID-AAA | Y
The CRM's crm_customer_id (C-12345) and the legacy ERP's legacy_id (L-A789) both resolve to the same customer_key. Queries that need to fan out to every business identifier for an entity join the user-facing view_customer to customer_idfr on customer_key. See Multiple Identifiers for the resolution mechanics and the requirement that overlapping sources share at least one identifier per row.
3. DESC Table: {ENTITY}_DESC
Purpose: Stores all descriptive attributes in a flexible, typed structure using generic value columns. Instead of creating fixed columns for each attribute, it uses TYPE_KEY to identify what data is stored in each row.
Why this design: Enables non-destructive development. Adding new attributes requires only new TYPE_KEY definitions and new rows - no schema changes, no ALTER TABLE statements.
How views use it: Views join on TYPE_KEY to transpose these key-value rows into familiar table columns.
Structure:
customer_desc
├── customer_key -- Links to FOCAL table
├── type_key -- Semantic key identifying which attribute this row stores
├── seq_nbr -- Sequence number (used in delta detection hash computations)
├── eff_tmstp -- Effective timestamp
├── ver_tmstp -- Version timestamp
├── row_st -- Row status (Y=Active, N=Inactive)
├── sta_tmstp -- Mappable column for timestamp data (start/from)
├── end_tmstp -- Mappable column for timestamp data (end/to)
├── val_str -- Mappable column for string data
├── val_num -- Mappable column for numeric data
├── uom -- Unit of measure (classifies val_str or val_num)
├── uom_type -- Unit of measure type
├── data_key
├── inst_key
├── inst_row_key
└── popln_tmstp
Example: A single customer_key might have multiple rows in customer_desc:
- One row with
type_key=101storingfirst_nameinval_str - Another row with
type_key=102storingemailinval_str - Another row with
type_key=103storingregistration_dateinsta_tmstp
The view joins these rows and pivots them into columns: first_name, email, registration_date.
4. Relationship Table: {ENTITY1}_{ENTITY2}_X
Purpose: Captures temporal many-to-many relationships between entities. A single relationship table can store multiple relationship types using TYPE_KEY.
Example: ORDER_CUSTOMER_X might contain different relationship types:
- PLACED_BY (customer who placed the order)
- SHIPPED_TO (customer receiving shipment)
- RETURNED_BY (customer who returned the order)
Structure:
order_customer_x
├── order_key -- Surrogate key for driving entity (focal01_key)
├── customer_key -- Surrogate key for related entity (focal02_key)
├── type_key -- Identifies relationship type
├── eff_tmstp -- Effective timestamp
├── ver_tmstp -- Version timestamp
├── row_st -- Row status
├── data_key
├── inst_key
├── inst_row_key
└── popln_tmstp
Understanding Key Columns
TYPE_KEY - The Semantic Link
What it is: A foreign key to the metadata model that identifies which business attribute (atomic-context) is stored in this row. It links physical data to logical meaning.
Why it matters: This enables views to transpose generic physical rows into meaningful business attributes. When you query VIEW_CUSTOMER, the view joins on TYPE_KEY to translate data into familiar column format.
In practice: You'll never query on TYPE_KEY directly - the views handle this for you. But understanding it helps when debugging or exploring the physical layer.
Three Timestamp Types
Daana tracks three types of timestamps for different purposes:
EFF_TMSTP (Effective Timestamp): Business time - when this version became valid in the real world. Used for point-in-time queries ("What was the customer's segment on 2024-06-15?").
VER_TMSTP (Version Timestamp): System time - when this version was created in the warehouse. Supports bi-temporal tracking (multiple versions at the same effective time). Used for audit trails and understanding when your knowledge of the data changed.
POPLN_TMSTP (Population Timestamp): Load time - when this row was physically inserted into the table. Used for operational monitoring and pipeline debugging.
Why three timestamps matter: They enable you to answer different questions:
- "What was true at this business date?" → EFF_TMSTP
- "What did we know at this system date?" → VER_TMSTP
- "When did we load this data?" → POPLN_TMSTP
Audit and Metadata Columns
DATA_KEY (Data Key):
DATA_KEY isolates each pipeline's data within shared tables. Computed as hash(PROC_IDFR), where PROC_IDFR encodes the pipeline's identity (entity, target table, source table, code pattern). When truedelta compares source data against the target table, it filters by DATA_KEY so each pipeline only sees its own rows. This enables multiple mapping groups to write to the same entity table without interfering with each other — each group gets a unique DATA_KEY, and truedelta operates independently per group.
Present in: DESC, X tables · Stability: per-pipeline
INST_KEY (Instance Key):
INST_KEY (internally PROCINST_KEY) is a unique identifier for each pipeline execution. A new value is generated every time a pipeline runs. Use this to trace which rows were written by which execution, or to join against the metadata table for full pipeline lineage (execution time, status, batch window).
Present in: DESC, X, IDFR, FOCAL tables · Stability: per-execution · Join to: PROCINST_DESC
INST_ROW_KEY (Instance Row Key):
INST_ROW_KEY identifies the pipeline template (not the execution). Unlike INST_KEY which changes every run, INST_ROW_KEY remains stable across executions of the same pipeline. It encodes the workflow identity and is useful for grouping rows by pipeline definition rather than by individual run.
Present in: DESC, X, IDFR, FOCAL tables · Stability: per-pipeline
ROW_ST (Row Status):
ROW_ST tracks the lifecycle of each row. Active rows have ROW_ST = 'Y'. When truedelta detects that a source value has changed, it sets the previous row to ROW_ST = 'N' (soft delete) and inserts a new row with the updated value. The semantic view filters to ROW_ST = 'Y' to show only current state.
Present in: DESC, X, IDFR tables · Stability: per-row
SEQ_NBR (Sequence Number): Sequence identifier used in hash computations for delta detection.
UOM (Unit of Measure): Classifies what kind of data resides in VAL_STR or VAL_NUM (e.g., "USD", "EUR", "kg", "meters").
Operational Lineage
Every physical table includes inst_key, enabling you to trace which pipeline loaded any specific row:
-- Find the SQL that loaded a specific customer row
SELECT DISTINCT pd.val_str
FROM daana_metadata.procinst_desc pd
INNER JOIN daana_dw.customer_desc cd
ON cd.inst_key = pd.procinst_key
WHERE cd.customer_key = 'CUST_12345';
This operational lineage is invaluable for debugging and understanding data provenance.
Next Steps
- Explore the views: After running
daana-cli execute, querySELECT * FROM DAANA_DW.VIEW_{ENTITY}to see your entities - Build Type 1 dimensions: Use
VIEW_{ENTITY}directly or with light transformation - Build Type 2 dimensions: Apply the LEAD window pattern to
VIEW_{ENTITY}_HIST - Build facts: Start with transaction facts from entity views + measures
- Advanced patterns: Factless facts and periodic snapshots when needed
Related Documentation
- DMDL: Model Definition - How to define entities
- DMDL: Mapping - How to map source data to entities
- Commands - All available commands
- Focal Framework - Deep dive into the Focal Framework