Reference

Glossary

A comprehensive reference of key terms and concepts in Daana CLI.

Core Concepts

The fundamental DMDL building blocks.

DMDL (Daana Model Description Language)

The YAML-based declarative modeling system used to define business data models. DMDL allows you to describe entities, attributes, and relationships in a human-readable format that compiles to executable JSON.

model:
  id: "MY_MODEL"
  name: "MyModel"
  entities:
    - id: "CUSTOMER"
      name: "CUSTOMER"
      attributes:
        - id: "CUSTOMER_ID"
          name: "CUSTOMER_ID"
          type: "STRING"

See also: Model, Entity, Attribute, Relationship

Model

The top-level container for your data definition. Every DMDL file starts with a model declaration that includes metadata about your business data model, entities, attributes, and relationships.

See also: DMDL (Daana Model Description Language), Entity

Entity

A business object representing a real-world concept like CUSTOMER, ORDER, or DEVICE. Entities are the primary building blocks of your data model and contain attributes that describe their properties.

Key characteristics

  • Must have both id and name fields
  • Contains one or more attributes
  • Can participate in relationships with other entities

See also: Attribute, Relationship, id vs name

Attribute

A property of an entity with a specific data type (STRING, NUMBER, START_TIMESTAMP, END_TIMESTAMP, etc.). Each attribute represents a piece of information about an entity.

Examples

  • CUSTOMER_ID (STRING)
  • ORDER_VALUE (NUMBER)
  • ORDER_PURCHASE_TS (START_TIMESTAMP)

See also: Entity, Grouped Attributes, effective_timestamp

Relationship

A connection between two entities that defines how they relate to each other. Relationships describe the associations in your business model. Example: an ORDER IS_PLACED_BY a CUSTOMER.

Required fields

  • name: Display name for the relationship
  • definition: What this relationship represents
  • source_entity: The entity where the relationship originates
  • target_entity: The entity where the relationship points

Optional fields

  • description: Detailed explanation of the relationship
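A minimal sketch using the fields above (the ORDER/CUSTOMER values are illustrative, and the relationships: list key is an assumption about the surrounding model structure):

relationships:
  - name: "IS_PLACED_BY"
    definition: "An order is placed by a customer"
    source_entity: "ORDER"
    target_entity: "CUSTOMER"
    description: "Links each order to the customer who placed it"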

See also: Entity

Database Architecture

How Daana relates to your data warehouse.

Target Database

Your data warehouse database that contains both source data and transformed business entities. This is where your BI tools will connect.

What's stored here

  • Source data (e.g., olist_orders, olist_customers tables)
  • Transformed data in the daana_dw schema
  • Business-ready tables generated by Daana transformations

See also: connection profiles, Connection Profile, daana_dw Schema

Model Elements

Field-level concepts inside a DMDL model.

id vs name

Both fields are required on entities, attributes, and most other model objects. They serve different roles.

- id: "CUSTOMER_EMAIL"
  name: "CUSTOMER_EMAIL"
  definition: "Customer's email address"
  type: "STRING"

When to use each

  • id: Stable internal identifier. Used for code references and cross-links.
  • name: Display name for human readability. Often the same as id for clarity.

effective_timestamp

A boolean flag (true/false) that marks an attribute as tracking changes over time. When set to true, Daana tracks the historical values of this attribute.
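For example, a customer city attribute whose history should be kept, written in the attribute format shown under id vs name (the specific attribute is illustrative):

- id: "CUSTOMER_CITY"
  name: "CUSTOMER_CITY"
  definition: "City where the customer lives"
  type: "STRING"
  effective_timestamp: true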

Use cases

  • Customer addresses (they move)
  • Order statuses (pending → shipped → delivered)
  • Product prices (they fluctuate)

Enables questions like

  • "What was this customer's city in 2017?"
  • "How did this order's status change over time?"

See also: Attribute

Grouped Attributes

Related fields bundled together to maintain context. Used when multiple attributes represent different aspects of the same concept.

- id: "ORDER_VALUE"
  name: "ORDER_VALUE"
  effective_timestamp: true
  group:
    - id: "ORDER_VALUE_AMOUNT"
      name: "ORDER_VALUE_AMOUNT"
      definition: "Monetary value of the order"
      type: "NUMBER"
    - id: "ORDER_VALUE_CURRENCY"
      name: "ORDER_VALUE_CURRENCY"
      definition: "Currency of the order value"
      type: "STRING"

Common use cases

  • Amount + Currency
  • Address components (street, city, state, zip)
  • Name components (first name, last name, middle initial)

See also: Attribute

Workflow Concepts

The five lifecycle commands of a Daana project.

Init

Create a new project with template files. This sets up the recommended directory structure and creates starter files for your model, workflow, and connections.

daana-cli init my-project

What's created

  • model.yaml — Your data model definition
  • workflow.yaml — Workflow orchestration
  • connections.yaml — Database connection profiles
  • mappings/ — Directory for mapping files

Install

Set up the Daana framework in your target database. This creates the necessary schemas and infrastructure that power the transformation engine.

daana-cli install

What happens

  • Transformation infrastructure is created in the target database
  • System tables for tracking workflows are set up

See also: Focal Framework

Check

Validate your configurations before deployment. Catches errors early and ensures your model, mappings, workflow, and connections are properly configured.

daana-cli check model
daana-cli check mapping
daana-cli check workflow
daana-cli check connections

Deploy

Deploy your workflow to the target database. Reads your YAML configurations, compiles them, and sets up the transformation infrastructure in your data warehouse.

daana-cli deploy

What happens

  • Model and mappings are validated and compiled
  • daana_dw schema is created in the target database
  • Transformation procedures are generated
  • Target tables are set up for business entities

See also: daana_dw Schema

Execute

Run data transformations. Reads source data from the target database, applies transformation rules, and writes clean business entities to the daana_dw schema.

daana-cli execute

What happens

  • Source data is read from raw tables
  • Transformations are applied
  • Business entities are written to daana_dw schema
  • BI-ready tables are created

See also: Business Entity, daana_dw Schema

Configuration

Connection profiles and CLI configuration.

Connection Profile

A named database connection configuration that defines how to connect to your data warehouse. Connection profiles let you manage multiple environments (dev, staging, production) in a single YAML file.

connections:
  dev:
    type: "postgresql"
    host: "localhost"
    port: 5432
    user: "dev"
    password: "devpass"
    database: "customerdb"
    sslmode: "disable"
    target_schema: "daana_dw"
  production:
    type: "postgresql"
    host: "prod.example.com"
    port: 5432
    user: "${DW_USER}"
    password: "${DW_PASSWORD}"
    database: "analytics"
    sslmode: "require"
    target_schema: "daana_dw"

Use in commands

  • daana-cli deploy --connections connections.yaml --connection dev

See also: Connection profiles reference, Target Database

Configuration File

The ~/.daana/config.yaml file that contains your application preferences such as logging levels and default file paths.

app:
  log_level: "info"

paths:
  model: "model.yaml"
  workflow: "workflow.yaml"
  connections: "connections.yaml"
  mappings: "mappings/"

See also: Configuration reference, Configuration Priority

Configuration Priority

The order in which Daana CLI resolves configuration values, highest to lowest:

Resolution order

  1. Command-line flags
  2. Environment variables (e.g., APP_LOG_LEVEL=debug)
  3. Configuration file (~/.daana/config.yaml)
  4. Default values
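For example, even with log_level: "info" set in ~/.daana/config.yaml, an environment variable takes precedence for a single run:

APP_LOG_LEVEL=debug daana-cli check model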

See also: Configuration File

Data Transformation

How source data becomes business entities.

Mapping

The definition of how source data fields connect to business model attributes. Mappings tell Daana which columns in your raw tables should populate which attributes in your business entities. Example: mapping the customer_id column from olist_customers to the CUSTOMER_ID attribute in the CUSTOMER entity.
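Rather than writing a mapping file from scratch, you can scaffold one from the compiled model using the command documented under Template Generation (the output path into mappings/ is illustrative):

daana-cli generate mapping --model model.json --entity CUSTOMER -o mappings/customer_mapping.yaml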

See also: Mapping reference, Mapping Group, Attribute

Workflow

An orchestration definition that links together a business model, one or more mappings, a connection profile, and execution parameters. Workflows define the complete end-to-end transformation from source data to business entities.

See also: Workflow reference, Mapping, Connection Profile, Execute

daana_dw Schema

The default schema name in your target database where transformed business entities are written. This is where your BI tools should connect to access clean, analytics-ready data.

Contains

  • Business entity tables (e.g., CUSTOMER, ORDER)
  • Transformation views
  • Helper functions

See also: Target Database, Business Entity

Advanced Concepts

The underlying engine and advanced patterns.

Focal Framework

The underlying transformation engine that Daana CLI wraps. Focal provides the execution logic for data transformations. The daana-cli install command installs Focal into your target database.

See also: Focal Framework concepts, Install

Template Generation

The automatic creation of YAML scaffolds for mappings and workflows. Templates reduce boilerplate and ensure correct structure.

daana-cli generate mapping --model model.json --entity CUSTOMER -o mapping.yaml
daana-cli generate workflow -o workflow.yaml

Mapping Group

A container within a mapping that holds one or more source tables and their relationship definitions. Multiple mapping groups within the same mapping enable sourcing the same entity from independent data systems. Each group generates separate pipelines with unique DATA_KEY values, ensuring true delta isolation.

See also: Mapping

Common Terminology

Industry-standard data and analytics terms.

BI (Business Intelligence)

Analytics tools and dashboards that query your data warehouse to provide business insights. Examples: Tableau, Power BI, Looker, Metabase.

ETL (Extract, Transform, Load)

The traditional process of moving data from source systems to a data warehouse. Daana CLI provides a modern, declarative approach to ETL.

Data Warehouse

A centralized repository of integrated data from multiple sources, optimized for analytics and reporting. In Daana CLI, this is your target database.

See also: Target Database

Source Data

Raw operational data from your business systems. In the Olist tutorial, this includes tables like olist_orders, olist_customers, olist_products, etc.

Business Entity

A clean, analytics-ready representation of a business concept. Business entities live in the daana_dw schema and are what BI tools query.

See also: Entity, daana_dw Schema


Data Types

STRING

Use STRING for any text-based information in your model, including names, descriptions, addresses, email addresses, and even text-based identifiers.

Examples:

  • CUSTOMER_NAME (Full name of the customer)
  • EMAIL (Customer email address)
  • ORDER_NUMBER (Human-readable order identifier)
  • PRODUCT_DESCRIPTION (Detailed product information)

NUMBER

Use NUMBER for any numeric values in your model, including quantities, amounts, measurements, counts, and percentages. This type handles both integers and decimal numbers.

Examples:

  • ORDER_AMOUNT (Total order value)
  • QUANTITY (Number of items)
  • PRICE (Unit price)
  • DISCOUNT_PERCENTAGE (Discount rate applied)

⚠️ Important: For monetary amounts with currency, consider using UNIT type or grouped attributes to keep amount and currency together.

UNIT

Use UNIT when you need to track both a numeric value and its unit of measurement together. This is essential for monetary amounts with currency, measurements with units, or any quantity that needs a denominator. UNIT types are always defined as grouped attributes.

Examples:

  • ORDER_AMOUNT (with currency) (Order value with currency code)
  • PRODUCT_WEIGHT (Product weight with unit (kg, lbs))
  • DISTANCE (Distance traveled with unit (km, miles))

⚠️ Important: UNIT types must always be defined using the 'group' field with at least two attributes: the numeric value and the unit identifier.
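A sketch of a UNIT attribute, following the shape of the Grouped Attributes example; whether the parent attribute carries an explicit type: "UNIT" field is an assumption here:

- id: "PRODUCT_WEIGHT"
  name: "PRODUCT_WEIGHT"
  type: "UNIT"
  group:
    - id: "PRODUCT_WEIGHT_VALUE"
      name: "PRODUCT_WEIGHT_VALUE"
      definition: "Numeric weight of the product"
      type: "NUMBER"
    - id: "PRODUCT_WEIGHT_UNIT"
      name: "PRODUCT_WEIGHT_UNIT"
      definition: "Unit of measurement (kg, lbs)"
      type: "STRING"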

START_TIMESTAMP

START_TIMESTAMP is your default choice for any timestamp attribute. Use it for single timestamps, when events occur, or when something begins. This type tells Daana's transformation engine how to handle temporal data correctly.

Examples:

  • ORDER_PURCHASE_TS (When customer completed the purchase)
  • ACCOUNT_CREATED_AT (When customer account was created)
  • EVENT_START_TIME (When event began)
  • SUBSCRIPTION_START_DATE (When subscription activated)

⚠️ Important: This is the generic timestamp type - use it whenever you have a single point in time. For tracking periods or durations, use START_TIMESTAMP together with END_TIMESTAMP.

END_TIMESTAMP

Use END_TIMESTAMP to mark the end of a time period or the completion of an event. This type is always used together with START_TIMESTAMP to define durations or periods. It tells the transformation engine that this timestamp represents a completion or end point.

Examples:

  • ORDER_DELIVERED_TS (When order was delivered to customer)
  • EVENT_END_TIME (When event concluded)
  • SUBSCRIPTION_END_DATE (When subscription expired)
  • CONTRACT_END_DATE (When contract expires)

⚠️ Important: END_TIMESTAMP is always used together with START_TIMESTAMP to define periods. For single timestamps, use START_TIMESTAMP instead.
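A typical pairing on an ORDER entity, using attribute names from the examples above (the definitions shown are illustrative):

- id: "ORDER_PURCHASE_TS"
  name: "ORDER_PURCHASE_TS"
  definition: "When the customer completed the purchase"
  type: "START_TIMESTAMP"
- id: "ORDER_DELIVERED_TS"
  name: "ORDER_DELIVERED_TS"
  definition: "When the order was delivered to the customer"
  type: "END_TIMESTAMP"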


Framework Columns

Audit and lineage columns present across all generated tables.

DATA_KEY

DATA_KEY isolates each pipeline's data within shared tables. Computed as hash(PROC_IDFR), where PROC_IDFR encodes the pipeline's identity (entity, target table, source table, code pattern). When truedelta compares source data against the target table, it filters by DATA_KEY so each pipeline only sees its own rows. This enables multiple mapping groups to write to the same entity table without interfering with each other — each group gets a unique DATA_KEY, and truedelta operates independently per group.

Present in: DESC, X tables · Stability: per-pipeline

INST_KEY

INST_KEY (internally PROCINST_KEY) is a unique identifier for each pipeline execution. A new value is generated every time a pipeline runs. Use this to trace which rows were written by which execution, or to join against the metadata table for full pipeline lineage (execution time, status, batch window).

Present in: DESC, X, IDFR, FOCAL tables · Stability: per-execution · Join to: PROCINST_DESC

INST_ROW_KEY

INST_ROW_KEY identifies the pipeline template (not the execution). Unlike INST_KEY which changes every run, INST_ROW_KEY remains stable across executions of the same pipeline. It encodes the workflow identity and is useful for grouping rows by pipeline definition rather than by individual run.

Present in: DESC, X, IDFR, FOCAL tables · Stability: per-pipeline

ENTITY_KEY

ENTITY_KEY is the surrogate key that uniquely identifies each business entity. The actual column name is entity-specific: customer_entity_key, order_entity_key, etc. In relationship (X) tables, the columns are named focal01_key and focal02_key for the source and target entities respectively. The key is derived from the primary key expression defined in the mapping and remains stable across pipeline runs. All attribute rows for the same entity share the same key, enabling the semantic view to pivot them into a single entity row.

Present in: DESC, X, IDFR, FOCAL tables · Stability: per-entity

TYPE_KEY

TYPE_KEY maps each row in the DESC and X tables to its attribute or relationship definition. Each attribute in the model gets a unique TYPE_KEY value (derived from the atomic context). The semantic view function translates TYPE_KEY values into named columns, pivoting the generic table structure into a readable entity view.

Present in: DESC, X tables · Stability: per-pipeline

EFF_TMSTP

EFF_TMSTP records the business-meaningful time when a value became effective. For event-sourced data, this is typically the event timestamp. For snapshot data, it reflects when the source system recorded the change. The semantic view uses EFF_TMSTP to determine the latest state of each entity attribute.

Present in: DESC, X, IDFR tables · Stability: per-row

VER_TMSTP

VER_TMSTP is the bi-temporal system timestamp — when the data warehouse physically recorded this row. Unlike EFF_TMSTP (business time), VER_TMSTP reflects processing time. Together they enable point-in-time queries: EFF_TMSTP answers 'when did it happen?' while VER_TMSTP answers 'when did we know about it?'

Present in: DESC, X, IDFR tables · Stability: per-row

ROW_ST

ROW_ST tracks the lifecycle of each row. Active rows have ROW_ST = 'Y'. When truedelta detects that a source value has changed, it sets the previous row to ROW_ST = 'N' (soft delete) and inserts a new row with the updated value. The semantic view filters to ROW_ST = 'Y' to show only current state.

Present in: DESC, X, IDFR tables · Stability: per-row

POPLN_TMSTP

POPLN_TMSTP records the physical write time of each row. Defaults to the current timestamp at INSERT time (dialect-specific: LOCALTIMESTAMP for PostgreSQL, CURRENT_TIMESTAMP() for BigQuery/Snowflake). Useful for operational monitoring — if POPLN_TMSTP values cluster, the pipeline ran in a single batch; if they span hours, the pipeline may have been slow or restarted.

Present in: DESC, X, IDFR, FOCAL tables · Stability: per-row


Quick Reference

Complete Workflow Sequence

1. daana-cli init
2. daana-cli install
3. (Edit model.yaml and connections.yaml to define your data)
4. daana-cli generate
5. (Edit mappings to connect to your data sources)
6. daana-cli check
7. daana-cli deploy
8. daana-cli execute

File Types

  • .yaml or .yml: All definitions (models, mappings, workflows, connections)

Common Ports (Development)

  • 5432: Target database (data warehouse)
