# DMDL Reference: Workflow
Workflows orchestrate the execution of your data transformations. They bring together your model, mappings, and connection profiles into an executable pipeline.
## Workflow Structure
A workflow file has four main components:
- Workflow metadata - ID, name, and description of the workflow
- Model reference - Path to the data model file
- Mappings list - Ordered list of mapping files to execute
- Connection profile - Which database connection to use
```yaml
workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into business-ready entities
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/order-mapping.yaml
  connection: dev
```
Tip: Generate a workflow template with `daana-cli generate workflow -o workflow.yaml` to get started quickly.
## Workflow Metadata
Every workflow needs identifying information:
```yaml
workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.
```
| Field | Purpose |
|---|---|
| id | Unique identifier. Used to reference the workflow in commands. |
| name | Human-readable name. Displayed in logs and status reports. |
| definition | One-line summary of what the workflow does. |
| description | Optional detailed explanation with business context. Supports multi-line text. |
Tip: Choose a workflow ID that clearly reflects the business domain (e.g., `ECOMMERCE_WORKFLOW`, `customer_analytics`).
## Model Reference

The `model` field points to your data model file:

```yaml
workflow:
  model: models/ecommerce-model.yaml
```
You can reference either:
- YAML files (`.yaml`, `.yml`) - Will be auto-compiled during workflow execution
- JSON files (`.json`) - Pre-compiled model for production use
```yaml
# Development: use YAML for easier editing
model: models/ecommerce-model.yaml

# JSON is optional if you already have a compiled model
model: models/ecommerce-model.json
```
Best Practice: Use YAML unless you have a specific reason to maintain pre-compiled JSON files.
## Mappings

The `mappings` field is an ordered list of mapping file paths:

```yaml
workflow:
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml
```
### Mapping Order Matters
Mappings execute in the order listed. When entities have relationships (foreign keys), list parent entities before child entities:
```yaml
mappings:
  # 1. Independent entities first (no foreign keys)
  - mappings/customer-mapping.yaml
  - mappings/product-mapping.yaml

  # 2. Then entities that reference the above
  - mappings/order-mapping.yaml        # References CUSTOMER

  # 3. Finally, entities that reference those
  - mappings/order-item-mapping.yaml   # References ORDER and PRODUCT
```
Rule: If entity A has a relationship to entity B, the mapping for B must come before the mapping for A in the list.
## Connection Profile

The `connection` field specifies which database connection profile to use:

```yaml
workflow:
  connection: dev
```
This references a named profile from your `connections.yaml` file:

```yaml
# connections.yaml
connections:
  dev:
    type: postgresql
    host: localhost
    port: 5432
    user: dev
    password: devpass
    database: customerdb

  production:
    type: postgresql
    host: prod.example.com
    # ...
```
See Connections for details on configuring connection profiles.
## Batch Processing

For large datasets, configure batch processing in the `advanced` section. This enables incremental data loading for pipelines using `ingestion_strategy: INCREMENTAL` or `TRANSACTIONAL`.
### Basic Batch Configuration

Set `batch_expression` to the timestamp column that tracks new data in your source tables. Daana auto-generates the filter; no SQL is needed.

```yaml
workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms e-commerce data with incremental loading
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/order-mapping.yaml
  connection: production
  advanced:
    batch_expression: updated_at
```
### `batch_expression`

The timestamp column used to filter source data during incremental processing. When set, the framework auto-generates the batch filter SQL.

```yaml
advanced:
  batch_expression: updated_at
```
Common choices:
- `updated_at` - Standard timestamp column
- `ingest_ts` - Ingestion timestamp from data lake
- `batch_date` - Batch date for periodic loads
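As the Quick Reference below notes, a value containing `${BATCH_START}` or `${BATCH_END}` is auto-detected and used as raw SQL instead of a plain column name. A minimal sketch of that form; the exact expression is illustrative rather than taken from the examples above:

```yaml
advanced:
  # Illustrative raw-SQL filter: detected because it references ${BATCH_START}/${BATCH_END}
  batch_expression: "ingest_ts >= ${BATCH_START} AND ingest_ts < ${BATCH_END}"
```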
### Per-Table Override

When your source tables use different timestamp column names, override `batch_expression` on individual tables in the mapping YAML:

```yaml
# mapping.yaml
tables:
  - table: batch.daily_usage
    batch_expression: batch_date       # This table uses batch_date
    ingestion_strategy: INCREMENTAL
    attributes:
      - id: USAGE_MINUTES
        transformation_expression: minutes_watched

  - table: streaming.events
    # No batch_expression: uses workflow default (updated_at)
    ingestion_strategy: TRANSACTIONAL
    attributes:
      - id: EVENT_TYPE
        transformation_expression: event_type
```
### Which Pipelines Use Batch Filtering?

| Ingestion Strategy | Batch filter applied? | Why |
|---|---|---|
| INCREMENTAL | Yes | Only reads new data via watermark |
| TRANSACTIONAL | Yes | No delta detection; without filtering, rows are duplicated |
| FULL | No | Must read all data for correct change detection |
| FULL_LOG | No | Must read all data for soft-delete detection |
IDFR pipelines always apply batch filtering regardless of strategy — they skip previously registered identifiers.
Important: When setting `batch_expression` at the workflow level, ensure the column exists on all source tables, including those used by multi-IDFR entities. If a source table lacks the column, the pipeline will fail with a SQL error. Use per-table `batch_expression` overrides when different tables have different timestamp columns, or omit the workflow-level setting and configure it only per-table.
### Automatic Batch Tracking

Daana tracks batch execution in a metadata table (`batch_history`). Each pipeline records its batch window with a status lifecycle:
- First run - Processes all historical data (from epoch to now)
- Subsequent runs - Automatically continues from where the last successful batch ended
- Failed runs - Do not advance the watermark; the next run retries the same window
- Manual override - Use the `--batch-start` and `--batch-end` flags for custom ranges
```bash
# Automatic: continues from last successful batch
daana-cli execute

# Manual: specify exact range
daana-cli execute --batch-start "2024-01-01" --batch-end "2024-01-31"

# Force: override stale run detection
daana-cli execute --force
```
## Workflow Commands

### Generate Template

Create a new workflow file from a template:

```bash
daana-cli generate workflow -o workflow.yaml
```
### Check Workflow

Validate your workflow configuration before deploying:

```bash
daana-cli check workflow workflow.yaml
```
This validates:
- Workflow YAML structure
- Model file exists and is valid
- Mapping files exist and are valid
- Connection profile exists
### Execute Workflow

Run the data transformation:

```bash
# Full execution
daana-cli execute

# With batch parameters (for incremental loading)
daana-cli execute --batch-start "2024-01-01" --batch-end "2024-01-31"
```
## Complete Example
Here's a complete workflow showing all components together:
```yaml
workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml
  connection: production
  advanced:
    batch_expression: updated_at
```
## Quick Reference

Looking up a specific field? Here's the complete reference for all fields in `workflow.yaml`.
### Workflow Fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | ✓ | Unique identifier for the workflow |
| name | string | ✓ | Human-readable name for the workflow |
| definition | string | ✓ | One-line description of the workflow purpose |
| description | string | ○ | Detailed description of the workflow |
| connection | string | ✓ | Connection profile to use for execution |
| model | ModelReference | ✓ | Path to the data model file |
| mappings | list of string | ✓ | List of mapping files to execute. ⚠ Order matters: list parent entities before children |
| advanced | object | ○ | Advanced workflow settings |
| └ advanced.batch_expression | string | ○ | Batch filtering expression for incremental source reads. ⚠ Per-table override: set `batch_expression` on individual tables in your mapping YAML. ⚠ If the value contains `${BATCH_START}` or `${BATCH_END}`, it is used as raw SQL (auto-detected). ⚠ FULL and FULL_LOG pipelines ignore this setting (they read all data) |
| └ advanced.batch_stale_timeout | string | ○ | Timeout for stale pipeline run detection. ⚠ Uses Go duration format (e.g., `8h`, `30m`, `1h30m`). ⚠ Use `--force` to override stale detection during execution |
✓ = required, ○ = optional
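For orientation, a minimal sketch of the optional `advanced` block combining both settings from the table; the values shown are illustrative only:

```yaml
workflow:
  # ...id, name, definition, model, mappings, connection as above...
  advanced:
    batch_expression: updated_at    # timestamp column for incremental source reads
    batch_stale_timeout: 8h         # Go duration format; stale runs can be bypassed with --force
```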
## Best Practices

- Use descriptive workflow IDs - Choose names that reflect the business domain (e.g., `ECOMMERCE_WORKFLOW`, `CUSTOMER_ANALYTICS`)
- Order mappings by dependency - Independent entities first, then dependent ones
- Validate before deploying - Always run `daana-cli check workflow` before deployment
- Configure batch processing - For large datasets, use INCREMENTAL ingestion with batch settings
- Use environment-specific connections - Create separate connection profiles for dev, staging, and production
- Document your workflows - Use the `description` field for detailed documentation
- Keep models in YAML - Unless you have a specific reason to maintain compiled JSON files
- Test incrementally - Test incremental logic with small date ranges first
## Next Steps
- Configure connections to your data sources
- Create mappings to define data transformations
- Follow the tutorial for a complete end-to-end example