DMDL Reference

Workflow

Workflows orchestrate the execution of your data transformations. They bring together your model, mappings, and connection profiles into an executable pipeline.

Workflow Structure

A workflow file has four main components:

  1. Workflow metadata - ID, name, and description of the workflow
  2. Model reference - Path to the data model file
  3. Mappings list - Ordered list of mapping files to execute
  4. Connection profile - Which database connection to use

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into business-ready entities

  model: models/ecommerce-model.yaml

  mappings:
    - mappings/customer-mapping.yaml
    - mappings/order-mapping.yaml

  connection: dev

Tip: Generate a workflow template with daana-cli generate workflow -o workflow.yaml to get started quickly.

Workflow Metadata

Every workflow needs identifying information:

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.

  • id - Unique identifier. Used to reference the workflow in commands.
  • name - Human-readable name. Displayed in logs and status reports.
  • definition - One-line summary of what the workflow does.
  • description - Optional detailed explanation with business context. Supports multi-line text.

Tip: Choose a workflow ID that clearly reflects the business domain (e.g., ECOMMERCE_WORKFLOW, customer_analytics).

Model Reference

The model field points to your data model file:

workflow:
  model: models/ecommerce-model.yaml

You can reference either:

  • YAML files (.yaml, .yml) - Will be auto-compiled during workflow execution
  • JSON files (.json) - Pre-compiled model for production use

# Development: use YAML for easier editing
model: models/ecommerce-model.yaml

# Production: use a pre-compiled JSON model if you already have one
model: models/ecommerce-model.json

Best Practice: Use YAML unless you have a specific reason to maintain pre-compiled JSON files.

Mappings

The mappings field is an ordered list of mapping file paths:

workflow:
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml

Mapping Order Matters

Mappings execute in the order listed. When entities have relationships (foreign keys), list parent entities before child entities:

mappings:
  # 1. Independent entities first (no foreign keys)
  - mappings/customer-mapping.yaml
  - mappings/product-mapping.yaml

  # 2. Then entities that reference the above
  - mappings/order-mapping.yaml      # References CUSTOMER

  # 3. Finally, entities that reference those
  - mappings/order-item-mapping.yaml  # References ORDER and PRODUCT

Rule: If entity A has a relationship to entity B, the mapping for B must come before the mapping for A in the list.
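
For example, this ordering violates the rule, because ORDER_ITEM references ORDER:

mappings:
  - mappings/order-item-mapping.yaml  # Wrong: ORDER has not been loaded yet
  - mappings/order-mapping.yaml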

Connection Profile

The connection field specifies which database connection profile to use:

workflow:
  connection: dev

This references a named profile from your connections.yaml file:

# connections.yaml
connections:
  dev:
    type: postgresql
    host: localhost
    port: 5432
    user: dev
    password: devpass
    database: customerdb

  production:
    type: postgresql
    host: prod.example.com
    # ...

See Connections for details on configuring connection profiles.

Batch Processing

For large datasets, configure batch processing in the advanced section. This enables incremental data loading for pipelines using ingestion_strategy: INCREMENTAL or TRANSACTIONAL.

Basic Batch Configuration

Set batch_expression to the timestamp column that tracks new data in your source tables. Daana auto-generates the filter — no SQL needed.

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms e-commerce data with incremental loading

  model: models/ecommerce-model.yaml
  mappings:
    - mappings/order-mapping.yaml
  connection: production

  advanced:
    batch_expression: updated_at

batch_expression

The timestamp column used to filter source data during incremental processing. When set, the framework auto-generates the batch filter SQL.

advanced:
  batch_expression: updated_at

Common choices:

  • updated_at - Standard timestamp column
  • ingest_ts - Ingestion timestamp from data lake
  • batch_date - Batch date for periodic loads
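
The simple column form covers most cases. As noted in the Quick Reference below, a value containing ${BATCH_START} or ${BATCH_END} is instead treated as raw SQL, which lets you write the filter yourself. A sketch under that assumption (the column name event_time and the exact substitution format are illustrative):

advanced:
  # Contains ${BATCH_START} / ${BATCH_END}, so it is treated as raw SQL
  batch_expression: "event_time >= '${BATCH_START}' AND event_time < '${BATCH_END}'"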

Per-Table Override

When your source tables use different timestamp column names, override batch_expression on individual tables in the mapping YAML:

# mapping.yaml
tables:
  - table: batch.daily_usage
    batch_expression: batch_date        # This table uses batch_date
    ingestion_strategy: INCREMENTAL
    attributes:
      - id: USAGE_MINUTES
        transformation_expression: minutes_watched

  - table: streaming.events
    # No batch_expression — uses workflow default (updated_at)
    ingestion_strategy: TRANSACTIONAL
    attributes:
      - id: EVENT_TYPE
        transformation_expression: event_type

Which Pipelines Use Batch Filtering?

  • INCREMENTAL - batch filter applied. Only reads new data via the watermark.
  • TRANSACTIONAL - batch filter applied. No delta detection; without filtering, rows are duplicated.
  • FULL - no batch filter. Must read all data for correct change detection.
  • FULL_LOG - no batch filter. Must read all data for soft-delete detection.

IDFR pipelines always apply batch filtering regardless of strategy — they skip previously registered identifiers.

Important: When setting batch_expression at workflow level, ensure the column exists on all source tables — including those used by multi-IDFR entities. If a source table lacks the column, the pipeline will fail with a SQL error. Use per-table batch_expression overrides when different tables have different timestamp columns, or omit the workflow-level setting and only configure it per-table.
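
A sketch of the second option: the workflow omits advanced.batch_expression entirely, and each table declares its own column in its mapping YAML (as in the Per-Table Override example above):

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms e-commerce data with incremental loading
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/order-mapping.yaml
  connection: production
  # No advanced.batch_expression: each table sets batch_expression in its mapping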

Automatic Batch Tracking

Daana tracks batch execution in a metadata table (batch_history). Each pipeline records its batch window with a status lifecycle:

  1. First run - Processes all historical data (from epoch to now)
  2. Subsequent runs - Automatically continues from where the last successful batch ended
  3. Failed runs - Do not advance the watermark; the next run retries the same window
  4. Manual override - Use --batch-start and --batch-end flags for custom ranges

# Automatic: continues from last successful batch
daana-cli execute

# Manual: specify exact range
daana-cli execute --batch-start "2024-01-01" --batch-end "2024-01-31"

# Force: override stale run detection
daana-cli execute --force
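
Stale run detection itself can be tuned with advanced.batch_stale_timeout (see the Quick Reference below). A minimal sketch, using the Go duration format described there:

advanced:
  batch_expression: updated_at
  # Timeout for stale pipeline run detection (Go duration format)
  batch_stale_timeout: 8h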

Workflow Commands

Generate Template

Create a new workflow file from a template:

daana-cli generate workflow -o workflow.yaml

Check Workflow

Validate your workflow configuration before deploying:

daana-cli check workflow workflow.yaml

This validates:

  • Workflow YAML structure
  • Model file exists and is valid
  • Mapping files exist and are valid
  • Connection profile exists
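
A minimal sketch of running this check as a deployment gate, assuming a simple shell script and that daana-cli check returns a non-zero exit code on validation failure:

#!/bin/sh
set -e  # abort on the first failing command

# Validate the workflow, model, mappings, and connection profile
daana-cli check workflow workflow.yaml

# Execute only if validation succeeded
daana-cli execute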

Execute Workflow

Run the data transformation:

# Full execution
daana-cli execute

# With batch parameters (for incremental loading)
daana-cli execute --batch-start "2024-01-01" --batch-end "2024-01-31"

Complete Example

Here's a complete workflow showing all components together:

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.

  model: models/ecommerce-model.yaml

  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml

  connection: production

  advanced:
    batch_expression: updated_at

Quick Reference

Looking up a specific field? Here's the complete reference for all fields in workflow.yaml.

Workflow Fields

  • id (string, required) - Unique identifier for the workflow
  • name (string, required) - Human-readable name for the workflow
  • definition (string, required) - One-line description of the workflow purpose
  • description (string, optional) - Detailed description of the workflow
  • connection (string, required) - Connection profile to use for execution
  • model (ModelReference, required) - Path to the data model file
  • mappings (list of string, required) - List of mapping files to execute. Order matters: list parent entities before children.
  • advanced (object, optional) - Advanced workflow settings
  • advanced.batch_expression (string, optional) - Batch filtering expression for incremental source reads. Per-table override: set batch_expression on individual tables in your mapping YAML. If the value contains ${BATCH_START} or ${BATCH_END}, it is used as raw SQL (auto-detected). FULL and FULL_LOG pipelines ignore this setting (they read all data).
  • advanced.batch_stale_timeout (string, optional) - Timeout for stale pipeline run detection. Uses Go duration format (e.g., 8h, 30m, 1h30m). Use --force to override stale detection during execution.

Best Practices

  1. Use descriptive workflow IDs - Choose names that reflect the business domain (e.g., ECOMMERCE_WORKFLOW, CUSTOMER_ANALYTICS)
  2. Order mappings by dependency - Independent entities first, then dependent ones
  3. Validate before deploying - Always run daana-cli check workflow before deployment
  4. Configure batch processing - For large datasets, use INCREMENTAL ingestion with batch settings
  5. Use environment-specific connections - Create separate connection profiles for dev, staging, and production
  6. Document your workflows - Use the description field for detailed documentation
  7. Keep models in YAML - Unless you have a specific reason to maintain compiled JSON files
  8. Test incrementally - Exercise incremental logic with small date ranges first (see the sketch below)
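
For example, a one-week window is usually enough to confirm that the batch filter and watermark behave as expected (dates are illustrative):

# Try the incremental path on a small slice first
daana-cli execute --batch-start "2024-01-01" --batch-end "2024-01-07"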
