Workflow - Daana CLI Docs

Workflows orchestrate the execution of your data transformations. They bring together your model, mappings, and connection profiles into an executable pipeline.

Workflow Structure

A workflow file has four main components:

Workflow metadata - ID, name, and description of the workflow
Model reference - Path to the data model file
Mappings list - Ordered list of mapping files to execute
Connection profile - Which database connection to use

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into business-ready entities

  model: models/ecommerce-model.yaml

  mappings:
    - mappings/customer-mapping.yaml
    - mappings/order-mapping.yaml

  connection: dev

Tip: Generate a workflow template with daana-cli generate workflow -o workflow.yaml to get started quickly.

Workflow Metadata

Every workflow needs identifying information:

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.

Field	Purpose
`id`	Unique identifier. Used to reference the workflow in commands.
`name`	Human-readable name. Displayed in logs and status reports.
`definition`	One-line summary of what the workflow does.
`description`	Optional detailed explanation with business context. Supports multi-line text.

Tip: Choose a workflow ID that clearly reflects the business domain (e.g., ECOMMERCE_WORKFLOW, CUSTOMER_ANALYTICS).

Model Reference

The model field points to your data model file:

workflow:
  model: models/ecommerce-model.yaml

You can reference either:

YAML files (.yaml, .yml) - Will be auto-compiled during workflow execution
JSON files (.json) - Pre-compiled model for production use

# Development: use YAML for easier editing
model: models/ecommerce-model.yaml

# Alternative: point to a pre-compiled JSON model if you already have one
model: models/ecommerce-model.json

Best Practice: Use YAML unless you have a specific reason to maintain pre-compiled JSON files.

Mappings

The mappings field is an ordered list of mapping file paths:

workflow:
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml

Mapping Order Matters

Mappings execute in the order listed. When entities have relationships (foreign keys), list parent entities before child entities:

mappings:
  # 1. Independent entities first (no foreign keys)
  - mappings/customer-mapping.yaml
  - mappings/product-mapping.yaml

  # 2. Then entities that reference the above
  - mappings/order-mapping.yaml      # References CUSTOMER

  # 3. Finally, entities that reference those
  - mappings/order-item-mapping.yaml  # References ORDER and PRODUCT

Rule: If entity A references entity B (A holds the foreign key), the mapping for B must come before the mapping for A in the list.
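
The ordering rule can be checked mechanically. The sketch below is not part of daana-cli. It assumes you can write down, for each mapping file, which other mapping files its entity references; the `references` dict and both helper functions are hypothetical names used for illustration. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: mapping file -> mapping files it references.
references = {
    "mappings/customer-mapping.yaml": [],
    "mappings/product-mapping.yaml": [],
    "mappings/order-mapping.yaml": ["mappings/customer-mapping.yaml"],
    "mappings/order-item-mapping.yaml": [
        "mappings/order-mapping.yaml",
        "mappings/product-mapping.yaml",
    ],
}


def order_violations(mappings, references):
    """Return (child, parent) pairs where the parent is not listed before the child."""
    position = {path: index for index, path in enumerate(mappings)}
    violations = []
    for child in mappings:
        for parent in references.get(child, []):
            # A parent that is missing or listed later breaks the rule.
            if parent not in position or position[parent] > position[child]:
                violations.append((child, parent))
    return violations


def suggest_order(references):
    """Return one order that places every referenced mapping before its dependents."""
    return list(TopologicalSorter(references).static_order())
```

Calling `order_violations` on the `mappings` list from your workflow gives an empty list when the order is valid. `suggest_order` raises `graphlib.CycleError` if the references are circular.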

Connection Profile

The connection field specifies which database connection profile to use:

workflow:
  connection: dev

This references a named profile from your connections.yaml file:

# connections.yaml
connections:
  dev:
    type: postgresql
    host: localhost
    port: 5432
    user: dev
    password: devpass
    database: customerdb

  production:
    type: postgresql
    host: prod.example.com
    # ...

See Connections for details on configuring connection profiles.

Batch Processing

For large datasets, configure batch processing in the advanced section. This enables incremental data loading for pipelines using ingestion_strategy: INCREMENTAL or TRANSACTIONAL.

Basic Batch Configuration

Set batch_expression to the timestamp column that tracks new data in your source tables. Daana auto-generates the filter — no SQL needed.

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms e-commerce data with incremental loading

  model: models/ecommerce-model.yaml
  mappings:
    - mappings/order-mapping.yaml
  connection: production

  advanced:
    batch_expression: updated_at

batch_expression

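The timestamp column used to filter source data during incremental processing. When set to a column name, Daana auto-generates the batch filter SQL. If the value contains ${BATCH_START} or ${BATCH_END}, it is used as raw SQL instead (see Quick Reference).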

advanced:
  batch_expression: updated_at

Common choices:

updated_at - Standard timestamp column
ingest_ts - Ingestion timestamp from data lake
batch_date - Batch date for periodic loads
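
The Quick Reference notes that a batch_expression containing ${BATCH_START} or ${BATCH_END} is auto-detected as raw SQL rather than a column name. A minimal sketch of such a detection rule follows; the function name is hypothetical, and this illustrates the documented rule rather than Daana's actual implementation.

```python
RAW_SQL_PLACEHOLDERS = ("${BATCH_START}", "${BATCH_END}")


def is_raw_sql_expression(batch_expression):
    """True if the value contains a batch placeholder and so counts as raw SQL."""
    return any(token in batch_expression for token in RAW_SQL_PLACEHOLDERS)


# A plain column name: the batch filter is generated automatically.
column_only = is_raw_sql_expression("updated_at")

# A hypothetical predicate, used only to exercise the check. The exact SQL
# form Daana expects is not shown in this document.
with_placeholders = is_raw_sql_expression(
    "updated_at >= ${BATCH_START} AND updated_at < ${BATCH_END}"
)
```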

Per-Table Override

When your source tables use different timestamp column names, override batch_expression on individual tables in the mapping YAML:

# mapping.yaml
tables:
  - table: batch.daily_usage
    batch_expression: batch_date        # This table uses batch_date
    ingestion_strategy: INCREMENTAL
    attributes:
      - id: USAGE_MINUTES
        transformation_expression: minutes_watched

  - table: streaming.events
    # No batch_expression — uses workflow default (updated_at)
    ingestion_strategy: TRANSACTIONAL
    attributes:
      - id: EVENT_TYPE
        transformation_expression: event_type
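
The precedence described above (table-level setting first, workflow default otherwise) can be sketched as a small lookup. The function name is hypothetical, and the dicts below mirror the mapping YAML after parsing; this is an illustration of the documented behavior, not Daana's code.

```python
def effective_batch_expression(table, workflow_default=None):
    """Table-level batch_expression wins; otherwise fall back to the workflow default."""
    return table.get("batch_expression") or workflow_default


# The two tables from the mapping example, as parsed dictionaries.
tables = [
    {
        "table": "batch.daily_usage",
        "batch_expression": "batch_date",
        "ingestion_strategy": "INCREMENTAL",
    },
    {"table": "streaming.events", "ingestion_strategy": "TRANSACTIONAL"},
]

# Workflow-level default taken from advanced.batch_expression.
resolved = {t["table"]: effective_batch_expression(t, "updated_at") for t in tables}
```

Here `resolved` maps batch.daily_usage to batch_date and streaming.events to updated_at. A result of `None` means neither level configures a batch expression.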

Which Pipelines Use Batch Filtering?

Ingestion Strategy	Batch filter applied?	Why
`INCREMENTAL`	Yes	Only reads new data via watermark
`TRANSACTIONAL`	Yes	No delta detection — without filtering, rows are duplicated
`FULL`	No	Must read all data for correct change detection
`FULL_LOG`	No	Must read all data for soft-delete detection

IDFR pipelines always apply batch filtering regardless of strategy — they skip previously registered identifiers.
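
The table and the IDFR rule above reduce to a short decision function. This is a sketch of the documented behavior only; the function and parameter names are hypothetical.

```python
# Strategies the table lists as batch-filtered.
BATCH_FILTERED_STRATEGIES = {"INCREMENTAL", "TRANSACTIONAL"}
KNOWN_STRATEGIES = BATCH_FILTERED_STRATEGIES | {"FULL", "FULL_LOG"}


def applies_batch_filter(ingestion_strategy, is_idfr_pipeline=False):
    """Return True if a pipeline with this strategy reads only the batch window."""
    if ingestion_strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown ingestion strategy: {ingestion_strategy!r}")
    # IDFR pipelines filter regardless of strategy.
    return is_idfr_pipeline or ingestion_strategy in BATCH_FILTERED_STRATEGIES
```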

Important: When setting batch_expression at workflow level, ensure the column exists on all source tables — including those used by multi-IDFR entities. If a source table lacks the column, the pipeline will fail with a SQL error. Use per-table batch_expression overrides when different tables have different timestamp columns, or omit the workflow-level setting and only configure it per-table.

Automatic Batch Tracking

Daana tracks batch execution in a metadata table (batch_history). Each pipeline records its batch window and run status there, which determines the window of the next run:

First run - Processes all historical data (from epoch to now)
Subsequent runs - Automatically continues from where the last successful batch ended
Failed runs - Do not advance the watermark; the next run retries the same window
Manual override - Use --batch-start and --batch-end flags for custom ranges

# Automatic: continues from last successful batch
daana-cli execute

# Manual: specify exact range
daana-cli execute --batch-start "2024-01-01" --batch-end "2024-01-31"

# Force: override stale run detection
daana-cli execute --force
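
The window selection described above can be sketched as follows. The record fields (`batch_end`, `status`), the `SUCCESS` status value, and the function name are assumptions made for illustration; the actual batch_history schema is not shown in this document. The sketch also assumes an automatic window always ends at the current time and that a manual override supplies both bounds.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_batch_window(history, now, batch_start=None, batch_end=None):
    """Pick the (start, end) window for the next run.

    history: list of dicts with "batch_end" (datetime) and "status" (str).
    """
    # Manual override: both bounds given explicitly.
    if batch_start is not None and batch_end is not None:
        return batch_start, batch_end

    # Only successful runs advance the watermark; failed runs are ignored.
    successful_ends = [
        run["batch_end"] for run in history if run["status"] == "SUCCESS"
    ]
    # First run: no successful batch yet, so start from the epoch.
    start = max(successful_ends) if successful_ends else EPOCH
    return start, now
```

With an empty history the window starts at the epoch. After a failed run, the start stays at the end of the last successful batch, so the unprocessed range is retried.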

Workflow Commands

Generate Template

Create a new workflow file from a template:

daana-cli generate workflow -o workflow.yaml

Check Workflow

Validate your workflow configuration before deploying:

daana-cli check workflow workflow.yaml

This validates:

Workflow YAML structure
Model file exists and is valid
Mapping files exist and are valid
Connection profile exists
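
As a rough illustration of the structural part of this check, the sketch below reports required fields that are missing from an already-parsed workflow document. It is not a substitute for daana-cli check workflow and does not reflect its implementation. The function name is hypothetical, and the required-field list is taken from the Quick Reference table.

```python
# Fields marked as required in the Quick Reference.
REQUIRED_FIELDS = ("id", "name", "definition", "connection", "model", "mappings")


def missing_workflow_fields(document):
    """Return the required fields that are absent or empty under the workflow key."""
    workflow = document.get("workflow") or {}
    return [field for field in REQUIRED_FIELDS if not workflow.get(field)]


# The basic workflow example from this page, as a parsed dictionary.
document = {
    "workflow": {
        "id": "ECOMMERCE_WORKFLOW",
        "name": "E-commerce Data Pipeline",
        "definition": "Transforms raw e-commerce data into business-ready entities",
        "model": "models/ecommerce-model.yaml",
        "mappings": [
            "mappings/customer-mapping.yaml",
            "mappings/order-mapping.yaml",
        ],
        "connection": "dev",
    }
}

missing = missing_workflow_fields(document)
```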

Execute Workflow

Run the data transformation:

# Default: continues from the last successful batch (the first run processes all history)
daana-cli execute

# With batch parameters (for incremental loading)
daana-cli execute --batch-start "2024-01-01" --batch-end "2024-01-31"

Complete Example

Here's a complete workflow showing all components together:

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.

  model: models/ecommerce-model.yaml

  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml

  connection: production

  advanced:
    batch_expression: updated_at

Quick Reference

Looking up a specific field? Here's the complete reference for all fields in workflow.yaml.

Workflow Fields

Field	Type	Required	Description
id	`string`	✓	Unique identifier for the workflow
name	`string`	✓	Human-readable name for the workflow
definition	`string`	✓	One-line description of the workflow purpose
description	`string`	○	Detailed description of the workflow
connection	`string`	✓	Connection profile to use for execution
model	`ModelReference`	✓	Path to the data model file
mappings	list of `string`	✓	List of mapping files to execute ⚠ Order matters: list parent entities before children
advanced	`object`	○	Advanced workflow settings
└advanced.batch_expression	`string`	○	Batch filtering expression for incremental source reads ⚠ Per-table override: set batch_expression on individual tables in your mapping YAML ⚠ If value contains ${BATCH_START} or ${BATCH_END}, it's used as raw SQL (auto-detected) ⚠ FULL and FULL_LOG pipelines ignore this setting (they read all data)
└advanced.batch_stale_timeout	`string`	○	Timeout for stale pipeline run detection ⚠ Uses Go duration format (e.g., 8h, 30m, 1h30m) ⚠ Use --force to override stale detection during execution

✓ = required, ○ = optional
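
For readers unfamiliar with the Go duration format used by batch_stale_timeout, the sketch below parses a subset of it (the units h, m, s, and ms, with no sign) into a Python `timedelta`. Go's own parser accepts more units, and the function name is hypothetical; use this only to sanity-check values such as 8h or 1h30m.

```python
import re
from datetime import timedelta

# Seconds per supported unit. "ms" is listed before "m" in the pattern so
# that "30ms" is not read as 30 minutes followed by a stray "s".
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_go_duration(text):
    """Parse strings like '8h', '30m', or '1h30m' into a timedelta."""
    position = 0
    total_seconds = 0.0
    for match in _TOKEN.finditer(text):
        # Tokens must be contiguous: no gaps or unknown characters between them.
        if match.start() != position:
            raise ValueError(f"invalid duration: {text!r}")
        total_seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    # Reject empty input and trailing characters.
    if position == 0 or position != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=total_seconds)
```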

Best Practices

Use descriptive workflow IDs - Choose names that reflect the business domain (e.g., ECOMMERCE_WORKFLOW, CUSTOMER_ANALYTICS)
Order mappings by dependency - Independent entities first, then dependent ones
Validate before deploying - Always run daana-cli check workflow before deployment
Configure batch processing - For large datasets, use INCREMENTAL ingestion with batch settings
Use environment-specific connections - Create separate connection profiles for dev, staging, and production
Document your workflows - Use the description field for detailed documentation
Keep models in YAML - Unless you have a specific reason to maintain compiled JSON files
Test incrementally - Test incremental logic with small date ranges first

Next Steps

Configure connections to your data sources
Create mappings to define data transformations
Follow the tutorial for a complete end-to-end example