Workflow

Workflows orchestrate the execution of your data transformations. They bring together your model, mappings, and connection profiles into an executable pipeline.

Workflow Structure

A workflow file has four main components:

  1. Workflow metadata - ID, name, and description of the workflow
  2. Model reference - Path to the data model file
  3. Mappings list - Ordered list of mapping files to execute
  4. Connection profile - Which database connection to use

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into business-ready entities
 
  model: models/ecommerce-model.yaml
 
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/order-mapping.yaml
 
  connection: dev

Tip: Generate a workflow template with daana-cli generate workflow -o workflow.yaml to get started quickly.

Workflow Metadata

Every workflow needs identifying information:

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.

  • id - Unique identifier (UPPER_SNAKE_CASE). Used to reference the workflow in commands.
  • name - Human-readable name. Displayed in logs and status reports.
  • definition - One-line summary of what the workflow does.
  • description - Optional detailed explanation with business context. Supports multi-line text.

Naming Convention: Use UPPERCASE names for workflow IDs (e.g., ECOMMERCE_WORKFLOW, CUSTOMER_ANALYTICS). The ID should clearly reflect the business domain.

Model Reference

The model field points to your data model file:

workflow:
  model: models/ecommerce-model.yaml

You can reference either:

  • YAML files (.yaml, .yml) - Will be auto-compiled during workflow execution
  • JSON files (.json) - Pre-compiled model for production use

# Development: use YAML for easier editing
model: models/ecommerce-model.yaml
 
# Production: use pre-compiled JSON for faster startup
model: models/ecommerce-model.json

Best Practice: Use YAML during development and pre-compiled JSON in production for faster workflow startup.

Mappings

The mappings field is an ordered list of mapping file paths:

workflow:
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml

Mapping Order Matters

Mappings execute in the order listed. When entities have relationships (foreign keys), list parent entities before child entities:

mappings:
  # 1. Independent entities first (no foreign keys)
  - mappings/customer-mapping.yaml
  - mappings/product-mapping.yaml
 
  # 2. Then entities that reference the above
  - mappings/order-mapping.yaml      # References CUSTOMER
 
  # 3. Finally, entities that reference those
  - mappings/order-item-mapping.yaml  # References ORDER and PRODUCT

Rule: If entity A has a relationship to entity B, the mapping for B must come before the mapping for A in the list.

Connection Profile

The connection field specifies which database connection profile to use:

workflow:
  connection: dev

This references a named profile from your connections.yaml file:

# connections.yaml
connections:
  dev:
    type: postgresql
    host: localhost
    port: 5432
    user: dev
    password: devpass
    database: customerdb
 
  production:
    type: postgresql
    host: prod.example.com
    # ...

See Connections for details on configuring connection profiles.

Batch Processing

For large datasets, configure batch processing in the advanced section. This enables incremental data loading when mappings use ingestion_strategy: INCREMENTAL.
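
For reference, the incremental switch itself lives in the mapping file rather than in the workflow. The sketch below only illustrates the idea; the surrounding mapping structure is an assumption, so consult the Mappings documentation for the actual schema:

# Illustrative sketch only - the surrounding structure is assumed; see the Mappings docs for the real schema
mapping:
  id: ORDER_MAPPING
  ingestion_strategy: INCREMENTAL   # pairs with the workflow's advanced batch settings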

Basic Batch Configuration

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms e-commerce data with incremental loading
 
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/order-mapping.yaml
  connection: production
 
  advanced:
    batch_column: updated_at
    read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"

batch_column

The column used to track which records have been processed. Choose a timestamp column that updates whenever a record changes:

advanced:
  batch_column: updated_at

Common choices:

  • updated_at - Standard timestamp column
  • modified_date - Alternative naming
  • popln_tmstp - Population timestamp in some systems

read_logic

A SQL WHERE clause template that filters source data based on batch boundaries. Use placeholder variables:

  • ${BATCH_START} or p_batch_start_value - Start of the batch window
  • ${BATCH_END} or p_batch_end_value - End of the batch window

advanced:
  read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"

Note: The SQL syntax is database-specific. Consult your platform's documentation for the correct date/timestamp comparison operators.
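
For instance, on Oracle you might express the window with explicit TO_TIMESTAMP conversions. This is an illustrative sketch only; the column name and format mask are assumptions, not values taken from your schema:

advanced:
  # Illustrative Oracle-style variant; updated_at and the format mask are assumptions
  read_logic: "updated_at > TO_TIMESTAMP('${BATCH_START}', 'YYYY-MM-DD HH24:MI:SS') AND updated_at <= TO_TIMESTAMP('${BATCH_END}', 'YYYY-MM-DD HH24:MI:SS')"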

Automatic Batch Tracking

When executing workflows with INCREMENTAL mappings:

  1. First run - Processes all historical data (from epoch to now)
  2. Subsequent runs - Automatically continues from where the last batch ended
  3. Manual override - Use --batch-start and --batch-end flags for custom ranges

# Automatic: continues from last batch
daana-cli execute --workflow-id 123
 
# Manual: specify exact range
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-31"

Batch history is stored locally in ~/.daana/batch/history.csv.
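
If you need to audit previous runs, the history file is plain CSV and can be inspected with standard command-line tools; the exact column layout depends on your CLI version:

# Pretty-print the local batch history (column layout depends on your CLI version)
column -t -s',' ~/.daana/batch/history.csv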

Workflow Commands

Generate Template

Create a new workflow file from a template:

daana-cli generate workflow -o workflow.yaml

Check Workflow

Validate your workflow configuration before deploying:

daana-cli check workflow workflow.yaml

This validates:

  • Workflow YAML structure
  • Model file exists and is valid
  • Mapping files exist and are valid
  • Connection profile exists

Compile Workflow

Compile your workflow YAML to JSON for deployment:

daana-cli compile workflow -i workflow.yaml -o workflow.json

Execute Workflow

Run the data transformation:

# Full execution
daana-cli execute --workflow-id 123
 
# With batch parameters (for incremental loading)
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-31"

Complete Example

Here's a complete workflow showing all components together:

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.
 
  model: models/ecommerce-model.yaml
 
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml
 
  connection: production
 
  advanced:
    batch_column: updated_at
    read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"

Quick Reference

Looking up a specific field? Here's the complete reference for all fields in workflow.yaml.

Workflow Fields

  • id (string, required) - Unique identifier for the workflow
  • name (string, required) - Human-readable name for the workflow
  • definition (string, required) - One-line description of the workflow purpose
  • description (string, optional) - Detailed description of the workflow
  • connection (string, required) - Connection profile to use for execution
  • model (ModelReference, required) - Path to the data model file
  • mappings (list of string, required) - List of mapping files to execute. Order matters: list parent entities before children.
  • advanced (object, optional) - Advanced workflow settings
  • advanced.batch_column (string, optional) - Column used for batch/incremental filtering
  • advanced.read_logic (string, optional) - SQL template for batch filtering. Syntax is database-specific; consult your platform's SQL documentation. Variables like ${BATCH_START} are resolved at runtime.

Best Practices

  1. Use descriptive workflow IDs - Choose names that reflect the business domain (e.g., ECOMMERCE_WORKFLOW, CUSTOMER_ANALYTICS)
  2. Order mappings by dependency - Independent entities first, then dependent ones
  3. Validate before deploying - Always run daana-cli check workflow before deployment
  4. Configure batch processing - For large datasets, use INCREMENTAL ingestion with batch settings
  5. Use environment-specific connections - Create separate connection profiles for dev, staging, and production
  6. Document your workflows - Use the description field for detailed documentation
  7. Pre-compile for production - Use compiled JSON files for faster startup in production environments
  8. Test incrementally - Test incremental logic with small date ranges first (see the example below)
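
For example, you can exercise the incremental path over a single short window before scheduling full runs, reusing the execute flags documented above (the workflow ID is a placeholder):

# Run the workflow over a one-day window to validate incremental logic
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-02"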

Next Steps