Workflow

Workflows orchestrate the execution of your data transformations. They bring together your model, mappings, and connection profiles into an executable pipeline.

Workflow Structure

A workflow file has four main components:

  1. Workflow metadata - ID, name, and description of the workflow
  2. Model reference - Path to the data model file
  3. Mappings list - Ordered list of mapping files to execute
  4. Connection profile - Which database connection to use

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into business-ready entities
 
  model: models/ecommerce-model.yaml
 
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/order-mapping.yaml
 
  connection: dev

Tip: Generate a workflow template with daana-cli generate workflow -o workflow.yaml to get started quickly.

Workflow Metadata

Every workflow needs identifying information:

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.

  • id - Unique identifier (UPPER_SNAKE_CASE). Used to reference the workflow in commands.
  • name - Human-readable name. Displayed in logs and status reports.
  • definition - One-line summary of what the workflow does.
  • description - Optional detailed explanation with business context. Supports multi-line text.

Naming Convention: Use UPPERCASE names for workflow IDs (e.g., ECOMMERCE_WORKFLOW, CUSTOMER_ANALYTICS). The ID should clearly reflect the business domain.

Model Reference

The model field points to your data model file:

workflow:
  model: models/ecommerce-model.yaml

You can reference either:

  • YAML files (.yaml, .yml) - Will be auto-compiled during workflow execution
  • JSON files (.json) - Pre-compiled model for production use

# Development: use YAML for easier editing
model: models/ecommerce-model.yaml
 
# Production: use pre-compiled JSON for faster startup
model: models/ecommerce-model.json

Best Practice: Use YAML during development and pre-compiled JSON in production for faster workflow startup.

Mappings

The mappings field is an ordered list of mapping file paths:

workflow:
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml

Mapping Order Matters

Mappings execute in the order listed. When entities have relationships (foreign keys), list parent entities before child entities:

mappings:
  # 1. Independent entities first (no foreign keys)
  - mappings/customer-mapping.yaml
  - mappings/product-mapping.yaml
 
  # 2. Then entities that reference the above
  - mappings/order-mapping.yaml      # References CUSTOMER
 
  # 3. Finally, entities that reference those
  - mappings/order-item-mapping.yaml  # References ORDER and PRODUCT

Rule: If entity A has a relationship to entity B, the mapping for B must come before the mapping for A in the list.

Connection Profile

The connection field specifies which database connection profile to use:

workflow:
  connection: dev

This references a named profile from your connections.yaml file:

# connections.yaml
connections:
  dev:
    type: postgresql
    host: localhost
    port: 5432
    user: dev
    password: devpass
    database: customerdb
 
  production:
    type: postgresql
    host: prod.example.com
    # ...

See Connections for details on configuring connection profiles.

Batch Processing

For large datasets, configure batch processing in the advanced section. This enables incremental data loading when mappings use ingestion_strategy: INCREMENTAL.
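
For reference, the incremental switch itself lives in the mapping file rather than in the workflow. The sketch below only illustrates the idea; the surrounding mapping structure is an assumption, so consult the Mappings documentation for the actual schema:

# Illustrative sketch only - the surrounding structure is assumed; see the Mappings docs for the real schema
mapping:
  id: ORDER_MAPPING
  ingestion_strategy: INCREMENTAL   # pairs with the workflow's advanced batch settings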

Basic Batch Configuration

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms e-commerce data with incremental loading
 
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/order-mapping.yaml
  connection: production
 
  advanced:
    batch_column: updated_at
    read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"

batch_column

The column used to track which records have been processed. Choose a timestamp column that updates whenever a record changes:

advanced:
  batch_column: updated_at

Common choices:

  • updated_at - Standard timestamp column
  • modified_date - Alternative naming
  • popln_tmstp - Population timestamp in some systems

read_logic

A SQL WHERE clause template that filters source data based on batch boundaries. Use placeholder variables:

  • ${BATCH_START} or p_batch_start_value - Start of the batch window
  • ${BATCH_END} or p_batch_end_value - End of the batch window

advanced:
  read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"

Note: The SQL syntax is database-specific. Consult your platform's documentation for the correct date/timestamp comparison operators.
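
For instance, on Oracle you might express the window with explicit TO_TIMESTAMP conversions. This is an illustrative sketch only; the column name and format mask are assumptions, not values taken from your schema:

advanced:
  # Illustrative Oracle-style variant; updated_at and the format mask are assumptions
  read_logic: "updated_at > TO_TIMESTAMP('${BATCH_START}', 'YYYY-MM-DD HH24:MI:SS') AND updated_at <= TO_TIMESTAMP('${BATCH_END}', 'YYYY-MM-DD HH24:MI:SS')"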

Automatic Batch Tracking

When executing workflows with INCREMENTAL mappings:

  1. First run - Processes all historical data (from epoch to now)
  2. Subsequent runs - Automatically continues from where the last batch ended
  3. Manual override - Use --batch-start and --batch-end flags for custom ranges

# Automatic: continues from last batch
daana-cli execute --workflow-id 123
 
# Manual: specify exact range
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-31"

Batch history is stored locally in ~/.daana/batch/history.csv.
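
If you need to audit previous runs, the history file is plain CSV and can be inspected with standard command-line tools; the exact column layout depends on your CLI version:

# Pretty-print the local batch history (column layout depends on your CLI version)
column -t -s',' ~/.daana/batch/history.csv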

Workflow Commands

Generate Template

Create a new workflow file from a template:

daana-cli generate workflow -o workflow.yaml

Check Workflow

Validate your workflow configuration before deploying:

daana-cli check workflow workflow.yaml

This validates:

  • Workflow YAML structure
  • Model file exists and is valid
  • Mapping files exist and are valid
  • Connection profile exists

Compile Workflow

Compile your workflow YAML to JSON for deployment:

daana-cli compile workflow -i workflow.yaml -o workflow.json

Execute Workflow

Run the data transformation:

# Full execution
daana-cli execute --workflow-id 123
 
# With batch parameters (for incremental loading)
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-31"

Complete Example

Here's a complete workflow showing all components together:

workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.
 
  model: models/ecommerce-model.yaml
 
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml
 
  connection: production
 
  advanced:
    batch_column: updated_at
    read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"

Quick Reference

Looking up a specific field? Here's the complete reference for all fields in workflow.yaml.

Workflow Fields

  • id (string, required) - Unique identifier for the workflow
  • name (string, required) - Human-readable name for the workflow
  • definition (string, required) - One-line description of the workflow purpose
  • description (string, optional) - Detailed description of the workflow
  • connection (string, required) - Connection profile to use for execution
  • model (ModelReference, required) - Path to the data model file
  • mappings (list of string, required) - List of mapping files to execute. Order matters: list parent entities before children.
  • advanced (object, optional) - Advanced workflow settings
  • advanced.batch_column (string, optional) - Column used for batch/incremental filtering
  • advanced.read_logic (string, optional) - SQL template for batch filtering. Syntax is database-specific; consult your platform's SQL documentation. Variables like ${BATCH_START} are resolved at runtime.

Best Practices

  1. Use descriptive workflow IDs - Choose names that reflect the business domain (e.g., ECOMMERCE_WORKFLOW, CUSTOMER_ANALYTICS)
  2. Order mappings by dependency - Independent entities first, then dependent ones
  3. Validate before deploying - Always run daana-cli check workflow before deployment
  4. Configure batch processing - For large datasets, use INCREMENTAL ingestion with batch settings
  5. Use environment-specific connections - Create separate connection profiles for dev, staging, and production
  6. Document your workflows - Use the description field for detailed documentation
  7. Pre-compile for production - Use compiled JSON files for faster startup in production environments
  8. Test incrementally - Test incremental logic with small date ranges first (see the example below)
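
For example, you can exercise the incremental path over a single short window before scheduling full runs, reusing the execute flags documented above (the workflow ID is a placeholder):

# Run the workflow over a one-day window to validate incremental logic
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-02"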

Next Steps