# Workflow
Workflows orchestrate the execution of your data transformations. They bring together your model, mappings, and connection profiles into an executable pipeline.
## Workflow Structure
A workflow file has four main components:
- Workflow metadata - ID, name, and description of the workflow
- Model reference - Path to the data model file
- Mappings list - Ordered list of mapping files to execute
- Connection profile - Which database connection to use
```yaml
workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into business-ready entities
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/order-mapping.yaml
  connection: dev
```

> **Tip:** Generate a workflow template with `daana-cli generate workflow -o workflow.yaml` to get started quickly.
## Workflow Metadata
Every workflow needs identifying information:
```yaml
workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.
```

| Field | Purpose |
|---|---|
| `id` | Unique identifier (UPPER_SNAKE_CASE). Used to reference the workflow in commands. |
| `name` | Human-readable name. Displayed in logs and status reports. |
| `definition` | One-line summary of what the workflow does. |
| `description` | Optional detailed explanation with business context. Supports multi-line text. |

> **Naming Convention:** Use UPPERCASE names for workflow IDs (e.g., `ECOMMERCE_WORKFLOW`, `CUSTOMER_ANALYTICS`). The ID should clearly reflect the business domain.
## Model Reference
The `model` field points to your data model file:
```yaml
workflow:
  model: models/ecommerce-model.yaml
```

You can reference either:

- YAML files (`.yaml`, `.yml`) - Auto-compiled during workflow execution
- JSON files (`.json`) - Pre-compiled model for production use

```yaml
# Development: use YAML for easier editing
model: models/ecommerce-model.yaml

# Production: use pre-compiled JSON for faster startup
model: models/ecommerce-model.json
```

> **Best Practice:** Use YAML during development and pre-compiled JSON in production for faster workflow startup.
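One way to apply this is to keep a separate workflow file per environment that differs only in the model reference. A minimal sketch; the file names below are illustrative, not required by daana-cli:

```yaml
# dev-workflow.yaml (illustrative name) - YAML model, auto-compiled on each run
workflow:
  model: models/ecommerce-model.yaml

# prod-workflow.yaml (illustrative name) - pre-compiled JSON model for faster startup
workflow:
  model: models/ecommerce-model.json
```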
## Mappings
The `mappings` field is an ordered list of mapping file paths:
```yaml
workflow:
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml
```

### Mapping Order Matters

Mappings execute in the order listed. When entities have relationships (foreign keys), list parent entities before child entities:

```yaml
mappings:
  # 1. Independent entities first (no foreign keys)
  - mappings/customer-mapping.yaml
  - mappings/product-mapping.yaml
  # 2. Then entities that reference the above
  - mappings/order-mapping.yaml        # References CUSTOMER
  # 3. Finally, entities that reference those
  - mappings/order-item-mapping.yaml   # References ORDER and PRODUCT
```

> **Rule:** If entity A has a relationship to entity B, the mapping for B must come before the mapping for A in the list.
## Connection Profile
The `connection` field specifies which database connection profile to use:
```yaml
workflow:
  connection: dev
```

This references a named profile from your `connections.yaml` file:

```yaml
# connections.yaml
connections:
  dev:
    type: postgresql
    host: localhost
    port: 5432
    user: dev
    password: devpass
    database: customerdb
  production:
    type: postgresql
    host: prod.example.com
    # ...
```

See Connections for details on configuring connection profiles.
## Batch Processing
For large datasets, configure batch processing in the `advanced` section. This enables incremental data loading when mappings use `ingestion_strategy: INCREMENTAL`.
### Basic Batch Configuration
```yaml
workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms e-commerce data with incremental loading
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/order-mapping.yaml
  connection: production
  advanced:
    batch_column: updated_at
    read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"
```

### batch_column
The column used to track which records have been processed. Choose a timestamp column that updates whenever a record changes:
```yaml
advanced:
  batch_column: updated_at
```

Common choices:

- `updated_at` - Standard timestamp column
- `modified_date` - Alternative naming
- `popln_tmstp` - Population timestamp in some systems
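Whichever column you choose, the same column should normally appear in your `read_logic` filter (described next). A hedged sketch, assuming a source whose change-tracking column is `modified_date`:

```yaml
# Illustrative only: batch_column and read_logic refer to the same modified_date column
advanced:
  batch_column: modified_date
  read_logic: "modified_date > '${BATCH_START}' AND modified_date <= '${BATCH_END}'"
```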
### read_logic
A SQL `WHERE` clause template that filters source data based on batch boundaries. Use placeholder variables:

- `${BATCH_START}` or `p_batch_start_value` - Start of the batch window
- `${BATCH_END}` or `p_batch_end_value` - End of the batch window

```yaml
advanced:
  read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"
```

> **Note:** The SQL syntax is database-specific. Consult your platform's documentation for the correct date/timestamp comparison operators.
### Automatic Batch Tracking
When executing workflows with INCREMENTAL mappings:
- First run - Processes all historical data (from epoch to now)
- Subsequent runs - Automatically continues from where the last batch ended
- Manual override - Use the `--batch-start` and `--batch-end` flags for custom ranges

```bash
# Automatic: continues from last batch
daana-cli execute --workflow-id 123

# Manual: specify exact range
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-31"
```

Batch history is stored locally in `~/.daana/batch/history.csv`.
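If you need to verify what has already been processed, you can inspect that file directly with standard tools (it is plain CSV; the exact columns depend on your daana-cli version):

```bash
# Read-only peek at the batch history; adjust the path if you relocated it
column -s, -t < ~/.daana/batch/history.csv
```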
## Workflow Commands
### Generate Template
Create a new workflow file from a template:
```bash
daana-cli generate workflow -o workflow.yaml
```

### Check Workflow
Validate your workflow configuration before deploying:
```bash
daana-cli check workflow workflow.yaml
```

This validates:
- Workflow YAML structure
- Model file exists and is valid
- Mapping files exist and are valid
- Connection profile exists
### Compile Workflow
Compile your workflow YAML to JSON for deployment:
```bash
daana-cli compile workflow -i workflow.yaml -o workflow.json
```

### Execute Workflow
Run the data transformation:
```bash
# Full execution
daana-cli execute --workflow-id 123

# With batch parameters (for incremental loading)
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-31"
```
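These commands compose naturally into a small deployment script. A minimal sketch, assuming each command exits non-zero on failure and reusing the placeholder workflow ID 123 from above:

```bash
#!/usr/bin/env bash
set -euo pipefail   # stop at the first failing step

daana-cli check workflow workflow.yaml                        # validate structure and references
daana-cli compile workflow -i workflow.yaml -o workflow.json  # pre-compile for deployment
daana-cli execute --workflow-id 123                           # run the transformation
```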
## Complete Example

Here's a complete workflow showing all components together:
```yaml
workflow:
  id: ECOMMERCE_WORKFLOW
  name: E-commerce Data Pipeline
  definition: Transforms raw e-commerce data into analytics-ready format
  description: |
    This workflow processes customer and order data from the
    operational database and transforms it into business entities
    for analytics and reporting.
  model: models/ecommerce-model.yaml
  mappings:
    - mappings/customer-mapping.yaml
    - mappings/product-mapping.yaml
    - mappings/order-mapping.yaml
    - mappings/order-item-mapping.yaml
  connection: production
  advanced:
    batch_column: updated_at
    read_logic: "updated_at > '${BATCH_START}' AND updated_at <= '${BATCH_END}'"
```

## Quick Reference
Looking up a specific field? Here's the complete reference for all fields in workflow.yaml.
### Workflow Fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | ✓ | Unique identifier for the workflow |
| name | string | ✓ | Human-readable name for the workflow |
| definition | string | ✓ | One-line description of the workflow purpose |
| description | string | ○ | Detailed description of the workflow |
| connection | string | ✓ | Connection profile to use for execution |
| model | ModelReference | ✓ | Path to the data model file |
| mappings | list of string | ✓ | List of mapping files to execute ⚠ Order matters: list parent entities before children |
| advanced | object | ○ | Advanced workflow settings |
| └advanced.batch_column | string | ○ | Column used for batch/incremental filtering |
| └advanced.read_logic | string | ○ | SQL template for batch filtering ⚠ Syntax is database-specific - consult your platform's SQL documentation ⚠ Variables like ${BATCH_START} are resolved at runtime |
✓ = required, ○ = optional
## Best Practices
- Use descriptive workflow IDs - Choose names that reflect the business domain (e.g., `ECOMMERCE_WORKFLOW`, `CUSTOMER_ANALYTICS`)
- Order mappings by dependency - Independent entities first, then dependent ones
- Validate before deploying - Always run `daana-cli check workflow` before deployment
- Configure batch processing - For large datasets, use `INCREMENTAL` ingestion with batch settings
- Use environment-specific connections - Create separate connection profiles for dev, staging, and production
- Document your workflows - Use the `description` field for detailed documentation
- Pre-compile for production - Use compiled JSON files for faster startup in production environments
- Test incrementally - Validate incremental logic with small date ranges first (see the example below)
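A hedged example of such a small-range test, using only the batch flags documented above and the placeholder workflow ID 123:

```bash
# Process a single day before backfilling a large history
daana-cli execute --workflow-id 123 --batch-start "2024-01-01" --batch-end "2024-01-02"
```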
## Next Steps
- Configure connections to your data sources
- Create mappings to define data transformations
- Follow the tutorial for a complete end-to-end example