Tutorial

Chapter 6 - Going to Production

Goal: Understand the CI/CD workflow for Daana projects and practice validation locally.

Prerequisites: You must have completed Chapter 5: Mastering DMDL.


The Real-World Workflow

In production environments, data teams follow a structured workflow to ensure quality and prevent outages. Here's how it typically works with Daana:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  1. DEVELOP     │     │  2. TEST        │     │  3. REVIEW      │
│  (Local)        │────▶│  (Dev Schema)   │────▶│  (Pull Request) │
│                 │     │                 │     │                 │
│  - Edit YAML    │     │  - Deploy to    │     │  - Code review  │
│  - Check syntax │     │    dev schema   │     │  - CI tests run │
└─────────────────┘     └─────────────────┘     └─────────────────┘


                        ┌─────────────────┐     ┌─────────────────┐
                        │  5. EXECUTE     │◀────│  4. MERGE       │
                        │  (Production)   │     │  (Main Branch)  │
                        │                 │     │                 │
                        │  - Run workflow │     │  - Auto-deploy  │
                        │  - Load data    │     │    triggered    │
                        └─────────────────┘     └─────────────────┘

Step 1: The Development Phase

When developing locally, daana-cli check workflow is your go-to validation command. It validates the model, all mappings, and the connection profile in one shot:

daana-cli check workflow

Validation in action. Introduce an intentional error in your mapping:

# Edit mappings/order-mapping.yaml and change the entity reference to something invalid:
#   entity_id: ORDER_INVALID  (instead of ORDER)

Now run validation:

daana-cli check workflow

You'll see output similar to this (warnings about hardcoded credentials and sslmode come from connections.yaml and are expected for the local tutorial):

Checking workflow: workflow.yaml

Workflow: BOOK_RETAILER_WORKFLOW
  Workflow ID: 1424339281

✘ Errors:
  Mapping Model Validation:
    • order-mapping.yaml: Entity 'ORDER_INVALID' not found in model (did you mean: ORDER_LINE?) Available: [CUSTOMER, ORDER, PRODUCT, ORDER_LINE]

⚠ Warnings:
  Entity Not Mapped:
    • model.yaml: Entity 'ORDER' defined in model but has no mapping
  Connection Profile Warning:
    • connections.yaml: [dev] 'password' appears to be hardcoded. Use environment variables for security: ${PASSWORD}
    • connections.yaml: [dev] 'user' appears to be hardcoded. Use environment variables for security: ${USER}
    • connections.yaml: [dev] root.sslmode='disable' is insecure for production. Use 'require' or 'verify-full'

Summary: 1 error(s), 4 warning(s)
Error: workflow validation failed

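The ✘ Errors section is what you must fix before proceeding. The ⚠ Warnings are advisory. The "Entity Not Mapped" warning is a side effect of the same edit (the mapping no longer points at ORDER) and goes away once the error is fixed. The "Connection Profile Warning" group is unrelated to your edit and will remain until you replace the hardcoded credentials with environment variables and stop using sslmode 'disable' (see "Best Practices" at the end of this chapter).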

Fix it by changing back to entity_id: ORDER, then verify:

daana-cli check workflow
# ✓ Workflow valid

Note: daana-cli check validates structural correctness (entity/attribute references, YAML syntax). It does NOT validate column names against the database schema - those errors are caught at deploy/execute time.
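If you want a script to report errors and warnings separately, one option is to read the counts from the summary line. The sketch below assumes the summary keeps the exact "Summary: N error(s), M warning(s)" format shown in the sample output; the function names are hypothetical helpers, not part of daana-cli.

```shell
#!/bin/sh
# Hypothetical helpers: read check output on stdin and print one count.
# Assumption: the summary line looks like "Summary: 1 error(s), 4 warning(s)".

summary_errors() {
  sed -n 's/^Summary: \([0-9][0-9]*\) error(s), [0-9][0-9]* warning(s).*$/\1/p'
}

summary_warnings() {
  sed -n 's/^Summary: [0-9][0-9]* error(s), \([0-9][0-9]*\) warning(s).*$/\1/p'
}

# Example usage (not run here; 2>&1 because the output stream is not documented):
#   daana-cli check workflow 2>&1 | summary_warnings
```

The exit status of daana-cli check workflow remains the simplest pass/fail signal; the counts are only useful for reporting.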


Step 2: Testing in a Dev Schema

Before deploying to production, you deploy to a developer-specific schema. This lets you:

  • Test actual SQL execution
  • Verify data transformations
  • Catch runtime issues

Configure Multiple Environments

Open connections.yaml and set up separate schemas (see Connection Profiles for all supported fields, database types, and SSL options):

connections:
  # Your personal development environment
  dev:
    type: postgresql
    host: localhost
    port: 5432
    user: dev
    password: devpass
    database: customerdb
    sslmode: disable
    target_schema: daana_dw_yourname  # Developer-specific schema

  # Shared production environment
  production:
    type: postgresql
    host: localhost
    port: 5432
    user: dev
    password: devpass
    database: customerdb
    sslmode: disable
    target_schema: daana_dw  # Production schema

Deploy and Test Locally

The exercises below mix CLI commands with SQL queries. Open a psql shell in a second terminal so the first stays free for daana-cli invocations:

docker exec -it daana-customerdb psql -U dev -d customerdb

Deploy and execute against your dev schema:

daana-cli deploy --connection dev
daana-cli execute --connection dev

Query your dev schema to verify (in the psql terminal):

SELECT COUNT(*) FROM daana_dw_yourname.view_customer;

Step 3: The CI/CD Pipeline

Once your changes work locally, you commit and push to trigger CI/CD.

Typical CI Configuration

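Here's what a CI pipeline might look like (GitHub Actions example). The deploy-staging job assumes a staging profile in connections.yaml, which this tutorial does not define: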

# .github/workflows/daana-ci.yml
name: Daana CI

on:
  pull_request:
    paths:
      - 'model.yaml'
      - 'workflow.yaml'
      - 'mappings/**'
      - 'connections.yaml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Daana CLI
        run: |
          # Download and install daana-cli

      - name: Validate Workflow
        run: daana-cli check workflow

  deploy-staging:
    needs: validate
    runs-on: ubuntu-latest
    environment: staging
    steps:
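      - uses: actions/checkout@v4

      - name: Install Daana CLI
        run: |
          # Download and install daana-cli (each job starts on a fresh runner)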

      - name: Deploy to Staging
        run: |
          daana-cli deploy --connection staging
          daana-cli execute --connection staging

      - name: Run Data Quality Tests
        run: |
          # Run assertions on staging data
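The "Run Data Quality Tests" step above is left as a placeholder. One possible shape for it is a row-count assertion, sketched below. Everything here is an assumption for illustration: the assert_min_rows helper, the PSQL_CMD variable, and the use of the tutorial's Docker container to reach psql. A real staging job would point PSQL_CMD at its own database.

```shell
#!/bin/sh
# Command that runs one SQL statement and prints a bare value.
# Assumption: psql is reached through the tutorial's Docker container.
# -t (tuples only) and -A (unaligned) strip headers and padding from the output.
PSQL_CMD="${PSQL_CMD:-docker exec daana-customerdb psql -U dev -d customerdb -t -A -c}"

# assert_min_rows RELATION MIN
# Hypothetical helper: fail if RELATION has fewer than MIN rows.
assert_min_rows() {
  _rel=$1
  _min=$2
  # $PSQL_CMD is intentionally unquoted so it splits into command + flags.
  _count=$($PSQL_CMD "SELECT COUNT(*) FROM ${_rel};") || return 2
  if [ "$_count" -ge "$_min" ]; then
    echo "ok: ${_rel} has ${_count} row(s)"
  else
    echo "FAIL: ${_rel} has ${_count} row(s), expected at least ${_min}" >&2
    return 1
  fi
}

# Example usage (not run here; the schema name is an assumption):
#   assert_min_rows daana_dw_staging.view_customer 1
```

A non-zero return from the helper fails the CI step, which blocks the pull request until the data issue is resolved.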

What CI Tests Catch

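The CI pipeline catches different classes of error at different stages:

  • Syntax errors (validate job): Invalid YAML, missing fields
  • Reference errors (validate job): Mappings or attributes referencing non-existent entities
  • SQL errors (deploy-staging job): Invalid column names, type mismatches
  • Schema drift (deploy-staging job): Source table changes that break mappings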

Step 4: Deploying to Production

After PR approval and merge, production deployment happens:

# Production deployment (typically automated)
daana-cli deploy --connection production
daana-cli execute --connection production

Simulate This Locally

You can walk through the full flow on your own machine, using your dev schema as a stand-in for staging:

# 1. Make a change (add a comment to model.yaml)
echo "# Updated: $(date)" >> model.yaml

# 2. Validate (CI would do this)
daana-cli check workflow

# 3. Deploy to "staging" (your dev schema)
daana-cli deploy --connection dev

# 4. Test in staging
daana-cli execute --connection dev

# 5. If all good, deploy to "production"
daana-cli deploy --connection production
daana-cli execute --connection production

Then verify production data (in the psql terminal):

SELECT COUNT(*) FROM daana_dw.view_order;
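The same sequence can be wrapped in a function so that a failure at any stage stops the promotion. This is a minimal sketch: the promote function and the DAANA_CLI variable are hypothetical names, and it relies only on the check, deploy, and execute invocations already used in this chapter.

```shell
#!/bin/sh
# CLI to invoke; overridable so the sequence can be dry-run with a stub.
DAANA_CLI="${DAANA_CLI:-daana-cli}"

# promote STAGING_CONN PROD_CONN
# Runs check, then deploy + execute on staging, then deploy + execute on production.
# The && chain stops at the first command that exits non-zero.
promote() {
  _staging=$1
  _prod=$2
  "$DAANA_CLI" check workflow &&
    "$DAANA_CLI" deploy --connection "$_staging" &&
    "$DAANA_CLI" execute --connection "$_staging" &&
    "$DAANA_CLI" deploy --connection "$_prod" &&
    "$DAANA_CLI" execute --connection "$_prod"
}

# Example usage (not run here):
#   promote dev production
```

Chaining with && mirrors what a CI pipeline does with job dependencies: production is never touched unless every earlier step succeeded.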

Hands-On: The Pre-Deployment Checklist

Before deploying any changes, run through this checklist:

Exercise 1: Full Validation Sweep

# check workflow validates everything (model + mappings + connections)
daana-cli check workflow && echo "All checks passed - safe to deploy!"

Note: check workflow validates your model, all mappings, and connection profiles in one command. Because of the &&, the success message prints only when the check exits successfully. If the check fails, fix the reported errors before proceeding.

Exercise 2: Simulate a Runtime Error

daana-cli check catches structural errors, but some errors only appear at deploy time. Reproduce this case:

  1. Edit mappings/order-mapping.yaml and change a column name to something that doesn't exist:

    - id: order_status
      transformation_expression: order_status_TYPO  # This column doesn't exist!
    
  2. Run check (it passes - check validates YAML structure, not database schema):

    daana-cli check workflow
    # ✓ Workflow valid
    
  3. Try to deploy:

    daana-cli deploy
    

    deploy runs a pre-deploy validation step that issues each transformation expression against the source database. It fails before any DDL is run, with a structured message that names the file, the attribute, the missing column, and the underlying SQLSTATE:

    Error: pre-deploy validation failed: validation failed with 1 error(s):
      - order-mapping.yaml: invalid transformation expression for 'order_status': pq: column "order_status_typo" does not exist at column 8 (42703)
    

Key Learning:

  • check validates YAML structure and references against the model.
  • deploy runs a pre-deploy validation step that catches schema-level errors (missing columns, type mismatches) before applying any DDL, so a typo here cannot leave the warehouse in a half-deployed state.
  • Use both: check gives fast feedback while you edit, and deploy confirms the expressions against the real source schema.

Fix it by changing back to order_status before continuing.
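The pre-deploy error in this exercise ends with a SQLSTATE code in parentheses, which is handy when triaging failures in CI logs. The sketch below pulls that code out of a log line and maps two common PostgreSQL codes to a description. It assumes the message keeps the trailing "(42703)" form shown above; both function names are hypothetical.

```shell
#!/bin/sh
# extract_sqlstate: read log lines on stdin, print the five-character
# SQLSTATE found in trailing parentheses (assumed message format).
extract_sqlstate() {
  sed -n 's/.*(\([0-9A-Z]\{5\}\))[[:space:]]*$/\1/p'
}

# describe_sqlstate CODE: map a few PostgreSQL SQLSTATE codes to plain text.
describe_sqlstate() {
  case $1 in
    42703) echo "undefined column" ;;
    42P01) echo "undefined table" ;;
    *)     echo "see the PostgreSQL error codes appendix" ;;
  esac
}

# Example usage (not run here):
#   daana-cli deploy 2>&1 | extract_sqlstate
```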

Exercise 3: Verify Your Fix

# Redeploy with the fix
daana-cli deploy

# Execute to ensure everything works
daana-cli execute

Verify data is correct (in the psql terminal):

SELECT order_id, order_status FROM daana_dw.view_order LIMIT 3;

Best Practices

1. Always Validate Before Commit

# Add to your pre-commit hook or run manually
daana-cli check workflow
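A pre-commit hook along these lines could run the check only when Daana project files are staged. This is a sketch, not a shipped hook: needs_validation and run_hook are hypothetical names, and the file patterns simply mirror the paths in the CI trigger earlier in this chapter.

```shell
#!/bin/sh
# needs_validation: read file paths on stdin; succeed if any is a Daana
# project file (same paths as the CI trigger: model, workflow, mappings, connections).
needs_validation() {
  grep -Eq '^(model\.yaml|workflow\.yaml|connections\.yaml|mappings/.*)$'
}

# run_hook: body of a hypothetical .git/hooks/pre-commit script.
run_hook() {
  if git diff --cached --name-only | needs_validation; then
    daana-cli check workflow || {
      echo "pre-commit: daana-cli check workflow failed; commit aborted" >&2
      return 1
    }
  fi
}

# To use as a hook, save this file as .git/hooks/pre-commit, make it
# executable, and add a final line that calls: run_hook
```

Skipping the check when no Daana files are staged keeps unrelated commits fast.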

2. Use Descriptive Schema Names

# Good - clear ownership and purpose
target_schema: daana_dw_alice_feature123
target_schema: daana_dw_staging
target_schema: daana_dw_prod

# Bad - ambiguous
target_schema: dw
target_schema: test
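To keep developer schema names consistent, a small helper can derive them from a user name and an optional feature label. The sketch below is an assumption about how a team might do this; dev_schema_name is a hypothetical function, and the daana_dw_ prefix follows the examples above. The 63-character cut reflects PostgreSQL's default identifier length limit.

```shell
#!/bin/sh
# dev_schema_name USER [FEATURE]
# Hypothetical helper: build a lowercase schema name such as
# daana_dw_alice_feature123, replacing characters outside [a-z0-9_] with "_".
dev_schema_name() {
  _raw="daana_dw_$1${2:+_$2}"
  printf '%s\n' "$_raw" |
    tr '[:upper:]' '[:lower:]' |
    sed 's/[^a-z0-9_]/_/g' |
    cut -c1-63   # PostgreSQL truncates identifiers longer than 63 bytes
}

# Example usage (not run here):
#   dev_schema_name "$USER" "$(git rev-parse --abbrev-ref HEAD)"
```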

3. Environment Variables for Secrets

Never commit passwords. Use environment variables:

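connections:
  production:
    type: postgresql
    host: "${PROD_DB_HOST}"
    port: 5432
    user: "${PROD_DB_USER}"
    password: "${PROD_DB_PASSWORD}"
    database: "${PROD_DB_NAME}"
    sslmode: require          # 'disable' triggers the insecure-sslmode warning
    target_schema: daana_dw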
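Before deploying with a profile like this, it can help to fail fast when a variable is missing rather than discover it mid-deploy. The sketch below shows one way to do that; require_env is a hypothetical helper, and the variable names are the ones used in the profile above.

```shell
#!/bin/sh
# require_env NAME...
# Hypothetical helper: report every listed variable that is unset or empty,
# and return non-zero if any are missing.
require_env() {
  _missing=0
  for _name in "$@"; do
    # Indirect lookup; names are trusted literals supplied by the caller.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing environment variable: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Example usage (not run here):
#   require_env PROD_DB_HOST PROD_DB_USER PROD_DB_PASSWORD PROD_DB_NAME &&
#     daana-cli deploy --connection production
```

Checking all names before returning means a single run lists every missing variable instead of stopping at the first one.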

4. Version Control Everything

Your Daana project should be in Git:

my-project/
├── model.yaml           # Version controlled
├── workflow.yaml        # Version controlled
├── mappings/            # Version controlled
├── connections.yaml     # Version controlled (no secrets!)
└── .github/workflows/   # CI/CD configuration

Summary

You've learned the production workflow:

  1. Develop locally with daana-cli check for fast feedback
  2. Test in dev schema to validate actual SQL execution
  3. CI validates on pull request
  4. Deploy to production after merge

This workflow ensures:

  • Structural errors caught early (before reaching any database), and schema-level errors caught before any DDL runs
  • Changes tested in isolation (dev schemas)
  • Code review and approval gates
  • Automated, repeatable deployments

One topic remains: building a dimensional consumption layer (dimensions and facts) from Daana's output views.

Next: Chapter 7 - Building Your Analytics Layer