Connections

Connection profiles define how Daana connects to your data warehouse: the database where source data lives and where transformed business entities are stored.

Understanding the Two-Database Architecture

Daana uses a two-database architecture:

  • internaldb: Daana's control/metadata database. Stores model definitions and workflows, and orchestrates transformations. Configured in ~/.daana/config.yaml.
  • customerdb: your data warehouse. Contains source data and receives transformed business entities. Configured in connections.yaml.

This separation keeps Daana's metadata isolated from your business data.

~/.daana/config.yaml     → internaldb (Daana's control database)
connections.yaml         → customerdb (your data warehouse)

Connection Profile Structure

A connections file defines named profiles for different environments:

connections:
  dev:
    type: "postgresql"
    host: "localhost"
    port: 5432
    user: "dev"
    password: "devpass"
    database: "customerdb"
    sslmode: "disable"
    target_schema: "daana_dw"
 
  production:
    type: "postgresql"
    host: "prod.example.com"
    port: 5432
    user: "${DW_USER}"
    password: "${DW_PASSWORD}"
    database: "analytics"
    sslmode: "require"
    target_schema: "daana_dw"

Tip: Generate a template with daana-cli generate connections -o connections.yaml to get started quickly.

Environment Variables

Use environment variable syntax ${VAR_NAME} for sensitive values. Daana automatically interpolates these at runtime:

connections:
  production:
    type: "postgresql"
    host: "${PROD_DB_HOST}"
    user: "${PROD_DB_USER}"
    password: "${PROD_DB_PASSWORD}"
    database: "analytics"

Set the variables before running commands:

export PROD_DB_HOST="prod.example.com"
export PROD_DB_USER="daana_svc"
export PROD_DB_PASSWORD="secure-password"
 
daana-cli deploy workflow.yaml --connection-profiles connections.yaml --profile production

Security: Never commit passwords or credentials to version control. Always use environment variables for sensitive values.
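
A common pattern is to keep those exports in a git-ignored file and source it before running commands; a minimal sketch, assuming a local .env file (a shell convention, not something Daana reads itself):

# .env -- keep out of version control (add it to .gitignore)
export PROD_DB_HOST="prod.example.com"
export PROD_DB_USER="daana_svc"
export PROD_DB_PASSWORD="secure-password"
 
# Load the variables, then run commands as usual
source .env
daana-cli deploy workflow.yaml --connection-profiles connections.yaml --profile production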

Supported Database Types

Daana supports multiple data warehouse platforms. Each has specific configuration requirements.

PostgreSQL (Default and Recommended)

PostgreSQL is the default and most thoroughly tested dialect with Daana. It offers excellent performance, reliability, and full support for all Daana features.

When to use:

  • Default choice for new Daana projects
  • When you need full feature support and best compatibility
  • On-premise deployments with full control
  • Cloud deployments on AWS RDS, Azure Database, or Google Cloud SQL

Required Fields

Field     Type    Description
host      string  Database server hostname
port      int     Database port number. Default: 5432
user      string  Database username. Use environment variables in production: ${DB_USER}
database  string  Database name

Optional Fields

Field             Type    Description
password          string  Database password. Never hardcode passwords; use ${VAR_NAME} syntax.
sslmode           string  SSL connection mode. Default: disable. 'disable' is insecure for
                          production; use 'require' or 'verify-full'.
connect_timeout   int     Connection timeout in seconds
application_name  string  Application name for connection identification
pool_min_conns    int     Minimum pool connections
pool_max_conns    int     Maximum pool connections

Example Configuration:

connections:
  dev:
    type: "postgresql"
    host: "localhost"
    port: 5432
    user: "dev"
    password: "devpass"
    database: "customerdb"
    sslmode: "disable"
    target_schema: "daana_dw"

PostgreSQL is the most thoroughly tested dialect with Daana

For production, always set sslmode to 'require' or 'verify-full'
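
The optional fields are most useful in production profiles; a hedged sketch using them (the host, timeout, and pool sizes below are illustrative placeholders, not recommendations):

connections:
  prod-tuned:
    type: "postgresql"
    host: "prod-db.example.com"
    port: 5432
    user: "${PROD_DB_USER}"
    password: "${PROD_DB_PASSWORD}"
    database: "analytics"
    sslmode: "verify-full"
    target_schema: "daana_dw"
    connect_timeout: 10          # seconds before a connection attempt is abandoned
    application_name: "daana"    # shows up in pg_stat_activity, which helps when tracing sessions
    pool_min_conns: 2
    pool_max_conns: 10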

BigQuery

Google BigQuery is a serverless, highly scalable cloud data warehouse. It excels at analytical queries on massive datasets with pay-per-query pricing.

When to use:

  • Large-scale analytics workloads (terabytes to petabytes)
  • When using Google Cloud Platform
  • Serverless operation with no infrastructure management
  • Pay-per-query pricing model preferred

Required Fields

Field       Type    Description
project_id  string  Google Cloud project ID
dataset     string  BigQuery dataset name
location    string  Dataset location/region

Authentication Options

Field             Type    Description
credentials_file  string  Path to service account JSON key. Use environment variables for
                          credential paths in production.
use_adc           bool    Use Application Default Credentials

Optional Fields

Field    Type  Description
timeout  int   Query timeout in seconds

Example Configuration:

connections:
  bigquery-prod:
    type: "bigquery"
    project_id: "my-gcp-project"
    dataset: "daana_dw"
    location: "US"
    credentials_file: "${BQ_CREDENTIALS_FILE}"

Service account needs BigQuery Data Editor and BigQuery Job User roles

Either credentials_file or use_adc is required for authentication
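
If you prefer Application Default Credentials to a key file, set use_adc instead of credentials_file; a minimal sketch with placeholder project and dataset names:

connections:
  bigquery-adc:
    type: "bigquery"
    project_id: "my-gcp-project"
    dataset: "daana_dw"
    location: "US"
    use_adc: true    # credentials come from the environment, e.g. gcloud auth application-default login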

Microsoft SQL Server

Microsoft SQL Server is an enterprise RDBMS widely used in Windows environments with deep integration with Microsoft tools and services.

When to use:

  • Organization uses Microsoft SQL Server infrastructure
  • Integration with .NET applications and Microsoft stack
  • Existing SQL Server databases as source data
  • Azure SQL Database deployments

Required Fields

Field     Type    Description
host      string  SQL Server hostname
port      int     SQL Server port. Default: 1433
user      string  Database username
database  string  Database name

Optional Fields

Field                     Type    Description
password                  string  Database password. Never hardcode passwords; use ${VAR_NAME} syntax.
encrypt                   bool    Enable connection encryption. For production, set encrypt: true.
trust_server_certificate  bool    Trust the server certificate without validation. Set to false in
                                  production and use proper certificates.
connection_timeout        int     Connection timeout in seconds

Example Configuration:

connections:
  mssql-prod:
    type: "mssql"
    host: "sqlserver.example.com"
    port: 1433
    user: "${MSSQL_USER}"
    password: "${MSSQL_PASSWORD}"
    database: "DataWarehouse"
    encrypt: true
    trust_server_certificate: false
    target_schema: "daana_dw"

MSSQL uses T-SQL, which differs from standard SQL in some areas

Enable encryption and use trusted certificates in production

Oracle

Oracle Database is a multi-model enterprise RDBMS widely used in large organizations for mission-critical workloads.

When to use:

  • Existing Oracle database infrastructure
  • Enterprise environments with Oracle licensing
  • Mission-critical workloads requiring high availability
  • Financial services and regulated industries

Required Fields

Field  Type    Description
host   string  Oracle server hostname
port   int     Oracle listener port. Default: 1521
user   string  Database username

Authentication Options

Field         Type    Description
service_name  string  Oracle service name
sid           string  Oracle SID

Optional Fields

Field            Type    Description
password         string  Database password. Never hardcode passwords; use ${VAR_NAME} syntax.
connect_timeout  int     Connection timeout in seconds

Example Configuration:

connections:
  oracle-prod:
    type: "oracle"
    host: "oracle.example.com"
    port: 1521
    user: "${ORACLE_USER}"
    password: "${ORACLE_PASSWORD}"
    service_name: "ORCLPDB1"
    target_schema: "DAANA_DW"

Requires Oracle client libraries installed on the system

Either service_name or sid is required (but not both)

Oracle schema names are typically UPPERCASE
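
For databases identified by SID rather than a service name, use sid in place of service_name; a hedged sketch with placeholder values:

connections:
  oracle-legacy:
    type: "oracle"
    host: "oracle.example.com"
    port: 1521
    user: "${ORACLE_USER}"
    password: "${ORACLE_PASSWORD}"
    sid: "ORCL"                  # provide either sid or service_name, never both
    target_schema: "DAANA_DW"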

Snowflake

Snowflake is a cloud-native data platform that separates storage and compute, providing near-unlimited scalability with per-second billing.

When to use:

  • Need massive scale with automatic performance tuning
  • Multi-cloud strategy (AWS, Azure, GCP)
  • Working with semi-structured data (JSON, Avro, Parquet)
  • Instant scaling for variable workloads

Required Fields

Field      Type    Description
account    string  Snowflake account identifier
user       string  Snowflake username
warehouse  string  Compute warehouse name
database   string  Snowflake database name
schema     string  Snowflake schema name

Authentication Options

Field             Type    Description
password          string  Snowflake password. Never hardcode passwords; use ${VAR_NAME} syntax.
private_key_path  string  Path to RSA private key

Optional Fields

Field   Type    Description
role    string  Snowflake role to use
region  string  Snowflake region

Example Configuration:

connections:
  snowflake-prod:
    type: "snowflake"
    account: "xy12345.us-east-1.aws"
    user: "${SNOWFLAKE_USER}"
    password: "${SNOWFLAKE_PASSWORD}"
    warehouse: "DAANA_WH"
    database: "ANALYTICS"
    schema: "DAANA_DW"
    role: "TRANSFORMER"

Account identifier should be in format 'account.region.cloud'

Warehouse must be running for queries (costs apply)

Either password or private_key_path is required for authentication
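
For key-pair authentication, use private_key_path in place of password; a minimal sketch, assuming the environment variable below points at the registered RSA private key file:

connections:
  snowflake-keypair:
    type: "snowflake"
    account: "xy12345.us-east-1.aws"
    user: "${SNOWFLAKE_USER}"
    private_key_path: "${SNOWFLAKE_PRIVATE_KEY_PATH}"   # path to the RSA private key for this user
    warehouse: "DAANA_WH"
    database: "ANALYTICS"
    schema: "DAANA_DW"
    role: "TRANSFORMER"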

Using Connection Profiles

Connection profiles are used with commands that interact with your data warehouse:

# Install framework (sets up both metadata DB and data warehouse)
daana-cli install --connection-profiles connections.yaml --profile dev
 
# Deploy transformations
daana-cli deploy workflow.yaml --connection-profiles connections.yaml --profile production
 
# Execute workflows
daana-cli execute --workflow-id 123 --connection-profiles connections.yaml --profile production

Profile Selection

When multiple profiles exist:

# Explicitly select a profile
daana-cli check connections --connection dev
 
# If only one profile exists, it's auto-selected
daana-cli check connections

Validating Connection Profiles

Before deploying, validate your connection profiles to catch configuration errors:

# Validate all profiles in the file
daana-cli check connections
 
# Validate a specific profile
daana-cli check connections --connection dev
 
# Strict mode (treat warnings as errors)
daana-cli check connections --connection prod --strict
 
# JSON output for CI/CD
daana-cli check connections --format json
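
In CI pipelines, strict mode and JSON output work well together; a hedged shell sketch, assuming daana-cli exits with a nonzero status when validation fails:

# Fail the pipeline on validation errors (and, with --strict, on warnings too)
set -euo pipefail
daana-cli check connections --connection production --strict --format json > connection-check.json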

What Validation Checks

The validator verifies:

Category   What's Checked
Structure  YAML syntax, required fields present
Dialect    Valid database type, dialect-specific required fields
Security   Hardcoded credentials (warns), SSL/encryption settings
Files      Credential files exist (for BigQuery, Snowflake key-pair)

Example Validation Output

Valid profile with warnings:

✓ Profile 'dev' valid
  Type: postgresql

⚠  Profile 'dev' warnings:
  • [dev] root.sslmode='disable' is insecure for production. Use 'require' or 'verify-full'

Invalid profile:

✘ Profile 'prod' errors:
  • [prod] root: PostgreSQL requires field 'port'
  • [prod] root: PostgreSQL requires field 'user'
  • [prod] root: PostgreSQL requires field 'database'

Generating a Template

Generate a connections template with examples for all supported dialects:

daana-cli generate connections -o connections.yaml

This creates a well-commented template with:

  • Development PostgreSQL profile (uncommented, ready to use)
  • Production PostgreSQL example (commented)
  • Examples for BigQuery, MSSQL, Oracle, Snowflake (commented)

Complete Example

Here's a complete connections file for a multi-environment setup:

# Connection profiles for my-project
connections:
 
  # Development - local Docker PostgreSQL
  dev:
    type: "postgresql"
    host: "localhost"
    port: 5432
    user: "dev"
    password: "devpass"
    database: "customerdb"
    sslmode: "disable"
    target_schema: "daana_dw"
 
  # Staging - AWS RDS PostgreSQL
  staging:
    type: "postgresql"
    host: "staging-db.example.com"
    port: 5432
    user: "${STAGING_DB_USER}"
    password: "${STAGING_DB_PASSWORD}"
    database: "analytics_staging"
    sslmode: "require"
    target_schema: "daana_dw"
 
  # Production - AWS RDS PostgreSQL
  production:
    type: "postgresql"
    host: "prod-db.example.com"
    port: 5432
    user: "${PROD_DB_USER}"
    password: "${PROD_DB_PASSWORD}"
    database: "analytics"
    sslmode: "verify-full"
    target_schema: "daana_dw"
    connect_timeout: 30
    pool_max_conns: 20

Best Practices

  1. Use environment variables for secrets (${VAR_NAME} syntax)
  2. Enable SSL/TLS encryption for production databases
  3. Create separate profiles for dev, staging, and production
  4. Validate profiles before deployment with 'daana-cli check connections'
  5. Use least-privilege database credentials
  6. Document profiles with comments explaining their purpose
  7. Keep connections.yaml in version control (secrets are in env vars)
  8. Use target_schema to separate Daana output from source data

Next Steps