Connections
Connection profiles define how Daana connects to your data warehouse databases where source data lives and transformed business entities are stored.
Understanding the Two-Database Architecture
Daana uses a two-database architecture:
| Database | Purpose | Configuration |
|---|---|---|
| internaldb | Daana's control/metadata database. Stores model definitions, workflows, and orchestrates transformations. | ~/.daana/config.yaml |
| customerdb | Your data warehouse. Contains source data and receives transformed business entities. | connections.yaml |
This separation keeps Daana's metadata isolated from your business data.
~/.daana/config.yaml → internaldb (Daana's control database)
connections.yaml → customerdb (your data warehouse)Connection Profile Structure
A connections file defines named profiles for different environments:
connections:
dev:
type: "postgresql"
host: "localhost"
port: 5432
user: "dev"
password: "devpass"
database: "customerdb"
sslmode: "disable"
target_schema: "daana_dw"
production:
type: "postgresql"
host: "prod.example.com"
port: 5432
user: "${DW_USER}"
password: "${DW_PASSWORD}"
database: "analytics"
sslmode: "require"
target_schema: "daana_dw"Tip: Generate a template with
daana-cli generate connections -o connections.yamlto get started quickly.
Environment Variables
Use environment variable syntax ${VAR_NAME} for sensitive values. Daana automatically interpolates these at runtime:
connections:
production:
type: "postgresql"
host: "${PROD_DB_HOST}"
user: "${PROD_DB_USER}"
password: "${PROD_DB_PASSWORD}"
database: "analytics"Set the variables before running commands:
export PROD_DB_HOST="prod.example.com"
export PROD_DB_USER="daana_svc"
export PROD_DB_PASSWORD="secure-password"
daana-cli deploy workflow.yaml --connection-profiles connections.yaml --profile productionSecurity: Never commit passwords or credentials to version control. Always use environment variables for sensitive values.
Supported Database Types
Daana supports multiple data warehouse platforms. Each has specific configuration requirements.
PostgreSQL (Default and Recommended)
PostgreSQL is the default and most thoroughly tested dialect with Daana. It offers excellent performance, reliability, and full support for all Daana features.
When to use:
- Default choice for new Daana projects
- When you need full feature support and best compatibility
- On-premise deployments with full control
- Cloud deployments on AWS RDS, Azure Database, or Google Cloud SQL
Required Fields
| Field | Type | Required | Description |
|---|---|---|---|
| host | string | ✓ | Database server hostname |
| port | int | ✓ | Database port number Default: 5432 |
| user | string | ✓ | Database username ⚠ Use environment variables in production: ${DB_USER} |
| database | string | ✓ | Database name |
Optional Fields
| Field | Type | Required | Description |
|---|---|---|---|
| password | string | ○ | Database password ⚠ Never hardcode passwords. Use ${VAR_NAME} syntax. |
| sslmode | string | ○ | SSL connection mode Default: disable⚠ sslmode='disable' is insecure for production. Use 'require' or 'verify-full'. |
| connect_timeout | int | ○ | Connection timeout in seconds |
| application_name | string | ○ | Application name for connection identification |
| pool_min_conns | int | ○ | Minimum pool connections |
| pool_max_conns | int | ○ | Maximum pool connections |
Example Configuration:
connections:
dev:
type: "postgresql"
host: "localhost"
port: 5432
user: "dev"
password: "devpass"
database: "customerdb"
sslmode: "disable"
target_schema: "daana_dw"⚠ PostgreSQL is the most thoroughly tested dialect with Daana
⚠ For production, always set sslmode to 'require' or 'verify-full'
BigQuery
Google BigQuery is a serverless, highly scalable cloud data warehouse. It excels at analytical queries on massive datasets with pay-per-query pricing.
When to use:
- Large-scale analytics workloads (terabytes to petabytes)
- When using Google Cloud Platform
- Serverless operation with no infrastructure management
- Pay-per-query pricing model preferred
Required Fields
| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | ✓ | Google Cloud project ID |
| dataset | string | ✓ | BigQuery dataset name |
| location | string | ✓ | Dataset location/region |
Authentication Options
| Field | Type | Required | Description |
|---|---|---|---|
| credentials_file | string | ○ | Path to service account JSON key ⚠ Use environment variables for credential paths in production. |
| use_adc | bool | ○ | Use Application Default Credentials |
Optional Fields
| Field | Type | Required | Description |
|---|---|---|---|
| timeout | int | ○ | Query timeout in seconds |
Example Configuration:
connections:
bigquery-prod:
type: "bigquery"
project_id: "my-gcp-project"
dataset: "daana_dw"
location: "US"
credentials_file: "${BQ_CREDENTIALS_FILE}"⚠ Service account needs BigQuery Data Editor and BigQuery Job User roles
⚠ Either credentials_file or use_adc is required for authentication
Microsoft SQL Server
Microsoft SQL Server is an enterprise RDBMS widely used in Windows environments with deep integration with Microsoft tools and services.
When to use:
- Organization uses Microsoft SQL Server infrastructure
- Integration with .NET applications and Microsoft stack
- Existing SQL Server databases as source data
- Azure SQL Database deployments
Required Fields
| Field | Type | Required | Description |
|---|---|---|---|
| host | string | ✓ | SQL Server hostname |
| port | int | ✓ | SQL Server port Default: 1433 |
| user | string | ✓ | Database username |
| database | string | ✓ | Database name |
Optional Fields
| Field | Type | Required | Description |
|---|---|---|---|
| password | string | ○ | Database password ⚠ Never hardcode passwords. Use ${VAR_NAME} syntax. |
| encrypt | bool | ○ | Enable connection encryption ⚠ Enable encryption for production: encrypt: true |
| trust_server_certificate | bool | ○ | Trust server certificate without validation ⚠ Set to false in production and use proper certificates. |
| connection_timeout | int | ○ | Connection timeout in seconds |
Example Configuration:
connections:
mssql-prod:
type: "mssql"
host: "sqlserver.example.com"
port: 1433
user: "${MSSQL_USER}"
password: "${MSSQL_PASSWORD}"
database: "DataWarehouse"
encrypt: true
trust_server_certificate: false
target_schema: "daana_dw"⚠ MSSQL uses T-SQL which differs from standard SQL in some areas
⚠ Enable encryption and use trusted certificates in production
Oracle
Oracle Database is a multi-model enterprise RDBMS widely used in large organizations for mission-critical workloads.
When to use:
- Existing Oracle database infrastructure
- Enterprise environments with Oracle licensing
- Mission-critical workloads requiring high availability
- Financial services and regulated industries
Required Fields
| Field | Type | Required | Description |
|---|---|---|---|
| host | string | ✓ | Oracle server hostname |
| port | int | ✓ | Oracle listener port Default: 1521 |
| user | string | ✓ | Database username |
Authentication Options
| Field | Type | Required | Description |
|---|---|---|---|
| service_name | string | ○ | Oracle service name |
| sid | string | ○ | Oracle SID |
Optional Fields
| Field | Type | Required | Description |
|---|---|---|---|
| password | string | ○ | Database password ⚠ Never hardcode passwords. Use ${VAR_NAME} syntax. |
| connect_timeout | int | ○ | Connection timeout in seconds |
Example Configuration:
connections:
oracle-prod:
type: "oracle"
host: "oracle.example.com"
port: 1521
user: "${ORACLE_USER}"
password: "${ORACLE_PASSWORD}"
service_name: "ORCLPDB1"
target_schema: "DAANA_DW"⚠ Requires Oracle client libraries installed on the system
⚠ Either service_name or sid is required (but not both)
⚠ Oracle schema names are typically UPPERCASE
Snowflake
Snowflake is a cloud-native data platform that separates storage and compute, providing near-unlimited scalability with per-second billing.
When to use:
- Need massive scale with automatic performance tuning
- Multi-cloud strategy (AWS, Azure, GCP)
- Working with semi-structured data (JSON, Avro, Parquet)
- Instant scaling for variable workloads
Required Fields
| Field | Type | Required | Description |
|---|---|---|---|
| account | string | ✓ | Snowflake account identifier |
| user | string | ✓ | Snowflake username |
| warehouse | string | ✓ | Compute warehouse name |
| database | string | ✓ | Snowflake database name |
| schema | string | ✓ | Snowflake schema name |
Authentication Options
| Field | Type | Required | Description |
|---|---|---|---|
| password | string | ○ | Snowflake password ⚠ Never hardcode passwords. Use ${VAR_NAME} syntax. |
| private_key_path | string | ○ | Path to RSA private key |
Optional Fields
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | ○ | Snowflake role to use |
| region | string | ○ | Snowflake region |
Example Configuration:
connections:
snowflake-prod:
type: "snowflake"
account: "xy12345.us-east-1.aws"
user: "${SNOWFLAKE_USER}"
password: "${SNOWFLAKE_PASSWORD}"
warehouse: "DAANA_WH"
database: "ANALYTICS"
schema: "DAANA_DW"
role: "TRANSFORMER"⚠ Account identifier should be in format 'account.region.cloud'
⚠ Warehouse must be running for queries (costs apply)
⚠ Either password or private_key_path is required for authentication
Using Connection Profiles
Connection profiles are used with commands that interact with your data warehouse:
# Install framework (sets up both metadata DB and data warehouse)
daana-cli install --connection-profiles connections.yaml --profile dev
# Deploy transformations
daana-cli deploy workflow.yaml --connection-profiles connections.yaml --profile production
# Execute workflows
daana-cli execute --workflow-id 123 --connection-profiles connections.yaml --profile productionProfile Selection
When multiple profiles exist:
# Explicitly select a profile
daana-cli check connections --connection dev
# If only one profile exists, it's auto-selected
daana-cli check connectionsValidating Connection Profiles
Before deploying, validate your connection profiles to catch configuration errors:
# Validate all profiles in the file
daana-cli check connections
# Validate a specific profile
daana-cli check connections --connection dev
# Strict mode (treat warnings as errors)
daana-cli check connections --connection prod --strict
# JSON output for CI/CD
daana-cli check connections --format jsonWhat Validation Checks
The validator verifies:
| Category | What's Checked |
|---|---|
| Structure | YAML syntax, required fields present |
| Dialect | Valid database type, dialect-specific required fields |
| Security | Hardcoded credentials (warns), SSL/encryption settings |
| Files | Credential files exist (for BigQuery, Snowflake key-pair) |
Example Validation Output
Valid profile with warnings:
✓ Profile 'dev' valid
Type: postgresql
⚠ Profile 'dev' warnings:
• [dev] root.sslmode='disable' is insecure for production. Use 'require' or 'verify-full'Invalid profile:
✘ Profile 'prod' errors:
• [prod] root: PostgreSQL requires field 'port'
• [prod] root: PostgreSQL requires field 'user'
• [prod] root: PostgreSQL requires field 'database'Generating a Template
Generate a connections template with examples for all supported dialects:
daana-cli generate connections -o connections.yamlThis creates a well-commented template with:
- Development PostgreSQL profile (uncommented, ready to use)
- Production PostgreSQL example (commented)
- Examples for BigQuery, MSSQL, Oracle, Snowflake (commented)
Complete Example
Here's a complete connections file for a multi-environment setup:
# Connection profiles for my-project
connections:
# Development - local Docker PostgreSQL
dev:
type: "postgresql"
host: "localhost"
port: 5432
user: "dev"
password: "devpass"
database: "customerdb"
sslmode: "disable"
target_schema: "daana_dw"
# Staging - AWS RDS PostgreSQL
staging:
type: "postgresql"
host: "staging-db.example.com"
port: 5432
user: "${STAGING_DB_USER}"
password: "${STAGING_DB_PASSWORD}"
database: "analytics_staging"
sslmode: "require"
target_schema: "daana_dw"
# Production - AWS RDS PostgreSQL
production:
type: "postgresql"
host: "prod-db.example.com"
port: 5432
user: "${PROD_DB_USER}"
password: "${PROD_DB_PASSWORD}"
database: "analytics"
sslmode: "verify-full"
target_schema: "daana_dw"
connect_timeout: 30
pool_max_conns: 20Best Practices
- Use environment variables for secrets (${VAR_NAME} syntax)
- Enable SSL/TLS encryption for production databases
- Create separate profiles for dev, staging, and production
- Validate profiles before deployment with 'daana-cli check connections'
- Use least-privilege database credentials
- Document profiles with comments explaining their purpose
- Keep connections.yaml in version control (secrets are in env vars)
- Use target_schema to separate Daana output from source data
Next Steps
- Define your workflow to orchestrate transformations
- Create mappings to define data transformations
- Follow the tutorial for a complete end-to-end example