Tutorial

Tutorial: Transform Olist E-Commerce Data

Welcome to the Daana CLI tutorial! In this 30-minute hands-on guide, you'll transform real e-commerce data using Daana CLI.

Before starting: Make sure you've completed the Setup Guide. You should have:

  • ✅ Databases running (customerdb on port 5432, internaldb on port 5434)
  • ✅ Olist data loaded (~99,441 orders in stage.olist_*_dataset tables)
  • Daana framework installed (daana-cli install completed successfully)
  • ✅ Working directory: olist-quickstart/

What You'll Learn

  • How to work with the Olist dataset (Brazilian e-commerce data)
  • How to define business models with proper definition and description fields
  • How to map raw operational data to business entities
  • How to iterate on your model without losing work
  • How to execute the complete transformation pipeline

What You'll Build

You'll transform raw Olist e-commerce data into clean business entities:

Before: Raw operational tables (stage.olist_orders_dataset, stage.olist_customers_dataset) After: Clean business entities (CUSTOMER, ORDER) in the daana_dw schema


Part 1: Understanding Your Source Data (5 min)

What is Olist?

Olist is a real Brazilian e-commerce dataset with:

  • 100,000+ orders from 2016-2018
  • 99,000+ customers across Brazil
  • 32,000+ products in various categories
  • Seller information
  • Order reviews and ratings

This is real operational data - messy, spread across multiple tables, with inconsistencies. Perfect for learning data transformation!

Verify Data is Loaded

If you completed the setup, Olist data should already be loaded. Let's verify:

Explore the Source Data

Let's see what we're working with:

# From olist-quickstart/ directory
# Connect to customerdb
docker exec -it daana-customerdb psql -U dev -d customerdb

Inside psql:

-- See all schemas
\dn
 
-- See all tables in the stage schema (where Olist data lives)
\dt stage.*
 
-- Look at orders (note: tables are in 'stage' schema with '_dataset' suffix)
SELECT * FROM stage.olist_orders_dataset LIMIT 5;
 
-- Look at customers
SELECT * FROM stage.olist_customers_dataset LIMIT 5;
 
-- Count orders
SELECT COUNT(*) FROM stage.olist_orders_dataset;
-- Should show ~99,441 orders
 
-- Exit
\q

Important: All Olist tables are in the stage schema and have the _dataset suffix. If you just use \dt you won't see them because they're not in the default public schema!

Key Insight: This data is operational (how the system records transactions). We need to transform it into business entities (what analysts need to answer questions).

📊 Olist Table Schemas - What Columns Are Available?

Before we create mappings, let's understand what data we have to work with:

Customers table (stage.olist_customers_dataset):

\d stage.olist_customers_dataset

Key columns:

  • customer_id (text) - Unique customer identifier
  • customer_unique_id (text) - Alternative customer ID
  • customer_zip_code_prefix (integer) - 5-digit postal code
  • customer_city (text) - City name
  • customer_state (text) - Two-letter state code (SP, RJ, etc.)

Orders table (stage.olist_orders_dataset):

\d stage.olist_orders_dataset

Key columns:

  • order_id (text) - Unique order identifier
  • customer_id (text) - Links to customer
  • order_status (text) - Order status (delivered, shipped, etc.)
  • order_purchase_timestamp (timestamp) - When order was placed
  • order_delivered_customer_date (timestamp) - When order was delivered

Pro Tip: You'll map these source columns to your business model attributes in Part 3!


Part 2: Understand Your Business Model (10 min)

Understanding Models

A model defines your business entities - what matters to your business, not how the database stores it.

Critical: Definition and Description

IMPORTANT: Always include definition and description fields! These are not optional:

  • definition: A short business definition (1-2 sentences)
  • description: Detailed explanation for analysts and developers

These fields provide crucial business context and make your model self-documenting.

View the Business Model

The olist-quickstart project includes a complete business model. Let's look at it:

# From olist-quickstart/ directory
cat model.yaml

You'll see a complete model with two entities: CUSTOMER and ORDER. Here's the structure:

model:
  id: "OLIST_BUSINESS_MODEL"
  name: "OlistBusinessModel"
  definition: "Business representation of Olist e-commerce platform"
  description: "Transforms raw Olist operational data into clean business entities for analytics"
 
  entities:
    # =========================================================================
    # CUSTOMER - Individual shoppers on the Olist platform
    # =========================================================================
    - id: "CUSTOMER"
      name: "CUSTOMER"
      definition: "A customer who makes purchases on the Olist platform"
      description: "Represents individual customers with their contact information and location"
      attributes:
        - id: "CUSTOMER_ID"
          name: "CUSTOMER_ID"
          definition: "Unique identifier for the customer"
          description: "Primary key from the olist_customers table"
          type: "STRING"
          effective_timestamp: true
 
        - id: "CUSTOMER_CITY"
          name: "CUSTOMER_CITY"
          definition: "City where the customer is located"
          description: "Customer's city for delivery and analysis"
          type: "STRING"
          effective_timestamp: true
 
        - id: "CUSTOMER_STATE"
          name: "CUSTOMER_STATE"
          definition: "State where the customer is located"
          description: "Two-letter Brazilian state code (e.g., SP, RJ)"
          type: "STRING"
          effective_timestamp: true
 
    # =========================================================================
    # ORDER - Purchase transactions made by customers
    # =========================================================================
    - id: "ORDER"
      name: "ORDER"
      definition: "A purchase order placed by a customer"
      description: "Represents individual orders with their status and timestamps"
      attributes:
        - id: "ORDER_ID"
          name: "ORDER_ID"
          definition: "Unique identifier for the order"
          description: "Primary key from the olist_orders table"
          type: "STRING"
          effective_timestamp: true
 
        - id: "ORDER_STATUS"
          name: "ORDER_STATUS"
          definition: "Current status of the order"
          description: "Status like delivered, shipped, canceled, processing"
          type: "STRING"
          effective_timestamp: true
 
        - id: "ORDER_PURCHASE_TS"
          name: "ORDER_PURCHASE_TS"
          definition: "When the order was placed"
          description: "Timestamp when customer completed the purchase"
          type: "START_TIMESTAMP"
 
        - id: "ORDER_DELIVERED_TS"
          name: "ORDER_DELIVERED_TS"
          definition: "When the order was delivered"
          description: "Timestamp when order was delivered to the customer"
          type: "END_TIMESTAMP"
 
        # Grouped attribute example: Order value with currency
        - id: "ORDER_VALUE"
          name: "ORDER_VALUE"
          definition: "Total monetary value of the order"
          description: "Complete order value including amount and currency denomination"
          effective_timestamp: true
          group:
            - id: "ORDER_VALUE_AMOUNT"
              name: "ORDER_VALUE_AMOUNT"
              definition: "Monetary value of the order"
              description: "Total order value in the specified currency"
              type: "NUMBER"
            - id: "ORDER_VALUE_CURRENCY"
              name: "ORDER_VALUE_CURRENCY"
              definition: "Currency of the order value"
              description: "Three-letter currency code (typically BRL)"
              type: "UNIT"
 
  # ===========================================================================
  # RELATIONSHIPS - How entities connect to each other
  # ===========================================================================
  relationships:
    - name: "IS_PLACED_BY"
      definition: "Order is placed by a customer"
      description: "Links orders to the customer who placed them. One customer can place many orders."
      source_entity_id: "ORDER"
      target_entity_id: "CUSTOMER"

🔑 Key Concepts Explained

id vs name:

  • id: Technical identifier used internally (must be unique, uppercase)
  • name: Display name (often same as id for clarity)
  • Both are required for all entities, attributes, and relationships

effective_timestamp: true:

  • Tracks how this attribute changes over time
  • Enables historical analysis ("What was this customer's city in 2017?")
  • Essential for slowly changing dimensions

Timestamp Types:

  • START_TIMESTAMP: Use for any single timestamp, when an event occurs, or when something begins
  • END_TIMESTAMP: When something ends or completes (always used together with START_TIMESTAMP)

When to use each type:

  • Single timestamp: Use START_TIMESTAMP
    • Example: Transaction timestamp, event occurrence, order placed time
    • This is the generic timestamp type - use it whenever you have a single point in time
  • Period of time (duration): Use both START_TIMESTAMP + END_TIMESTAMP together
    • Example: Order lifecycle (placed → delivered), trip duration (A → B), customer status period
    • START_TIMESTAMP marks when it begins, END_TIMESTAMP marks when it ends

Why this matters: These types tell Daana's transformation engine how to handle temporal data correctly. START_TIMESTAMP is your default choice for any timestamp attribute.

Grouped Attributes (see full model for example):

  • Groups related fields together (e.g., ORDER_VALUE includes amount + currency)
  • Keeps context for related data points

Why Definition and Description Matter

Notice how each field has both definition and description:

  • Analysts can understand what "ORDER_STATUS" means without asking
  • New team members can onboard faster
  • Data governance is built into your model
  • Documentation stays in sync with your code

Validate Your Model

Before proceeding, let's validate the model structure:

# From olist-quickstart/ directory
daana-cli check model model.yaml

Checkpoint: You should see:

✓ Model validation passed
  - 2 entities validated
  - 1 relationship validated
  - All definitions and descriptions present

What just happened? The check command validates:

  • YAML structure and syntax
  • Entity and attribute definitions
  • Relationship references
  • Naming conventions
  • Required fields (definition, description)

The model is now ready to be used in your workflow!


Part 3: Map Source Data to Your Model (10 min)

Now we'll map the Olist tables to your business entities.

Generate Mapping Templates

Daana can generate templates for you from your model. You can generate templates for all entities at once or one at a time:

# From olist-quickstart/ directory
 
# Option 1: Generate templates for ALL entities in the model
daana-cli generate mapping \
  --model model.yaml \
  --all-entities \
  --out-dir mappings/
 
# Option 2: Generate template for a specific entity
daana-cli generate mapping \
  --model model.yaml \
  --entity CUSTOMER \
  -o mappings/customer-mapping.yaml

Tip: For models with many entities, use --all-entities with --out-dir to generate all mapping templates in one command. This is much faster than generating them one by one!

The generated templates include placeholders for:

  • Source table configuration
  • Column mappings for all entity attributes
  • Transformation expressions
  • Effective timestamp handling

You'll see: Mapping templates generated successfully (or Mapping template generated successfully for single entity)

Edit the Mapping

Open mappings/customer-mapping.yaml and configure it:

entity_id: CUSTOMER
 
mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: false
 
    tables:
      - connection: dev
        table: stage.olist_customers_dataset
 
        primary_keys:
          - customer_id
 
        ingestion_strategy: FULL
 
        # Default timestamp for all attributes with effective_timestamp: true
        entity_effective_timestamp_expression: CURRENT_TIMESTAMP
 
        attributes:
          - id: CUSTOMER_ID
            transformation_expression: customer_id
 
          - id: CUSTOMER_CITY
            transformation_expression: customer_city
 
          - id: CUSTOMER_STATE
            transformation_expression: customer_state

Validate the Mapping

# Validate mapping references and transformations
daana-cli check mapping mappings/customer-mapping.yaml

Checkpoint: You should see "✓ Mapping validation passed"

What was validated?

  • Entity and attribute references match the model
  • Transformation expressions are syntactically valid
  • Table and connection configurations are correct
  • All required fields are present

Do the Same for ORDER

# Generate template
daana-cli generate mapping \
  --model model.yaml \
  --entity ORDER \
  -o mappings/order-mapping.yaml

Edit mappings/order-mapping.yaml:

entity_id: ORDER
 
mapping_groups:
  - name: default_mapping_group
    allow_multiple_identifiers: false
 
    tables:
      - connection: dev
        table: stage.olist_orders_dataset
 
        primary_keys:
          - order_id
 
        ingestion_strategy: FULL
 
        # Default timestamp for all attributes with effective_timestamp: true
        entity_effective_timestamp_expression: CURRENT_TIMESTAMP
 
        attributes:
          - id: ORDER_ID
            transformation_expression: order_id
 
          - id: ORDER_STATUS
            transformation_expression: order_status
 
          - id: ORDER_PURCHASE_TS
            transformation_expression: order_purchase_timestamp
 
          - id: ORDER_DELIVERED_TS
            transformation_expression: order_delivered_customer_date
 
      - connection: dev
        table: stage.olist_order_payments_dataset_v2
 
        primary_keys:
          - order_id
 
        ingestion_strategy: FULL
 
        attributes:
          - id: ORDER_VALUE_AMOUNT
            transformation_expression: "round(sum(payment_value) over (partition by order_id)::numeric,2)"
 
          - id: ORDER_VALUE_CURRENCY
            transformation_expression: "'BRL'"
 
    relationships:
      - id: ORDER_IS_PLACED_BY_CUSTOMER
        source_table: stage.olist_orders_dataset
        target_transformation_expression: customer_id

Then validate:

daana-cli check mapping mappings/order-mapping.yaml

Checkpoint: Mapping validation passed! Your mappings are ready to be deployed.


Part 4: Iterating on Your Model (5 min)

Real-World Scenario

You realize you forgot to add CUSTOMER_ZIP_CODE! Let's add it without losing our mapping work.

1. Update Your Model

Edit model.yaml and add to CUSTOMER attributes.

We will now add the CUSTOMER_ZIP_CODE into the model.yaml. Copy "CUSTOMER_ZIP_CODE" information below and add it to your model after "CUSTOMER_STATE":

- id: "CUSTOMER_ZIP_CODE"
  name: "CUSTOMER_ZIP_CODE"
  definition: "Customer's postal code"
  description: "5-digit Brazilian postal code (CEP) for delivery routing"
  type: "STRING"
  effective_timestamp: false

2. Validate Updated Model

daana-cli check model model.yaml

Checkpoint: Model validation passed with the new attribute!

3. Merge Your Mapping

This is the magic command that preserves your existing mappings:

daana-cli merge mapping \
  --model model.yaml \
  --mapping mappings/customer-mapping.yaml \
  --out mappings/customer-mapping-updated.yaml \
  --diff

This:

  • ✅ Keeps all your existing attribute mappings
  • ✅ Adds a placeholder for the new CUSTOMER_ZIP_CODE attribute
  • ✅ Preserves your identifier mappings
  • ✅ Shows a diff of what changed (with --diff flag)

4. Complete the New Mapping

Edit mappings/customer-mapping-updated.yaml and verify the CUSTOMER_ZIP_CODE mapping:

- id: "CUSTOMER_ZIP_CODE"
  transformation_expression: "customer_zip_code_prefix"

5. Validate the Updated Mapping

# Replace the old mapping with the updated one
mv mappings/customer-mapping-updated.yaml mappings/customer-mapping.yaml
 
# Validate it
daana-cli check mapping mappings/customer-mapping.yaml

Success! You've updated your model and mapping without starting over.

Key Takeaway: The merge command is essential for iterative development. Use it every time you update your model! The changes will be applied when you run deploy in the next section.


Part 5: Create and Execute Workflow (10 min)

Review Your Project Files

The init --example olist command already created the configuration files you need:

  • connections.yaml - Database connection profiles (already configured for local dev)
  • workflow.yaml - Workflow orchestration (already references model.yaml)

Let's look at what's already set up:

# View connection profiles
cat connections.yaml

You'll see the dev profile already configured to connect to your local customerdb.

What is a Workflow?

A workflow orchestrates the entire transformation:

  • Which model to use
  • Which mappings to apply
  • Where source data lives (via connection profiles)
  • Where to write transformed data

View the Workflow Configuration

# From olist-quickstart/ directory
cat workflow.yaml

The workflow is already configured:

workflow:
  id: "OLIST_TRANSFORMATION_WORKFLOW"
  name: "OlistTransformationWorkflow"
  definition: "Transform Olist data to business entities"
  description: |
    This workflow reads from Olist operational tables and creates
    clean business entities in the daana_dw schema.
 
  model:
    file: "model.yaml"
 
  mappings:
    - "mappings/customer-mapping.yaml"
    - "mappings/order-mapping.yaml"
 
  connection: "dev"  # References connections.yaml
 
  advanced:
    batch_column: ""
    read_logic: ""

Note: The workflow file already references the mappings we created in Part 3. When you generate new mappings, you'll need to add them to the mappings list.

Validate Your Workflow

# From olist-quickstart/ directory
daana-cli check workflow

Note: The check workflow command auto-detects workflow.yaml in the current directory.

Checkpoint: You should see "✓ Workflow validation passed"

What was validated?

  • Model file exists and is valid
  • All mapping files exist and are valid
  • Connection profile is properly configured
  • All entity references are consistent

Deploy Transformation Infrastructure

Now deploy your workflow! This command:

  • Compiles and materializes your model, mappings, and workflow to the internal database
  • Creates database tables and stored procedures in your data warehouse
  • Sets up the complete transformation infrastructure
# From olist-quickstart/ directory
daana-cli deploy

Note: The deploy command auto-detects workflow.yaml in the current directory and reads connection settings from connections.yaml. No extra flags needed when working from the project directory!

You'll see output showing:

  • Model being processed and materialized
  • Mappings being compiled and loaded
  • Workflow being registered
  • Database objects being created (tables, views, stored procedures)

Checkpoint: "Deployment completed successfully"

What just happened? The deploy command automatically:

  1. Compiled and materialized your YAML model to the internal database
  2. Compiled and materialized your YAML mappings
  3. Registered the workflow
  4. Created all necessary database structures in your data warehouse

Everything happens in one command - no separate compile/materialize steps needed!

Execute the Transformation!

Now for the moment of truth - transform your data!

# From olist-quickstart/ directory
daana-cli execute

Note: Like deploy, the execute command auto-detects workflow.yaml and uses your connection settings automatically.

This will:

  1. Extract data from Olist source tables
  2. Transform according to your mappings
  3. Load into daana_dw schema in your data warehouse

You'll see:

Workflow execution completed successfully

Checkpoint: "Workflow execution completed successfully"


Part 6: View Your Results! (2 min)

See Your Transformed Data

# Connect to customerdb
docker exec -it daana-customerdb psql -U dev -d customerdb

Inside psql:

-- See views to access the transformed data
\dv daana_dw.*
 
-- Query your clean CUSTOMER entity
SELECT * FROM daana_dw.view_customer LIMIT 10;
 
-- Count customers
SELECT COUNT(*) FROM daana_dw.view_customer;
 
-- See orders
SELECT * FROM daana_dw.view_order LIMIT 10;
 
-- Join customers and orders
SELECT
  c."CUSTOMER_CITY",
  c."CUSTOMER_STATE",
  COUNT(*) as order_count
FROM daana_dw.view_customer c
JOIN daana_dw.view_order_customer_x x ON c.customer_key=x.customer_key
JOIN daana_dw.view_order o ON x.order_key = o.order_key
GROUP BY c."CUSTOMER_CITY", c."CUSTOMER_STATE"
ORDER BY order_count DESC
LIMIT 10;

🎉 Congratulations! You've transformed raw operational data into clean business entities!


What You Accomplished

✅ Loaded real-world e-commerce data (Olist) ✅ Defined business entities with proper definitions and descriptions ✅ Mapped source tables to your model ✅ Learned to iterate on models without losing work ✅ Executed a complete transformation pipeline ✅ Created queryable business entities in daana_dw schema

Understanding What Happened

Before

  • Raw tables: stage.olist_customers_dataset, stage.olist_orders_dataset, stage.olist_products_dataset
  • Operational structure (how the system works)
  • Mixed concerns, denormalized, messy

After

  • Clean tables: daana_dw.customer, daana_dw.order
  • Business structure (what analysts need)
  • Self-documenting with definitions and descriptions
  • Ready for BI tools!

Next Steps

Explore the Olist Dataset Further

Try adding more entities:

  • PRODUCT (from olist_products)
  • SELLER (from olist_sellers)
  • REVIEW (from olist_order_reviews)

Learn More About DMDL

Connect Your BI Tool

Your transformed data in daana_dw is ready for:

  • Tableau
  • Power BI
  • Looker
  • Metabase

Connection details:

Host: localhost
Port: 5432
Database: customerdb
Schema: daana_dw
User: dev
Password: devpass

Experiment!

Try these challenges:

  1. Add CUSTOMER_NAME by concatenating first and last name
  2. Calculate ORDER_TOTAL from order items
  3. Add a PRODUCT entity
  4. Create a relationship between ORDER and PRODUCT

Troubleshooting

No data in daana_dw tables

Check that:

  • Source tables have data: SELECT COUNT(*) FROM stage.olist_customers_dataset;
  • Mappings are correct: Review your *-mapping.yaml files
  • Workflow references the right mappings

Compilation errors

  • "Entity not found": Check entity names match exactly (case-sensitive)
  • "Invalid attribute type": Use STRING, NUMBER, TIMESTAMP, BOOLEAN
  • "Missing definition": All entities and attributes need definition and description fields!

Execution fails

  • Check connection profile points to correct database
  • Verify source tables exist: \dt in psql
  • Check logs: docker logs daana-customerdb

Need Help?


Ready to build your own models? You now have all the skills to transform any operational data into clean business entities with Daana CLI!