Tutorial: Transform Olist E-Commerce Data
Welcome to the Daana CLI tutorial! In this 30-minute hands-on guide, you'll transform real e-commerce data using Daana CLI.
Before starting: Make sure you've completed the Setup Guide. You should have:
- ✅ Databases running (customerdb on port 5432, internaldb on port 5434)
- ✅ Olist data loaded (~99,441 orders in stage.olist_*_dataset tables)
- ✅ Daana framework installed (
daana-cli installcompleted successfully)- ✅ Working directory:
olist-quickstart/
What You'll Learn
- How to work with the Olist dataset (Brazilian e-commerce data)
- How to define business models with proper definition and description fields
- How to map raw operational data to business entities
- How to iterate on your model without losing work
- How to execute the complete transformation pipeline
What You'll Build
You'll transform raw Olist e-commerce data into clean business entities:
Before: Raw operational tables (stage.olist_orders_dataset, stage.olist_customers_dataset)
After: Clean business entities (CUSTOMER, ORDER) in the daana_dw schema
Part 1: Understanding Your Source Data (5 min)
What is Olist?
Olist is a real Brazilian e-commerce dataset with:
- 100,000+ orders from 2016-2018
- 99,000+ customers across Brazil
- 32,000+ products in various categories
- Seller information
- Order reviews and ratings
This is real operational data - messy, spread across multiple tables, with inconsistencies. Perfect for learning data transformation!
Verify Data is Loaded
If you completed the setup, Olist data should already be loaded. Let's verify:
Explore the Source Data
Let's see what we're working with:
# From olist-quickstart/ directory
# Connect to customerdb
docker exec -it daana-customerdb psql -U dev -d customerdbInside psql:
-- See all schemas
\dn
-- See all tables in the stage schema (where Olist data lives)
\dt stage.*
-- Look at orders (note: tables are in 'stage' schema with '_dataset' suffix)
SELECT * FROM stage.olist_orders_dataset LIMIT 5;
-- Look at customers
SELECT * FROM stage.olist_customers_dataset LIMIT 5;
-- Count orders
SELECT COUNT(*) FROM stage.olist_orders_dataset;
-- Should show ~99,441 orders
-- Exit
\qImportant: All Olist tables are in the
stageschema and have the_datasetsuffix. If you just use\dtyou won't see them because they're not in the defaultpublicschema!
Key Insight: This data is operational (how the system records transactions). We need to transform it into business entities (what analysts need to answer questions).
📊 Olist Table Schemas - What Columns Are Available?
Before we create mappings, let's understand what data we have to work with:
Customers table (stage.olist_customers_dataset):
\d stage.olist_customers_datasetKey columns:
customer_id(text) - Unique customer identifiercustomer_unique_id(text) - Alternative customer IDcustomer_zip_code_prefix(integer) - 5-digit postal codecustomer_city(text) - City namecustomer_state(text) - Two-letter state code (SP, RJ, etc.)
Orders table (stage.olist_orders_dataset):
\d stage.olist_orders_datasetKey columns:
order_id(text) - Unique order identifiercustomer_id(text) - Links to customerorder_status(text) - Order status (delivered, shipped, etc.)order_purchase_timestamp(timestamp) - When order was placedorder_delivered_customer_date(timestamp) - When order was delivered
Pro Tip: You'll map these source columns to your business model attributes in Part 3!
Part 2: Understand Your Business Model (10 min)
Understanding Models
A model defines your business entities - what matters to your business, not how the database stores it.
Critical: Definition and Description
IMPORTANT: Always include
definitionanddescriptionfields! These are not optional:
definition: A short business definition (1-2 sentences)description: Detailed explanation for analysts and developersThese fields provide crucial business context and make your model self-documenting.
View the Business Model
The olist-quickstart project includes a complete business model. Let's look at it:
# From olist-quickstart/ directory
cat model.yamlYou'll see a complete model with two entities: CUSTOMER and ORDER. Here's the structure:
model:
id: "OLIST_BUSINESS_MODEL"
name: "OlistBusinessModel"
definition: "Business representation of Olist e-commerce platform"
description: "Transforms raw Olist operational data into clean business entities for analytics"
entities:
# =========================================================================
# CUSTOMER - Individual shoppers on the Olist platform
# =========================================================================
- id: "CUSTOMER"
name: "CUSTOMER"
definition: "A customer who makes purchases on the Olist platform"
description: "Represents individual customers with their contact information and location"
attributes:
- id: "CUSTOMER_ID"
name: "CUSTOMER_ID"
definition: "Unique identifier for the customer"
description: "Primary key from the olist_customers table"
type: "STRING"
effective_timestamp: true
- id: "CUSTOMER_CITY"
name: "CUSTOMER_CITY"
definition: "City where the customer is located"
description: "Customer's city for delivery and analysis"
type: "STRING"
effective_timestamp: true
- id: "CUSTOMER_STATE"
name: "CUSTOMER_STATE"
definition: "State where the customer is located"
description: "Two-letter Brazilian state code (e.g., SP, RJ)"
type: "STRING"
effective_timestamp: true
# =========================================================================
# ORDER - Purchase transactions made by customers
# =========================================================================
- id: "ORDER"
name: "ORDER"
definition: "A purchase order placed by a customer"
description: "Represents individual orders with their status and timestamps"
attributes:
- id: "ORDER_ID"
name: "ORDER_ID"
definition: "Unique identifier for the order"
description: "Primary key from the olist_orders table"
type: "STRING"
effective_timestamp: true
- id: "ORDER_STATUS"
name: "ORDER_STATUS"
definition: "Current status of the order"
description: "Status like delivered, shipped, canceled, processing"
type: "STRING"
effective_timestamp: true
- id: "ORDER_PURCHASE_TS"
name: "ORDER_PURCHASE_TS"
definition: "When the order was placed"
description: "Timestamp when customer completed the purchase"
type: "START_TIMESTAMP"
- id: "ORDER_DELIVERED_TS"
name: "ORDER_DELIVERED_TS"
definition: "When the order was delivered"
description: "Timestamp when order was delivered to the customer"
type: "END_TIMESTAMP"
# Grouped attribute example: Order value with currency
- id: "ORDER_VALUE"
name: "ORDER_VALUE"
definition: "Total monetary value of the order"
description: "Complete order value including amount and currency denomination"
effective_timestamp: true
group:
- id: "ORDER_VALUE_AMOUNT"
name: "ORDER_VALUE_AMOUNT"
definition: "Monetary value of the order"
description: "Total order value in the specified currency"
type: "NUMBER"
- id: "ORDER_VALUE_CURRENCY"
name: "ORDER_VALUE_CURRENCY"
definition: "Currency of the order value"
description: "Three-letter currency code (typically BRL)"
type: "UNIT"
# ===========================================================================
# RELATIONSHIPS - How entities connect to each other
# ===========================================================================
relationships:
- name: "IS_PLACED_BY"
definition: "Order is placed by a customer"
description: "Links orders to the customer who placed them. One customer can place many orders."
source_entity_id: "ORDER"
target_entity_id: "CUSTOMER"🔑 Key Concepts Explained
id vs name:
id: Technical identifier used internally (must be unique, uppercase)name: Display name (often same as id for clarity)- Both are required for all entities, attributes, and relationships
effective_timestamp: true:
- Tracks how this attribute changes over time
- Enables historical analysis ("What was this customer's city in 2017?")
- Essential for slowly changing dimensions
Timestamp Types:
START_TIMESTAMP: Use for any single timestamp, when an event occurs, or when something beginsEND_TIMESTAMP: When something ends or completes (always used together with START_TIMESTAMP)
When to use each type:
- Single timestamp: Use
START_TIMESTAMP- Example: Transaction timestamp, event occurrence, order placed time
- This is the generic timestamp type - use it whenever you have a single point in time
- Period of time (duration): Use both
START_TIMESTAMP+END_TIMESTAMPtogether- Example: Order lifecycle (placed → delivered), trip duration (A → B), customer status period
- START_TIMESTAMP marks when it begins, END_TIMESTAMP marks when it ends
Why this matters: These types tell Daana's transformation engine how to handle temporal data correctly. START_TIMESTAMP is your default choice for any timestamp attribute.
Grouped Attributes (see full model for example):
- Groups related fields together (e.g., ORDER_VALUE includes amount + currency)
- Keeps context for related data points
Why Definition and Description Matter
Notice how each field has both definition and description:
- Analysts can understand what "ORDER_STATUS" means without asking
- New team members can onboard faster
- Data governance is built into your model
- Documentation stays in sync with your code
Validate Your Model
Before proceeding, let's validate the model structure:
# From olist-quickstart/ directory
daana-cli check model model.yaml✅ Checkpoint: You should see:
✓ Model validation passed
- 2 entities validated
- 1 relationship validated
- All definitions and descriptions presentWhat just happened? The check command validates:
- YAML structure and syntax
- Entity and attribute definitions
- Relationship references
- Naming conventions
- Required fields (definition, description)
The model is now ready to be used in your workflow!
Part 3: Map Source Data to Your Model (10 min)
Now we'll map the Olist tables to your business entities.
Generate Mapping Templates
Daana can generate templates for you from your model. You can generate templates for all entities at once or one at a time:
# From olist-quickstart/ directory
# Option 1: Generate templates for ALL entities in the model
daana-cli generate mapping \
--model model.yaml \
--all-entities \
--out-dir mappings/
# Option 2: Generate template for a specific entity
daana-cli generate mapping \
--model model.yaml \
--entity CUSTOMER \
-o mappings/customer-mapping.yamlTip: For models with many entities, use
--all-entitieswith--out-dirto generate all mapping templates in one command. This is much faster than generating them one by one!
The generated templates include placeholders for:
- Source table configuration
- Column mappings for all entity attributes
- Transformation expressions
- Effective timestamp handling
You'll see: Mapping templates generated successfully (or Mapping template generated successfully for single entity)
Edit the Mapping
Open mappings/customer-mapping.yaml and configure it:
entity_id: CUSTOMER
mapping_groups:
- name: default_mapping_group
allow_multiple_identifiers: false
tables:
- connection: dev
table: stage.olist_customers_dataset
primary_keys:
- customer_id
ingestion_strategy: FULL
# Default timestamp for all attributes with effective_timestamp: true
entity_effective_timestamp_expression: CURRENT_TIMESTAMP
attributes:
- id: CUSTOMER_ID
transformation_expression: customer_id
- id: CUSTOMER_CITY
transformation_expression: customer_city
- id: CUSTOMER_STATE
transformation_expression: customer_stateValidate the Mapping
# Validate mapping references and transformations
daana-cli check mapping mappings/customer-mapping.yaml✅ Checkpoint: You should see "✓ Mapping validation passed"
What was validated?
- Entity and attribute references match the model
- Transformation expressions are syntactically valid
- Table and connection configurations are correct
- All required fields are present
Do the Same for ORDER
# Generate template
daana-cli generate mapping \
--model model.yaml \
--entity ORDER \
-o mappings/order-mapping.yamlEdit mappings/order-mapping.yaml:
entity_id: ORDER
mapping_groups:
- name: default_mapping_group
allow_multiple_identifiers: false
tables:
- connection: dev
table: stage.olist_orders_dataset
primary_keys:
- order_id
ingestion_strategy: FULL
# Default timestamp for all attributes with effective_timestamp: true
entity_effective_timestamp_expression: CURRENT_TIMESTAMP
attributes:
- id: ORDER_ID
transformation_expression: order_id
- id: ORDER_STATUS
transformation_expression: order_status
- id: ORDER_PURCHASE_TS
transformation_expression: order_purchase_timestamp
- id: ORDER_DELIVERED_TS
transformation_expression: order_delivered_customer_date
- connection: dev
table: stage.olist_order_payments_dataset_v2
primary_keys:
- order_id
ingestion_strategy: FULL
attributes:
- id: ORDER_VALUE_AMOUNT
transformation_expression: "round(sum(payment_value) over (partition by order_id)::numeric,2)"
- id: ORDER_VALUE_CURRENCY
transformation_expression: "'BRL'"
relationships:
- id: ORDER_IS_PLACED_BY_CUSTOMER
source_table: stage.olist_orders_dataset
target_transformation_expression: customer_idThen validate:
daana-cli check mapping mappings/order-mapping.yaml✅ Checkpoint: Mapping validation passed! Your mappings are ready to be deployed.
Part 4: Iterating on Your Model (5 min)
Real-World Scenario
You realize you forgot to add CUSTOMER_ZIP_CODE! Let's add it without losing our mapping work.
1. Update Your Model
Edit model.yaml and add to CUSTOMER attributes.
We will now add the CUSTOMER_ZIP_CODE into the model.yaml. Copy "CUSTOMER_ZIP_CODE" information below and add it to your model after "CUSTOMER_STATE":
- id: "CUSTOMER_ZIP_CODE"
name: "CUSTOMER_ZIP_CODE"
definition: "Customer's postal code"
description: "5-digit Brazilian postal code (CEP) for delivery routing"
type: "STRING"
effective_timestamp: false2. Validate Updated Model
daana-cli check model model.yaml✅ Checkpoint: Model validation passed with the new attribute!
3. Merge Your Mapping
This is the magic command that preserves your existing mappings:
daana-cli merge mapping \
--model model.yaml \
--mapping mappings/customer-mapping.yaml \
--out mappings/customer-mapping-updated.yaml \
--diffThis:
- ✅ Keeps all your existing attribute mappings
- ✅ Adds a placeholder for the new CUSTOMER_ZIP_CODE attribute
- ✅ Preserves your identifier mappings
- ✅ Shows a diff of what changed (with --diff flag)
4. Complete the New Mapping
Edit mappings/customer-mapping-updated.yaml and verify the CUSTOMER_ZIP_CODE mapping:
- id: "CUSTOMER_ZIP_CODE"
transformation_expression: "customer_zip_code_prefix"5. Validate the Updated Mapping
# Replace the old mapping with the updated one
mv mappings/customer-mapping-updated.yaml mappings/customer-mapping.yaml
# Validate it
daana-cli check mapping mappings/customer-mapping.yaml✅ Success! You've updated your model and mapping without starting over.
Key Takeaway: The
mergecommand is essential for iterative development. Use it every time you update your model! The changes will be applied when you rundeployin the next section.
Part 5: Create and Execute Workflow (10 min)
Review Your Project Files
The init --example olist command already created the configuration files you need:
connections.yaml- Database connection profiles (already configured for local dev)workflow.yaml- Workflow orchestration (already references model.yaml)
Let's look at what's already set up:
# View connection profiles
cat connections.yamlYou'll see the dev profile already configured to connect to your local customerdb.
What is a Workflow?
A workflow orchestrates the entire transformation:
- Which model to use
- Which mappings to apply
- Where source data lives (via connection profiles)
- Where to write transformed data
View the Workflow Configuration
# From olist-quickstart/ directory
cat workflow.yamlThe workflow is already configured:
workflow:
id: "OLIST_TRANSFORMATION_WORKFLOW"
name: "OlistTransformationWorkflow"
definition: "Transform Olist data to business entities"
description: |
This workflow reads from Olist operational tables and creates
clean business entities in the daana_dw schema.
model:
file: "model.yaml"
mappings:
- "mappings/customer-mapping.yaml"
- "mappings/order-mapping.yaml"
connection: "dev" # References connections.yaml
advanced:
batch_column: ""
read_logic: ""Note: The workflow file already references the mappings we created in Part 3. When you generate new mappings, you'll need to add them to the
mappingslist.
Validate Your Workflow
# From olist-quickstart/ directory
daana-cli check workflowNote: The
check workflowcommand auto-detectsworkflow.yamlin the current directory.
✅ Checkpoint: You should see "✓ Workflow validation passed"
What was validated?
- Model file exists and is valid
- All mapping files exist and are valid
- Connection profile is properly configured
- All entity references are consistent
Deploy Transformation Infrastructure
Now deploy your workflow! This command:
- Compiles and materializes your model, mappings, and workflow to the internal database
- Creates database tables and stored procedures in your data warehouse
- Sets up the complete transformation infrastructure
# From olist-quickstart/ directory
daana-cli deployNote: The
deploycommand auto-detectsworkflow.yamlin the current directory and reads connection settings fromconnections.yaml. No extra flags needed when working from the project directory!
You'll see output showing:
- Model being processed and materialized
- Mappings being compiled and loaded
- Workflow being registered
- Database objects being created (tables, views, stored procedures)
✅ Checkpoint: "Deployment completed successfully"
What just happened? The deploy command automatically:
- Compiled and materialized your YAML model to the internal database
- Compiled and materialized your YAML mappings
- Registered the workflow
- Created all necessary database structures in your data warehouse
Everything happens in one command - no separate compile/materialize steps needed!
Execute the Transformation!
Now for the moment of truth - transform your data!
# From olist-quickstart/ directory
daana-cli executeNote: Like
deploy, theexecutecommand auto-detectsworkflow.yamland uses your connection settings automatically.
This will:
- Extract data from Olist source tables
- Transform according to your mappings
- Load into
daana_dwschema in your data warehouse
You'll see:
Workflow execution completed successfully✅ Checkpoint: "Workflow execution completed successfully"
Part 6: View Your Results! (2 min)
See Your Transformed Data
# Connect to customerdb
docker exec -it daana-customerdb psql -U dev -d customerdbInside psql:
-- See views to access the transformed data
\dv daana_dw.*
-- Query your clean CUSTOMER entity
SELECT * FROM daana_dw.view_customer LIMIT 10;
-- Count customers
SELECT COUNT(*) FROM daana_dw.view_customer;
-- See orders
SELECT * FROM daana_dw.view_order LIMIT 10;
-- Join customers and orders
SELECT
c."CUSTOMER_CITY",
c."CUSTOMER_STATE",
COUNT(*) as order_count
FROM daana_dw.view_customer c
JOIN daana_dw.view_order_customer_x x ON c.customer_key=x.customer_key
JOIN daana_dw.view_order o ON x.order_key = o.order_key
GROUP BY c."CUSTOMER_CITY", c."CUSTOMER_STATE"
ORDER BY order_count DESC
LIMIT 10;🎉 Congratulations! You've transformed raw operational data into clean business entities!
What You Accomplished
✅ Loaded real-world e-commerce data (Olist)
✅ Defined business entities with proper definitions and descriptions
✅ Mapped source tables to your model
✅ Learned to iterate on models without losing work
✅ Executed a complete transformation pipeline
✅ Created queryable business entities in daana_dw schema
Understanding What Happened
Before
- Raw tables:
stage.olist_customers_dataset,stage.olist_orders_dataset,stage.olist_products_dataset - Operational structure (how the system works)
- Mixed concerns, denormalized, messy
After
- Clean tables:
daana_dw.customer,daana_dw.order - Business structure (what analysts need)
- Self-documenting with definitions and descriptions
- Ready for BI tools!
Next Steps
Explore the Olist Dataset Further
Try adding more entities:
- PRODUCT (from
olist_products) - SELLER (from
olist_sellers) - REVIEW (from
olist_order_reviews)
Learn More About DMDL
- Model Language Guide - Advanced modeling concepts
- Mapping Templates - Complex transformations
- Commands Reference - All CLI commands
Connect Your BI Tool
Your transformed data in daana_dw is ready for:
- Tableau
- Power BI
- Looker
- Metabase
Connection details:
Host: localhost
Port: 5432
Database: customerdb
Schema: daana_dw
User: dev
Password: devpassExperiment!
Try these challenges:
- Add
CUSTOMER_NAMEby concatenating first and last name - Calculate
ORDER_TOTALfrom order items - Add a
PRODUCTentity - Create a relationship between ORDER and PRODUCT
Troubleshooting
No data in daana_dw tables
Check that:
- Source tables have data:
SELECT COUNT(*) FROM stage.olist_customers_dataset; - Mappings are correct: Review your
*-mapping.yamlfiles - Workflow references the right mappings
Compilation errors
- "Entity not found": Check entity names match exactly (case-sensitive)
- "Invalid attribute type": Use STRING, NUMBER, TIMESTAMP, BOOLEAN
- "Missing definition": All entities and attributes need definition and description fields!
Execution fails
- Check connection profile points to correct database
- Verify source tables exist:
\dtin psql - Check logs:
docker logs daana-customerdb
Need Help?
- Check the Commands Reference for detailed command usage
- Review Configuration for connection setup
- See example files in
example_yaml/directory
Ready to build your own models? You now have all the skills to transform any operational data into clean business entities with Daana CLI!