Tutorial
Tutorial β Transform Olist E-Commerce Data
Welcome to the Daana CLI tutorial! In this 30-minute hands-on guide, you'll transform real e-commerce data using Daana CLI.
Before starting: Make sure you've completed the Setup Guide. You should have:
- Database running (customerdb on port 5432)
- Olist data loaded (~99,441 orders in stage.olist_*_dataset tables)
- Daana framework installed (
daana-cli installcompleted successfully)- Working directory:
olist-quickstart/
What You'll Learn
- How to work with the Olist dataset (Brazilian e-commerce data)
- How to define business models with proper definition and description fields
- How to map raw operational data to business entities
- How to iterate on your model without losing work
- How to execute the complete transformation pipeline
What You'll Build
You'll transform raw Olist e-commerce data into clean business entities:
Before: Raw operational tables (stage.olist_orders_dataset, stage.olist_customers_dataset) After: Clean business entities (CUSTOMER, ORDER) in the daana_dw schema
Part 1: Understanding Your Source Data (5 min)
What is Olist?
Olist is a real Brazilian e-commerce dataset with:
- 100,000+ orders from 2016-2018
- 99,000+ customers across Brazil
- 32,000+ products in various categories
- Seller information
- Order reviews and ratings
This is real operational data - messy, spread across multiple tables, with inconsistencies. Perfect for learning data transformation!
Verify Data is Loaded
If you completed the setup, Olist data should already be loaded. Let's verify:
Explore the Source Data
Let's see what we're working with:
# From olist-quickstart/ directory
# Connect to customerdb
docker exec -it daana-customerdb psql -U dev -d customerdb
Inside psql:
-- See all schemas
\dn
-- See all tables in the stage schema (where Olist data lives)
\dt stage.*
-- Look at orders (note: tables are in 'stage' schema with '_dataset' suffix)
SELECT * FROM stage.olist_orders_dataset LIMIT 5;
-- Look at customers
SELECT * FROM stage.olist_customers_dataset LIMIT 5;
-- Count orders
SELECT COUNT(*) FROM stage.olist_orders_dataset;
-- Should show ~99,441 orders
-- Exit
\q
Important: All Olist tables are in the
stageschema and have the_datasetsuffix. If you just use\dtyou won't see them because they're not in the defaultpublicschema!
Key Insight: This data is operational (how the system records transactions). We need to transform it into business entities (what analysts need to answer questions).
π Olist Table Schemas - What Columns Are Available?
Before we create mappings, let's understand what data we have to work with:
Customers table (stage.olist_customers_dataset):
\d stage.olist_customers_dataset
Key columns:
customer_id(text) - Unique customer identifiercustomer_unique_id(text) - Alternative customer IDcustomer_zip_code_prefix(integer) - 5-digit postal codecustomer_city(text) - City namecustomer_state(text) - Two-letter state code (SP, RJ, etc.)
Orders table (stage.olist_orders_dataset):
\d stage.olist_orders_dataset
Key columns:
order_id(text) - Unique order identifiercustomer_id(text) - Links to customerorder_status(text) - Order status (delivered, shipped, etc.)order_purchase_timestamp(timestamp) - When order was placedorder_delivered_customer_date(timestamp) - When order was delivered
Pro Tip: You'll map these source columns to your business model attributes in Part 3!
Part 2: Understand Your Business Model (10 min)
Understanding Models
A model defines your business entities - what matters to your business, not how the database stores it.
Critical: Definition and Description
IMPORTANT: Always include
definitionanddescriptionfields! These are not optional:
definition: A short business definition (1-2 sentences)description: Detailed explanation for analysts and developersThese fields provide crucial business context and make your model self-documenting.
View the Business Model
The olist-quickstart project includes a complete business model. Let's look at it:
# From olist-quickstart/ directory
cat model.yaml
You'll see a complete model with two entities: CUSTOMER and ORDER. Here's the structure:
model:
id: OLIST_BUSINESS_MODEL
name: OlistBusinessModel
definition: Business representation of Olist e-commerce platform
description: Transforms raw Olist operational data into clean business entities for analytics
entities:
# =========================================================================
# CUSTOMER - Individual shoppers on the Olist platform
# =========================================================================
- id: CUSTOMER
name: CUSTOMER
definition: A customer who makes purchases on the Olist platform
description: Represents individual customers with their contact information and location
attributes:
- id: CUSTOMER_ID
name: CUSTOMER_ID
definition: Unique identifier for the customer
description: Primary key from the olist_customers table
type: STRING
effective_timestamp: true
- id: CUSTOMER_CITY
name: CUSTOMER_CITY
definition: City where the customer is located
description: Customer's city for delivery and analysis
type: STRING
effective_timestamp: true
- id: CUSTOMER_STATE
name: CUSTOMER_STATE
definition: State where the customer is located
description: Two-letter Brazilian state code (e.g., SP, RJ)
type: STRING
effective_timestamp: true
# =========================================================================
# ORDER - Purchase transactions made by customers
# =========================================================================
- id: ORDER
name: ORDER
definition: A purchase order placed by a customer
description: Represents individual orders with their status and timestamps
attributes:
- id: ORDER_ID
name: ORDER_ID
definition: Unique identifier for the order
description: Primary key from the olist_orders table
type: STRING
effective_timestamp: true
- id: ORDER_STATUS
name: ORDER_STATUS
definition: Current status of the order
description: Status like delivered, shipped, canceled, processing
type: STRING
effective_timestamp: true
- id: ORDER_PURCHASE_TS
name: ORDER_PURCHASE_TS
definition: When the order was placed
description: Timestamp when customer completed the purchase
type: START_TIMESTAMP
- id: ORDER_DELIVERED_TS
name: ORDER_DELIVERED_TS
definition: When the order was delivered
description: Timestamp when order was delivered to the customer
type: END_TIMESTAMP
# Grouped attribute example: Order value with currency
- id: ORDER_VALUE
name: ORDER_VALUE
definition: Total monetary value of the order
description: Complete order value including amount and currency denomination
effective_timestamp: true
group:
- id: ORDER_VALUE_AMOUNT
name: ORDER_VALUE_AMOUNT
definition: Monetary value of the order
description: Total order value in the specified currency
type: NUMBER
- id: ORDER_VALUE_CURRENCY
name: ORDER_VALUE_CURRENCY
definition: Currency of the order value
description: Three-letter currency code (typically BRL)
type: UNIT
# ===========================================================================
# RELATIONSHIPS - How entities connect to each other
# ===========================================================================
relationships:
- name: IS_PLACED_BY
definition: Order is placed by a customer
description: Links orders to the customer who placed them. One customer can place many orders.
source_entity_id: ORDER
target_entity_id: CUSTOMER
π Key Concepts Explained
id vs name:
id: Technical identifier used internally (must be unique, uppercase)name: Display name (often same as id for clarity)- Both are required for all entities, attributes, and relationships
effective_timestamp: true:
- Tracks how this attribute changes over time
- Enables historical analysis ("What was this customer's city in 2017?")
- Essential for slowly changing dimensions
Timestamp Types:
START_TIMESTAMP: Use for any single timestamp, when an event occurs, or when something beginsEND_TIMESTAMP: When something ends or completes (always used together with START_TIMESTAMP)
When to use each type:
- Single timestamp: Use
START_TIMESTAMP- Example: Transaction timestamp, event occurrence, order placed time
- This is the generic timestamp type - use it whenever you have a single point in time
- Period of time (duration): Use both
START_TIMESTAMP+END_TIMESTAMPtogether- Example: Order lifecycle (placed β delivered), trip duration (A β B), customer status period
- START_TIMESTAMP marks when it begins, END_TIMESTAMP marks when it ends
Why this matters: These types tell Daana's transformation engine how to handle temporal data correctly. START_TIMESTAMP is your default choice for any timestamp attribute.
Grouped Attributes (see full model for example):
- Groups related fields together (e.g., ORDER_VALUE includes amount + currency)
- Keeps context for related data points
Why Definition and Description Matter
Notice how each field has both definition and description:
- Analysts can understand what "ORDER_STATUS" means without asking
- New team members can onboard faster
- Data governance is built into your model
- Documentation stays in sync with your code
Validate Your Model
Before proceeding, let's validate the model structure:
# From olist-quickstart/ directory
daana-cli check model model.yaml
Checkpoint: You should see:
β Model validation passed
- 2 entities validated
- 1 relationship validated
- All definitions and descriptions present
What just happened? The check command validates:
- YAML structure and syntax
- Entity and attribute definitions
- Relationship references
- Naming conventions
- Required fields (definition, description)
The model is now ready to be used in your workflow!
Part 3: Map Source Data to Your Model (10 min)
Now we'll map the Olist tables to your business entities.
Generate Mapping Templates
Daana can generate templates for you from your model. You can generate templates for all entities at once or one at a time:
# From olist-quickstart/ directory
# Option 1: Generate templates for ALL entities in the model
daana-cli generate mapping \
--model model.yaml \
--all-entities \
--out-dir mappings/
# Option 2: Generate template for a specific entity
daana-cli generate mapping \
--model model.yaml \
--entity CUSTOMER \
-o mappings/customer-mapping.yaml
Tip: For models with many entities, use
--all-entitieswith--out-dirto generate all mapping templates in one command. This is much faster than generating them one by one!
The generated templates include placeholders for:
- Source table configuration
- Column mappings for all entity attributes
- Transformation expressions
- Effective timestamp handling
You'll see: Mapping templates generated successfully (or Mapping template generated successfully for single entity)
Edit the Mapping
Open mappings/customer-mapping.yaml and configure it:
entity_id: CUSTOMER
mapping_groups:
- name: default_mapping_group
allow_multiple_identifiers: false
tables:
- connection: dev
table: stage.olist_customers_dataset
primary_keys:
- customer_id
ingestion_strategy: FULL
# Default timestamp for all attributes with effective_timestamp: true
entity_effective_timestamp_expression: CURRENT_TIMESTAMP
attributes:
- id: CUSTOMER_ID
transformation_expression: customer_id
- id: CUSTOMER_CITY
transformation_expression: customer_city
- id: CUSTOMER_STATE
transformation_expression: customer_state
Validate the Mapping
# Validate mapping references and transformations
daana-cli check mapping mappings/customer-mapping.yaml
Checkpoint: You should see "β Mapping validation passed"
What was validated?
- Entity and attribute references match the model
- Transformation expressions are syntactically valid
- Table and connection configurations are correct
- All required fields are present
Do the Same for ORDER
# Generate template
daana-cli generate mapping \
--model model.yaml \
--entity ORDER \
-o mappings/order-mapping.yaml
Edit mappings/order-mapping.yaml:
entity_id: ORDER
mapping_groups:
- name: default_mapping_group
allow_multiple_identifiers: false
tables:
- connection: dev
table: stage.olist_orders_dataset
primary_keys:
- order_id
ingestion_strategy: FULL
# Default timestamp for all attributes with effective_timestamp: true
entity_effective_timestamp_expression: CURRENT_TIMESTAMP
attributes:
- id: ORDER_ID
transformation_expression: order_id
- id: ORDER_STATUS
transformation_expression: order_status
- id: ORDER_PURCHASE_TS
transformation_expression: order_purchase_timestamp
- id: ORDER_DELIVERED_TS
transformation_expression: order_delivered_customer_date
- connection: dev
table: stage.olist_order_payments_dataset_v2
primary_keys:
- order_id
ingestion_strategy: FULL
attributes:
- id: ORDER_VALUE_AMOUNT
transformation_expression: "round(sum(payment_value) over (partition by order_id)::numeric,2)"
- id: ORDER_VALUE_CURRENCY
transformation_expression: "'BRL'"
relationships:
- id: ORDER_IS_PLACED_BY_CUSTOMER
source_table: stage.olist_orders_dataset
target_transformation_expression: customer_id
Then validate:
daana-cli check mapping mappings/order-mapping.yaml
Checkpoint: Mapping validation passed! Your mappings are ready to be deployed.
Part 4: Iterating on Your Model (5 min)
Real-World Scenario
You realize you forgot to add CUSTOMER_ZIP_CODE! Let's add it without losing our mapping work.
1. Update Your Model
Edit model.yaml and add to CUSTOMER attributes.
We will now add the CUSTOMER_ZIP_CODE into the model.yaml. Copy "CUSTOMER_ZIP_CODE" information below and add it to your model after "CUSTOMER_STATE":
- id: CUSTOMER_ZIP_CODE
name: CUSTOMER_ZIP_CODE
definition: Customer's postal code
description: 5-digit Brazilian postal code (CEP) for delivery routing
type: STRING
effective_timestamp: false
2. Validate Updated Model
daana-cli check model model.yaml
Checkpoint: Model validation passed with the new attribute!
3. Merge Your Mapping
This command preserves your existing mappings:
daana-cli merge mapping \
--model model.yaml \
--mapping mappings/customer-mapping.yaml \
--out mappings/customer-mapping-updated.yaml \
--diff
This:
- Keeps all your existing attribute mappings
- Adds a placeholder for the new CUSTOMER_ZIP_CODE attribute
- Preserves your identifier mappings
- Shows a diff of what changed (with --diff flag)
4. Complete the New Mapping
Edit mappings/customer-mapping-updated.yaml and verify the CUSTOMER_ZIP_CODE mapping:
- id: CUSTOMER_ZIP_CODE
transformation_expression: customer_zip_code_prefix
5. Validate the Updated Mapping
# Replace the old mapping with the updated one
mv mappings/customer-mapping-updated.yaml mappings/customer-mapping.yaml
# Validate it
daana-cli check mapping mappings/customer-mapping.yaml
Success! You've updated your model and mapping without starting over.
Key Takeaway: The
mergecommand is essential for iterative development. Use it every time you update your model! The changes will be applied when you rundeployin the next section.
Part 5: Create and Execute Workflow (10 min)
Review Your Project Files
The init --example olist command already created the configuration files you need:
connections.yaml- Database connection profiles (already configured for local dev)workflow.yaml- Workflow orchestration (already references model.yaml)
Let's look at what's already set up:
# View connection profiles
cat connections.yaml
You'll see the dev profile already configured to connect to your local customerdb.
What is a Workflow?
A workflow orchestrates the entire transformation:
- Which model to use
- Which mappings to apply
- Where source data lives (via connection profiles)
- Where to write transformed data
View the Workflow Configuration
# From olist-quickstart/ directory
cat workflow.yaml
The workflow is already configured:
workflow:
id: OLIST_TRANSFORMATION_WORKFLOW
name: OlistTransformationWorkflow
definition: Transform Olist data to business entities
description: |
This workflow reads from Olist operational tables and creates
clean business entities in the daana_dw schema.
model:
file: model.yaml
mappings:
- mappings/customer-mapping.yaml
- mappings/order-mapping.yaml
connection: dev # References connections.yaml
advanced:
batch_expression: ""
Note: The workflow file already references the mappings we created in Part 3. When you generate new mappings, you'll need to add them to the
mappingslist.
Validate Your Workflow
# From olist-quickstart/ directory
daana-cli check workflow
Note: The
check workflowcommand auto-detectsworkflow.yamlin the current directory.
Checkpoint: You should see "β Workflow validation passed"
What was validated?
- Model file exists and is valid
- All mapping files exist and are valid
- Connection profile is properly configured
- All entity references are consistent
Deploy Transformation Infrastructure
Now deploy your workflow! This command:
- Creates database schemas, tables, and functions in your data warehouse
- Sets up the complete transformation infrastructure
# From olist-quickstart/ directory
daana-cli deploy
Note: The
deploycommand auto-detectsworkflow.yamlin the current directory and reads connection settings fromconnections.yaml. No extra flags needed when working from the project directory!
You'll see output showing:
- Model being processed and materialized
- Mappings being compiled and loaded
- Workflow being registered
- Database objects being created (schemas, tables, views, functions)
Checkpoint: "Deployment completed successfully"
What just happened? The deploy command automatically:
- Generated SQL from your YAML model and mappings
- Created schemas, tables, and functions in your data warehouse
- Registered the workflow configuration
Everything happens in one command - no separate compile/materialize steps needed!
Execute the Transformation!
Now for the moment of truth - transform your data!
# From olist-quickstart/ directory
daana-cli execute
Note: Like
deploy, theexecutecommand auto-detectsworkflow.yamland uses your connection settings automatically.
This will:
- Extract data from Olist source tables
- Transform according to your mappings
- Load into
daana_dwschema in your data warehouse
You'll see:
Workflow execution completed successfully
Checkpoint: "Workflow execution completed successfully"
Part 6: View Your Results! (2 min)
See Your Transformed Data
# Connect to customerdb
docker exec -it daana-customerdb psql -U dev -d customerdb
Inside psql:
-- See views to access the transformed data
\dv daana_dw.*
-- Query your clean CUSTOMER entity
SELECT * FROM daana_dw.view_customer LIMIT 10;
-- Count customers
SELECT COUNT(*) FROM daana_dw.view_customer;
-- See orders
SELECT * FROM daana_dw.view_order LIMIT 10;
-- Join customers and orders
SELECT
c."CUSTOMER_CITY",
c."CUSTOMER_STATE",
COUNT(*) as order_count
FROM daana_dw.view_customer c
JOIN daana_dw.view_order_customer_x x ON c.customer_key=x.customer_key
JOIN daana_dw.view_order o ON x.order_key = o.order_key
GROUP BY c."CUSTOMER_CITY", c."CUSTOMER_STATE"
ORDER BY order_count DESC
LIMIT 10;
Congratulations! You've transformed raw operational data into clean business entities!
What You Accomplished
Loaded real-world e-commerce data (Olist) Defined business entities with proper definitions and descriptions Mapped source tables to your model Learned to iterate on models without losing work Executed a complete transformation pipeline Created queryable business entities in daana_dw schema
Understanding What Happened
Before
- Raw tables:
stage.olist_customers_dataset,stage.olist_orders_dataset,stage.olist_products_dataset - Operational structure (how the system works)
- Mixed concerns, denormalized, messy
After
- Clean tables:
daana_dw.customer,daana_dw.order - Business structure (what analysts need)
- Self-documenting with definitions and descriptions
- Ready for BI tools!
Next Steps
Explore the Olist Dataset Further
Try adding more entities:
- PRODUCT (from
olist_products) - SELLER (from
olist_sellers) - REVIEW (from
olist_order_reviews)
Learn More About DMDL
- Model Language Guide - Advanced modeling concepts
- Mapping Templates - Complex transformations
- Commands Reference - All CLI commands
Connect Your BI Tool
Your transformed data in daana_dw is ready for:
- Tableau
- Power BI
- Looker
- Metabase
Connection details:
Host: localhost
Port: 5432
Database: customerdb
Schema: daana_dw
User: dev
Password: devpass
Experiment!
Try these challenges:
- Add
CUSTOMER_NAMEby concatenating first and last name - Calculate
ORDER_TOTALfrom order items - Add a
PRODUCTentity - Create a relationship between ORDER and PRODUCT
Troubleshooting
No data in daana_dw tables
Check that:
- Source tables have data:
SELECT COUNT(*) FROM stage.olist_customers_dataset; - Mappings are correct: Review your
*-mapping.yamlfiles - Workflow references the right mappings
Compilation errors
- "Entity not found": Check entity names match exactly (case-sensitive)
- "Invalid attribute type": Use STRING, NUMBER, TIMESTAMP, BOOLEAN
- "Missing definition": All entities and attributes need definition and description fields!
Execution fails
- Check connection profile points to correct database
- Verify source tables exist:
\dtin psql - Check logs:
docker logs daana-customerdb
Need Help?
- Check the Commands Reference for detailed command usage
- Review Configuration for connection setup
- See example files in
example_yaml/directory
Ready to build your own models? You now have all the skills to transform any operational data into clean business entities with Daana CLI!