Chapter 4: Loading Strategies
Goal: Understand the different ingestion strategies and when to use each one.
Prerequisites: You must have completed Chapter 3: Structuring Business Logic.
The Key Question
Before choosing a strategy, ask: "Does my source contain history, or just current state?"
| Source Contains | Strategy | Example |
|---|---|---|
| Current state only (1 row per entity) | FULL | Reference tables, dimensions |
| Complete history (N rows per entity) | FULL_LOG | Change logs, snapshots, CDC |
| Current state, large table | INCREMENTAL | Large orders table with updated_at |
| Immutable events | TRANSACTIONAL | Clickstream, sensor data |
The Four Strategies
FULL - Source Has Current State Only
Use when your source table has exactly one row per entity representing its current state.
```yaml
ingestion_strategy: FULL
```
How it works: Daana reads the entire table on each run. Across runs, it compares snapshots to detect what changed and builds history automatically.
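The snapshot comparison can be sketched as follows. This is a minimal illustration of the idea, not Daana's actual internals; `detect_changes` is a hypothetical helper and rows are keyed by entity id:

```python
def detect_changes(previous, current):
    """Classify entity ids as inserted, updated, or deleted between two snapshots."""
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(k for k in current
                     if k in previous and current[k] != previous[k])
    return {"inserted": inserted, "updated": updated, "deleted": deleted}

# Yesterday's and today's full snapshots of a products table
yesterday = {1: ("The Handmaid's", 16.99), 2: ("American Gods", 22.99)}
today = {1: ("The Handmaid's", 18.99), 2: ("American Gods", 22.99)}

print(detect_changes(yesterday, today))
# {'inserted': [], 'updated': [1], 'deleted': []}
```

Only the detected differences need to be written to history, which is how FULL can rebuild change tracking from a source that stores no history itself.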
Example source (one row per product):
```text
product_id | name           | price
1          | The Handmaid's | 16.99   ← Current state only
2          | American Gods  | 22.99   ← Current state only
```
FULL_LOG - Source Contains Complete History
Use when your source table has multiple rows per entity, each representing a historical version.
```yaml
ingestion_strategy: FULL_LOG
```
How it works: Daana treats each row as a point-in-time record and preserves the complete history from the source.
Example source (multiple rows per customer):
```text
customer_id | status   | updated_at
1           | Active   | 2024-11-09   ← Version 1
1           | Inactive | 2024-11-25   ← Version 2 (status changed)
```
This is what you used in Chapter 1! Both `stage.customers` and `stage.customer_addresses` contain multiple rows per entity, so both use FULL_LOG.
INCREMENTAL - Like FULL, But Faster
Use for large tables where reading everything is too slow. Requires a watermark column.
```yaml
ingestion_strategy: INCREMENTAL
effective_datetime: updated_at
```
How it works: Daana only reads rows where `updated_at > last_successful_run`.
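Conceptually, the extract query is built from the watermark. A hedged sketch — the function name and SQL shape here are illustrative, and Daana's generated SQL may differ:

```python
from datetime import datetime

def incremental_query(table, watermark_column, last_successful_run):
    # Only rows touched since the last successful run are read,
    # so runtime scales with the volume of changes, not table size.
    return (f"SELECT * FROM {table} "
            f"WHERE {watermark_column} > "
            f"'{last_successful_run.isoformat(sep=' ')}'")

print(incremental_query("stage.orders", "updated_at", datetime(2024, 11, 25)))
# SELECT * FROM stage.orders WHERE updated_at > '2024-11-25 00:00:00'
```

This is why a reliable watermark column is required: if `updated_at` is ever left stale on an update, that row is silently skipped.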
TRANSACTIONAL - Append-Only Events
Use for immutable data that's inserted once and never updated.
```yaml
ingestion_strategy: TRANSACTIONAL
```
How it works: Daana appends new rows without checking for updates. This is the fastest strategy.
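The speed comes from skipping per-row comparison entirely. A minimal sketch of the idea, with hypothetical names (this is not Daana's code):

```python
def append_events(warehouse_rows, new_events):
    # No lookup against existing rows, no update/delete detection:
    # events are assumed immutable, so a plain append is sufficient.
    warehouse_rows.extend(new_events)
    return warehouse_rows

warehouse = append_events([], [{"event": "click", "ts": "2024-11-30T10:00:00"}])
print(len(warehouse))
# 1
```

If your "events" can in fact be corrected or updated later, TRANSACTIONAL is the wrong choice; use FULL or INCREMENTAL instead.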
Hands-On Exercises
Exercise 1: Examine FULL_LOG Sources
Both of your Chapter 1 sources use FULL_LOG. Let's see why:
```bash
# See multiple rows per customer in stage.customers
docker exec -it daana-customerdb psql -U dev -d customerdb -c \
  "SELECT customer_id, account_status, updated_at
   FROM stage.customers
   WHERE customer_id = 1
   ORDER BY updated_at;"
```
You'll see that customer 1 appears twice (Active, then Inactive). This is the FULL_LOG pattern!
```text
 customer_id | account_status |       updated_at
-------------+----------------+------------------------
           1 | Active         | 2024-11-09 17:44:59+00
           1 | Inactive       | 2024-11-25 14:30:22+00
(2 rows)
```
```bash
# See multiple rows per customer in stage.customer_addresses
docker exec -it daana-customerdb psql -U dev -d customerdb -c \
  "SELECT customer_id, city, snapshot_date
   FROM stage.customer_addresses
   WHERE customer_id = 2
   ORDER BY snapshot_date DESC LIMIT 5;"
```
Most customers appear many times (daily snapshots). Also FULL_LOG!
```text
 customer_id |     city     | snapshot_date
-------------+--------------+---------------
           2 | Philadelphia | 2024-11-29
           2 | Boston       | 2024-11-28
           2 | Boston       | 2024-11-27
           2 | Boston       | 2024-11-26
           2 | Boston       | 2024-11-25
(5 rows)
```
Snapshot sources naturally contain many duplicate rows: a repeated row simply means nothing changed that day. Daana removes these duplicates automatically and tracks only the actual changes in the data.
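The deduplication idea can be sketched like this. It is a simplified illustration under assumed data shapes (dict rows with a `snapshot_date` key), not Daana's real implementation:

```python
def collapse_snapshots(rows):
    """Keep only snapshots whose tracked attributes differ from the previous one."""
    kept, prev = [], None
    for row in sorted(rows, key=lambda r: r["snapshot_date"]):
        # Compare everything except the snapshot date itself
        attrs = {k: v for k, v in row.items() if k != "snapshot_date"}
        if attrs != prev:
            kept.append(row)
            prev = attrs
    return kept

snapshots = [
    {"customer_id": 2, "city": "Boston", "snapshot_date": "2024-11-25"},
    {"customer_id": 2, "city": "Boston", "snapshot_date": "2024-11-26"},
    {"customer_id": 2, "city": "Boston", "snapshot_date": "2024-11-27"},
    {"customer_id": 2, "city": "Philadelphia", "snapshot_date": "2024-11-29"},
]
print([r["snapshot_date"] for r in collapse_snapshots(snapshots)])
# ['2024-11-25', '2024-11-29']
```

Four daily snapshots collapse to two history records: the original Boston address and the move to Philadelphia.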
Exercise 2: Add New History and Re-Execute
```bash
# Add a new address snapshot (Emily moves to SF!)
docker exec -it daana-customerdb psql -U dev -d customerdb -c "
  INSERT INTO stage.customer_addresses
    (customer_id, address_type, is_default, recipient_name, address_line1, city, state, postal_code, country, snapshot_date)
  VALUES (2, 'Shipping', TRUE, 'Emily Johnson', '100 Tech Drive', 'San Francisco', 'CA', '94105', 'USA', '2024-11-30');"

# Re-execute
daana-cli execute

# See Emily's complete history: Boston → Philadelphia → San Francisco
# Note: History views use lowercase column names
docker exec -it daana-customerdb psql -U dev -d customerdb -c \
  "SELECT customer_city, eff_tmstp FROM daana_dw.view_customer_hist
   WHERE customer_key = '2' ORDER BY eff_tmstp DESC LIMIT 3;"
```
Result:
```text
 customer_city |      eff_tmstp
---------------+---------------------
 San Francisco | 2024-11-30 00:00:00
 Philadelphia  | 2024-11-29 00:00:00
 Boston        | 2024-11-05 00:00:00
(3 rows)
```
Exercise 3: Clean Up
```bash
docker exec -it daana-customerdb psql -U dev -d customerdb -c \
  "DELETE FROM stage.customer_addresses WHERE snapshot_date = '2024-11-30';"
daana-cli execute
```
Best Practices
- Look at your source data - Does it have 1 row or N rows per entity?
- FULL_LOG for history - If source tracks changes as separate rows, use FULL_LOG
- FULL for current state - If source only has current state, use FULL
- INCREMENTAL for scale - Switch from FULL when tables grow large (roughly >1M rows) and a reliable watermark column exists
- TRANSACTIONAL for events - Use for immutable append-only data