Data Warehousing Fundamentals
This guide covers the fundamentals of what a data warehouse is, its purpose, and why you need one. Whether you're new to data warehousing or coming from a Data Lake background, this will help you understand the core concepts that Daana builds upon.
Fun fact: "Daana" is an old Persian word meaning "knowledge" - reflecting the goal of turning raw data into organized, accessible knowledge.
What is a Data Warehouse?
A data warehouse is a dedicated system that organizes and consolidates information from various parts of a business into one structured, queryable repository.
The problem it solves: Business data typically lives in many different systems - sales records in one, customer data in another, product details somewhere else. Each system records information differently, making it difficult to combine and analyze data as a whole. A data warehouse brings all this together into one organized place.
The business value: By consolidating data, you can connect dots that were previously invisible. A retail manager can see which products sell best in different regions, whether complaints are rising, and how customer buying patterns evolve - insights that are impossible when data is fragmented across systems.
Data Lake vs Data Warehouse
If you're coming from a Data Lake environment, you might wonder: "Why do I need a warehouse on top of my lake?"
| Data Lake | Data Warehouse |
|---|---|
| Stores raw data in any format | Stores structured, modeled data |
| Schema-on-read (interpret when querying) | Schema-on-write (structure when loading) |
| Optimized for storage cost | Optimized for query performance and consistency |
| Good for data scientists exploring raw data | Good for business users needing reliable answers |
| "Dump everything, figure it out later" | "Organize according to business meaning" |
They're complementary: Most modern architectures use both. The Data Lake holds raw data; the Data Warehouse provides the structured, business-aligned layer on top. Daana helps you build that structured layer without moving data out of your environment.
Why Structure Matters
Key Concept: Business-Oriented Organization
A data warehouse isn't just about dumping data in one place - it's about organizing data according to how the business thinks about it, not how source systems happen to store it.
For example, "customer" might mean different things in different systems:
- Sales system: Anyone who has ever made a purchase
- Support system: Anyone who has ever called in
- Marketing system: Anyone in the email list
A data warehouse creates one unified definition of "customer" that makes sense to the business as a whole.
The Four Pillars of Data Warehousing
These four fundamental principles were first articulated by Bill Inmon, widely considered the "father of data warehousing." They define what makes a data warehouse different from an operational database:
Data warehousing is a logical design pattern, not a specific technology. These principles can be applied on any database platform - BigQuery, Snowflake, Databricks, or traditional databases.
Subject-Oriented
Data is organized around business subjects (customers, sales, products) rather than around source system structures.
| Operational Database | Data Warehouse |
|---|---|
| Organized by system (CRM tables, ERP tables) | Organized by business concept (CUSTOMER, ORDER, PRODUCT) |
| Structure reflects technical implementation | Structure reflects business meaning |
Integrated
Data from multiple sources is standardized and reconciled. When you query "sales," it means the same thing regardless of which source system the data came from.
What integration involves:
- Standardizing formats (dates, currencies, codes)
- Reconciling inconsistencies
- Creating unified business definitions
- Enabling apples-to-apples comparisons
Time-Variant
A data warehouse tracks changes over time. While operational systems focus on current state ("What's in stock right now?"), a warehouse preserves history ("How has inventory changed over the past year?").
Why this matters:
- Trend analysis and forecasting
- Point-in-time reporting
- Understanding historical performance
- Answering "what was true on date X?"
Non-Volatile
Once data is loaded, it isn't overwritten or deleted. The warehouse maintains a permanent, auditable record.
Why this matters:
- Historical data is never lost
- Reports can be reproduced exactly
- Regulatory compliance and auditability
- Building trust in the data
The Three-Layer Architecture
Every data warehouse follows a conceptual three-layer architecture. These layers can be implemented as multiple physical layers, but understanding their distinct purposes is essential.
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Staging / Raw Ingestion │
│ DAS - Data As System sees it │
│ Also known as: Bronze layer, Landing zone │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Integration / Business Model │
│ DAB - Data As Business sees it │
│ Also known as: Silver layer, Conformed layer │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Access / Consumption │
│ DAR - Data As Requirements demand it │
│ Also known as: Gold layer, Marts, Data Products │
└─────────────────────────────────────────────────────────────┘Layer 1: Staging (DAS - Data As System)
Purpose: Collect and persist raw data from source systems.
Data arrives messy and inconsistent - different formats, missing values, duplicates. The staging layer captures this raw data before any transformation. This is also called the bronze layer in medallion architecture terminology.
Best practice: Persist your staging layer. Having the original source data available enables reprocessing, debugging, and auditing.
Layer 2: Integration (DAB - Data As Business)
Purpose: Transform and organize data to represent the business.
This is where the four pillars are implemented. Data is:
- Cleaned and validated
- Organized around business subjects
- Historized (time-variant)
- Made reusable across use cases
The integration layer creates a unified view - as if the company had one large system instead of many disconnected ones. Even with four different sales systems, there's only one representation of "Sales" in the integration layer. This is also called the silver layer or conformed layer.
Important: The integration layer is not optimized for specific reports or dashboards. It's designed for reusability and handling change - serving as a stable foundation for all downstream use cases.
Layer 3: Access (DAR - Data As Requirements)
Purpose: Specialize data for specific analytical use cases.
This layer creates data products - dimensions, facts, marts, reports - tailored to specific consumers. Because it builds on the integration layer, these products:
- Use consistent definitions
- Produce consistent results
- Can be built without understanding source system complexity
This is also called the gold layer or consumption layer.
How Daana Fits In
Daana automates the creation of the integration layer (DAB) based on your business model definition, using the Focal Framework - a metadata-driven architecture designed for agility and non-destructive development.
The workflow:
-
You define the business view - What does "customer" mean? How do you define "order" or "product"? Daana forces you to think like a business owner first.
-
You map source data - Connect your raw data (from BigQuery, Snowflake, or other platforms) to your business definitions using transformation expressions.
-
Daana generates the layers - The integration layer is built automatically, with full data pipelines. No code required beyond transformation expressions.
-
You build consumption on top - Use Daana's clean, historized views to build your access layer (dimensions, facts, marts).
Key benefits:
- Data stays in place - Nothing moves out of your environment
- Complete lineage - Trace any value from raw source to final report
- Shared vocabulary - A comprehensive data dictionary ensures business and technical teams speak the same language
- Non-destructive evolution - Add new attributes or entities without breaking existing pipelines
In short, Daana implements data warehousing best practices automatically, letting you focus on defining what matters to your business.
Related Documentation
- Focal Framework - The underlying architecture Daana uses
- Understanding Daana Output - What Daana creates and how to use it
- DMDL: Model Definition - How to define your business entities