Concepts
Data Warehousing Fundamentals

Data Warehousing Fundamentals

This guide covers the fundamentals of what a data warehouse is, its purpose, and why you need one. Whether you're new to data warehousing or coming from a Data Lake background, this will help you understand the core concepts that Daana builds upon.

Fun fact: "Daana" is an old Persian word meaning "knowledge" - reflecting the goal of turning raw data into organized, accessible knowledge.


What is a Data Warehouse?

A data warehouse is a dedicated system that organizes and consolidates information from various parts of a business into one structured, queryable repository.

The problem it solves: Business data typically lives in many different systems - sales records in one, customer data in another, product details somewhere else. Each system records information differently, making it difficult to combine and analyze data as a whole. A data warehouse brings all this together into one organized place.

The business value: By consolidating data, you can connect dots that were previously invisible. A retail manager can see which products sell best in different regions, whether complaints are rising, and how customer buying patterns evolve - insights that are impossible when data is fragmented across systems.


Data Lake vs Data Warehouse

If you're coming from a Data Lake environment, you might wonder: "Why do I need a warehouse on top of my lake?"

Data LakeData Warehouse
Stores raw data in any formatStores structured, modeled data
Schema-on-read (interpret when querying)Schema-on-write (structure when loading)
Optimized for storage costOptimized for query performance and consistency
Good for data scientists exploring raw dataGood for business users needing reliable answers
"Dump everything, figure it out later""Organize according to business meaning"

They're complementary: Most modern architectures use both. The Data Lake holds raw data; the Data Warehouse provides the structured, business-aligned layer on top. Daana helps you build that structured layer without moving data out of your environment.


Why Structure Matters

Key Concept: Business-Oriented Organization

A data warehouse isn't just about dumping data in one place - it's about organizing data according to how the business thinks about it, not how source systems happen to store it.

For example, "customer" might mean different things in different systems:

  • Sales system: Anyone who has ever made a purchase
  • Support system: Anyone who has ever called in
  • Marketing system: Anyone in the email list

A data warehouse creates one unified definition of "customer" that makes sense to the business as a whole.


The Four Pillars of Data Warehousing

These four fundamental principles were first articulated by Bill Inmon, widely considered the "father of data warehousing." They define what makes a data warehouse different from an operational database:

Data warehousing is a logical design pattern, not a specific technology. These principles can be applied on any database platform - BigQuery, Snowflake, Databricks, or traditional databases.

Subject-Oriented

Data is organized around business subjects (customers, sales, products) rather than around source system structures.

Operational DatabaseData Warehouse
Organized by system (CRM tables, ERP tables)Organized by business concept (CUSTOMER, ORDER, PRODUCT)
Structure reflects technical implementationStructure reflects business meaning

Integrated

Data from multiple sources is standardized and reconciled. When you query "sales," it means the same thing regardless of which source system the data came from.

What integration involves:

  • Standardizing formats (dates, currencies, codes)
  • Reconciling inconsistencies
  • Creating unified business definitions
  • Enabling apples-to-apples comparisons

Time-Variant

A data warehouse tracks changes over time. While operational systems focus on current state ("What's in stock right now?"), a warehouse preserves history ("How has inventory changed over the past year?").

Why this matters:

  • Trend analysis and forecasting
  • Point-in-time reporting
  • Understanding historical performance
  • Answering "what was true on date X?"

Non-Volatile

Once data is loaded, it isn't overwritten or deleted. The warehouse maintains a permanent, auditable record.

Why this matters:

  • Historical data is never lost
  • Reports can be reproduced exactly
  • Regulatory compliance and auditability
  • Building trust in the data

The Three-Layer Architecture

Every data warehouse follows a conceptual three-layer architecture. These layers can be implemented as multiple physical layers, but understanding their distinct purposes is essential.

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Staging / Raw Ingestion                            │
│ DAS - Data As System sees it                                │
│ Also known as: Bronze layer, Landing zone                   │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Integration / Business Model                       │
│ DAB - Data As Business sees it                              │
│ Also known as: Silver layer, Conformed layer                │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Access / Consumption                               │
│ DAR - Data As Requirements demand it                        │
│ Also known as: Gold layer, Marts, Data Products             │
└─────────────────────────────────────────────────────────────┘

Layer 1: Staging (DAS - Data As System)

Purpose: Collect and persist raw data from source systems.

Data arrives messy and inconsistent - different formats, missing values, duplicates. The staging layer captures this raw data before any transformation. This is also called the bronze layer in medallion architecture terminology.

Best practice: Persist your staging layer. Having the original source data available enables reprocessing, debugging, and auditing.

Layer 2: Integration (DAB - Data As Business)

Purpose: Transform and organize data to represent the business.

This is where the four pillars are implemented. Data is:

  • Cleaned and validated
  • Organized around business subjects
  • Historized (time-variant)
  • Made reusable across use cases

The integration layer creates a unified view - as if the company had one large system instead of many disconnected ones. Even with four different sales systems, there's only one representation of "Sales" in the integration layer. This is also called the silver layer or conformed layer.

⚠️

Important: The integration layer is not optimized for specific reports or dashboards. It's designed for reusability and handling change - serving as a stable foundation for all downstream use cases.

Layer 3: Access (DAR - Data As Requirements)

Purpose: Specialize data for specific analytical use cases.

This layer creates data products - dimensions, facts, marts, reports - tailored to specific consumers. Because it builds on the integration layer, these products:

  • Use consistent definitions
  • Produce consistent results
  • Can be built without understanding source system complexity

This is also called the gold layer or consumption layer.


How Daana Fits In

Daana automates the creation of the integration layer (DAB) based on your business model definition, using the Focal Framework - a metadata-driven architecture designed for agility and non-destructive development.

The workflow:

  1. You define the business view - What does "customer" mean? How do you define "order" or "product"? Daana forces you to think like a business owner first.

  2. You map source data - Connect your raw data (from BigQuery, Snowflake, or other platforms) to your business definitions using transformation expressions.

  3. Daana generates the layers - The integration layer is built automatically, with full data pipelines. No code required beyond transformation expressions.

  4. You build consumption on top - Use Daana's clean, historized views to build your access layer (dimensions, facts, marts).

Key benefits:

  • Data stays in place - Nothing moves out of your environment
  • Complete lineage - Trace any value from raw source to final report
  • Shared vocabulary - A comprehensive data dictionary ensures business and technical teams speak the same language
  • Non-destructive evolution - Add new attributes or entities without breaking existing pipelines

In short, Daana implements data warehousing best practices automatically, letting you focus on defining what matters to your business.


Related Documentation