Understanding the Focal Framework
A Comprehensive Guide to Metadata-Driven Data Warehousing
Introduction
Traditional data warehouses often suffer from common problems: they become brittle over time, expensive to change, and semantically disconnected from business meaning. The Focal Framework offers a fundamental departure from these conventional approaches by introducing a metadata-driven architecture engineered for agility and stability.
At its core, the Focal Framework is a self-describing system - meaning the system not only stores your data but also maintains complete documentation about what that data means, where it came from, and how it should be interpreted. The physical data layer is intentionally generic, acting as a "mirror" of the definitions held within the metadata layer.
Think of it like a library where every book comes with a detailed card catalogue entry that describes not just the book's contents, but also its history, how it relates to other books, and instructions for how to read it. The Focal Framework creates this comprehensive documentation automatically for your data.
Key Concept: The Two-Layer Architecture
The framework's core architectural principle is the strict separation of data and metadata, realized as two fundamental layers that work together:
Data Layer: Stores your actual business information in generic, flexible structures. This layer is intentionally simple - it acts as a "mirror" that reflects whatever is defined in the metadata.
Metadata Layer: The active, operational core of the entire solution. It governs data structure, dictates data loading logic, and provides the semantic context necessary for data interpretation. This is not passive documentation - it is the engine that drives everything.
What makes this architecture special is that the metadata layer describes itself as well. This creates a completely self-documenting system where you can trace the complete lineage of any piece of information - conceptually, logically, and physically. Consequently, all implementation, from schema definition to data ingestion, begins with formal definitions in the metadata.
The Metadata Model: The Engine of the Framework
In a profound shift from traditional approaches, the metadata model is not passive documentation but the active, operational core of the entire Focal solution. All processes reference this central repository.
The metadata model is organized into three primary, interrelated subject areas:
Technical: Manages the operational aspects of data movement, such as batch control and source system tracking.
Functional: Supports implementation-specific logic and provides traceability for custom code solutions.
Semantic: Represents all data models and their relationships - this is considered the "heart and soul" of the solution.
The Semantic Model is built on the elegant concept of representing data models as rows of data. It uses its own set of Focals (such as Model, Entity, Attribute, Domain, and Domain Value) to capture the complete structure of any data model. In a powerful display of self-application, the metadata model is itself a perfect implementation of Focal principles.
Key Concept: Complete Model Lineage
This structure enables a full, queryable model lineage that traces an attribute from its business definition all the way to its physical storage location. A typical path would be:
Business Definition: An attribute is first defined in the enterprise Business Information Model
Logical Model Representation: The attribute is represented in the Focal logical model, assigned to an Atomic Context
Physical Mapping: The metadata specifies which generic column in which physical table pattern stores the actual values
The Semantic Key: Connecting Data to Meaning
At the heart of the Focal Framework is the semantic key (specifically, the TYPE_KEY column) - a special identifier that connects each piece of data to its corresponding metadata description. This key is a foreign key that references the unique key of an Atomic Context defined in the semantic metadata model. It is the bridge that connects a physical row of data to its full logical definition.
How Dynamic Data Interpretation Works
The physical data in Focal tables is intentionally generic and requires its semantic context from the metadata to be understood. The data layer is effectively meaningless on its own; it is the combination of data and metadata that provides a complete picture.
Consider this example of how interpretation works:
Physical View: A raw database row contains a loan_key, a type_key of 234, a val_string of '1000', and a uom of 'EUR'. Without context, this row is indecipherable.
Logical Interpretation: By executing a JOIN from this data table to the metadata model on type_key = 234, the system retrieves the full semantic definition. The metadata reveals that type_key 234 represents the "Loan Amount" Atomic Context. The generic row is instantly translated into a meaningful business fact: "The Loan Amount is 1000 EUR."
Because the type_key-to-Atomic-Context link is a consistent pattern across all Descriptor tables, this join mechanism is standardized and repeatable. There is no need for custom logic or lookups for different tables or attributes.
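To make this concrete, here is a minimal sketch of that standardized join, assuming a hypothetical metadata table DAANA_METADATA.ATOMIC_CONTEXT and a Descriptor table DAANA_DW.LOAN_DESC (the actual metadata table and column names may differ in a given implementation):

```sql
-- Illustrative sketch: translate generic Descriptor rows into business facts.
-- DAANA_DW.LOAN_DESC and DAANA_METADATA.ATOMIC_CONTEXT are assumed names;
-- the join pattern on TYPE_KEY is the standardized part.
SELECT
    ld.LOAN_KEY,
    ac.CONTEXT_NAME,   -- e.g. 'Loan Amount'
    ld.VAL_STR,        -- e.g. '1000'
    ld.UOM             -- e.g. 'EUR'
FROM DAANA_DW.LOAN_DESC ld
INNER JOIN DAANA_METADATA.ATOMIC_CONTEXT ac
    ON ld.TYPE_KEY = ac.TYPE_KEY
WHERE ld.TYPE_KEY = 234;   -- 'Loan Amount' in the running example
```

The same SELECT shape works against any Descriptor table; only the table name and key column change.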
Key Concept: Dynamic Semantic Correction
The architectural separation of physical storage and semantic definition enables a level of dynamism impossible in traditional systems. For example, if an attribute name is entered into the metadata with a typo, a simple UPDATE statement on the corresponding metadata row can correct it. Immediately, any logical view or report that joins to the metadata will reflect the corrected name. This change requires no code recompilation, no data migration, and zero system downtime.
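As a sketch of what such a correction might look like, again assuming the hypothetical ATOMIC_CONTEXT metadata table and CONTEXT_NAME column from the earlier example:

```sql
-- Hypothetical sketch: fixing a misspelled attribute name via metadata only.
UPDATE DAANA_METADATA.ATOMIC_CONTEXT
SET CONTEXT_NAME = 'Loan Amount'   -- corrected name
WHERE TYPE_KEY = 234;              -- previously misspelled, e.g. 'Laon Amount'
-- Every view or report that joins on TYPE_KEY = 234 now shows the corrected
-- name immediately: no DDL, no data migration, no downtime.
```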
Ensemble Modelling: Understanding the Landscape
In the data warehousing world, three modelling styles have emerged that prioritize agility, maintainability, and auditability. These are often called "ensemble modelling" styles because they share common principles while offering different implementations:
- Focal Framework
- Data Vault
- Anchor Modelling
While each approach has its nuances, they all decompose information into distinct constructs to enhance flexibility and scalability. The following table shows how terminology maps across these three approaches:
| Concept | Focal | Data Vault | Anchor |
|---|---|---|---|
| Core Business Concept | Focal | Hub | Anchor |
| Association/Relationship | Relation | Link | Tie |
| Descriptive Data | Descriptor | Satellite | Attribute |
Understanding this terminology mapping helps when reading documentation or comparing approaches. Despite different names, the underlying concepts are remarkably similar.
The Three Categories of Data
Understanding these categories is essential for grasping how the Focal Framework organizes information:
| Category | Description | Example |
|---|---|---|
| Core Business Concepts (Keys) | Stable identifiers for business entities that rarely change | Customer ID, Product ID, Order Number |
| Descriptive Data | Attributes that describe entities and change over time | Customer name, address, email, phone number |
| Relationship Data | Connections between entities that evolve over time | Customer-to-Order, Product-to-Category |
Core Principles of Ensemble Modelling
These three principles guide how data is structured:
Keys are stable and not historized: Once a business entity gets an identifier, that identifier never changes. This provides a solid anchor point for all related data.
Descriptive data is temporal: Attributes are stored in their own structures with timestamps, preserving the complete history of changes.
Relationships are temporal and many-to-many: Connections between entities are stored separately, allowing relationships to change over time without affecting the core entities.
Key Concept: Non-Destructive Development
The ultimate goal of this architecture is non-destructive development - the ability to add, change, or remove data requirements without affecting existing tables or code. When your business needs evolve, you can extend the system rather than rebuild it.
This is achieved through the "typed" table structure. Adding a new business attribute does not require a DDL command like ALTER TABLE. Instead, the process is managed entirely through metadata:
- A new attribute is defined as part of a new or existing Atomic Context in the metadata model
- Data for this new attribute is then loaded as new rows into the appropriate, pre-existing table structure
This principle eradicates the primary source of risk and delay in data warehouse evolution - schema and code regression - thereby enabling true development agility. While Anchor Modelling also avoids destructive DDL changes, it does so by adding new tables, introducing structural change. Focal avoids both destructive DDL and structural changes to the data layer, making the system exceptionally stable.
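A sketch of what this looks like in practice, reusing the loan example and the assumed ATOMIC_CONTEXT metadata table from earlier (the real metadata model involves more tables and columns than shown here):

```sql
-- Hypothetical sketch of non-destructive development: adding a 'Loan Period'
-- attribute without any ALTER TABLE. Metadata table names are assumptions.

-- Step 1: define the new Atomic Context in the metadata model.
INSERT INTO DAANA_METADATA.ATOMIC_CONTEXT (TYPE_KEY, CONTEXT_NAME)
VALUES (567, 'Loan Period');

-- Step 2: load data for the new attribute as ordinary rows into the
-- pre-existing, generic Descriptor table (audit columns omitted for brevity).
INSERT INTO DAANA_DW.LOAN_DESC (LOAN_KEY, TYPE_KEY, VAL_NUM, UOM, ROW_ST)
VALUES (1234, 567, 24, 'MONTHS', 'Y');
```

The physical schema is untouched; only metadata and data rows change.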
The Physical Focal Model
Learning Objective: Understand the four table types that make up a Focal data model and how they differ from traditional table design.
The Focal Framework makes a strategic shift away from conventional, attribute-specific table design. Instead, it employs a small set of generic, pattern-based table structures. This approach provides extreme flexibility and stability, as the physical schema is decoupled from the logical business model and does not need to change when new business requirements emerge.
Every business entity in a Focal model is represented by up to four physical structures. This consistent pattern - following the DRY (Don't Repeat Yourself) principle - means you can build a data warehouse for any industry using the same reusable structures and code.
Understanding Typed vs. Flat Table Structures
The fundamental difference between a traditional "flat" table and Focal's "typed" structure is how a logical record is stored:
Flat table: Stores one logical record in one physical row with named columns (e.g., loan_key, loan_amount, loan_start_date)
Typed table: Splits a single logical record across multiple physical rows, where each row represents a single Atomic Context identified by a TYPE_KEY
For example, a single loan (loan_key 1234) might be stored across two rows in a Descriptor table: one row for Loan Amount (type_key 234) and another for Loan Period (type_key 567). All attributes belonging to an Atomic Context are stored together on a single row.
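The contrast can be sketched with simplified DDL (these layouts are illustrative only, not the framework's actual table definitions):

```sql
-- Flat: one logical record = one physical row with named columns.
CREATE TABLE LOAN_FLAT (
    LOAN_KEY     INTEGER,
    LOAN_AMOUNT  DECIMAL(18,2),
    LOAN_PERIOD  INTEGER
);

-- Typed: one logical record = several generic rows keyed by TYPE_KEY.
CREATE TABLE LOAN_DESC (
    LOAN_KEY  INTEGER,
    TYPE_KEY  INTEGER,        -- which Atomic Context this row carries
    VAL_STR   VARCHAR(255),   -- generic string value
    VAL_NUM   DECIMAL(18,2),  -- generic numeric value
    UOM       VARCHAR(50)     -- unit of measure
);
-- Loan 1234 stored as two typed rows:
--   (1234, 234, '1000', NULL, 'EUR')   -- Loan Amount
--   (1234, 567, NULL, 24, 'MONTHS')    -- Loan Period
```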
Key Concept: The Atomic Context
The Atomic Context is a foundational organizing principle within the framework. It is defined as the smallest set of one or more attributes that, when taken together, provides a complete, unambiguous business fact by answering a single, atomic business query.
Simple Atomic Context: May consist of a single attribute. The query "What is your first name?" can be answered with the first_name attribute alone.
Complex Atomic Context: Requires multiple attributes to be complete. The query "What was your end-of-day balance yesterday?" is not answered by the value 1000, which is ambiguous. It is only partially answered by 1000 EUR. A complete, unambiguous answer requires three attributes: a balance date, a balance value, and a balance currency.
Crucially, the Atomic Context - not the individual attribute - is the primary unit of storage in the physical implementation. In Focal physical tables, the Atomic Context is referenced through the TYPE_KEY column. Join the TYPE_KEY to the Atomic Context part of the metadata model and you can discover what it represents, which attributes are involved, in which physical columns the attribute values are stored, and much more.
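For instance, a metadata-exploration query along these lines, again assuming hypothetical ATOMIC_CONTEXT and ATTRIBUTE metadata tables and columns, could surface that information:

```sql
-- Hypothetical sketch: asking the metadata what TYPE_KEY 234 stands for.
-- Table and column names are assumptions; the real semantic model also
-- covers Models, Entities, Domains, and Domain Values.
SELECT
    ac.CONTEXT_NAME,        -- e.g. 'Loan Amount'
    attr.ATTRIBUTE_NAME,    -- e.g. 'Amount Value', 'Amount Currency'
    attr.PHYSICAL_COLUMN    -- e.g. 'VAL_STR', 'UOM'
FROM DAANA_METADATA.ATOMIC_CONTEXT ac
INNER JOIN DAANA_METADATA.ATTRIBUTE attr
    ON attr.TYPE_KEY = ac.TYPE_KEY
WHERE ac.TYPE_KEY = 234;
```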
The Four Physical Table Types
1. The IDFR Table (Identifier Table)
Purpose: Achieves key integration by managing the relationship between business identifiers and surrogate keys.
In real-world systems, the same business entity often has multiple identifiers. A customer might have an internal ID, a social security number, a loyalty program number, and an external partner ID. The IDFR table maps all these business identifiers to a single, stable surrogate key.
Practical Example: Customer Identification
Consider a customer who can be identified by either their internal customer ID (14598AC2) or their social security number (19990310-1693). The IDFR table stores both:
| Customer_IDFR | Row_Status | Customer_Key |
|---|---|---|
| 14598AC2 | Y (Active) | 1234 |
| 19990310-1693 | Y (Active) | 1234 |
Both identifiers point to the same Customer_Key (1234), allowing the system to recognize these as the same customer regardless of which identifier is used.
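A resolution query over such a table might look like this sketch (assuming the table lives in the DAANA_DW schema used later in this guide):

```sql
-- Sketch: resolve any known business identifier to the stable surrogate key.
SELECT CUSTOMER_KEY
FROM DAANA_DW.CUSTOMER_IDFR
WHERE CUSTOMER_IDFR = '19990310-1693'   -- or '14598AC2'; both yield 1234
  AND ROW_ST = 'Y';                     -- active rows only
```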
IDFR Table Column Reference:
- [ENTITY]_IDFR: The business identifier string from source systems
- EFF_TMSTP: Effective timestamp - when the row became valid
- VER_TMSTP: Version timestamp for bi-temporal support (multiple versions at same effective time)
- ROW_ST: Row status (Y=Active, N=Inactive)
- [ENTITY]_KEY: The stable surrogate key for the entity
Audit Columns:
- DATA_KEY: Data key column (not used in Daana implementation)
- INST_KEY: Batch identifier - indicates which pipeline loaded the row
- INST_ROW_KEY: Workflow identifier - indicates which workflow created the metadata for the row
- POPLN_TMSTP: Population timestamp - when the row was physically loaded into the table
2. The FOCAL Table
Purpose: Maintains exactly one row per entity instance, identified by the surrogate key.
While the IDFR table may contain multiple rows for the same entity (one per business identifier), the FOCAL table consolidates these into a single authoritative row per entity. This simplifies queries when you need to work with unique entities rather than their various identifiers.
3. The Descriptor Table
Purpose: Stores all descriptive attributes about an entity in a flexible, metadata-driven structure.
The Descriptor table is perhaps the most innovative component of the Focal Framework. Instead of creating fixed columns for each attribute (like traditional tables), it uses generic columns that can store any type of data. A TYPE_KEY identifies what kind of data is stored in each row.
Descriptor Table Column Reference:
Entity Identification Columns:
- [ENTITY]_KEY: The surrogate key that provides stable identification of an entity instance
- TYPE_KEY: The key that represents what type of data is stored on this row
Temporal Columns:
- EFF_TMSTP: Effective timestamp - when the row was created/became valid
- VER_TMSTP: Version timestamp - provides bi-temporal support when multiple versions exist for the same effective timestamp
- SEQ_NBR: Sequence number (not used in Daana implementation)
- ROW_ST: Row status indicator (Y=Active, N=Inactive) - combined with temporal columns, tells whether the row is currently active
Value Columns (Generic Data Storage):
- STA_TMSTP: A mappable column for timestamp, time, or date data (start/from date)
- END_TMSTP: A mappable column for timestamp, time, or date data (end/to date)
- VAL_STR: A mappable column for character and string data
- VAL_NUM: A mappable column for numeric data
- UOM: Unit of measure - a mappable string column used to classify what kind of data resides in VAL_STR or VAL_NUM
Audit Columns:
- DATA_KEY: Data key column (not used in Daana implementation)
- INST_KEY: Batch identifier - indicates which pipeline loaded the row
- INST_ROW_KEY: Workflow identifier - indicates which workflow created the metadata for the row
- POPLN_TMSTP: Population timestamp - when the row was physically loaded into the table
Why This Design Matters:
The granularity of data loading follows the pattern: source_table + type_key + target_table. This means each data pipeline is unique to its specific source, attribute type, and target. When you need to add new attributes, you simply create new TYPE_KEYs and map them - existing pipelines remain untouched. This is the essence of non-destructive development.
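One common consumption pattern is a logical view that pivots the typed rows back into a flat record. The sketch below reuses the loan example's TYPE_KEYs and is illustrative only; in practice, such views can be generated from the metadata itself:

```sql
-- Sketch: pivot typed Descriptor rows into a flat, report-friendly shape.
CREATE VIEW LOAN_FLAT_V AS
SELECT
    LOAN_KEY,
    MAX(CASE WHEN TYPE_KEY = 234 THEN VAL_STR END) AS LOAN_AMOUNT,
    MAX(CASE WHEN TYPE_KEY = 234 THEN UOM     END) AS LOAN_CURRENCY,
    MAX(CASE WHEN TYPE_KEY = 567 THEN VAL_NUM END) AS LOAN_PERIOD
FROM DAANA_DW.LOAN_DESC
WHERE ROW_ST = 'Y'          -- active rows only
GROUP BY LOAN_KEY;
```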
4. The Relationship Table
Purpose: Captures and tracks relationships between different entities over time.
Relationship tables follow a naming convention: [ENTITY1]_[ENTITY2]_X. A single relationship table can store multiple types of relationships between two entities. For example, an ORDER_CUSTOMER_X table might contain:
- BOUGHT_BY: The customer who placed the order
- RETURNED_BY: The customer who returned the order
- SHIPPED_TO: The customer receiving the shipment
Relationship Table Column Reference:
- [ENTITY_01]_KEY: Surrogate key for the driving entity
- [ENTITY_02]_KEY: Surrogate key for the related entity
- TYPE_KEY: Identifies the type of relationship
- EFF_TMSTP / VER_TMSTP: Temporal tracking for bi-temporal support
- ROW_ST: Active/Inactive status
Audit Columns:
- DATA_KEY: Data key column (not used in Daana implementation)
- INST_KEY: Batch identifier - indicates which pipeline loaded the row
- INST_ROW_KEY: Workflow identifier - indicates which workflow created the metadata for the row
- POPLN_TMSTP: Population timestamp - when the row was physically loaded into the table
Here as well, the TYPE_KEY is how we join into the metadata to get full lineage and understanding of the rows of data in the relationship table(s).
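For example, a sketch of such a join for the ORDER_CUSTOMER_X table described above, again assuming the hypothetical ATOMIC_CONTEXT metadata table:

```sql
-- Sketch: read typed relationships together with their semantic names.
SELECT
    oc.ORDER_KEY,
    oc.CUSTOMER_KEY,
    ac.CONTEXT_NAME         -- e.g. 'BOUGHT_BY', 'RETURNED_BY', 'SHIPPED_TO'
FROM DAANA_DW.ORDER_CUSTOMER_X oc
INNER JOIN DAANA_METADATA.ATOMIC_CONTEXT ac
    ON oc.TYPE_KEY = ac.TYPE_KEY
WHERE oc.ROW_ST = 'Y';      -- active relationships only
```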
Operational Lineage
In the Focal Framework, more than the logical and conceptual metadata is stored: the framework also logs how and when data is loaded. Every physical table contains a column named INST_KEY (instance key), which is a synonym for PROCINST_KEY. The procinst_key (inst_key) represents a single execution of a pipeline.
In the Focal metadata layer, a table called PROCINST_DESC logs each pipeline that has been run.
If you wonder what pipeline logic was used to load a specific row in a "data table", you can query the Focal Framework:
```sql
SELECT DISTINCT pd.val_str
FROM DAANA_METADATA.PROCINST_DESC pd
INNER JOIN DAANA_DW.CUSTOMER_DESC cd
    ON cd.INST_KEY = pd.PROCINST_KEY
WHERE cd.CUSTOMER_KEY = 'CUST_12345';
```

This will return the SQL used to load those rows of data. You can then test the SQL that loaded a specific row and verify whether something is wrong with the pipeline.
Continue Your Learning
To explore the Focal Framework in greater depth, visit the official YouTube channel for video tutorials and demonstrations:
Focal Framework YouTube Channel
Related Documentation
- Understanding Daana Output - How to use Daana's output
- DMDL: Model Definition - Defining your business entities
- AI-Assisted Mapping - Using AI to accelerate data mapping