What is the role of a Structured Data Lake in DW?

The Full 360 approach to structured data lakes: producers, metadata, history, and downstream warehouse consumption.

Michael David Cobb Bowen
Abstract: This post defines Full 360's structured data lake pattern, where producer programs add metadata and history so downstream warehouses and direct consumers can use the lake reliably.; Generative answer: A structured data lake uses producer programs, naming conventions, metadata, and retained history to feed warehouses, support direct consumers, and make disaster recovery cheaper and simpler.; Search intent: Understand the role of a structured data lake in extending data warehouse history, usability, and recovery options.; Specific topics: structured data lake, data warehouse history, producer-based ingestion, ELT versus ETL, data lake disaster recovery; About: Data platforms, Platform modernization; OmniArcs journey: Data Engineering, AI Journey, Platform Journey; Source categories: Big Data, Data Lake, Data Science, Cloud Computing, Amazon; Audience: technical decision makers, AI leaders, platform leaders, data leaders, and product engineering teams.

The Full 360 Approach

Our approach is a little different from generic data lakes: we build structured data lakes. A structured data lake is like any other in that it takes all sorts of data in any format, but we feed the lake with special programs called ‘producers’. These producers work independently, store metadata, and are optimized to chunk the data into the data lake with a basic understanding of how it will ultimately be consumed downstream. We always use dates and naming conventions, and we can add arbitrary further metadata on top of those.

The purpose of this is to make the data lake more usable for direct consumers and downstream processes. The original developers of the source data could disappear from the planet, but anyone could eyeball the data and metadata and still have a good idea of what is in a structured data lake and how to use it.
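To make the producer idea concrete, here is a minimal sketch in Python. It is not Full 360's actual producer code: the key layout (`source/dataset/dt=YYYY-MM-DD/part-NNNNN.jsonl`), the `.meta.json` sidecar, and the function names `chunk_key` and `produce` are all assumptions made for illustration, and a local directory stands in for the lake.

```python
import json
from datetime import date
from pathlib import Path


def chunk_key(source: str, dataset: str, day: date, part: int) -> str:
    """Build a date-partitioned key (a hypothetical naming convention)."""
    return f"{source}/{dataset}/dt={day.isoformat()}/part-{part:05d}.jsonl"


def produce(root: Path, source: str, dataset: str, day: date,
            rows: list[dict], chunk_size: int = 2, **extra_metadata) -> list[str]:
    """Write rows in fixed-size chunks, each with a metadata sidecar file."""
    keys = []
    for part, start in enumerate(range(0, len(rows), chunk_size)):
        chunk = rows[start:start + chunk_size]
        key = chunk_key(source, dataset, day, part)
        path = root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        # One JSON document per line, so consumers can stream the chunk.
        lines = (json.dumps(row, sort_keys=True) for row in chunk)
        path.write_text("\n".join(lines) + "\n")
        # The sidecar describes the chunk for anyone who finds it later.
        metadata = {
            "source": source,
            "dataset": dataset,
            "date": day.isoformat(),
            "row_count": len(chunk),
            "columns": sorted({column for row in chunk for column in row}),
            **extra_metadata,  # arbitrary additional metadata
        }
        Path(str(path) + ".meta.json").write_text(json.dumps(metadata, indent=2))
        keys.append(key)
    return keys
```

Calling `produce(Path("lake"), "crm", "orders", date(2016, 3, 1), rows, owner="sales")` would land the chunks under `lake/crm/orders/dt=2016-03-01/`, each next to a sidecar that someone unfamiliar with the source could read to see what the chunk holds.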

What you get

The big deal about a structured data lake is that it extends the capabilities of data warehouses and BI. I can build a DW with 6 months of history that is optimized for that window of time. Meanwhile, my data lake has an operational data store of 36 months at nearline speeds and 60 additional months offline. So my DW effectively has the capacity for 102 months of data (6 + 36 + 60) because of the way I’ve designed it to consume from the structured data lake. But I can also allow direct consumers to query that history using the slow, cheap data lake.
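The arithmetic above can be sketched as a small routing function that picks the fastest tier able to answer a query for a given month. This is only an illustration: it assumes the three windows are stacked end to end, as the 6 + 36 + 60 = 102 sum suggests, and the tier names and the `tier_for` function are hypothetical.

```python
from datetime import date

DW_MONTHS = 6          # hot window held in the warehouse
NEARLINE_MONTHS = 36   # further history in the lake at nearline speed
OFFLINE_MONTHS = 60    # further history kept offline
TOTAL_MONTHS = DW_MONTHS + NEARLINE_MONTHS + OFFLINE_MONTHS  # 102


def months_between(older: date, newer: date) -> int:
    """Whole calendar months from `older` to `newer`, ignoring the day."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def tier_for(partition_month: date, today: date) -> str:
    """Name the fastest tier that can serve data for the given month."""
    age = months_between(partition_month, today)
    if age < 0:
        raise ValueError("partition is in the future")
    if age < DW_MONTHS:
        return "warehouse"
    if age < DW_MONTHS + NEARLINE_MONTHS:
        return "lake-nearline"
    if age < TOTAL_MONTHS:
        return "lake-offline"
    return "expired"
```

A query for data three months old would be answered by the warehouse, while one for data four years old would fall through to the offline tier of the lake.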

PLUS

Disaster recovery becomes a no-brainer. It is almost always faster to wipe a database and simply reload six months of history than it is to use database recovery tools from incremental backups, and in our experience it is cheaper as well. Having a data lake allows you to actually test that out. A proper data lake will always be faster for this purpose than NFS, and Amazon S3 will certainly be cheaper than a SAN of similar dimensions, not to mention more reliable and lower-maintenance.
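As a rough sketch of the wipe-and-reload idea, the snippet below works out which lake objects a six-month reload would need. It assumes date-partitioned keys of the form `dataset/dt=YYYY-MM-DD/...`, which is an illustrative convention rather than a documented one, and it deliberately stops short of the actual load step, which depends on the warehouse in use.

```python
from datetime import date


def month_prefixes(dataset: str, today: date, months: int = 6) -> list[str]:
    """Return month-level key prefixes covering the reload window, oldest first."""
    prefixes = []
    year, month = today.year, today.month
    for _ in range(months):
        prefixes.append(f"{dataset}/dt={year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return list(reversed(prefixes))


def reload_plan(all_keys: list[str], dataset: str, today: date,
                months: int = 6) -> list[str]:
    """Select the keys that fall inside the reload window, in sorted order."""
    wanted = tuple(month_prefixes(dataset, today, months))
    return sorted(key for key in all_keys if key.startswith(wanted))
```

Because the plan is just a list of keys, it can be generated and replayed against a scratch warehouse on a schedule, which is one way to test the recovery path before it is needed.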

PLUS

I can use my data lake to feed multiple instances of the data warehouse for hot swapping or for global deployment in different regions. I could also conceivably have my entire data lake replicated automatically. Although we’ve never had such a paranoid requirement, three years ago naysayers would yelp every time they heard tell of an AWS outage.

For more information about the elasticBI ‘Pitbull’ Framework for Data Warehousing and BI, check out this blog.

ELT vs ETL

Our structured data lakes will perform cleansing transformations in the producers. That is because for most file-based ingestion schemes we don’t have latency issues. That is, when we’re pulling data from a generic source that spits out files, end users can generally wait an hour before querying that data. For API-based ingestion schemes like message queues, or direct queries against upstream databases, we make the data instantly available to end users with minimum transformation, and we fork off a copy for the data lake. The forked producers will do the rest of the cleansing and transformation necessary.
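The fork for API-based ingestion might look something like the following. This is a sketch under stated assumptions: `serve` and `land` are hypothetical callbacks standing in for the end-user store and the lake producer, and the `cleanse` rules (trim strings, drop empty fields, lowercase field names) are invented examples of cleansing, not the rules any real producer applies.

```python
from typing import Callable, Iterable


def cleanse(record: dict) -> dict:
    """Example cleansing: trim strings, drop empty fields, lowercase names."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key.lower()] = value
    return cleaned


def fork(messages: Iterable[dict],
         serve: Callable[[dict], None],
         land: Callable[[dict], None]) -> None:
    """Send each message down two paths: fast for users, cleansed for the lake."""
    for message in messages:
        serve(dict(message))    # minimal transformation, available right away
        land(cleanse(message))  # forked copy is cleansed on its way to the lake
```

For a quick trial, passing `served.append` and `landed.append` for two empty lists as the callbacks shows the same message arriving untouched on one side and cleansed on the other.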

There are cases when we leave data in its raw state and send that to the lake with no transformation. Those tend to be for data science consumers, and for situations when the business really has no idea what the data means and is not necessarily ready to present it in a way that’s structured for analysis. This is more often the case with straight HDFS data that’s left native and ‘annexed’ to the lake.

I’ll be talking more about data lakes this month. Stay tuned.
