
Data Modeling for the Lakehouse: What Changes
Traditional data modeling assumed you controlled the database. You defined schemas up front, enforced foreign keys at write time, and optimized with indexes. The lakehouse changes every one of those assumptions. Data lives in open file formats on object storage. Schemas evolve without rewriting data. Queries run through engines that may not enforce relational constraints. The modeling discipline is the same, but the mechanics are different. What's Different About a Lakehouse A lakehouse stores data as files — typically Parquet — on object storage like S3 or Azure Blob. An open table format like Apache Iceberg adds structure: schema definitions, partition metadata, snapshot history, and transactional guarantees. This architecture gives you more flexibility than a traditional RDBMS, but also more responsibility. There are no foreign key constraints enforced at write time. No triggers. No stored procedures. Referential integrity is your problem to solve in pipelines and views, not somethi
Continue reading on Dev.to
Opens in a new tab



