

Data storage fads have come and gone over the past decade as the industry shifted from on-premises data storage to the cloud. Remember how “Big Data” was going to change everything? Then came the Data Lake, which had great promise but still left many questions unanswered. The governance and data quality controls needed to provide dimensional constraints just weren’t there. It seems that every vendor began to promote its own version of “The Modern Data Warehouse”. However, most first-generation cloud-borne, file-based data products don’t naturally blend with analytic reporting platforms like Power BI.

When I first started attending conference and user group sessions about Lakehouse architecture, I didn’t get it, but I do now, and it checks all the boxes. As a Consulting Services Director in a practice with over 200 BI developers and data warehouse engineers, I see first-hand how our customers – large and small – are adopting the Lakehouse for BI, data science and operational reporting.

What is so attractive about this Lakehouse thing, anyway? A Lakehouse architecture is a modern data architecture that combines the strengths of data lakes and data warehouses. Although SQL Server and Oracle will be around for a long time, many now consider them to be “legacy” databases used to manage line-of-business data. Oracle Cloud itself provides a Lakehouse architecture. ETL tools like SSIS and Informatica are now has-beens. Students are coming out of universities coding in Python. Data professionals are accustomed to using Notebooks to transform sets of data. Data Engineers use Pipelines rather than packages. File-based data is easier and less expensive to manage. The Parquet format provides far more flexibility than text files, but we can still use JSON, XML and CSV files for portability. The Spark engine is the industry standard, universal for fast queries and on-demand processing. Scale-out clustering lets you pay only for what you use, when you need it.

The Lakehouse is the evolution of the earlier cloud data platform, which came in many pieces with “some assembly required”. All of those components are modern, mature and capable, but they are complicated and require specialized skills. The Lakehouse builds on these technologies to deliver a true lakehouse data architecture, making it a robust platform. Imagine that the new version is easier to assemble, with instructions that are only a few pages with stick figures, and it comes with an Allen wrench. We’re seeing consulting customers putting Lakehouse and BI solutions online in just a few weeks.
