The (r)evolutions of Microsoft Fabric

Microsoft Fabric is the new Microsoft-branded analytics platform that integrates data warehousing, data integration and orchestration, data engineering, data science, real-time analytics, and business intelligence into a single product. This new one-stop shop for analytics brings a set of innovative features and capabilities, providing a transformative tool for data professionals and business users alike. With Microsoft Fabric, you can leverage the power of data and AI to transform your organization and create new opportunities.
For more Microsoft Fabric topics 👉 Microsoft Fabric | element61

The unification of existing data services into one single platform is a true revolution when we consider the history of the Microsoft Data Warehouse and Business Intelligence services over the last decade. The choice for the data lakehouse is a choice for the future of data analytics, and it is closely entangled with the architectural changes released with Fabric.

OneLake, the data lake to rule it all

OneLake is the common denominator that connects the different Fabric services: it serves as the common storage location for ingestion, transformation, real-time insights, and Business Intelligence visualizations:

[Diagram: OneLake as the common storage layer for the Fabric services]

OneLake builds on the scalable and cost-efficient storage infrastructure of Azure Data Lake Storage Gen2 (ADLS Gen2) and embraces the Delta file format to store data in an efficient and reliable way. It is a true revolution because Microsoft adopted the open-source Delta format (originating from Databricks) to store all Fabric-hosted data, and adapted both its performant query engines (from SQL Server and the Synapse dedicated and serverless pools) and its analytical engines (from the Analysis Services Tabular model and Power BI datasets) to embrace this Delta format. OneLake is also compatible with existing ADLS Gen2 applications such as Azure Databricks, as it continues to support the same ADLS Gen2 APIs and SDKs.
The biggest advantage of the OneLake strategy is that it minimizes data friction: the same data can be used by multiple transformation and analytical engines, which removes hurdles in data integration, data quality, data governance, and data analysis across the unified data lakehouse platform.
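To make that API compatibility concrete, here is a minimal sketch that lists the contents of a lakehouse through the standard ADLS Gen2 SDK for Python, pointed at the OneLake endpoint. The workspace and lakehouse names are hypothetical placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes the same DFS endpoint contract as ADLS Gen2,
# so the regular Azure SDK clients work against it unchanged.
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# A Fabric workspace maps to a filesystem; items live underneath it.
fs = service.get_file_system_client("MyWorkspace")  # hypothetical workspace
for path in fs.get_paths(path="MyLakehouse.Lakehouse/Tables"):  # hypothetical lakehouse
    print(path.name)
```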

The unified data lakehouse platform

Whether your Microsoft-flavored modern data platform was built with Azure Synapse or with Databricks on Azure, it mostly consisted of at least a storage location (Azure Data Lake Storage Gen2), an orchestration or ingestion engine, a transformation or compute engine (Synapse and/or Databricks), and an analytical store with an in-memory engine (Power BI). The Synapse Analytics platform already combined storage, compute, and a link to Power BI in a Platform-as-a-Service (PaaS) solution.
Fabric combines five of these already existing (but improved) workloads with one new workload in a Power BI-based portal: Synapse Data Science, Synapse Data Warehousing, Synapse Data Engineering, Synapse Real-Time Analytics (new!), Data Factory, and Power BI:

[Diagram: the unified Fabric platform and its six workloads]

The main idea behind the unified Fabric platform is that every persona (data engineer, data scientist, Power BI engineer, ...) gets a development experience in the programming language of their choice and works on the same data, hosted in OneLake. This simplifies the data engineering platform: you no longer have to choose between serverless pools, dedicated pools, DWUs, or Spark clusters.

With Synapse Analytics or with the Databricks-infused data lakehouse platform, an Azure infrastructure expert still had to provision the data services in Azure, which meant being aware of networking, the communication between the components, and overall security. Setting up such a platform was mostly done using Infrastructure-as-Code, because that makes it easy to provision several environments and ensures consistency, efficiency, and compliance.
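For contrast, here is a minimal sketch of the kind of provisioning code this typically involved, creating an ADLS Gen2 storage account with the Azure management SDK for Python (subscription, resource group, and account names are hypothetical):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Provision a StorageV2 account with the hierarchical namespace enabled,
# which is what turns a storage account into an ADLS Gen2 data lake.
poller = client.storage_accounts.begin_create(
    "rg-dataplatform",   # hypothetical resource group
    "stdatalake001",     # hypothetical account name
    {
        "location": "westeurope",
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},
        "is_hns_enabled": True,
    },
)
account = poller.result()
print(account.primary_endpoints.dfs)
```

In Fabric, none of this is needed: storage simply exists as OneLake the moment a capacity is assigned.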

Microsoft Fabric is the real SaaS-ification of the Microsoft data platform because it combines all the data services in one new experience and it can be used off the shelf, in a so-called no-knob experience. Everything comes out of the box, without the need to be an infrastructure expert!

The evolution from server-based SQL Server services (with the SQL engine, Reporting Services, Integration Services, and Analysis Services), over Platform-as-a-Service Azure data services (with Synapse Analytics, Azure SQL Database, Azure Data Factory, and Azure Analysis Services), to Microsoft Fabric in five years is quite remarkable!

Direct Lake Mode in Power BI

Next to import mode (where data is loaded into memory) and DirectQuery (where data is queried ad hoc, directly on the source), Microsoft released a new dataset capability called Direct Lake mode. Direct Lake mode is a feature of Power BI that allows users to connect to their existing data lakes without copying or moving the data. And OneLake is not the only big data source that can be used to discover new insights: Azure Data Lake Storage Gen2 or Amazon S3 buckets can be sourced as well, as long as the files are stored in Delta, a compressed columnar file format.
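As a minimal sketch, this is what makes a table Direct Lake-ready in a Fabric notebook: writing it as a Delta-backed lakehouse table (the source path and table names below are hypothetical):

```python
from pyspark.sql import SparkSession

# In a Fabric notebook the Spark session is pre-created;
# getOrCreate() simply picks it up.
spark = SparkSession.builder.getOrCreate()

# Hypothetical raw file in the lakehouse's Files area.
df = spark.read.option("header", True).csv("Files/raw/sales.csv")

# Saving as a managed lakehouse table stores it in Delta format in OneLake,
# which is exactly the representation Direct Lake datasets read from.
df.write.format("delta").mode("overwrite").saveAsTable("sales")
```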

With import mode, data is imported into the Power BI dataset on a schedule. After the data is read from the DW model, it is compressed, pre-calculated, and stored in the Power BI in-memory store to quickly answer queries from Power BI interactions. The biggest disadvantage of import mode is that the data, once stored in the dimensional data model, still needs to be translated into the proprietary Power BI dataset file format, and this takes time. Once loaded, however, it offers blazingly fast response times!
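Such refreshes are usually scheduled in the Power BI service, but they can also be triggered programmatically. A minimal sketch against the Power BI REST API, assuming an Azure AD access token with the Dataset.ReadWrite.All scope (workspace and dataset IDs are placeholders):

```python
import requests

group_id = "<workspace-guid>"   # placeholder
dataset_id = "<dataset-guid>"   # placeholder
token = "<aad-access-token>"    # placeholder

# Queue a refresh of an import-mode dataset.
resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
    f"/datasets/{dataset_id}/refreshes",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()  # 202 Accepted means the refresh was queued
```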

[Diagram: import mode versus Direct Lake mode]

At a high level, with Direct Lake mode the queries are sent directly to the Delta tables, and the required columns are loaded into the Power BI cache. Because Power BI natively supports the Delta format, the data is sourced directly from the OneLake data model and doesn't need to be translated into a proprietary high-performance file format first. Another Direct Lake innovation that speeds up queries is the V-Order algorithm, which sorts the data inside the parquet files in a way similar to what the VertiPaq engine does. This enhances both compression and query speed.
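As a sketch of how this surfaces to a data engineer: in the Fabric Spark runtime, V-Order writing can be toggled per session. The property name below reflects the documentation at the time of writing and may evolve; the table names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-level switch for V-Order-optimized parquet writes in Fabric Spark.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Rewrite a table so its parquet files are V-Order sorted and compressed,
# making subsequent Direct Lake column loads cheaper.
spark.read.table("sales") \
    .write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("sales_vorder")
```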

[Diagram: Direct Lake mode]

Is Fabric here to stay?

Fabric is not yet-another-analytics-layer on top of the existing services, but a new platform built from the ground up, reusing the technologies that were leveraged before in Azure services and platforms. The Azure data stack is maturing at an impressive pace, with new features and capabilities being released regularly to meet the diverse and complex needs of data professionals and business users.
The simplicity of the unified platform will be an accelerator, letting teams focus where it matters most: delivering successful analytics projects.