At the moment of writing, it is the 23rd of May, 7 p.m., Belgian time. Microsoft Build is happening live in Seattle, USA, and Microsoft has just announced Microsoft Fabric, an end-to-end analytics platform that takes Analytics and Azure Synapse to the next level.
Which trends drive the origin of Microsoft Fabric?
Microsoft Fabric is Microsoft's response to the paradigm shifts happening in Analytics. Since the announcement of Azure Synapse in 2019 (considered the analytics service in Azure), the world of Analytics has evolved further, with some clear growing trends incl.:
- Broad use of Delta Lake and the Delta format to store data in a Data Lake. The Delta format is now widely used across Databricks and Microsoft Azure as the preferred format to store data in a Data Lake. Not only does it allow ACID transactions, it also allows time travel, a performance uplift over plain Parquet for many workloads, and running a Lakehouse, i.e., running your Data Warehouse within your Data Lake.
- Lakehouse as a proven set-up to build your Data Warehouse. Rather than keeping a Data Lake & Data Warehouse separate, it's clear that the true unified vision of Analytics means 1 Data Lake acting as a Lakehouse for BI, Machine Learning, batch & real-time analytics alike.
- Revival of SQL. Although Python & Spark have of course become mainstream in Analytics & AI, it's clear that SQL remains the easiest language for a lot of Data & Analytics Engineering work. SQL is once again increasingly popular & used in tools like dbt, Synapse Serverless and Databricks SQL.
- Serverless & autoscaling for the win. Rather than predefining how big you want to make your clusters, Data Engineers embrace serverless (almost SaaS-like) services like Synapse Serverless and Databricks, which scale up when needed & scale down when not used.
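To make the ACID and time-travel claims above a bit more tangible, here is a minimal sketch of the idea behind a transaction-logged table: every commit is appended to a log, and any version of the table can be rebuilt by replaying that log. This is a toy model in plain Python, not the Delta Lake API; the class name `ToyDeltaTable`, its methods and the file names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Toy model of a transaction-logged table (NOT the Delta Lake API)."""

    # Each log entry is one atomic commit: files added and files removed.
    log: list = field(default_factory=list)

    def commit(self, add=(), remove=()):
        """Append one commit to the log and return its version number."""
        self.log.append({"add": set(add), "remove": set(remove)})
        return len(self.log) - 1

    def files_at(self, version=None):
        """Replay the log up to `version` to find the live data files."""
        if version is None:
            version = len(self.log) - 1
        live = set()
        for entry in self.log[: version + 1]:
            live -= entry["remove"]
            live |= entry["add"]
        return live


table = ToyDeltaTable()
v0 = table.commit(add=["part-000.parquet"])
v1 = table.commit(add=["part-001.parquet"])
# An overwrite removes the old files and adds a new one in a single commit.
v2 = table.commit(
    add=["part-002.parquet"],
    remove=["part-000.parquet", "part-001.parquet"],
)

print(sorted(table.files_at()))    # files of the current version
print(sorted(table.files_at(v1)))  # "time travel" back to version 1
```

Because old data files are only marked as removed in the log rather than rewritten in place, a reader asking for version 1 still sees the two original files, while a reader of the latest version sees only the overwritten result.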
Following these changes in recent years, Azure Synapse has been surrounded by complementary products & smaller new announcements, such as Azure Databricks, Azure Synapse Serverless and the Delta integrations in Azure Data Factory. However, there wasn't a holistic answer: no truly clear vision of how Microsoft envisions the future of Analytics. Enter Microsoft Fabric.
What is Microsoft Fabric?
Microsoft Fabric is partly a revolution & partly an evolution.
Microsoft Fabric integrates the best of what was already there incl.:
- Azure Data Factory,
- Azure Synapse Dedicated Pools - now called Synapse Data Warehouse,
- Azure Data Explorer - now called Synapse Real-time Analytics,
- and Azure Machine Learning - now called Synapse Data Science.
Microsoft Fabric as such offers a one-stop shop to tackle analytics, be it batch analytics, real-time analytics, machine learning and/or BI. Fabric’s functionality now spans the full data lifecycle, including data ingest and integration, data engineering, real-time analytics, data warehousing, data science, business intelligence and data monitoring/alerting.
Firstly, what's new is that all these workloads are now data “lake-centric” and thus operate on top of a unified data store called OneLake, which leverages the Delta Lake format. This is a HUGE step as it means Microsoft fully adopts the Delta Lake framework for all Analytics workloads.
With OneLake, Microsoft aims to go even one step further than what's out there today: while organizations could already build their own Delta Lake Lakehouse, OneLake offers a unified storage system accessible to all developers and not only Data Engineers. This includes:
- Enabling Analysts who use Power BI and who can now use the Direct Lake connector in Power BI to directly query OneLake (& thus Delta Tables).
- Enabling BI developers who use SQL (not SparkSQL) & who can write their transformations in Fabric in native SQL on OneLake.
- Enabling Data Engineers who use Python and who can (continue to) use Spark to write their data jobs.
- Enabling Machine Learning developers who use Python or other ML tools.
With OneLake, Microsoft aims to take away the burden for the many people who want to use data but couldn't get or find it in the right format for their skillset.
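The "one copy of the data, several skillsets" idea can be sketched in a few lines. In the example below, an in-memory SQLite database stands in for the shared store. This is purely an assumption for illustration: SQLite is not OneLake, and the `sales` table and its values are made up. The point is only that a SQL user and a Python user work on the same rows and arrive at the same answer without copying data around.

```python
import sqlite3

# SQLite in memory stands in for the shared store; it is NOT OneLake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("BE", 100.0), ("BE", 50.0), ("NL", 70.0)],
)

# The SQL persona: aggregate with a plain SQL statement.
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# The Python persona: read the same rows and aggregate in code.
py_totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    py_totals[region] = py_totals.get(region, 0.0) + amount

# Both personas see the same single copy of the data.
print(sql_totals == py_totals)
```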
2. Direct Lake connection in Power BI
Secondly, the unification of Fabric & Power BI is unprecedented & thus a revolution. Power BI is driving Microsoft Fabric: from both a licensing and an integration perspective, Power BI Premium customers will be in pole position to use Fabric.
One of these revolutions is Direct Lake mode, a new way to connect data to Power BI (next to Import & DirectQuery). When working with data in Microsoft Fabric, a BI engineer doesn’t have to decide whether to import the data into a Power BI model or leave it in OneLake and query it on the fly. With Direct Lake mode, the data in OneLake is already in a format that Power BI can read natively. The need to import data from OneLake is simply eliminated, so reports can reflect new data in near real time and scale with the lake rather than with an imported copy.
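The difference between importing data and reading it straight from the lake can be illustrated with a small toy model. All class names below (`ToyLake`, `ImportModel`, `DirectModel`) are hypothetical and have nothing to do with the real Power BI engine; the real Direct Lake mode loads column data from the Delta tables into memory on demand, which this sketch does not attempt to model. It only shows why an imported copy goes stale until the next refresh, while a model that reads the lake at query time does not.

```python
class ToyLake:
    """Hypothetical stand-in for a lake table; rows live in one place."""

    def __init__(self):
        self.rows = []


class ImportModel:
    """Import mode: copies the data at refresh time and queries the copy."""

    def __init__(self, lake):
        self.lake = lake
        self.copy = []

    def refresh(self):
        self.copy = list(self.lake.rows)

    def total(self):
        return sum(self.copy)


class DirectModel:
    """Reads the lake at query time, so there is no copy to refresh."""

    def __init__(self, lake):
        self.lake = lake

    def total(self):
        return sum(self.lake.rows)


lake = ToyLake()
lake.rows.extend([10, 20])

imported = ImportModel(lake)
imported.refresh()
direct = DirectModel(lake)

# New data lands in the lake after the import refresh has run.
lake.rows.append(5)

print(imported.total())  # still based on the stale copy
print(direct.total())    # already includes the new row
```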
3. A unified cockpit
Microsoft Fabric will be available as a new Azure service and, as such, will offer a new user experience to browse. This new UX will allow users to access data, do data engineering & directly extend to Power BI.
What can we find within Microsoft Fabric?
Fabric has seven core components as we speak:
- Data Factory, based on Azure Data Factory and thus providing the ability to build data transformations and data pipelines.
- Synapse Data Engineering, based on the same technology as the Spark pools in Azure Synapse Analytics. It offers notebooks & thus a code-first data engineering experience. A nice touch: Spark resources in Fabric can now be provided on a serverless basis using “live pools”.
- Synapse Data Science, which provides for training, deployment and management of machine learning models. This component is also based largely on Spark, but incorporates elements of Azure Machine Learning, SQL Server Machine Learning Services and the open source MLflow project, as well.
- Synapse Data Warehousing, originally known as Azure SQL Data Warehouse technology and now “converged” to Lakehouse. Note that to enable this and to connect Fabric to OneLake (a Lakehouse), Microsoft made a huge effort to optimize the SQL engine so that BI users can benefit from Data Warehouse-like performance while actually running 'on a Data Lake' and not a database.
- Synapse Real-Time Analytics, which aims to combine Azure Event Hubs, Azure Stream Analytics and Azure Data Explorer and supports OT analytics, i.e., analytics on IoT, telemetry, log and other streaming data sources.
- Power BI, Microsoft’s flagship business intelligence platform, soon to be enhanced with a new large language model-based Copilot experience that can generate DAX (Data Analysis eXpressions, Power BI’s native query language).
- Data Activator, a component that will allow implementing data monitoring/alerting & will be entirely new technology.
How to get started
Microsoft Fabric is currently available in public preview, and users can sign up for a free trial to explore its capabilities via Power BI (i.e. Microsoft Fabric free trial). The trial offers a fixed capacity for 60 days, enabling users to leverage all features and capabilities, from data integration to machine learning model creation. Existing Power BI Premium customers can seamlessly activate Fabric through the Power BI admin portal, and it will be enabled for all Power BI tenants after July 1, 2023.
At element61, we have had the privilege of an early look at the platform & its capabilities. Feel free to reach out to jointly run through & discuss how Fabric links to and fits in your Modern Data Platform.