How to load data from on-premises sources into Fabric using DataFlow Gen2 🦾

Introduction

Microsoft Fabric is the new Microsoft-branded analytics platform that integrates data warehousing, data integration and orchestration, data engineering, data science, real-time analytics, and business intelligence into a single product. This new one-stop shop for analytics brings a set of innovative features and capabilities, providing a transformative tool for data professionals and business users. With Microsoft Fabric, you can leverage the power of data and AI to transform your organization and create new opportunities.
For more Microsoft Fabric topics 👉 Microsoft Fabric | element61

What is the role of an ETL tool in a Modern Data Platform?

Here at element61, we use Azure Data Factory a lot. It is a complete tool to orchestrate data pipelines and is DevOps-enabled: thanks to its Git integration, we can track changes, test them, and automate the deployment of data pipelines to the next environment.

However, today we’d like to propose a new perspective. With the arrival of Fabric and its ambition to solve all data issues within one single pane of glass – i.e., the Power BI interface – one might easily get the impression that on-premises data is now out of reach. And this is a perfectly understandable feeling: if my whole data platform is now a SaaS running in the cloud, how could anything extract the data stored in a SQL Server running in my data center? In this insight, we will see how combining DataFlow Gen2 and On-Premises Data Gateways lets us get that data without writing a single line of code.


Pre-requisites

For this tutorial, we’ll need three ingredients:

  1. An on-premises data source: typically an Oracle or SQL Server database running in your data center, or anywhere else that is not directly reachable from the cloud. In this case, I’ll install SQL Server on my workstation.
  2. An on-premises data gateway: a piece of software to install on a Windows server in your data center. Here again, for the sake of this demo, my laptop will do just fine. This is the same gateway you have probably already used for Power BI on-premises connections.
  3. A Fabric workspace: make sure your Power BI workspace is attached to a Fabric capacity deployed in your Azure subscription.

To avoid making this article too long, we won’t go through installing and configuring these components, but the links above will guide you through them.

Now, let's get started...

DataFlow Gen2

To create your Dataflow, open your Power BI / Fabric workspace, and select the “Data Factory” persona. Then hit the New button and create a DataFlow Gen2.

[Image: new-dataflow]

You’ll see an interface that looks a lot like the Power Query editor. Now, use Get Data to connect to a SQL Server database.

[Image: get-data]
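For reference, everything you configure in this wizard is stored behind the scenes as a Power Query M query. Below is a minimal sketch of what the generated source step looks like; the server name MYSERVER is a placeholder, not something from this tutorial:

```
let
  // Connect to the on-premises SQL Server through the gateway;
  // "MYSERVER" stands in for your own server name
  Source = Sql.Database("MYSERVER", "AdventureWorks")
in
  Source
```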

Enter the connection details for your data source, and make sure to select your On-Premises Data Gateway in the process.

[Image: data-source]

Hit “Next”, and the next panel will display the list of tables in your database. For our example, let’s select the Sales.SalesTerritory table from the AdventureWorks database. Then hit “Create”.

[Image: select-table]
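Under the hood, picking a table simply appends a navigation step to the generated M query; roughly, continuing the sketch above:

```
let
  Source = Sql.Database("MYSERVER", "AdventureWorks"),
  // Navigate to the Sales.SalesTerritory table chosen in the wizard
  SalesTerritory = Source{[Schema = "Sales", Item = "SalesTerritory"]}[Data]
in
  SalesTerritory
```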

The following screen takes you back to the Power Query interface, where you can fine-tune any filters you want to apply to your data (a sketch of such a step follows below). Importantly, this is also the step where you configure where the data needs to land. Naturally, the destination will be a Fabric Lakehouse: click the + sign in the bottom-right corner and select Lakehouse.

[Image: select-destination]
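As an illustration of such a filter, here is what keeping only the US sales territories would look like in M. The CountryRegionCode column is part of the AdventureWorks Sales.SalesTerritory table, but the filter itself is purely an example and not required for this tutorial:

```
let
  Source = Sql.Database("MYSERVER", "AdventureWorks"),
  SalesTerritory = Source{[Schema = "Sales", Item = "SalesTerritory"]}[Data],
  // Example filter: keep only the rows for US sales territories
  FilteredRows = Table.SelectRows(SalesTerritory, each [CountryRegionCode] = "US")
in
  FilteredRows
```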

If you don’t yet have a Lakehouse in your workspace, this is a good time to create one.

[Image: lakehouse-connection]

Then hit “Next”. Fabric lists the available Lakehouse destinations and lets you decide on the destination table name.

[Image: insight-lakehouse-table]

Finally, the last screen lets you map the source columns and data types to the destination. You can now save the configuration of your DataFlow Gen2 and hit “Publish”.

[Image: lakehouse-table-mapping]
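In M terms, this mapping roughly corresponds to a type-transformation step on the query. A sketch using a few real columns of Sales.SalesTerritory; the exact target types are assumptions for illustration:

```
let
  Source = Sql.Database("MYSERVER", "AdventureWorks"),
  SalesTerritory = Source{[Schema = "Sales", Item = "SalesTerritory"]}[Data],
  // Align source column types with the destination Lakehouse table schema
  ChangedTypes = Table.TransformColumnTypes(SalesTerritory,
    {{"TerritoryID", Int64.Type}, {"Name", type text}, {"SalesYTD", Currency.Type}})
in
  ChangedTypes
```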

This automatically triggers the DataFlow, loads the data into your Lakehouse through the On-Premises Data Gateway, and eventually refreshes the Power BI dataset associated with your Lakehouse, so you’re ready to report on the data.

[Image: insight-dataflow-run]

We can now verify that the Lakehouse contains our new table by opening it and listing its tables.

[Image: insight-dataflow-result]

Conclusion

Thanks to DataFlow Gen2 and the help of On-Premises Data Gateways, we can easily copy data from private data centers to Microsoft Fabric without writing any code. The Power Query interface is familiar to many BI developers, so DataFlow Gen2 will not necessarily require the learning curve of Azure Data Factory, with its configuration of Pipelines, DataSets, Linked Services, Key Vault (for the connection string), variables, ForEach loops, etc. DataFlow Gen2 hides many of these technical aspects behind its advanced features, making it easier for users to focus on their tasks. However, the usual Data Factory experience remains an option and will be preferable for orchestrating many or complex data pipelines.

It is also worth noting that we copied data from a private location without ever allowing any incoming network traffic from the cloud. The On-Premises Data Gateway works by establishing outbound connections to both the data source and Fabric, so there is no need for any firewall configuration or port opening: since the Gateway requires no inbound connections, data transfer stays secure and hassle-free. And of course, the same Gateway can be used for multiple sources.