There were many exciting announcements at Microsoft Ignite 2020. One of them was the Azure Databricks connector for Power BI.
This was a long-awaited feature for many of us.
Before this connector became available, one way to connect to Azure Databricks data was the built-in Spark connector in Power BI. That connector performed very poorly: users repeatedly reported that loading roughly 3-4 GB of data into Power BI could take hours.
Another way to get data from Azure Databricks into Power BI was to use the Power BI REST API to push rows into a Power BI push dataset. This approach also allows streaming data to Power BI.
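As a rough illustration of the push approach, the sketch below builds (but does not send) a request against the Power BI REST API endpoint for adding rows to a table in a push dataset. The dataset ID, table name, row fields and access token are hypothetical placeholders, and obtaining an Azure AD access token is assumed to happen elsewhere.

```python
import json
import urllib.request

# Base URL of the Power BI REST API for datasets in "My workspace".
POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def build_push_rows_request(dataset_id, table_name, rows, access_token):
    """Build a POST request that adds rows to a table of a push dataset."""
    url = f"{POWER_BI_API}/datasets/{dataset_id}/tables/{table_name}/rows"
    # The endpoint expects a JSON object with a "rows" array.
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
example_request = build_push_rows_request(
    dataset_id="<dataset-id>",
    table_name="SalesEvents",
    rows=[{"region": "EMEA", "amount": 120.5}],
    access_token="<access-token>",
)
# To actually send it: urllib.request.urlopen(example_request)
```

In a Databricks job, the rows would typically come from a small, already aggregated result, since this route sends data row by row over HTTP rather than loading it in bulk.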
Azure Databricks Connector in Power BI
To use the Azure Databricks connector in Power BI you need Power BI Desktop version 2.85.681.0 or later. The connector is currently in Public Preview.
You can authenticate to Azure Databricks with either Azure Active Directory or a personal access token created in Databricks.
At the moment, data from Azure Databricks can be loaded only in Import connectivity mode. DirectQuery mode does not work yet in the current Public Preview version of the connector.
For us, the most exciting thing about the release of this connector is the possibility to connect to Delta Lake tables. You can leverage Azure Databricks for all the heavy lifting (data cleaning, transformations and complex calculations) and still bring the data to the reporting layer. All the work done by data scientists and data engineers in Azure Databricks can be made available to business users for reporting.
Refreshing the data from Power BI requires a running cluster, but you do not have to start it yourself: when you click Refresh or a scheduled refresh runs, the cluster in Azure Databricks is started automatically.
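If you want to see whether the cluster is already up before a refresh, one option is the Databricks Clusters REST API. The sketch below is not part of the Power BI connector; it assumes the Clusters API 2.0 `clusters/get` endpoint and a personal access token, and the workspace URL and cluster ID are hypothetical placeholders. It only builds the request and interprets a response body, without sending anything.

```python
import json
import urllib.parse
import urllib.request


def build_cluster_get_request(workspace_url, cluster_id, personal_access_token):
    """Build a GET request that asks the workspace for one cluster's details."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/get?{query}"
    return urllib.request.Request(
        url,
        # Personal access tokens are passed as a bearer token.
        headers={"Authorization": f"Bearer {personal_access_token}"},
        method="GET",
    )


def cluster_is_running(response_body):
    """Return True if the JSON response reports the cluster state as RUNNING."""
    return json.loads(response_body).get("state") == "RUNNING"


# Hypothetical values, for illustration only.
example_request = build_cluster_get_request(
    workspace_url="https://<workspace>.azuredatabricks.net",
    cluster_id="<cluster-id>",
    personal_access_token="<personal-access-token>",
)
# To actually send it: urllib.request.urlopen(example_request).read()
```

A cluster that is not running will make the first refresh slower, because Power BI has to wait for the cluster to start before the query can run.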
What does this connection mean for us?
Delta Lake is becoming the standard for data lakes and is present in many data pipelines. With this integration, in some cases we no longer need a service between Azure Databricks and Power BI; we can report directly on the data in Delta Lake tables from Power BI. Databricks does all the computing, and once the data is crunched and modeled we can easily report on it in Power BI. This way, the data that data scientists and data engineers work on can be consumed by business users as well.