BI on Databricks
With over 12 years of experience in BI and as a certified Databricks partner, element61 believes Databricks has a powerful role to play in a Modern BI Data Platform. On this page we sketch why and how to run BI through Databricks and share tips to get started. Feel free to contact us if you want to know more.
Why run Business Intelligence (BI) with Databricks?
Most traditional BI set-ups rely on traditional ETL technology (e.g. SSIS) to transform data. The Transform phase sometimes requires heavy computation, yet these traditional tools typically run on a single node and cannot distribute the work across a cluster. As data grows in the organization, and both the data per job and the number of jobs increase, ETL run times get longer and/or the platform gets slower.
Parallelization enables organizations to keep growing and to handle both bigger datasets and more data jobs. Using Databricks' managed Spark environment and its built-in support for SQL (!), Python or R, BI teams get a scalable compute platform for running their Transform jobs.
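To make this concrete, here is a minimal sketch of what such a Transform job could look like in a Databricks notebook, mixing SQL and Python. The table and column names (`staging.orders`, `dwh.fact_daily_revenue`, `amount`, ...) are hypothetical and only serve as illustration; `spark` is the Spark session that Databricks notebooks provide.

```python
from typing import Any

# Hypothetical source table and columns, used only for illustration.
DAILY_REVENUE_SQL = """
SELECT order_date,
       customer_id,
       SUM(amount) AS revenue
FROM   staging.orders
GROUP  BY order_date, customer_id
"""


def build_daily_revenue(spark: Any, target_table: str = "dwh.fact_daily_revenue") -> None:
    """Run the aggregation on the cluster and persist the result as a table."""
    # spark.sql returns a distributed DataFrame; Spark spreads the
    # aggregation over the workers of the cluster.
    result = spark.sql(DAILY_REVENUE_SQL)
    # Overwrite the (hypothetical) target table with the fresh result.
    result.write.mode("overwrite").saveAsTable(target_table)
```

In a notebook this would be called as `build_daily_revenue(spark)`. The same query scales from a small to a large cluster without changes to the code.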
Scalable BI with Databricks and Azure
Databricks isn't an ETL tool like SSIS. Rather, it works together with other tools such as Azure Data Factory to jointly offer an end-to-end ETL and ELT solution covering Extract (with Azure Data Factory), Transform (with Databricks) and Load (with Databricks).
In such a BI set-up, multiple architectural options can be considered, including:
- Using Databricks Delta for all ETL and ELT jobs, benefiting from its capabilities to do updates in the Azure Data Warehouse
- Scheduling Databricks jobs through Databricks' own scheduler or through Azure Data Factory's scheduling
- Leveraging Data Flow in Azure Data Factory vs. writing all jobs as pure code in Azure Databricks
"element61 has a Modern Data Platform methodology and typically evaluates the above choices based on a customer's context, long-term platform ambition and tech-affinity. Continue reading to learn more about our approach to BI architecture, or contact us to know more."
- Bart Van Der Vurst, Director Data Science & Strategy
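As an illustration of the first option above (updates through Databricks Delta), the sketch below builds a Delta `MERGE INTO` statement that updates existing rows and inserts new ones. It assumes both tables share the same columns; the table names and key column in the usage note are hypothetical.

```python
from typing import Any, Sequence


def build_merge_sql(target: str, source: str, keys: Sequence[str]) -> str:
    """Build a Delta Lake MERGE statement that upserts `source` into `target`."""
    if not keys:
        raise ValueError("at least one key column is required")
    # Rows match when all key columns are equal.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert(spark: Any, target: str, source: str, keys: Sequence[str]) -> None:
    """Execute the merge on a Databricks cluster; `target` must be a Delta table."""
    spark.sql(build_merge_sql(target, source, keys))
```

For example, `upsert(spark, "dwh.dim_customer", "staging.customer_updates", ["customer_id"])` would apply a batch of customer changes in one statement. A real slowly changing dimension would need additional logic (e.g. validity dates), which is left out here.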
Databricks integration with Azure Synapse Analytics
Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) is a cloud-based data warehouse and a key component in a big data architecture, mainly because of the performance that can be achieved with PolyBase and massively parallel processing (MPP).
Azure Databricks integrates seamlessly with Azure Synapse Analytics, which makes the combination a strong foundation for your BI solution. The connection to the data warehouse goes through the SQL Data Warehouse connector (SQL DW), which uses Azure Blob Storage as a staging area and PolyBase on the data warehouse side for efficient data transfer. Both Databricks and Azure Synapse Analytics access the Blob Storage to write and read the data.
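A minimal sketch of such a load from a Databricks notebook is shown below. It uses the connector's data source name `com.databricks.spark.sqldw` and its `url`, `tempDir`, `dbTable` and `forwardSparkAzureStorageCredentials` options. The JDBC URL, staging directory and table name are placeholders you would replace with your own, and the storage account key is assumed to be configured on the cluster already.

```python
from typing import Any, Dict

# Data source name of the Databricks connector for Azure Synapse / SQL DW.
SYNAPSE_FORMAT = "com.databricks.spark.sqldw"


def synapse_options(jdbc_url: str, temp_dir: str, table: str) -> Dict[str, str]:
    """Collect the connector options for one target table."""
    return {
        "url": jdbc_url,        # JDBC connection string of the data warehouse
        "tempDir": temp_dir,    # Blob Storage folder used as staging area
        "dbTable": table,       # target table in the data warehouse
        # Let the warehouse reuse the storage credentials set in Spark.
        "forwardSparkAzureStorageCredentials": "true",
    }


def write_to_synapse(df: Any, options: Dict[str, str], mode: str = "append") -> None:
    """Stage the DataFrame in Blob Storage and load it into the warehouse."""
    df.write.format(SYNAPSE_FORMAT).options(**options).mode(mode).save()
```

Usage would look like `write_to_synapse(df, synapse_options("<jdbc-url>", "wasbs://<container>@<account>.blob.core.windows.net/<folder>", "dbo.fact_sales"))`, where `dbo.fact_sales` is a hypothetical table name.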
Databricks integration with Power BI
Power BI is a business analytics service that provides interactive visualizations and gives users the freedom to create reports themselves. Besides data scientists and data engineers, business users can also benefit from Azure Databricks: they can connect to Azure Databricks with Power BI and create business insights based on the data available there.
There are two ways you can connect to Power BI directly from Databricks.
- Using the built-in Spark connector in Power BI, you can connect to a table in Databricks
- In a real-time data scenario, you could make use of the Power BI REST API and stream data to a “push dataset” in Power BI
element61 recommends the second option, as it has proven to be faster. Contact us if you are interested in such a set-up.
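As a rough sketch of the second option, the code below posts a batch of rows to a push dataset through the Power BI REST API (`POST .../datasets/{datasetId}/tables/{tableName}/rows`). It assumes the push dataset and its table already exist and that you have obtained an Azure AD access token; the dataset id, table name and row fields are placeholders. This is an illustration, not element61's production set-up.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_push_request(
    dataset_id: str,
    table_name: str,
    rows: List[Dict[str, Any]],
    access_token: str,
) -> urllib.request.Request:
    """Prepare the HTTP request that adds `rows` to a push dataset table."""
    url = f"{API_ROOT}/datasets/{dataset_id}/tables/{table_name}/rows"
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


def push_rows(
    dataset_id: str,
    table_name: str,
    rows: List[Dict[str, Any]],
    access_token: str,
) -> int:
    """Send the rows to Power BI and return the HTTP status code."""
    request = build_push_request(dataset_id, table_name, rows, access_token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

From Databricks, each micro-batch of a streaming job could be collected into a list of dictionaries and passed to `push_rows`. Keep the batches small, since the API limits the number of rows per request.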
Interested? Read more!
Interested in using Databricks for BI? Take a look at the follow-up reads below:
- Databricks Delta = learn about Delta, a core enabler for slowly changing dimensions, updates and deletes through Databricks
- Databricks on Azure (Azure Databricks) = read about the ins and outs, benefits & integrations of Databricks & Azure
- Machine Learning on Databricks = learn how Databricks allows you to do a lot more than BI
element61 has extensive experience with Databricks. Contact us for coaching, training, advice and implementation.