Many organizations have recognized the strategic potential of their data assets and have started to centralize their data in a data lake. However, many of them struggle to access the data in that lake with easy-to-use, standard tools and languages. Wouldn’t it be great to simply have SQL access to the data that resides in your Azure Data Lake?
At the end of 2020, Microsoft released Azure Synapse Serverless, a revolutionary addition to the Azure Synapse Analytics product.
What is Azure Synapse Serverless?
Azure Synapse serverless lets you run a SQL query directly on data that resides in your data lake, without spinning up a dedicated cluster. You can query data stored in Delta, Parquet or CSV format, among others.
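To make this concrete: serverless queries typically read lake files through the T-SQL OPENROWSET function. The sketch below is a small Python helper that assembles such a query for the three formats named above. The helper name, the storage URL and the `[result]` alias are hypothetical and only serve as illustration; the CSV options assume the files have a header row.

```python
# Formats named in the text, mapped to the FORMAT value passed to OPENROWSET.
FORMATS = {"parquet": "PARQUET", "delta": "DELTA", "csv": "CSV"}


def build_openrowset_query(path: str, fmt: str, top: int = 10) -> str:
    """Return a T-SQL query that reads the first `top` rows from `path`.

    Hypothetical helper for illustration; it only builds the query text.
    """
    key = fmt.lower()
    if key not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Double any single quotes so the path stays a valid T-SQL string literal.
    safe_path = path.replace("'", "''")
    options = [f"BULK '{safe_path}'", f"FORMAT = '{FORMATS[key]}'"]
    if key == "csv":
        # HEADER_ROW requires parser version 2.0; assumes a header line exists.
        options += ["PARSER_VERSION = '2.0'", "HEADER_ROW = TRUE"]
    body = ",\n    ".join(options)
    return f"SELECT TOP {int(top)} *\nFROM OPENROWSET(\n    {body}\n) AS [result]"


if __name__ == "__main__":
    # Placeholder storage account and folder, not a real location.
    print(build_openrowset_query(
        "https://example.dfs.core.windows.net/lake/sensors/*.parquet", "parquet"
    ))
```

The resulting text can be pasted into any SQL client connected to the serverless endpoint; no table has to be created or loaded first.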
Azure Synapse serverless is truly serverless: Microsoft takes care of the necessary scaling and access. As always, we need to consider the cost before setting up new solutions. At a cost of only five dollars per terabyte of data queried, setting up Azure Synapse serverless is almost a no-brainer, especially if you compare it with keeping your data warehouse running 24/7.
At element61, we’re always looking for opportunities to try out new features and see if they can deliver on the promises we see in the marketing slide decks. What better way to test these new features than together with our customers? We recently ran a number of hackathons with customers on this new feature. Our conclusion was that Azure Synapse serverless brings true added value to the table, even for organizations that already have a data warehouse setup.
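The cost comparison comes down to simple arithmetic. Below is a minimal sketch, assuming the rate of five dollars per terabyte quoted above. The fixed monthly warehouse cost is a hypothetical input you would fill in yourself, not a real price.

```python
PRICE_PER_TB_USD = 5.0  # rate quoted in the text


def serverless_cost_usd(tb_queried: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Cost of querying `tb_queried` terabytes at a flat per-terabyte rate."""
    if tb_queried < 0:
        raise ValueError("tb_queried must be non-negative")
    return tb_queried * price_per_tb


def break_even_tb(fixed_monthly_cost_usd: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Terabytes you could query per month before matching a fixed monthly cost."""
    return fixed_monthly_cost_usd / price_per_tb


if __name__ == "__main__":
    # Hypothetical figures: 2 TB queried per month, and a 1000 USD/month
    # always-on warehouse used purely as an example input.
    print(serverless_cost_usd(2))
    print(break_even_tb(1000))
```

As long as the monthly query volume stays well below the break-even figure, the pay-per-query model is the cheaper option.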
Concrete use cases
We want to take a closer look at two valuable use cases that we tackled during these hackathons.
- A first valuable use case for Azure Synapse serverless arises when you have (business) analysts in your organization who are very skilled in SQL but might not know Spark or Python. Typically, such an analyst would only be able to analyse data from the data warehouse and couldn’t perform ad-hoc analyses on data residing in the data lake. Think about detailed sensor data that a process engineer wants to get from a specific machine or production line. With Synapse serverless, he or she has an easy way to use SQL to fetch the raw sensor data and run analyses on it.
- A second interesting use case is the combination of Azure Synapse serverless with Power BI. You might be struggling to keep your data model small and fast while at the same time giving reporting users all the detail they require. The usual compromise is to provide the end user with only a limited level of detail to ensure speed in Power BI. With Synapse Serverless we can now ‘direct query’ the detailed data ‘on-demand’ and combine it, in a hybrid model, with the aggregated data from your data warehouse. For example, think about raw sales transaction records which might not be modelled in the data model of your data warehouse but are stored in your data lake. In Power BI the reporting would start high-level: you analyse the aggregated sales per month and per day, and step by step you drill down until you are looking at the details of individual transactions. At that moment, Synapse Serverless takes over, queries the required detailed data on demand from the data lake and brings it to Power BI through its Direct Query capabilities.
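One way to expose lake data for this kind of drill-down is to wrap the files in a SQL view on the serverless endpoint, so that Power BI can treat it like a table. The sketch below builds such a view definition and the kind of filtered query a drill-down to a single day would amount to. The view name, the date column and the storage URL are hypothetical, the files are assumed to be Parquet, and in practice Power BI generates its own queries.

```python
from datetime import date


def create_view_sql(view_name: str, lake_path: str) -> str:
    """Build a CREATE VIEW statement over Parquet files in the lake (assumed format)."""
    safe_path = lake_path.replace("'", "''")  # keep the path a valid string literal
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT *\n"
        f"FROM OPENROWSET(BULK '{safe_path}', FORMAT = 'PARQUET') AS [result]"
    )


def drill_down_sql(view_name: str, date_column: str, day: date) -> str:
    """Build the detail query for one day, as a drill-down would request it."""
    return f"SELECT * FROM {view_name} WHERE {date_column} = '{day.isoformat()}'"


if __name__ == "__main__":
    # Hypothetical names: dbo.sales_transactions, sale_date, example storage URL.
    print(create_view_sql(
        "dbo.sales_transactions",
        "https://example.dfs.core.windows.net/lake/sales/*.parquet",
    ))
    print(drill_down_sql("dbo.sales_transactions", "sale_date", date(2021, 3, 1)))
```

The aggregated tables stay in the data warehouse; only the detail view points at the lake, so data is scanned (and billed) only when a user actually drills down.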