Databricks Quarterly Pulse - Sept 2025

The world is changing quickly, and so is Databricks. It can be hard to stay on top of all the latest and greatest, so we will try to do it for you! This is the first installment of our Databricks Quarterly Pulse. Every quarter, we will share our view on the readiness of selected Databricks features, highlight some of the best practices we are implementing, and shine a light on new, useful functionality you can start using right away.

Feature readiness opinion

As is the case for all technology vendors (see Fabric, Databricks & Snowflake), announcing a feature or offering does not mean that it is already available, and even if it is available, it does not mean that you can just blindly use it for production use cases. We try to be critical of the features, and we divide them into levels of Feature Readiness as explained in the table below.

Feature Readiness     Description
Do Not Use Yet  Experimental or unstable; not recommended for any use.
Internal Pilot  Safe to test in isolated environments; not production-ready.
Early Adoption Used in limited production scenarios; feedback and iteration ongoing.
Validated Use Proven in multiple setups; supported by internal champions and documentation.
Trusted Use Fully endorsed; widely adopted across teams with best practices established.

So without further ado, here are our reflections as of right now.

Technology | Feature Readiness | Status | What caught our eye?
Delta Live Tables | Trusted Use | GA | Addition of Sinks and Flows, easier UC integration, faster validation/development
Databricks Apps | Validated Use | GA | Lakebase integration already possible, so they have a clear focus on that
Lakebase | Internal Pilot | Public Preview | Very limited documentation; activating the preview does not always work
LakeFlow Connect | Early Adoption | GA | SQL Server & SharePoint connectors now available in Preview
LakeFlow Designer | Do Not Use Yet | Beta | Nothing to test yet
Databricks AI/BI | Validated Use | GA | Global Filters, Dashboard Themes & Drill-Through functionality (still catching up to the most common BI tool functionalities, though)
Serverless Compute (Notebooks & DLT) | Validated Use | GA | Issues where certain flows were a lot more expensive than via classic mode seem to be fixed. A Performance Optimized tier was added, which optimizes for speed but is more expensive.
Databricks Asset Bundles | Validated Use | GA | Incorporating them in CI/CD has become less of a hassle, but testing and validating is still not ideal
Databricks Workflows | Trusted Use | GA | Developing these workflows is still not great, even with Databricks Asset Bundles, so we hope LakeFlow Designer will help here

Best practice highlights

Which SQL Warehouse mode to choose? Not the Pro one, most likely.

We have three different types of SQL Warehouses: Classic, Pro & Serverless. Here is a small comparison table:

Feature | Serverless | Pro | Classic
Monthly running costs (2X-Small) | 2.500 | 3.000 | 1.600
Startup Time | Couple of seconds | Around 5 minutes | Around 5 minutes
Minimum Auto Termination (minutes) | 1 (using API) or 5 (using UI) | 10 | 10
Functionalities | All | All except Intelligent Workload Management | No Predictive I/O, no materialized views, no query federation, no geospatial features, no Genie support, and so on...

We would really recommend defaulting to Serverless for most of your SQL workloads. Its monthly running cost is higher than Classic's, but the combination of the near-instant startup time and the low auto-termination threshold means you will probably be better off anyway. The other two are only worth considering in some specific cases:

  • Classic SQL Warehouse: if you do not need any of the extra functionalities, your workloads are very predictable and run constantly for hours, and you do not need the warehouse to spin up at a moment's notice
  • Pro SQL Warehouse: it is actually a pretty bad deal, and the only way we would consider it is if you really need the newer functionalities but organizationally cannot commit to a Serverless setup (due to networking or security constraints)

Main takeaway: if you are using Pro, definitely reconsider if it is necessary.
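
As a point of reference, here is a minimal sketch of what that Serverless default could look like when provisioned programmatically with the Databricks Python SDK. The warehouse name and sizing are placeholders, and the parameter and enum names reflect our reading of the current SDK, so double-check them against your version.

```python
# Minimal sketch: create a Serverless SQL Warehouse with a low auto-stop threshold.
# Requires the databricks-sdk package and workspace authentication to be configured.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import CreateWarehouseRequestWarehouseType

w = WorkspaceClient()

warehouse = w.warehouses.create(
    name="analytics-serverless",          # placeholder name
    cluster_size="2X-Small",
    min_num_clusters=1,
    max_num_clusters=1,
    auto_stop_mins=5,                     # via the API you can go as low as 1
    enable_serverless_compute=True,       # Serverless builds on the PRO warehouse type
    warehouse_type=CreateWarehouseRequestWarehouseType.PRO,
).result()                                # wait until the warehouse is running

print(warehouse.id)
```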

Managed tables taking over from external tables

We used to be all for External Tables, because Managed Tables used to come with a degree of lock-in: we could not see the files or choose where they were stored. That is no longer the case. You can now choose where Managed Tables are stored, and on top of that, Databricks is really focusing on adding extra features to Managed Tables. We are at the inflection point where Managed Tables are becoming the preferred solution. The reasons?

  • Reduced storage and compute costs, if you were not handling all aspects in an optimal way already
  • Automatic table maintenance and optimization which leads to faster query performance across all client types 
  • Automatic upgrades to the latest platform features
  • Some nicer new features are only available on Managed Tables, such as Auto Liquid Clustering, which automatically selects new clustering columns based on the queries hitting your tables (see the sketch right below)
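
To illustrate that last point, turning on Automatic Liquid Clustering is a one-liner on a managed table. This is a minimal sketch: the table names are hypothetical, spark is the SparkSession that Databricks notebooks provide by default, and as far as we know the feature relies on Predictive Optimization being enabled.

```python
# Sketch: enable Automatic Liquid Clustering on an existing managed table.
spark.sql("ALTER TABLE main.sales.orders CLUSTER BY AUTO")

# For new managed tables, the same can be done at creation time.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders_new (
        order_id    BIGINT,
        customer_id BIGINT,
        order_ts    TIMESTAMP
    )
    CLUSTER BY AUTO
""")
```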

What's more, you can now easily switch from an External Table to a Managed Table in Databricks.

[Image: switching from an External Table to a Managed Table in Databricks]
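
In practice the conversion boils down to a single ALTER TABLE statement. A minimal sketch, with a hypothetical table name and spark being the notebook's SparkSession:

```python
# Sketch: convert an existing external table into a managed table in Unity Catalog.
# Requires Databricks Runtime 17.0+ (see the caveats below); the table name is an example.
spark.sql("ALTER TABLE main.sales.orders SET MANAGED")
```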

However, when using this command, watch out for the following:

  • it is only available in Databricks Runtime 17.0 or above
  • it can cause 1-2 minutes of downtime for readers and writers of that Unity Catalog table
  • for bigger tables (>1 TB), use a Medium DBSQL warehouse to keep the downtime to 1-2 minutes
  • you need a migration procedure if you have scripts using your external path, or if you are streaming from that table

So, should we just switch from External to Managed? One extra thing to take into account is that the location where you store the managed tables matters a lot: you basically want them in the same VNet as your Databricks Workspace, otherwise you will pay a ton of peering costs (see Virtual Network Pricing | Microsoft Azure). However, the managed table location (storage root location) of an existing catalog or schema cannot be changed (and is generally left empty, which then defaults to the metastore storage account), so you might need to create a new schema or catalog, which turns this into a sizeable migration project. Next to that, you should generally also stick to external tables if you have a reader or writer that does not support Unity Catalog and can therefore only access tables via their storage path.
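
If you do go down that road, remember the managed location has to be set when the catalog or schema is created. A minimal sketch, where the schema name and abfss path are placeholders and the path must already be covered by an external location you are allowed to use:

```python
# Sketch: create a new schema whose managed tables land on a storage account that sits
# next to the workspace, instead of defaulting to the metastore storage account.
spark.sql("""
    CREATE SCHEMA IF NOT EXISTS main.sales_managed
    MANAGED LOCATION 'abfss://managed@mystorageaccount.dfs.core.windows.net/sales'
""")
```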

To conclude, the SET MANAGED command helps, but as it will still be a sizeable migration, it really depends on how important the reasons above (predictive optimization, etc.) are for your organization. For clients starting fresh, we recommend managed tables; for clients that already have external tables and do not have a storage root location set on their catalogs or schemas, we are leaning towards staying with external for now (the migration effort might be reduced in the future).

Note: for Volumes, we still recommend keeping them external at all times. All of the benefits mentioned above are exclusive to Tables, so there is no benefit compared to External Volumes, and External Volumes give us more flexibility and ownership (not every group of files in the storage account needs to be a Volume per se).
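
For reference, registering such an External Volume is a one-liner as well. The names and abfss path below are placeholders, and the path must be covered by an external location:

```python
# Sketch: expose an existing folder on the storage account as an External Volume,
# without Databricks taking ownership of the underlying files.
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS main.landing.raw_files
    LOCATION 'abfss://landing@mystorageaccount.dfs.core.windows.net/raw'
""")
```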

Terraform vs Databricks Asset Bundles: better together

Let's start with a disclaimer: we love Terraform! It has rarely let us down in the past and is still a best-in-class solution for managing MOST of your Databricks resources. Heck, Databricks Asset Bundles even uses Terraform underneath. For most people developing on Databricks, though, Terraform has generally been a necessary evil they would rather not interact with too much. There is some setup needed beforehand to manage the state, which complicates things. Moreover, for things such as Pipelines, Terraform really feels like overkill, because you need to manually sync your changes back to your code base. Terraform also does not offer an easy way to queue pipeline execution logic at the end of your deployment. Enter Databricks Asset Bundles, which alleviate these concerns.

The ideal solution seems to be using both Terraform and Databricks Asset Bundles: Terraform for managing your "infrastructure" in Databricks (think SQL Warehouses, Compute Clusters, Permissions, major Unity Catalog objects, account-level settings, etc.), and Databricks Asset Bundles for Notebooks, Pipelines and Dashboards, where they strike the right balance between ease of use and operational control. The downside is that you are then managing two different tools, which for some teams can be a bit rough as well.
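
To make that hand-off concrete, here is a minimal sketch of a deployment script that glues the two together: Terraform owns the SQL Warehouse, and the bundle receives its ID through a bundle variable. The Terraform output name, folder layout and variable name are hypothetical, and we are assuming the bundle CLI's --target and --var flags for overriding bundle variables, so verify against your CLI version.

```python
# Sketch: feed a Terraform-managed resource ID into a Databricks Asset Bundle deployment.
import json
import subprocess

# Read the Terraform outputs as JSON, e.g. {"warehouse_id": {"value": "abc123", ...}}.
tf_outputs = json.loads(
    subprocess.check_output(["terraform", "output", "-json"], cwd="infra")
)
warehouse_id = tf_outputs["warehouse_id"]["value"]

# Deploy the bundle to the target environment, overriding the bundle variable.
subprocess.run(
    [
        "databricks", "bundle", "deploy",
        "--target", "prod",
        f"--var=warehouse_id={warehouse_id}",
    ],
    cwd="bundle",   # run from the folder containing databricks.yml
    check=True,
)
```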

So, our take:

  • If you have not used Terraform yet and you are fine with manually managing a couple of resource types (like groups, permissions, etc.), Databricks Asset Bundles will be a great option. They take some getting used to at first, but will prove to be quite handy.
  • If you are already managing everything with Terraform, there are a couple of benefits to switching to Databricks Asset Bundles for certain resources, such as easier job execution and a more logical way of managing Databricks Workflows/Pipelines across multiple environments, but the addition of an extra tool does warrant some pushback.

One final remark: when it comes to things such as Databricks Workflows and Dashboards, which are most easily created in the UI, we feel a more immersive Git integration would be really beneficial, like the one you have in Azure Data Factory. Hopefully, with LakeFlow Designer (still in Beta), Databricks is evolving towards such an approach, and if that is the case, we are buying in 100%.

What is new in Databricks that you can use tomorrow

Schedule Databricks Jobs/Workflows from ADF

Finally, scheduling Databricks Workflows from Azure Data Factory! This used to be quite a pain, requiring multiple REST API calls, but it is now available out of the box. In current solutions at clients, it can make sense to combine a lot of your Databricks Notebooks into a single Databricks Workflow, which allows you to optimize your costs, and then schedule that Workflow from ADF!

However, you still might need a REST API call after all, because at the moment you can only provide an ID instead of a name for the Workflow, and IDs will definitely differ between environments. You can also fix this by making some changes to the dynamic template parameter definitions in ADF. We waited all this time for this functionality to drop, and unfortunately it is still not perfect.
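
Until that is fixed, a small lookup (or a Web activity doing the equivalent REST call) can resolve the ID per environment. A minimal sketch using the Databricks Python SDK, assuming the Workflow name is unique within each workspace and "daily_ingest" is a hypothetical name:

```python
# Sketch: resolve a Databricks Job/Workflow ID from its name, so a caller that only
# accepts an ID (like the ADF activity) can be fed the right value per environment.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

job_name = "daily_ingest"  # hypothetical Workflow name, identical across environments
matches = list(w.jobs.list(name=job_name))

if len(matches) != 1:
    raise ValueError(f"Expected exactly one job named '{job_name}', found {len(matches)}")

print(matches[0].job_id)  # the ID to hand over to ADF
```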

Group Compute Clusters

With the introduction of Unity Catalog, there were two types of clusters: Shared and Dedicated. A Shared Cluster allowed you to share the compute with many users, but came with a ton of limitations, like not being able to use RDD functions, not being able to use all dbutils methods, etc.


This made it quite a pain to migrate your flows to Shared Compute, which meant some people still opted for a Dedicated Compute Cluster. That one did not have any limitations, but you could only assign it to a single user, so it was costly to run in production because many clusters were active at the same time. Now, however, you can assign a group instead of a user, giving you the best of both worlds.
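
A minimal sketch of what that looks like through the Databricks Python SDK: as far as we can tell, you keep the dedicated (single-user) access mode but put a group name in single_user_name. The group name, runtime version and node type below are placeholders, so verify the exact values and field names against your workspace and SDK version.

```python
# Sketch: create a dedicated cluster assigned to a group instead of a single user.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()

cluster = w.clusters.create(
    cluster_name="data-engineering-shared",
    spark_version="15.4.x-scala2.12",                  # example LTS runtime
    node_type_id="Standard_D4ds_v5",                   # example Azure node type
    num_workers=2,
    autotermination_minutes=30,
    data_security_mode=DataSecurityMode.SINGLE_USER,   # the "Dedicated" access mode
    single_user_name="data-engineers",                 # a group name instead of a user
).result()                                             # wait until the cluster is running

print(cluster.cluster_id)
```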


Are there any downsides? 

  • You need to give the Group access to everything you want the cluster to reach, so granting access is a bit weirder than before
  • You might still need multiple clusters for multiple groups
  • If you do not switch your code to work on Shared Compute, you will not be able to seamlessly switch to Serverless Notebook Compute later on.

But in any case, it can be a huge advantage in Unity Catalog migrations to use this type of cluster first, instead of having to transform all of your code!
