It has become a yearly tradition for element61 to be present at the Data & AI Summit of Databricks, and this year was no different. This year, we were represented by Raphael Voortman and Reiner De Smet, who soaked up all the knowledge the summit had to offer. As for me, I also make it a yearly tradition to take a bit more time, and a bit more text, to bring you, our customers and followers, our reflection on what Databricks is trying to achieve.
You can check out the articles from past years here:
- 2024: Databricks Expands: From Data Engineering to Gen AI & Ingestion
- 2023: Goodbye Modern Data Platform, Hello Data Intelligence Platform? A Data & AI World Tour recap
- 2022: Our learnings of the Data & AI Summit 2022 in San Francisco
The summit keeps on growing (and so does Databricks)
The event was as energising as always, and although it continues some of the trends we have seen in past years, they always seem to pull a couple of rabbits out of their hats. We had 5.000 attendees in 2022, 16.000 attendees in 2024, and now in 2025, there were close to 22.000 attendees. At this rate, in a couple of years they will probably need to build their own event venue to keep hosting all interested parties.
This growth curve seems quite steep, but their revenue growth curve is even steeper, as they are growing at 50% year over year. Next month, they will be at an annualised revenue of 3.7 billion dollars, a pretty crazy number! Last December, they raised a whopping 10 billion dollars, which values them at 62 billion dollars. And Ali Ghodsi, the Databricks CEO, said they were going to hire around 3.000 people in 2025 to strengthen their current workforce of 8.000 employees. The only area where they seem to have regressed is the epicness of the main keynote guest, but to be honest, I would not know who could have followed in the footsteps of Jensen Huang, the Nvidia CEO, who left quite the impression during last year's summit.
Aggressively buying their way into more parts of the Data Ecospace
I could just keep the same title I had here last year, as Databricks continues to be aggressively focused on trying to conquer any and all spaces in a data platform. Last year, their focus was mostly on entering the ingestion space with the introduction of LakeFlow Connect and getting into the Dashboarding space by introducing Databricks AI/BI. This year, their focus has shifted to Apps, Databases and Agentic AI.
Now, to be fair, LakeFlow Connect has been somewhat underwhelming, with only a limited number of connectors available so far, so announcing entry into a certain data space is not the same as conquering it. This is true for many vendors (see Fabric and Snowflake, for example), as an announced feature might take months or even years to become fully usable. Still, Databricks remains incredibly aggressive.
They are mostly buying their way into many of these spaces, having acquired numerous companies over the years, such as Arcion for ingestion capabilities and Tabular for standardised file formats. This year, they acquired Neon, which powers their new Lakebase offering, and BladeBridge, which underpins Lakebridge, a tool designed to simplify migrations from other data platforms to Databricks. The prices they are paying are quite hefty (a whopping 1 billion dollars for Neon, for example), but I think it does make sense if you want to quickly conquer a fairly new area, as the acquisitions bring not only technology but also the talent who can keep pushing it in the right direction. One concern here could be that the companies and people that got acquired might lose motivation after a big payout, but I'm sure Databricks is doing everything it can to keep them engaged.

UFC Fight 317: Databricks vs Snowflake
It is hard to talk about the one without mentioning the other, so I wanted to share my opinion on the Snowflake-Databricks war. There does not seem to be a lot of love lost between Databricks and Snowflake. Every year, they host their respective summits one week apart, and it is before, during, and after these summits that the mudslinging between the two seems to intensify. So many LinkedIn posts, some claiming that Snowflake is x% cheaper, some claiming Databricks is x% cheaper, on the same functionality. These investigations and tests, unfortunately, are often quite biased, as most of the time they are pushed by people closely aligned with either Snowflake or Databricks, and they tend to make exaggerated statements to grab attention: tweaking cluster settings, picking a specific dataset that suits one of the platforms better, and so on. We are living in a world where it is increasingly difficult to determine what is real and what is fake (with all the fake news, deepfakes and generated, uncurated content), and it is a shame that you cannot fully trust what is being posted on LinkedIn, Medium and Reddit.
A grim story, but what do we know?
- Unless we get some really independent research done (or we lock one prominent advocate of each of them in one room together with an arbiter), it is good to be critical of statements that are being made.
- The truth, most of the time, probably lies somewhere in the middle, which means that performance on most fronts is probably not that far apart. Snowflake is probably still a bit better at classical warehousing; Databricks is better at streaming, Gen AI, ML, etc.
- Competition should have a positive effect on cost, performance and quality; they will keep pushing each other to get better.
- They both seem to want to be a fully fledged data platform, so likely even the features and capabilities will mimic each other over time (both added an ingestion component recently). Snowflake faces a more uphill climb to incorporate all the AI & ML capabilities, while Databricks might be a bit slower in getting business users fully integrated into the platform. That could be a reason to choose one way or the other.
- A stark difference between the two is that the focus on Open Source and on interoperability of data formats is a lot higher at Databricks. For managed tables, you can now switch to the Iceberg format, which opens the data up to a lot of different clients for sharing and consumption, even Snowflake (see the sketch after this list).
- Both of them announced features at the summit that might take a while to be stable enough for production use.
- In x months/years, Fabric might also legitimately enter the discussion, which could make everything a bit murkier. Fortunately, the Fabric pricing model is so different that doing comparison tests with the other two is quite hard, so I would expect fewer LinkedIn wars with Fabric included.
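As a footnote to that Iceberg point: the managed-table Iceberg support announced at the summit builds on what Delta UniForm already does today. Below is a minimal sketch of the existing UniForm route, assuming hypothetical catalog/schema/table names and a Databricks notebook where `spark` is available; the syntax for the newly announced managed Iceberg tables may differ.

```python
# Expose an existing Delta table in Iceberg format as well (Delta UniForm),
# so Iceberg-speaking clients (potentially even Snowflake) can read it.
# The table name is hypothetical.
spark.sql("""
    ALTER TABLE main.sales.orders SET TBLPROPERTIES (
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```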
So what got announced at the Databricks Summit
Let's get excited again, as there is a lot of new stuff, as per usual. Most of it can be grouped into two main areas:
- Democratizing Data & Shifting Focus to Business Users
- Elevating Apps and Agents to the next level
Democratizing Data & Shifting Focus to Business Users
Databricks is traditionally a platform that is really well-liked by Data Engineers and other technical people. There is a ton of flexibility, whether it is writing notebooks in Python, Scala or SQL, or tweaking your cluster parameters to make them perfect for a specific job. Because of these many capabilities and possibilities, onboarding business users can sometimes be a bit overwhelming. Last year, they already started this shift towards business users by adding Serverless for Notebooks (no more tweaking of parameters) and by releasing Databricks AI/BI (to get proper dashboarding on their platform), and this year, they doubled down.
Databricks One
This is a big one. They announced Databricks One, which is basically a separate portal for Business Users, where they do not see all the options an Engineer would need, but only have access to a couple of specific parts, like Dashboarding and Apps.

This is a pretty great addition to the platform because, before, if you wanted to give a business user (or maybe your CEO) access to some data, you would be more likely to forward them to a Power BI report, but now Databricks can become the main entry point. The fact that you can have your dashboards, your apps and your data catalog in one location, with everything intertwined and context-aware at the source, and that Databricks sprinkles a lot of intelligent features on top, could cause a real shockwave in the industry. This is not even in Private Preview yet, so it will take some time before it is actually released, but I think it is a great feature.
Unity Catalog Additions
I love Unity Catalog from a security and engineering perspective. Having one place where all your permissions can be set for a broad range of objects (Tables, Volumes, ML Models, etc.) is just very powerful. But now the Data Governance people can rejoice as well, as Databricks is working on making it more like a classical Data Catalog. An exciting announcement there is Unity Catalog Metrics, available in Public Preview right now.
UC Metrics allows you to define metrics once and have them served consistently all over your Databricks platform (Apps, Dashboarding, Notebooks, Genie) and, in the future, also in other BI and monitoring solutions such as Tableau, Collibra and ThoughtSpot. Power BI was not listed as one of the future integrations, which seems like a weird omission. Maybe the Fabric-Databricks rivalry is stopping that?
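To make that concrete, here is a minimal sketch of defining and querying such a metric, assuming the metric-view syntax from the preview; the catalog, schema and column names are hypothetical, and the exact YAML schema may still evolve.

```python
# Define a metric view once in Unity Catalog (hypothetical names).
spark.sql("""
CREATE VIEW main.gold.sales_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: main.gold.orders
dimensions:
  - name: order_month
    expr: DATE_TRUNC('MONTH', order_date)
measures:
  - name: total_revenue
    expr: SUM(amount)
$$
""")

# Any consumer (dashboard, notebook, Genie) can then ask for the same measure
# and get a consistent answer.
display(spark.sql("""
    SELECT order_month, MEASURE(total_revenue) AS total_revenue
    FROM main.gold.sales_metrics
    GROUP BY order_month
"""))
```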
Next to UC Metrics, they will also introduce UC Discover, which includes:
- Domains (as you could already see in the Databricks One visual above), allowing you to organise data by business area so discovery aligns with how the organisation operates
- Request-for-access workflows for easy delivery to consumers
- A curated internal marketplace with certification and metadata
Unity Catalog is trying to become a proper Data Catalog!
LakeFlow Designer
Although most Engineers love Databricks for all of its coding possibilities, Databricks realised that low-code/no-code is not going away anytime soon, and they decided to jump on that ship as well (rivalling especially Azure Data Factory/Fabric Pipelines). They announced LakeFlow Designer, a visual no-code pipeline builder with the classical drag-and-drop functionalities, augmented with the NLP/Gen AI support that is present across the whole Databricks ecosystem. Business Analysts can create these pipelines, while Data Engineers can still see the underlying code, as everything gets mapped to ANSI SQL, letting them review it in a format that is comfortable for them. The GIF below shows some of the capabilities. A nice functionality is visible at the bottom, where you get a preview of your output, giving immediate feedback. Again, no Private Preview yet, so it could be multiple quarters before there is a Public Preview.

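Designer itself is drag-and-drop, but it sits on top of Lakeflow Declarative Pipelines (the renamed Delta Live Tables, see further below). For reference, this is roughly what an equivalent hand-written pipeline looks like today in the existing Python API; the source path and table names are hypothetical, and Designer's generated ANSI SQL will of course look different.

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical landing path; `spark` is provided by the pipeline runtime.
@dlt.table(comment="Raw orders ingested incrementally from cloud storage")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/sales/orders_landing")
    )

# A simple transformation step with a data-quality expectation attached:
# rows failing the rule are dropped.
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .withColumn("order_date", F.to_date("order_ts"))
    )
```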
Databricks Free Edition
Making Databricks free is a great way to Democratize Data, right? Well, Databricks is free now! Not for any production-like setups, of course; it is really meant for onboarding new users and giving people a playground to start with. They also made some of the training content of the Databricks Academy available to everyone, so they are trying to make it easier for people to just get started with Databricks. It is not yet known what the limitations of the Databricks Free Edition will be, but you will be able to build apps, agents, dashboards and pipelines, so it will offer a great flavor of everything that is possible. It is especially interesting for pushing Databricks a bit more in universities, for example.
Elevating Apps and Agents to the next level
Showing your data is one thing, interacting with your data is a whole other thing, and it is clear that Databricks is trying to give us all the tools we need to achieve this.
Lakebase
So this is something different. Where Databricks has generally focused exclusively on the analytical side of data (OLAP), they are now dabbling on the transactional side (OLTP). First they had the Data Lake, which they turned into a dual threat, Warehouse & Lake, via the Lakehouse. Now they are combining their Lake with a more classical Database to form the Lakebase. The release of Lakebase is a direct result of their acquisition of Neon. So what kind of features does Lakebase have?
- Built on open source Postgres (which is the type of database technology that seems to be leading in the Agentic revolution)
- Separation of Compute and Storage, hence Serverless
- Includes easy branching at the database level (enabled by the fact that it is fast and serverless, so it can quickly copy-on-write)
- Integrated with the Lakehouse: you can schedule syncs between Lakehouse and Lakebase
- Integrates with Databricks Apps: this is the part that excites me the most; you can now build your Apps (or Agents) in Databricks, with your Lakehouse data, while using Lakebase in the backend as your transactional layer, keeping everything in the same platform.
This is in Public Preview right now, but not yet in Azure West Europe. It is priced at around $0.6 per DBU, but it is not yet clear how many DBUs you would need; there is still a lot to figure out here.
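Because Lakebase is Postgres under the hood, any standard Postgres driver should be able to talk to it. Here is a minimal sketch of an App-style transactional write using psycopg2; the host, credentials and table are hypothetical and would normally come from the app's configured resources rather than hardcoded values.

```python
import os
import psycopg2  # plain Postgres driver; Lakebase speaks the Postgres protocol

# Hypothetical connection details, read from the environment.
conn = psycopg2.connect(
    host=os.environ["LAKEBASE_HOST"],
    dbname=os.environ.get("LAKEBASE_DB", "databricks_postgres"),
    user=os.environ["LAKEBASE_USER"],
    password=os.environ["LAKEBASE_TOKEN"],  # token-based auth in the password field
    sslmode="require",
)

# A small transactional write: exactly the kind of OLTP work you would not
# want to push through an analytical SQL Warehouse.
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO app_state.user_feedback (user_id, rating) VALUES (%s, %s)",
        ("u123", 5),
    )
```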
Agent Bricks
Databricks has been pushing a lot of their chips into the Generative AI war, focusing on their own LLM, DBRX, but also putting a lot of effort into their Mosaic AI offering and agent capabilities. However, building high-quality agents is often quite complex: there are many things to tune, it is hard to evaluate the overall process, and optimising for cost and quality is a difficult journey. So they have now announced Agent Bricks, a simple, no-code approach to building and optimising domain-specific, high-quality AI agent systems for common AI use cases. It comes with a couple of standard use cases out of the box, as you can see in the picture below.

You declare your task using a high-level description, connect your data sources, and it will automatically build, evaluate and optimise your agent. This is in Beta, not Private Preview, so again a couple of quarters away from Public Preview, but it looks like a promising attempt to let customers get agents into production in days instead of months.
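Once such an agent is built and deployed, it should surface as a regular Mosaic AI serving endpoint, which you can already query today through the OpenAI-compatible API that Databricks model serving exposes. A minimal sketch, assuming a hypothetical workspace URL and endpoint name:

```python
import os
from openai import OpenAI

# The workspace URL and endpoint name below are hypothetical; Databricks
# serving endpoints accept OpenAI-style chat completion requests.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://my-workspace.azuredatabricks.net/serving-endpoints",
)

response = client.chat.completions.create(
    model="invoice-extraction-agent",  # hypothetical Agent Bricks endpoint
    messages=[
        {"role": "user",
         "content": "Extract the supplier and total amount from this invoice text."}
    ],
)
print(response.choices[0].message.content)
```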
Other small nuggets that got announced
- MLflow 3.0 got released
- Lakebridge is a new offering allowing easier migrations from the likes of SQL Server and Synapse Analytics, but possibly also from Snowflake
- LakeFlow is GA now (and they are adding more connectors, of course)
- Databricks Apps is GA now (and the combination of Databricks One + Lakebase + Databricks Apps could be quite amazing)
- Databricks AI/BI is GA now
- Deep Research Mode in Genie: a new feature that allows users to dig a bit deeper, finding out not only what happened but also what actually contributed to it happening.
- Delta Live Tables got renamed to Lakeflow Declarative Pipelines. I am wondering if Agent Bricks or Lakebase will also be up for a rename at some point in the future.
- You can now easily convert an external table to a managed table with a single command (a minimal sketch follows after this list). Managed tables are getting more and more interesting, as the lock-in is decreasing and extra features like predictive optimisation are impactful.
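For that last point, a minimal sketch assuming the `ALTER TABLE ... SET MANAGED` syntax shown at the summit, with a hypothetical table name:

```python
# Convert an external Unity Catalog table to a managed one in place,
# without rewriting the data first; table name is hypothetical.
spark.sql("ALTER TABLE main.sales.orders_external SET MANAGED")
```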
Summary
The 2025 Data & AI Summit highlighted Databricks' continued growth and aggressive expansion into new areas of the data ecosystem. With close to 22.000 attendees, the summit emphasised Databricks' focus on Apps, Databases, and Agentic AI, alongside its steady revenue growth and substantial valuation. Key announcements included the introduction of Databricks One for business users, Unity Catalog enhancements, and LakeFlow Designer for no-code pipelines. Databricks also unveiled Lakebase for OLTP workloads and Agent Bricks for automatically building AI agents. We cannot wait for these products and features to actually be released, so we can try them out! See you next year!