Unleash the Power of Your Databricks Superhero

TL;DR

Databricks is on a mission to boost developer productivity and democratize access to its technology for both technical and non-technical users. With innovative new features such as the AI Assistant, AI Suggested Comments, and Tags, the platform is forging a path towards greater user-friendliness and efficiency. The AI Assistant serves as a valuable ally, offering context-aware suggestions that streamline the coding experience and answering natural language questions. AI Suggested Comments automate the documentation of tables and columns, while Tags help organize data assets efficiently. Although some features, like the AI Assistant, are still a work in progress, they show promising potential for further advancements in the near future.


Introduction


Databricks is a leading cloud-based platform that enables data professionals to work together and deliver scalable data solutions. 
The platform's rapid evolution can make it challenging to stay abreast of new features (check out our insight about Databricks' Data Intelligence Platform to understand Databricks' vision). To ensure you maximize its potential, this overview will introduce recent additions by Databricks, accompanied by practical tips to enhance your data experience. These features include:

  • AI Assistant: This feature allows you to use natural language to interact with your data and get insights faster. You can ask questions, run commands, and get suggestions from the AI Assistant, which understands your data context and intent. 
  • AI Suggested Comments: This feature helps you document your tables and columns by generating relevant comments based on your table/column names. You can review, edit, and accept the suggested comments, which will improve maintainability. 
  • Tags: This feature enables you to organize and manage your data more efficiently by adding tags to your tables and columns. You can use tags to filter, search, and group, as well as to apply permissions and policies, or even perform cost monitoring. 

These features are designed to help you work smarter and faster with your data. Moreover, Databricks is trying to reduce the barrier for non-technical people to enable everyone to access their technology.  In the following sections, I will demonstrate how to use each of these features and show you some examples of how they can benefit your data projects.

AI Assistant

Databricks Superhero

Imagine having a superhero by your side who can understand your data needs, help you with your tasks, and make your life easier. That's what Databricks intended to do with its AI Assistant. It's a smart and powerful tool that uses context awareness to provide relevant suggestions. It's integrated into Databricks' UI, so you can access it anytime. Moreover, it leverages Unity Catalog metadata as context to optimize your code and queries even further. Does the AI Assistant have the potential to be your new superhero? Let's see!

[Image: AI-generated picture - Databricks AI Superhero]

 

  • Its superpower is context awareness. For example, it can suggest the best way to join two tables, fix syntax errors, or fill in missing arguments. It can also answer natural language questions about your data, such as "What is the average revenue per customer?" or "How many orders were placed in January?". 
  • It has one mission: make your life easier. It can help you write code faster, avoid common mistakes, and discover new insights from your data. Perhaps it could even come to your rescue when faced with annoying errors. 
  • Every superhero has a hideout, right? The AI Assistant's hideout is the Databricks UI. It is seamlessly integrated into the platform, so you can access it from any notebook or from the SQL editor. You don't need to install any additional software; these tools are enabled by default. You can also use the Databricks AI Assistant with any language or framework supported by Databricks, such as Python and SQL. 
  • The secret weapon behind its superpowers is Unity Catalog. The Databricks AI Assistant leverages metadata about your data assets (such as tables and views), including schema, lineage, statistics, tags, and comments, to understand your data better and give you more relevant suggestions.

Empowering Productivity

In today's data-driven world, efficiency is key, and having a partner to navigate data complexities is welcome. Built on the Azure OpenAI Service at its core, the AI Assistant seamlessly adapts to your needs, ensuring a smooth transition from idea to execution.

Moreover, it restructures your code and offers the capability to convert between programming languages, facilitating seamless transitions from T-SQL to Spark SQL or from SQL to PySpark. It tries to transform complexity into clarity, and chaos into coherence, every step of the way. 

It's important to stress that the AI Assistant is still a work in progress and will not always give the expected results. For now, though, it's worth experimenting with, and we look forward to future enhancements. 


Real-World Applications

Before delving into real-world applications, it's essential to provide some guidance. When creating prompts for the Databricks AI Assistant, being specific is key. Consider providing additional context through warm-up questions and offer an example of the expected output.

Code Transformation

For a recent client project, we faced the task of migrating from T-SQL to Spark SQL. To carry out this transformation smoothly and efficiently, we made use of the Databricks AI Assistant.

[Image: AI Assistant 1]
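As a sketch of the kind of conversion the assistant performs (the table and column names here are hypothetical), a T-SQL query and its Spark SQL equivalent might look like:

```sql
-- T-SQL (source): hypothetical query using SQL Server built-ins
-- SELECT TOP 10 account_id, ISNULL(region, 'Unknown') AS region, GETDATE() AS loaded_at
-- FROM dbo.accounts;

-- Spark SQL (target): the same logic after conversion
SELECT account_id,
       coalesce(region, 'Unknown') AS region,
       current_timestamp() AS loaded_at
FROM accounts
LIMIT 10;
```

Note how SQL Server idioms (`TOP`, `ISNULL`, `GETDATE`) map onto their Spark SQL counterparts (`LIMIT`, `coalesce`, `current_timestamp`); these are exactly the mechanical substitutions the assistant can take off your hands.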

Natural Language Querying

To harness the power of natural language querying effectively, it's beneficial to start with warm-up questions, which help the assistant understand the context and narrow down the search direction. For instance, asking "/findTables related to account" enables the assistant to present a curated list of potential tables to query. Users can then proceed with more specific queries, like "I want to know how many Accounts I have per Sales Organisation", allowing the assistant to generate accurate insights tailored to the user's needs.

[Image: AI Assistant 2]
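For the second prompt above, the assistant might generate a query along these lines (the table and column names are hypothetical):

```sql
-- Accounts per sales organisation, as the assistant might generate it
SELECT sales_organisation,
       COUNT(*) AS account_count
FROM accounts
GROUP BY sales_organisation
ORDER BY account_count DESC;
```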

Cell Actions Inside Notebooks

This feature empowers users to trigger various actions at the cell level. By utilizing simple commands like /doc, /fix, and /explain, users can generate documentation, code fixes, or explanations relevant to specific cells. 

[Image: AI Assistant 3]

AI Suggested Comments

Streamlining Documentation

Within Databricks, AI-suggested comments are a feature designed to facilitate documentation processes. Powered by LLM technology, these comments offer support for various data assets, including tables, views and columns. They provide users with descriptions, facilitating understanding and collaboration. Users have the flexibility to edit or accept these comments, tailoring them to their specific requirements. Overall, this feature contributes to streamlining documentation efforts and promoting effective knowledge sharing within the platform.

Unlocking Potential

These comments enhance efficiency by automating the often time-consuming task of documentation, allowing data professionals to focus on more strategic endeavours. Moreover, they promote consistency and accessibility across different projects. Additionally, the implementation of AI-generated comments strengthens data governance practices by providing a centralized and standardized approach to documentation. Moreover, these comments are also available for the AI Assistant.


Hands-On 

Table Level

Effortlessly generate comprehensive documentation for your tables with AI-suggested comments.

[Image: AI Suggested Comments 1]
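Accepting a table-level suggestion ultimately stores a plain comment on the table, so you can also set or adjust one yourself in SQL (the table name below is hypothetical):

```sql
-- Set (or overwrite) a table-level comment in Unity Catalog
COMMENT ON TABLE main.sales.accounts
IS 'Master data for customer accounts, one row per account.';
```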

Column Level

Get granular insights into your data attributes with AI-generated comments at the column level.

[Image: AI Suggested Comments 2]
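The same applies at the column level: an accepted suggestion becomes a regular column comment, which you can also manage in SQL (again with a hypothetical table and column):

```sql
-- Set a column-level comment on a Unity Catalog table
ALTER TABLE main.sales.accounts
  ALTER COLUMN account_id COMMENT 'Unique identifier of the customer account.';
```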

Tags

The Role of Tags 

In Databricks, tags play a pivotal role in organizing and categorizing your data assets, providing a structured framework for efficient data management. Leveraging keys with optional values, these tags offer a flexible way of annotating data assets with relevant metadata. By associating tags with different objects, users can simplify searching and discovery processes, enabling quick access. Moreover, tags give additional context to the Databricks AI Assistant.

Defining Tags

Utilizing tags effectively in Databricks involves understanding the flexibility they offer. While the key-only and key-value pair formats serve distinct purposes, it's worth using both approaches to maximize functionality and accessibility.

  1. Key-Only Tags: These tags consist of a single key without a specific value attached. They are valuable for providing broad categorization and classification of objects.
  2. Key-Value Pair Tags: In contrast, key-value pair tags include additional contextual information by associating a specific value with each key. This format offers a more granular level of detail, enabling precise categorization. 

By incorporating both key-only and key-value pair tags, you can benefit from enhanced organization and searchability. While the Databricks UI currently supports searching based on keys only, leveraging key-value pair tags in code environments provides additional flexibility and granularity in metadata management. This dual approach ensures that users can efficiently navigate and discover relevant data assets, whether through the UI or programmatically, ultimately optimizing the data management process within Databricks.
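In SQL, both formats use the same `SET TAGS` clause; here is a minimal sketch on a hypothetical table (the tag names are illustrative):

```sql
-- Key-only tag: broad classification of the table
ALTER TABLE main.sales.accounts SET TAGS ('pii');

-- Key-value pair tags: more granular context
ALTER TABLE main.sales.accounts SET TAGS ('domain' = 'sales', 'owner' = 'data-platform');
```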


Implementing Tags

The first approach involves utilizing the user interface, which offers a straightforward way to implement tags. This method provides users with a visually intuitive way of assigning tags to various tables. However, while convenient, relying solely on the UI for tag management can pose challenges in terms of maintainability. Tags may become disorganized or inconsistently applied over time, leading to difficulties in data governance and discoverability.

[Image: Tags 1]

To address these concerns and enhance maintainability, an alternative approach involves embedding tags directly into the code that models our tables. By incorporating tags into the notebook code, we integrate metadata seamlessly into the workflow of the tables themselves. This approach not only ensures that relevant information remains closely associated with the corresponding data assets but also streamlines the process of updating and managing tags. Additionally, embedding tags in notebooks offers greater flexibility and control over metadata, allowing for custom tagging conventions and additional contextual information.
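As a sketch of this code-first approach (table, column, and tag names are hypothetical), a notebook cell could both apply tags alongside the table's modeling code and later discover tagged assets through the catalog's information schema:

```sql
-- Tag a sensitive column alongside the code that models the table
ALTER TABLE main.sales.accounts
  ALTER COLUMN email SET TAGS ('pii' = 'email');

-- Discover tagged tables programmatically via the information schema
SELECT catalog_name, schema_name, table_name, tag_name, tag_value
FROM main.information_schema.table_tags
WHERE tag_name = 'domain';
```

Keeping these statements in version-controlled notebooks is what makes the tagging conventions repeatable and auditable over time.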

Ultimately, the choice between these two methods depends on the specific needs and workflows of your organization. While the UI-based approach offers simplicity and accessibility, embedding tags programmatically provides a more robust solution for maintaining metadata integrity and consistency over time. By carefully considering the trade-offs between ease of use and maintainability, you can implement a tag management strategy that aligns effectively with your data governance objectives.