Secure your data in Microsoft Fabric! 🔐

Introduction

Microsoft Fabric is the new Microsoft branded analytics platform that integrates data warehousing, data integration and orchestration, data engineering, data science, real-time analytics, and business intelligence into a single product. This new one-stop shop for analytics brings a set of innovative features and capabilities by providing a transformative tool for data professionals and business users. With Microsoft Fabric, you can leverage the power of data and AI to transform your organization and create new opportunities.
For more Microsoft Fabric topics 👉 Microsoft Fabric | element61

How can I secure my data?

Microsoft recently announced the availability of Row Level Security (RLS) and Column Level Security (CLS) for Fabric Warehouse and SQL Endpoint in public preview in all regions.

RLS and CLS are both security mechanisms that increase the confidentiality and privacy of your data. They restrict access to your data by making the rows (for RLS) or columns (for CLS) unavailable depending on the user requesting them. It allows users to see only the data they are supposed to see. Some use cases include:

  • They allow employees from a business unit (Finance, Human Resources, Sales, IT…) only to see data belonging to this business unit (and not the others).
  • They allow employees from a specific country only to see data belonging to this country (and not the others).
  • They allow employees to see data based on their roles in the company (e.g., a manager can access all the business unit data while an employee is limited to their workstream).

As with all security topics, RLS and CLS require some in-depth thinking to be implemented correctly. Please refer to the Microsoft documentation for more information on best practices, side-channel attacks, and cross-functionality.

What is the objective of this insight?🎯

This insight aims to test both functionalities, RLS and CLS, in the Fabric Warehouse. We want to assess how easy it is to implement those security mechanisms, how well they work, and their impact on performance.

At the same time, we wanted to test this approach using the Direct Lake capability of Power BI as it is a huge step forward: there is no need for multi-layer security.

What methodology did we follow to test the data security?

To ensure optimal data security and privacy in the Fabric Warehouse, a comprehensive insight will be performed in four steps. The first step involves implementing RLS (Row Level Security) in the Fabric Warehouse. The second step will test the correct application of RLS in Power BI. The third step involves the implementation of CLS (Column Level Security) in the Fabric Warehouse. Finally, the fourth step will test the correct application of CLS in Power BI.

To carry out this insight, we will use the taxi dataset provided by Microsoft. This dataset is approximately 2GB in size and will provide sufficient data for detecting any performance impact caused by RLS or CLS. 

We will execute SQL statements directly in a “New SQL query” Sheet in Fabric Warehouse to implement the RLS and CLS.

How do you implement RLS in Fabric Warehouse?

First, we create the security function to implement the logic of the RLS:
CREATE FUNCTION Security.tvf_securitypredicate_2(@PaymentType AS nvarchar(50))
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS tvf_securitypredicate_result
WHERE @PaymentType = 'CRD';
GO

 
Second, we implement the security policy.
CREATE SECURITY POLICY TripFilter
ADD FILTER PREDICATE Security.tvf_securitypredicate_2(PaymentType)
ON dbo.Trip
WITH (STATE = ON);
GO

 
We can turn ON and OFF the policy with the below statements.
ALTER SECURITY POLICY TripFilter
WITH (STATE = ON);
 
ALTER SECURITY POLICY TripFilter
WITH (STATE = OFF);

Testing the correct application of RLS in Power BI

To begin, we launched Power BI and loaded the Fabric Warehouse using Direct Lake mode. After that, we created a table with a filter on the [PaymentType] column to make sure that only the 'CRD' categories are shown. 

The functionality was perfect and worked flawlessly as intended. Finally, we assessed the performance of the visual in Power BI by navigating to "Optimize" -> "Performance Analyzer" in the Power BI ribbon.


Performance Analyzer with RLS applied.


Performance Analyzer without RLS.

The impact on performance on both the table and the slicer is negligible, even when hitting the cache (multiple refresh of the visuals).

It is worth noting that, for both the RLS and the CLS, when applied to the Fabric Warehouse, a Direct Lake semantic model will always fall back to a Direct Query as it is the only way for the model to know the security applied at the Warehouse level. Therefore, there might be an impact in performance with bigger datasets due to the switch to Direct Query.

Direct Lake uses Single-Sign-On by default, which means the user already has access to the underlying data, so setting RLS or CLS becomes pointless.

An alternative is to use the fixed identity when using the semantic model to be able to apply the RLS or CLS. That way, it will be using the Direct Lake mode, allowing better performance.

How do you implement CLS in Fabric Warehouse?

First, we deny access to the table to the user:
DENY SELECT ON OBJECT::dbo.Trip TO [gary.goldman@element61.be]
 
Second, we grant access only to specific columns (here, we are excluding the [TotalAmount] column; that column does not appear in the list of columns below):
GRANT SELECT ON dbo.Trip(
[DateID],
[MedallionID],
[HackneyLicenseID],
[PickupTimeID],
[DropoffTimeID],
[PickupGeographyID],
[DropoffGeographyID],
[PickupLatitude],
[PickupLongitude],
[PickupLatLong],
[DropoffLatitude],
[DropoffLongitude],
[DropoffLatLong],
[PassengerCount],
[TripDurationSeconds],
[TripDistanceMiles],
[PaymentType],
[FareAmount],
[SurchargeAmount],
[TaxAmount],
[TipAmount],
[TollsAmount]
) TO [gary.goldman@element61.be];  


Testing the correct application of CLS in Power BI

When testing the CLS in Power BI, however, we did not manage to access the data for the whole table.
Could not load the [TotalAmount] column as per the CLS.

  • Unexpected behavior: we selected solely another column of the table where the CLS is applied and could not retrieve any values. ⚠️ We will discuss the unexpected behavior in the "Why did we experience the unexpected behaviour?" chapter below. 

  • To ensure that the CLS was applied correctly, we directly queried the Fabric Warehouse. The CLS was indeed applied accurately to the Fabric Warehouse, and it worked as intended when we queried (1) the protected column ([TotalAmount]) and (2) the whole table without that protected column.SQL Query in Fabric Warehouse restricting access to the [TotalAmount] column as per the CLS.

SQL Query in Fabric Warehouse returning values for all the other columns of that table.

Why did we experience the unexpected behaviour?

After discovering an unexpected behavior on the CLS, which prevented us from obtaining data correctly from a table where a CLS was applied in Power BI Direct Lake, we reached out to Microsoft for assistance.
Their response was that CLS was functioning as intended. However, due to the fact that CLS secures data and metadata at the data source level, it does not cut across workload as cleanly as RLS. In some cases, metadata queries on DirectQuery may fail even when the referred column isn't explicitly referred to by the visual. Microsoft suggested that a better solution would be to use Dynamic Data Masking for Fabric Warehouse. This method is complementary to RLS and CLS, but does not replace them.
The idea behind Dynamic Data Masking is to replace part or all of the data values with a specified pattern (the "Mask") for unprivileged users. By doing so, the information appears redacted, limiting sensitive data exposure.

1. Implementing Dynamic Data Masking in Fabric Warehouse.
First, if needed, we deny the UNMASK capability to the user:
DENY UNMASK TO [gary.goldman@element61.be];

We execute the following query in Fabric Warehouse to apply the data masking to the [City] column in the dbo.Geography table to replace the last part of the string with ‘XXXX’:
ALTER TABLE [dbo].[Geography]
ALTER COLUMN [City] ADD MASKED WITH (FUNCTION = 'partial(5,"XXXX",0)'); 

 

2. Testing that the Dynamic Data Masking applies correctly in Power BI.
As we can see below, the column [City] has its last characters replaced by ‘XXXX’. The mask works correctly.

3. Testing the Row Level Security in combination with the Dynamic Data Masking in Power BI.
To be thorough, we tried the Row Level Security mechanism with Dynamic Data Masking. To do so, we applied the RLS using the same table and the same column as the one used for testing the Dynamic Data Masking above (i.e., the [City] column on the [dbo].[Geography] table). We created the security function as “WHERE @City = 'Aberdeen';”.
As shown below, both mechanisms work as expected, even when used together. The column [City] is filtered only to show ‘Aberdeen’, but also applies the mask as it replaced the last characters with ‘XXXX’.

You can find more information about implementing Dynamic Data Masking in the Microsoft documentation, and more specifically for Fabric Warehouse

Like the RLS and CLS, the Direct Lake semantic model falls back to Direct Query when using Dynamic Data Masking.

👉 Conclusion

In conclusion, Fabric Warehouse helps you secure your data by using two essential mechanisms: Row Level Security and Column Level Security. While Column Level Security may break your Power BI dashboard for some users (as some metadata queries can fail), Row Level Security and Dynamic Data Masking work well on Fabric. All three methods trigger a fallback to Direct Query when applied at the Fabric Warehouse level. Thus, three options are offered to apply them:

  1. Applying the security methods on the Fabric Warehouse and acknowledge the fallback to Direct Query mode, with a possible performance impact.
  2. Using Direct Lake semantic models with fixed identity, keeping the performance of Direct Lake.
  3. Using Import mode and applying the security methods on two levels: the Fabric Warehouse and the Power BI.

We look forward to OneSecurity for Fabric (without timeline update at the time of writing). As per the Microsoft Fabric Launch Digital Event, OneSecurity is “a shared universal security model defined in OneLake alongside the data providing the necessary security capabilities so that data does not need to be copied into another place to be secured. More granular security can be defined once in OneLake. This includes column level security, row level security, data masking, etc.” This technology should streamline the security process by setting security policies at only one common layer (OneLake) and applying them everywhere in Fabric. We hope it will also unlock the Direct Lake performance improvements without using fixed identity.