Microsoft Fabric is Microsoft's new analytics platform, integrating data warehousing, data integration and orchestration, data engineering, data science, real-time analytics, and business intelligence into a single product. This one-stop shop for analytics gives data professionals and business users a shared set of features and capabilities. With Microsoft Fabric, we can leverage the power of data and AI to transform organizations and create new opportunities from data.
What is a workspace in Microsoft Fabric?
Workspaces have always been a central part of Power BI. A workspace is where you share your reports and make them available to end-users via a Power BI App. With Microsoft Fabric, you can still define a workspace, and it now supports a variety of data workloads.
How does Microsoft Fabric impact our current Power BI architecture of workspaces?
In brief, there is not much to it. However, if we dig deeper, there is a lot more to consider. 🤓 If you want to keep your Power BI infrastructure as it is, you don't need to do anything. However, if you're curious about the possibilities of Microsoft Fabric, keep reading.
The change that affects workspace management most is this: until now, data engineers primarily ingested and transformed data in Azure/Databricks, while business users or citizen developers crunched data in Power BI. Now, everyone works in the same environment and the same workspaces, so we need to define how everyone can work together without impacting each other's work.
Introducing data Domains in Microsoft Fabric
Merging all data workloads into one environment means more users and more artifacts, which in turn requires better governance and access management. A layer is therefore needed between workspaces and tenant-level management. Part of Microsoft's answer is Domains, which let us organize and manage our data by grouping multiple workspaces.
The rationale behind data Domains, in Microsoft's words, is that "organizations are shifting from traditional IT-centric data architectures, where the data is governed and managed centrally, to more federated models organized according to business needs." (Source: Domains - Microsoft Fabric)
Microsoft recommends organizing data using separate Domains for each department (e.g., Sales, Finance, IT) in a Data Mesh architecture, where departments own and are responsible for their data in a decentralized manner. All workspaces related to a specific department can be grouped under the same Domain.
The above visual summarizes how the data Domains come into the picture of the Microsoft Fabric infrastructure by filling the gap between Workspaces and Tenant.
Currently in preview, this feature lets us group the data relevant to us in the OneLake data hub.
With domains, users can quickly find all the data that their department's domain admin has given them access to, and then filter on the artifacts of their choice.
To create a domain, you must be a Fabric Admin (previously Power BI admin) and an admin of the workspaces you want to assign.
It is relatively simple to create a Domain. In the admin portal, you can go to the Domain settings.
From there, you can assign a name, description, and image. You will notice that you can also set two new roles associated with domains: domain admins and domain contributors.
Domain admins can change everything you see on the screenshot above, except adding or removing other Domain admins.
On the other hand, domain contributors can only change the domain association of the workspace they are an admin of.
Now that we have covered a domain's properties and possible roles, let's see how we can associate a workspace with a domain.
A workspace can be assigned to only one Domain at a time, but switching is easy. Microsoft offers three ways to make the assignment.
- First, the most straightforward is via workspace name. We could imagine multiple workspaces having the name of their department. This way, we can assign various workspaces at once. (PS: There is a slight delay between when you create a workspace and when it is available in the possible workspaces to assign.)
- The second possibility is via workspace admin. We could imagine this scenario in an enterprise already working in a data mesh structure, where each department already had a "Power BI champion," a dedicated workspace admin.
- The third choice is via capacity. Larger companies may have separate capacities for each department; this way, it's easy to assign all workspaces on the same capacity to a single domain.
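To make the three assignment options concrete, here is a minimal Python sketch. It does not call any Fabric API: the Workspace record, the assign helper, and the sample workspace names, e-mail addresses, and capacity labels are all hypothetical, and matching by name is modeled as a simple case-insensitive substring test, which is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Workspace:
    """Hypothetical workspace record; not a Fabric API object."""
    name: str
    admin: str
    capacity: str
    domain: Optional[str] = None  # a workspace holds at most one domain


def assign(workspaces: List[Workspace], domain: str,
           matches: Callable[[Workspace], bool]) -> List[str]:
    """Assign every matching workspace to `domain` and return their names."""
    changed = []
    for ws in workspaces:
        if matches(ws):
            ws.domain = domain  # re-assigning replaces the previous domain
            changed.append(ws.name)
    return changed


# One selector per assignment option described above.
def by_name(fragment: str) -> Callable[[Workspace], bool]:
    return lambda ws: fragment.lower() in ws.name.lower()


def by_admin(admin: str) -> Callable[[Workspace], bool]:
    return lambda ws: ws.admin == admin


def by_capacity(capacity: str) -> Callable[[Workspace], bool]:
    return lambda ws: ws.capacity == capacity


workspaces = [
    Workspace("Sales - Reports", "alice@example.com", "cap-sales"),
    Workspace("Sales - ETL", "bob@example.com", "cap-sales"),
    Workspace("Finance - Reports", "carol@example.com", "cap-finance"),
]

sales = assign(workspaces, "Sales", by_name("sales"))
finance = assign(workspaces, "Finance", by_admin("carol@example.com"))
# Switching domains is just another assignment, here selected by capacity.
moved = assign(workspaces, "Retail", by_capacity("cap-sales"))
```

Storing the domain as a single field, rather than a list, is what encodes the one-domain-per-workspace rule: the last assignment always wins.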
In summary, to create domains and associate workspaces with them, you need the Fabric Admin role. You also need to be an admin of each workspace you want to associate. Of course, managing a specific Domain additionally requires being one of its domain admins or contributors.
Guidelines and reflections on workspace management
Now that we have discussed the primary tools available for workspace management, let's explore potential scenarios and the essential considerations to keep in mind when devising your workspace management strategy.
Be aware that this information is based on what is currently available as of the date of writing. Microsoft Fabric is still in preview, and new features are regularly added.
Of course, everything mentioned below depends heavily on your internal scenario and infrastructure.
Power BI developers who were used to their clean workspace will suddenly see many different items popping up here and there due to the development of the new Microsoft Fabric ETL.
Unfortunately, at the time of writing, it was not (yet) possible to manage access at the item level inside a workspace: a Power BI report developer who is a member or admin of a workspace can also modify data pipelines, delete lakehouses, etc. Combining both groups in a single workspace is therefore risky.
Update: since the July 2023 update of Fabric (released in early August), managing permissions at the item level is possible. Still, this feature does not change the setup described below or the possibilities discussed.
For the sake of simplicity, we will make the (reasonable) assumption that our dataset developers will also be the data engineers.
For the above reasons, we first recommend dividing your workspaces between data engineering and data visualization.
Second, consider possibilities for lifecycle management of our data projects in Microsoft Fabric.
We have two possibilities for this. One that has been available for some time now in Power BI is deployment pipelines.
The second one, introduced at the same time as Microsoft Fabric, is a game changer for Power BI developers: Git integration.
In a CI/CD context, we recommend Git integration for Continuous Integration, since it lets multiple developers collaborate on the same file, and deployment pipelines for Continuous Deployment, since they are easier to use and offer an excellent user experience.
However, be careful, as the two tools overlap in functionality. Your team needs a well-defined deployment strategy to avoid pushing unintended branches to production via Git and overwriting what has been deployed via the deployment pipeline.
Since deployment pipelines automate the transition between development, test, and production environments, and each environment is linked to a single workspace, you will end up with three workspaces, one for each environment.
An important note is that Fabric items (Lakehouses, Warehouses, etc.) are not (yet) supported in either Git integration or deployment pipelines.
This setup would also allow us to enjoy the architectural possibilities of new Fabric Capacities. Concretely, if we don't want our engineering ETL to disturb our report consumption but also want our development not to impact our production setup, we could imagine a scenario where we would have three different capacities: one for our development, one for our production ETL and one dedicated to the reporting and end-user consumption. The size of the capacities will, of course, depend on your needs, but don't forget that to enjoy the previous Premium per Capacity features (such as free report consumption for reports located in a workspace inside a Capacity), you need at least an F64 capacity (equivalent to a P1).
Of course, this would increase the administrative burden of managing three different capacities. This could be avoided if we could assign a percentage of a capacity to specific workloads (thumbs up to this Microsoft Idea 😉).
At this point, our setup consists of one data domain with two deployment pipelines and six workspaces.
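As a rough illustration, this setup can be enumerated in a few lines of Python. This is only a sketch under assumptions: the pipeline, workspace, and capacity names are hypothetical, the domain is arbitrarily called "Sales", and the text does not say which capacity the Test workspaces use, so the sketch assumes they share the development capacity.

```python
TEAMS = ("Engineering", "Visualization")  # one deployment pipeline per team
STAGES = ("Dev", "Test", "Prod")          # one workspace per pipeline stage


def capacity_for(team: str, stage: str) -> str:
    """Pick one of the three capacities; the labels are hypothetical."""
    if stage != "Prod":
        # Assumption: Test shares the development capacity.
        return "cap-development"
    # Production ETL and report consumption are kept apart.
    return "cap-prod-etl" if team == "Engineering" else "cap-reporting"


def build_layout(domain: str) -> dict:
    """Map each deployment pipeline to its three stage workspaces."""
    return {
        f"{domain} {team} pipeline": [
            {
                "workspace": f"{domain} - {team} [{stage}]",
                "capacity": capacity_for(team, stage),
            }
            for stage in STAGES
        ]
        for team in TEAMS
    }


layout = build_layout("Sales")
workspaces = [ws for stages in layout.values() for ws in stages]
print(len(layout), "pipelines,", len(workspaces), "workspaces")
```

Writing the layout down this way makes the multiplication explicit: every extra team adds a pipeline and three workspaces to govern.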
Last (but certainly not least!), let's see how the medallion architecture comes into the picture. The goal of this insight is not to dig into the medallion architecture or why it is recommended for building your lakehouse, but to understand its impact on the different workspaces. We will focus on integrating the three primary layers: Bronze, Silver, and Gold.
Let's first try to understand the different possibilities before choosing one.
1. We could have only one Lakehouse where we create subfolders for our layers.
This option is the simplest in terms of infrastructure, as we only have to create one Lakehouse. However, if you want to allow people to analyze data on the Lakehouse, it becomes more problematic: it is not possible to create subfolders under the Tables section of the Lakehouse, so all your tables will be grouped together, and the only way to tell them apart is a naming convention, which quickly becomes unwieldy as the number of tables grows.
2. The second option would be to create one Lakehouse per layer.
This solution is the most versatile: it separates our layers, which makes the data easier to analyze, while still keeping everything in the same workspace. It also makes it easier to manage access on each Lakehouse, as we probably don't want analysts to access our Bronze layer.
3. Finally, a more drastic option would be to create one workspace per layer.
The main added value of this solution is that you can split compute across the layers by linking a different capacity to each workspace. This could be useful for very large companies whose Power BI reports use Direct Lake on the Gold layer and that don't want ETL or data movements across layers to affect the compute resources serving reports to end consumers.
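One way to compare the three options is to look at where a given table ends up under each of them. The sketch below is purely illustrative: the workspace and lakehouse names (ws_data, lh_main, and so on) and the layer-prefix naming convention used for option 1 are assumptions, not Fabric conventions.

```python
from typing import NamedTuple

LAYERS = ("bronze", "silver", "gold")


class Location(NamedTuple):
    workspace: str
    lakehouse: str
    table: str


def locate(option: int, layer: str, entity: str) -> Location:
    """Return where the table for `entity` lives in `layer` under each option."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    if option == 1:
        # One Lakehouse: the layer can only live in the table name.
        return Location("ws_data", "lh_main", f"{layer}_{entity}")
    if option == 2:
        # One Lakehouse per layer, all in the same workspace.
        return Location("ws_data", f"lh_{layer}", entity)
    if option == 3:
        # One workspace per layer, each with its own Lakehouse.
        return Location(f"ws_{layer}", f"lh_{layer}", entity)
    raise ValueError(f"unknown option: {option}")


for option in (1, 2, 3):
    print(option, locate(option, "gold", "sales"))
```

The further right the layer name moves from table to lakehouse to workspace, the coarser the unit at which access and capacity can be separated, at the cost of more items to manage.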
What is the conclusion?
In conclusion, the concept of workspaces remains central within Microsoft Fabric, enabling collaboration and data sharing. The introduction of Fabric Domains as a solution to manage data within Fabric emphasizes a decentralized approach, aligning with modern data governance paradigms.
Throughout this insight, we have added, one by one, three layers of complexity to a potential infrastructure.
First, we recommended a split between the engineering and visualization teams to avoid security and access issues.
Then, we brought lifecycle management into the picture by promoting deployment pipelines.
Lastly, we thought about potential scenarios for introducing medallion architecture in Microsoft Fabric.
We have seen that all scenarios have pros and cons, leading to a tradeoff between development/governance burden and potential data leakage/security risk.
However, the diagram below can be considered a baseline for workspace management with Microsoft Fabric.