How does Microsoft Fabric Empower Data Scientists?

Microsoft Fabric is Microsoft's new analytics platform. It integrates data warehousing, data integration and orchestration, data engineering, data science, real-time analytics, and business intelligence into a single product. This one-stop shop for analytics brings a set of innovative features and capabilities to both data professionals and business users. With Microsoft Fabric, organizations can use data and AI to transform how they work and create new opportunities from their data.

Microsoft Fabric offers Data Science experiences that let users complete end-to-end data science workflows for data enrichment and business insights. These experiences cover the entire data science process: data exploration, preparation, and cleansing; experimentation and modelling; and model scoring and serving of predictive insights to BI reports.

Let's start with the basics. What can you do?

For Data Scientists, Microsoft Fabric presents a versatile toolkit for conducting diverse tasks in data analysis and machine learning:

[Figure: versatile toolkit]

With Fabric, you can use notebooks to write and execute Python, Spark, SQL, or R code that manipulates and analyzes data from different sources. To extend your data science capabilities, you can also bring in external libraries such as NumPy, SciPy, scikit-learn, and TensorFlow. Additionally, you can schedule notebooks to run at set time intervals and collaborate with colleagues within the same notebook environment. Fabric also enables citizen data scientists to manipulate pandas data frames through Data Wrangler.

[Figure: Data Wrangler]
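To make the data preparation step more concrete, here is a minimal sketch of the kind of pandas cleaning code you might write in a notebook. The sample data, the column names, and the `clean` helper are assumptions made up for illustration; this is not code produced by Fabric or by the data wrangler itself.

```python
import pandas as pd

# Hypothetical sample data; in a Fabric notebook this would typically
# be loaded from a table or file instead of being typed in by hand.
raw = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3, 4],
        "country": [" be", "NL ", "NL ", "fr", None],
        "revenue": [120.0, 200.0, 200.0, None, 80.0],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few common cleaning steps and return a new data frame."""
    out = df.drop_duplicates().copy()
    # Normalize country codes: trim whitespace and upper-case them.
    out["country"] = out["country"].str.strip().str.upper()
    # Fill missing revenue with the median of the known values.
    out["revenue"] = out["revenue"].fillna(out["revenue"].median())
    # Drop rows that still have no country.
    out = out.dropna(subset=["country"])
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Keeping the steps in one small function makes them easy to rerun on fresh data or to reuse in a scheduled notebook.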

OneLake serves as a unified data store accessible to all developers, including Data Scientists, so they no longer have to rely solely on Data Engineers for data access. Python packages can be used in notebooks to train machine-learning models. MLflow is integrated into the experiments section to track models, which makes it easy to compare evaluation metrics. Model management can be customized through the user interface (UI) and APIs.
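As a rough illustration of experiment tracking, the sketch below logs parameters and metrics for two runs with the standard MLflow tracking calls and then picks the better run by a chosen metric. The experiment name, run names, and metric values are invented for the example, and `best_run` and `log_runs` are hypothetical helpers, not part of MLflow or Fabric. In practice the comparison would usually happen in the experiments UI.

```python
from typing import Any, Dict


def best_run(metrics_by_run: Dict[str, Dict[str, float]], metric: str) -> str:
    """Return the name of the run with the highest value for `metric`."""
    return max(metrics_by_run, key=lambda run: metrics_by_run[run][metric])


def log_runs(
    experiment_name: str,
    params_by_run: Dict[str, Dict[str, Any]],
    metrics_by_run: Dict[str, Dict[str, float]],
) -> None:
    """Log one MLflow run per entry, with its parameters and metrics."""
    # Imported here so the comparison helper above also works where
    # MLflow is not installed.
    import mlflow

    mlflow.set_experiment(experiment_name)
    for run_name, metrics in metrics_by_run.items():
        with mlflow.start_run(run_name=run_name):
            mlflow.log_params(params_by_run[run_name])
            mlflow.log_metrics(metrics)


# Illustrative values only; these are not real results.
params_by_run = {
    "shallow_tree": {"max_depth": 3},
    "deep_tree": {"max_depth": 10},
}
metrics_by_run = {
    "shallow_tree": {"accuracy": 0.81, "f1": 0.78},
    "deep_tree": {"accuracy": 0.86, "f1": 0.80},
}

winner = best_run(metrics_by_run, "accuracy")
print(f"Best run by accuracy: {winner}")

# In a notebook with MLflow available, the runs could then be logged with:
# log_runs("churn-experiment", params_by_run, metrics_by_run)
```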

Sounds great 🙌, but what features still need to be added?

Fabric is still in its early stages and has certain limitations and areas that require improvement, such as:

  • Python libraries can be managed either at the workspace level or within the notebooks. It would be ideal to have the option to create virtual environments for each project in Fabric Data Science, which would provide clear visibility into which project, notebook, or model uses specific library versions.
  • The lack of a version control system for data science notebooks makes it impossible to track changes and history in code and data.
  • Some Azure ML capabilities have been adopted, but several key features such as ML pipelines, AutoML, and ML endpoints are missing.
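Until per-project environments exist, one possible workaround for the library visibility gap in the first point is to record the installed versions of the packages a notebook depends on. The sketch below uses only Python's standard `importlib.metadata` module. The `snapshot_versions` helper and the package list are assumptions for illustration, not a Fabric feature.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def snapshot_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Hypothetical list of libraries that a project's notebooks rely on.
project_packages = ["numpy", "scipy", "scikit-learn", "tensorflow"]

for package, version in snapshot_versions(project_packages).items():
    print(f"{package}: {version or 'not installed'}")
```

Storing this output next to a notebook or model gives at least a basic record of which library versions were in use when it ran.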

Fabric is currently in public preview, and substantial development is still required before advanced Data Science teams can fully adopt it for their data-related activities. Because the product is still in preview, however, many of these features are likely to be added soon. Some have already been announced:

  • Semantic Link: Enabling Data Scientists to access the semantic data model using Python or Spark.
  • Hyperparameter tuning & AutoML: Automating the training of your ML models.
  • Pre-trained AI models: Including cognitive services such as Text Analytics, Anomaly Detection, Text Translator, and others. This is an improvement over making the API calls yourself, but no game changer.
  • Copilot in notebooks: Helping Data Scientists with code development, debugging, data analysis, and more, thanks to Copilot's integrated generative AI.

If so much is missing, should we ignore it?

Whether Fabric Data Science is the right choice for you depends on your specific situation and needs. If you are already using an Analytics Platform to meet your Data Science needs, you may not find Fabric to be of significant additional value. Azure offers more advanced and comprehensive platforms like Databricks and Azure ML. However, Fabric is an excellent option for smaller teams that have not yet explored Machine Learning or other Data Science projects, thanks to its out-of-the-box simplicity. If you are an inexperienced Data Scientist, Fabric allows you to focus on loading, transforming, and experimenting with data using Python, SQL, or R. Once complete, the results can be seamlessly visualized using Power BI, making the process incredibly user-friendly 🙂

In conclusion

Because Fabric Data Science is still in its early stages, it is important to avoid premature conclusions. While the tool doesn't yet have all the necessary features for a strong Development & Production setup, it does lower the barriers for teams and companies eager to embark on their Data Science journey. Either way, we look forward to seeing what the future holds!