Getting started with advanced analytics doesn't need to be complex or require a heavy investment. With the launch of Microsoft Azure Machine Learning Studio, Microsoft provides users with a tool for advanced analytics that abstracts away the hassle of technology and infrastructure.
Microsoft Azure ML Studio is part of the Microsoft Cortana Intelligence Suite, which can be seen as a bucket of software modules designed to give you an end-to-end Microsoft solution for tackling any Big Data and advanced analytical challenge, ranging from streaming to bots to machine learning. The Machine Learning Studio is Microsoft's specific solution for advanced analytics. In summary, it's a cloud tool which empowers business analysts or data scientists to:
- Get started with unstructured data such as pictures and social media feeds, or with streaming data sources, naturally combining this with traditional BI data
- Build machine learning models using (1) out-of-the-box algorithms, (2) personal code snippets in Python or R and/or (3) community models and code snippets
- Deploy models as an easily accessible web service that can be used from Excel or as a solution API
In principle, Azure ML consists of three main parts:
- Projects and Experiments
This is where the actual analytical work takes place: the analyst can merge all relevant data, apply transformations where needed and run any type of modelling he or she needs: predictive, clustering, classification or anomaly models. All of this is done without code through an intuitive visual interface in which the analyst simply drags and drops the modules needed.
- Notebooks
Microsoft integrated the open-source IPython/Jupyter notebooks within its Azure ML environment to give the analyst the familiarity and flexibility to code (in R or Python) where needed. Embedding them within Azure ML empowers analysts to make cloud machine learning their sole data science environment.
- Web services
Azure ML allows the data scientist or analyst to bring a solution live in minutes through a general web service. Before this existed, a data scientist would have needed the support of an IT engineer to do this, likely delaying the deployment by several weeks.
Figure 1 - Main menu in Microsoft Azure Machine Learning Studio
Building a machine learning model
The key feature of Microsoft's Azure ML Studio is the ease of use of its drag-and-drop modules. When a user starts a new project and a related experiment, he or she is offered an empty drag-and-drop template as well as a broad menu of available datasets, data inputs and outputs, transformations, models and assessment functions. The menu is sorted top-down in line with the basic steps for building an advanced analytical solution.
- Step 1: Get data
Azure ML allows the user to either
- (1) use the available sample datasets,
- (2) import data manually or
- (3) link the experiment with a proper data store such as any Azure storage service (e.g., SQL, Blob Storage), a big data database (HDFS accessed through Hive), a web URL or a traditional on-premises server (e.g., your SQL Server).
To prevent the model from querying the source repeatedly (and thus putting strain on the storage systems), the tool offers a Use cached results option; this lets the tool reuse the in-memory data while the user keeps building and rerunning the model.
Figure 2 - Data import connectors allowed in Azure ML
- Step 2: Preprocess data
Figure 3 - Example of a data transformation
- Step 3: Modeling initiation and training
Any data science mission starts from a business objective. Generally, this business objective can be classified under one of five machine learning application groups:
- Regression modeling
The analyst aims to predict a numeric value and/or understand the drivers of a specific variable. This is one of the most popular applications of machine learning, as this is where most traditional analytic tools fail to deliver. Traditional use cases are the forecasting of revenue in retail, the prediction of stock, house or retail prices, the forecasting of energy demand, etc. On Azure ML, we find example experiments for each of these use cases, giving us a head start in our own development.
Within Azure ML, analysts can leverage the full range of regression techniques, from linear regression up to neural networks. Each of the models has a series of parameters the analyst can fine-tune, such as regularization, learning rate, etc. For the most advanced users, Azure ML allows neural networks to be specified through a Net# script; additionally, the analyst can always leverage custom R or Python code to bring his or her own model.
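To make the regularization and learning-rate parameters mentioned above more concrete, here is a minimal sketch in plain Python of a one-feature linear regression trained by gradient descent. It is not how Azure ML implements its regression modules; the function name `fit_linear` and the toy data are illustrative assumptions.

```python
def fit_linear(xs, ys, learning_rate=0.05, l2=0.0, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error.

    l2 is the strength of an L2 (ridge) penalty applied to the weight w only.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error plus the penalty l2 * w**2.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n + 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_plain, b_plain = fit_linear(xs, ys)          # no regularization
w_ridge, b_ridge = fit_linear(xs, ys, l2=1.0)  # weight is shrunk toward zero
print(round(w_plain, 3), round(b_plain, 3))
print(round(w_ridge, 3), round(b_ridge, 3))
```

Raising `l2` trades some fit on the training data for a smaller weight, which is the kind of trade-off the regularization parameter of a regression module is meant to control.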
- Classification modeling
When the analyst tries to bucket observations into categorized outputs, the analyst is tackling a classification challenge. This application domain includes known use cases such as predicting customer churn, determining predictive maintenance needs for machines, predicting credit default, forecasting flight delays, sentiment analysis, etc. (example experiments are available on Azure ML). The business objective of these algorithms is to empower the business with the predicted classification: i.e. will the customer churn or not, will the loan default or not, is the tweet positive, negative or neutral, etc.
- Clustering
With clustering, the analyst tries to leverage math to create the best-differentiated groups of similar customers, products, etc. The commonly used algorithm is K-Means clustering. Once processed within Azure ML Studio, a clustering can feed the BI data warehouse with a segmented customer base and clear target groups for marketing.
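As a rough illustration of what K-Means does, the sketch below runs the standard assign-then-average loop (Lloyd's algorithm) in plain Python. The customer data, the starting centroids and the function name `k_means` are assumptions made for this example; this is not the Azure ML module itself.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers described by (monthly spend, monthly visits).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Each resulting cluster can then be written back as a segment label per customer, which is the form in which a data warehouse would typically consume it.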
- Anomaly detection
With the rise of financial technology and the availability of real-time IoT data streams, the field of anomaly detection is growing significantly. Anomaly detection is aimed at scanning datasets for outliers (i.e. anomalies) which are worth investigating further: e.g., fraudulent transactions or machine-logged errors. Azure ML Studio offers two widely used techniques: one-class support vector machines (SVM) and principal component analysis (PCA).
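The idea behind PCA-based anomaly detection can be sketched in a few lines for two-dimensional data: find the main axis along which the data varies, then score each point by how far it lies from that axis. The example below is a simplified illustration under that assumption, with made-up data and a made-up threshold; it does not reproduce the Azure ML module.

```python
import math


def pca_anomaly_scores(points):
    """Score 2-D points by their distance from the first principal axis."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Reconstruction error: the component orthogonal to the principal axis.
    return [abs(-uy * x + ux * y) for x, y in centered]


# Hypothetical readings that roughly follow y = x, plus one outlier at the end.
readings = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 6.0), (3.0, 7.0)]
scores = pca_anomaly_scores(readings)

# Illustrative rule of thumb: flag points scoring more than twice the average.
threshold = 2 * sum(scores) / len(scores)
flagged = [i for i, s in enumerate(scores) if s > threshold]
print(flagged)
```

In higher dimensions the same principle applies with several principal components; the threshold is a judgment call that depends on how many false alarms the business can tolerate.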
- Recommendation engines
A well-known application of advanced analytics is the recommendation engine, which we see on Netflix, in our supermarket flyers and in our retail digital newsletters. Azure ML Studio allows any organization to easily get started with recommendation engines by offering the algorithm as well as the related instant web service (more details below).
Recommender algorithms come in two types:
- collaborative filtering, i.e., recommenders where we aim to connect users with similar behavior and preferences, and
- content filtering, i.e., an approach where we connect products based on the characteristics of a product.
To give analysts a head start and overcome the cold-start problem, Azure ML Studio offers the Matchbox recommender algorithm out of the box. It is a hybrid recommender using both techniques: for new users, content filtering is used; for users whose ratings or orders are known, a more collaborative-filtering approach works and is thus applied.
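The fallback logic described above (content filtering for new users, collaborative filtering once ratings exist) can be sketched as follows. This is a deliberately naive illustration and not the Matchbox algorithm; the function `recommend`, the scoring rules and all data are assumptions made for the example.

```python
def recommend(user, ratings, item_tags, user_interests):
    """Return the best unseen item for `user`, or None if nothing can be scored."""
    seen = ratings.get(user, {})
    scores = {}
    if seen:
        # Collaborative filtering: weight other users' ratings by the number
        # of liked items (rating >= 4) they share with this user.
        liked = {item for item, r in seen.items() if r >= 4}
        for other, other_ratings in ratings.items():
            if other == user:
                continue
            overlap = len(liked & {i for i, r in other_ratings.items() if r >= 4})
            for item, r in other_ratings.items():
                if item not in seen:
                    scores[item] = scores.get(item, 0) + overlap * r
    else:
        # Content filtering (cold start): score items by how many tags
        # they share with the interests the new user declared.
        interests = user_interests.get(user, set())
        for item, tags in item_tags.items():
            scores[item] = len(tags & interests)
    return max(scores, key=scores.get) if scores else None


# Hypothetical data: ann and bob both like drama_a; dan is a brand-new user.
ratings = {
    "ann": {"drama_a": 5, "comedy_a": 2},
    "bob": {"drama_a": 5, "drama_b": 4},
    "cat": {"comedy_a": 5, "comedy_b": 5},
}
item_tags = {
    "drama_a": {"drama"},
    "drama_b": {"drama"},
    "comedy_a": {"comedy"},
    "comedy_b": {"comedy"},
}
user_interests = {"dan": {"comedy"}}

print(recommend("ann", ratings, item_tags, user_interests))
print(recommend("dan", ratings, item_tags, user_interests))
```

For `ann`, the recommendation comes from `bob`, who shares her taste; for `dan`, who has no ratings yet, only the declared interest in comedy can be used.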
Figure 4 - Example of building, training and scoring a model
- Step 4: Model scoring and evaluation
Every model needs to be thoroughly evaluated to verify its accuracy but also its usability on unseen datasets. Within Azure ML Studio, this evaluation process is handled through a set of Scoring and Evaluate modules found in the respective buckets. After scoring, the evaluate module returns the full set of evaluation metrics for the model type used: e.g., accuracy, precision, recall and AUC when tackling classification, but MAE, RSE and R² when tackling regression models. This feature significantly simplifies the model evaluation process.
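For readers who want to see what is behind these metrics, the sketch below computes several of them by hand in plain Python. It assumes that RSE stands for relative squared error and that the final regression metric is the coefficient of determination; AUC is left out for brevity, and the labels and predictions are made up.

```python
def classification_metrics(actual, predicted):
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    # Assumes at least one predicted positive and one actual positive.
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


def regression_metrics(actual, predicted):
    """Mean absolute error, relative squared error and R squared."""
    n = len(actual)
    mean_actual = sum(actual) / n
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total_variation = sum((a - mean_actual) ** 2 for a in actual)
    rse = squared_error / total_variation  # assumes the actual values are not all equal
    return {
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "rse": rse,
        "r2": 1 - rse,
    }


# Hypothetical scored datasets.
clf = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.5])
print(clf)
print(reg)
```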
- Step 5: Concluding on the best model
After an iterative process of model building, scoring and evaluation, the analyst can settle on the best-fitting model, which can be a single model or a hybrid model (using custom R code or a data transformation). The analyst can run and save the model but also set up a web service around the model (see below) or even publish it to be shared with the community (i.e. the Cortana Intelligence Gallery).
Figure 6 - Azure ML Studio Menu
Custom code snippets in R or Python
Azure ML's ease of use is great for analysts and starting data scientists. However, to make Azure ML a preferred environment for expert data scientists, Microsoft has allowed for smooth integration of custom code snippets in R or Python within the experiments.
Through these custom modules, users can execute their own user-defined operations, models, visualizations and even data imports. As R and Python are the world's leading programming languages for statistical computing and visualization, this integration creates a big opportunity for users to leverage all the publicly available community libraries and research projects.
Figure 7 - How to integrate R as custom code snippet
A great use of custom modules is to enrich the modeling process with extra graphics such as correlation matrices or evaluation graphs.
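As an illustration of the kind of logic such a custom snippet might contain, the sketch below computes a Pearson correlation matrix in plain Python. How the data reaches the script inside a custom module is not shown; the dictionary of columns, the function names and the values are assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map every pair of column names to their correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical dataset with three numeric columns.
columns = {
    "price": [10.0, 12.0, 14.0, 16.0],
    "demand": [40.0, 36.0, 32.0, 28.0],
    "visits": [5.0, 3.0, 6.0, 4.0],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

A matrix like this is a quick way to spot strongly related inputs before choosing which features to feed into a model.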
Notebook for R and Python 2 and 3
Another important part of Microsoft's Azure ML Studio is the notebooks. Notebooks are interactive coding environments which allow the user to combine computer code (e.g. Python or R) with rich text elements such as titles, descriptive paragraphs, figures, links, etc. These notebooks allow analysts to enter script code (R or Python) and get a prompt response.
Currently, notebooks should be seen as a standalone R and Python environment accessible within the same working environment as Azure ML Studio. This means that it's currently not straightforward to deploy notebook code in ML Studio, or the reverse. That being said, clear benefits are already there, as one can stay within one interface, work on the same (intermediate) datasets and copy code back and forth.
On its blog, the Microsoft team highlights that the roadmap definitely focuses on deeper Azure ML Studio integration.
The notebooks are available in Python 2 and 3 as well as in R 3.3.1.
A final feature of Azure ML Studio is the ability to deploy your model instantly as a web service. With this feature, Microsoft helps the data scientist put things into production quickly and robustly without the need for any IT engineering support.
Two web services are available:
- Retraining web service
This covers the ability to create a tool to retrain the model on a new training set without having to open Azure ML.
- Predictive web service
This web service is likely the most powerful, as it gives the ability to instantly create a tool to score a set of unseen data with the model built and, e.g., get a prediction or classification on an unknown dataset.
A click on Set Up Web Service will automatically trigger three actions:
- The model will be saved
- The experiment will be trimmed to remove modules that were only needed for training
- Finally, it will import the web service modules and define where the web service takes its input and returns its output. The resulting experiment will look something like the one shown below.
Figure 8 - Web-service deployed experiment
Once ready, the user still needs to deploy the web service by first rerunning the experiment and then clicking on Deploy web service. At this stage, the user is presented with a configuration canvas where one can find the API key as well as a direct testing interface for Excel. With this API key, the model can be used directly in apps and/or called from an external system.
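Calling the deployed model from an external system could look roughly like the sketch below, which only builds the HTTP request and does not send it. The endpoint URL and the column names are placeholders, and the JSON layout and bearer-token header are assumptions; the configuration page of the deployed web service is the place to check the exact request format.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build a POST request carrying rows to score (payload layout is an assumption)."""
    body = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid/score",  # placeholder, not a real endpoint
    "YOUR_API_KEY",                   # placeholder for the key from the configuration page
    ["age", "tenure_months"],         # hypothetical input columns
    [["42", "18"]],
)

# Sending is left commented out because the URL above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```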
Integration with other Microsoft Products
Azure ML Studio is truly integrated, for both reading and writing, with other Microsoft products, foremost the Microsoft Azure stack. This allows organizations to enrich ML Studio with new data sources at any point in time. Examples could include streaming data coming from Azure Event Hubs, data stored in Azure Blob Storage or data stored in on-premises SQL Servers. Through this integration, ML Studio can serve as an integrated analytical module to which data is sent, where analytics are computed and from which results are returned.
Practical examples of end-to-end integrations are showcased in the Cortana Intelligence solution templates.
Azure ML Studio comes with an extensive free testing environment. The standard paid tier is offered at $9.99 per seat per month plus $1 per experimentation hour. Within the standard offering, there is unlimited storage space for datasets, no limit on the number of modules per experiment, as well as a production web API and a service level agreement.
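To get a feel for what the standard tier would cost, the prices quoted above can be put into a small calculation. The team size and number of experimentation hours are hypothetical, and the sketch ignores any other charges that may apply.

```python
def monthly_cost(seats, experiment_hours, seat_price=9.99, hourly_price=1.00):
    """Estimated monthly cost in dollars, using the prices quoted in this article."""
    return round(seats * seat_price + experiment_hours * hourly_price, 2)


# Hypothetical team: 3 seats running 40 experimentation hours in a month.
team_cost = monthly_cost(seats=3, experiment_hours=40)
print(team_cost)
```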
Microsoft's Azure Machine Learning Studio is a comprehensive but easy-to-use tool for both entry-level and expert data scientists. Its simplicity and connectivity, including the integration of the open-source R and Python communities, make it a robust and practical tool with the ability to empower a broad audience of users to get hands-on with advanced analytics.