MQTT: Enable OT analytics in the Modern Data Platform in Azure

TL;DR

Getting OT data into your Modern Data Platform allows you to get the most value out of this data by enabling advanced analytics and many use cases for optimizing your processes.

MQTT is the preferred option to serve as a communication backbone in your OT architecture and to get this data into the cloud.

Several Azure resources handle ingestion, forwarding and processing of messages like the ones passed over MQTT, each with their specific functionalities and use cases. Depending on the setup, MQTT data can be pushed into the cloud using these technologies.


Why do you want Operational Technology data analytics?

Production companies today have extensive Operational Technology (OT) architectures to streamline and control their production processes: equipment, PLCs and IoT devices that generate telemetry data and provide connectivity, SCADA systems to monitor and control real-time industrial processes, MES systems for quality management, scheduling, tracking and more, and historians for archiving, reporting and analyzing OT data.

The data that flows through these systems is essential to keep the production process running and contains a lot of value. Unfortunately, in many OT setups, data is only used operationally and is retained for a limited period of time. A lot of value that could be extracted using advanced analytics is lost.

That's why we propose to get the most out of your data by integrating OT-centric data into the IT department. This enables a myriad of use cases, like allowing data analysts to include production floor data in executive-level reports about long-term trends in your production facility and allowing data scientists to use OT data to create machine learning models that will optimize production or maintenance. The integration between OT and IT happens by putting the data into a Modern Data Platform in Azure.


Why MQTT is a great choice in the OT Architecture

The ideal communications backbone to connect all OT systems and devices is MQTT. This publish/subscribe-based protocol is well suited to connecting remote devices in Industrial-IoT (IIoT) settings because of its scalability, robustness and extremely light footprint.

The smart devices and servers that constitute the OT infrastructure serve as MQTT clients, producing data in the form of messages and publishing these messages to their respective topics, or subscribing to topics to receive messages relevant for the application at hand. All messages pass through the broker, which orchestrates the flow of messages and makes sure a published message arrives at all the subscribers.
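
To make the publish/subscribe flow concrete, here is a minimal in-memory sketch of what a broker does: it keeps a list of subscriptions and delivers each published message to every subscriber whose topic filter matches. It is not a real MQTT broker (no network, no QoS, no retained messages, no special handling of $-prefixed topics), and the class name, function names and topic names are made up for illustration. The matching follows the standard MQTT wildcards: "+" for one topic level and "#" for all remaining levels.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str, bytes], None]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name.

    '+' matches exactly one level; '#' matches all remaining levels
    (including none) and is only valid as the last level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class InMemoryBroker:
    """Toy stand-in for a broker: routes messages to matching subscribers."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Handler]] = []

    def subscribe(self, topic_filter: str, handler: Handler) -> None:
        self._subscriptions.append((topic_filter, handler))

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver the message and return the number of subscribers reached."""
        delivered = 0
        for topic_filter, handler in self._subscriptions:
            if topic_matches(topic_filter, topic):
                handler(topic, payload)
                delivered += 1
        return delivered


# Hypothetical plant topics: a dashboard only wants temperature readings.
broker = InMemoryBroker()
received: List[Tuple[str, bytes]] = []
broker.subscribe("plant1/+/temperature", lambda t, p: received.append((t, p)))

broker.publish("plant1/line3/temperature", b"21.5")  # delivered
broker.publish("plant1/line3/pressure", b"1.2")      # no matching subscriber
```

Note that the publisher never addresses the dashboard directly. It only knows the topic, which is what makes it easy to add a cloud-side subscriber later without touching the devices.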

An important benefit of MQTT is that it makes connecting to the cloud easy: cloud applications only need to communicate with the broker. This is not the case with, for example, OPC UA, which requires multiple connections and implementations.

The remaining part of this article describes the three main Azure resources that handle messages such as the ones passed around over MQTT, and specifies how and when to use them to ingest your OT data into the cloud, ready to get the most out of it with advanced analytics!

[Figure: MQTT publish/subscribe]

Azure services that handle messages

MQTT is built around messages, so let's have a look at the Azure services that handle messages and compare their functionality. Each of them implements different functionalities, so it is important to understand what each service does and when to use it.

Azure Event Hubs

Azure Event Hubs is a data streaming service designed to stream millions of messages per second with low latency. It is compatible with Apache Kafka: Event Hubs (or queues) are equivalent to Kafka topics. Messages get published to these queues and are retained for a limited period of time in which they are either forwarded to one of the configured endpoints or consumed by connecting applications.

An Event Hub can be configured to capture the data it receives into a storage account, by creating a blob on a dynamic file path at a configurable time interval, containing all messages received in the corresponding time interval. The messages can also be forwarded to endpoints such as Azure Stream Analytics, Azure Data Explorer, Cosmos DB and others. Therefore, in the context of a Modern Data Platform in Azure, Event Hubs should be seen as a data ingestion tool.
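
To illustrate the "dynamic file path" used by capture, the sketch below builds the blob path for one capture window. It assumes the default naming convention documented for Event Hubs Capture ({Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}, written as Avro files). The function name and example values are hypothetical, and a custom naming format configured on your Event Hub would produce different paths.

```python
from datetime import datetime, timezone

# Assumed default capture naming convention; verify against your own setup.
DEFAULT_CAPTURE_FORMAT = (
    "{Namespace}/{EventHub}/{PartitionId}/"
    "{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
)


def capture_blob_path(
    namespace: str,
    event_hub: str,
    partition_id: int,
    window_start: datetime,
    name_format: str = DEFAULT_CAPTURE_FORMAT,
) -> str:
    """Build the blob path for a capture window.

    `window_start` must be timezone-aware; it is converted to UTC first.
    """
    utc = window_start.astimezone(timezone.utc)
    values = {
        "Namespace": namespace,
        "EventHub": event_hub,
        "PartitionId": str(partition_id),
        "Year": f"{utc.year:04d}",
        "Month": f"{utc.month:02d}",
        "Day": f"{utc.day:02d}",
        "Hour": f"{utc.hour:02d}",
        "Minute": f"{utc.minute:02d}",
        "Second": f"{utc.second:02d}",
    }
    return name_format.format(**values) + ".avro"


example_path = capture_blob_path(
    "mynamespace",
    "myeventhub",
    0,
    datetime(2017, 12, 8, 3, 3, 17, tzinfo=timezone.utc),
)
```

Because the date and time are part of the path, downstream jobs in the data platform can select a time range by path prefix instead of listing every blob.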

Azure IoT Hub

Azure IoT Hub is a central message hub for communication between IoT applications and their attached devices. It is essentially an extension of Event Hubs, implementing an Event Hub under the hood, but providing many additional features that focus on the direct cloud-to-device connection.

These features include, among others:

  • Cloud-to-device messages
  • Device twin and device management
  • File uploads and software updates of edge devices

IoT Hub can, like Event Hubs, be used for telemetry ingestion, but especially for data generated by the individual connected devices. It is preferable to Event Hubs when the devices themselves are also managed using IoT Hub and benefit from two-way communication. For simple, one-way data ingestion at scale, Event Hubs is the better choice, certainly when keeping in mind that communication from the cloud to external applications or devices is not impossible with Event Hubs - it just follows a different approach. With IoT Hub, messages can be pushed directly to connected devices, while with Event Hubs, messages are published from within the cloud and then consumed by applications outside of the cloud, thereby implementing passive two-way communication.

Azure Event Grid

Azure Event Grid is a fully managed Pub/Sub message distribution service that offers flexible message consumption patterns using both MQTT and HTTP.

Being a message distribution service, it is not a data ingestion service, making the difference in use cases with Event Hubs obvious: when all messages need to be ingested into the data lake or into Azure Data Explorer, use Event Hubs. When different applications like Azure Functions, web apps or webhooks need to receive specific messages and not others, use Event Grid.

In the context of MQTT, Azure Event Grid implements the broker. But when we look at integrating OT data into an Azure Modern Data Platform for reporting and advanced analytics, it is safe to assume that an MQTT broker is already present in the OT architecture and that an ingestion service (like Event Hubs or IoT Hub) is the component we need.

Integration with MQTT

If you have some smart devices that run custom code, need input or updates and generate data, connect them to Azure directly using IoT Hub, and use MQTT as a reliable, lightweight communication protocol. This way, telemetry ingestion and device management are neatly combined in the IoT Hub.
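
As a rough sketch of the direct device connection, the helper below collects the settings a generic MQTT client would need to talk to IoT Hub. The host name, user name and topic formats follow the conventions in the IoT Hub MQTT documentation as I understand them. The function name, hub name, device id and the api-version value are assumptions for illustration, and the SAS token or certificate needed for authentication is deliberately left out.

```python
def iot_hub_mqtt_settings(
    hub_name: str,
    device_id: str,
    api_version: str = "2021-04-12",  # example value, check the current docs
) -> dict:
    """Collect MQTT connection settings for one device (credentials excluded)."""
    host = f"{hub_name}.azure-devices.net"
    return {
        "host": host,
        "port": 8883,  # MQTT over TLS
        "client_id": device_id,
        "username": f"{host}/{device_id}/?api-version={api_version}",
        # Device-to-cloud telemetry is published on this topic.
        "telemetry_topic": f"devices/{device_id}/messages/events/",
        # Cloud-to-device messages arrive on topics matching this filter.
        "c2d_topic_filter": f"devices/{device_id}/messages/devicebound/#",
    }


# Hypothetical hub and device names.
settings = iot_hub_mqtt_settings("contoso-ot-hub", "press-07")
```

The same connection carries both directions: the device publishes telemetry on one topic and subscribes to another for messages coming from the cloud.
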
On the other hand, if you want to ingest data from a mature OT setup of a production facility, with servers or historians that already hold the data, use Event Hubs as the gateway to the Modern Data Platform, and an MQTT broker to orchestrate the data flow. Unfortunately, Azure Event Hubs does not support MQTT, so a detour is required, depending on your MQTT broker technology.

Kafka translation

Some MQTT broker software vendors, like HiveMQ, support translating MQTT topics to Kafka topics (potentially by using an extension), thereby allowing direct ingestion from the broker to Azure Event Hubs. Messages published on Event Hub queues can also be translated back into the MQTT circuit, making this translation transparent for MQTT clients.

This is the preferred approach when possible because it allows for clean and straightforward ingestion into the main streaming-data ingestion tool of Azure.
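
Whatever Kafka extension your broker offers, it will need Kafka client settings that point at the Event Hubs namespace. The sketch below shows the commonly documented values: the Kafka endpoint on port 9093, SASL_SSL with the PLAIN mechanism, and the literal user name "$ConnectionString" with the connection string as the password. The property names are in librdkafka style, and the function name and namespace are hypothetical; your broker's extension may expose the same values under different keys.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Kafka client settings for an Event Hubs namespace (librdkafka-style keys)."""
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # Event Hubs expects this literal user name, not an account name.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Placeholder values; keep the real connection string in a secret store.
kafka_config = event_hubs_kafka_config(
    "contoso-ot",
    "Endpoint=sb://contoso-ot.servicebus.windows.net/;SharedAccessKeyName=...",
)
```

With these settings in place, the Kafka topic name that the broker writes to corresponds to the name of the Event Hub inside the namespace.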

Azure Event Grid

If your MQTT broker does not support Kafka, you can put an Azure Event Grid in between the MQTT broker and the Event Hub. Route messages to the Event Grid over MQTT and subscribe your Event Hub queue to topics on the Event Grid. Subscribing Event Hubs to Event Grid is a very straightforward process, well supported by the Event Grid.
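
When Event Grid routes MQTT messages onwards, they arrive at the Event Hub wrapped in a CloudEvents envelope rather than as raw MQTT payloads. The sketch below mimics that wrapping so you can see what a consumer has to unpack. The attribute names come from the CloudEvents 1.0 specification. The "type" value and the use of the namespace as "source" and the MQTT topic as "subject" reflect my reading of the Event Grid routing documentation and should be verified against it; the function name and example values are hypothetical.

```python
import base64
import uuid
from datetime import datetime, timezone
from typing import Optional


def wrap_mqtt_message(
    namespace: str,
    mqtt_topic: str,
    payload: bytes,
    event_id: Optional[str] = None,
    received_at: Optional[datetime] = None,
) -> dict:
    """Wrap an MQTT publish in a CloudEvents-style envelope (illustrative only)."""
    received_at = received_at or datetime.now(timezone.utc)
    return {
        "specversion": "1.0",
        "id": event_id or str(uuid.uuid4()),
        "time": received_at.isoformat(),
        "type": "MQTT.EventPublished",  # assumed value, check the docs
        "source": namespace,
        "subject": mqtt_topic,
        # Binary payloads are carried base64-encoded.
        "data_base64": base64.b64encode(payload).decode("ascii"),
    }


event = wrap_mqtt_message(
    "ot-namespace",
    "plant1/line3/temperature",
    b'{"value": 21.5}',
    event_id="example-1",
    received_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)

# A consumer reading from the Event Hub reverses the encoding.
original_payload = base64.b64decode(event["data_base64"])
```

The practical consequence is that downstream processing should read the MQTT topic from the envelope and decode the payload, instead of expecting the original message bytes directly.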

[Figure: MQTT to Azure]

Conclusion

MQTT is a great choice for communication in the OT architecture, allowing dynamic, scalable and lightweight communication between different devices and applications running on the edge and on servers. As an additional benefit, MQTT facilitates communication to the cloud, which is essential when bringing OT data into your Modern Data Platform.

You definitely want your OT data in your data platform, because there it can be used for advanced analytics, allowing you to get the most value out of this data and enabling many use cases to optimize your processes. Depending on the technologies in your OT setup, different Azure resources can be used to implement the ingestion of your real-time OT data over MQTT.