Microsoft Azure Event Hubs

What is Azure Event Hubs?

Azure Event Hubs is a distributed stream-processing platform and event ingestion service managed by Microsoft. It gives organizations a fully managed way to receive and process millions of events per second without having to worry about infrastructure or SLAs.

What sets Azure Event Hubs apart from Apache Kafka is its seamless integration with other Azure services.

How to use Azure Event Hubs?

Event Hubs is a highly scalable telemetry service offering one-way communication over the HTTPS or AMQP protocols. You can send events from anywhere: a website, an app, an IoT device, any other piece of software, etc. Azure Event Hubs differs from Azure IoT Hub in that communication is one-way rather than two-way.

When to use Azure Event Hubs?

Event Hubs is the component to use for real-time and/or streaming data use cases:

  • Real-time reporting
  • Capture streaming data into files for further processing and analysis – e.g. capturing data from micro-service applications or a mobile app
  • Make data available to stream-processing and analytics services – e.g. when scoring an AI algorithm
  • Telemetry streaming & processing
  • Application logging

What is the difference between Azure Event Hubs, IoT Hub, Event Grid, Service Bus and Azure Storage Queues?

IoT Hub offers the same functionality as Event Hubs but can also handle bi-directional communication. It allows data to be sent back to IoT devices, e.g. to push software updates to sensors, which is not possible with Event Hubs.

Event Grid facilitates event-driven, reactive programming: you react to changes as they happen. It is designed to handle discrete events rather than data streams and to take an action on a specific event, e.g. trigger an Azure Function. Event Grid does not guarantee the order of events, because each event is handled independently.

Event Hubs uses partitions, which are ordered sequences of events. It can therefore send and receive a huge number of events while preserving order within a partition: events that share a partition key value land in the same partition, so, for example, an Azure Function pulls and processes message 1 before message 2. Service Bus Queues and Topics also offer message ordering. However, partitioning makes Event Hubs more scalable than Service Bus Queues and Topics, where the consumers all read from the same queue. Moreover, in Event Hubs a message is not removed once it has been read, as it is in Service Bus Queues and Topics; it stays in the partition and can be read again by the consumer until the retention period expires.
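To make the partition behaviour described above concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Event Hubs SDK: the class name ToyEventHub, its methods and the CRC32 hash are all assumptions chosen for illustration, and the real service applies its own hashing internally. It only shows that events sharing a partition key stay in order in one partition and that reading does not remove them.

```python
import itertools
import zlib
from typing import List, Optional


class ToyEventHub:
    """In-memory stand-in for an event hub: a fixed set of append-only partitions."""

    def __init__(self, partition_count: int = 4) -> None:
        self.partitions: List[List[str]] = [[] for _ in range(partition_count)]
        self._round_robin = itertools.cycle(range(partition_count))

    def send(self, body: str, partition_key: Optional[str] = None) -> int:
        """Append an event and return the index of the partition it landed in."""
        if partition_key is None:
            # No key: spread events over the partitions in turn.
            index = next(self._round_robin)
        else:
            # Illustrative hash only; the real service applies its own hashing.
            index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(body)
        return index

    def read(self, partition: int, offset: int = 0) -> List[str]:
        """Return events from `offset` onward without removing them."""
        return list(self.partitions[partition][offset:])


hub = ToyEventHub()
p = hub.send("message 1", partition_key="device-42")
hub.send("message 2", partition_key="device-42")
print(hub.read(p))            # both events, in send order
print(hub.read(p, offset=1))  # re-reading from a later position
```

Unlike a queue, nothing is dequeued here: a second call to read returns the same events again, which mirrors how a consumer can re-read a partition within the retention period.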

Azure Storage Queues do not guarantee ordering, so they are the preferred choice when you don’t need to preserve message order. Queues in general make sure that messages are processed and do not care about the order of processing. Event Hubs will not process the next batch in a partition until the previous one is completed, so a single bad event can slow down the whole process. In cases where order is not important, Event Hubs might therefore reduce performance.

Important Features:

  • You can benefit from automatic scale-up (auto-inflate)
  • You get guaranteed consistency and ordering per partition
  • You can use Apache Kafka protocol to talk to Event Hubs
  • You can use Event Hubs Capture to automatically capture streaming data and save it to a storage account
  • You can retain data for up to 7 days
  • You can scale horizontally using partitions
  • You can add checkpointing to resume reading from a previously recorded position, which enables failover resiliency
  • You can process streams with .NET, Java, Python, Go and Node.js
  • If needed, dedicated clusters can be purchased
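The checkpointing feature listed above can be sketched as follows. This is a simplified, hypothetical model in Python: InMemoryCheckpointStore and consume are illustrative names, not part of any Azure SDK, and a real checkpoint store would persist positions to durable storage rather than a dictionary. The sketch only shows the idea that a consumer records how far it has read so that a restarted consumer continues from that position.

```python
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryCheckpointStore:
    """Hypothetical checkpoint store; a real one would persist to durable storage."""

    def __init__(self) -> None:
        self._next_offset: Dict[Tuple[str, int], int] = {}

    def load(self, consumer_group: str, partition: int) -> int:
        # With no checkpoint yet, start from the beginning of the partition.
        return self._next_offset.get((consumer_group, partition), 0)

    def save(self, consumer_group: str, partition: int, next_offset: int) -> None:
        self._next_offset[(consumer_group, partition)] = next_offset


def consume(
    events: List[str],
    store: InMemoryCheckpointStore,
    consumer_group: str,
    partition: int,
    handler: Callable[[str], None],
    max_events: Optional[int] = None,
) -> int:
    """Process events from the last checkpoint onward; return how many were handled."""
    start = store.load(consumer_group, partition)
    stop = len(events) if max_events is None else min(len(events), start + max_events)
    for offset in range(start, stop):
        handler(events[offset])
        # Checkpoint after each event so a restart resumes at the next one.
        store.save(consumer_group, partition, offset + 1)
    return max(0, stop - start)


events = ["e0", "e1", "e2", "e3", "e4"]
store = InMemoryCheckpointStore()
seen: List[str] = []
consume(events, store, "reporting", 0, seen.append, max_events=2)  # stops early, as if it crashed
consume(events, store, "reporting", 0, seen.append)                # restart resumes at e2
print(seen)
```

Checkpointing after every event keeps the sketch simple; in practice checkpoints are usually written less often, at the cost of reprocessing a few events after a failure.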

Alternatives (open-source or different platform): Apache Kafka, Amazon Kinesis, Google Pub/Sub

What are the components of Azure Event Hubs?

  • Event producers: Anything that sends data to an event hub counts as an event producer. Events can be published over HTTPS, AMQP 1.0 or the Apache Kafka protocol (1.0 and above). HTTPS is usually used in scenarios with a low volume of published events. AMQP, on the other hand, handles higher volumes with better performance and throughput.
  • Partitions: Events are streamed to a specific partition, and each consumer reads only a specific subset, or partition, of the stream. The number of partitions is specified at creation and must be between 2 and 32. Once set at creation, this number cannot be modified.
  • Partition keys: The partition key is the value used to map incoming event data to a specific partition. If no partition key is specified, Event Hubs distributes incoming events across the partitions on a round-robin basis.
  • Consumer: A consumer is any application that reads event data from an event hub. Consumers read event data over AMQP only: because Event Hubs pushes events to the consumer through the AMQP channel, the client does not need to poll for data availability.
  • Consumer groups: A consumer group is a view of an entire event hub. Partitions cannot be accessed directly to fetch data; data can only be accessed through a consumer group. Since each consuming application may read the stream in its own way, and some may want to read it multiple times, consumer groups give each consuming entity a separate view of the event stream.
  • Throughput units: Units that control the throughput capacity of Azure Event Hubs; they are one of the key parameters in the pricing calculations. Scaling the number of throughput units scales the traffic Event Hubs can handle.
  • Event receivers: Any entity (application) that reads event data from an event hub.
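As a rough illustration of how throughput units enter capacity planning, the Python sketch below estimates how many units an ingress workload would need. The per-unit limits used here (about 1 MB per second or 1,000 events per second of ingress, whichever is reached first) are not stated in this article; they are assumptions based on Microsoft's published figures and should be verified against the current documentation. The function name is hypothetical.

```python
import math

# Assumed per-throughput-unit ingress limits; verify against current Azure documentation.
INGRESS_BYTES_PER_SEC_PER_TU = 1_000_000  # roughly 1 MB per second
INGRESS_EVENTS_PER_SEC_PER_TU = 1_000     # events per second


def required_throughput_units(events_per_sec: float, avg_event_bytes: float) -> int:
    """Estimate the throughput units needed for ingress, whichever limit binds first."""
    by_count = events_per_sec / INGRESS_EVENTS_PER_SEC_PER_TU
    by_volume = events_per_sec * avg_event_bytes / INGRESS_BYTES_PER_SEC_PER_TU
    return max(1, math.ceil(max(by_count, by_volume)))


# Many small events: the event-count limit dominates.
print(required_throughput_units(events_per_sec=2_500, avg_event_bytes=200))
# Fewer, larger events: the data-volume limit dominates.
print(required_throughput_units(events_per_sec=500, avg_event_bytes=4_000))
```

The estimate covers ingress only and ignores egress limits and headroom for traffic spikes, so treat it as a starting point rather than a sizing rule.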

Our expertise

element61 has a solid understanding of the use cases where Azure Event Hubs can prove useful for your organization. We can help you decide whether it is the best solution for you and implement it to build a big data pipeline.

Conclusion

Azure Event Hubs is a fully managed Microsoft service for event ingestion and stream processing. It integrates well with services both inside and outside Azure, which allows you to build a complete big data pipeline. It can send and receive millions of events to and from concurrent services and devices, and it can scale up automatically when traffic increases.

More information is available at the Microsoft website.

Contact us for more information on Azure Event Hubs!