Azure Cognitive Services: Vision

What is Azure Vision Cognitive Service?

Azure Vision Cognitive Service is a cloud-based service that provides computer vision capabilities for analyzing images and videos. It unifies four previously separate Cognitive Services: Computer Vision, Face, Spatial Analysis, and Video Indexer.

With this service, you can build intelligent applications using the web-based Vision Studio, REST APIs, and client libraries. All of these rely on prebuilt AI models that can be customized to fit your scenario.

The Vision Cognitive Service comprises four core services, each with specific features. They can be thought of as logical building blocks:

  • Analyze Images: This building block allows you to extract information and insights from images, such as objects, faces, text, colors, tags, and captions. You can use preconfigured models or create your own custom models to handle domain-specific scenarios.
  • Analyze Videos: This building block allows you to index and analyze videos for content, sentiment, speakers, topics, and more. You can use this feature to create rich media experiences for your users and gain insights from your video content.
  • Analyze Spaces: This building block allows you to understand people's presence and movements within physical areas in real time. You can use this feature to optimize space utilization, enhance safety and security, and create engaging customer experiences.
  • Recognize Faces: This building block allows you to detect, identify, and verify human faces in images and videos. You can use this feature to create intelligent applications that recognize and verify human identity.


Within those building blocks, the following features are, in our view, the most relevant to understand:

How can any of these features help you in your business?

The democratization of AI has been a trend over the past few years and is now stronger than ever. Complex AI and ML models are readily available, so any business can integrate them into its applications and processes. Because the technology is now widely accessible, it is your own creativity, not the technology, that limits the value you will extract from AI in the next few years.

Computer vision is one of the most exciting fields of AI: it enables machines to see and interpret the world much as humans do. It has applications across many industries and domains, such as healthcare, retail, education, entertainment, and security.

Here are some examples of how the Azure Vision Cognitive Service features could drive value for your organization:

  • A retailer could use image analysis to tag and organize their product catalog based on visual features such as color, style, or brand. They could also use face recognition to provide personalized recommendations or offers based on customer preferences or loyalty (provided privacy rules and customer consent allow it).
  • A healthcare provider could use OCR to extract text from medical records or prescriptions. They could also use face verification to authenticate patients or staff using biometric data.
  • A media company could use Video Indexer to index and analyze their video content for moderation, transcription, translation, captioning, sentiment analysis, topic extraction, and more. They could also use spatial analysis to measure audience engagement and behavior during live events or broadcasts.
  • A law enforcement agency could use face identification to match suspects or missing persons against a database of known faces. They could also use custom object detection to detect weapons or other objects of interest in images or videos.
  • A tourism company could use entity linking to enrich their travel guides with relevant information from Wikipedia. They could also use image analysis to generate captions for their photos or videos.
  • A gaming company could use face detection to create avatars or characters based on users' faces. They could also use spatial analysis to create immersive and interactive environments that respond to users' movements.

How can you get started? And what do you need?

To use the Azure Vision Cognitive Service features, you will need an Azure account with an active subscription, and a Vision resource created within it. Once these are set up, you can access the features through the Vision Studio web portal, the REST APIs, or the client libraries. Depending on the specifics of your use case, you may also need other Azure services, such as Azure Storage for storing data or Azure Functions for processing data.
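To give an idea of what access through the REST API could look like, here is a minimal Python sketch that only uses the standard library. Treat the details as assumptions to verify against the current Azure documentation: the `imageanalysis:analyze` path, the `2023-10-01` API version, and the feature names `caption` and `tags` reflect our understanding of the Image Analysis API, and the function names are purely illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: API version of the Image Analysis REST API; check the docs for the current one.
API_VERSION = "2023-10-01"


def build_analyze_request(endpoint, key, image_url, features=("caption", "tags")):
    """Build (but do not send) a POST request asking the service to analyze an image URL."""
    query = urlencode(
        {"api-version": API_VERSION, "features": ",".join(features)},
        safe=",",  # keep the comma-separated feature list readable in the URL
    )
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The key of your Vision resource, found in the Azure portal.
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


def analyze_image(endpoint, key, image_url, features=("caption", "tags")):
    """Send the request and return the parsed JSON response (needs network access)."""
    request = build_analyze_request(endpoint, key, image_url, features)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `analyze_image` with the endpoint and key of your own Vision resource and a publicly reachable image URL should return a JSON document with the requested features. Note that every call counts towards your billed transactions.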

Vision Studio enables you to use the service features without writing any code. You can create projects, upload data, train models, test models, deploy models, and monitor performance all from one place. You can also export your models as containers or endpoints for integration with your applications.

It’s important to carefully plan and design your use case to determine which Azure resources you will need to achieve your desired outcome.

What does it cost?

As it is a cloud service, you pay as you go, based on which services you use and how much you use them. For the vision models, usage is counted in transactions, where one transaction corresponds to one feature that you have selected. Features include face, entity, tag, spatial analysis, and others. Furthermore, if you scan a 100-page document, each page counts as one transaction. If a video is longer than one minute (or 200 frames), each additional minute (or 200 frames) counts as one transaction.
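The counting rules above can be sketched in a few lines of Python. This is our own simplified reading of the rules, not an official calculator: the function names are illustrative, and rounding a partial minute of video up to a full transaction is an assumption on our side.

```python
import math


def image_transactions(num_images, num_features):
    """One transaction per selected feature, per image."""
    return num_images * num_features


def document_transactions(num_pages):
    """Each scanned page counts as one transaction."""
    return num_pages


def video_transactions(duration_seconds):
    """The first minute is one transaction, each additional minute adds one.

    Assumption: a started minute is billed as a full minute.
    """
    if duration_seconds <= 0:
        return 0
    return max(1, math.ceil(duration_seconds / 60))
```

For example, analyzing 100 images with three features each would come to 300 transactions under this reading, and a 100-page scan to 100 transactions.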

Here is a non-exhaustive list of the Azure Vision Cognitive Service prices that seem most relevant to us (note that almost all services include 5 000 free transactions per month):

  • For the standard image analysis features such as OCR, tagging, and description generation, you pay ~ € 1 per 1 000 transactions for the first 1 million transactions per month.
  • For custom image classification and object detection, you pay ~ € 18 per hour of training time and ~ € 1,8 per 1 000 transactions at prediction time.
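To turn these prices into a rough monthly budget, a small Python sketch could look like the one below. It uses the approximate prices quoted above, which may change over time. How the free transactions are applied is left as a parameter because it depends on your pricing tier, and the function names are our own.

```python
STANDARD_PRICE_PER_1000 = 1.0   # ~ EUR per 1 000 standard transactions (approximate)
CUSTOM_TRAINING_PER_HOUR = 18.0  # ~ EUR per hour of custom model training (approximate)
CUSTOM_PREDICTION_PER_1000 = 1.8  # ~ EUR per 1 000 custom predictions (approximate)
FIRST_PRICE_BAND = 1_000_000     # the quoted standard price covers the first 1 million


def standard_cost(transactions, free_transactions=0):
    """Estimated monthly cost in EUR for standard image analysis features."""
    billable = max(0, transactions - free_transactions)
    if billable > FIRST_PRICE_BAND:
        # Prices beyond the first million are not covered in this post.
        raise ValueError("only the first 1 million transactions are covered here")
    return billable / 1000 * STANDARD_PRICE_PER_1000


def custom_cost(training_hours, predictions):
    """Estimated cost in EUR for training a custom model and calling it."""
    training = training_hours * CUSTOM_TRAINING_PER_HOUR
    prediction = predictions / 1000 * CUSTOM_PREDICTION_PER_1000
    return training + prediction
```

With these numbers, 250 000 standard transactions with 5 000 of them free would cost roughly € 245 per month, and two hours of custom training plus 10 000 predictions roughly € 54.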

How to get started?

Do you think you have an interesting use case that could make use of the Azure Vision Cognitive Service or any other computer vision service? Do not hesitate to reach out and let’s discuss how it would practically work in your environment.

Stay tuned for the next post on the Azure Decision Cognitive Service!