Build Vision-Enabled Apps and Services Using Azure Cognitive Services: Vision APIs

Overview
In a previous blog post I experimented with the Speech APIs, showcasing the features that give your apps a mouth to talk and ears to hear.
In this blog post I will showcase the Vision APIs, which give your apps eyes to see!
Vision APIs give your apps the power to identify and analyze content within images, videos, and digital ink.
Vision APIs Breakdown
Demo
Let’s try each one through the Azure portal.
If any of the following demos interest you, just scroll back up to the breakdown section and click the corresponding links to try it for yourself!
Computer Vision
In the following demo I used a picture of Iron Man flying close to a building in New York. As you can see, the returned tags include a fictional character, and the detected objects include a tower!
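To reproduce a tags-and-objects result from code rather than the portal, you can call the Computer Vision REST API directly. The sketch below uses only the Python standard library and assumes the v3.2 `analyze` operation with `visualFeatures=Tags,Objects`; `ENDPOINT` and `KEY` are placeholders for your own resource, and `build_analyze_request` and `summarize` are hypothetical helper names. It builds the request without sending it, then reads the `tags[].name` and `objects[].object` fields from the parsed reply.

```python
import json
import urllib.parse
import urllib.request

# Placeholders (assumptions): substitute your own Computer Vision resource values.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-subscription-key>"


def build_analyze_request(image_url: str, endpoint: str = ENDPOINT,
                          key: str = KEY) -> urllib.request.Request:
    """Build (without sending) a POST that asks for tags and objects in an image."""
    query = urllib.parse.urlencode({"visualFeatures": "Tags,Objects"})
    url = f"{endpoint}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def summarize(analysis: dict) -> tuple[list[str], list[str]]:
    """Return (tag names, detected object names) from the parsed JSON reply."""
    tags = [tag["name"] for tag in analysis.get("tags", [])]
    objects = [obj["object"] for obj in analysis.get("objects", [])]
    return tags, objects
```

Sending it is then a matter of `with urllib.request.urlopen(req) as resp:` followed by `tags, objects = summarize(json.load(resp))`. Keeping request building separate from sending makes the pieces easy to inspect and test.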

What about typed or even handwritten text? I wrote that I am Ali Heikal and I am Iron Man!

The result comes back in JSON format, so it is easy to use in your own code!
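Text recognition in Computer Vision v3.2 is asynchronous: the `read/analyze` operation returns an `Operation-Location` URL that you poll until the status becomes `succeeded` or `failed`, after which the recognized lines sit under `analyzeResult.readResults[].lines[].text`. The sketch below assumes that response shape; `wait_for_read_result` and `extract_lines` are hypothetical helpers, and the HTTP call is injected through the `fetch` parameter so the polling logic can be exercised without a network.

```python
import json
import time
import urllib.request
from typing import Callable


def _fetch_json(req: urllib.request.Request) -> dict:
    """Default fetcher: perform the HTTP request and parse the JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_read_result(operation_url: str, key: str,
                         fetch: Callable[[urllib.request.Request], dict] = _fetch_json,
                         interval: float = 1.0, max_attempts: int = 30) -> dict:
    """Poll the Operation-Location URL until the Read operation finishes."""
    req = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key})
    for _ in range(max_attempts):
        result = fetch(req)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(interval)  # still "notStarted" or "running"
    raise TimeoutError(f"Read operation not finished after {max_attempts} polls")


def extract_lines(result: dict) -> list[str]:
    """Flatten the recognized text lines from every page, in reading order."""
    if result.get("status") != "succeeded":
        raise ValueError(f"Read operation did not succeed: {result.get('status')}")
    return [line["text"]
            for page in result["analyzeResult"]["readResults"]
            for line in page.get("lines", [])]
```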

It can even recognize brands, celebrities, and landmarks. Let’s try, for example, the famous Great Sphinx of Giza in Egypt!
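Brands, celebrities, and landmarks can be requested through the same `analyze` operation. A minimal sketch, assuming the v3.2 query parameters `visualFeatures=Categories,Brands` and `details=Celebrities,Landmarks`, where brands come back under `brands[]` and celebrity and landmark matches are nested under `categories[].detail`; `analyze_query` and `recognized_entities` are hypothetical helpers.

```python
import urllib.parse


def analyze_query() -> str:
    """Query string asking for brands plus the celebrity and landmark domain models."""
    return urllib.parse.urlencode({
        "visualFeatures": "Categories,Brands",
        "details": "Celebrities,Landmarks",
    })


def recognized_entities(analysis: dict) -> dict[str, list[str]]:
    """Collect brand, celebrity, and landmark names from a parsed analyze reply."""
    found: dict[str, list[str]] = {
        "brands": [brand["name"] for brand in analysis.get("brands", [])],
        "celebrities": [],
        "landmarks": [],
    }
    # Domain-model results hang off the category they were matched under,
    # so the same name can appear more than once; keep only the first.
    for category in analysis.get("categories", []):
        detail = category.get("detail") or {}
        for kind in ("celebrities", "landmarks"):
            for item in detail.get(kind, []):
                if item["name"] not in found[kind]:
                    found[kind].append(item["name"])
    return found
```

The query string is appended to the analyze URL, e.g. `f"{endpoint}/vision/v3.2/analyze?{analyze_query()}"`.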

The Computer Vision API can be used in many scenarios, for example analyzing video in near real time by sending individual frames sampled from the stream!
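Analyzing video through individual frames mostly comes down to not sending every frame: pick roughly one frame every few seconds and submit only those for analysis. The generator below is a hypothetical sketch of that sampling step only; decoding frames from the video and posting each sampled frame to the API are left out, and the interval is an assumption you would tune to your latency needs and API rate limits.

```python
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Iterable[Frame], fps: float,
                  seconds_between: float) -> Iterator[tuple[int, Frame]]:
    """Yield (frame_index, frame) for about one frame every `seconds_between` seconds."""
    if fps <= 0 or seconds_between <= 0:
        raise ValueError("fps and seconds_between must be positive")
    step = max(1, round(fps * seconds_between))  # frame stride between samples
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame
```

Because it consumes any iterable lazily, the same function works on a live frame source as well as on a pre-recorded list.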
Ink Recognizer
The Ink Recognizer turns handwritten digital ink into recognized shapes and text (in many languages). Let’s try writing my name, Ali, inside a circle and see what happens!

Custom Vision
I will cover this in a separate blog post, but in short, it is a customizable version of the Computer Vision API that lets you train models on your own images and even export them!
Face
In the following demo I used two pictures of myself to see whether the API would verify that the person in both pictures is the same: me!
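Verification compares two faces that were first located by a detect call. A minimal sketch, assuming the Face API v1.0 `verify` operation, which takes two face IDs (`faceId1`, `faceId2`) returned by `detect` and replies with `isIdentical` and a `confidence` score; `ENDPOINT` and `KEY` are placeholders, and `build_verify_request` and `describe_verification` are hypothetical helper names.

```python
import json
import urllib.request

# Placeholders (assumptions): substitute your own Face resource values.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-subscription-key>"


def build_verify_request(face_id1: str, face_id2: str, endpoint: str = ENDPOINT,
                         key: str = KEY) -> urllib.request.Request:
    """Build (without sending) a POST asking whether two detected faces match."""
    body = json.dumps({"faceId1": face_id1, "faceId2": face_id2}).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/face/v1.0/verify",
        data=body,
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def describe_verification(result: dict) -> str:
    """Turn the verify reply into a short human-readable verdict."""
    verdict = "same person" if result["isIdentical"] else "different people"
    return f"{verdict} (confidence {result['confidence']:.2f})"
```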

What about age and emotions?
In the following demo I used a picture of Robert Downey Jr. from Iron Man (2008), and as you can see, the results were very descriptive!

In the following demo I tested the accuracy of emotion recognition on a picture of myself taken in a museum in Finland. Since it’s me in the picture, I can confirm that the results are accurate!
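Age and emotion were returned as face attributes from the `detect` operation when `returnFaceAttributes=age,emotion` was requested, with `emotion` holding one score per emotion. A minimal sketch, assuming that response shape, that picks the highest-scoring emotion for each face; `dominant_emotion` and `describe_faces` are hypothetical helpers. Microsoft has since retired the emotion and age attributes from the Face API, so this reflects the API as demonstrated here rather than what new resources can request.

```python
def dominant_emotion(emotion_scores: dict[str, float]) -> tuple[str, float]:
    """Return the emotion with the highest score, together with that score."""
    name = max(emotion_scores, key=emotion_scores.__getitem__)
    return name, emotion_scores[name]


def describe_faces(faces: list[dict]) -> list[str]:
    """Summarize each detected face as estimated age plus dominant emotion."""
    summaries = []
    for face in faces:
        attributes = face["faceAttributes"]
        name, score = dominant_emotion(attributes["emotion"])
        summaries.append(f"age ~{attributes['age']:.0f}, mostly {name} ({score:.0%})")
    return summaries
```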

Video Indexer
I will cover this in a separate blog post, but in short, it combines many Vision and Speech features for video, including transcription, identifying who spoke when, and detecting the speakers' emotions in relation to their words.
Form Recognizer
The Form Recognizer can extract text, key/value pairs, and tables from documents, forms, and receipts without manual intervention.
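Once an analysis finishes, key/value pairs and tables arrive as structured JSON. The sketch below assumes the Form Recognizer v2.x response shape, where pairs appear under `analyzeResult.pageResults[].keyValuePairs[]` (each with `key.text` and `value.text`) and each table lists `rows`, `columns`, and `cells[]` with `rowIndex`, `columnIndex`, and `text`; other API versions name these fields differently, and `key_value_dict` and `table_to_grid` are hypothetical helpers.

```python
def key_value_dict(result: dict) -> dict[str, str]:
    """Map each recognized key (trailing colon removed) to its value text."""
    pairs: dict[str, str] = {}
    for page in result["analyzeResult"].get("pageResults", []):
        for pair in page.get("keyValuePairs", []):
            key = pair["key"]["text"].strip().rstrip(":").strip()
            pairs[key] = pair["value"]["text"]
    return pairs


def table_to_grid(table: dict) -> list[list[str]]:
    """Rebuild a rows x columns grid of cell text from the flat cell list.

    Merged cells only fill their top-left position; other slots stay empty.
    """
    grid = [["" for _ in range(table["columns"])] for _ in range(table["rows"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
    return grid
```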

Resources
If you are eager to start building with the Vision APIs, the documentation below includes SDKs and samples that explain how to get started.
- Computer Vision Documentation
- Ink Recognizer Documentation
- Custom Vision Documentation
- Face Documentation
- Video Indexer Documentation
- Form Recognizer Documentation
Summary
Vision APIs are revolutionary and can, without a shadow of a doubt, disrupt business in every industry, which is exactly why everyone is trying to jump in. But while doing so, we need to think about the ethics of using such a technology!