In a previous blogpost I played around with the Speech APIs, showcasing the various features that give your apps a mouth to talk and ears to hear.
In this blogpost I will showcase Vision APIs that give your apps eyes to see!
Vision APIs give your apps the power to identify and analyze content within images, videos, and digital ink.
Vision APIs Breakdown
Let’s try each one through the Azure portal.
If you find any of the following demos interesting, just head back to the breakdown section of the blog and click the corresponding links to try it for yourself!
In the following demo I used a picture of Iron Man flying close to a building in New York, and as you can see in the returned tags, the API detected that there is a fictional character, and in the Objects, the API detected that there is a tower!
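To make the tags and objects from the demo a bit more concrete, here is a minimal Python sketch of reading them out of an image analysis response. The sample payload is hand-written for illustration and is not real API output, and the field names (`tags`, `objects`, `name`, `object`, `confidence`, `rectangle`) are an assumption about the response shape, so check the Computer Vision reference for the version you use. The helpers `confident_tags` and `detected_objects` are hypothetical names.

```python
import json

# Hand-written sample that imitates an image analysis response.
# Assumed shape, not real API output.
SAMPLE_RESPONSE = """
{
  "tags": [
    {"name": "fictional character", "confidence": 0.93},
    {"name": "building", "confidence": 0.88},
    {"name": "sky", "confidence": 0.41}
  ],
  "objects": [
    {"object": "tower", "confidence": 0.72,
     "rectangle": {"x": 310, "y": 20, "w": 140, "h": 460}}
  ]
}
"""


def confident_tags(response, threshold=0.5):
    """Return tag names whose confidence is at least `threshold`."""
    return [tag["name"] for tag in response.get("tags", [])
            if tag.get("confidence", 0.0) >= threshold]


def detected_objects(response):
    """Return (label, bounding box) pairs for every detected object."""
    return [(obj["object"], obj.get("rectangle", {}))
            for obj in response.get("objects", [])]


result = json.loads(SAMPLE_RESPONSE)
print(confident_tags(result))
print([label for label, _ in detected_objects(result)])
```

Filtering on confidence is a design choice: low-confidence tags are often noise, so a threshold keeps only what the service is reasonably sure about.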
What if we have digital or even handwritten text? I will write that I am Ali Heikal and I am Iron Man!
The result comes in JSON format so it can be easily used!
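Since the result is plain JSON, pulling the recognized text out takes only a few lines of Python. This is a rough sketch: the payload is hand-written, and the nesting (`recognitionResult`, then `lines`, then `text`) is an assumption about the shape that may differ between API versions. `extract_text` is a hypothetical helper name.

```python
import json

# Hand-written sample that imitates a text recognition response.
# Assumed shape, not real API output.
SAMPLE_OCR = """
{
  "status": "Succeeded",
  "recognitionResult": {
    "lines": [
      {"text": "I am Ali Heikal",
       "boundingBox": [10, 10, 200, 10, 200, 40, 10, 40]},
      {"text": "and I am Iron Man!",
       "boundingBox": [10, 50, 220, 50, 220, 80, 10, 80]}
    ]
  }
}
"""


def extract_text(payload):
    """Join the text of every recognized line into one string."""
    lines = payload.get("recognitionResult", {}).get("lines", [])
    return " ".join(line["text"] for line in lines)


ocr = json.loads(SAMPLE_OCR)
print(extract_text(ocr))
```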
It can even recognize brands, celebrities, and landmarks. Let’s try, for example, the famous Great Sphinx of Giza in Egypt!
The Computer Vision API can be used in various scenarios, including, for example, analyzing videos in real time by sampling individual frames!
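One way to approach the video scenario is to send only a sample of frames to the API instead of every single one. Here is a minimal sketch of that idea, assuming a hypothetical helper `frames_to_analyze` and a budget of analyses per second of video. It only picks frame indices and does not call any API.

```python
def frames_to_analyze(total_frames, video_fps, analyses_per_second):
    """Return the indices of evenly spaced frames to send for analysis.

    Roughly `analyses_per_second` frames are picked for every second
    of video; all other frames are skipped.
    """
    if video_fps <= 0 or analyses_per_second <= 0:
        raise ValueError("rates must be positive")
    # Never sample more often than once per frame.
    step = max(1, round(video_fps / analyses_per_second))
    return list(range(0, total_frames, step))


# Three seconds of 30 fps video, analyzed twice per second.
indices = frames_to_analyze(total_frames=90, video_fps=30,
                            analyses_per_second=2)
print(indices)
```

Skipping frames trades temporal detail for fewer API calls, which matters when each call costs time and money.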
The Ink Recognizer helps turn written content into shapes and text (in many languages). Let’s try writing my name, Ali, in a circle and see what happens!
I will be covering Custom Vision in a separate blogpost later on, but in short, it is a customizable version of the Computer Vision API with the ability to even export models!
In the following demo I used two pictures of myself to see if the API would verify that the person in both pictures is the same: me!
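In code, a verification result like this usually boils down to a yes/no flag plus a confidence score. The sketch below assumes a response with `isIdentical` and `confidence` fields. The sample values are made up, and `same_person` is a hypothetical helper, not part of any SDK.

```python
def same_person(verify_result, min_confidence=0.5):
    """Accept a match only if the service says so with enough confidence."""
    is_identical = bool(verify_result.get("isIdentical", False))
    confidence = verify_result.get("confidence", 0.0)
    return is_identical and confidence >= min_confidence


# Made-up values that imitate a verification response (assumed shape).
sample_verify = {"isIdentical": True, "confidence": 0.87}
print(same_person(sample_verify))
```

Raising `min_confidence` makes the check stricter, which may be worth it when a false match is more costly than a false rejection.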
What about age and emotions?
In the following demo I used a picture of Robert Downey, Jr. from Iron Man (2008), and the results were very descriptive, as you can see!
In the following demo I tested the accuracy of emotion recognition on one of my pictures, taken in a museum in Finland. Since it’s me in the picture, I know for a fact that the results are legit!
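To show how age and emotion results could be consumed, here is a small sketch that picks the estimated age and the strongest emotion for one detected face. The `faceAttributes` layout and the emotion names are an assumption about the response shape, the numbers are made up, and `summarize_face` is a hypothetical helper.

```python
def summarize_face(face):
    """Return (estimated age, dominant emotion) for one detected face."""
    attributes = face.get("faceAttributes", {})
    emotions = attributes.get("emotion", {})
    # The dominant emotion is simply the one with the highest score.
    dominant = max(emotions, key=emotions.get) if emotions else None
    return attributes.get("age"), dominant


# Made-up values that imitate one entry of a face detection response.
sample_face = {
    "faceAttributes": {
        "age": 27.0,
        "emotion": {
            "anger": 0.0, "contempt": 0.0, "disgust": 0.0, "fear": 0.0,
            "happiness": 0.97, "neutral": 0.03, "sadness": 0.0,
            "surprise": 0.0,
        },
    }
}

age, emotion = summarize_face(sample_face)
print(age, emotion)
```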
I will be covering Video Indexer in a separate blogpost later on, but in short, it combines a lot of Vision and Speech features for videos, including transcription, understanding who spoke when, and their emotions in connection to their words.
The Form Recognizer can recognize text, key/value pairs, and tables from documents, forms, and receipts without the need for manual intervention.
If you are eager to start building with the Vision APIs, you can get going using the available SDKs and examples that explain how to do so.
- Computer Vision Documentation
- Ink Recognizer Documentation
- Custom Vision Documentation
- Face Documentation
- Video Indexer Documentation
- Form Recognizer Documentation
Vision APIs are revolutionary and can, without a shadow of a doubt, disrupt business in all industries, which is in fact why everyone is trying to jump in. While doing so, though, we need to think about the ethics of using such a technology!