Diatoz

August 21, 2021

Machine learning in computer vision has created a phenomenal improvement in the whole field. Machine learning based methods are the one ranked top in the major benchmarks. In computer vision, object detection is one of the important task that are used in numerous application in both images and videos. Before getting to Object detection, lets understand what is image classification in order to get some foundation.

Image classification

Image classification is a fundamental task in computer vision and it is the base for Object detection. In image classification, the label of an image is classified using an image classification model. The goal is to classify the image by assigning it to a specific label. Typically, Image Classification refers to images which has only one object. But, object detection involves both classification and localization, and is used to analyze more real life use cases in which multiple objects may exist in an image.

Image classification is a supervised learning task. That means we have to prepare the dataset in which images of same label are collected and compiled together for all the labels.

Object Detection: Image classification with localization

Object detection is a computer vision technique that allows us to identify and locate objects in an image or video. The output of the model will be bounding boxes.

CNN for Object detection

CNN Convolutional Neural Network is a kind of Neural Networks that has filter which convolves with the image pixels and widely used in visual data interpretation task in computer vision.

There are two major kind of CNN architecture for object detection,

Single Step Object detection
Two Step Object detection

Single-Step Object Detection

These kind of model uses just one type of architecture (CNN) for feature extraction, localization and classification end to end and can be easily trainable.

Below are some of the models that uses this approach,

YOLO,
SSD,
RetinaNet

Two-Step Object Detection

These kind of model takes two different steps in order to get the detection outputs. Typically, these kind has convolutional layers for feature extraction and RPN (region proposal network) and fully connected layers for localization and classification.

Below are some of the models that uses this approach,

RCNN
Fast-RCNN
Faster-RCNN
Mask-RCNN

Transformers for Object detection

CNN only architecture have been the best architecture so far until transformers comes in and leaded the benchmarks of object detection. Transformers were introduced to the field of Natural Language processing and soon it finds its way to computer vision. And by the way, transformers is not replacing CNN, it is only enhancing the architecture by using CNN as feature extractor.

Example of Vision Transformer is DETR(Detection Transformer) which is an end to end object detection model that does object classification and localization i.e boundary box detection.

Applications for Object detection

Face detection
Counting and Analysis
Visual search engine
Visual tagging
Aerial view analysis
Video analytics
And much more.

Object detection using Neural Networks (Deep learning)

Diatoz

August 21, 2021

Image classification

Object Detection: Image classification with localization

CNN for Object detection

Single-Step Object Detection

Two-Step Object Detection

Transformers for Object detection

Applications for Object detection

Add Your Comment

Categories

Business

Consulting

Financial Planning

Applied AI

Recent Posts

HOW AI HELPS TO MAKE A SUCCESSFUL BUSINESS

COVID impacts and benefits with technology

MARKETING CHANGES IN A DIGITAL WORLD