Object detection using Neural Networks (Deep learning)



August 21, 2021

Machine learning has driven a dramatic improvement across the whole field of computer vision, and machine learning based methods now rank at the top of the major benchmarks. Object detection is one of the most important tasks in computer vision and is used in numerous applications on both images and videos. Before getting to object detection, let's first understand what image classification is, since it provides the foundation.

Image classification

Image classification is a fundamental task in computer vision and the basis for object detection. In image classification, a model assigns a single label to an entire image. Typically, image classification is applied to images that contain only one object. Object detection, by contrast, involves both classification and localization, and is used for more realistic use cases in which multiple objects may exist in an image.

Image classification is a supervised learning task. That means we have to prepare a labeled dataset: for every label, the images that belong to it are collected and grouped together.
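As a rough illustration of assigning one label to an image, the sketch below turns a model's raw per-label scores into probabilities and picks the most likely label. The label names and the scores are made up for the example; a real model would compute the scores from the image pixels.

```python
import math

def softmax(scores):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical labels and scores, for illustration only.
LABELS = ["cat", "dog", "car"]
label, confidence = classify([2.0, 0.5, -1.0], LABELS)
print(label, round(confidence, 3))
```

Note that the whole image receives exactly one label here; nothing in the output says where in the image the object is.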

Object Detection: Image classification with localization

Object detection is a computer vision technique that allows us to identify and locate objects in an image or video. The output of the model is a set of bounding boxes, each typically paired with a class label and a confidence score.
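A minimal sketch of what such an output might look like in code. The `Detection` structure, the `(x_min, y_min, x_max, y_max)` pixel box format and the sample values are assumptions for illustration; actual models and libraries each use their own formats.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str                             # predicted class (classification)
    score: float                           # model confidence between 0 and 1
    box: Tuple[float, float, float, float] # (x_min, y_min, x_max, y_max), assumed pixel format

def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Drop detections whose confidence is below the threshold."""
    return [d for d in detections if d.score >= threshold]

# Hypothetical raw output for one image containing several objects.
raw = [
    Detection("person", 0.92, (34, 50, 120, 300)),
    Detection("dog", 0.81, (150, 200, 260, 310)),
    Detection("dog", 0.12, (10, 10, 40, 30)),  # low confidence, likely a false positive
]
confident = keep_confident(raw)
for det in confident:
    print(det.label, det.score, det.box)
```

Unlike a classifier, the result is a list: one entry per detected object, each saying both what it is and where it is.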

CNN for Object detection

A CNN (Convolutional Neural Network) is a kind of neural network whose filters convolve with the image pixels. CNNs are widely used for interpreting visual data in computer vision.
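To make "a filter that convolves with the image pixels" concrete, here is a minimal pure-Python sketch that slides one filter over a small grayscale image (no padding, stride 1). Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation. The image and filter values are made up; in a CNN the filter values are learned during training.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(k_h):
                for dj in range(k_w):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output

# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Toy filter: difference between horizontally adjacent pixels.
kernel = [[-1, 1]]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The resulting feature map is non-zero only where the filter straddles the boundary between the dark and bright columns, which is how a filter can respond to a vertical edge.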

There are two major kinds of CNN architecture for object detection:

  • Single Step Object detection
  • Two Step Object detection

Single-Step Object Detection

These kinds of models use a single network (a CNN) for feature extraction, localization and classification end to end, which makes them relatively easy to train.

Below are some of the models that use this approach:

  • YOLO
  • SSD
  • RetinaNet
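The single-step idea can be sketched as decoding one grid of predictions that already contains both box coordinates and class scores. The layout below is a simplified, hypothetical one loosely inspired by grid-based detectors; it is not the actual output format of YOLO, SSD or RetinaNet, and the numbers are made up.

```python
def decode_grid(predictions, image_size, class_names, threshold=0.5):
    """Decode a grid of per-cell predictions into detections in one pass.

    Assumed layout: predictions[row][col] = (objectness, cx, cy, w, h, class_scores)
    where cx, cy are offsets in [0, 1] inside the cell and w, h are fractions
    of the whole image.
    """
    grid_rows, grid_cols = len(predictions), len(predictions[0])
    img_w, img_h = image_size
    detections = []
    for row in range(grid_rows):
        for col in range(grid_cols):
            objectness, cx, cy, w, h, class_scores = predictions[row][col]
            best = max(range(len(class_scores)), key=class_scores.__getitem__)
            score = objectness * class_scores[best]  # classification and localization together
            if score < threshold:
                continue
            center_x = (col + cx) / grid_cols * img_w
            center_y = (row + cy) / grid_rows * img_h
            box_w, box_h = w * img_w, h * img_h
            box = (center_x - box_w / 2, center_y - box_h / 2,
                   center_x + box_w / 2, center_y + box_h / 2)
            detections.append((class_names[best], score, box))
    return detections

EMPTY = (0.0, 0.0, 0.0, 0.0, 0.0, [0.5, 0.5])  # cell with no object
grid = [
    [EMPTY, (0.9, 0.5, 0.5, 0.5, 0.25, [0.1, 0.9])],
    [EMPTY, EMPTY],
]
grid_detections = decode_grid(grid, image_size=(100, 100), class_names=["cat", "dog"])
print(grid_detections)
```

There is no separate proposal stage here: every cell's box and class come out of the same forward pass and only need to be decoded and filtered.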

Two-Step Object Detection

These kinds of models take two separate steps to produce the detection outputs. Typically, they have convolutional layers for feature extraction, a region proposal step (for example an RPN, region proposal network) that suggests candidate object regions, and fully connected layers for localization and classification.

Below are some of the models that use this approach:

  • R-CNN
  • Fast R-CNN
  • Faster R-CNN
  • Mask R-CNN
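The two-step control flow can be sketched with toy stand-ins for each stage. Both `propose_regions` and `classify_region` are hypothetical placeholders (a fixed window grid and a brightness test), not real networks; the point is only the order of operations: first propose candidate regions, then classify each one.

```python
def propose_regions(image, window=2):
    """Stage 1 (toy stand-in for a region proposal step): tile the image with windows."""
    height, width = len(image), len(image[0])
    return [
        (x, y, x + window, y + window)
        for y in range(0, height - window + 1, window)
        for x in range(0, width - window + 1, window)
    ]

def classify_region(image, box):
    """Stage 2 (toy stand-in for the classification head): label a region by mean brightness."""
    x0, y0, x1, y1 = box
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    mean = sum(pixels) / len(pixels)
    if mean > 128:
        return "object", mean / 255
    return "background", 1 - mean / 255

def detect_two_stage(image):
    """Run stage 1, then stage 2 on every proposal, keeping non-background regions."""
    detections = []
    for box in propose_regions(image):
        label, score = classify_region(image, box)
        if label != "background":
            detections.append((label, score, box))
    return detections

# Toy 4x4 grayscale image with one bright patch in the top-right corner.
toy_image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
two_stage_detections = detect_two_stage(toy_image)
print(two_stage_detections)
```

In a real two-step detector the second stage typically also refines the box coordinates, which this sketch leaves out.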

Transformers for Object detection

CNN-only architectures were the best-performing approach until transformers arrived and began leading the object detection benchmarks. Transformers were introduced in the field of natural language processing and soon found their way into computer vision. That said, transformers are not simply replacing CNNs: models such as DETR enhance the architecture while still using a CNN as the feature extractor.

An example of a transformer-based vision model is DETR (Detection Transformer), an end-to-end object detection model that performs both object classification and localization, i.e. bounding box prediction.
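DETR predicts a fixed-size set of candidates, each with class probabilities that include a special "no object" class and a box given as a normalized center, width and height. Below is a minimal sketch of turning such a set into final detections. It assumes the "no object" class is the last index; the function name, class names, threshold and sample numbers are made up for illustration and are not taken from any library.

```python
def postprocess_set_predictions(class_probs, boxes, image_size, class_names, threshold=0.7):
    """Filter a fixed-size set of predictions and convert boxes to pixel corners.

    class_probs[i] has len(class_names) + 1 entries; the last one is assumed
    to be the "no object" class. boxes[i] is (cx, cy, w, h), normalized to [0, 1].
    """
    img_w, img_h = image_size
    no_object = len(class_names)
    results = []
    for probs, (cx, cy, w, h) in zip(class_probs, boxes):
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == no_object or probs[best] < threshold:
            continue  # empty slot or low-confidence prediction
        box = ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
               (cx + w / 2) * img_w, (cy + h / 2) * img_h)
        results.append((class_names[best], probs[best], box))
    return results

# Hypothetical output with three prediction slots for a 640x480 image.
probs = [
    [0.90, 0.05, 0.05],  # confident "person"
    [0.02, 0.03, 0.95],  # "no object" slot
    [0.40, 0.35, 0.25],  # too uncertain to keep
]
norm_boxes = [
    (0.5, 0.5, 0.25, 0.5),
    (0.1, 0.1, 0.1, 0.1),
    (0.8, 0.8, 0.2, 0.2),
]
final = postprocess_set_predictions(probs, norm_boxes, (640, 480), ["person", "bicycle"])
print(final)
```

Because most slots in the fixed-size set are expected to be "no object", this filtering step is what reduces the set to the objects actually present in the image.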

Applications for Object detection

  • Face detection
  • Counting and Analysis
  • Visual search engine
  • Visual tagging
  • Aerial view analysis
  • Video analytics
  • And much more.
