Object Detection is a computer vision task that aims to detect instances of a class (e.g. cars, bicycles, humans, etc.) in images or videos. Check this post to learn more about Object Detection, its different sub-tasks, and the model architectures commonly used for it - link.

This article shows how we can quickly perform Object Detection on our own set of images by leveraging freely available models from TF Hub that were pre-trained on this task.

We will be using Faster R-CNN, which has a fairly complex architecture and can be built on top of different backbone networks (e.g. VGG). The following diagram illustrates the model architecture at a very high level. To learn more about this model, check this article - link.

Faster R-CNN architecture

In our case we will use a Faster R-CNN model with an Inception-ResNet-V2 backbone (the original Faster R-CNN paper used VGG-16 and ZF backbones). The model is pretrained on the large COCO dataset and available on TF Hub.

First, we need to download and install the TensorFlow Object Detection API. We need it to use some of its utility functions, which let us quickly visualize the output of Object Detection, i.e. the image along with the detected instances and their bounding boxes.

%%capture
%%bash

git clone --depth 1 https://github.com/tensorflow/models

cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

Now we can import TF Hub and the TF Object Detection API, as well as all the other packages we'll need:

import glob
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

from PIL import Image
from object_detection.utils import visualization_utils
from object_detection.utils.label_map_util import create_category_index_from_labelmap

%matplotlib inline

Let's download the Faster R-CNN model from TF Hub:

MODEL_PATH = ('https://tfhub.dev/tensorflow/faster_rcnn/inception_resnet_v2_1024x1024/1')
model = hub.load(MODEL_PATH)

To map the output of the model to meaningful class names, we need to load COCO's category index as follows:

labels_path = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
CATEGORY_IDX = create_category_index_from_labelmap(labels_path)
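CATEGORY_IDX is a plain dictionary keyed by integer class id, where each value is a dict with 'id' and 'name' entries (with the default use_display_name=True, the names for the COCO label map are human-readable strings such as 'person'). As a rough sketch of how the model's detection_classes map to names, here is a hypothetical helper, describe_detections, which is not part of the Object Detection API. The sample_index below is a hand-written excerpt with the same shape as CATEGORY_IDX rather than data loaded from the file, and falling back to 'N/A' for unknown ids is an assumption of this sketch.

```python
def describe_detections(classes, scores, category_index, min_score=0.30):
    """Return (class_name, score) pairs for detections scoring above min_score.

    Assumes category_index maps integer class ids to dicts with a 'name' key,
    i.e. the structure returned by create_category_index_from_labelmap.
    """
    results = []
    for class_id, score in zip(classes, scores):
        # Skip low-confidence detections
        if score <= min_score:
            continue
        # Unknown ids fall back to 'N/A' instead of raising a KeyError
        entry = category_index.get(int(class_id))
        name = entry['name'] if entry is not None else 'N/A'
        results.append((name, float(score)))
    return results


# Hand-written excerpt shaped like CATEGORY_IDX (ids taken from COCO's label map)
sample_index = {
    1: {'id': 1, 'name': 'person'},
    2: {'id': 2, 'name': 'bicycle'},
    16: {'id': 16, 'name': 'bird'},
}
print(describe_detections([2, 1, 16, 99], [0.98, 0.75, 0.10, 0.50], sample_index))
# → [('bicycle', 0.98), ('person', 0.75), ('N/A', 0.5)]
```

With the real model, the same helper can be applied to the first batch element of the detection_classes and detection_scores outputs together with CATEGORY_IDX.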

Next, we define a utility function that loads a test image and preprocesses it before it is passed to the model:

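def load_image(path):
    # Read the raw bytes; tf.io.gfile also handles remote paths (e.g. gs://)
    with tf.io.gfile.GFile(path, 'rb') as f:
        image_data = f.read()
    # Convert to RGB so that grayscale or RGBA images also yield 3 channels
    image = Image.open(BytesIO(image_data)).convert('RGB')
    # Add a batch dimension: shape (1, height, width, 3), dtype uint8
    return np.array(image, dtype=np.uint8)[np.newaxis, ...]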
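load_image works on files. If your images already come as NumPy arrays (for example from another pipeline), the same normalization can be sketched in plain NumPy. to_model_input is a hypothetical helper, not part of TF Hub or the Object Detection API; it assumes the model expects a uint8 batch of shape (1, height, width, 3), which is what load_image produces. Dropping the alpha channel (rather than compositing it onto a background) is a simplifying choice of this sketch.

```python
import numpy as np


def to_model_input(pixels):
    """Convert an image array to a uint8 batch of shape (1, height, width, 3).

    Accepts grayscale (H, W), single-channel (H, W, 1), RGB (H, W, 3)
    and RGBA (H, W, 4) arrays.
    """
    pixels = np.asarray(pixels)
    if pixels.ndim == 2:
        # Grayscale: add a channel axis so it can be repeated below
        pixels = pixels[..., np.newaxis]
    if pixels.shape[-1] == 1:
        # Replicate the single channel into R, G and B
        pixels = np.repeat(pixels, 3, axis=-1)
    elif pixels.shape[-1] == 4:
        # Drop the alpha channel (transparency is ignored, not composited)
        pixels = pixels[..., :3]
    if pixels.ndim != 3 or pixels.shape[-1] != 3:
        raise ValueError(f'unsupported image shape: {pixels.shape}')
    # Clip before casting so out-of-range values don't wrap around in uint8
    return np.clip(pixels, 0, 255).astype(np.uint8)[np.newaxis, ...]


batch = to_model_input(np.zeros((4, 5)))
print(batch.shape, batch.dtype)  # → (1, 4, 5, 3) uint8
```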

The following function loads an image, runs the Faster R-CNN model on it, and returns a copy of the image on which the detected instances are marked with bounding boxes, class labels and scores:

def get_image_with_boxes(model, image_path):
    image = load_image(image_path)
    results = model(image)
    # Convert the results to NumPy arrays
    model_output = {k: v.numpy() for k, v in results.items()}
    # Extract the detections for the first (and only) image in the batch
    boxes = model_output['detection_boxes'][0]
    classes = model_output['detection_classes'][0].astype('int')
    scores = model_output['detection_scores'][0]
    image_with_boxes = image.copy()[0]
    # draw boxes on the output image, along with the classes and scores
    visualization_utils.visualize_boxes_and_labels_on_image_array(
        image=image_with_boxes,
        boxes=boxes,
        classes=classes,
        scores=scores,
        category_index=CATEGORY_IDX,
        use_normalized_coordinates=True,
        max_boxes_to_draw=200,
        min_score_thresh=0.30,
        agnostic_mode=False,
        line_thickness=5
        )
    return image_with_boxes
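Because we pass use_normalized_coordinates=True, visualize_boxes_and_labels_on_image_array scales the boxes to the image size and applies the score threshold internally. If you need the boxes themselves (for cropping or counting objects, say), the same two steps can be sketched in NumPy. boxes_to_pixels is a hypothetical helper; it assumes detection_boxes holds normalized [ymin, xmin, ymax, xmax] coordinates, the TF Object Detection API convention.

```python
import numpy as np


def boxes_to_pixels(boxes, scores, height, width, min_score=0.30, max_boxes=200):
    """Keep confident detections and convert their boxes to pixel coordinates.

    boxes: array of shape (N, 4) holding normalized [ymin, xmin, ymax, xmax].
    Returns an int array of shape (M, 4) as [ymin, xmin, ymax, xmax] in pixels.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    # Take the max_boxes highest-scoring detections, then apply the threshold
    order = np.argsort(-scores)[:max_boxes]
    keep = order[scores[order] > min_score]
    # Scale y coordinates by the image height and x coordinates by the width
    scale = np.array([height, width, height, width], dtype=np.float64)
    return np.round(boxes[keep] * scale).astype(int)


pixel_boxes = boxes_to_pixels([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]],
                              [0.9, 0.1], height=480, width=640)
print(pixel_boxes.tolist())  # → [[48, 128, 240, 384]]
```

With the real model, boxes and scores would be the arrays extracted inside get_image_with_boxes, and height/width would come from image.shape[1:3].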

Next, download some images for testing:

%%bash
mkdir -p images
curl -s -o images/bicycle1.jpg https://cdn.pixabay.com/photo/2016/11/30/12/29/bicycle-1872682_960_720.jpg
curl -s -o images/bicycle2.jpg https://cdn.pixabay.com/photo/2016/11/22/23/49/cyclists-1851269_960_720.jpg
curl -s -o images/animal1.jpg https://cdn.pixabay.com/photo/2014/05/20/21/20/bird-349026_960_720.jpg
curl -s -o images/animal2.jpg https://cdn.pixabay.com/photo/2018/05/27/18/19/sparrows-3434123_960_720.jpg
curl -s -o images/car1.jpg https://cdn.pixabay.com/photo/2016/02/13/13/11/oldtimer-1197800_960_720.jpg
curl -s -o images/car2.jpg https://cdn.pixabay.com/photo/2016/09/11/10/02/renault-juvaquatre-1661009_960_720.jpg

Then we call the get_image_with_boxes function defined above on each image:

images = []
# Sort the paths so the images are processed and plotted in a stable order
image_paths = sorted(glob.glob('images/*'))
for path in image_paths:
    image_with_annotation = get_image_with_boxes(model, path)
    images.append(image_with_annotation)

Finally, we plot the output images with their detected bounding boxes. Note the quality of this model's predictions.

figure, axis = plt.subplots(2, 3, figsize=(35, 15))
for index, image in enumerate(images):
    row, col = divmod(index, 3)
    axis[row, col].imshow(image)
    axis[row, col].axis('off')
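The plotting code hardcodes a 2 x 3 grid, which fits exactly the six downloaded images. If you test on a different number of images, a small sketch like the hypothetical grid_positions helper below can derive the grid shape and each image's cell; the fixed three columns are an assumption carried over from the original layout.

```python
import math


def grid_positions(num_images, ncols=3):
    """Return the (nrows, ncols) grid shape and the (row, col) cell of each image."""
    # At least one row, so plt.subplots still receives a valid shape
    nrows = max(1, math.ceil(num_images / ncols))
    # divmod(index, ncols) fills the grid row by row
    cells = [divmod(index, ncols) for index in range(num_images)]
    return (nrows, ncols), cells


shape, cells = grid_positions(5)
print(shape, cells)  # → (2, 3) [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

The shape can then be passed as plt.subplots(*shape, figsize=(35, 15), squeeze=False); squeeze=False keeps axis two-dimensional even when there is only one row, so axis[row, col] indexing keeps working.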

We have seen how to leverage TF Hub to easily use an out-of-the-box pretrained model with a complex architecture for a challenging task such as Object Detection. Implementing and training Faster R-CNN is not easy, but thanks to TF Hub we can effortlessly reuse the results of the Deep Learning community's hard work in building and training such models.

Note that we were using the labels defined in the COCO dataset on which the model was pretrained. To perform Object Detection on a custom dataset or with different labels, we would need to retrain the model, which we will cover in another article.