Object Detection is a computer vision task that aims to detect instances of a class (e.g. cars, bicycles, humans) in images or videos. Check this post to learn more about Object Detection, its sub-tasks, and the model architectures commonly used - link.
This article shows how we can quickly perform Object Detection on our own set of images by leveraging freely available models from TF Hub that were pre-trained on this task.
We will be using Faster R-CNN, which has a complex architecture and can be backed by different backbone networks (e.g. VGG). The following diagram illustrates the model architecture at a very high level; to learn more about this model, check this article - link.
In our case we will use a model backed by Inception ResNet V2, rather than the ZF and VGG-16 networks used in the original Faster R-CNN paper. The model is pretrained on the large COCO dataset and is available on TF Hub.
First, we need to download and install the TensorFlow Object Detection API. We need it for its utility functions, which let us quickly visualize the output of Object Detection, i.e. the image along with the detected instances and their bounding boxes.
%%capture
%%bash
git clone --depth 1 https://github.com/tensorflow/models
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .
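Before moving on, it can help to confirm that the installation succeeded. The following is a minimal sketch, assuming the packages are importable as `object_detection` and `tensorflow_hub`; the helper name `is_installed` is hypothetical and not part of either library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages this article relies on.
for name in ("object_detection", "tensorflow_hub"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either package is reported as missing, rerunning the installation cell without `%%capture` should reveal the error output.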
Now we can import TF Hub and TF Object Detection APIs as well as all the other packages we'll be needing
import glob
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image
from object_detection.utils import visualization_utils
from object_detection.utils.label_map_util import create_category_index_from_labelmap
%matplotlib inline
Let's download the Faster R-CNN model from TF Hub
MODEL_PATH = ('https://tfhub.dev/tensorflow/faster_rcnn/inception_resnet_v2_1024x1024/1')
model = hub.load(MODEL_PATH)
To be able to map the output of the model to meaningful class names, we need to load COCO's category index as follows
labels_path = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
CATEGORY_IDX = create_category_index_from_labelmap(labels_path)
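The category index is a dictionary keyed by integer class id, where each value is itself a dictionary holding at least an `id` and a `name`. As a rough sketch of how such an index is used, the snippet below works on a hand-written three-entry excerpt rather than the real label map; the helper `class_name` is hypothetical.

```python
# Hand-written excerpt for illustration only; the real index is built from
# mscoco_label_map.pbtxt and contains many more entries.
sample_category_index = {
    1: {"id": 1, "name": "person"},
    2: {"id": 2, "name": "bicycle"},
    3: {"id": 3, "name": "car"},
}


def class_name(category_index, class_id, default="unknown"):
    """Look up the name for a class id, with a fallback for unmapped ids."""
    entry = category_index.get(int(class_id))
    return entry["name"] if entry else default


print(class_name(sample_category_index, 2))
print(class_name(sample_category_index, 12))
```

The fallback is there because a label map need not use contiguous ids, so a lookup should not assume that every integer is present.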
We need to define a utility function to load the test images and prepare them before passing them to the model
def load_image(path):
    # Read the raw bytes of the image file
    with tf.io.gfile.GFile(path, 'rb') as f:
        image_data = f.read()
    # Force three channels so grayscale or RGBA files still fit the model input
    image = Image.open(BytesIO(image_data)).convert('RGB')
    # Add a batch dimension: the result has shape (1, height, width, 3)
    return np.array(image, dtype=np.uint8)[np.newaxis, ...]
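The array passed to the model is a batch of one `uint8` image with shape `(1, height, width, 3)`. Images that are not plain RGB, such as grayscale or RGBA files, do not have three channels. The following is a minimal sketch of how a pixel array could be brought to that shape, using synthetic NumPy arrays instead of real files; the helper `to_model_input` is hypothetical.

```python
import numpy as np


def to_model_input(pixels):
    """Convert an (H, W), (H, W, 3) or (H, W, 4) array to (1, H, W, 3) uint8."""
    pixels = np.asarray(pixels)
    if pixels.ndim == 2:
        # Grayscale: repeat the single channel three times.
        pixels = np.stack([pixels] * 3, axis=-1)
    elif pixels.ndim == 3 and pixels.shape[-1] == 4:
        # RGBA: drop the alpha channel.
        pixels = pixels[..., :3]
    elif pixels.ndim != 3 or pixels.shape[-1] != 3:
        raise ValueError(f"unsupported image shape: {pixels.shape}")
    # Add the leading batch dimension.
    return pixels.astype(np.uint8)[np.newaxis, ...]


gray = np.zeros((4, 6))
rgba = np.zeros((4, 6, 4))
print(to_model_input(gray).shape, to_model_input(rgba).shape)
```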
The following function will be used to load an image, run the Faster R-CNN model on it and return an image on which the instances are identified with bounding boxes
def get_image_with_boxes(model, image_path):
    image = load_image(image_path)
    results = model(image)
    # Convert the results to NumPy arrays
    model_output = {k: v.numpy() for k, v in results.items()}
    # Create a visualization of the detected instances with their boxes, scores, and classes
    boxes = model_output['detection_boxes'][0]
    classes = model_output['detection_classes'][0].astype('int')
    scores = model_output['detection_scores'][0]
    image_with_boxes = image.copy()[0]
    # draw boxes on the output image, along with the classes and scores
    visualization_utils.visualize_boxes_and_labels_on_image_array(
        image=image_with_boxes,
        boxes=boxes,
        classes=classes,
        scores=scores,
        category_index=CATEGORY_IDX,
        use_normalized_coordinates=True,
        max_boxes_to_draw=200,
        min_score_thresh=0.30,
        agnostic_mode=False,
        line_thickness=5
    )
    return image_with_boxes
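The `use_normalized_coordinates=True` argument tells the visualization utility that the boxes are expressed relative to the image size rather than in pixels. The sketch below shows what that conversion amounts to, together with the score filtering. It assumes the `[ymin, xmin, ymax, xmax]` ordering used by the TensorFlow Object Detection API; the helper `to_pixel_boxes` is hypothetical and the detections are made up.

```python
def to_pixel_boxes(boxes, scores, image_height, image_width, min_score=0.30):
    """Keep detections scoring above min_score and scale their boxes to pixels.

    Assumes each box is [ymin, xmin, ymax, xmax] with values in [0, 1].
    Returns a list of (left, top, right, bottom, score) tuples.
    """
    kept = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score <= min_score:
            continue
        kept.append((
            round(xmin * image_width),
            round(ymin * image_height),
            round(xmax * image_width),
            round(ymax * image_height),
            float(score),
        ))
    return kept


# Made-up detections for an image that is 720 pixels high and 960 wide.
example_boxes = [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.5, 0.5]]
example_scores = [0.9, 0.1]
print(to_pixel_boxes(example_boxes, example_scores, image_height=720, image_width=960))
```

Only the first detection survives the threshold, which is why raising `min_score_thresh` is a simple way to reduce clutter in the drawn output.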
Get some images for testing
%%bash
mkdir -p images
curl -s -o images/bicycle1.jpg https://cdn.pixabay.com/photo/2016/11/30/12/29/bicycle-1872682_960_720.jpg
curl -s -o images/bicycle2.jpg https://cdn.pixabay.com/photo/2016/11/22/23/49/cyclists-1851269_960_720.jpg
curl -s -o images/animal1.jpg https://cdn.pixabay.com/photo/2014/05/20/21/20/bird-349026_960_720.jpg
curl -s -o images/animal2.jpg https://cdn.pixabay.com/photo/2018/05/27/18/19/sparrows-3434123_960_720.jpg
curl -s -o images/car1.jpg https://cdn.pixabay.com/photo/2016/02/13/13/11/oldtimer-1197800_960_720.jpg
curl -s -o images/car2.jpg https://cdn.pixabay.com/photo/2016/09/11/10/02/renault-juvaquatre-1661009_960_720.jpg
Call the previous get_image_with_boxes function on each image
images = []
# Sort the paths so the images are processed in a predictable order
image_paths = sorted(glob.glob('images/*'))
for path in image_paths:
    image_with_annotation = get_image_with_boxes(model, path)
    images.append(image_with_annotation)
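The annotated images are convenient to look at, but a short text summary per image can also be useful. The following sketch counts detections per class name from the class ids and scores that the model returns. The helper `count_detections` is hypothetical, and the values stand in for one image's output rather than real results.

```python
from collections import Counter


def count_detections(classes, scores, category_index, min_score=0.30):
    """Count detections per class name, ignoring scores at or below min_score."""
    counts = Counter()
    for class_id, score in zip(classes, scores):
        if score <= min_score:
            continue
        entry = category_index.get(int(class_id))
        # Fall back to the raw id when the index has no entry for it.
        counts[entry["name"] if entry else f"id {int(class_id)}"] += 1
    return counts


# Made-up values standing in for one image's model output.
sample_index = {1: {"id": 1, "name": "person"}, 2: {"id": 2, "name": "bicycle"}}
sample_classes = [2, 1, 1, 2]
sample_scores = [0.95, 0.80, 0.20, 0.55]
print(dict(count_detections(sample_classes, sample_scores, sample_index)))
```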
Finally, we plot the output images with the detected bounding boxes so we can inspect the quality of the model's predictions
figure, axis = plt.subplots(2, 3, figsize=(35, 15))
for index, image in enumerate(images):
    # Fill the 2x3 grid row by row
    row, col = divmod(index, 3)
    axis[row, col].imshow(image)
    axis[row, col].axis('off')
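The plotting code above hard-codes a grid of 2 rows and 3 columns, which matches the six test images. For a different number of images, the grid size and cell positions can be computed instead. This is a minimal sketch; the helper `grid_positions` is hypothetical.

```python
import math


def grid_positions(num_images, num_cols=3):
    """Return (num_rows, [(row, col), ...]) for laying images out row by row."""
    num_rows = max(1, math.ceil(num_images / num_cols))
    positions = [divmod(index, num_cols) for index in range(num_images)]
    return num_rows, positions


rows, positions = grid_positions(6)
print(rows, positions)
```

When the number of rows can drop to one, passing `squeeze=False` to `plt.subplots` keeps the returned axes two-dimensional, so indexing with `[row, col]` continues to work.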
We have seen how we can leverage TF Hub to use an out-of-the-box pretrained model with a complex architecture for a demanding task such as Object Detection. Implementing and training Faster R-CNN is not easy, but thanks to TF Hub we can reuse the hard work the Deep Learning community has put into building and training such models.