YOLO stands for “You Only Look Once” and it’s an algorithm for real-time object detection. This is a picture of my desk, and I’ve trained the algorithm to detect three types of objects: K-2SO units, Darth Vaders and Tie Fighters.

Desk

In this post I’m going to describe what I did and what to expect if you want to do something similar.

Installing Darknet (Windows)

Darknet is an open source neural network framework written in C and CUDA and it includes a YOLO implementation.

I use Windows, so I’m going to focus on that platform. If you are using Linux or Mac, you’ll probably have an easier time installing, since most of this scientific-computing tooling targets those platforms first.

After testing different alternatives, I followed these instructions on how to install Darknet on Windows.

If you don’t want to do all this stuff or if you don’t have a CUDA compatible GPU, you can also run YOLO v4 in the cloud.

I personally dislike the notebook paradigm and didn’t want to be limited by the cloud environment, so I preferred installing on my machine. Keep in mind, however, that downloading, configuring and compiling will be time-consuming.

In my case, I installed the newest versions of OpenCV, CUDA and cuDNN (the CUDA Deep Neural Network library) and it worked fine. The only difference is that where the guide instructs you to copy a single DLL or include file, you’ll find more than one in recent versions (I just copied them all). Just for reference, these are the versions I used:

  • OpenCV 4.5
  • CUDA 11.1 (456.43)
  • cuDNN v8.0.4.30 (for CUDA 11.1)
  • CMake 3.19 (rc1)

Testing the installation

The second part of the guide will help you test your installation. In my case, I downloaded a traffic video and the results were very good.

darknet.exe -i 0 detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4

Traffic

Custom objects

There are basically two steps involved if you want to train YOLO v4 with custom objects:

  • Configuring the pipeline
  • Creating a dataset and labeling the images according to YOLO’s format

Training a model can take a very long time, so you’ll want to make sure your pipeline is configured correctly first. To do so, train the neural network on a known-good dataset, so that if something goes wrong you’ll know the dataset is not to blame.

Wheat detection

This dataset provides images for wheat detection. I know, very exciting stuff!!! 😉

This Darknet repository has all the instructions for configuring the pipeline, but I’ll provide some additional tips based on my experience.

At first, I tried using “regular YOLO” (for lack of a better name). However, I soon noticed that it would take 30 hours to complete training, so I decided to go with Tiny YOLO instead.

The repository instructions tell you to set subdivisions=16, but this produced a CUDA out of memory error in my case.

CUDA error

The error message was quite helpful and setting the value to 64 solved the problem.

It’s also a good idea to set random=1. This is explained in another section of the repository: it improves precision by training the network on varying resolutions.
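
For reference, after these tweaks the relevant lines of yolov4-tiny-obj.cfg look roughly like this (batch=64 is the repository default; the file is abridged here, and the tiny config has two [yolo] sections):

[net]
batch=64
subdivisions=64
...

[yolo]
random=1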

Finally, for Tiny YOLO, this is the command I used for training. The project’s file structure is a little messy, so just in case, all my work was done in the following directory:

C:\darknet\build\darknet\x64

darknet.exe detector train data/obj.data cfg/yolov4-tiny-obj.cfg yolov4-tiny.conv.29

On another terminal window, I kept running this command to monitor how object detection was progressing:

darknet.exe detector test data/obj.data cfg/yolov4-tiny-obj.cfg backup/yolov4-tiny-obj_last.weights

The weights file is updated every 100 iterations. Keep in mind that it’s normal for the algorithm not to detect anything during one run even if it detected an object in a previous one; this usually happened to me during the first few hundred iterations.
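
If re-running that command by hand gets tedious, a small script can do it on a timer. This is just a sketch of one way to automate it (paths assume the x64 directory mentioned above, and data/sample.jpg is a placeholder for whatever test image you want to check):

import subprocess
import time

# Re-run detection on a fixed test image every 10 minutes, always using the
# latest weights that training keeps writing to the backup folder.
while True:
    subprocess.run([
        "darknet.exe", "detector", "test",
        "data/obj.data", "cfg/yolov4-tiny-obj.cfg",
        "backup/yolov4-tiny-obj_last.weights",
        "data/sample.jpg",
    ])
    time.sleep(600)  # wait 10 minutes before checking again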

Finally, you don’t need to complete the full training since wheat detection is just a test; if you get something like this around 700 or 800 iterations, you are on the right track.

Wheat

Star Wars

Creating the dataset

A dataset consists of images and text files describing the location of the objects we want to detect. There is one text file per image, and it can include multiple lines if the image contains multiple objects.
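
Each line follows YOLO’s format: the class index, then the center of the bounding box and its width and height, all normalized by the image dimensions. A made-up line for a single object looks like this:

0 0.5125 0.4300 0.2100 0.1850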

The first step was gathering enough images. According to the documentation, you need at least 2,000 images per type of object. I certainly wasn’t going to search for and download all those images manually, so I decided to take a shortcut and hope for the best. 😉

This is what worked for me:

  • I selected 50 textures from the Describable Textures Dataset project. The idea was to use real life textures as backgrounds for my generated images.

  • Then I downloaded some K-2SO images with a white background. Same for Darth Vader and the Tie Fighter. You need to replace the white background with transparency (alpha). I used Paint.Net and it was quite easy (click on the white background using the Magic Wand, Invert Selection, Copy, Paste into a new image and Save as PNG); there’s also a scripted alternative sketched right after this list.

    Tie Alpha

  • Try a variety of sizes, orientations, colors, etc. For example, these are the Tie Fighter thumbnails.

    Dataset

  • Finally I coded a script to perform the following tasks:

    • Randomly select an image and perform a slight rotation (up to 15 degrees).
    • Resize the image randomly.
    • Paste the image over the texture background at a random position.
    • Adjust the brightness of the image using a random factor.
    • Since I knew exactly where I had pasted the image, create a text file describing the position according to YOLO’s specification.
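
As mentioned above, the white-to-transparent step can also be scripted instead of done in Paint.Net. This is just a rough PIL sketch; the threshold and the file names are made up and may need tuning per image:

from PIL import Image


def white_to_alpha(src_path, dst_path, threshold=240):
    # Make near-white pixels fully transparent and save the result as a PNG
    img = Image.open(src_path).convert("RGBA")
    pixels = [
        (r, g, b, 0) if r > threshold and g > threshold and b > threshold else (r, g, b, a)
        for (r, g, b, a) in img.getdata()
    ]
    img.putdata(pixels)
    img.save(dst_path, format="PNG")


white_to_alpha("images/tie/tie_01.jpg", "images/tie/tie_01.png")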

Let’s see an example of the masterpieces I’ve produced. 😉

Created1

This is the Python script in case you want to modify it.

from PIL import Image, ImageDraw, ImageEnhance
import random
from os import listdir
from os.path import isfile, join
import collections

FolderData = collections.namedtuple('ImagesData', 'className folder')

img_width = 800
img_height = 800
image_qty = 2000
base_path = "C:/darknet/build/darknet/x64/data"


def create(folder_data, index, textures, train_file):

    images = [f for f in listdir(folder_data.folder) if isfile(join(folder_data.folder, f))]

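    # Generate image_qty composite images for this class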
    for i in range(0, image_qty):
        im = Image.new('RGB', (img_width, img_height))

        rnd_idx = random.randint(0, len(images) - 1)
        starwars_img = Image.open(f"{folder_data.folder}/{images[rnd_idx]}")

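        # Keep every fifth image unrotated; otherwise rotate by a random angle of up to 15 degrees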
        if i % 5 == 0:
            angle = 0
        else:
            angle = random.randint(0, 15)

        rotated = starwars_img.rotate(angle, expand=1)
        new_width = int(round(rotated.width * random.uniform(0.80, 1.40), 0))
        width_percent = (new_width / float(rotated.size[0]))
        new_height = int((float(rotated.size[1]) * float(width_percent)))
        rotated = rotated.resize((new_width, new_height), Image.ANTIALIAS)

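        # Pick a random texture as the background and scale it to the output size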
        rnd_texture = random.randint(0, len(textures) - 1)
        texture_img = Image.open("textures/" + textures[rnd_texture])
        texture_img = texture_img.resize((img_width, img_height), Image.ANTIALIAS)
        im.paste(texture_img, (0, 0), texture_img.convert('RGBA'))

        x = random.randint(0, 300)
        y = random.randint(0, 300)
        im.paste(rotated, (x, y), rotated.convert('RGBA'))

        # uncomment for debug purposes
        # draw = ImageDraw.Draw(im)
        # draw.rectangle([x, y, x + rotated.width, y + rotated.height])

        bright_factor = random.uniform(0.80, 1)
        enhancer = ImageEnhance.Brightness(im)
        im = enhancer.enhance(bright_factor)

        im.save(f"{base_path}/obj/{folder_data.className}_{i}.jpg", format='jpeg')

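        # YOLO label: class index, then the box center x/y and box width/height, normalized by the image size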
        with open(f"{base_path}/obj/{folder_data.className}_{i}.txt", "w") as f:
            rw = rotated.width
            rh = rotated.height
            f.write(f'{index} {(x + (rw / 2)) / img_width} {(y + (rh / 2)) / img_height} {rw / img_width} {rh / img_height}')

        train_file.write(f"data/obj/{folder_data.className}_{i}.jpg\n")


def Dataset():

    textures = [f for f in listdir("textures") if isfile(join("textures", f))]

    folders = [
        FolderData(className="Vader", folder="images/vader"),
        FolderData(className="K-2SO", folder="images/k2s0"),
        FolderData(className="Tie fighter", folder="images/tie"),
    ]

    with open(base_path + "/obj.data", "w") as f1:
        f1.write(f"classes = {len(folders)}\n")
        f1.write("train = data/train.txt\n")
        f1.write("valid = data/test.txt\n")
        f1.write("names = data/obj.names\n")
        f1.write("backup = backup/\n")

    with open(base_path + "/obj.names", "w") as f2:
        for folder_data in folders:
            f2.write(f"{folder_data.className}\n")

    with open(base_path + "/train.txt", "w") as train_file:
        for index, folder_data in enumerate(folders):
            create(folder_data, index, textures, train_file)


if __name__ == "__main__":
    Dataset()

The only tricky part was describing the object’s position in the text file, because YOLO expects the center of the bounding box (normalized by the image size) rather than the top-left corner we are used to. There’s a program commonly used for labeling images called LabelImg, and it was handy for debugging purposes because I could compare the values it generated with my own. Here’s a screenshot of the program so you can get an idea.

LabelImg

The script also includes two lines that you can uncomment to verify that the rectangle delimiting the object is where it’s supposed to be.

LblImage

Jedi training

I used the exact same commands as for wheat detection, but adjusted the config files because in this case there are three types of objects instead of just one. You can also use the -thresh parameter to set the detection threshold.
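
Concretely, the repository instructions for custom classes boil down to setting classes=3 in each [yolo] section of yolov4-tiny-obj.cfg and filters=(classes + 5) * 3 = 24 in the [convolutional] layer right before each one. Abridged, that looks like this:

# filters = (classes + 5) * 3 = (3 + 5) * 3
[convolutional]
filters=24

[yolo]
classes=3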

By default, Darknet displays a chart so you can see the current iteration and an estimate of the time remaining until completion.

Chart

Make sure to also read the repository’s section “when should I stop training”.

The bad thing about training the neural network is that it takes many hours (7 in my case with Tiny YOLO), so you can’t iterate quickly and mistakes are costly. For instance, I had mistakenly labeled all images as “Darth Vader” and had to start over after already spending 5 hours training.

After training is complete, you’ll probably have to experiment a little with the following parameters in yolov4-tiny-obj.cfg.

width=1056
height=1056

For example, I used 416 for training, as the repository instructs, and then found that 1056 provided the best results for detection (both values must be multiples of 32).

You can analyze images, like the one at the beginning of this post, or video. Videos are distracting while you are reading, so I’ve included a sample at the end of this post.
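
For reference, detection reuses the detector test and detector demo subcommands shown earlier; something along these lines, where image.jpg and video.mp4 are placeholders, the weights file is the final (or latest) one from the backup folder, and -thresh is optional:

darknet.exe detector test data/obj.data cfg/yolov4-tiny-obj.cfg backup/yolov4-tiny-obj_final.weights image.jpg -thresh 0.5

darknet.exe detector demo data/obj.data cfg/yolov4-tiny-obj.cfg backup/yolov4-tiny-obj_final.weights video.mp4 -thresh 0.5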

Something I found cool is that I didn’t include this experimental Tie Fighter model in the dataset, and the network still managed to guess it could be a Tie Fighter because of the cockpit.

Video

Wait, that’s the wrong franchise

I would say it’s almost a miracle that YOLO works decently with the low-quality dataset I produced, so it’s not unreasonable for the network to find a resemblance between Sauron and Darth Vader: the helmet, a weapon in one hand, the colors, etc.

Franchise

I guess this is an example of how important it is to produce a high-quality dataset and to use a full-size model (not the tiny one) for traffic detection or other scenarios in which mistakes could be fatal.

Final thoughts

I had never done anything related to object detection before, so it was an interesting experiment. Waiting hours for the training to complete really sucked, but the final result was fun and rewarding (although I had to constantly tweak some parameters and I’ve picked the best results for this post).

Finally, here’s the sample of real time video detection I promised earlier.

Video

Thanks for reading!!! 😃