Create an object detector

  1. Requirements
  2. Create a Guild AI project
  3. Initialize a project environment
  4. Install gpkg.object-detect
  5. Extend a model configuration
  6. Add dataset support
  7. Obtain annotated images
  8. Prepare dataset
  9. Train a detector using transfer learning
  10. Check model accuracy
  11. Monitor progress with TensorBoard
  12. Stop training
  13. Export and freeze a trained model
  14. Detect objects in an image
  15. Deploy a trained model
  16. Summary

In this guide we create an object detector using support from gpkg.object-detect. We demonstrate a number of Guild AI features that accelerate model development and automate workflow:

  • Reuse high-level model configuration from gpkg.object‑detect
  • Automate workflow steps used to train, evaluate, and deploy a state-of-the-art object detector

Requirements

Many of the operations run in this guide are not feasible without a GPU accelerated system. If you don’t have access to a GPU, consider using a GPU accelerated server or container from AWS, Google Cloud, Microsoft Azure, and others.

If your GPU accelerated system is remote, run the commands in this guide over SSH while connected to that system.

Create a Guild AI project

In this section we create a Guild AI project that will define an object detector model.

A Guild project is a directory that contains a Guild file, which is named guild.yml and located in the project root directory.

In the examples, we use the environment variable PROJECT to represent the project directory and create the project in ~/sample‑object‑detector. Feel free to create the project in another location; if you do, define PROJECT accordingly.

Define the location of the sample object detector project:

PROJECT=~/sample-object-detector

Create the project directory:

mkdir $PROJECT

Unless otherwise noted, commands are run from the project directory. Change to that directory now:

cd $PROJECT

In the project directory, create the file guild.yml containing:

- model: detector
  description: Sample object detector

Save your changes to guild.yml, confirming that it’s located in the project directory (i.e. $PROJECT/guild.yml).

From a command line, use Guild to list the project models:

guild models

Guild shows the sample model you just defined in guild.yml:

./detector  Sample object detector

If you don’t see the detector model, verify that the guild.yml above is located in the project directory.

List the model operations:

guild operations

Note

You may alternatively use guild ops as a shortcut for this command. We use this short form for the remainder of this guide.

Guild doesn’t show anything because our model doesn’t define any operations. We modify the model later to add operations. First we need to install required software in a project environment.

Initialize a project environment

When working with projects, we recommend using a Guild environment to isolate project runs and installed Python packages. This ensures that work in one project does not conflict with work in other projects.

At a command prompt, ensure that you’re in the project directory:

cd $PROJECT

Use the init command to initialize a Guild environment:

guild init

Guild prompts you before initializing the environment in the project env directory with the default Python runtime and TensorFlow version for your system.

Press Enter to confirm.

Guild creates a new Python virtual environment in the project directory under env, which contains the Python runtime, installed Python packages, and project Guild home.

A project environment must be activated using the operating system source command.

Activate the project environment:

source guild-env

When an environment is activated in a command console, the command prompt shows the environment name in the format (<env name>) <default prompt>. The environment name is the project directory name by default but can be set using the ‑‑name option when running guild init.
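
For example, with the project in ~/sample-object-detector, an activated prompt might look like this (the user and host portions are placeholders):

(sample-object-detector) user@host:~/sample-object-detector$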

Note

You must activate the project environment using source guild‑env each time you open a new command console for project work.

Verify that the environment is activated using the check command:

guild check

Guild shows environment details, including the location of Guild home, which is identified by guild_home in the output.

Confirm that the path for guild_home is in the project directory under env/.guild. If it is in a different location, verify the steps above to ensure that your project environment is initialized and activated.
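
For example, the check output includes a line similar to this (the path reflects your project location):

guild_home:                /home/user/sample-object-detector/env/.guild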

Install gpkg.object‑detect

The object detector we create in this guide uses model support defined in the Guild package gpkg.object‑detect.

Guild packages are standard Python packages that can be installed using pip or guild install.

Note

Before installing a Python package for a project, verify that your project environment is activated by checking the command prompt. The prompt should contain the environment name in the form (<env name>) <default prompt>. If it does not, run source guild‑env from the project directory to activate the environment.

When a Python package and its dependencies are installed in an activated environment, the packages are installed within the environment directory rather than a user or system directory. They are not visible to other environments, which avoids package version conflicts.

After verifying that your environment is activated, install gpkg.object‑detect:

pip install cython && pip install gpkg.object-detect

Important

One of the packages required by gpkg.object‑detect depends on Cython. You must install the cython package before installing gpkg.object‑detect.

Guild installs gpkg.object‑detect along with its dependencies.

List installed Guild packages by running:

guild packages

Guild shows the installed gpkg.object‑detect package along with its installed dependencies (e.g. gpkg.slim).

With gpkg.object‑detect installed, we can use it to extend our model with a detector model configuration.

Extend a model configuration

In this section, we modify our detector to extend one of the model configurations defined in the gpkg.object‑detect package.

gpkg.object‑detect supports the following configurations:

  faster‑rcnn‑base          Base configuration for Faster RCNN models
  faster‑rcnn‑resnet‑101    Faster RCNN detector with ResNet-101 backbone
  faster‑rcnn‑resnet‑50     Faster RCNN detector with ResNet-50 backbone
  model‑base                Base configuration for all object detect models
  ssd‑base                  Base configuration for all SSD models
  ssd‑mobilenet‑v2          SSD detector with MobileNet v2 backbone

We can apply any of these configurations to our sample detector by extending it. We use ssd‑mobilenet‑v2 for our detector.

Modify guild.yml in the project directory to be:

- model: detector
  description: Sample object detector
  extends:
    - gpkg.object-detect/ssd-mobilenet-v2

By extending ssd‑mobilenet‑v2, our model inherits its operations, which support a workflow for building SSD object detectors with a MobileNet v2 backbone.

Save your changes to guild.yml.

List the model operations again:

guild ops

Guild shows the new list of operations, which are inherited from ssd‑mobilenet‑v2:

./detector:detect             Detect images using a trained detector
./detector:evaluate           Evaluate a trained detector
./detector:export-and-freeze  Export a detection graph with checkpoint weights
./detector:train              Train detector from scratch
./detector:transfer-learn     Train detector using transfer learning

We use these operations to build a trained object detector, running them as follows:

  • train or transfer‑learn to train a model
  • evaluate to evaluate model performance using validation data
  • detect to use a trained model to detect objects in an image

Before we can train the model, we need support for a dataset.

Add dataset support

Train and evaluate operations require a dataset. In this section, we modify our model to include dataset support. Dataset support has two components:

  • A model operation that prepares the dataset for training and validation
  • Information about the prepared data, including its file format and class labels

Dataset support in gpkg.object‑detect is flexible—you’re free to provide data from any source, provided the data is prepared as TF Records that are split between training and validation.

For our detector, we add support for images with Pascal VOC formatted annotations. This type of dataset requires two inputs:

  • Directory of JPG, PNG, or GIF images
  • Directory of Pascal VOC XML formatted annotations associated with the images

To support this scheme, we add voc-annotated-images-directory-support to our model’s extends list.

Modify guild.yml to be:

- model: detector
  description: Sample object detector
  extends:
    - gpkg.object-detect/voc-annotated-images-directory-support
    - gpkg.object-detect/ssd-mobilenet-v2

Important

The order that items appear in extends is important as configuration appearing earlier in the list takes precedence over configuration appearing later. In the case of our model, voc‑annotated‑images‑directory‑support must appear before ssd‑mobilenet‑v2.

By extending voc‑annotated‑images‑directory‑support, our model inherits a prepare operation.

Save your changes to guild.yml.

List model operations again:

guild ops

Guild shows the list of operations, which now includes:

./detector:prepare  Prepare images annotated using Pascal VOC format

We use prepare to process annotated images, converting them into TF records that are split between training and validation sets.
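
If you want to see the flags that prepare supports before running it, view the operation help (available flags depend on the installed gpkg.object-detect version):

guild run prepare --help-op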

Before running prepare, we need some annotated images.

Obtain annotated images

If you have Pascal VOC annotated images, you can use them to train your detector.

If you don’t have images, download one of these datasets:

  • The Oxford-IIIT Pet Dataset (separate downloads for images and annotations)
  • Visual Object Classes Challenge 2008 (a single combined download of images and annotations)

Note

You need both images and their associated annotations. If you select a dataset that separates images and annotations, ensure that you download both source files. If the dataset combines images and annotations, you just need the combined source file.

Unpack any downloaded files and note the locations of the image files (i.e. JPG, PNG, and GIF files) and of the annotations (i.e. XML files).

Set the following variables:

IMAGES=<path to image files>
ANNOTATIONS=<path to annotations>

Replace the applicable paths above with the locations of your images and annotations.
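
For example, if you downloaded and unpacked the Oxford-IIIT Pet Dataset under ~/datasets/pets, the paths would typically be as follows (verify the directory names after unpacking, as the layout may differ):

IMAGES=~/datasets/pets/images
ANNOTATIONS=~/datasets/pets/annotations/xmls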

Prepare dataset

Prepare the dataset for training and validation by running:

guild run prepare images=$IMAGES annotations=$ANNOTATIONS

Press Enter to accept the default settings.

Guild runs the operation, which processes the annotated images to prepare train and validation records.

Once prepared, the dataset is available to any operation that needs training data (e.g. train or transfer‑learn) or validation data (e.g. evaluate).

Let’s take a moment to view the operation results.

When you run an operation, Guild generates a run, which is a file system artifact containing run details, run output, and files created by the run.

List runs using guild runs:

guild runs

Guild shows available runs for the project (ID and date will differ):

[1:81998e28]  ./detector:prepare  2018-11-02 11:15:25  completed

If there are any failed runs in the list, you can delete them by running:

guild runs rm --error

Guild prompts you before deleting any runs.

Guild provides a host of run management and discovery features. Refer to Runs for details.
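
For example, to list only prepare runs, filter by operation (this assumes guild runs supports the same --operation option used with guild ls later in this guide):

guild runs --operation prepare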

List files associated with the latest run:

guild ls

Guild shows files associated with the prepare operation (files will differ based on the dataset you prepared):

  dataset.yml
  deployment/
  labels.pbtxt
  nets/
  object_detection/
  slim/
  train-0001-0941.tfrecord
  train-0942-1898.tfrecord
  train-1899-2833.tfrecord
  train-2834-3568.tfrecord
  val-0001-0973.tfrecord
  val-0974-1528.tfrecord

The generated TF record files are those matching train‑*.tfrecord (used for training) and val‑*.tfrecord (used for validation). These contain the processed images and annotations.

dataset.yml contains information about the prepared dataset, including the number of processed examples, the number of classes, and the record storage method.

View dataset.yml using the cat command:

guild cat dataset.yml

Guild shows the contents of dataset.yml. For information about object detection configuration, see gpkg.object-detect Configuration.
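
As a rough illustration only (the field names and values below are hypothetical; the actual contents come from gpkg.object-detect and depend on your dataset), the file contains entries along these lines:

train-examples: 3568
val-examples: 1528
classes: 37
records: tfrecord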

labels.pbtxt is a map of annotated object labels to the numeric values stored in the TF records.
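labels.pbtxt follows the label map format used by the TensorFlow Object Detection API. A short excerpt might look like this (the class names depend on your dataset; these are illustrative):

item {
  id: 1
  name: 'Abyssinian'
}
item {
  id: 2
  name: 'american_bulldog'
}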

These files are generated by the prepare operation and are used by subsequent operations that need them.

Train a detector using transfer learning

Having prepared our dataset for training, we’re ready to train a detector.

Our detector supports two train operations:

  train           Train the detector from scratch
  transfer‑learn  Train the detector using transfer learning 1

In this guide, we use transfer‑learn, which saves time and can improve model accuracy for smaller datasets.

Start the transfer‑learn operation by running:

guild run transfer-learn --gpus 0

Note

The use of ‑‑gpus 0 ensures that the operation only uses the first GPU on the system (ID of 0). If your system has more than one GPU, you can use a second GPU later when running evaluate.

Press Enter to accept the default settings and start the operation.

The operation trains indefinitely by default. It’s common practice to let models train indefinitely without prescribing a fixed number of training steps or epochs. While the operation runs, you can routinely evaluate model performance and stop the operation when it’s clear that further training is not needed.

You can stop any operation by pressing Ctrl‑C in the command console where the operation is running. Alternatively you can run guild stop from a different command prompt.

Check model accuracy

While the model trains, we can check its accuracy by running the evaluate operation from a second command prompt.

To evaluate the model during training, open a new command console for the project:

  • Open a new command console or a new window/pane if using tmux.

  • Change to the project directory and activate the environment:

cd $PROJECT
source guild-env

Note

If you forget to activate the environment, you won’t see project runs or Python packages installed for the project. When in doubt, check the console prompt and look for the environment name in the form (<env name>) <default prompt>. You can also run guild check and confirm that guild_home is located in the project directory.

From the activated project directory, list project runs:

guild runs

Guild shows the prepare and transfer‑learn runs (IDs and dates will differ):

[1:ffc693ac]  ./detector:transfer-learn  2018-10-30 16:08:50  running
[2:f93f43da]  ./detector:prepare         2018-10-30 13:45:29  completed

If you don’t see these runs, confirm that the working directory is $PROJECT and that you have activated the project environment by running source guild‑env from that directory.

Next, we run the evaluate operation, which evaluates the latest checkpoint saved by the transfer‑learn operation (run 1 above) using the validation data from the latest prepare operation (run 2 above).

If your system only has one GPU, you can’t use a GPU for the evaluate operation. In this case, start the operation using the ‑‑no‑gpus option:

guild run evaluate --no-gpus

Evaluating a model without GPU acceleration can take a long time on large datasets. To reduce the evaluation time, try setting eval‑examples to a number less than 1000. For example:

guild run evaluate --no-gpus eval-examples=100

This measurement is not as comprehensive as using all available examples (the default setting) but the operation will finish in less time.

Note

To stop a running operation, press Ctrl‑C in the operation command console. When stopped this way, a run has a status of terminated. You can delete terminated runs using guild runs rm ‑‑terminated. For more information on managing runs, see Runs.

If your system has more than one GPU, you can use a second GPU for the evaluate operation. In this case, start the operation using the ‑‑gpus option:

guild run evaluate --gpus 1

Note

The use of ‑‑gpus 1 in the command ensures that the operation only sees the second GPU and that it will not try to allocate memory on other GPUs. If you don’t explicitly control the visible GPUs with ‑‑gpus and ‑‑no‑gpus options, each TensorFlow operation preemptively consumes the memory on all visible GPUs, even if they’re not used.

The evaluate operation uses the validation records from the prepared dataset to measure model performance. As we are training an object detector, performance is measured using COCO mAP metrics.2

Each time you run evaluate, Guild generates a new run that records your measurement. The run saves accuracy metrics that can be viewed in TensorBoard and used by other programs.

Use guild compare to view run performance, including loss and accuracy:

guild compare
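
Guild Compare opens an interactive table in the console. As an illustration only (columns, values, and scalar names are hypothetical and will differ for your runs), it looks roughly like this:

run       operation                  started   status     step   loss
ffc693ac  ./detector:transfer-learn  16:08:50  running    9500   2.31
81998e28  ./detector:prepare         11:15:25  completed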

You can view run details, including the latest TensorFlow event scalar values, by navigating to a run using the Up and Down keys and pressing Enter. Exit the run detail screen by pressing q.

Exit Guild Compare by pressing q.

While the model continues to train, we monitor its progress with TensorBoard next.

Monitor progress with TensorBoard

In this section, we use TensorBoard to monitor the transfer learn operation and determine when to stop training.

Using a second command console, from the project directory, start TensorBoard:

guild tensorboard

Guild starts TensorBoard and opens it in a new browser window. Guild manages TensorBoard so that changes to project runs automatically appear in TensorBoard.

If you run the tensorboard command on a remote server, Guild does not open TensorBoard in your browser. You must open the link that Guild shows in the remote command console in your browser manually. If Guild starts TensorBoard on a port that you cannot access—e.g. due to firewall restrictions—terminate the tensorboard command by pressing Ctrl‑C and run the command again, specifying a port that you can access using the ‑‑port option. For example, if you can access port 8080 on the remote server, start TensorBoard by running:

guild tensorboard --port 8080

When you have opened TensorBoard in your browser, in the TensorBoard Scalars tab, type or paste the following regular expression into the Filter tags field at the top of the page:

loss|gpu

TensorBoard shows matching scalars, including the various training losses associated with the operation as well as GPU information.

Watch the progress of the losses and use them as a gauge for determining when to stop training.

Sample training losses from TensorBoard

When losses are no longer decreasing, or are increasing, consider stopping the transfer learn operation—more training will not likely improve model performance.

You can stop training at any point; however, we recommend training for at least 5K-10K steps to give the model a chance to transfer learn. Use evaluate (see Check model accuracy above) to confirm model precision before stopping.

When you are finished monitoring the run, in the second command console—the console where TensorBoard is running—press Ctrl‑C to quit TensorBoard.

Stop training

Stop the transfer learn operation using one of these two methods:

  • In the command console where the transfer‑learn operation is running, press Ctrl‑C

or

  • In a second command console, use the stop command:
guild stop

Press y and Enter to confirm that you want to stop the transfer‑learn operation.

Guild stops the transfer learn operation.

When you stop a run, the run exits with a status of terminated. A terminated training operation does not indicate failure—only that the training was stopped. In this case, the operation is designed to run until it’s explicitly stopped.

List the files associated with the transfer learn operation:

guild ls --operation transfer-learn

Guild shows a number of files, including files under a train subdirectory. The files under train represent the trained model checkpoints associated with various steps. As we’ll see in the next section, checkpoints are used to create a frozen inference model.

Export and freeze a trained model

To use our trained object detector for inference, we need to export the model architecture and freeze its weights using a checkpoint. This process generates a frozen inference graph, which is a binary file used to initialize a trained model in TensorFlow.

Run the export‑and‑freeze command:

guild run export-and-freeze

Press Enter to confirm.

By default, Guild uses the latest checkpoint from the latest training run (i.e. the latest transfer‑learn operation) to generate the frozen inference graph.

You can specify a different run or checkpoint step using the trained‑model and step flags, respectively.

To get help for the export‑and‑freeze operation, run:

guild run export-and-freeze --help-op

Guild shows operation help:

Usage: guild run [OPTIONS] detector:export-and-freeze [FLAG]...

Export a detection graph with checkpoint weights

Use 'guild run --help' for a list of options.

Dependencies:
  trained-model  Trained model from train or transfer-learn

Flags:
  step  Checkpoint step to use for the frozen graph (latest checkpoint)

As needed, you can run the operation using an alternative trained model and checkpoint step this way:

guild run export-and-freeze trained-model=<run ID> step=<step>

Replace <run ID> with the run ID associated with the trained model and replace <step> with the checkpoint step.

To view the available checkpoints for a transfer learn operation, run:

guild ls --operation transfer-learn --path train/model

The use of ‑‑path limits the file listing to the trained model checkpoints.

Checkpoint steps are shown in files named train/model.ckpt‑STEP.* where STEP is the available checkpoint step.
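
For example, to generate the frozen graph from the transfer-learn run shown earlier in this guide at a hypothetical checkpoint step of 10000, you would run (substitute your own run ID and step):

guild run export-and-freeze trained-model=ffc693ac step=10000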

Detect objects in an image

With a frozen inference graph, we can detect objects in an image.

Create a new directory containing one or more images that you want to detect.

For this example, we create a directory in /tmp—feel free to use another location.

DETECT_IMAGES=/tmp/sample-detect-images

Try including images from the original dataset as well as other images—for example from the Internet or those you’ve collected yourself. Images may include zero or more instances of detectable objects.
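
For example, create the directory and copy a few images into it (the file names below are placeholders; use your own image files):

mkdir -p $DETECT_IMAGES
cp $IMAGES/example-1.jpg $IMAGES/example-2.jpg $DETECT_IMAGES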

Run the detect operation, specifying the location of the images you want to detect:

guild run detect images=$DETECT_IMAGES

Guild uses the frozen inference graph from the latest export‑and‑freeze operation to load and initialize the object detector. It runs each image in $DETECT_IMAGES through the model to both classify and locate objects with bounding boxes.

When the detect operation finishes, you can view the detected objects using either Guild View—a Guild AI application used to view runs—or an image viewer installed on your system.

Start Guild View:

guild view

Guild opens a new browser window running Guild View. Guild View provides a number of helpful features:

  • Explore project runs
  • Compare runs (similar to guild compare)
  • View runs in TensorBoard (similar to guild tensorboard)
  • View run files
  • Search run output

Explore project runs in Guild View

If you are running the command on a remote server, as with TensorBoard, Guild does not open a window in your browser. You need to open the link that Guild provides in your browser manually. If you cannot access the port that Guild uses, specify a port that you can access using the ‑‑port option. For example, to run Guild View on port 8080, use:

guild view --port 8080

When you have opened Guild View in your browser, click the FILES tab of a detect run and click one of the detected images. Guild opens the image in a file viewer. If the image contains detected objects, the objects appear in a bounding box with the detected class.

View detected image in Guild View

You can also use guild open to use your system file explorer to view run files. To view the run directory of the latest detect operation, run:

guild open --operation detect --path detected

From your system file explorer, you can browse detected images and open them in the image viewer of your choice.

Deploy a trained model

You can deploy a trained object detector as a single frozen inference graph. This file is generated by the export‑and‑freeze operation.

To obtain this file, use the ls command:

guild ls --operation export-and-freeze --path graph

Guild shows the files under the graph subdirectory of the latest export‑and‑freeze run (run ID will differ):

~/sample-object-detector/env/.guild/runs/572bdf82df9711e88d57066b64a634d0:
  graph/
  graph/checkpoint
  graph/frozen_inference_graph.pb
  graph/model.ckpt.data-00000-of-00001
  graph/model.ckpt.index
  graph/model.ckpt.meta
  graph/pipeline.config
  graph/saved_model/
  graph/saved_model/saved_model.pb
  graph/saved_model/variables/

The first line in the output contains the full path of the run directory of the latest export‑and‑freeze run.

You can use the ‑‑full‑path option to show full paths for each file:

guild ls --operation export-and-freeze \
         --path graph/frozen_inference_graph \
         --full-path

Note

Guild command options often have short-form alternatives. The above command, for example, can also be specified as guild ls ‑o export‑and‑freeze ‑p graph ‑f.

Use ‑‑help with any command for a full list of options.
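
Once you have the full path to frozen_inference_graph.pb, copy it to wherever your application or serving environment expects it. For example (the destination directory below is hypothetical):

mkdir -p ~/deploy/detector
cp <full path from guild ls above> ~/deploy/detector/frozen_inference_graph.pb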

Summary

In this guide we trained an object detector using a Guild project.

The final Guild project is very simple:

- model: detector
  description: Sample object detector
  extends:
    - gpkg.object-detect/voc-annotated-images-directory-support
    - gpkg.object-detect/ssd-mobilenet-v2

The detector model extends two model configurations: voc‑annotated‑images‑directory‑support and ssd‑mobilenet‑v2, which are both defined in the gpkg.object‑detect package. These extensions add various operations to the model that support a workflow for training, evaluating, and deploying an object detector.


  1. See A Gentle Introduction to Transfer Learning for Deep Learning for an overview of transfer learning. 

  2. See mAP (mean Average Precision) for Object Detection for an overview of measuring accuracy in object detectors.