Train an Image Classifier

In this guide, we train an image classifier on the Fashion-MNIST data set.



If you haven’t already, install and verify Guild AI before starting this guide. The commands below must be entered in a command console or prompt for your system. If you are unfamiliar with command consoles, Getting to Know the Command Line by David Baumgold provides a number of helpful tips.


If you are running on Windows, you must open your command console as Administrator. This is due to a permission issue in Windows that prevents the creation of symbolic links, which Guild uses.

Image classifier training script

In this step, we create an image classifier training script adapted from the official Keras examples. 1

If you haven’t done so already, create a new directory for the project:

mkdir guild-start

Create a file in the guild‑start directory with the following contents:

from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import to_categorical

batch_size = 128
epochs = 5
dropout = 0.2
lr = 0.001
lr_decay = 0.0

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(dropout))
model.add(Dense(512, activation='relu'))
model.add(Dropout(dropout))
model.add(Dense(10, activation='softmax'))

model.compile(
    loss='categorical_crossentropy',
    optimizer=RMSprop(lr=lr, decay=lr_decay),
    metrics=['accuracy'])

model.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_test, y_test),
    callbacks=[TensorBoard('logs')])
Verify that your project structure is:

  • guild-start
    • (from Quick Start - not used in this guide)
    • (from Quick Start - not used in this guide)

Train with default settings

In a command console, change to the guild‑start project:

cd guild-start


guild run
You are about to run
  batch_size: 128
  dropout: 0.2
  epochs: 5
  lr: 0.001
  lr_decay: 0.0
Continue? (Y/n)

Press Enter to start training.

By default, the script is configured to train over 5 epochs.
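A quick back-of-the-envelope check of what those defaults mean in practice: each epoch sweeps the 60,000 Fashion-MNIST training examples in mini-batches of 128. A minimal sketch:

```python
import math

train_examples = 60_000   # Fashion-MNIST training set size
batch_size = 128          # default batch_size flag
epochs = 5                # default epochs flag

# Batches per epoch and total optimizer steps for the full run.
steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch)            # 469 batches per epoch
print(steps_per_epoch * epochs)   # 2345 optimizer steps in total
```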

You can view the training results in various ways:

  • List available runs, which include the new run:
guild runs

This command shows the last 20 runs. If you’re only interested in a particular operation, you can filter the list using the ‑o or ‑‑operation command line option:

guild runs -o


You can type a portion of the operation name with ‑o. For example, guild runs ‑o mnist would show all runs whose operation name contains mnist.

  • Show information for the run
guild runs info

This command shows information for the latest run.

  • List generated run files
guild ls

This command lists files for the latest run:



Directory and filenames will differ on your system.

The events file under logs is a TensorBoard event log generated by the training script — specifically by the TensorBoard Keras callback used by the script.

View results in TensorBoard

View the training run in TensorBoard:

guild tensorboard --operation mnist

This command shows any run matching “mnist” in TensorBoard. If you run this command in a separate command console, you can leave TensorBoard running in the background while you run more operations — Guild automatically syncs TensorBoard with the current runs.

Guild integrates with TensorBoard and automatically synchronizes filtered runs

See the tensorboard command for more information on running TensorBoard from Guild.

When you’re done viewing results in TensorBoard, return to the command prompt and type Ctrl‑C to stop TensorBoard.

TensorBoard 1.13.0 at http://localhost:65397 (Press CTRL+C to quit)
Type Ctrl‑C to stop TensorBoard

Train a second time

Run again — this time specify a different learning rate:

guild run lr=0.01

This changes the learning rate from the default 0.001 to 0.01. It turns out that this value is too high, but we use the scenario to demonstrate a simple troubleshooting process in Guild.
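To build intuition for why a learning rate can be "too high", here is a toy sketch, not the network above, of gradient descent on f(x) = x². When the rate is small each update shrinks x toward the minimum; past a threshold each update overshoots and the iterates grow instead:

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x^2 (gradient 2x) starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # update: x_next = (1 - 2*lr) * x
    return x

print(abs(gradient_descent(0.1)))  # small rate: shrinks toward 0
print(abs(gradient_descent(1.5)))  # large rate: |1 - 2*lr| > 1, diverges
```

The neural network's loss surface is far more complex, but the same overshooting dynamic is what keeps the model from learning.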

Press Enter to start training.

As the model trains, note the validation accuracy (represented by val_acc in the training output) — it is roughly 10%, which is random guessing! So we know our model isn’t learning.
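The 10% figure is no accident: with 10 balanced classes, a model that predicts labels at random gets about 1 in 10 right. A quick simulation:

```python
import random

random.seed(0)
num_classes = 10
n = 10_000

# Uniformly random predictions against uniformly random labels, i.e. a
# model that has learned nothing about its inputs.
hits = sum(
    random.randrange(num_classes) == random.randrange(num_classes)
    for _ in range(n)
)
print(hits / n)  # roughly 0.1 -- the random-guess baseline
```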

You can stop the training at any point by typing Ctrl‑C — or let it run to completion.

Compare the runs:

guild compare --table --operation mnist --strict-cols =lr,val_acc

This variation of compare uses ‑‑strict‑cols to only show the columns we’re interested in comparing — in this case, lr and val_acc. The syntax =lr means “the flag lr” and is used to distinguish the value from scalars. val_acc is the name of the scalar used for validation accuracy.

For details on compare options, see the compare command.

Show differences between runs

In the previous step, we tried a learning rate that was too high — our model failed to learn anything at all.

Let’s assume for a moment we didn’t know why this happened. How could we troubleshoot the problem?

Let’s use Guild’s diff command to compare our last two runs. Specifically, we compare changes to flags and source code.

guild diff --flags
--- ~/.guild/runs/7327dbd44bce11e98af6c85b764bbf34/.guild/attrs/flags
+++ ~/.guild/runs/925f38e44bce11e98af6c85b764bbf34/.guild/attrs/flags
@@ -1,5 +1,5 @@
 batch_size: 128
 dropout: 0.2
 epochs: 5
-lr: 0.001
+lr: 0.01
 lr_decay: 0.0


By default, Guild uses the diff command to show differences. You can specify an alternative program when running diff with the ‑c or ‑‑cmd command line option. For example, if you have Meld available on your system, you can compare the last two runs by running guild diff ‑c meld.

You can configure the default program used for diffing in user configuration.
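Nothing magic is happening here: Guild stores each run's flags as a small text file, and guild diff performs a textual comparison of them. A minimal sketch using the standard library's difflib, with made-up run contents and hypothetical file labels:

```python
import difflib

# Hypothetical flag attributes for two runs, in the YAML-style layout
# Guild stores for each run.
run1_flags = "batch_size: 128\ndropout: 0.2\nepochs: 5\nlr: 0.001\nlr_decay: 0.0\n"
run2_flags = "batch_size: 128\ndropout: 0.2\nepochs: 5\nlr: 0.01\nlr_decay: 0.0\n"

# Produce a unified diff, the same format shown by guild diff --flags.
diff = difflib.unified_diff(
    run1_flags.splitlines(keepends=True),
    run2_flags.splitlines(keepends=True),
    fromfile="run-1/flags",
    tofile="run-2/flags",
)
print("".join(diff), end="")
```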

Here’s the diff of flags in Meld:

We can see from this comparison exactly what changed across the two runs: the learning rate went from 0.001 to 0.01. While this is a simple example, it demonstrates the value of systematically tracking experiment details.


Summary

In this guide we trained a simple image classifier and used TensorBoard and diffing tools to view and compare runs.

  • The training script used in this guide is a realistic machine learning example
  • We didn’t modify the script to take advantage of Guild’s experiment tracking and comparison features
  • We used a simple troubleshooting method, diffing two runs, to explain a result

Next steps

Learn about Guild files and how they're used to support simple reproducibility in machine learning.
Guild makes it easy to backup and restore runs, including backups to AWS S3 and on-prem servers.
Train a model remotely to take advantage of cloud-based GPUs.

  1. The training script for the image classifier is adapted from keras/examples/ on GitHub