- Start a run
- Flag values
- List runs
- Get run information
- Compare runs
- Label runs
- Delete runs
- Restore deleted runs
- Purge deleted runs
Runs are generated in Guild AI by running an operation.
When you train a model, you generate a run, which contains the trained model as well as training logs and other artifacts associated with the operation.
Similarly, when you fine-tune a model, you generate a run. When you test a model, you generate a run. In fact, any operation that you run generates a distinct run. This is how Guild manages your work.
Here is a common workflow:
- Find and install a model
- Run an operation on that model
- Monitor the progress of the operation
- Run another operation with different hyper-parameters (flags)
- Compare runs
- Delete runs that you’re no longer interested in
- Select successful runs for deployment or use in other operations
The work centers on runs—creating, comparing, and selecting.
As you work with runs in Guild it’s important to understand some core concepts. If you’d prefer to skip this conceptual material, jump to Start a run below.
A run directory is a file system directory (folder) that contains artifacts associated with a run. Guild creates a unique run directory for every run. This directory contains a variety of important data:
- Run metadata
- Run sources such as datasets
- Run output such as event logs and saved models
Run directories are located in GUILD_HOME/runs. For more information see Guild home.
Run related operations interact with run directories in various ways:
- guild run creates a new run directory
- guild runs info prints information read from a run directory
- guild runs list enumerates run directories
- guild runs delete deletes run directories
Limiting runs
Over time you’ll generate a large number of runs. This list can become unwieldy, especially when you’re interested in a small subset, e.g. runs associated with a particular model you’re working with. For this reason, Guild provides two ways of limiting the runs that apply to run related commands:
- Limit to runs associated with a model defined in the current directory
- Limit to runs that match a filter
The first limit is known as run scope. Scope can be either local or global. By default, scope is local when the current directory contains a model definition; otherwise scope is global. Local scope limits runs to those associated with models defined in the current directory. Global scope displays all runs.
Global scope can be applied using the --all option.
Run scope is applied based on the directory that Guild commands are run in. Consider the following directory structure:
- Home: does not contain a model definition, so global scope applies
- mnist: contains a model definition, so local scope applies
- MODELS: the model definition
Commands run from /Home have global run scope because that directory doesn’t contain a model definition. Commands run from the mnist directory, however, have local scope because that directory contains a model definition.
Run scope defaults to local when a model definition exists because Guild assumes that the user is working on models defined at that location and is not interested in other runs, at least by default. This follows the pattern of command line tools that apply operations locally when they find a project, repository, etc. in the current directory.
When a command is run in local scope, Guild prints a message to indicate that results are limited:
Limiting runs to the current directory (use --all to include all)
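The default scope rule described above can be sketched in a few lines of Python. This is only an illustration, not Guild's implementation: the function name `default_scope` and the `use_all` parameter are hypothetical, and the sketch assumes that a model definition is a file named MODELS, as in the example directory structure.

```python
from pathlib import Path

# Assumption: a model definition is a file named MODELS, as in the
# example directory structure. This is not taken from Guild's source.
MODEL_DEF_NAME = "MODELS"


def default_scope(cwd, use_all=False):
    """Return "local" or "global" for commands run from cwd.

    use_all stands in for the --all option, which forces global scope.
    """
    if use_all:
        return "global"
    has_model_def = (Path(cwd) / MODEL_DEF_NAME).is_file()
    return "local" if has_model_def else "global"
```

Under these assumptions, calling `default_scope` on a directory that holds a MODELS file returns "local", and any other directory returns "global".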
The other limit is run filtering. Filters are applied with command line options that specify run attributes, which may include:
- Run status
- Deleted status
Run filtering is applied after run scope (see above).
For example, to view runs that are associated with the train operation, use the --op option:
guild runs --op train
If the command is in local scope, Guild will limit runs to those associated with models in the current directory; otherwise it will use all runs. It will then filter those runs, limiting the result to those associated with operations containing the string “train”.
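The order of the two limits (scope first, then filter) can be illustrated with a short Python sketch. The run records, their "model" and "op" keys, and the function name `limit_runs` are all hypothetical and exist only for this example; they do not reflect how Guild stores runs.

```python
def limit_runs(runs, local_models=None, op_filter=None):
    """Apply run scope first, then the operation filter.

    runs is a list of dicts with hypothetical "model" and "op" keys.
    local_models is the set of models defined in the current directory,
    or None when scope is global.
    """
    if local_models is not None:
        runs = [run for run in runs if run["model"] in local_models]
    if op_filter is not None:
        # Keep runs whose operation name contains the filter string.
        runs = [run for run in runs if op_filter in run["op"]]
    return runs


# Made-up sample data for illustration only.
runs = [
    {"id": "a1", "model": "mnist", "op": "train"},
    {"id": "b2", "model": "mnist", "op": "evaluate"},
    {"id": "c3", "model": "iris-cnn", "op": "train"},
]

# Local scope (only mnist is defined locally) plus a filter on "train".
print([run["id"] for run in limit_runs(runs, {"mnist"}, "train")])  # ['a1']
# Global scope with the same filter.
print([run["id"] for run in limit_runs(runs, None, "train")])  # ['a1', 'c3']
```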
Some run related commands let you select one or more runs. For these commands, runs can be specified in various ways:
- Index as returned by guild runs list
- Run ID (full or partial if unique)
Additionally, a range may be specified using run indexes in the form:
START:STOP
Both START and STOP are inclusive: runs are selected beginning with the START index up to and including the run with the STOP index. Both START and STOP are optional. If START is omitted it is assumed to be 1 (i.e. the first run in the list). If STOP is omitted it is assumed to be the index of the last run.
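The range rules can be made concrete with a small Python sketch. It assumes that a range is written START:STOP (as in the selectors 2:3 and : used with guild runs rm), and the function name `resolve_range` is hypothetical. It illustrates the inclusive bounds and the defaults, not Guild's own parser.

```python
def resolve_range(spec, run_count):
    """Expand a START:STOP range into a list of 1-based run indexes.

    Both bounds are inclusive. A missing START defaults to 1 and a
    missing STOP defaults to the index of the last run (run_count).
    """
    start_text, sep, stop_text = spec.partition(":")
    if not sep:
        raise ValueError("not a range: %r" % spec)
    start = int(start_text) if start_text else 1
    stop = int(stop_text) if stop_text else run_count
    return list(range(start, stop + 1))


print(resolve_range("2:3", 4))  # [2, 3]
print(resolve_range(":", 4))    # [1, 2, 3, 4]
print(resolve_range("3:", 4))   # [3, 4]
```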
Run indexes are relative to the list of runs returned by guild runs list for a given scope and filter (see Limiting runs above). The run associated with index 1 for one listing may not be the same run for another listing. Always verify the selected runs before proceeding.
When in doubt, use a run ID to select a run.
Consider this output from guild runs list:
Limiting runs to the current directory (use --all to include all)
[1:9734f85e]  ./slim-resnet-101:train   2017-12-14 07:56:32  terminated
[2:d8cde0fc]  ./slim-resnet-50:export   2017-12-13 13:14:31  completed
[3:0df943ac]  ./slim-resnet-50:predict  2017-12-06 11:51:15  completed
[4:e150e44a]  ./slim-resnet-50:predict  2017-12-06 11:50:00  completed
The run scope in the above command is local. If the user had run guild runs --all, the scope would be global and the list and run indexes would likely be different.
Below are various operations with run selectors applied to this list.
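- guild runs rm 1 selects run 9734f85e for deletion (you can always use index 1 to select the most recently started run in the list)
- guild runs rm 2:3 selects runs d8cde0fc and 0df943ac for deletion
- guild runs rm : selects all four runs in the list for deletion
- guild runs rm 0df943ac e150e44a selects the two runs with those IDs for deletion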
The following assumptions must hold for the above examples that use run indexes:
- Commands must be executed in the same directory as the command that generated the list and without scope modifiers or filters
- The runs themselves must not change, i.e. no runs can be deleted or started in the meantime
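The following Python sketch shows one way a single selector could be resolved against the run IDs from the listing above. It is an illustration under stated assumptions rather than Guild's actual logic: the function name `select_run` is hypothetical, a partial run ID is assumed to mean a prefix, and a number within the bounds of the list is assumed to be tried as an index first.

```python
def select_run(selector, run_ids):
    """Resolve one selector against run IDs listed most recent first."""
    # Assumption: a number within the list bounds is a 1-based index.
    if selector.isdigit() and 1 <= int(selector) <= len(run_ids):
        return run_ids[int(selector) - 1]
    # Otherwise treat the selector as a full or partial (prefix) run ID,
    # which must identify exactly one run.
    matches = [run_id for run_id in run_ids if run_id.startswith(selector)]
    if len(matches) != 1:
        raise ValueError("%r matches %d runs" % (selector, len(matches)))
    return matches[0]


# Run IDs in the order shown in the example listing.
listed = ["9734f85e", "d8cde0fc", "0df943ac", "e150e44a"]

print(select_run("1", listed))     # 9734f85e
print(select_run("0df9", listed))  # 0df943ac
```

Because an index depends on the order of the list while an ID does not, the ID form gives the same result regardless of scope or filters.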
Start a run
To start a run, use the run command. The basic format of a run command looks like this:
guild run OPERATION
guild run MODEL:OPERATION
guild run PACKAGE/MODEL:OPERATION
You can list available operations using the operations command.
In general, you can omit information about an operation name as long as Guild can uniquely identify the operation.
For example, if the output of the operations command looks like this:
iris/iris-cnn:train
iris/iris-cnn:finetune
iris/iris-cnn:test
You can start the finetune operation by running:
guild run finetune
You can always provide the model or package. For example, this form will also start finetune:
guild run iris-cnn:finetune
You can use part of the operation specification as long as Guild can uniquely identify the operation. For example, you can run the train operation using:
guild run cnn:train
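The idea of identifying an operation from a partial specification can be sketched as follows. The matching rule used here (exact operation name, with the model part matched as a substring of PACKAGE/MODEL) is an assumption chosen to reproduce the examples above; Guild's actual matching rules may differ, and the function name `match_operation` is hypothetical.

```python
def match_operation(spec, available):
    """Return the single operation in available that spec identifies.

    spec is OPERATION or MODEL:OPERATION, where the model part may be
    partial. Raises ValueError unless exactly one operation matches.
    """
    model_part, _, op_name = spec.rpartition(":")
    matches = []
    for full_name in available:
        full_model, _, full_op = full_name.rpartition(":")
        # An empty model part matches any model.
        if full_op == op_name and model_part in full_model:
            matches.append(full_name)
    if len(matches) != 1:
        raise ValueError("%r matches %d operations" % (spec, len(matches)))
    return matches[0]


available = [
    "iris/iris-cnn:train",
    "iris/iris-cnn:finetune",
    "iris/iris-cnn:test",
]

print(match_operation("finetune", available))           # iris/iris-cnn:finetune
print(match_operation("iris-cnn:finetune", available))  # iris/iris-cnn:finetune
print(match_operation("cnn:train", available))          # iris/iris-cnn:train
```

If a second model also defined a train operation, the bare name train would match two operations in this sketch and raise an error, which is the case where the model or package must be given.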
Some operations are so common that Guild provides alias commands, including train.
Aliases are used to start operations using these forms:
guild ALIAS_CMD
guild ALIAS_CMD MODEL
guild ALIAS_CMD PACKAGE/MODEL
The train alias is used to run the train operation. In the example above, the following commands can be used to train the iris model:
guild train
guild train iris-cnn
guild train cnn
Flag values
Specify operation flag values as NAME=VALUE arguments to run.
To get help on available and required flags for an operation, run:
guild run OPERATION --help-op
You can also view help for models defined in the current directory by running:
To get help for a packaged model, run:
guild help PACKAGE
If you omit a required flag, the run command (or applicable alias) will exit with an error message.
List runs
guild runs is shorthand for guild runs list.
The command displays different lists depending on the directory it’s run in. If the directory contains a model definition, runs are limited to those associated with the locally defined models. If the directory does not contain a model definition, all runs are displayed.
Get run information
Use runs info to show information about a run.
By default, Guild shows information about the latest run:
guild runs info
You can select a specific run by providing a run ID or index.
Run indexes are displayed in run lists (see List runs above).
Compare runs
Compare runs by running:
guild compare
Guild Compare is a spreadsheet-like application that displays runs, their status, and metrics such as validation accuracy and training loss.
To display compare results as a table, use:
guild compare --table
To display compare results in CSV format (e.g. for use in Excel), use:
guild compare --csv
For more help, see the compare command.
Label runs
Runs can have labels, which provide additional information about the run. A label can be used for filtering in the runs list command.
Use runs label to set or clear a label for a run.
Use guild runs list LABEL to list runs with the specified label.
Delete runs
Delete runs using guild runs delete or guild runs rm. See runs delete for command details.
Guild will display the list of runs to be deleted and ask you to confirm the operation. You must type y and then press Enter.
You can permanently delete runs by including the --permanent flag.
Permanently deleted runs cannot be recovered! We recommend that you do NOT permanently delete runs as a part of your typical workflow. By omitting the --permanent flag, you have the opportunity to recover a run that you unexpectedly need. In time you can purge deleted runs using the purge command (see below).
Frequently used delete commands
To delete all failed runs (i.e. “error” status), use:
guild runs rm -E
To delete all failed as well as terminated runs, use:
guild runs rm -ET
Restore deleted runs
Deleted runs can be recovered by running:
guild runs restore [RUN...]
For more help, see the runs restore command.
Purge deleted runs
The disk space used by deleted runs can be recovered by permanently deleting them using runs purge.
You can show the list of deleted runs using guild runs --deleted.
For example, to permanently delete all deleted runs, use:
guild runs purge
Guild will prompt you before proceeding.
Purging deleted runs will permanently delete them! Be certain that you don’t need a run before permanently deleting it.
For more help, see the runs purge command.
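The relationship between deleting, restoring, and purging can be summarized with a toy in-memory model. The `RunStore` class and its methods are hypothetical and say nothing about how Guild stores runs on disk; the sketch only shows why a normal delete is recoverable while a permanent delete or a purge is not.

```python
class RunStore:
    """Toy in-memory model of active and deleted runs (illustrative only)."""

    def __init__(self, run_ids):
        self.active = list(run_ids)
        self.deleted = []

    def delete(self, run_id, permanent=False):
        """Delete a run; it stays recoverable unless permanent is True."""
        self.active.remove(run_id)
        if not permanent:
            self.deleted.append(run_id)

    def restore(self, run_id):
        """Move a deleted run back to the active list."""
        self.deleted.remove(run_id)
        self.active.append(run_id)

    def purge(self):
        """Permanently drop all deleted runs and return their IDs."""
        purged, self.deleted = self.deleted, []
        return purged


store = RunStore(["9734f85e", "d8cde0fc", "0df943ac"])
store.delete("9734f85e")                  # recoverable
store.delete("d8cde0fc", permanent=True)  # gone immediately
store.restore("9734f85e")                 # back in the active list
store.delete("0df943ac")
print(store.purge())   # ['0df943ac']
print(store.active)    # ['9734f85e']
```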