Experiment management

Guild tracks each operation as a unique experiment. Simply run your script with the guild run command.

$ guild run train.py

Guild automatically captures essential details about the training run:

  • Model metrics such as loss and accuracy
  • Generated files such as checkpoints, images, text, etc.
  • Logs and command output
  • Source code used in the experiment
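
As an illustration, here is a minimal script Guild could track. The flag name `x` and the toy loss are assumptions for this example; printing metrics as `key: value` lines is one convention Guild can capture as scalars:

```python
# train.py - a minimal, hypothetical training script.
# Guild treats module-level globals like `x` as hyperparameters, and it
# can capture scalars printed as `key: value` lines in script output.

x = 0.1  # hyperparameter; override from the command line, e.g. x=0.2

loss = (x - 0.3) ** 2   # toy objective standing in for a real training loss
accuracy = 1.0 - loss

print("loss: %f" % loss)
print("accuracy: %f" % accuracy)
```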

What's the payoff?

When you systematically capture training runs, remarkable things happen:

  • You know when you're improving and when you're regressing
  • You can analyze differences across runs to better understand a result
  • You can share your results with colleagues
  • You can backup trained models — some of which may have taken days to generate — for safekeeping
  • You can optimize your model by focusing on approaches that perform well and avoiding those that don't

But I already use a spreadsheet to track results

Guild is automatic — it does the work when you run your training script so you don't have to copy and paste. It captures everything associated with an experiment — hyperparameters, metrics, logs, generated files, and even source code snapshots! With this detail, you can do more than merely report results: you can systematically improve your model.

And you can always export your results to a spreadsheet with Guild!

Auto ML

Auto ML stands for automated machine learning — the process of applying machine learning to machine learning. Rather than manually designing models and selecting hyperparameters, have the machine learn them automatically! The result is better models in less time.

$ guild run train.py x=[-2.0:2.0] --optimizer bayesian --max-trials 100

This command runs train.py using a Bayesian optimizer:

  • Runs the script 100 times, each time with different values for x
  • Selects values between -2.0 and 2.0 — the search space for x
  • Uses the result of each trial to choose values for x that are likely to improve results (e.g. higher accuracy)
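
Conceptually, the optimizer loop behaves like the following sketch. This is pure Python with a hypothetical `objective` standing in for a training run, and the explore/exploit heuristic is a simplification; a real Bayesian optimizer fits a surrogate model such as a Gaussian process to the trial history:

```python
import random

random.seed(0)  # for a repeatable sketch

def objective(x):
    # hypothetical stand-in for the loss reported by train.py
    return (x - 0.3) ** 2

lo, hi = -2.0, 2.0   # search space for x
trials = []          # history of (x, loss) pairs

for i in range(100):
    if i < 10 or random.random() < 0.3:
        # explore: sample x uniformly from the search space
        x = random.uniform(lo, hi)
    else:
        # exploit: perturb the best x found so far; a real Bayesian
        # optimizer fits a surrogate model to the history instead of
        # this crude heuristic
        best_x = min(trials, key=lambda t: t[1])[0]
        x = min(hi, max(lo, best_x + random.gauss(0, 0.1)))
    trials.append((x, objective(x)))

best_x, best_loss = min(trials, key=lambda t: t[1])
```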

When would I use this?

Whenever you want to improve your model! Hyperparameter tuning is one of the most effective ways to improve accuracy; in some cases it can be more effective than adding more data! And Guild makes it easy, so why not give it a try and see what happens?

What Bayesian methods does Guild support?

Guild supports a number of state-of-the-art optimization algorithms:

  • Gaussian process
  • Decision tree
  • Gradient boosted trees
  • Tree-structured Parzen estimator (coming soon)

But I prefer manual tuning

We wholeheartedly agree! That's why Guild supports an incremental approach to hyperparameter tuning:

  • Use well-known hyperparameters during model development
    $ guild run train.py x=0.1
  • Selectively expand the range with grid search (run once for each value specified)
    $ guild run train.py x=[-0.1,0,0.1,0.2]
  • If your search space is large, try random search
    $ guild run train.py x=[-4.0:4.0] --optimizer random
  • When you want to optimize your model, use Bayesian search
    $ guild run train.py x=[-2.0:2.0] --optimizer bayesian
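
The grid and random search steps above can be sketched in plain Python; `objective` here is a hypothetical stand-in for the loss that train.py reports:

```python
import random

def objective(x):
    # hypothetical stand-in for the loss reported by train.py
    return (x - 0.3) ** 2

# grid search (x=[-0.1,0,0.1,0.2]): one trial per listed value
grid = [-0.1, 0.0, 0.1, 0.2]
grid_results = {x: objective(x) for x in grid}
best_grid_x = min(grid_results, key=grid_results.get)

# random search (x=[-4.0:4.0]): sample uniformly from the range
random.seed(0)
random_xs = [random.uniform(-4.0, 4.0) for _ in range(20)]
random_results = {x: objective(x) for x in random_xs}
```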

Of course you can always try Bayesian optimization to start — it can be remarkably efficient! Whatever approach you take, Guild lets you control every step.

For a step-by-step guide, see Get Started - Hyperparameter Optimization.


Reproducibility

When you use Guild to train your models, you make it easier for others to reproduce your results — or at least to recreate your experiments. It looks something like this:

Step 1 - Get the project source code

$ git clone https://github.com/guildai/amazing-results-project

Step 2 - Change to the project directory

$ cd amazing-results-project

Step 3 - Use Guild to recreate the results

$ guild run

Guild takes care of the rest automatically:

  • Downloads any required libraries and data sets
  • Runs the exact command prescribed by the author for recreating the result
  • Captures the results for comparison

I'm not a researcher, I don't need to reproduce my results

Even some researchers feel they don't need to reproduce their results.

Reproducibility aside, consider the benefits of automating your workflow:

  • You can run more experiments, which gives you more data, which lets you build better models
  • By automating your steps, you're less likely to make process-related mistakes (e.g. copying the wrong directory, using the wrong hyperparameter, etc.)
  • When presenting your results to your boss, client, or sponsor, your credibility goes way up when you can confidently reproduce a result at any time

For a step-by-step guide, see Get Started - Reproducibility.


TensorBoard

Guild is tightly integrated with analytic tools like TensorBoard, which let you easily compare experiment results and drill into training details.

$ guild tensorboard

Compare experiment results in TensorBoard

Guild's integration with TensorBoard lets you:

  • Launch TensorBoard with a single command
  • Automatically sync experiments with TensorBoard as they're updated
  • Filter runs by operation name, label, and run status

For a step-by-step guide, see Get Started - TensorBoard.

End-to-end learning

Guild has a powerful workflow feature, which lets you run multiple steps in a single operation. Consider this scenario for training and deploying a model for a mobile application:

  1. Prepare data set for training
  2. Train model
  3. Compress model

The primary goal is to maximize classification accuracy, but because the model is deployed to a resource-constrained environment, you also want to minimize model size.

Here's how you'd do it in Guild:

$ guild run workflow --optimizer bayesian --maximize accuracy --minimize model-size

This is the workflow definition:

  workflow:
    description: Run training end-to-end, including model compression
    steps:
      - prepare-data
      - train
      - compress

Running the workflow operation with a Bayesian optimizer, Guild attempts to both maximize model accuracy and minimize model size by adjusting hyperparameters across each of the three operations: prepare-data, train, and compress.
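
With two competing objectives, one common approach is to scalarize them into a single score for the optimizer to maximize. The weighting below is an assumption for illustration, not a description of Guild's internals:

```python
def combined_score(accuracy, model_size_mb, size_weight=0.01):
    # Higher is better: reward accuracy, penalize model size.
    # size_weight controls how much accuracy we trade per megabyte.
    return accuracy - size_weight * model_size_mb

# two hypothetical trials from the workflow
trial_a = combined_score(accuracy=0.91, model_size_mb=40.0)
trial_b = combined_score(accuracy=0.89, model_size_mb=8.0)
# trial_b wins: it gives up 2 points of accuracy but is 32 MB smaller
```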

Remote training and backups

Guild commands can be run remotely! Simply define a remote and reference it using the --remote command line option when running Guild commands.

Here's an example of running train.py on a remote named ec2-v100:

$ guild run train.py --remote ec2-v100

Here's a sample remote configuration:

  type: ec2
  region: us-east-2
  ami: ami-0a47106e391391252
  instance-type: p3.2xlarge

Guild also lets you easily copy experiments to and from remote locations, including AWS S3 and SSH accessible servers.

Here's an example of copying local experiments to a remote named s3:

$ guild push s3

And a sample configuration for s3:

  type: s3
  bucket: my-experiments

Can't I just use floppy disks for backup?

1.44 MB is big, but cloud storage is even bigger!

Backing experiments up to remote storage is trivial with Guild: it's simply a matter of running the push command. Guild copies only the differences, so it's an efficient operation.

Simple team collaboration

Remote backup is also an easy way to collaborate with colleagues and fellow machine learning engineers. Consider this simple workflow:

  1. One or more scientists/engineers run experiments for a particular task with Guild (e.g. experiments explore task performance across a variety of models and hyperparameters).
  2. Each scientist/engineer routinely copies experiments to a common remote location — this serves to backup work but also makes that work available to everyone on the team!
  3. To compare results across the team, a scientist/engineer need only synchronize with the remote location using Guild pull to get everyone's experiments.

For step-by-step guides, see Get Started - Backup and Restore and Get Started - Remote Training.