Guild tracks each operation as a unique experiment. Simply run your script with the guild run command.
Guild automatically captures essential details about the training run:
- Model metrics such as loss and accuracy
- Generated files such as checkpoints, images, text, etc.
- Logs and command output
- Source code used in the experiment
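As a concrete illustration, here's a minimal training script Guild could track. The flag and metric names are hypothetical — but by default Guild treats top-level global assignments in a Python script (like lr below) as flags, and captures printed key: value lines as scalar metrics:

```python
# train.py - a minimal, hypothetical script for Guild to track.
# Guild detects the global `lr` as a flag and captures the
# "loss: ..." / "accuracy: ..." output lines as scalar metrics.

lr = 0.1  # hyperparameter; Guild can override this per run


def train(lr):
    # stand-in for a real training loop
    loss = (lr - 0.3) ** 2
    accuracy = 1.0 - loss
    return loss, accuracy


if __name__ == "__main__":
    loss, accuracy = train(lr)
    print(f"loss: {loss:.4f}")
    print(f"accuracy: {accuracy:.4f}")
```

Running `guild run train.py lr=0.2` would then record the flag value, the two scalars, the console output, and a snapshot of the script itself.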
What's the payoff?
When you systematically capture training runs, remarkable things happen:
- You know when you're improving and when you're regressing
- You can analyze differences across runs to better understand a result
- You can share your results with colleagues
- You can backup trained models — some of which may have taken days to generate — for safekeeping
- You can optimize your model by focusing on approaches that perform well and avoiding those that don't
But I already use a spreadsheet to track results
Guild is automatic — it does the work when you run your training script so you don't have to copy and paste. It captures everything associated with an experiment — hyperparameters, metrics, logs, generated files, and even source code snapshots! With this detail, you can do more than merely report results: you can systematically improve your model.
And you can always export your results to a spreadsheet with Guild!
Auto ML stands for automated machine learning — the process of applying machine learning to machine learning itself. Rather than manually designing models and selecting hyperparameters, have the machine automatically learn them! The result is better models in less time.
This command runs train.py using a Bayesian optimizer:

$ guild run train.py x=[-2.0:2.0] --optimizer bayesian --max-trials 100

Guild then:

- Runs the script 100 times, each time with different values for x
- Selects values between -2.0 and 2.0 — the search space for x
- Uses the result of each trial to choose values for x that are likely to improve results (e.g. higher accuracy)
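Conceptually, trial-based search is the loop below — shown here with random sampling for simplicity; a Bayesian optimizer additionally uses the results of earlier trials to pick more promising values of x. The objective function is a made-up stand-in for a training run:

```python
import random

# Conceptual sketch of trial-based hyperparameter search. Random
# sampling is shown for simplicity; a Bayesian optimizer would model
# past (x, loss) pairs to choose the next x more intelligently.


def objective(x):
    # hypothetical stand-in for a training run's validation loss
    return (x - 0.5) ** 2


def search(n_trials=100, lo=-2.0, hi=2.0, seed=0):
    rng = random.Random(seed)
    best_x, best_loss = None, float("inf")
    for _ in range(n_trials):
        x = rng.uniform(lo, hi)       # pick a candidate in the search space
        loss = objective(x)           # "run" the trial
        if loss < best_loss:
            best_x, best_loss = x, loss
    return best_x, best_loss


best_x, best_loss = search()
print(f"best x={best_x:.3f} loss={best_loss:.5f}")
```

Guild automates exactly this loop around your real training script — each iteration is a recorded run with its own flags and metrics.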
When would I use this?
Whenever you want to improve your model! Model tuning is one of the most effective ways to improve accuracy — in some cases it can be more effective than adding data! And Guild makes it easy, so why not give it a try and see what happens?
What Bayesian methods does Guild support?
Guild supports a number of state-of-the-art optimization algorithms:
- Gaussian process
- Decision tree
- Gradient boosted trees
- Tree-structured Parzen estimator (coming)
But I prefer manual tuning
We wholeheartedly agree! That's why Guild supports an incremental approach to hyperparameter tuning:
- Use well-known hyperparameters during model development
$ guild run train.py x=0.1
- Selectively expand the range with grid search (run once for each value specified)
$ guild run train.py x=[-0.1,0,0.1,0.2]
- If your search space is large, try random search
$ guild run train.py x=[-4.0:4.0] --optimizer random
- When you want to optimize your model, use Bayesian search
$ guild run train.py x=[-2.0:2.0] --optimizer bayesian
Of course you can always try Bayesian optimization to start — it can be remarkably efficient! Whatever approach you take, Guild lets you control every step.
For a step-by-step guide, see Get Started - Hyperparameter Optimization.
When you use Guild to train your models, you make it easier for others to reproduce your results — or at least to recreate your experiments. It looks something like this:
Step 1 - Get the project source code
Step 2 - Change to the project directory
Step 3 - Use Guild to recreate the results
Guild takes care of the rest automatically:
- Downloads any required libraries and data sets
- Runs the exact command prescribed by the author for recreating the result
- Captures the results for comparison
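Behind the scenes, this works because the project author defines operations in a guild.yml file. Here's a minimal sketch — the operation name, flag, and data set URL are hypothetical, and the dependency syntax is approximate (see Guild's documentation for the exact form):

```yaml
# guild.yml - hypothetical project definition
train:
  description: Recreate the paper's training result
  main: train            # runs train.py
  flags:
    x: 0.1               # default value prescribed by the author
  requires:
    - url: https://example.com/dataset.tar.gz   # hypothetical data set
```

With this in place, anyone who clones the project can run `guild run train` and get the author's prescribed command, dependencies, and defaults.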
I'm not a researcher, I don't need to reproduce my results
Even some researchers feel they don't need to reproduce their results.
Reproducibility aside, consider the benefits of automating your workflow:
- You can run more experiments, which gives you more data, which lets you build better models
- By automating your steps, you're less likely to make process-related mistakes (e.g. copying the wrong directory, using the wrong hyperparameter, etc.)
- When presenting your results to your boss, client, or sponsor, your credibility goes way up when you can confidently reproduce a result at any time
For a step-by-step guide, see Get Started - Reproducibility.
Guild is tightly integrated with analytic tools like TensorBoard, which let you easily compare experiment results and drill into training details.
With Guild's TensorBoard integration, you can:
- Launch TensorBoard with a single command
- Automatically sync experiments with TensorBoard as they're updated
- Filter runs by operation name, label, and run status
For a step-by-step guide, see Get Started - TensorBoard.
Guild has a powerful workflow feature, which lets you run multiple steps in a single operation. Consider this scenario for training and deploying a model for a mobile application:
- Prepare data set for training
- Train model
- Compress model
The primary goal is to maximize classification accuracy — but because the model is deployed to a resource-constrained environment, you also want to minimize model size.
Here's how you'd do it in Guild. This is the workflow definition in guild.yml:

workflow:
  description: Run training end-to-end, including model compression
  steps:
    - prepare-data
    - train
    - compress

When you run the workflow operation with a Bayesian optimizer, Guild attempts to both maximize model accuracy and minimize model size by adjusting hyperparameters across each of the three operations: prepare-data, train, and compress.
Remote training and backups
Guild commands can be run remotely! Simply define a remote and reference it using the --remote command line option when running Guild commands.

Here's an example of running train.py on a remote named ec2-v100:

$ guild run train.py --remote ec2-v100

Here's a sample remote configuration:

ec2-v100:
  type: ec2
  region: us-east-2
  ami: ami-0a47106e391391252
  instance-type: p3.2xlarge
Guild also lets you easily copy experiments to and from remote locations, including AWS S3 and SSH accessible servers.
Here's an example of copying local experiments to a remote named s3:

$ guild push s3

And a sample configuration for the s3 remote:

s3:
  type: s3
  bucket: my-experiments
Can't I just use floppy disks for backup?
1.44 MB is big, but cloud storage is even bigger!
Backing experiments up to remote storage is trivial with Guild — it's simply a matter of running the push command. Guild copies only the differences, so it's an efficient operation.
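The idea behind that efficiency can be sketched in a few lines — this is an illustration of difference-based copying, not Guild's actual implementation:

```python
import hashlib

# Difference-based sync sketch: compare content digests of local files
# against digests recorded remotely, and copy only files that are new
# or changed. (An illustration of the idea, not Guild's implementation.)


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def files_to_copy(local: dict, remote: dict) -> list:
    """Return names of local files whose content differs remotely."""
    return sorted(
        name for name, data in local.items()
        if remote.get(name) != digest(data)
    )


local = {"model.ckpt": b"weights-v2", "log.txt": b"epoch 1"}
remote = {"model.ckpt": digest(b"weights-v1"), "log.txt": digest(b"epoch 1")}
print(files_to_copy(local, remote))  # -> ['model.ckpt']
```

Only model.ckpt is copied, since log.txt is unchanged — which is why repeated pushes of a large run directory stay cheap.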
Simple team collaboration
Remote backup is also an easy way to collaborate with colleagues and fellow machine learning engineers. Consider this simple workflow:
- One or more scientists/engineers run experiments for a particular task with Guild (e.g. experiments explore task performance across a variety of models and hyperparameters).
- Each scientist/engineer routinely copies experiments to a common remote location — this serves to backup work but also makes that work available to everyone on the team!
To compare results across the team, a scientist/engineer need only synchronize with the remote location using guild pull to get everyone's experiments.