Backup and Restore
Guild lets you easily back up runs to the cloud and restore those runs at a later time. This is useful for safeguarding runs from accidental deletion and can also be used for collaboration.
This guide demonstrates Guild’s backup and restore capabilities on Amazon S3. Refer to Requirements below for more information.
Storing files on S3 will incur ongoing costs. Refer to Cleanup below for steps to delete all files uploaded to S3 from this guide.
Requirements
This guide is a continuation of Train an Image Classifier. Complete that guide before proceeding below.
In addition, you must complete the following steps for Amazon S3 bucket support:
Note the name of the S3 bucket created.
Note the AWS access key ID and secret access key of the IAM user who has write access to the S3 bucket.
Add an s3 remote entry in Guild config
Using your text editor, open your Guild config file.
Add the following to the bottom of the file:
Replace <name of S3 bucket> with the name of the S3 bucket created in Requirements above.
If a remotes section already exists in the file, omit that line from the snippet above and only copy the lines after it.
Save your changes to the file.
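The remote entry described above can be sketched as follows. This is a minimal sketch, assuming Guild's S3 remote is declared with a `type: s3` attribute and a `bucket` attribute; the bucket name `my-guild-runs` and the scratch file path are hypothetical. The snippet writes the entry to a scratch file so you can review it before copying it into your Guild config.

```shell
# Hypothetical values: replace with your own bucket name.
BUCKET_NAME="my-guild-runs"
SNIPPET_FILE="${TMPDIR:-/tmp}/guild-s3-remote.yml"

# Write a remotes section that defines a remote named "s3".
# Assumption: the S3 remote type takes `type` and `bucket` attributes.
cat > "$SNIPPET_FILE" <<EOF
remotes:
  s3:
    type: s3
    bucket: $BUCKET_NAME
EOF

# Show the result for review.
cat "$SNIPPET_FILE"
```

If your config already has a remotes section, copy only the indented lines below `remotes:` into it.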
Set AWS environment variables
In the same command console you’ll use throughout this guide, set the following environment variables:
export AWS_ACCESS_KEY_ID=<access key ID>
export AWS_SECRET_ACCESS_KEY=<secret access key>
Replace <access key ID> and <secret access key> with the respective values associated with the IAM user who has write access to the S3 bucket.
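Because the variables must be set in the same console that runs Guild, it can help to confirm they are present before continuing. The following is a small sketch of such a check; `check_aws_env` is a hypothetical helper name, not part of Guild or the AWS tooling.

```shell
# Report which of the two AWS credential variables are unset or empty.
# Returns 0 when both are set, 1 otherwise.
check_aws_env() {
  status=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_aws_env && echo "credentials are set"` prints the message only when both variables are defined in the current console.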
In the same console, verify that you have access to the S3 bucket from Guild by running:
guild remote status s3
If you have access, you will see:
s3 (S3 bucket BUCKET_NAME) is available
BUCKET_NAME is the name of the S3 bucket associated with the s3 remote.
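In a script, the availability check can be automated by looking for the "is available" message shown above. This is a sketch only: `remote_is_available` is a hypothetical helper, and it assumes the status message keeps the wording shown in this guide.

```shell
# Succeeds if the status text on stdin reports the remote as available.
# Matches the "is available" wording shown in this guide.
remote_is_available() {
  grep -q "is available"
}
```

A possible use is `guild remote status s3 2>&1 | remote_is_available && echo "ready to push"`.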
Train the Fashion-MNIST classifier by running:
guild run fashion_mnist_mlp.py
Press Enter to start training.
This creates a run that we back up in the next step.
Back up the latest run to S3
Copy the latest run to the s3 remote by running:
guild push 1 s3
The number 1 in the command tells Guild to only copy the latest run. s3 is the name of the remote you configured above.
Guild prompts before copying:
You are about to copy (push) the following runs to s3:
  [0def3262]  fashion_mnist_mlp.py  2019-03-22 11:53:44  completed
Continue? (Y/n)
Press Enter to continue.
Guild copies the latest run to the S3 bucket. You can verify this using the AWS Management Console.
Additionally, you can list runs available on the s3 remote using:
guild runs -r s3
Synchronizing runs with s3
[1:0def3262]  fashion_mnist_mlp.py  2019-03-22 11:53:44  completed
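To confirm a specific run reached the remote without reading the listing by eye, the short run ID can be extracted from the output. The sketch below assumes listing lines have the `[index:id]` form shown above; `list_run_ids` and `listing_has_run` are hypothetical helper names.

```shell
# Print the short run ID from each listing line of the form
#   [1:0def3262]  fashion_mnist_mlp.py  ...
# Lines that do not start with "[index:id]" (such as the
# "Synchronizing runs" message) are skipped.
list_run_ids() {
  sed -n 's/^\[[0-9][0-9]*:\([0-9a-f][0-9a-f]*\)].*/\1/p'
}

# Succeeds if the given short run ID appears in the listing on stdin.
listing_has_run() {
  list_run_ids | grep -qx "$1"
}
```

For example, `guild runs -r s3 | listing_has_run 0def3262` succeeds when the run shown above is present on the remote.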
Delete the latest local run
Later we restore the fashion_mnist_mlp.py run from S3, so let’s delete the local run first:
guild runs rm 1
Once again, the number 1 in the command tells Guild to only delete the latest run.
Press Enter to delete the run.
Restore the deleted run
In this step, we restore the deleted run by copying it from its backup location on S3.
To copy the latest run, use:
guild pull 1 s3
As with the earlier commands, the number 1 indicates that we only want to copy the latest run from S3. If you omit this argument, Guild will copy all of the runs from S3.
Press Enter to copy the run from S3.
You can verify that the run has been copied with:
guild runs info
Compare that information to the same command applied to the remote:
guild runs info -r s3
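The comparison can also be scripted by extracting the run ID from each output. This sketch assumes the `guild runs info` output contains a line of the form `id: <run ID>`; `run_id_from_info` is a hypothetical helper name.

```shell
# Print the value of the "id:" field from run info text on stdin.
# Assumption: the output contains a line of the form "id: <run ID>".
run_id_from_info() {
  awk '$1 == "id:" { print $2; exit }'
}
```

With this helper, `[ "$(guild runs info | run_id_from_info)" = "$(guild runs info -r s3 | run_id_from_info)" ]` succeeds when the latest local run and the latest remote run have the same ID.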
Cleanup
To delete all of the runs from S3, run:
guild runs rm --permanent -r s3
Note that this command deletes all of the runs from the bucket, not just the latest run. Guild prompts you before deleting. Verify the list and press y followed by Enter.
If you don’t use the --permanent command line option, the run is not deleted from S3 and can be restored using guild runs restore -r s3. If you want to truly delete the run, use the --permanent option.
Verify that there are no runs on the s3 remote:
guild runs -r s3
And that there are no restorable runs:
guild runs -r s3 --deleted
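Both checks amount to confirming that the listing contains no run lines. A minimal sketch of that check follows, assuming run lines start with `[` as in the listing shown earlier; `count_listed_runs` is a hypothetical helper name.

```shell
# Count listing lines that describe a run, i.e. lines starting with "[".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status at 0 while still printing the count.
count_listed_runs() {
  grep -c '^\[' || true
}
```

For example, `[ "$(guild runs -r s3 | count_listed_runs)" = "0" ]` succeeds once the remote is empty, and the same check can be applied to the output of the --deleted listing.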
In this guide we used S3 to back up and restore a run.
This feature is useful for guarding against accidental run deletion, but it’s also useful when collaborating on teams:
One or more researchers or engineers train models and push results to S3
Other users pull runs for various uses including comparison, summary, and release processes
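The two roles in this workflow can be sketched as a pair of shell helpers. The function names are hypothetical, the remote name defaults to the s3 remote configured in this guide, and the sketch assumes the `--yes` option skips the confirmation prompt for both commands. No run argument is given, so all runs are copied.

```shell
# Hypothetical helpers; the remote name defaults to "s3".
# Assumption: --yes skips the confirmation prompt.

# Producer side: copy all local runs to the shared remote.
share_runs() {
  guild push "${1:-s3}" --yes
}

# Consumer side: copy all runs from the shared remote to this machine.
fetch_shared_runs() {
  guild pull "${1:-s3}" --yes
}
```

A researcher would call `share_runs` after training, and a teammate would call `fetch_shared_runs` before comparing or summarizing results.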