A resource is a named set of files that are used by model operations.
Resources may be defined at two levels:
- Model resource
- Package resource
Model resources are defined by a model. Here’s an example:
# MODEL name: simple-model operations: train: train requires: data resources: data: data.csv
A resource consists of one or more sources. A source may be a string or an object.
A source string is equivalent to a source object with the string used as its
file attribute (see below).
A source object may define these attributes:
- url cannot be used with either file or operation; they are mutually exclusive.
- file cannot be used with either url or operation; they are mutually exclusive.
- operation takes the form [[PACKAGE/]MODEL:]OPERATION. Multiple operations may be specified by separating them with a comma. For more information see Operation output below. operation cannot be used with either url or file; they are mutually exclusive.
- select is required if
- unpack is a boolean value indicating whether or not to unpack the source. If unspecified, sources are unpacked if they are archives. For more information see Unpacking sources below.
url, file, and operation collectively represent the source type. One and only one of these attributes must be specified for each source object.
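The string shorthand and the "one and only one source type" rule can be sketched in a few lines of Python. This is an illustrative sketch only, not Guild's implementation: the function name `normalize_source` and the use of plain dicts for source objects are assumptions made for the example.

```python
# Attribute names that define the source type, as listed in the text above.
SOURCE_TYPES = ("url", "file", "operation")


def normalize_source(source):
    """Return a source object (dict) for a string or object source.

    Hypothetical helper: the name and the dict representation are
    assumptions for illustration, not part of Guild's API.
    """
    if isinstance(source, str):
        # A source string is shorthand for an object with a file attribute.
        return {"file": source}
    if not isinstance(source, dict):
        raise TypeError(f"unsupported source: {source!r}")
    specified = [name for name in SOURCE_TYPES if name in source]
    if len(specified) != 1:
        # url, file, and operation are mutually exclusive, and one is required.
        raise ValueError(
            "a source must define exactly one of url, file, or operation; "
            f"got {specified or 'none'}"
        )
    return dict(source)
```

For example, `normalize_source("data.csv")` yields `{"file": "data.csv"}`, while an object that sets both `url` and `file` is rejected.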
Before an operation is started, each required resource must be resolved. Resource resolution consists of these steps for each resource source:
- Acquire the source
- If a SHA-256 hash is available, verify the source
- If the source is an archive and unpack is not disabled, unpack the archive
- Create a link to the source within the operation run directory or, if select is specified, create a link for each matching path within the source
When all required resources are resolved, Guild will start the operation.
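The verification step above can be illustrated with Python's standard `hashlib` module. This is a minimal sketch under stated assumptions: the function names `sha256_of` and `verify_source` are hypothetical, and how Guild obtains the expected hash is not shown here.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_source(path, expected_sha256=None):
    """Raise ValueError if the file does not match the expected hash.

    Hypothetical helper: when no hash is available, verification is
    skipped, mirroring the "if a SHA-256 hash is available" condition.
    """
    if expected_sha256 is None:
        return
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"unexpected SHA-256 for {path}: expected "
            f"{expected_sha256}, got {actual}"
        )
```

Reading in chunks keeps memory use constant for large downloads, which is the usual reason to avoid hashing the whole file in one call.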
URL sources are stored in Guild’s resource cache. If a source
Unpacking sources
Source archives may be unpacked to access their constituent files. A
file is considered an archive if it has one of the following
By default, archives are unpacked. You can explicitly disable
unpacking with the unpack source attribute.
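The unpacking default can be expressed as a small decision function. The sketch below is an assumption-laden illustration: `is_archive` uses Python's `zipfile` and `tarfile` checks as a stand-in for Guild's own archive test, which the text above describes only partially, and `should_unpack` is a hypothetical name.

```python
import tarfile
import zipfile


def is_archive(path):
    """Stand-in archive check (an assumption, not Guild's own rule)."""
    return zipfile.is_zipfile(path) or tarfile.is_tarfile(path)


def should_unpack(path, unpack=None):
    """Decide whether a source should be unpacked.

    unpack=None means the attribute was not specified, in which case
    archives are unpacked. unpack=False disables unpacking explicitly.
    """
    return unpack is not False and is_archive(path)
```

With this sketch, a zip file is unpacked unless `unpack` is explicitly `False`, and a plain file such as `data.csv` is never unpacked.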
Selecting source files
A source may indicate that files within a directory or archive should
be selected for use by specifying the source
select attribute. The
select value must be a valid regular expression. When specified,
Guild will create links to each matching path within the source
directory or archive.
Archives must be unpacked to select source files.
Links use the basename of each matching file and do not contain parent paths. To illustrate, consider this structure, which may apply to either a file system directory or the contents of an archive:
To create a link to mnist in the operation run directory, use a
select value that matches the path to mnist within the source.
Here’s a model definition that illustrates this scheme.
name: example
operations:
  train:
    cmd: train
    requires: mnist
resources:
  mnist:
    sources:
      - url: https://github.com/acme/models/archive/master.zip
        select: models-master/src/mnist
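The basename rule for links can be sketched as a mapping from link names to matched paths. This is an illustration only: the function name `select_links` is hypothetical, the example paths are invented for the sketch, and the use of `re.fullmatch` is an assumption, since the text does not say whether Guild matches the whole path or a prefix.

```python
import posixpath
import re


def select_links(paths, select):
    """Map link names (basenames) to the paths matched by select.

    Assumption: the pattern must match the entire path (re.fullmatch).
    """
    pattern = re.compile(select)
    links = {}
    for path in paths:
        if pattern.fullmatch(path):
            # Links use the basename only; parent paths are dropped.
            links[posixpath.basename(path)] = path
    return links


# Hypothetical archive contents, used only for this example.
example_paths = [
    "models-master/README.md",
    "models-master/src/mnist",
    "models-master/src/cifar",
]
mnist_links = select_links(example_paths, "models-master/src/mnist")
```

Here `mnist_links` maps the link name `mnist` to `models-master/src/mnist`, so the run directory would contain a link named `mnist` with no parent directories.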
Operation output
It’s common for an operation to require the output of another operation. Examples include:
- Model training requires a prepared dataset
- Model compression requires a trained model
- Model deployment requires a compressed model
By using required resources with
operation sources, model developers
can effectively link operations together in a pipeline.
Guild resolves operation sources using these steps:
- If the user specifies a run ID as an argument to the run command in the form RESOURCE_NAME=RUN_ID, Guild resolves the operation source using the target run directory.
- If the user does not specify a run ID, Guild uses the latest non-error run for any of the specified operations. Multiple operations may be specified by separating the operation specs with a comma.
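The two resolution steps above can be sketched as follows. Everything about the data model here is an assumption for illustration: runs are plain dicts with hypothetical `id`, `operation`, `status`, and `started` fields, and `resolve_operation_run` is not a Guild function.

```python
def resolve_operation_run(resource_name, op_spec, runs, run_args=None):
    """Pick the run that satisfies an operation source.

    resource_name: name of the required resource, e.g. "data"
    op_spec: one or more operation specs separated by commas
    runs: list of dicts with hypothetical keys id, operation, status, started
    run_args: RESOURCE_NAME=RUN_ID arguments from the user, as a dict
    """
    run_args = run_args or {}
    if resource_name in run_args:
        # Step 1: an explicit run ID takes precedence.
        run_id = run_args[resource_name]
        for run in runs:
            if run["id"] == run_id:
                return run
        raise LookupError(f"no run with ID {run_id}")
    # Step 2: the latest non-error run for any of the specified operations.
    operations = {op.strip() for op in op_spec.split(",")}
    candidates = [
        run for run in runs
        if run["operation"] in operations and run["status"] != "error"
    ]
    if not candidates:
        raise LookupError(f"no suitable run for {op_spec}")
    return max(candidates, key=lambda run: run["started"])
```

In this sketch a failed run is skipped even when it is the most recent one, which matches the "latest non-error run" wording.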
Consider the following model definition:
name: example
operations:
  prepare:
    cmd: prepare
  train:
    cmd: train
    requires: data
resources:
  data:
    sources:
      - operation: prepare
        select: data.csv
In this example, the
train operation requires output from the
prepare operation. This requirement is expressed using the
requires operation attribute, which references the named resource data. The
data resource consists of a single source:
data.csv, which is generated by the