Resources

  1. Resource sources
  2. Resolving resources
  3. Unpacking sources
  4. Selecting source files
  5. Operation output

A resource is a named set of files used by model operations.

Resources may be defined at two levels:

  • Model resource
  • Package resource

Model resources are defined by a model. Here’s an example:

# MODEL
name: simple-model
operations:
  train:
    cmd: train
    requires: data
resources:
  data: data.csv

Resource sources

Resources consist of one or more sources. A source may be a string or an object.

A source string is shorthand for a source object that uses the string as the value of its file attribute (see below).

A source object may define these attributes:

url
Source is located on a remote server and is accessible via a URL. The protocols http and https are supported. url cannot be used with file or operation; the three attributes are mutually exclusive.
file
Source is a file or directory located relative to the model or package file. file cannot be used with url or operation; the three attributes are mutually exclusive.
operation
Source is a file generated by a model operation. Operations must be specified as [[PACKAGE/]MODEL:]OPERATION. Multiple operations may be specified by separating them with a comma. For more information see Operation output below. operation cannot be used with url or file; the three attributes are mutually exclusive.
select
A regular expression used to select files from a local directory, archive, or a run directory if the source is an operation. For more information see Selecting source files below. select is required if operation is used.
sha256
A SHA-256 hash of the resource source. If specified, the source SHA-256 hash must match this value for the resource to resolve.
unpack
A boolean flag (true or false) indicating whether or not to unpack the source. If unspecified, sources are unpacked if they are archives. For more information see Unpacking sources below.

The attributes url, file, and operation collectively represent the source type. One and only one of these attributes must be specified for each source object.
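The attribute rules above can be sketched as a small validation routine. This is an illustrative sketch only, not Guild's implementation; the function name normalize_source and the error messages are hypothetical.

```python
SOURCE_TYPES = ("url", "file", "operation")


def normalize_source(source):
    """Return a source object (dict) for a source string or object.

    Hypothetical helper: the name and error messages are illustrative.
    """
    if isinstance(source, str):
        # A source string is treated as the value of the file attribute.
        return {"file": source}
    if not isinstance(source, dict):
        raise TypeError("a source must be a string or an object")
    # Exactly one of url, file, or operation must be present.
    types = [name for name in SOURCE_TYPES if name in source]
    if len(types) != 1:
        raise ValueError("specify exactly one of url, file, or operation")
    # select is required for operation sources.
    if types[0] == "operation" and "select" not in source:
        raise ValueError("select is required when operation is used")
    return dict(source)
```

For example, normalize_source("data.csv") returns a source object whose file attribute is data.csv, while an object that sets both url and file is rejected.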

Resolving resources

Before an operation is started, each required resource must be resolved. Resource resolution consists of these steps for each resource source:

  • Acquire the source
  • If a SHA-256 hash is available, verify the source
  • If the source is an archive and unpack is not set to no, unpack the archive
  • Create a link to the source within the operation run directory or, if select is specified, create a link for each matching path within the source

When all required resources are resolved, Guild will start the operation.
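The verification step can be pictured with Python's standard hashlib module. This is a minimal sketch under the assumption that the source is a single local file; the function names sha256_of and verify_source are hypothetical and are not part of Guild.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large sources are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_source(path, expected_sha256=None):
    """Return True if no hash is given or the file's hash matches it."""
    if expected_sha256 is None:
        # Verification is skipped when the source defines no sha256 value.
        return True
    return sha256_of(path) == expected_sha256.lower()
```

A source whose computed hash differs from its sha256 attribute would fail this check and therefore not resolve.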

URL sources are stored in Guild’s resource cache.

Unpacking sources

Source archives may be unpacked to access their constituent files. A file is considered an archive if it has one of the following extensions: .zip, .tar, .tgz, .tar.*.

By default, archives are unpacked. You can explicitly disable unpacking by setting the unpack source attribute to no.
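The extension rule and the unpack flag can be combined in a short sketch. It assumes that extensions are compared case-sensitively against the file name, which this page does not state; is_archive and should_unpack are hypothetical names.

```python
import fnmatch

# Extensions listed above, written as shell-style patterns.
ARCHIVE_PATTERNS = ("*.zip", "*.tar", "*.tgz", "*.tar.*")


def is_archive(filename):
    """Return True if filename has one of the archive extensions."""
    return any(fnmatch.fnmatchcase(filename, p) for p in ARCHIVE_PATTERNS)


def should_unpack(filename, unpack=None):
    """Decide whether a source is unpacked.

    unpack is None when the attribute is unspecified; in that case
    archives are unpacked by default. Only an explicit False (no)
    disables unpacking.
    """
    return unpack is not False and is_archive(filename)
```

Under these assumptions, master.zip and data.tar.gz are unpacked by default, data.csv never is, and setting unpack to no leaves master.zip as a single file.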

Selecting source files

A source may select files within a directory or archive by specifying the select attribute. The value of select must be a valid regular expression. When it is specified, Guild creates a link to each matching path within the source directory or archive.

Archives must be unpacked to select source files.

Links use the basename of each matching file and do not contain parent paths. To illustrate, consider this structure, which may apply to either a file system directory or the contents of an archive:

  • models-master
    • src
      • mnist

To create a link to mnist in the operation run directory, use a select value of models-master/src/mnist.

Here’s a model definition that illustrates this scheme.

name: example
operations:
  train:
    cmd: train
    requires: mnist
resources:
  mnist:
    sources:
    - url: https://github.com/acme/models/archive/master.zip
      select: models-master/src/mnist
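The selection and link-naming behavior can be approximated as follows. This sketch assumes that the pattern is matched from the start of each path relative to the source root; how Guild anchors the pattern is not stated here. The function name select_links is hypothetical.

```python
import posixpath
import re


def select_links(paths, select):
    """Map link names to the source paths matched by select.

    paths are relative to the source directory or unpacked archive.
    Each link is named after the basename of the matching path, so
    parent directories do not appear in the run directory.
    """
    pattern = re.compile(select)
    links = {}
    for path in paths:
        # Assumption: the pattern must match from the start of the path.
        if pattern.match(path):
            links[posixpath.basename(path)] = path
    return links
```

Given the paths models-master/README.md, models-master/src/mnist, and models-master/src/cifar, a select value of models-master/src/mnist yields a single link named mnist. In this sketch, two matching paths with the same basename would collide and the later one would win.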

Operation output

It’s common for an operation to require the output of another operation. Examples include:

  • Model training requires a prepared dataset
  • Model compression requires a trained model
  • Model deployment requires a compressed model

By using required resources with operation sources, model developers can effectively link operations together in a pipeline.

Guild resolves operation sources using these rules:

  • If the user specifies a run ID as an argument to the run command in the form RESOURCE_NAME=RUN_ID, Guild resolves the operation source using the directory of that run.

  • If the user does not specify a run ID, Guild uses the latest non-error run for any of the specified operations. Multiple operations may be specified by separating the operation specs with a comma.
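The two rules above can be sketched as a lookup over known runs. The Run record, its field names, and the status value "error" are assumptions made for illustration; they do not describe Guild's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical run record; field names are illustrative.
    run_id: str
    operation: str
    status: str
    started: float


def resolve_operation_run(runs, operations, run_id=None):
    """Pick the run that provides files for an operation source.

    operations is a comma-separated list of operation specs.
    """
    if run_id is not None:
        # An explicit RESOURCE_NAME=RUN_ID argument takes precedence.
        for run in runs:
            if run.run_id == run_id:
                return run
        raise LookupError(f"no run with ID {run_id}")
    wanted = {op.strip() for op in operations.split(",")}
    candidates = [
        run for run in runs
        if run.operation in wanted and run.status != "error"
    ]
    if not candidates:
        raise LookupError(f"no suitable run for {operations}")
    # Otherwise use the latest non-error run of any listed operation.
    return max(candidates, key=lambda run: run.started)
```

With this sketch, a failed prepare run that started most recently is skipped in favor of the newest run that did not end in error.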

Consider the following model definition:

name: example
operations:
  prepare:
    cmd: prepare
  train:
    cmd: train
    requires: data
resources:
  data:
    sources:
    - operation: prepare
      select: data.csv

In this example, the train operation requires output from the prepare operation. This requirement is expressed using the requires operation attribute, which references the named resource data. The data resource consists of a single source: data.csv, which is generated by the prepare operation.
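Operation sources name their operations using the form [[PACKAGE/]MODEL:]OPERATION, so prepare in the example above is the shortest possible spec. The sketch below shows one way such specs could be parsed. It assumes that package, model, and operation names contain no slash, colon, or comma; parse_op_spec and parse_op_specs are hypothetical names.

```python
import re

# [[PACKAGE/]MODEL:]OPERATION: package and model are optional, and a
# package may only appear together with a model.
OP_SPEC = re.compile(
    r"^(?:(?:(?P<package>[^/:,]+)/)?(?P<model>[^/:,]+):)?"
    r"(?P<operation>[^/:,]+)$"
)


def parse_op_spec(spec):
    """Return (package, model, operation); missing parts are None."""
    match = OP_SPEC.match(spec)
    if match is None:
        raise ValueError(f"invalid operation spec: {spec!r}")
    return match.group("package"), match.group("model"), match.group("operation")


def parse_op_specs(value):
    """Parse a comma-separated list of operation specs."""
    return [parse_op_spec(part.strip()) for part in value.split(",")]
```

Here parse_op_spec("prepare") leaves the package and model unset, while a spec such as acme/example:prepare (a made-up package and model) fills in all three parts.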