Loom Templates

To run an analysis on Loom, you must first have a template that defines the analysis steps and their relative arrangement (input/output dependencies, scatter-gather patterns). An analysis run may then be initiated by assigning input data to an existing template.

A Loom template is defined in a YAML or JSON file and then imported to the Loom server.

Examples

To run these examples, you will need access to a running Loom server. See Getting Started for help launching a Loom server either locally or in the cloud.

join_two_words

simplest example

This example illustrates the minimal set of features in a Loom template: name, command, environment (defined by a docker image), and input/output definitions.

We use the optional “data” field on the inputs to assign default values.

join_two_words.yaml:

name: join_two_words
command: echo {{word1}} {{word2}}
environment:
  docker_image: ubuntu:latest
inputs:
  - channel: word1
    type: string
    data:
      contents: hello
  - channel: word2
    type: string
    data:
      contents: world
outputs:
  - channel: output_text
    type: string
    source:
      stream: stdout

The command “echo {{word1}} {{word2}}” makes use of Jinja2 notation to substitute input values. “{{word1}}” in the command will be substituted with the value provided on the “word1” input channel. For inputs of type “string”, “integer”, “boolean”, and “float”, the value substituted is a string representation of the data. For inputs of type “file”, the filename is substituted. The full set of Jinja2 features may be used, including filters, conditional statements, and loops.
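
As a rough illustration of the substitution step, the sketch below fills “{{name}}” placeholders with string representations of the input values. It uses a small regular expression as a stand-in for Jinja2, so it does not support filters, conditionals, or loops, and the helper name render_command is hypothetical rather than part of Loom.

```python
import re

def render_command(command, inputs):
    """Replace each {{ name }} placeholder with str(inputs[name]).

    Simplified stand-in for Jinja2 rendering; render_command is a
    hypothetical helper and not part of Loom.
    """
    def substitute(match):
        return str(inputs[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, command)

rendered = render_command("echo {{word1}} {{word2}}",
                          {"word1": "hello", "word2": "world"})
print(rendered)  # echo hello world
```

With the default data from join_two_words.yaml, the command that reaches the shell is “echo hello world”.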

Run the join_two_words example

loom template import join_two_words.yaml

# Run with default input data
loom run start join_two_words

# Run with custom input data
loom run start join_two_words word1=foo word2=bar

capitalize_words

array data, iterating over an array input

This template illustrates the concept of non-scalar data (in this case a 1-dimensional array). The default mode for inputs is “no_gather”, which means that rather than gather all the objects into an array to be processed together in a single task, Loom will iterate over the array and execute the command once for each data object, in separate tasks.

Here we capitalize each word in the array. The output from each task executed is a string, but since many tasks are executed, the output is an array of strings.

Note the use of “as_channel” on the input definition. Since our input channel is an array, we named the channel with the plural “words”. But because this run executes a separate task for each element in the array, it may be confusing to refer to “{{words}}” inside the command. It improves readability to use “as_channel: word”.
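
The iteration described above can be pictured with a few lines of Python. This is only a sketch of the idea, not Loom's scheduler: each array element becomes the context of its own task under the “as_channel” name, and the per-task string outputs are collected back into an array. Python's upper() stands in for the awk command.

```python
words = ["aardvark", "aback", "abacus", "abaft"]

# "no_gather": one task per element; each task sees a single value
# under the "as_channel" name ("word"), not the whole array.
task_contexts = [{"word": w} for w in words]

# Each task produces one string; together the outputs form an array.
# str.upper() is a stand-in for the awk toupper() call in the template.
outputs = [context["word"].upper() for context in task_contexts]
print(outputs)  # ['AARDVARK', 'ABACK', 'ABACUS', 'ABAFT']
```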

capitalize_words.yaml:

name: capitalize_words
command: echo -n {{word}} | awk '{print toupper($0)}'
environment:
  docker_image: ubuntu:latest
inputs:
  - channel: words
    as_channel: word
    type: string
    data:
      contents: [aardvark,aback,abacus,abaft]
outputs:
  - channel: wordoutput
    type: string
    source:
      stream: stdout

Run the capitalize_words example

loom template import capitalize_words.yaml

# Run with default input data
loom run start capitalize_words

# Run with custom input data
loom run start capitalize_words words=[uno,dos,tres]

join_array_of_words

array data, gather mode on an input

Earlier we saw how to join two words, each defined on a separate input. But what if we want to join an arbitrary number of words?

This example has a single input, whose default value is an array of words. By setting the mode of this input to “gather”, we execute a single task that receives the full list of words as input, instead of iterating as in the previous example.

In this example we merge the strings and output the result as a string.
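
For contrast with the previous example, here is a sketch of what “gather” means for task creation: one task whose context holds the whole array. How Loom renders an array inside the command is not stated here, so the space-separated rendering below is an assumption made for illustration.

```python
words = ["aardvark", "aback", "abacus", "abaft"]

# "gather": a single task whose context holds the whole array.
task_contexts = [{"wordarray": words}]

# Assumption: the array is rendered as space-separated text in the command.
commands = ["echo -n " + " ".join(context["wordarray"])
            for context in task_contexts]
print(commands)  # ['echo -n aardvark aback abacus abaft']
```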

join_array_of_words.yaml:

name: join_array_of_words
command: echo -n {{wordarray}}
environment:
  docker_image: ubuntu:latest
inputs:
  - channel: wordarray
    type: string
    mode: gather
    data:
      contents: [aardvark,aback,abacus,abaft]
outputs:
  - channel: wordoutput
    type: string
    source:
      stream: stdout

Run the join_array_of_words example

loom template import join_array_of_words.yaml

# Run with default input data
loom run start join_array_of_words

# Run with custom input data
loom run start join_array_of_words wordarray=[uno,dos,tres]

split_words_into_array

array data, scatter mode on an output, output parsers

This example is the reverse of the previous example. We begin with a scalar string of space-separated words, and split them into an array.

To generate an array output from a single task, we set the output mode to “scatter”.

We also need to instruct Loom how to split the text in stdout into an array. For this we use a parser that uses the space character as the delimiter and trims any extra whitespace characters from the words.
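
A delimited parser with trimming can be sketched in a few lines. The function name parse_delimited is hypothetical, and details such as how Loom treats empty items are not covered by this document, so the sketch simply keeps them.

```python
def parse_delimited(text, delimiter=" ", trim=False):
    """Split text on a delimiter and optionally strip whitespace.

    Hypothetical helper for illustration; empty items are kept as-is,
    which may differ from Loom's behavior.
    """
    items = text.split(delimiter)
    if trim:
        items = [item.strip() for item in items]
    return items

words = parse_delimited("one two three", delimiter=" ", trim=True)
print(words)  # ['one', 'two', 'three']
```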

split_words_into_array.yaml:

name: split_words_into_array
command: echo -n {{text}}
environment:
  docker_image: ubuntu:latest
inputs:
  - channel: text
    type: string
    data:
      contents: >
        Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt ut labore et dolore
        magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
        ullamco laboris nisi ut aliquip ex ea commodo consequat.
outputs:
  - channel: wordlist
    type: string
    mode: scatter
    source:
      stream: stdout
    parser:
      type: delimited
      options:
        delimiter: " "
        trim: True

Run the split_words_into_array example

loom template import split_words_into_array.yaml

# Run with default input data
loom run start split_words_into_array

# Run with custom input data
loom run start split_words_into_array text="one two three"

add_then_multiply

multistep templates, connecting inputs and outputs, custom interpreter

All the previous examples have involved just one step. Here we show how to define more than one step in a template.

Also, since we are doing math in this example, it is easier to use python than bash, so we introduce the concept of custom interpreters.

Notice how the flow of data is defined using shared channel names between inputs and outputs. On the top-level template “add_then_multiply” we define input channels “a”, “b”, and “c”. These are used by the steps “add” (“a” and “b”) and “multiply” (“c”). There is also an output from “add” called “ab_sum” that serves as an input for “multiply”. Finally, the output from “multiply”, called “result”, is passed up to “add_then_multiply” as a top-level output.
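
The data flow can be mirrored in plain Python, with one function per step and variable names matching the channel names. This is only an analogy for how values move between channels, not how Loom executes steps.

```python
def add(a, b):
    # Step "add": reads channels a and b, writes channel ab_sum.
    return a + b

def multiply(c, ab_sum):
    # Step "multiply": reads channels c and ab_sum, writes channel result.
    return c * ab_sum

def add_then_multiply(a=3, b=5, c=7):
    # Top-level template: defaults match the "data" fields in the template.
    ab_sum = add(a, b)
    result = multiply(c, ab_sum)
    return result

print(add_then_multiply())         # 56
print(add_then_multiply(1, 2, 3))  # 9
```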

add_then_multiply.yaml:

name: add_then_multiply
inputs:
  - type: integer
    channel: a
    data:
      contents: 3
  - type: integer
    channel: b
    data:
      contents: 5
  - type: integer
    channel: c
    data:
      contents: 7
outputs:
  - type: integer
    channel: result
steps:
  - name: add
    command: print({{ a }} + {{ b }}, end='')
    environment:
      docker_image: python
    interpreter: python
    inputs:
      - type: integer
        channel: a
      - type: integer
        channel: b
    outputs:
      - type: integer
        channel: ab_sum
        source:
          stream: stdout
  - name: multiply
    command: print({{ c }} * {{ ab_sum }}, end='')
    environment:
      docker_image: python
    interpreter: python
    inputs:
      - type: integer
        channel: ab_sum
      - type: integer
        channel: c
    outputs:
      - type: integer
        channel: result
        source:
          stream: stdout

Run the add_then_multiply example

loom template import add_then_multiply.yaml

# Run with default input data
loom run start add_then_multiply

# Run with custom input data
loom run start add_then_multiply a=1 b=2 c=3

building_blocks

reusing templates

Let’s look at another way to write the previous workflow. The “add” and “multiply” steps can be defined as stand-alone templates. After they are defined, we can create a template that includes those templates as steps.

add.yaml:

name: add
command: print({{ a }} + {{ b }}, end='')
environment:
  docker_image: python
interpreter: python
inputs:
  - type: integer
    channel: a
  - type: integer
    channel: b
outputs:
  - type: integer
    channel: ab_sum
    source:
      stream: stdout

multiply.yaml:

name: multiply
command: print({{ c }} * {{ ab_sum }}, end='')
environment:
  docker_image: python
interpreter: python
inputs:
  - type: integer
    channel: ab_sum
  - type: integer
    channel: c
outputs:
  - type: integer
    channel: result
    source:
      stream: stdout

building_blocks.yaml:

name: building_blocks
inputs:
  - type: integer
    channel: a
    data:
      contents: 3
  - type: integer
    channel: b
    data:
      contents: 5
  - type: integer
    channel: c
    data:
      contents: 7
outputs:
  - type: integer
    channel: result
steps:
  - add
  - multiply
  

Run the building_blocks example

# Import the parent template along with any dependencies
loom template import building_blocks.yaml

# Run with default input data
loom run start building_blocks

# Run with custom input data
loom run start building_blocks a=1 b=2 c=3

search_file

file inputs

Most of these examples use non-file inputs for convenience, but files can be used as inputs and outputs much like other data types.

In this example, the “lorem_ipsum.txt” input file should be imported prior to importing the “search_file.yaml” template that references it.

lorem_ipsum.txt:

Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.

search_file.yaml:

name: search_file
command: grep {{pattern}} {{file_to_search}}
environment:
  docker_image: ubuntu:latest
inputs:
  - channel: file_to_search
    type: file
    data:
      contents: lorem_ipsum.txt
  - channel: pattern
    type: string
    data:
      contents: dolor
outputs:
  - channel: matches
    type: string
    mode: scatter
    source:
      stream: stdout
    parser:
      type: delimited
      options:
        delimiter: "\n"

Here is an alternative text file not referenced in the template. We can override the default input file and specify beowulf.txt as the input when starting a run.

beowulf.txt:

Lo! the Spear-Danes' glory through splendid achievements
The folk-kings' former fame we have heard of,
How princes displayed then their prowess-in-battle.
Oft Scyld the Scefing from scathers in numbers
From many a people their mead-benches tore.
Since first he found him friendless and wretched,
The earl had had terror: comfort he got for it,
Waxed 'neath the welkin, world-honor gained,
Till all his neighbors o'er sea were compelled to
Bow to his bidding and bring him their tribute:
An excellent atheling! After was borne him
A son and heir, young in his dwelling,
Whom God-Father sent to solace the people.

Run the search_file example

# Import the template along with dependencies
loom template import search_file.yaml

# Run with default input data
loom run start search_file

# Run with custom input data
loom file import beowulf.txt
loom run start search_file pattern=we file_to_search=beowulf.txt\$20b8f89484673eae4f121801e1fec28c

word_combinations

scatter-gather, input groups, output mode gather(n)

When a template step has two inputs rather than one, iteration can be done in two ways:

  • collated iteration: [a,b] + [c,d] => [a+c,b+d]
  • combinatorial iteration: [a,b] + [c,d] => [a+c, a+d, b+c, b+d]

With more than two inputs, we could employ some combination of these two approaches.

“groups” provide a flexible way to define when to use collated or combinatorial iteration. Each input has an integer group ID (the default is 0). All inputs with a common group ID will be combined with collation. Between groups, combinatorial iteration is used.

In this example, we iterate over two inputs, one with an array of adjectives and one with an array of nouns. Since the inputs have different group IDs, we iterate over all possible combinations of word pairs (combinatorial).
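
The grouping rule can be sketched with zip (collation within a group) and itertools.product (combination across groups). The function iterate_inputs and its tuple-based argument format are hypothetical, chosen only for illustration. Note that zip silently truncates arrays of unequal length, which may not match how Loom handles that case.

```python
from itertools import product

def iterate_inputs(inputs):
    """Build one context per task from (name, group_id, values) tuples.

    Hypothetical helper: collates inputs that share a group ID and
    combines the groups combinatorially.
    """
    by_group = {}
    for name, group_id, values in inputs:
        by_group.setdefault(group_id, []).append((name, values))

    # Collate within each group: pair up elements position by position.
    collated = []
    for group_id in sorted(by_group):
        names = [name for name, _ in by_group[group_id]]
        rows = zip(*(values for _, values in by_group[group_id]))
        collated.append([dict(zip(names, row)) for row in rows])

    # Combine across groups: every row of one group with every row of the rest.
    tasks = []
    for combination in product(*collated):
        context = {}
        for row in combination:
            context.update(row)
        tasks.append(context)
    return tasks

tasks = iterate_inputs([
    ("adjective", 0, ["green", "purple", "orange"]),
    ("noun", 1, ["balloon", "button"]),
])
print(len(tasks))  # 6
print(tasks[0])    # {'adjective': 'green', 'noun': 'balloon'}
```

Giving both inputs the same group ID would instead produce one task per position, as in the collated case [a,b] + [c,d] => [a+c,b+d].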

word_combinations.yaml:

name: word_combinations
inputs:
  - channel: adjectives
    type: string
    data:
      contents: [green,purple,orange]
  - channel: nouns
    type: string
    data:
      contents: [balloon,button]
outputs:
  - channel: all_word_pairs
    type: file
steps:
  - name: combine_words
    command: echo "{{adjective}} {{noun}}" > {{word_pair_file}}
    environment:
      docker_image: ubuntu
    inputs:
      - channel: adjectives
        as_channel: adjective
        type: string
        group: 0
      - channel: nouns
        as_channel: noun
        type: string
        group: 1
    outputs:
      - channel: word_pair_files
        as_channel: word_pair_file
        type: file
        source:
          filename: word_pair.txt
  - name: merge_word_pairs
    command: cat {{word_pair_files}} > {{all_word_pairs}}
    environment:
      docker_image: ubuntu
    inputs:
      - channel: word_pair_files
        type: file
        mode: gather(2)
    outputs:
      - channel: all_word_pairs
        type: file
        source:
          filename: all_word_pairs.txt

You may have noticed that we gather the input “word_pair_files” with “mode: gather(2)”. This is because word_pair_files is not just an array, but an array of arrays. We wish to gather it to full depth. You may wish to modify this example to use “mode: gather” (or equivalently “mode: gather(1)”) to see how it affects the result.
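
The difference between the two modes can be sketched on a nested Python list. The gather function below is a hypothetical model of the behavior described here, in which “gather(n)” merges the innermost n levels of nesting for each task. The file names are placeholders standing in for the word pair files.

```python
def flatten(data):
    """Flatten arbitrarily nested lists into one flat list."""
    if not isinstance(data, list):
        return [data]
    flat = []
    for item in data:
        flat.extend(flatten(item))
    return flat

def nesting_depth(data):
    """Depth of a regular nested list (0 for a scalar)."""
    depth = 0
    while isinstance(data, list) and data:
        depth += 1
        data = data[0]
    return depth

def gather(data, n=1):
    """Return the inputs, one list per task, after gathering n levels."""
    outer_levels = nesting_depth(data) - n
    if outer_levels <= 0:
        return [flatten(data)]  # everything goes to a single task
    tasks = []
    for item in data:
        tasks.extend(gather(item, n))
    return tasks

# Placeholder names: 3 adjectives x 2 nouns gives an array of arrays.
word_pair_files = [["green_balloon", "green_button"],
                   ["purple_balloon", "purple_button"],
                   ["orange_balloon", "orange_button"]]

print(len(gather(word_pair_files, 2)))  # 1 task receiving all 6 files
print(len(gather(word_pair_files, 1)))  # 3 tasks receiving 2 files each
```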

Run the word_combinations example

loom template import word_combinations.yaml

# Run with default input data
loom run start word_combinations

# Run with custom input data
loom run start word_combinations adjectives=[little,green] nouns=[men,pickles,apples]

sentence_scoring

nested scatter-gather

Why should we bother differentiating between “gather” and “gather(2)”? This example illustrates why, by showing how to construct a scatter-scatter-gather-gather workflow. On the first gather, we do not fully gather the results into an array, but only gather the last level of nested arrays. This lets us group data for the letters in each word while keeping data for different words separate. On the second gather, we combine the data for each word to get an overall result for the sentence.
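
To make the expected result concrete, the same computation can be written as ordinary Python: split the sentence into words, split words into letters, convert letters to integers, sum within each word, and multiply across words. The function name score_sentence is hypothetical, and the sketch mirrors the arithmetic of the template below rather than Loom's execution model.

```python
from math import prod

def score_sentence(sentence):
    words = sentence.split()                          # scatter into words
    letters = [list(word) for word in words]          # scatter into letters
    letter_values = [[ord(letter) for letter in word_letters]
                     for word_letters in letters]     # one task per letter
    word_values = [sum(values) for values in letter_values]  # first gather
    return prod(word_values)                          # second gather

print(score_sentence("I am robot"))  # 8270900
```

For the default input, the word values are 73, 206, and 550, and their product is 8270900.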

sentence_scoring.yaml:

name: sentence_scoring
inputs:
- channel: sentence
  type: string
  hint: Input text to be broken into words and letters
  data:
    contents: I am robot
outputs:
- channel: sentence_value
  type: integer
steps:
- name: split_into_words
  command: echo {{ sentence }}
  inputs:
  - channel: sentence
    type: string
  outputs:
  - channel: words
    mode: scatter
    type: string
    source:
      stream: stdout
    parser:
      type: delimited
      options:
        delimiter: " "
        trim: true
  environment:
    docker_image: ubuntu
- name: split_into_letters
  interpreter: python
  command: print(' '.join([letter for letter in '{{ word }}']))
  inputs:
  - channel: words
    as_channel: word
    type: string
  outputs:
  - channel: letters
    type: string
    mode: scatter
    source:
      stream: stdout
    parser:
      type: delimited
      options:
        delimiter: " "
        trim: true
  environment:
    docker_image: python
- name: letter_to_integer
  interpreter: python
  command: print(ord( '{{ letter }}' ), end='')
  inputs:
  - channel: letters
    as_channel: letter
    type: string
  outputs:
  - channel: letter_values
    type: integer
    source:
      stream: stdout
  environment:
    docker_image: python
- name: sum_word
  interpreter: python
  command: print({{ letter_values|join(' + ') }}, end='')
  inputs:
  - channel: letter_values
    type: integer
    mode: gather
  outputs:
  - channel: word_values
    type: integer
    source:
      stream: stdout
  environment:
    docker_image: python
- name: multiply_sentence
  interpreter: python
  command: print({{ word_values|join(' * ') }}, end='')
  inputs:
  - channel: word_values
    type: integer
    mode: gather
  outputs:
  - channel: sentence_value
    type: integer
    source:
      stream: stdout
  environment:
    docker_image: python

Run the sentence_scoring example

loom template import sentence_scoring.yaml

# Run with default input data
loom run start sentence_scoring

# Run with custom input data
loom run start sentence_scoring sentence='To infinity and beyond'

Special functions

The examples above demonstrated how Jinja2 template notation can be used to incorporate input values into commands, e.g. “echo {{input1}}”. The template context contains all input channel names as keys, but it also contains the special functions below.

If an input uses the same name as a special function, the input value overrides the special function.

index

index[i] returns the one-based index of the current task. So if a run contains 3 parallel tasks, index[1] will return a value of 1, 2, or 3 for the respective tasks. If the run contains nested parallel tasks, index[i] will return the index of the task in dimension i. If i is a positive integer larger than the dimensionality of the tasks, it will return a default value of 1 (e.g. index[1], index[2], etc. all return 1 for scalar data). If i is not a positive integer value, a validation error will result.

size

size[i] returns the size of the specified dimension. So if a run contains 3 parallel tasks, size[1] will return a value of 3 for all tasks. If the run contains nested parallel tasks, size[i] will return the size of dimension i. If i is a positive integer larger than the dimensionality of the tasks, it will return a value of 1 (e.g. size[1], size[2], etc. all return 1 for scalar data). If i is not a positive integer value, a validation error will result.
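
The rules for index and size can be modeled with a small Python class that supports the same one-based subscript notation. The class name DimensionLookup is hypothetical, and ValueError stands in for Loom's validation error.

```python
class DimensionLookup:
    """One-based lookup that returns 1 beyond the known dimensions.

    Hypothetical model of the index[i] and size[i] rules described above.
    """
    def __init__(self, values, default=1):
        self._values = list(values)
        self._default = default

    def __getitem__(self, i):
        if isinstance(i, bool) or not isinstance(i, int) or i < 1:
            raise ValueError("i must be a positive integer")
        if i > len(self._values):
            return self._default
        return self._values[i - 1]

# Context of the second task in a run with 3 parallel tasks.
index = DimensionLookup([2])
size = DimensionLookup([3])

print(index[1], size[1])  # 2 3
print(index[2], size[2])  # 1 1
```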

Schemas

Template schema

field         required  default                  type               example
name          yes                                string             ‘calculate_error’
inputs        no        []                       [Input]            [{‘channel’: ‘input1’, ‘type’: ‘string’}]
outputs       no        []                       [Output]           [{‘channel’: ‘output1’, ‘type’: ‘string’, ‘source’: {‘stream’: ‘stdout’}}]
command*      yes                                string             ‘echo {{input1}}’
interpreter*  no        /bin/bash -euo pipefail  string             ‘/usr/bin/python’
resources*    no        null
environment*  yes                                string             {‘docker_image’: ‘ubuntu:latest’}
steps+        no        []                       [Template|string]  see examples in previous section

* only on executable steps (leaf nodes)

+ only on container steps (non-leaf nodes)

Input schema

field    required  default    type      example
channel  yes                  string    ‘sampleid’
type     yes                  string    ‘file’
mode*    no        no_gather  string    ‘gather’
group*   no        0          integer   2
hint     no                   string    ‘Enter a quality threshold’
data     no        null       DataNode  {‘contents’: [3,7,12]}

* only on executable steps (leaf nodes)

DataNode schema

field     required  default  type  example
contents  yes                      see notes below

DataNode contents can be a valid data value of any type. They can also be a list, or nested lists of any of these types, provided all items are of the same type and at the same nested depth.

data type  valid DataNode contents examples                 invalid DataNode contents examples
integer    172
float      3.98
string     ‘sx392’
boolean    true
file       myfile.txt
file       myfile.txt$9dd4e461268c8034f5c8564e155c67a6
file       $9dd4e461268c8034f5c8564e155c67a6
file       myfile.txt@ef62b731-e714-4b82-b1a7-057c1032419e
file       myfile.txt@ef62b7
file       @ef62b7
integer    [2,3]
integer    [[2,2],[2,3,5],[17]]
integer                                                     [2,‘three’] (mismatched types)
integer                                                     [[2,2],[2,3,[5,17]]] (mismatched depths)
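
The “same type, same depth” rule can be checked with a short recursive walk. The function is_valid_contents is a hypothetical sketch and does not validate file identifiers or compare against the declared input type. It treats empty lists as valid, which is an assumption.

```python
def is_valid_contents(contents):
    """True if every leaf has the same Python type and nesting depth.

    Hypothetical sketch of the rule above; not Loom's validator.
    """
    leaves = set()

    def walk(node, depth):
        if isinstance(node, list):
            for item in node:
                walk(item, depth + 1)
        else:
            leaves.add((type(node), depth))

    walk(contents, 0)
    return len(leaves) <= 1

print(is_valid_contents([[2, 2], [2, 3, 5], [17]]))    # True
print(is_valid_contents([2, "three"]))                 # False
print(is_valid_contents([[2, 2], [2, 3, [5, 17]]]))    # False
```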

Output schema

field    required  default    type          example
channel  yes                  string        ‘sampleid’
type     yes                  string        ‘file’
mode*    no        no_gather  string        ‘gather’
parser*  no        null       OutputParser  {‘type’: ‘delimited’, ‘options’: {‘delimiter’: ‘,’}}
source*  yes                  OutputSource  {‘glob’: ‘*.dat’}

* only on executable steps (leaf nodes)

OutputParser schema

field    required  default  type           example
type*    yes                string         ‘delimited’
options  no                 ParserOptions  {‘delimiter’: ‘ ’, ‘trim’: true}

* Currently “delimited” is the only OutputParser type

OutputSource schema

field       required  default  type    example
filename*   no                 string  ‘out.txt’
stream*     no                 string  ‘stderr’
glob+       no                 string  ‘*.txt’
filenames+  no                 string  [‘out1.txt’, ‘out2.txt’]

* When used with outputs with “scatter” mode, an OutputParser is required

+ Only for outputs with “scatter” mode. (No parser required.) The “glob” field supports “*”, ”?”, and character ranges using “[]”.
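
To get a feel for the three pattern features, the sketch below uses Python's standard fnmatch module, which supports the same “*”, “?”, and “[]” syntax. Whether Loom's matching agrees with fnmatch in every detail is an assumption, and the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Made-up file names for illustration.
filenames = ["out1.txt", "out2.txt", "outA.txt", "log.dat"]

def match(pattern):
    """Return the file names that match a glob-style pattern."""
    return [name for name in filenames if fnmatchcase(name, pattern)]

print(match("*.txt"))         # ['out1.txt', 'out2.txt', 'outA.txt']
print(match("out?.txt"))      # ['out1.txt', 'out2.txt', 'outA.txt']
print(match("out[0-9].txt"))  # ['out1.txt', 'out2.txt']
```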