Loom Templates¶
To run an analysis on Loom, you must first have a template that defines the analysis steps and their relative arrangement (input/output dependencies, scatter-gather patterns). An analysis run may then be initiated by assigning input data to an existing template.
A Loom template is defined in a YAML or JSON file and then imported into the Loom server.
Examples¶
To run these examples, you will need access to a running Loom server. See Getting Started for help launching a Loom server either locally or in the cloud.
join_two_words¶
simplest example
This example illustrates the minimal set of features in a Loom template: name, command, environment (defined by a docker image), and input/output definitions.
We use the optional “data” field on the inputs to assign default values.
join_two_words.yaml:
name: join_two_words
command: echo {{word1}} {{word2}}
environment:
docker_image: ubuntu:latest
inputs:
- channel: word1
type: string
data:
contents: hello
- channel: word2
type: string
data:
contents: world
outputs:
- channel: output_text
type: string
source:
stream: stdout
The command “echo {{word1}} {{word2}}” makes use of Jinja2 notation to substitute input values. “{{word1}}” in the command will be substituted with the value provided on the “word1” input channel. For inputs of type “string”, “integer”, “boolean”, and “float”, the value substituted is a string representation of the data. For inputs of type “file”, the filename is substituted. The full set of Jinja2 features may be used, including filters, conditional statements, and loops.
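As a rough sketch of what happens when the task command is rendered (using plain string replacement rather than Loom's actual Jinja2 renderer), the substitution behaves like this:

```python
# Naive sketch of input substitution. Loom actually uses Jinja2, which
# additionally supports filters, conditionals, and loops.
command = "echo {{word1}} {{word2}}"
inputs = {"word1": "hello", "word2": "world"}

for channel, value in inputs.items():
    command = command.replace("{{" + channel + "}}", value)

print(command)  # echo hello world
```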
Run the join_two_words example
loom template import join_two_words.yaml
# Run with default input data
loom run start join_two_words
# Run with custom input data
loom run start join_two_words word1=foo word2=bar
capitalize_words¶
array data, iterating over an array input
This template illustrates the concept of non-scalar data (in this case a 1-dimensional array). The default mode for inputs is “no_gather”, which means that rather than gathering all the objects into an array to be processed together in a single task, Loom iterates over the array and executes the command once for each data object, in separate tasks.
Here we capitalize each word in the array. The output from each task executed is a string, but since many tasks are executed, the output is an array of strings.
Note the use of “as_channel” on the input definition. Since our input channel is an array, we named the channel with the plural “words”, but because this run executes a separate task for each element in the array, it could be confusing to refer to “{{words}}” inside the command. It improves readability to use “as_channel: word”.
capitalize_words.yaml:
name: capitalize_words
command: echo -n {{word}} | awk '{print toupper($0)}'
environment:
docker_image: ubuntu:latest
inputs:
- channel: words
as_channel: word
type: string
data:
contents: [aardvark,aback,abacus,abaft]
outputs:
- channel: wordoutput
type: string
source:
stream: stdout
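Conceptually, the iteration behaves like the loop below (the real tasks run awk's toupper inside a container; Python's str.upper is used here only to mirror the result):

```python
# One task is launched per array element; the per-task outputs are
# reassembled into an array on the output channel.
words = ["aardvark", "aback", "abacus", "abaft"]
wordoutput = [w.upper() for w in words]  # each .upper() stands in for one task
print(wordoutput)  # ['AARDVARK', 'ABACK', 'ABACUS', 'ABAFT']
```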
Run the capitalize_words example
loom template import capitalize_words.yaml
# Run with default input data
loom run start capitalize_words
# Run with custom input data
loom run start capitalize_words words=[uno,dos,tres]
join_array_of_words¶
array data, gather mode on an input
Earlier we saw how to join two words, each defined on a separate input. But what if we want to join an arbitrary number of words?
This example has a single input, whose default value is an array of words. By setting the mode of this input as “gather”, instead of iterating as in the last example we will execute a single task that receives the full list of words as an input.
In this example we merge the strings and output the result as a string.
join_array_of_words.yaml:
name: join_array_of_words
command: echo -n {{wordarray}}
environment:
docker_image: ubuntu:latest
inputs:
- channel: wordarray
type: string
mode: gather
data:
contents: [aardvark,aback,abacus,abaft]
outputs:
- channel: wordoutput
type: string
source:
stream: stdout
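In gather mode the single task sees the whole array at once. As a rough sketch (assuming, for illustration, that the gathered array is substituted into the command as space-separated values):

```python
# One task receives the full array rather than one element per task.
wordarray = ["aardvark", "aback", "abacus", "abaft"]
joined = " ".join(wordarray)
print(joined)  # aardvark aback abacus abaft
```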
Run the join_array_of_words example
loom template import join_array_of_words.yaml
# Run with default input data
loom run start join_array_of_words
# Run with custom input data
loom run start join_array_of_words wordarray=[uno,dos,tres]
split_words_into_array¶
array data, scatter mode on an output, output parsers
This example is the reverse of the previous example. We begin with a scalar string of space-separated words, and split them into an array.
To generate an array output from a single task, we set the output mode to “scatter”.
We also need to instruct Loom how to split the text in stdout into an array. For this we use a parser that treats the space character as the delimiter and trims any extra whitespace characters from the words.
split_words_into_array.yaml:
name: split_words_into_array
command: echo -n {{text}}
environment:
docker_image: ubuntu:latest
inputs:
- channel: text
type: string
data:
contents: >
Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed do eiusmod tempor incididunt ut labore et dolore
magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.
outputs:
- channel: wordlist
type: string
mode: scatter
source:
stream: stdout
parser:
type: delimited
options:
delimiter: " "
trim: true
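The “delimited” parser's behavior can be sketched in Python: split stdout on the delimiter, then trim surrounding whitespace from each piece:

```python
# Sketch of the "delimited" parser with delimiter " " and trim enabled.
stdout = "one two three"
wordlist = [token.strip() for token in stdout.split(" ")]
print(wordlist)  # ['one', 'two', 'three']
```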
Run the split_words_into_array example
loom template import split_words_into_array.yaml
# Run with default input data
loom run start split_words_into_array
# Run with custom input data
loom run start split_words_into_array text="one two three"
add_then_multiply¶
multistep templates, connecting inputs and outputs, custom interpreter
All the previous examples have involved just one step. Here we show how to define more than one step in a template.
Also, since we are doing math in this example, it is easier to use Python than bash, so we introduce the concept of custom interpreters.
Notice how the flow of data is defined using shared channel names between inputs and outputs. On the top-level template “add_then_multiply” we define input channels “a”, “b”, and “c”. These are used by the steps “add” (“a” and “b”) and “multiply” (“c”). There is also an output from “add” called “ab_sum” that serves as an input for “multiply”. Finally, the output from “multiply”, called “result”, is passed up to “add_then_multiply” as a top-level output.
add_then_multiply.yaml:
name: add_then_multiply
inputs:
- type: integer
channel: a
data:
contents: 3
- type: integer
channel: b
data:
contents: 5
- type: integer
channel: c
data:
contents: 7
outputs:
- type: integer
channel: result
steps:
- name: add
command: print({{ a }} + {{ b }}, end='')
environment:
docker_image: python
interpreter: python
inputs:
- type: integer
channel: a
- type: integer
channel: b
outputs:
- type: integer
channel: ab_sum
source:
stream: stdout
- name: multiply
command: print({{ c }} * {{ ab_sum }}, end='')
environment:
docker_image: python
interpreter: python
inputs:
- type: integer
channel: ab_sum
- type: integer
channel: c
outputs:
- type: integer
channel: result
source:
stream: stdout
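With the default inputs, the data flow through the two steps reduces to:

```python
a, b, c = 3, 5, 7
ab_sum = a + b        # step "add": emitted on channel "ab_sum"
result = c * ab_sum   # step "multiply": consumes "ab_sum" and "c"
print(result)  # 56
```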
Run the add_then_multiply example
loom template import add_then_multiply.yaml
# Run with default input data
loom run start add_then_multiply
# Run with custom input data
loom run start add_then_multiply a=1 b=2 c=3
building_blocks¶
reusing templates
Let’s look at another way to write the previous workflow. The “add” and “multiply” steps can be defined as stand-alone workflows. After they are defined, we can create a template that includes those templates as steps.
add.yaml:
name: add
command: print({{ a }} + {{ b }}, end='')
environment:
docker_image: python
interpreter: python
inputs:
- type: integer
channel: a
- type: integer
channel: b
outputs:
- type: integer
channel: ab_sum
source:
stream: stdout
multiply.yaml:
name: multiply
command: print({{ c }} * {{ ab_sum }}, end='')
environment:
docker_image: python
interpreter: python
inputs:
- type: integer
channel: ab_sum
- type: integer
channel: c
outputs:
- type: integer
channel: result
source:
stream: stdout
building_blocks.yaml:
name: building_blocks
inputs:
- type: integer
channel: a
data:
contents: 3
- type: integer
channel: b
data:
contents: 5
- type: integer
channel: c
data:
contents: 7
outputs:
- type: integer
channel: result
steps:
- add
- multiply
Run the building_blocks example
# Import the parent template along with any dependencies
loom template import building_blocks.yaml
# Run with default input data
loom run start building_blocks
# Run with custom input data
loom run start building_blocks a=1 b=2 c=3
search_file¶
file inputs
Most of these examples use non-file inputs for convenience, but files can be used as inputs and outputs much like other data types.
In this example, the “lorem_ipsum.txt” input file should be imported prior to importing the “search_file.yaml” template that references it.
lorem_ipsum.txt:
Lorem ipsum dolor sit amet, consectetur adipiscing
elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.
search_file.yaml:
name: search_file
command: grep {{pattern}} {{file_to_search}}
environment:
docker_image: ubuntu:latest
inputs:
- channel: file_to_search
type: file
data:
contents: lorem_ipsum.txt
- channel: pattern
type: string
data:
contents: dolor
outputs:
- channel: matches
type: string
mode: scatter
source:
stream: stdout
parser:
type: delimited
options:
delimiter: "\n"
Here is an alternative text file not referenced in the template. We can override the default input file and specify beowulf.txt as the input when starting a run.
beowulf.txt:
Lo! the Spear-Danes' glory through splendid achievements
The folk-kings' former fame we have heard of,
How princes displayed then their prowess-in-battle.
Oft Scyld the Scefing from scathers in numbers
From many a people their mead-benches tore.
Since first he found him friendless and wretched,
The earl had had terror: comfort he got for it,
Waxed 'neath the welkin, world-honor gained,
Till all his neighbors o'er sea were compelled to
Bow to his bidding and bring him their tribute:
An excellent atheling! After was borne him
A son and heir, young in his dwelling,
Whom God-Father sent to solace the people.
Run the search_file example
# Import the template along with dependencies
loom template import search_file.yaml
# Run with default input data
loom run start search_file
# Run with custom input data
loom file import beowulf.txt
loom run start search_file pattern=we file_to_search=beowulf.txt\$20b8f89484673eae4f121801e1fec28c
word_combinations¶
scatter-gather, input groups, output mode gather(n)
When a template step has two inputs rather than one, iteration can be done in two ways:
- collated iteration: [a,b] + [c,d] => [a+c,b+d]
- combinatorial iteration: [a,b] + [c,d] => [a+c, a+d, b+c, b+d]
With more than two inputs, we could employ some combination of these two approaches.
“groups” provide a flexible way to define when to use collated or combinatorial iteration. Each input has an integer group ID (the default is 0). All inputs with a common group ID will be combined with collation. Between groups, combinatorial iteration is used.
In this example, we iterate over two inputs, one with an array of adjectives and one with an array of nouns. Since the inputs have different group IDs, we iterate over all possible combinations of word pairs (combinatorial).
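The two iteration styles correspond to Python's zip and itertools.product. A sketch with hypothetical two-element inputs:

```python
from itertools import product

first = ["a", "b"]
second = ["c", "d"]

# Same group ID: collated iteration, like zip.
collated = [x + "+" + y for x, y in zip(first, second)]
print(collated)        # ['a+c', 'b+d']

# Different group IDs: combinatorial iteration, like a cross product.
combinatorial = [x + "+" + y for x, y in product(first, second)]
print(combinatorial)   # ['a+c', 'a+d', 'b+c', 'b+d']
```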
word_combinations.yaml:
name: word_combinations
inputs:
- channel: adjectives
type: string
data:
contents: [green,purple,orange]
- channel: nouns
type: string
data:
contents: [balloon,button]
outputs:
- channel: all_word_pairs
type: file
steps:
- name: combine_words
command: echo "{{adjective}} {{noun}}" > {{word_pair_file}}
environment:
docker_image: ubuntu
inputs:
- channel: adjectives
as_channel: adjective
type: string
group: 0
- channel: nouns
as_channel: noun
type: string
group: 1
outputs:
- channel: word_pair_files
as_channel: word_pair_file
type: file
source:
filename: word_pair.txt
- name: merge_word_pairs
command: cat {{word_pair_files}} > {{all_word_pairs}}
environment:
docker_image: ubuntu
inputs:
- channel: word_pair_files
type: file
mode: gather(2)
outputs:
- channel: all_word_pairs
type: file
source:
filename: all_word_pairs.txt
You may have noticed that we gather the input “word_pair_files” with “mode: gather(2)”. This is because word_pair_files is not just an array, but an array of arrays. We wish to gather it to full depth. You may wish to modify this example to use “mode: gather” (or equivalently “mode: gather(1)”) to see how it affects the result.
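The difference in gather depth can be sketched in Python (assuming, for illustration, that the outer dimension corresponds to adjectives):

```python
# word_pair_files is a 2-level nested array: 3 adjectives x 2 nouns.
nested = [["green balloon", "green button"],
          ["purple balloon", "purple button"],
          ["orange balloon", "orange button"]]

# "mode: gather" (i.e. gather(1)) would hand each merge task one
# sub-array, producing three merged files.
# "mode: gather(2)" hands a single task the fully flattened list:
flattened = [pair for group in nested for pair in group]
print(len(flattened))  # 6
```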
Run the word_combinations example
loom template import word_combinations.yaml
# Run with default input data
loom run start word_combinations
# Run with custom input data
loom run start word_combinations adjectives=[little,green] nouns=[men,pickles,apples]
sentence_scoring¶
nested scatter-gather
Why should we bother differentiating between “gather” and “gather(2)”? This example illustrates why, by showing how to construct a scatter-scatter-gather-gather workflow. On the first gather, we do not fully gather the results into an array, but only gather the last level of nested arrays. This lets us group data for the letters in each word while keeping data for different words separate. On the second gather, we combine the data for each word to get an overall result for the sentence.
sentence_scoring.yaml:
name: sentence_scoring
inputs:
- channel: sentence
type: string
hint: Input text to be broken into words and letters
data:
contents: I am robot
outputs:
- channel: sentence_value
type: integer
steps:
- name: split_into_words
command: echo {{ sentence }}
inputs:
- channel: sentence
type: string
outputs:
- channel: words
mode: scatter
type: string
source:
stream: stdout
parser:
type: delimited
options:
delimiter: " "
trim: true
environment:
docker_image: ubuntu
- name: split_into_letters
interpreter: python
command: print(' '.join([letter for letter in '{{ word }}']))
inputs:
- channel: words
as_channel: word
type: string
outputs:
- channel: letters
type: string
mode: scatter
source:
stream: stdout
parser:
type: delimited
options:
delimiter: " "
trim: true
environment:
docker_image: python
- name: letter_to_integer
interpreter: python
command: print(ord( '{{ letter }}' ), end='')
inputs:
- channel: letters
as_channel: letter
type: string
outputs:
- channel: letter_values
type: integer
source:
stream: stdout
environment:
docker_image: python
- name: sum_word
interpreter: python
command: print({{ letter_values|join(' + ') }}, end='')
inputs:
- channel: letter_values
type: integer
mode: gather
outputs:
- channel: word_values
type: integer
source:
stream: stdout
environment:
docker_image: python
- name: multiply_sentence
interpreter: python
command: print({{ word_values|join(' * ') }}, end='')
inputs:
- channel: word_values
type: integer
mode: gather
outputs:
- channel: sentence_value
type: integer
source:
stream: stdout
environment:
docker_image: python
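For the default input sentence, the whole scatter-scatter-gather-gather pipeline is equivalent to:

```python
sentence = "I am robot"

# Scatter into words, then letters; gather letter values per word,
# then gather word values across the sentence.
word_values = [sum(ord(letter) for letter in word)
               for word in sentence.split(" ")]
sentence_value = 1
for value in word_values:
    sentence_value *= value

print(word_values)     # [73, 206, 550]
print(sentence_value)  # 8270900
```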
Run the sentence_scoring example
loom template import sentence_scoring.yaml
# Run with default input data
loom run start sentence_scoring
# Run with custom input data
loom run start sentence_scoring sentence='To infinity and beyond'
Special functions¶
The examples above demonstrated how Jinja2 template notation can be used to incorporate input values into commands, e.g. “echo {{input1}}”. The template context contains all input channel names as keys, but it also contains the special functions below.
If an input uses the same name as a special function, the input value takes precedence.
index¶
index[i] returns the one-based index of the current task. So if a run contains 3 parallel tasks, index[1] will return a value of 1, 2, or 3 for the respective tasks. If the run contains nested parallel tasks, index[i] will return the index of the task in dimension i. If i is a positive integer larger than the dimensionality of the tasks, it will return a default value of 1 (e.g. index[1], index[2], etc. all return 1 for scalar data). If i is not a positive integer value, a validation error will result.
size¶
size[i] returns the size of the specified dimension. So if a run contains 3 parallel tasks, size[1] will return a value of 3 for all tasks. If the run contains nested parallel tasks, size[i] will return the size of dimension i. If i is a positive integer larger than the dimensionality of the tasks, it will return a value of 1 (e.g. size[1], size[2], etc. all return 1 for scalar data). If i is not a positive integer value, a validation error will result.
Schemas¶
Template schema¶
field | required | default | type | example
---|---|---|---|---
name | yes | | string | 'calculate_error'
inputs | no | [] | [Input] | [{'channel': 'input1', 'type': 'string'}]
outputs | no | [] | [Output] | [{'channel': 'output1', 'type': 'string', 'source': {'stream': 'stdout'}}]
command* | yes | | string | 'echo {{input1}}'
interpreter* | no | /bin/bash -euo pipefail | string | '/usr/bin/python'
resources* | no | null | | |
environment* | yes | | string | {'docker_image': 'ubuntu:latest'}
steps+ | no | [] | [Template\|string] | see examples in previous section
* only on executable steps (leaf nodes)
+ only on container steps (non-leaf nodes)
Input schema¶
field | required | default | type | example
---|---|---|---|---
channel | yes | | string | 'sampleid'
type | yes | | string | 'file'
mode* | no | no_gather | string | 'gather'
group* | no | 0 | integer | 2
hint | no | | string | 'Enter a quality threshold'
data | no | null | DataNode | {'contents': [3,7,12]}
* only on executable steps (leaf nodes)
DataNode schema¶
field | required | default | type | example
---|---|---|---|---
contents | yes | | | see notes below
DataNode contents can be a valid data value of any type. They can also be a list, or nested lists of any of these types, provided all items are of the same type and at the same nested depth.
data type | valid DataNode contents examples | invalid DataNode contents examples
---|---|---
integer | 172 | |
float | 3.98 | |
string | 'sx392' | |
boolean | true | |
file | myfile.txt | |
file | myfile.txt$9dd4e461268c8034f5c8564e155c67a6 | |
file | $9dd4e461268c8034f5c8564e155c67a6 | |
file | myfile.txt@ef62b731-e714-4b82-b1a7-057c1032419e | |
file | myfile.txt@ef62b7 | |
file | @ef62b7 | |
integer | [2,3] | |
integer | [[2,2],[2,3,5],[17]] | |
integer | | [2,'three'] (mismatched types)
integer | | [[2,2],[2,3,[5,17]]] (mismatched depths)
Output schema¶
field | required | default | type | example
---|---|---|---|---
channel | yes | | string | 'sampleid'
type | yes | | string | 'file'
mode* | no | no_gather | string | 'gather'
parser* | no | null | OutputParser | {'type': 'delimited', 'options': {'delimiter': ','}}
source* | yes | | OutputSource | {'glob': '*.dat'}
* only on executable steps (leaf nodes)
OutputParser schema¶
field | required | default | type | example
---|---|---|---|---
type* | yes | | string | 'delimited'
options | no | | ParserOptions | {'delimiter': ' ', 'trim': true}
* Currently “delimited” is the only OutputParser type
OutputSource schema¶
field | required | default | type | example
---|---|---|---|---
filename* | no | | string | 'out.txt'
stream* | no | | string | 'stderr'
glob+ | no | | string | '*.txt'
filenames+ | no | | string | ['out1.txt','out2.txt']
* When used with outputs with “scatter” mode, an OutputParser is required
+ Only for outputs with “scatter” mode. (No parser required.) The “glob” field supports “*”, ”?”, and character ranges using “[]”.