Introduction to CWL and Docker

automating workflows in Life Sciences

How do I connect tools together into a workflow?

Workflow of one step

The simplest “hello world” program can be implemented as a workflow with a single step that runs the echo.cwl tool. To do that, create a file called echo-workflow.cwl:

echo-workflow.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

inputs:
  message: string 

steps:
  echo:
    run: echo.cwl
    in:
      message: message
    out: []

outputs: []

Next, run the workflow with cwl-runner (or cwltool), passing the workflow echo-workflow.cwl and the input object echo-job.yml (created earlier) on the command line:

$ cwl-runner echo-workflow.cwl echo-job.yml
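If you want to launch the same run from a script, a minimal Python sketch could look like the following. It assumes that cwl-runner or cwltool is installed on your PATH; the helper names build_command and run_workflow are hypothetical and not part of any CWL tooling.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def build_command(
    workflow: str,
    job: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the argv for running `workflow` with the input object `job`.

    Prefers cwl-runner and falls back to cwltool, mirroring
    "cwl-runner (or cwltool)" in the text.
    """
    for runner in ("cwl-runner", "cwltool"):
        path = which(runner)
        if path is not None:
            return [path, workflow, job]
    raise FileNotFoundError("neither cwl-runner nor cwltool was found on PATH")


def run_workflow(workflow: str, job: str) -> int:
    """Run the workflow and return the runner's exit code (0 on success)."""
    completed = subprocess.run(build_command(workflow, job))
    return completed.returncode


# Usage (requires a CWL runner to be installed):
#   run_workflow("echo-workflow.cwl", "echo-job.yml")
```

The `which` parameter only exists so the runner lookup can be swapped out, for example in tests; in normal use the default `shutil.which` is enough.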

Two-step workflow

This workflow extracts a Java source file from a tar file and then compiles it.

two-step-workflow.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow
inputs:
  tarball: File
  name_of_file_to_extract: string

outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile

steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]

  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]

Use a YAML or a JSON object in a separate file to describe the input of a run:

two-step-workflow-job.yml

tarball:
  class: File
  path: hello.tar
name_of_file_to_extract: Hello.java
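Because the input object may also be written in JSON, the same job could be stored as a .json file. The sketch below builds that object in Python and serializes it. The filename two-step-workflow-job.json and the helper write_job are assumptions for illustration; the structure (a File object with class and path, plus a plain string) mirrors the YAML above.

```python
import json

# The same input object as two-step-workflow-job.yml, as Python data.
job = {
    "tarball": {"class": "File", "path": "hello.tar"},
    "name_of_file_to_extract": "Hello.java",
}


def write_job(path: str, obj: dict) -> None:
    """Serialize a CWL input object as JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(obj, fh, indent=2)
        fh.write("\n")


# Usage:
#   write_job("two-step-workflow-job.json", job)
#   then: cwltool two-step-workflow.cwl two-step-workflow-job.json
```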

First create the input tarball, then invoke cwl-runner (or cwltool) with the workflow and the input object on the command line:

$ echo "public class Hello {}" > Hello.java && tar -cvf hello.tar Hello.java
$ cwltool two-step-workflow.cwl two-step-workflow-job.yml
[job untar] /tmp/tmp94qFiM$ tar --extract --file /home/example/hello.tar Hello.java
[step untar] completion status is success
[job compile] /tmp/tmpu1iaKL$ docker run -i --volume=/tmp/tmp94qFiM/Hello.java:/var/lib/cwl/job301600808_tmp94qFiM/Hello.java:ro --volume=/tmp/tmpu1iaKL:/var/spool/cwl:rw --volume=/tmp/tmpfZnNdR:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac -d /var/spool/cwl /var/lib/cwl/job301600808_tmp94qFiM/Hello.java
[step compile] completion status is success
[workflow two-step-workflow.cwl] outdir is /home/example
Final process status is success
{
  "compiled_class": {
    "location": "/home/example/Hello.class",
    "checksum": "sha1$e68df795c0686e9aa1a1195536bd900f5f417b18",
    "class": "File",
    "size": 416
  }
}
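The first shell command above only prepares test data. On a system without the tar command, a rough Python equivalent uses the standard tarfile module. The extract_member helper is a hypothetical stand-in for what the untar step does through tar-param.cwl, not that tool's actual implementation.

```python
import tarfile
from pathlib import Path


def make_tarball(workdir: Path) -> Path:
    """Create Hello.java and pack it into hello.tar, like the shell one-liner."""
    source = workdir / "Hello.java"
    source.write_text("public class Hello {}\n")  # echo appends a newline
    tar_path = workdir / "hello.tar"
    with tarfile.open(tar_path, "w") as tar:  # "w" = uncompressed tar
        tar.add(source, arcname="Hello.java")
    return tar_path


def extract_member(tar_path: Path, name: str, dest: Path) -> Path:
    """Extract a single named member into `dest`, as the untar step does."""
    with tarfile.open(tar_path, "r") as tar:
        tar.extract(name, path=dest)
    return dest / name
```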

What’s going on here? Let’s break it down:

cwlVersion: v1.0
class: Workflow

The cwlVersion field indicates the version of the CWL spec used by the document. The class field indicates this document describes a workflow.

inputs:
  tarball: File
  name_of_file_to_extract: string

The inputs section describes the inputs of the workflow. This is a list of input parameters where each parameter consists of an identifier and a data type. These parameters can be used as sources for the inputs of specific workflow steps.
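To make the idea of typed input parameters concrete, here is a small sketch that checks a job object against the declared inputs. It is deliberately simplified and hypothetical: it only knows the two types used here (File and string) and treats a File as a mapping with class: File plus a path or location. A real CWL runner validates inputs far more thoroughly.

```python
from typing import Any, Dict, List

# Declared workflow inputs: identifier -> CWL type (copied from the inputs section).
WORKFLOW_INPUTS: Dict[str, str] = {
    "tarball": "File",
    "name_of_file_to_extract": "string",
}


def matches(cwl_type: str, value: Any) -> bool:
    """Return True if `value` looks like a valid value of `cwl_type`."""
    if cwl_type == "string":
        return isinstance(value, str)
    if cwl_type == "File":
        return (
            isinstance(value, dict)
            and value.get("class") == "File"
            and ("path" in value or "location" in value)
        )
    raise ValueError(f"type not handled by this sketch: {cwl_type}")


def check_job(inputs: Dict[str, str], job: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the job looks valid."""
    problems = []
    for name, cwl_type in inputs.items():
        if name not in job:
            problems.append(f"missing input: {name}")
        elif not matches(cwl_type, job[name]):
            problems.append(f"{name} is not a valid {cwl_type}")
    return problems
```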

outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile

The outputs section describes the outputs of the workflow. This is a list of output parameters where each parameter consists of an identifier and a data type. The outputSource connects the output parameter classfile of the compile step to the workflow output parameter compiled_class.

steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]

The steps section describes the actual steps of the workflow. In this example, the first step extracts a file from a tar file, and the second step compiles the extracted file with the Java compiler. Workflow steps are not necessarily run in the order they are listed; instead, the order is determined by the dependencies between steps, which are expressed through the source of each step input. In addition, workflow steps that do not depend on one another may run in parallel.
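The scheduling rule described above amounts to a topological sort of the step dependency graph. The sketch below uses Python's graphlib module (Python 3.9+) to group steps into "waves": every step in a wave has all its inputs available, so steps in the same wave could run in parallel. The dependency dictionaries are written by hand from the step definitions; a real runner derives them from each step's sources and may schedule work differently.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps so that each step comes after every step it depends on.

    `deps` maps a step name to the set of steps whose outputs it consumes.
    Steps within one returned wave are independent of each other.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the steps form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# two-step-workflow.cwl: compile consumes untar/extracted_file.
TWO_STEP = {"untar": set(), "compile": {"untar"}}
```

For the two-step workflow this yields one step per wave; a hypothetical four-step "diamond" (b and c both depending on a, d depending on b and c) would put b and c in the same wave.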

The first step, untar, runs tar-param.cwl. This tool has two input parameters, tarfile and extractfile, and one output parameter, extracted_file.

The in section of the workflow step connects these two input parameters to the workflow inputs tarball and name_of_file_to_extract. Each entry has the form parameter: source, so tarfile: tarball sets the source of tarfile to the workflow input tarball. This means that when the workflow step is executed, the values assigned to tarball and name_of_file_to_extract are used for the parameters tarfile and extractfile to run the tool.
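The short form parameter: source is equivalent to writing the source out explicitly under the parameter. The sketch below expands that shorthand into the explicit {source: ...} form and then looks up the actual values for a step whose sources are all workflow inputs. The helper names and the plain-dict representation are assumptions made for illustration.

```python
from typing import Any, Dict, Union

StepIn = Dict[str, Union[str, Dict[str, Any]]]


def normalize_in(step_in: StepIn) -> Dict[str, Dict[str, Any]]:
    """Expand `param: source` shorthand into `param: {"source": source}`."""
    return {
        param: spec if isinstance(spec, dict) else {"source": spec}
        for param, spec in step_in.items()
    }


def resolve_from_workflow_inputs(
    step_in: StepIn, workflow_values: Dict[str, Any]
) -> Dict[str, Any]:
    """Map each tool parameter to the value of the workflow input it names."""
    return {
        param: workflow_values[spec["source"]]
        for param, spec in normalize_in(step_in).items()
    }


# The `in` section of the untar step, copied from two-step-workflow.cwl.
UNTAR_IN = {"tarfile": "tarball", "extractfile": "name_of_file_to_extract"}
```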

The out section of the workflow step lists the output parameters that are expected from the tool. Only the outputs listed here can be referenced by later steps or by the workflow outputs.

  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]

The second step, compile, depends on the results of the first step: its input parameter src is connected to the output parameter of untar through untar/extracted_file. It runs arguments.cwl. The output of this step, classfile, is connected to the outputs section of the workflow, described above.
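Putting the pieces together, the following sketch simulates how values flow through the two steps. A source containing a slash (step/output) refers to another step's output, a bare name refers to a workflow input, and outputSource selects the final result. The two "tools" are hypothetical stand-in functions that only compute file names; they do not run tar or javac, and the steps are assumed to be listed in dependency order already.

```python
from typing import Any, Dict, List


def fake_untar(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for tar-param.cwl: pretend the named member was extracted.
    return {"extracted_file": inputs["extractfile"]}


def fake_compile(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for arguments.cwl: map X.java to X.class by name only.
    return {"classfile": inputs["src"].replace(".java", ".class")}


def lookup(source: str, wf_inputs: Dict[str, Any],
           results: Dict[str, Dict[str, Any]]) -> Any:
    """Resolve 'step/output' against step results, otherwise a workflow input."""
    if "/" in source:
        step, output = source.split("/", 1)
        return results[step][output]
    return wf_inputs[source]


def run(steps: List[Dict[str, Any]], wf_inputs: Dict[str, Any],
        output_source: str) -> Any:
    """Run steps in the given (already dependency-sorted) order."""
    results: Dict[str, Dict[str, Any]] = {}
    for step in steps:
        args = {p: lookup(src, wf_inputs, results) for p, src in step["in"].items()}
        produced = step["run"](args)
        # Keep only the outputs the step declares in `out`.
        results[step["id"]] = {name: produced[name] for name in step["out"]}
    return lookup(output_source, wf_inputs, results)


STEPS = [
    {"id": "untar", "run": fake_untar,
     "in": {"tarfile": "tarball", "extractfile": "name_of_file_to_extract"},
     "out": ["extracted_file"]},
    {"id": "compile", "run": fake_compile,
     "in": {"src": "untar/extracted_file"},
     "out": ["classfile"]},
]
```

Calling `run(STEPS, {"tarball": "hello.tar", "name_of_file_to_extract": "Hello.java"}, "compile/classfile")` follows the same wiring as two-step-workflow.cwl and returns the stand-in name of the compiled class.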