REAT - Robust and Extendable eukaryotic Annotation Toolkit
REAT is a robust, easy-to-use genome annotation toolkit for turning high-quality genome assemblies into usable and informative resources. It makes use of state-of-the-art annotation tools and is robust to varying quality and sources of molecular evidence.
REAT provides an integrated environment that comprises both a set of workflows geared towards integrating multiple sources of evidence into a genome annotation, and an execution environment for these workflows.
Installation
To install REAT you can:

```shell
git clone https://github.com/ei-corebioinformatics/reat
wget https://github.com/broadinstitute/cromwell/releases/download/62/cromwell-62.jar
conda env create -f reat/reat.yml
pip install ./reat
```
These commands will download the Cromwell binary required to execute the workflows and make REAT available in the `reat` conda environment, which can be activated using:

```shell
conda activate reat
```
Each task in the workflow is configured with default resource requirements appropriate for most inputs, but these can be overridden by user-provided values. For examples of resource configuration files, refer to each module's description.
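As an illustration only, a resource override file typically pairs task names with the resources to request; the task names and keys below are hypothetical — consult each module's description for the actual names it accepts:

```json
{
    "align_short_reads": { "cpu": 8, "memory_mb": 16000 },
    "assemble_transcripts": { "cpu": 16, "memory_mb": 64000 }
}
```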
To configure the Cromwell engine there are two relevant files: the Cromwell runtime configuration file and the workflow options file.

The Cromwell engine can be configured to run in your environment using a file such as:
```hocon
include required(classpath("application"))

database {
    profile = "slick.jdbc.HsqldbProfile$"
    db {
        driver = "org.hsqldb.jdbcDriver"
        url = """
        jdbc:hsqldb:file:cromwell-executions/cromwell-db/cromwell-db;
        shutdown=false;
        hsqldb.default_table_type=cached;hsqldb.tx=mvcc;
        hsqldb.result_max_memory_rows=10000;
        hsqldb.large_data=true;
        hsqldb.applog=1;
        hsqldb.lob_compressed=true;
        hsqldb.script_format=3
        """
        connectionTimeout = 120000
        numThreads = 1
    }
}

# concurrent-job-limit = 2
# max-concurrent-workflows = 1
akka.http.server.request-timeout = 30s

call-caching {
    # Allows re-use of existing results for jobs you've already run
    # (default: false)
    enabled = true

    # Whether to invalidate a cache result forever if we cannot reuse them. Disable this if you expect some cache
    # copies to fail for external reasons which should not invalidate the cache (e.g. auth differences between users):
    # (default: true)
    invalidate-bad-cache-results = true
}

backend {
    default = slurm
    providers {
        slurm {
            actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
            config {
                concurrent-job-limit = 50
                filesystems {
                    local {
                        localization: [
                            # For local SLURM, hardlink doesn't work. Options for this and caching:
                            # "soft-link", "hard-link", "copy"
                            "soft-link", "copy"
                        ]
                        ## call caching config relating to the filesystem side
                        caching {
                            # When copying a cached result, what type of file duplication should occur.
                            # Attempted in the order listed below:
                            duplication-strategy: [ "soft-link" ]

                            # Possible values: file, path, path+modtime
                            # "file" will compute an md5 hash of the file content.
                            # "path" will compute an md5 hash of the file path. This strategy will only be effective
                            # if the duplication-strategy (above) is set to "soft-link", in order to allow for the
                            # original file path to be hashed.
                            hashing-strategy: "path"

                            # When true, will check if a sibling file with the same name and the .md5 extension
                            # exists, and if it does, use the content of this file as a hash.
                            # If false or the md5 does not exist, will proceed with the above-defined hashing strategy.
                            check-sibling-md5: false
                        }
                    }
                }
                runtime-attributes = """
                Int runtime_minutes = 1440
                Int cpu = 4
                Int memory_mb = 8000
                String? constraints
                String? queue = "ei-medium"
                """
                submit = """
                if [ "" == "${queue}" ]
                then
                    sbatch -J ${job_name} --constraint="${constraints}" -D ${cwd} -o ${out} -e ${err} -t ${runtime_minutes} \
                    -p ei-medium \
                    ${"-c " + cpu} \
                    --mem ${memory_mb} \
                    --wrap "/bin/bash ${script}"
                else
                    sbatch -J ${job_name} --constraint="${constraints}" -D ${cwd} -o ${out} -e ${err} -t ${runtime_minutes} \
                    -p ${queue} \
                    ${"-c " + cpu} \
                    --mem ${memory_mb} \
                    --wrap "/bin/bash ${script}"
                fi
                """
                kill = "scancel ${job_id}"
                check-alive = "squeue -j ${job_id}"
                job-id-regex = "Submitted batch job (\\d+).*"
                exit-code-timeout-seconds = 45
            }
        }
    }
}
```
The workflow options file can be used to activate the caching behaviour in Cromwell, e.g.:
```json
{
    "write_to_cache": true,
    "read_from_cache": true,
    "memory_retry_multiplier": 1.5
}
```
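With both files in place, a workflow can be submitted through the Cromwell jar downloaded during installation; the configuration file is passed via the `-Dconfig.file` Java property and the options file via `--options`. The file names below (`cromwell.conf`, `workflow.wdl`, `inputs.json`, `options.json`) are placeholders for illustration:

```shell
java -Dconfig.file=cromwell.conf \
     -jar cromwell-62.jar \
     run workflow.wdl \
     --inputs inputs.json \
     --options options.json
```

Note that REAT's own command line wraps this invocation, so running Cromwell directly in this way is only needed for manual or debugging runs.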