Software used in the alignment module

bwa_mem

Align .fastq files to a reference genome and generate a .bam file.

🐍 Rule

rule bwa_mem:
    input:
        reads=lambda wildcards: alignment_input(wildcards),
        idx=[
            config.get("bwa_mem", {}).get("amb", ""),
            config.get("bwa_mem", {}).get("ann", ""),
            config.get("bwa_mem", {}).get("bwt", ""),
            config.get("bwa_mem", {}).get("pac", ""),
            config.get("bwa_mem", {}).get("sa", ""),
        ],
    output:
        bam=temp("alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam"),
    params:
        extra=lambda wildcards: "%s %s %s"
        % (
            config.get("bwa_mem", {}).get("extra", ""),
            config.get("bwa_mem", {}).get("read_group", generate_read_group(wildcards)),
            get_deduplication_option(wildcards),
        ),
        sorting=config.get("bwa_mem", {}).get("sort", "samtools"),
        sort_order=config.get("bwa_mem", {}).get("sort_order", "coordinate"),
        sort_extra="-@ %s"
        % str(config.get("bwa_mem", config["default_resources"]).get("threads", config["default_resources"]["threads"])),
    log:
        "alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam.log",
    benchmark:
        repeat(
            "alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam.benchmark.tsv",
            config.get("bwa_mem", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("bwa_mem", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("bwa_mem", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("bwa_mem", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("bwa_mem", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("bwa_mem", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("bwa_mem", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("bwa_mem", {}).get("container", config["default_container"])
    message:
        "{rule}: align fastq files {input.reads} using bwa mem against {input.idx[2]}"
    wrapper:
        "v1.3.1/bio/bwa/mem"

↔ input / output files

Rule parameters Key Value Description
input reads lambda wildcards: alignment_input(wildcards) fastq files from the same sample.
fastq files obtained by get_fastq_file defined in common.smk
idx [ config.get("bwa_mem", {}).get("amb", ""), config.get("bwa_mem", {}).get("ann", ""), config.get("bwa_mem", {}).get("bwt", ""), config.get("bwa_mem", {}).get("pac", ""), config.get("bwa_mem", {}).get("sa", ""), ] reference files for bwa-mem; their locations are defined in the config.yaml
output bam "alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam" aligned bam file.
NOTE: if the fastq files are divided into different lanes for the same sample,
they will be aligned separately

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string Parameters that should be forwarded.
NOTE: If the sample is marked with umi in the deduplication column in the samples.tsv file the -Y flag is added by the get_deduplication_option function (in common.smk)
read_group string RG string that will be added to the generated bam file.
The RG string is generated by the function generate_read_group defined in common.smk
sorting string program handling the bam sorting (default samtools)
sort_order string how the bam file should be sorted (default coordinate)
sort_extra string parameters that should be forwarded to sorting (e.g., number of threads)
amb string fasta reference amb file
ann string fasta reference ann file
bwt string fasta reference bwt file
pac string fasta reference pac file
sa string fasta reference sa file
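The way the rule assembles its `extra` parameter from three sources can be sketched in plain Python. This is a minimal illustration, not the pipeline's code: the two helper functions below are hypothetical stand-ins for `generate_read_group` and `get_deduplication_option` in common.smk, the read group format is invented for the example, and the `deduplication` values other than `umi` are assumptions.

```python
def generate_read_group(row):
    # Hypothetical stand-in: the real generate_read_group lives in common.smk
    # and its exact output format is not shown in this document.
    sample_id = "%s_%s" % (row["sample"], row["type"])
    return "-R '@RG\\tID:%s\\tSM:%s'" % (sample_id, sample_id)


def get_deduplication_option(row):
    # Samples marked "umi" in the deduplication column get the -Y flag.
    return "-Y" if row.get("deduplication") == "umi" else ""


def build_extra(config, row):
    # Mirrors the "%s %s %s" pattern used in the rule's params section.
    bwa_cfg = config.get("bwa_mem", {})
    return "%s %s %s" % (
        bwa_cfg.get("extra", ""),
        bwa_cfg.get("read_group", generate_read_group(row)),
        get_deduplication_option(row),
    )


config = {"bwa_mem": {"extra": "-K 10000000"}}
umi_row = {"sample": "NA12878", "type": "T", "deduplication": "umi"}
plain_row = {"sample": "NA12878", "type": "N", "deduplication": "picard"}

print(build_extra(config, umi_row))
print(build_extra(config, plain_row))
```

Setting `read_group` in the config replaces the generated string entirely, so a custom value must include the `-R` option itself.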

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
NOTE: bwa mem uses a large amount of memory.
mem_per_cpu integer memory in MB used per cpu
NOTE: bwa mem uses a large amount of memory.
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available
RECOMMENDATION: use multiple threads for decreased run time.
NOTE: if multiple threads are used, the memory must also be increased (mem_mb)
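Every rule in this module resolves its resources with the same two-level lookup: the rule's own section first, then `default_resources`. A minimal sketch of that fallback, assuming a hypothetical merged config dictionary (the values below are illustrative, not recommended settings):

```python
# Hypothetical merged configuration (config.yaml plus resources.yaml).
config = {
    "default_resources": {
        "threads": 1,
        "mem_mb": 6144,
        "mem_per_cpu": 6144,
        "partition": "core",
        "time": "4:00:00",
    },
    "bwa_mem": {"threads": 8, "mem_mb": 49152},
}


def rule_resource(config, rule, key):
    """Return the rule-specific value if set, else the default_resources value."""
    return config.get(rule, {}).get(key, config["default_resources"][key])


print(rule_resource(config, "bwa_mem", "threads"))        # rule-specific value
print(rule_resource(config, "bwa_mem", "partition"))      # falls back to the default
print(rule_resource(config, "bwa_mem_merge", "threads"))  # rule has no section at all
```

Because `default_resources` is indexed directly, it must define every key; only the per-rule sections are optional.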

bwa_mem_merge

Merge .bam files from the same sample using samtools merge.

🐍 Rule

rule bwa_mem_merge:
    input:
        bams=lambda wildcards: [
            "alignment/bwa_mem/{sample}_{type}_%s_%s_%s.bam" % (u.flowcell, u.lane, u.barcode)
            for u in get_units(units, wildcards)
        ],
    output:
        bam=temp("alignment/bwa_mem/{sample}_{type}_unsorted.bam"),
    params:
        config.get("bwa_mem_merge", {}).get("extra", ""),
    log:
        "alignment/bwa_mem/{sample}_{type}_unsorted.bam.log",
    benchmark:
        repeat(
            "alignment/bwa_mem/{sample}_{type}_unsorted.bam.benchmark.tsv",
            config.get("bwa_mem_merge", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("bwa_mem_merge", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("bwa_mem_merge", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("bwa_mem_merge", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("bwa_mem_merge", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("bwa_mem_merge", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("bwa_mem_merge", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("bwa_mem_merge", {}).get("container", config["default_container"])
    message:
        "{rule}: merge bam file {input} using samtools"
    wrapper:
        "v1.1.0/bio/samtools/merge"

↔ input / output files

Rule parameters Key Value Description
input bams lambda wildcards: [ "alignment/bwa_mem/{sample}_{type}_%s_%s_%s.bam" % (u.flowcell, u.lane, u.barcode) for u in get_units(units, wildcards) ] bam files from the same sample (and the same sample type)
a list of bam files is obtained using the information in the units.tsv file
output bam "alignment/bwa_mem/{sample}_{type}_unsorted.bam" an unsorted merged bam file
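The input list is built by formatting one path per row of units.tsv for the requested sample and type. The sketch below illustrates the idea with a simplified, hypothetical `get_units` that filters a list of named tuples; the real helper takes the units table and the wildcards object, and the flowcell, lane and barcode values are made up.

```python
from collections import namedtuple

Unit = namedtuple("Unit", ["sample", "type", "flowcell", "lane", "barcode"])

# Hypothetical rows standing in for units.tsv.
units = [
    Unit("NA12878", "T", "HXXXXXXXX", "L001", "ACGTACGT"),
    Unit("NA12878", "T", "HXXXXXXXX", "L002", "ACGTACGT"),
    Unit("NA12878", "N", "HXXXXXXXX", "L001", "TTGGCCAA"),
]


def get_units(units, sample, sample_type):
    # Simplified stand-in: select the rows that belong to one sample and type.
    return [u for u in units if u.sample == sample and u.type == sample_type]


def merge_inputs(units, sample, sample_type):
    # One per-lane bam from the bwa_mem rule for every matching unit.
    return [
        "alignment/bwa_mem/%s_%s_%s_%s_%s.bam"
        % (sample, sample_type, u.flowcell, u.lane, u.barcode)
        for u in get_units(units, sample, sample_type)
    ]


for path in merge_inputs(units, "NA12878", "T"):
    print(path)
```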

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded
RECOMMENDATION: use -c -p to keep only one of the read group IDs when merging files from the same sample

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

bwa_mem_realign_consensus_reads

Realign after consensus read creation by fgbio_call_and_filter_consensus_reads and generate a .bam file.

🐍 Rule

rule bwa_mem_realign_consensus_reads:
    input:
        bam="alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam",
    output:
        bam=temp("alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi_unsorted.bam"),
    params:
        extra_bwa_mem=config.get("bwa_mem_realign_consensus_reads", {}).get("extra_bwa_mem", ""),
        reference=config.get("reference", {}).get("fasta", ""),
        tmp_dir="alignment/tmp_realign_{sample}_{type}",
        fgbio_sorted_unmapped="alignment/tmp_realign_{sample}_{type}/fgbio_query_sorted.bam",
    log:
        "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.benchmark.tsv",
            config.get("bwa_mem_realign_consensus_reads", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("bwa_mem_realign_consensus_reads", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("bwa_mem_realign_consensus_reads", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("bwa_mem_realign_consensus_reads", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("bwa_mem_realign_consensus_reads", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("bwa_mem_realign_consensus_reads", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("bwa_mem_realign_consensus_reads", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("bwa_mem_realign_consensus_reads", {}).get("container", config["default_container"])
    message:
        "{rule}: realign unmapped consensus reads found in {input.bam}"
    shell:
        'sh -c "'
        "set -e; "
        "mkdir -p {params.tmp_dir}; "
        "trap 'rm -rf {params.tmp_dir}' EXIT; "

        "fgbio -Xmx16g SortBam -i {input.bam} -s Queryname -o {params.fgbio_sorted_unmapped}; "

        "samtools fastq -n {params.fgbio_sorted_unmapped} | "
        "bwa mem -t {threads} -p -K 150000000 -Y {params.reference} {params.extra_bwa_mem} - | "
        "fgbio -Xmx16g SortBam -i /dev/stdin -s Queryname -o /dev/stdout | "
        "fgbio -Xmx16g ZipperBams "
        "--unmapped {params.fgbio_sorted_unmapped} "
        "--ref {params.reference} "
        "--tags-to-reverse cd ce ad ae bd be aq bq "
        "--tags-to-revcomp ac bc "
        "-o {output.bam}"
        '" >& {log}'

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam" unmapped bam file with consensus reads based on umi barcodes
output bam "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi_unsorted.bam" realigned and sorted bam file based on umi consensus reads
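The rule writes an intermediate query-sorted bam into a per-sample temporary directory, which the `trap` removes when the shell exits. As a rough illustration of how the placeholders resolve, the sketch below uses Python's `str.format` as a stand-in for Snakemake's own wildcard and parameter substitution; the sample name and type are hypothetical.

```python
from types import SimpleNamespace


def realign_params(sample, sample_type):
    # Expand the {sample} and {type} wildcards in the two path parameters.
    wildcards = {"sample": sample, "type": sample_type}
    tmp_dir = "alignment/tmp_realign_{sample}_{type}".format(**wildcards)
    sorted_bam = "alignment/tmp_realign_{sample}_{type}/fgbio_query_sorted.bam".format(**wildcards)
    return SimpleNamespace(tmp_dir=tmp_dir, fgbio_sorted_unmapped=sorted_bam)


params = realign_params("NA12878", "T")
input_files = SimpleNamespace(
    bam="alignment/fgbio_call_and_filter_consensus_reads/NA12878_T.umi.unmapped_bam"
)

# First stage of the shell command, with its placeholders filled in.
first_stage = (
    "fgbio -Xmx16g SortBam -i {input.bam} -s Queryname "
    "-o {params.fgbio_sorted_unmapped}"
).format(input=input_files, params=params)

print(params.tmp_dir)
print(first_stage)
```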

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra_bwa_mem string parameters that should be forwarded to bwa_mem
extra_sort string parameters that should be forwarded to samtools sort
extra_zipper_bam string parameters that should be forwarded to fgbio ZipperBams

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
RECOMMENDATION: Use multiple threads for decreased run time.
NOTE: If multiple threads are used, the memory must also be increased (mem_mb)
time string max execution time

fgbio_call_and_filter_consensus_reads

Call and filter consensus reads based on umis using fgbio (CallDuplexConsensusReads followed by FilterConsensusReads)

🐍 Rule

rule fgbio_call_and_filter_consensus_reads:
    input:
        bam="alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam",
    output:
        bam=temp("alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam"),
    params:
        extra_call=config.get("fgbio_call_and_filter_consensus_reads", {}).get("extra_call", ""),
        extra_filter=config.get("fgbio_call_and_filter_consensus_reads", {}).get("extra_filter", ""),
        max_base_error_rate=config.get("fgbio_call_and_filter_consensus_reads", {}).get("max_base_error_rate", "0.2"),
        min_reads_call=config.get("fgbio_call_and_filter_consensus_reads", {}).get("min_reads_call", "1 1 1"),
        min_reads_filter=config.get("fgbio_call_and_filter_consensus_reads", {}).get("min_reads_filter", "1 1 1"),
        min_input_base_quality_call=config.get("fgbio_call_and_filter_consensus_reads", {}).get(
            "min_input_base_quality_call", "20"
        ),
        min_input_base_quality_filter=config.get("fgbio_call_and_filter_consensus_reads", {}).get(
            "min_input_base_quality_filter", "45"
        ),
        reference=config.get("reference", {}).get("fasta", ""),
    log:
        "alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped.bam.log",
    benchmark:
        repeat(
            "alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped.bam.benchmark.tsv",
            config.get("fgbio_call_and_filter_consensus_reads", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fgbio_call_and_filter_consensus_reads", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fgbio_call_and_filter_consensus_reads", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fgbio_call_and_filter_consensus_reads", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("fgbio_call_and_filter_consensus_reads", {}).get(
            "partition", config["default_resources"]["partition"]
        ),
        threads=config.get("fgbio_call_and_filter_consensus_reads", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fgbio_call_and_filter_consensus_reads", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fgbio_call_and_filter_consensus_reads", {}).get("container", config["default_container"])
    message:
        "{rule}: call and filter consensus reads in {input.bam} into an unmapped bam file"
    shell:
        'sh -c "'
        "fgbio -Xmx4g --compression 0 CallDuplexConsensusReads "
        "--input {input.bam} "
        "--output /dev/stdout "
        "--min-reads {params.min_reads_call} "
        "--min-input-base-quality {params.min_input_base_quality_call} "
        "--threads {threads} "
        "{params.extra_call} "
        "| fgbio -Xmx8g --compression 1 FilterConsensusReads "
        "--input /dev/stdin "
        "--output {output.bam} "
        "--ref {params.reference} "
        "--min-reads {params.min_reads_filter} "
        "--min-base-quality {params.min_input_base_quality_filter} "
        "--max-base-error-rate {params.max_base_error_rate} "
        '{params.extra_filter}" >& {log}'

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam" input 'bam' file with umi tags
output bam "alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam" unmapped bam file with consensus reads that are hard filtered

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra_call string parameters that should be forwarded to CallDuplexConsensusReads
extra_filter string parameters that should be forwarded to FilterConsensusReads
max_base_error_rate string mask bases with N if the fraction of reads that differ from the consensus is higher than the max error rate in FilterConsensusReads
min_reads_call string String of three numbers; reads are filtered out if their read counts fall below these numbers in CallDuplexConsensusReads.
The first number applies to reads from both strands combined, while the second and third numbers apply to the individual strands.
The first number must be greater than or equal to the other numbers.
min_reads_filter string String of three numbers; reads are filtered out if their read counts fall below these numbers in FilterConsensusReads.
The first number applies to reads from both strands combined, while the second and third numbers apply to the individual strands.
The first number must be greater than or equal to the other numbers.
min_input_base_quality_call integer only consider bases over min base quality in consensus creation in CallDuplexConsensusReads
min_input_base_quality_filter integer mask bases with N if under min base quality in consensus creation in FilterConsensusReads
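The constraint on `min_reads_call` and `min_reads_filter` (three numbers, the first at least as large as the other two) is easy to check before a run. The helper below is a hypothetical sketch and not part of the pipeline; it validates only the form described in the table above.

```python
def parse_min_reads(value):
    """Parse a 'total strand_a strand_b' string and check the ordering rule."""
    parts = [int(p) for p in str(value).split()]
    if len(parts) != 3:
        raise ValueError("expected three numbers, got %r" % (value,))
    total, strand_a, strand_b = parts
    if total < max(strand_a, strand_b):
        raise ValueError(
            "first number (%d) must be >= the per-strand numbers (%d, %d)"
            % (total, strand_a, strand_b)
        )
    return total, strand_a, strand_b


print(parse_min_reads("1 1 1"))  # the default used by the rule
print(parse_min_reads("3 1 1"))  # a stricter combined threshold
```

A value such as `"1 2 1"` raises a `ValueError` because the combined threshold is smaller than one of the per-strand thresholds.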

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
NOTE: must be at least 8 GB
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
RECOMMENDATION: Use multiple threads for decreased run time.
time string max execution time

fgbio_group_reads_by_umi

Group and sort reads based on umi using fgbio in preparation for fgbio_call_and_filter_consensus_reads. Mate tags (e.g. MQ) are added earlier, by samblaster in the fgbio_copy_umi_from_read_name rule.

🐍 Rule

rule fgbio_group_reads_by_umi:
    input:
        bam="alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam",
    output:
        bam="alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam",
        histo="alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.histo.tsv",
    params:
        extra=config.get("fgbio_group_reads_by_umi", {}).get("extra", ""),
        umi_strategy=config.get("fgbio_group_reads_by_umi", {}).get("umi_strategy", "paired"),
    log:
        "alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam.benchmark.tsv",
            config.get("fgbio_group_reads_by_umi", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fgbio_group_reads_by_umi", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fgbio_group_reads_by_umi", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fgbio_group_reads_by_umi", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("fgbio_group_reads_by_umi", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("fgbio_group_reads_by_umi", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fgbio_group_reads_by_umi", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fgbio_group_reads_by_umi", {}).get("container", config["default_container"])
    message:
        "{rule}: group reads by umi in {input.bam} and output umi sorted bam"
    shell:
        "(fgbio GroupReadsByUmi "
        "-i {input.bam} "
        "-o {output.bam} "
        "-f {output.histo} "
        "-s {params.umi_strategy} "
        "{params.extra}) &> {log}"

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam" input bam file
output bam "alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam" output bam that is umi sorted

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded
umi_strategy string umi strategy for how the umis should be grouped (paired for duplex umis)
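A typo in `umi_strategy` would only surface once fgbio starts, so a small check at configuration time can help. The sketch below is a hypothetical helper; the list of strategy names is an assumption taken from fgbio's GroupReadsByUmi documentation and should be verified against the fgbio version in the container.

```python
# Assumed set of GroupReadsByUmi strategies (verify against your fgbio version).
KNOWN_STRATEGIES = ("identity", "edit", "adjacency", "paired")


def get_umi_strategy(config):
    # Same lookup as the rule: default to "paired" when nothing is configured.
    strategy = config.get("fgbio_group_reads_by_umi", {}).get("umi_strategy", "paired")
    if strategy.lower() not in KNOWN_STRATEGIES:
        raise ValueError("unknown umi_strategy: %r" % (strategy,))
    return strategy


print(get_umi_strategy({}))
print(get_umi_strategy({"fgbio_group_reads_by_umi": {"umi_strategy": "adjacency"}}))
```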

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

fgbio_call_overlapping_consensus_bases

Call consensus bases from overlapping reads.

🐍 Rule

rule fgbio_call_overlapping_consensus_bases:
    input:
        bam="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam",
        ref=config.get("reference", {}).get("fasta", ""),
    output:
        bam=temp("alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.bam"),
        metrics=temp("alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.metrics.txt"),
    params:
        agreement_strategy=config.get("fgbio_call_overlapping_consensus_bases", {}).get("agreement_strategy", "Consensus"),
        disagreement_strategy=config.get("fgbio_call_overlapping_consensus_bases", {}).get("disagreement_strategy", "Consensus"),
        extra=config.get("fgbio_call_overlapping_consensus_bases", {}).get("extra", ""),
        jvm_args=config.get("fgbio_call_overlapping_consensus_bases", {}).get("jvm_args", "-Xmx6g"),
    log:
        "alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.bam.benchmark.tsv",
            config.get("fgbio_call_overlapping_consensus_bases", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fgbio_call_overlapping_consensus_bases", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fgbio_call_overlapping_consensus_bases", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fgbio_call_overlapping_consensus_bases", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("fgbio_call_overlapping_consensus_bases", {}).get(
            "partition", config["default_resources"]["partition"]
        ),
        threads=config.get("fgbio_call_overlapping_consensus_bases", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fgbio_call_overlapping_consensus_bases", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fgbio_call_overlapping_consensus_bases", {}).get("container", config["default_container"])
    message:
        "{rule}: call overlapping consensus bases on {input.bam}"
    shell:
        'sh -c "'
        "fgbio {params.jvm_args} CallOverlappingConsensusBases "
        "--input {input.bam} "
        "--output {output.bam} "
        "--metrics {output.metrics} "
        "--ref {input.ref} "
        "--agreement-strategy {params.agreement_strategy} "
        "--disagreement-strategy {params.disagreement_strategy} "
        '{params.extra}" >& {log}'

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam" input query sorted bam file
ref config.get("reference", {}).get("fasta", "") genome reference file
output bam "alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.bam" output bam file with corrected overlapping consensus bases
metrics "alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.metrics.txt" output metrics file with statistics of the overlapping consensus bases correction
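To see which command the rule ends up running with its default parameters, the shell template can be rendered outside Snakemake. This is an illustrative sketch using `str.format` in place of Snakemake's substitution; the file paths and the reference path are hypothetical.

```python
# Defaults as they appear in the rule's params section.
DEFAULTS = {
    "agreement_strategy": "Consensus",
    "disagreement_strategy": "Consensus",
    "extra": "",
    "jvm_args": "-Xmx6g",
}


def render_command(config, input_bam, output_bam, metrics, ref):
    rule_cfg = config.get("fgbio_call_overlapping_consensus_bases", {})
    params = {key: rule_cfg.get(key, default) for key, default in DEFAULTS.items()}
    return (
        "fgbio {jvm_args} CallOverlappingConsensusBases "
        "--input {input_bam} "
        "--output {output_bam} "
        "--metrics {metrics} "
        "--ref {ref} "
        "--agreement-strategy {agreement_strategy} "
        "--disagreement-strategy {disagreement_strategy} "
        "{extra}"
    ).format(input_bam=input_bam, output_bam=output_bam, metrics=metrics, ref=ref, **params)


print(
    render_command(
        {},
        "alignment/bwa_mem_realign_consensus_reads/NA12878_T.umi.bam",
        "alignment/fgbio_call_overlapping_consensus_bases/NA12878_T.umi.bam",
        "alignment/fgbio_call_overlapping_consensus_bases/NA12878_T.umi.metrics.txt",
        "reference/genome.fasta",  # hypothetical reference path
    )
)
```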

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
agreement_strategy string set agreement strategy for fgbio_call_overlapping_consensus_bases
disagreement_strategy string set disagreement strategy for fgbio_call_overlapping_consensus_bases
extra string parameters that should be forwarded
jvm_args string set jvm args for fgbio_call_overlapping_consensus_bases

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

fgbio_copy_umi_from_read_name

Copies the UMI at the end of the BAM’s read name to the RX tag using fgbio in preparation for fgbio_group_reads_by_umi
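Conceptually, the UMI is taken from the last colon-separated field of the read name. The snippet below is a simplified sketch of that idea only; it does not reproduce fgbio's handling of delimiters or validation, and the read name is a made-up example.

```python
def umi_from_read_name(read_name):
    # Take everything after the final ':' as the UMI sequence.
    return read_name.rsplit(":", 1)[-1]


# Hypothetical Illumina-style read name with a duplex UMI appended.
name = "A00123:45:HXXXXXXXX:1:1101:1000:2000:ACGT+TGCA"
print(umi_from_read_name(name))
```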

🐍 Rule

rule fgbio_copy_umi_from_read_name:
    input:
        bam="alignment/bwa_mem/{sample}_{type}.umi.bam",
    output:
        bam=temp("alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam"),
    params:
        extra=config.get("fgbio_copy_umi_from_read_name", {}).get("extra", ""),
    log:
        "alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam.benchmark.tsv",
            config.get("fgbio_copy_umi_from_read_name", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fgbio_copy_umi_from_read_name", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fgbio_copy_umi_from_read_name", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fgbio_copy_umi_from_read_name", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("fgbio_copy_umi_from_read_name", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("fgbio_copy_umi_from_read_name", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fgbio_copy_umi_from_read_name", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fgbio_copy_umi_from_read_name", {}).get("container", config["default_container"])
    message:
        "{rule}: Copy UMI from read name to sam tag on {input.bam}"
    shell:
        'sh -c "'
        "(samtools view "
        "-h "
        "-F 0x900 "
        "{input.bam} "
        "| samblaster "
        "--addMateTags "
        "--ignoreUnmated "
        "| fgbio CopyUmiFromReadName "
        "-i /dev/stdin "
        "-o {output.bam} "
        '{params.extra})" &> {log}'

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/bwa_mem/{sample}_{type}.umi.bam" input bam file
output bam "alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam" Output bam file with umi tag added (default tag name RX) extracted from the read name
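The `-F 0x900` option in the `samtools view` step excludes any record with either the secondary (0x100) or supplementary (0x800) flag bit set, so only primary alignments are passed on to samblaster and fgbio. A short sketch of that bit test, using example flag values:

```python
SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment
EXCLUDE_MASK = 0x900    # value passed to samtools view -F


def kept_by_filter(flag):
    # A record is kept only if none of the excluded bits are set.
    return (flag & EXCLUDE_MASK) == 0


for flag in (99, 99 | SECONDARY, 99 | SUPPLEMENTARY):
    print(flag, kept_by_filter(flag))
```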

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
RECOMMENDATION: Use multiple threads for decreased run time.
time string max execution time

minimap2_align

Align long-read sequencing data stored in a bam file to a reference genome to produce a bam file with aligned reads.

🐍 Rule

rule minimap2_align:
    input:
        query=lambda wildcards: get_minimap2_query(wildcards),
        target=expand(
            "alignment/minimap2_index/{ref}.{preset}.mmi",
            ref=os.path.basename(config.get("reference", {}).get("fasta", "")),
            preset=config.get("minimap2_align", {}).get("preset", ""),
        ),
    output:
        bam=temp("alignment/minimap2_align/{sample}_{type}_{processing_unit}_{barcode}.bam"),
    params:
        extra=lambda wildcards, input: "%s %s -x %s"
        % (
            config.get("minimap2_align", {}).get("extra", ""),
            config.get("minimap2_align", {}).get("read_group", generate_minimap2_read_group(wildcards, input)),
            config.get("minimap2_align", {}).get("preset", ""),
        ),
        sorting=config.get("minimap2_align", {}).get("sort_order", "coordinate"),
        sort_extra=config.get("minimap2_align", {}).get("sort_extra", ""),
    log:
        "alignment/minimap2_align/{sample}_{type}_{processing_unit}_{barcode}.bam.log",
    benchmark:
        repeat(
            "alignment/minimap2_align/{sample}_{type}_{processing_unit}_{barcode}.bam.benchmark.tsv",
            config.get("minimap2_align", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("minimap2_align", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("minimap2_align", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("minimap2_align", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("minimap2_align", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("minimap2_align", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("minimap2_align", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("minimap2_align", {}).get("container", config["default_container"])
    message:
        "{rule}: run minimap2 to align reads from {input.query} to {input.target}"
    wrapper:
        "v4.3.0/bio/minimap2/aligner"

↔ input / output files

Rule parameters Key Value Description
input target expand( "alignment/minimap2_index/{ref}.{preset}.mmi", ref=os.path.basename(config.get("reference", {}).get("fasta", "")), preset=config.get("minimap2_align", {}).get("preset", ""), ) a minimap2 index file for the reference genome
query lambda wildcards: get_minimap2_query(wildcards) bam file with unaligned reads
output bam "alignment/minimap2_align/{sample}_{type}_{processing_unit}_{barcode}.bam" bam file with aligned reads (Note that the methylation tags will also be present in the aligned bam file)
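The index path in `target` is derived from the base name of the reference fasta and the configured preset. A minimal sketch of that derivation, assuming a hypothetical reference path; note that Snakemake's `expand` returns a list, while this helper returns the single path as a string.

```python
import os


def minimap2_index_path(config):
    # Same lookups as the rule: reference fasta base name plus the preset.
    ref = os.path.basename(config.get("reference", {}).get("fasta", ""))
    preset = config.get("minimap2_align", {}).get("preset", "")
    return "alignment/minimap2_index/%s.%s.mmi" % (ref, preset)


config = {
    "reference": {"fasta": "/data/ref/hg38.fasta"},  # hypothetical path
    "minimap2_align": {"preset": "map-hifi"},
}
print(minimap2_index_path(config))
```

If either value is missing from the config, the path degenerates to `alignment/minimap2_index/..mmi`, so both `reference.fasta` and `minimap2_align.preset` need to be set.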

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded
mmi string fasta reference mmi file (generated with the same preset as specified in the config)
preset string minimap2 preset options for various types of long-read sequencing data (e.g., map-hifi or map-ont)
read_group string RG string that will be added to the generated bam file.
The RG string is generated by the function generate_minimap2_read_group defined in common.smk
sort_order string how the bam file should be sorted (default coordinate)
sort_extra string parameters that should be forwarded to sorting (NB. do not set -@ or --threads here, this is set from {threads})

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

minimap2_index

Prepare reference index file for minimap2.

🐍 Rule

rule minimap2_index:
    input:
        target=config.get("reference", {}).get("fasta", ""),
    output:
        mmi=expand(
            "alignment/minimap2_index/{ref}.{preset}.mmi",
            ref=os.path.basename(config.get("reference", {}).get("fasta", "")),
            preset=config.get("minimap2_align", {}).get("preset", ""),
        ),
    params:
        extra=set_minimap2_preset,
    log:
        "alignment/minimap2_index/minimap2_index.log",
    benchmark:
        repeat(
            "alignment/minimap2_index/minimap2_index.benchmark.tsv", config.get("minimap2_index", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("minimap2_index", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("minimap2_index", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("minimap2_index", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("minimap2_index", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("minimap2_index", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("minimap2_index", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("minimap2_index", {}).get("container", config["default_container"])
    message:
        "{rule}: index {input.target} with minimap2"
    wrapper:
        "v4.3.0/bio/minimap2/index"

↔ input / output files

Rule parameters Key Value Description
input target config.get("reference", {}).get("fasta", "") reference/target genome
output mmi expand( "alignment/minimap2_index/{ref}.{preset}.mmi", ref=os.path.basename(config.get("reference", {}).get("fasta", "")), preset=config.get("minimap2_align", {}).get("preset", ""), ) minimap2 index file of the target/reference genome
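
The index file name is built from the basename of the reference fasta and the preset taken from the minimap2_align settings. A minimal sketch of that naming in plain Python, without Snakemake's expand (which returns a list rather than a single string); the config values and the helper name minimap2_index_path are made up for illustration:

```python
import os

# Hypothetical config values, for illustration only.
config = {
    "reference": {"fasta": "/data/ref/genome.fasta"},
    "minimap2_align": {"preset": "map-ont"},
}


def minimap2_index_path(config):
    """Build the .mmi path from the same two config lookups the rule uses."""
    ref = os.path.basename(config.get("reference", {}).get("fasta", ""))
    preset = config.get("minimap2_align", {}).get("preset", "")
    return f"alignment/minimap2_index/{ref}.{preset}.mmi"


print(minimap2_index_path(config))
```

Because both lookups default to an empty string, a missing reference or preset does not raise an error but produces a path with empty name parts, so it is worth checking that both keys are set.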

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time
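
Every rule in this module reads its resources with the same two-step lookup: the rule's own section first, then default_resources. A minimal sketch of that fallback, assuming a hypothetical helper get_resource and made-up values (neither is part of the module):

```python
def get_resource(config, rule_name, key):
    """Return the rule-specific value, or the default_resources value if unset."""
    return config.get(rule_name, {}).get(key, config["default_resources"][key])


# Hypothetical settings, for illustration only.
config = {
    "default_resources": {
        "threads": 1,
        "mem_mb": 6144,
        "mem_per_cpu": 6144,
        "partition": "core",
        "time": "4:00:00",
    },
    "minimap2_index": {"threads": 4, "mem_mb": 24576},
}

print(get_resource(config, "minimap2_index", "threads"))  # rule-specific value
print(get_resource(config, "minimap2_index", "time"))  # falls back to the default
```

Note that default_resources must define every key, since the fallback uses plain indexing and raises a KeyError otherwise.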

minimap2_merge

Merge minimap2 .bam files from the same sample using samtools merge.

🐍 Rule

rule minimap2_merge:
    input:
        bams=lambda wildcards: [
            "alignment/minimap2_align/{sample}_{type}_%s_%s.bam" % (u.processing_unit, u.barcode)
            for u in get_units(units, wildcards)
        ],
    output:
        bam=temp("alignment/minimap2_align/{sample}_{type}.bam"),
    params:
        extra=config.get("minimap2_merge", {}).get("extra", ""),
    log:
        "alignment/minimap2_align/{sample}_{type}.bam.log",
    benchmark:
        repeat(
            "alignment/minimap2_align/{sample}_{type}.bam.benchmark.tsv",
            config.get("minimap2_merge", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("minimap2_merge", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("minimap2_merge", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("minimap2_merge", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("minimap2_merge", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("minimap2_merge", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("minimap2_merge", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("minimap2_merge", {}).get("container", config["default_container"])
    message:
        "{rule}: merge {input.bams} using samtools merge"
    wrapper:
        "v3.9.0/bio/samtools/merge"

↔ input / output files

Rule parameters Key Value Description
input bams lambda wildcards: [ "alignment/minimap2_align/{sample}_{type}_%s_%s.bam" % (u.processing_unit, u.barcode) for u in get_units(units, wildcards) ] bam files from the same sample (and the same sample type)
a list of bam files is obtained using the information in the units.tsv file
output bam "alignment/minimap2_align/{sample}_{type}.bam" a sorted merged bam file
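
The input list holds one bam file per unit of the sample. A minimal sketch of how such a list could be assembled, assuming a hypothetical Unit record with only the two columns used in the file name; in the rule itself the units come from get_units and {sample} and {type} are filled in by Snakemake:

```python
from collections import namedtuple

# Hypothetical stand-in for rows of units.tsv; only the columns used here.
Unit = namedtuple("Unit", ["processing_unit", "barcode"])


def merge_inputs(sample, sample_type, sample_units):
    """List the per-unit bam files that feed one merged bam."""
    return [
        f"alignment/minimap2_align/{sample}_{sample_type}_{u.processing_unit}_{u.barcode}.bam"
        for u in sample_units
    ]


# Made-up sample with two processing units sharing one barcode.
units_for_sample = [Unit("run1", "bc01"), Unit("run2", "bc01")]
for path in merge_inputs("sampleA", "N", units_for_sample):
    print(path)
```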

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded
RECOMMENDATION: use -c -p to keep only one of the read group IDs when merging files from the same sample and flowcell

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

pbmm2_align

Align long-read sequencing data stored in a bam file to a reference genome to produce a bam file with aligned reads.

🐍 Rule

rule pbmm2_align:
    input:
        query=lambda wildcards: get_minimap2_query(wildcards),
        reference=expand(
            "alignment/pbmm2_index/{ref}.{preset}.mmi",
            ref=os.path.basename(config.get("reference", {}).get("fasta", "")),
            preset=config.get("pbmm2_align", {}).get("preset", ""),
        ),
    output:
        bam=temp("alignment/pbmm2_align/{sample}_{type}_{processing_unit}_{barcode}.bam"),
    params:
        preset=config.get("pbmm2_align", {}).get("preset", ""),
        sample=lambda wildcards: f"{wildcards.sample}_{wildcards.type}",
        loglevel="INFO",
        extra=" --sort %s " % (config.get("pbmm2_align", {}).get("extra", "")),
    log:
        bam="alignment/pbmm2_align/{sample}_{type}_{processing_unit}_{barcode}.bam.log",
    benchmark:
        repeat(
            "alignment/pbmm2_align/{sample}_{type}_{processing_unit}_{barcode}.bam.benchmark.tsv",
            config.get("pbmm2_align", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("pbmm2_align", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("pbmm2_align", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("pbmm2_align", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("pbmm2_align", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("pbmm2_align", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("pbmm2_align", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("pbmm2_align", {}).get("container", config["default_container"])
    message:
        "{rule}: Align reads in {input.query} against {input.reference}"
    wrapper:
        "v4.3.0/bio/pbmm2/align"

↔ input / output files

Rule parameters Key Value Description
input query lambda wildcards: get_minimap2_query(wildcards) bam file of unaligned pacbio reads
reference expand( "alignment/pbmm2_index/{ref}.{preset}.mmi", ref=os.path.basename(config.get("reference", {}).get("fasta", "")), preset=config.get("pbmm2_align", {}).get("preset", ""), ) pbmm2 reference index file
output bam "alignment/pbmm2_align/{sample}_{type}_{processing_unit}_{barcode}.bam" bam file with aligned reads (Note that the methylation tags will also be present in the aligned bam file)
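
The rule always passes --sort to pbmm2 and appends whatever is set as extra in the config. A minimal sketch of how that string is composed; the helper name pbmm2_extra and the example option are illustrative assumptions:

```python
def pbmm2_extra(config):
    """Compose the extra string as the rule does: --sort plus the user's options."""
    return " --sort %s " % (config.get("pbmm2_align", {}).get("extra", ""))


# Hypothetical setting: also keep unmapped records in the output.
config = {"pbmm2_align": {"extra": "--unmapped"}}

print(repr(pbmm2_extra(config)))
print(repr(pbmm2_extra({})))  # --sort is still present without any extra
```

Since --sort is hard-coded in the rule, it should not be repeated in the extra setting.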

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded
preset string pbmm2 preset option for the type of PacBio sequencing data (e.g., HIFI)

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

pbmm2_index

Prepare reference index file for pbmm2.

🐍 Rule

rule pbmm2_index:
    input:
        reference=config.get("reference", {}).get("fasta", ""),
    output:
        mmi=expand(
            "alignment/pbmm2_index/{ref}.{preset}.mmi",
            ref=os.path.basename(config.get("reference", {}).get("fasta", "")),
            preset=config.get("pbmm2_align", {}).get("preset", ""),
        ),
    params:
        preset=config.get("pbmm2_align", {}).get("preset", ""),
        extra=config.get("pbmm2_index", {}).get("extra", ""),
    log:
        "alignment/pbmm2_index/pbmm2_index.log",
    benchmark:
        repeat("alignment/pbmm2_index/pbmm2_index.benchmark.tsv", config.get("pbmm2_index", {}).get("benchmark_repeats", 1))
    threads: config.get("pbmm2_index", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("pbmm2_index", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("pbmm2_index", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("pbmm2_index", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("pbmm2_index", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("pbmm2_index", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("pbmm2_index", {}).get("container", config["default_container"])
    message:
        "{rule}: index {input.reference} with pbmm2"
    wrapper:
        "v3.9.0/bio/pbmm2/index"

↔ input / output files

Rule parameters Key Value Description
input reference config.get("reference", {}).get("fasta", "") target/reference genome fasta file
output mmi expand( "alignment/pbmm2_index/{ref}.{preset}.mmi", ref=os.path.basename(config.get("reference", {}).get("fasta", "")), preset=config.get("pbmm2_align", {}).get("preset", ""), ) pbmm2 index file of the target/reference genome

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded
preset string preset for indexing the target genome

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

pbmm2_merge

Merge pbmm2 .bam files from the same sample using samtools merge.

🐍 Rule

rule pbmm2_merge:
    input:
        bams=lambda wildcards: [
            "alignment/pbmm2_align/{sample}_{type}_%s_%s.bam" % (u.processing_unit, u.barcode)
            for u in get_units(units, wildcards)
        ],
    output:
        bam=temp("alignment/pbmm2_align/{sample}_{type}.bam"),
    params:
        extra=config.get("pbmm2_merge", {}).get("extra", ""),
    log:
        "alignment/pbmm2_align/{sample}_{type}.bam.log",
    benchmark:
        repeat(
            "alignment/pbmm2_align/{sample}_{type}.bam.benchmark.tsv",
            config.get("pbmm2_merge", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("pbmm2_merge", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("pbmm2_merge", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("pbmm2_merge", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("pbmm2_merge", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("pbmm2_merge", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("pbmm2_merge", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("pbmm2_merge", {}).get("container", config["default_container"])
    message:
        "{rule}: merge bam file {input} using samtools"
    wrapper:
        "v3.9.0/bio/samtools/merge"

↔ input / output files

Rule parameters Key Value Description
input bams lambda wildcards: [ "alignment/pbmm2_align/{sample}_{type}_%s_%s.bam" % (u.processing_unit, u.barcode) for u in get_units(units, wildcards) ] bam files from the same sample (and the same sample type)
a list of bam files is obtained using the information in the units.tsv file
output bam "alignment/pbmm2_align/{sample}_{type}.bam" a sorted merged bam file

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

picard_mark_duplicates

Generate a bam file for a single chromosome with duplicates marked

🐍 Rule

rule picard_mark_duplicates:
    input:
        bams="alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam",
    output:
        bam=temp("alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam"),
        metrics=temp("alignment/picard_mark_duplicates/{sample}_{type}_{chr}.metrics.txt"),
    params:
        extra=config.get("picard_mark_duplicates", {}).get("extra", ""),
    log:
        "alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam.log",
    benchmark:
        repeat(
            "alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam.benchmark.tsv",
            config.get("picard_mark_duplicates", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("picard_mark_duplicates", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("picard_mark_duplicates", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("picard_mark_duplicates", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("picard_mark_duplicates", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("picard_mark_duplicates", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("picard_mark_duplicates", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("picard_mark_duplicates", {}).get("container", config["default_container"])
    message:
        "{rule}: mark duplicates in {input} using picard"
    wrapper:
        "v1.25.0/bio/picard/markduplicates"

↔ input / output files

Rule parameters Key Value Description
input bams "alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam" bam file containing one chromosome
output bam "alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam" duplicate marked bam file containing one chromosome
metrics "alignment/picard_mark_duplicates/{sample}_{type}_{chr}.metrics.txt" duplicate statistics for qc

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available

picard_mark_duplicates_non_chr

Generate a bam file for non-chromosomal contigs and unmapped reads with duplicates marked

🐍 Rule

rule picard_mark_duplicates_non_chr:
    input:
        bams="alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam",
    output:
        bam=temp("alignment/picard_mark_duplicates/{sample}_{type}_non_chr.bam"),
        metrics=temp("alignment/picard_mark_duplicates/{sample}_{type}_non_chr.metrics.txt"),
    params:
        extra=config.get("picard_mark_duplicates_non_chr", {}).get("extra", ""),
    log:
        "alignment/picard_mark_duplicates_non_chr/{sample}_{type}.output.log",
    benchmark:
        repeat(
            "alignment/picard_mark_duplicates_non_chr/{sample}_{type}.output.benchmark.tsv",
            config.get("picard_mark_duplicates_non_chr", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("picard_mark_duplicates_non_chr", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("picard_mark_duplicates_non_chr", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("picard_mark_duplicates_non_chr", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("picard_mark_duplicates_non_chr", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("picard_mark_duplicates_non_chr", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("picard_mark_duplicates_non_chr", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("picard_mark_duplicates", {}).get("container", config["default_container"])
    message:
        "{rule}: mark duplicates in {input.bams} using picard"
    wrapper:
        "v1.25.0/bio/picard/markduplicates"

↔ input / output files

Rule parameters Key Value Description
input bams "alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam" bam file containing non-chromosomal contigs requested in the config and unmapped reads
output bam "alignment/picard_mark_duplicates/{sample}_{type}_non_chr.bam" duplicate marked bam file containing non-chromosomal contigs requested in the config and unmapped reads
metrics "alignment/picard_mark_duplicates/{sample}_{type}_non_chr.metrics.txt" duplicate statistics for qc

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available

samtools_extract_reads

Extract reads from each chromosome and put into separate .bam files using samtools view.

🐍 Rule

rule samtools_extract_reads:
    input:
        bam="alignment/bwa_mem/{sample}_{type}.bam",
        bai="alignment/bwa_mem/{sample}_{type}.bam.bai",
    output:
        bam=temp("alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam"),
    params:
        extra=config.get("samtools_extract_reads", {}).get("extra", ""),
    log:
        "alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam.log",
    benchmark:
        repeat(
            "alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam.benchmark.tsv",
            config.get("samtools_extract_reads", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_extract_reads", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_extract_reads", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_extract_reads", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_extract_reads", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_extract_reads", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_extract_reads", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_extract_reads", {}).get("container", config["default_container"])
    message:
        "{rule}: create bam {output} with only reads from {wildcards.chr}"
    shell:
        "(samtools view -@ {threads} {params.extra} -b {input} {wildcards.chr} > {output}) &> {log}"

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/bwa_mem/{sample}_{type}.bam" bam file
bai "alignment/bwa_mem/{sample}_{type}.bam.bai" bam index file
output bam "alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam" one bam file for each chromosome

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available

samtools_extract_reads_non_chr

Extract reads from non-chromosomal contigs and unmapped reads to separate .bam files using samtools view.

🐍 Rule

rule samtools_extract_reads_non_chr:
    input:
        bam="alignment/bwa_mem/{sample}_{type}.bam",
        bai="alignment/bwa_mem/{sample}_{type}.bam.bai",
    output:
        bam=temp("alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam"),
    params:
        contigs=get_contig_list,
        extra=config.get("samtools_extract_reads_non_chr", {}).get("extra", ""),
    log:
        "alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam.log",
    benchmark:
        repeat(
            "alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam.benchmark.tsv",
            config.get("samtools_extract_reads_non_chr", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_extract_reads_non_chr", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_extract_reads_non_chr", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_extract_reads_non_chr", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("samtools_extract_reads_non_chr", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_extract_reads_non_chr", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_extract_reads_non_chr", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_extract_reads_non_chr", {}).get("container", config["default_container"])
    message:
        "{rule}: create bam {output} with only reads from {params.contigs}"
    shell:
        "(samtools view -@ {threads} {params.extra} -b {input} {params.contigs} '*' > {output}) &> {log}"

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/bwa_mem/{sample}_{type}.bam" bam file
bai "alignment/bwa_mem/{sample}_{type}.bam.bai" bam index file
output bam "alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam" one bam file containing unmapped reads and the non-chromosomal contigs requested using merge_contigs in the config
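
In the shell command, the contig names are followed by '*', the samtools view region that selects reads without an assigned reference. A minimal sketch of how the region part of the command could be assembled, assuming the contigs are already available as a list; the helper name region_arguments and the contig names are illustrative, and how get_contig_list selects the contigs is not shown here:

```python
def region_arguments(contigs):
    """Join the contig names and append '*' to include unmapped reads."""
    return " ".join(list(contigs) + ["'*'"])


# Made-up contig names, for illustration only.
contigs = ["chrUn_KI270742v1", "chrEBV"]
print(region_arguments(contigs))
```

With an empty contig list only '*' remains, so the resulting bam file would hold the unmapped reads alone.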

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

samtools_extract_reads_umi

Extract reads from each chromosome of the realigned UMI consensus .bam file and put them into separate .bam files using samtools view.

🐍 Rule

rule samtools_extract_reads_umi:
    input:
        bam="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam",
        bai="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.bai",
    output:
        bam=temp("alignment/samtools_extract_reads_umi/{sample}_{type}_{chr}.umi.bam"),
    params:
        extra=config.get("samtools_extract_reads", {}).get("extra", ""),
    log:
        "alignment/samtools_extract_reads_umi/{sample}_{type}_{chr}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/samtools_extract_reads_umi/{sample}_{type}_{chr}.bam.benchmark.tsv",
            config.get("samtools_extract_reads_umi", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_extract_reads_umi", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_extract_reads_umi", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_extract_reads_umi", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_extract_reads_umi", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_extract_reads_umi", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_extract_reads_umi", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_extract_reads_umi", {}).get("container", config["default_container"])
    message:
        "{rule}: create bam {output} with only reads from {wildcards.chr}"
    shell:
        "(samtools view -@ {threads} {params.extra} -b {input} {wildcards.chr} > {output}) &> {log}"

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam" bam file
bai "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.bai" bam index file
output bam "alignment/samtools_extract_reads_umi/{sample}_{type}_{chr}.umi.bam" one bam file for each chromosome

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available

samtools_extract_reads_non_chr_umi

Extract reads from the non-chromosomal contigs specified by merge_contigs, together with unmapped reads, from the realigned UMI consensus .bam file into a single .bam file using samtools view.

🐍 Rule

rule samtools_extract_reads_non_chr_umi:
    input:
        bam="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam",
        bai="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.bai",
    output:
        bam=temp("alignment/samtools_extract_reads/{sample}_{type}_non_chr.umi.bam"),
    params:
        contigs=get_contig_list,
        extra=config.get("samtools_extract_reads_non_chr_umi", {}).get("extra", ""),
    log:
        "alignment/samtools_extract_reads_non_chr_umi/{sample}_{type}_non_chr.umi.bam.log",
    benchmark:
        repeat(
            "alignment/samtools_extract_reads_non_chr_umi/{sample}_{type}_non_chr.umi.bam.benchmark.tsv",
            config.get("samtools_extract_reads_non_chr_umi", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_extract_reads_non_chr_umi", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_extract_reads_non_chr_umi", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_extract_reads_non_chr_umi", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("samtools_extract_reads_non_chr_umi", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_extract_reads_non_chr_umi", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_extract_reads_non_chr_umi", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_extract_reads_non_chr_umi", {}).get("container", config["default_container"])
    message:
        "{rule}: create bam {output} with only reads from {params.contigs}"
    shell:
        "(samtools view -@ {threads} {params.extra} -b {input} {params.contigs} '*' > {output}) &> {log}"

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam" bam file
bai "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.bai" bam index file
output bam "alignment/samtools_extract_reads/{sample}_{type}_non_chr.umi.bam" one bam file containing unmapped reads and the non-chromosomal contigs requested using merge_contigs in the config

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
partition string partition to use on cluster
threads integer number of threads to be available
time string max execution time

samtools_fastq

Converts a bam file to separate fastq files

🐍 Rule

rule samtools_fastq:
    input:
        bam="alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam",
    output:
        fastq1="alignment/samtools_fastq/{sample}_{type}.fastq1.umi.fastq.gz",
        fastq2="alignment/samtools_fastq/{sample}_{type}.fastq2.umi.fastq.gz",
    params:
        sort=config.get("samtools_fastq", {}).get("sort", "-m 4G"),
        fastq=config.get("samtools_fastq", {}).get("fastq", "-n"),
    log:
        "alignment/samtools_fastq/{sample}_{type}.output.log",
    benchmark:
        repeat(
            "alignment/samtools_fastq/{sample}_{type}.output.benchmark.tsv",
            config.get("samtools_fastq", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_fastq", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_fastq", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_fastq", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_fastq", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_fastq", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_fastq", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_fastq", {}).get("container", config["default_container"])
    message:
        "{rule}: Convert the bam file {input.bam} into a fastq file"
    wrapper:
        "v2.6.0/bio/samtools/fastq/separate"

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam" input bam file
output fastq1 "alignment/samtools_fastq/{sample}_{type}.fastq1.umi.fastq.gz" fastq file with the first read in the read pair
fastq2 "alignment/samtools_fastq/{sample}_{type}.fastq2.umi.fastq.gz" fastq file with the second read in the read pair

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
sort string parameters that should be forwarded to samtools sort
fastq string parameters that should be forwarded to samtools fastq

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory in MB used per cpu
NOTE: should be at least the amount put into the sort parameters
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available
RECOMMENDATION: use at least 3 threads so that sort gets at least 1 exclusive thread
NOTE: if multiple threads are used, the memory (mem_mb) must also be increased
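
The note on mem_per_cpu can be checked before a run by comparing it with the -m value in the sort setting. A minimal sketch of such a check, assuming -m is given with a K, M or G suffix (a plain byte count is not handled); both helper names are hypothetical and not part of the module:

```python
import re

# Conversion factors from the samtools sort -m suffix to MB.
_UNITS_IN_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def sort_memory_mb(sort_params):
    """Return the per-thread memory requested with -m in MB, or None if -m is absent."""
    match = re.search(r"-m\s*(\d+)([KMG])", sort_params)
    if match is None:
        return None
    return int(match.group(1)) * _UNITS_IN_MB[match.group(2)]


def mem_per_cpu_is_sufficient(sort_params, mem_per_cpu_mb):
    """True if mem_per_cpu covers the memory that sort may use per thread."""
    requested = sort_memory_mb(sort_params)
    return requested is None or mem_per_cpu_mb >= requested


# The rule's default sort setting is "-m 4G".
print(mem_per_cpu_is_sufficient("-m 4G", 6144))
print(mem_per_cpu_is_sufficient("-m 4G", 2048))
```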

samtools_index

Index .bam files using samtools index.

🐍 Rule

rule samtools_index:
    input:
        bam="{file}.bam",
    output:
        bai=temp("{file}.bam.bai"),
    params:
        extra=config.get("samtools_index", {}).get("extra", ""),
    log:
        "{file}.bam.bai.log",
    benchmark:
        repeat(
            "{file}.bam.bai.benchmark.tsv",
            config.get("samtools_index", {}).get("benchmark_repeats", 1),
        )
    container:
        config.get("samtools_index", {}).get("container", config["default_container"])
    threads: config.get("samtools_index", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_index", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_index", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_index", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_index", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_index", {}).get("time", config["default_resources"]["time"]),
    message:
        "{rule}: create index for {input.bam}"
    wrapper:
        "v1.1.0/bio/samtools/index"

↔ input / output files

Rule parameters Key Value Description
input bam "{file}.bam" bam file
output bai "{file}.bam.bai" bam index file

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available
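
Every rule in this module reads its settings with the same two-level lookup: the rule-specific entry is used when present, otherwise the value under default_resources applies. A minimal sketch of that lookup follows; the helper resolve_resource and all values in the example config are hypothetical and only mirror the pattern in the rule above.

```python
# Hypothetical configuration; the values are placeholders, not recommendations.
config = {
    "default_container": "docker://example/default:latest",
    "default_resources": {
        "threads": 1,
        "mem_mb": 6144,
        "mem_per_cpu": 6144,
        "partition": "core",
        "time": "4:00:00",
    },
    # Rule-specific overrides; absent keys fall back to default_resources.
    "samtools_index": {"threads": 2},
}


def resolve_resource(config, rule_name, key):
    """Return the rule-specific value for key, else the default_resources value."""
    return config.get(rule_name, {}).get(key, config["default_resources"][key])


print(resolve_resource(config, "samtools_index", "threads"))  # rule override
print(resolve_resource(config, "samtools_index", "mem_mb"))   # default value
```

Because the default is evaluated before the lookup, each key must exist under default_resources even when a rule overrides it.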

samtools_merge_bam

Merge .bam files from the same sample using samtools merge.

🐍 Rule

rule samtools_merge_bam:
    input:
        bams=get_chrom_bams,
        non_chr_bams="alignment/picard_mark_duplicates/{sample}_{type}_non_chr.bam"
        if config.get("reference", {}).get("merge_contigs", None) is not None
        else [],
    output:
        bam=temp("alignment/samtools_merge_bam/{sample}_{type}_unsorted.bam"),
    params:
        extra=config.get("samtools_merge_bam", {}).get("extra", ""),
    log:
        "alignment/samtools_merge_bam/{sample}_{type}_unsorted.bam.log",
    benchmark:
        repeat(
            "alignment/samtools_merge_bam/{sample}_{type}_unsorted.bam.benchmark.tsv",
            config.get("samtools_merge_bam", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_merge_bam", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_merge_bam", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_merge_bam", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_merge_bam", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_merge_bam", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_merge_bam", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_merge_bam", {}).get("container", config["default_container"])
    message:
        "{rule}: merge chr bam files, creating {output}"
    wrapper:
        "v1.1.0/bio/samtools/merge"

↔ input / output files

Rule parameters Key Value Description
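input bams get_chrom_bams list of bam files for all the chromosomes
the list is generated by the function extract_chr defined in the hydra-genetics module
non_chr_bams "alignment/picard_mark_duplicates/{sample}_{type}_non_chr.bam" bam file for the non_chr contigs
only used when merge_contigs is set under reference in config.yaml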
output bam "alignment/samtools_merge_bam/{sample}_{type}_unsorted.bam" merged unsorted bam file
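
The rule adds the non_chr bam file as input only when merge_contigs is set under reference in the configuration. The conditional can be sketched as a plain function; non_chr_bams_input is a hypothetical name and the sample and config values are made up for illustration.

```python
def non_chr_bams_input(config, sample, sample_type):
    """Mirror the conditional input of samtools_merge_bam.

    Returns the non_chr bam path when reference.merge_contigs is set,
    otherwise an empty list (no extra input).
    """
    if config.get("reference", {}).get("merge_contigs", None) is not None:
        return "alignment/picard_mark_duplicates/{}_{}_non_chr.bam".format(
            sample, sample_type
        )
    return []


# Hypothetical configurations.
with_merge = {"reference": {"merge_contigs": ["some_contig_pattern"]}}
without_merge = {"reference": {}}

print(non_chr_bams_input(with_merge, "sampleA", "T"))
print(non_chr_bams_input(without_merge, "sampleA", "T"))
```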

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded
RECOMMENDED: use -c -p to keep only one of the read group IDs when merging files from the same sample

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available

samtools_sort

Sort .bam files using samtools sort.

🐍 Rule

rule samtools_sort:
    input:
        bam="{file}_unsorted.bam",
    output:
        bam=temp("{file}.bam"),
    params:
        extra=config.get("samtools_sort", {}).get("extra", ""),
    log:
        "{file}.bam.sort.log",
    benchmark:
        repeat(
            "{file}.bam.sort.benchmark.tsv",
            config.get("samtools_sort", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_sort", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_sort", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_sort", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_sort", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_sort", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_sort", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_sort", {}).get("container", config["default_container"])
    message:
        "{rule}: sort bam file {input.bam} using samtools"
    wrapper:
        "v2.0.0/bio/samtools/sort"

↔ input / output files

Rule parameters Key Value Description
input bam "{file}_unsorted.bam" unsorted bam file
output bam "{file}.bam" sorted bam file
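
Since the rule uses the generic {file} wildcard, any requested file ending in .bam can be produced from a file with the same prefix and the suffix _unsorted.bam. The sketch below imitates that mapping with a regular expression. It is a simplification of Snakemake's wildcard matching, and sort_input_for is a hypothetical helper.

```python
import re

# Simplified stand-in for the output pattern "{file}.bam".
OUTPUT_PATTERN = re.compile(r"^(?P<file>.+)\.bam$")


def sort_input_for(target):
    """Return the unsorted input that samtools_sort would need for target."""
    match = OUTPUT_PATTERN.match(target)
    if match is None:
        return None  # target is not a .bam file, so the rule does not apply
    return "{file}_unsorted.bam".format(file=match.group("file"))


print(sort_input_for("alignment/samtools_merge_bam/sampleA_T.bam"))
```

For the target above, the required input is the _unsorted.bam file written by samtools_merge_bam, which is how the two rules chain together.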

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available

samtools_sort_umi

Sort .bam files by query name using samtools sort.

🐍 Rule

rule samtools_sort:
    input:
        bam="{file}_unsorted.bam",
    output:
        bam=temp("{file}.bam"),
    params:
        extra=config.get("samtools_sort", {}).get("extra", ""),
    log:
        "{file}.bam.sort.log",
    benchmark:
        repeat(
            "{file}.bam.sort.benchmark.tsv",
            config.get("samtools_sort", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_sort", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_sort", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_sort", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_sort", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_sort", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_sort", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_sort", {}).get("container", config["default_container"])
    message:
        "{rule}: sort bam file {input.bam} using samtools"
    wrapper:
        "v2.0.0/bio/samtools/sort"

↔ input / output files

Rule parameters Key Value Description
input bam "{file}_unsorted.bam" unsorted bam file
output bam "{file}.bam" sorted bam file

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available

samtools_filter_reads

Filter .bam files using samtools view.

🐍 Rule

rule samtools_filter_reads:
    input:
        bam="alignment/samtools_merge_bam/{sample}_{type}.bam",
    output:
        bam=temp("alignment/samtools_filter_reads/{sample}_{type}.bam"),
    params:
        extra=config.get("samtools_filter_reads", {}).get("extra", "-f 2"),
    log:
        "alignment/samtools_filter_reads/{sample}_{type}.bam.log",
    benchmark:
        repeat(
            "alignment/samtools_filter_reads/{sample}_{type}.bam.benchmark.tsv",
            config.get("samtools_filter_reads", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_filter_reads", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_filter_reads", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_filter_reads", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_filter_reads", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_filter_reads", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_filter_reads", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_filter_reads", {}).get("container", config["default_container"])
    message:
        "{rule}: filter reads in {input.bam} with {params.extra}"
    shell:
        "(samtools view -@ {threads} {params.extra} -b {input.bam} > {output.bam}) &> {log}"

↔ input / output files

Rule parameters Key Value Description
input bam "alignment/samtools_merge_bam/{sample}_{type}.bam" input bam file
output bam "alignment/samtools_filter_reads/{sample}_{type}.bam" filtered bam file
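
The default value of extra is -f 2. In samtools view, -f keeps only alignments that have all the given FLAG bits set, and bit 0x2 marks a read mapped in a proper pair. A minimal sketch of that test follows; the function name passes_required_flags is hypothetical and the example FLAG values are chosen for illustration.

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned


def passes_required_flags(flag, required=PROPER_PAIR):
    """Return True when every bit in required is set in flag (samtools view -f)."""
    return (flag & required) == required


# 99 = paired (1) + proper pair (2) + mate reverse (32) + first in pair (64)
print(passes_required_flags(99))
# 65 = paired (1) + first in pair (64), not a proper pair
print(passes_required_flags(65))
# 4 = read unmapped
print(passes_required_flags(4))
```

With the default setting, only the first of these three example reads would be written to the filtered bam file.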

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available

star

Align .fastq files to a reference genome and generate a .bam file. STAR is a split-read-aware aligner for RNA data.

🐍 Rule

rule star:
    input:
        fq1="prealignment/merged/{sample}_{type}_fastq1.fastq.gz",
        fq2="prealignment/merged/{sample}_{type}_fastq2.fastq.gz",
        idx=config.get("star", {}).get("genome_index", ""),
    output:
        bam=temp("alignment/star/{sample}_{type}.bam"),
        sj=temp("alignment/star/{sample}_{type}.SJ.out.tab"),
    params:
        extra=config.get("star", {}).get("extra", "--outSAMtype BAM SortedByCoordinate"),
        idx="{input.idx}",
    log:
        "alignment/star/{sample}_{type}.bam.log",
    benchmark:
        repeat("alignment/star/{sample}_{type}.bam.benchmark.tsv", config.get("star", {}).get("benchmark_repeats", 1))
    threads: config.get("star", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("star", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("star", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("star", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("star", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("star", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("star", {}).get("container", config["default_container"])
    message:
        "{rule}: align with star, creating {output.bam}"
    wrapper:
        "v1.3.2/bio/star/align"

↔ input / output files

Rule parameters Key Value Description
input fq1 "prealignment/merged/{sample}_{type}_fastq1.fastq.gz" merged fastq file from read 1
fq2 "prealignment/merged/{sample}_{type}_fastq2.fastq.gz" merged fastq file from read 2
idx config.get("star", {}).get("genome_index", "") star reference genome index
file location is set in config.yaml
output bam "alignment/star/{sample}_{type}.bam" aligned bam file
sj "alignment/star/{sample}_{type}.SJ.out.tab" junction file with split read information, useful for interpreting RNA data
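
The junction file is a tab-separated table with nine columns, as described in the STAR manual. A minimal sketch of how it could be read in Python is shown below. The Junction type and read_junctions function are hypothetical names, and the example line is made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    intron_start: int   # first base of the intron (1-based)
    intron_end: int     # last base of the intron (1-based)
    strand: str         # "." undefined, "+" or "-"
    motif: int          # 0 is non-canonical, 1-6 are the canonical motif codes
    annotated: bool
    unique_reads: int   # uniquely mapping reads crossing the junction
    multi_reads: int    # multi-mapping reads crossing the junction
    max_overhang: int   # maximum spliced alignment overhang


STRANDS = {"0": ".", "1": "+", "2": "-"}


def read_junctions(handle):
    """Yield one Junction per non-empty line of an SJ.out.tab file handle."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        yield Junction(
            row[0], int(row[1]), int(row[2]), STRANDS[row[3]], int(row[4]),
            row[5] == "1", int(row[6]), int(row[7]), int(row[8]),
        )


# Made-up example line, not real data.
example = "chr1\t14830\t14969\t2\t2\t1\t12\t3\t38\n"
for junction in read_junctions(io.StringIO(example)):
    print(junction.chrom, junction.strand, junction.unique_reads)
```

For a real file, pass an open file handle for alignment/star/{sample}_{type}.SJ.out.tab instead of the StringIO object.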

🔧 Configuration

Software settings (config.yaml)

Key Type Description
benchmark_repeats integer set number of times benchmark should be repeated
container string name or path to docker/singularity container
extra string parameters that should be forwarded
RECOMMENDATION: --outSAMtype BAM SortedByCoordinate
this outputs a coordinate-sorted bam file instead of a sam file
genome_index string path to star reference index

Resources settings (resources.yaml)

Key Type Description
mem_mb integer max memory in MB to be available
mem_per_cpu integer memory used per cpu
partition string partition to use on cluster
time string max execution time
threads integer number of threads to be available
RECOMMENDATION: use multiple threads to decrease run time.
NOTE: if multiple threads are used, the memory must also be increased (mem_mb)