Software used in the alignment module
Align .fastq files to a reference genome and generate a .bam file.
Rule
rule bwa_mem:
    input:
        reads=lambda wildcards: alignment_input(wildcards),
        idx=[
            config.get("bwa_mem", {}).get("amb", ""),
            config.get("bwa_mem", {}).get("ann", ""),
            config.get("bwa_mem", {}).get("bwt", ""),
            config.get("bwa_mem", {}).get("pac", ""),
            config.get("bwa_mem", {}).get("sa", ""),
        ],
    output:
        bam=temp("alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam"),
    params:
        extra=lambda wildcards: "%s %s %s"
        % (
            config.get("bwa_mem", {}).get("extra", ""),
            config.get("bwa_mem", {}).get("read_group", generate_read_group(wildcards)),
            get_deduplication_option(wildcards),
        ),
        sorting=config.get("bwa_mem", {}).get("sort", "samtools"),
        sort_order=config.get("bwa_mem", {}).get("sort_order", "coordinate"),
        sort_extra="-@ %s"
        % str(config.get("bwa_mem", config["default_resources"]).get("threads", config["default_resources"]["threads"])),
    log:
        "alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam.log",
    benchmark:
        repeat(
            "alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam.benchmark.tsv",
            config.get("bwa_mem", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("bwa_mem", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("bwa_mem", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("bwa_mem", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("bwa_mem", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("bwa_mem", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("bwa_mem", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("bwa_mem", {}).get("container", config["default_container"])
    message:
        "{rule}: align fastq files {input.reads} using bwa mem against {input.idx[2]}"
    wrapper:
        "v1.3.1/bio/bwa/mem"
| Rule parameters |
Key |
Value |
Description |
| input |
reads |
lambda wildcards: alignment_input(wildcards) |
fastq files from the same sample. fastq files obtained by get_fastq_file defined in common.smk
|
| idx |
[ config.get("bwa_mem", {}).get("amb", ""), config.get("bwa_mem", {}).get("ann", ""), config.get("bwa_mem", {}).get("bwt", ""), config.get("bwa_mem", {}).get("pac", ""), config.get("bwa_mem", {}).get("sa", ""), ] |
reference files for bwa mem; their locations are defined in config.yaml
|
| output |
bam |
"alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam" |
aligned bam file. NOTE: if the fastq files are divided into different lanes for the same sample, they will be aligned separately
|
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
Parameters that should be forwarded. NOTE: If the sample is marked with umi in the deduplication column in the samples.tsv file the -Y flag is added by the get_deduplication_option function (in common.smk)
|
| read_group |
string |
RG string that will be added to the generated bam file. By default the RG string is generated by the function generate_read_group defined in common.smk
|
| sorting |
string |
program handling the bam sorting (default samtools) |
| sort_order |
string |
how the bam file should be sorted (default coordinate) |
| sort_extra |
string |
parameters that should be forwarded to sorting (i.e., number of threads) |
| amb |
string |
fasta reference amb file |
| ann |
string |
fasta reference ann file |
| bwt |
string |
fasta reference bwt file |
| pac |
string |
fasta reference pac file |
| sa |
string |
fasta reference sa file |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available NOTE: bwa mem uses a large amount of memory.
|
| mem_per_cpu |
integer |
memory in MB used per cpu NOTE: bwa mem uses a large amount of memory.
|
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available RECOMMENDATION: use multiple threads for decreased run time. NOTE: if multiple threads are used the memory must also be increased (mem_mb)
|
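Every setting in the rule above is read with the same two-step lookup: first the rule's own section of the configuration, then a fallback value. The following is a minimal Python sketch of that pattern, assuming a plain dictionary stands in for the Snakemake `config` object after config.yaml and resources.yaml have been loaded. The helper name `rule_setting` and all example values are hypothetical.

```python
def rule_setting(config, rule, key):
    """Return config[rule][key], falling back to config["default_resources"][key]."""
    # Hypothetical helper: mirrors config.get(rule, {}).get(key, default) as used in the rule.
    return config.get(rule, {}).get(key, config["default_resources"][key])


# Example values are illustrative only, not recommended settings.
config = {
    "default_resources": {"threads": 1, "mem_mb": 6144, "time": "4:00:00"},
    "bwa_mem": {"threads": 8, "mem_mb": 49152},
}

threads = rule_setting(config, "bwa_mem", "threads")  # overridden in the bwa_mem section
time_limit = rule_setting(config, "bwa_mem", "time")  # not overridden, default is used
merge_threads = rule_setting(config, "bwa_mem_merge", "threads")  # section missing, default is used

# sort_extra is built in the same way as in the rule: "-@ <threads>"
sort_extra = "-@ %s" % str(threads)
print(threads, time_limit, merge_threads, sort_extra)
```

Because the fallback is read from `default_resources`, a rule section only needs to list the keys that differ from the defaults.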
Merge .bam files from the same sample using samtools merge.
Rule
rule bwa_mem_merge:
    input:
        bams=lambda wildcards: [
            "alignment/bwa_mem/{sample}_{type}_%s_%s_%s.bam" % (u.flowcell, u.lane, u.barcode)
            for u in get_units(units, wildcards)
        ],
    output:
        bam=temp("alignment/bwa_mem/{sample}_{type}_unsorted.bam"),
    params:
        config.get("bwa_mem_merge", {}).get("extra", ""),
    log:
        "alignment/bwa_mem/{sample}_{type}_unsorted.bam.log",
    benchmark:
        repeat(
            "alignment/bwa_mem/{sample}_{type}_unsorted.bam.benchmark.tsv",
            config.get("bwa_mem_merge", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("bwa_mem_merge", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("bwa_mem_merge", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("bwa_mem_merge", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("bwa_mem_merge", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("bwa_mem_merge", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("bwa_mem_merge", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("bwa_mem_merge", {}).get("container", config["default_container"])
    message:
        "{rule}: merge bam file {input} using samtools"
    wrapper:
        "v1.1.0/bio/samtools/merge"
| Rule parameters |
Key |
Value |
Description |
| input |
bams |
lambda wildcards: [ "alignment/bwa_mem/{sample}_{type}_%s_%s_%s.bam" % (u.flowcell, u.lane, u.barcode) for u in get_units(units, wildcards) ] |
bam files from the same sample (and the same sample type). The list of bam files is obtained using the information in the units.tsv file
|
| output |
bam |
"alignment/bwa_mem/{sample}_{type}_unsorted.bam" |
an unsorted merged bam file |
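The input list of the merge rule can be illustrated with a small standalone sketch. It assumes that each row of units.tsv exposes `flowcell`, `lane` and `barcode` attributes, as the rule's list comprehension suggests. The `Unit` tuple, the `merge_inputs` function and the sample values are hypothetical stand-ins for the units table and for `get_units` in common.smk. Unlike the rule, which leaves `{sample}` and `{type}` for Snakemake to fill in, this sketch fills them in directly.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of units.tsv; only the columns used by the rule.
Unit = namedtuple("Unit", ["sample", "type", "flowcell", "lane", "barcode"])


def merge_inputs(units, sample, sample_type):
    """List the per-lane bam files that belong to one sample and sample type."""
    # Hypothetical replacement for get_units(units, wildcards).
    selected = [u for u in units if u.sample == sample and u.type == sample_type]
    return [
        "alignment/bwa_mem/%s_%s_%s_%s_%s.bam" % (sample, sample_type, u.flowcell, u.lane, u.barcode)
        for u in selected
    ]


# Made-up example rows: one normal sample on two lanes, one tumour sample on one lane.
units = [
    Unit("NA12878", "N", "HXXXXXXX", "L001", "ACGTACGT"),
    Unit("NA12878", "N", "HXXXXXXX", "L002", "ACGTACGT"),
    Unit("NA12878", "T", "HXXXXXXX", "L001", "TTGCAAGC"),
]

bams = merge_inputs(units, "NA12878", "N")
for path in bams:
    print(path)
```

Each lane is aligned separately by bwa_mem, so a sample sequenced on two lanes yields two input files for the merge.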
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded RECOMMENDATION: use -c -p to only keep one of the read group IDs when merging files from the same sample
|
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Realign after consensus read creation by fgbio_call_and_filter_consensus_reads and generate a .bam file.
Rule
rule bwa_mem_realign_consensus_reads:
    input:
        bam="alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam",
    output:
        bam=temp("alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi_unsorted.bam"),
    params:
        extra_bwa_mem=config.get("bwa_mem_realign_consensus_reads", {}).get("extra_bwa_mem", ""),
        reference=config.get("reference", {}).get("fasta", ""),
        tmp_dir="alignment/tmp_realign_{sample}_{type}",
        fgbio_sorted_unmapped="alignment/tmp_realign_{sample}_{type}/fgbio_query_sorted.bam",
    log:
        "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.benchmark.tsv",
            config.get("bwa_mem_realign_consensus_reads", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("bwa_mem_realign_consensus_reads", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("bwa_mem_realign_consensus_reads", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("bwa_mem_realign_consensus_reads", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("bwa_mem_realign_consensus_reads", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("bwa_mem_realign_consensus_reads", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("bwa_mem_realign_consensus_reads", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("bwa_mem_realign_consensus_reads", {}).get("container", config["default_container"])
    message:
        "{rule}: realign unmapped consensus reads found in {input.bam}"
    shell:
        'sh -c "'
        "set -e; "
        "mkdir -p {params.tmp_dir}; "
        "trap 'rm -rf {params.tmp_dir}' EXIT; "
        "fgbio -Xmx16g SortBam -i {input.bam} -s Queryname -o {params.fgbio_sorted_unmapped}; "
        "samtools fastq -n {params.fgbio_sorted_unmapped} | "
        "bwa mem -t {threads} -p -K 150000000 -Y {params.reference} {params.extra_bwa_mem} - | "
        "fgbio -Xmx16g SortBam -i /dev/stdin -s Queryname -o /dev/stdout | "
        "fgbio -Xmx16g ZipperBams "
        "--unmapped {params.fgbio_sorted_unmapped} "
        "--ref {params.reference} "
        "--tags-to-reverse cd ce ad ae bd be aq bq "
        "--tags-to-revcomp ac bc "
        "-o {output.bam}"
        '" >& {log}'
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam" |
unmapped bam file with consensus reads based on umi barcodes |
| output |
bam |
"alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi_unsorted.bam" |
realigned and sorted bam file based on umi consensus reads |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra_bwa_mem |
string |
parameters that should be forwarded to bwa_mem |
| extra_sort |
string |
parameters that should be forwarded to samtools sort |
| extra_zipper_bam |
string |
parameters that should be forwarded to fgbio's ZipperBams |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available RECOMMENDATION: Use multiple threads for decreased run time. NOTE: If multiple threads are used the memory must also be increased (mem_mb)
|
| time |
string |
max execution time |
Call and filter consensus reads based on umis using fgbio (CallDuplexConsensusReads followed by FilterConsensusReads)
Rule
rule fgbio_call_and_filter_consensus_reads:
    input:
        bam="alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam",
    output:
        bam=temp("alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam"),
    params:
        extra_call=config.get("fgbio_call_and_filter_consensus_reads", {}).get("extra_call", ""),
        extra_filter=config.get("fgbio_call_and_filter_consensus_reads", {}).get("extra_filter", ""),
        max_base_error_rate=config.get("fgbio_call_and_filter_consensus_reads", {}).get("max_base_error_rate", "0.2"),
        min_reads_call=config.get("fgbio_call_and_filter_consensus_reads", {}).get("min_reads_call", "1 1 1"),
        min_reads_filter=config.get("fgbio_call_and_filter_consensus_reads", {}).get("min_reads_filter", "1 1 1"),
        min_input_base_quality_call=config.get("fgbio_call_and_filter_consensus_reads", {}).get(
            "min_input_base_quality_call", "20"
        ),
        min_input_base_quality_filter=config.get("fgbio_call_and_filter_consensus_reads", {}).get(
            "min_input_base_quality_filter", "45"
        ),
        reference=config.get("reference", {}).get("fasta", ""),
    log:
        "alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped.bam.log",
    benchmark:
        repeat(
            "alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped.bam.benchmark.tsv",
            config.get("fgbio_call_and_filter_consensus_reads", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fgbio_call_and_filter_consensus_reads", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fgbio_call_and_filter_consensus_reads", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fgbio_call_and_filter_consensus_reads", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("fgbio_call_and_filter_consensus_reads", {}).get(
            "partition", config["default_resources"]["partition"]
        ),
        threads=config.get("fgbio_call_and_filter_consensus_reads", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fgbio_call_and_filter_consensus_reads", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fgbio_call_and_filter_consensus_reads", {}).get("container", config["default_container"])
    message:
        "{rule}: call and filter consensus reads in {input.bam} into an unmapped bam file"
    shell:
        'sh -c "'
        "fgbio -Xmx4g --compression 0 CallDuplexConsensusReads "
        "--input {input.bam} "
        "--output /dev/stdout "
        "--min-reads {params.min_reads_call} "
        "--min-input-base-quality {params.min_input_base_quality_call} "
        "--threads {threads} "
        "{params.extra_call} "
        "| fgbio -Xmx8g --compression 1 FilterConsensusReads "
        "--input /dev/stdin "
        "--output {output.bam} "
        "--ref {params.reference} "
        "--min-reads {params.min_reads_filter} "
        "--min-base-quality {params.min_input_base_quality_filter} "
        "--max-base-error-rate {params.max_base_error_rate} "
        '{params.extra_filter}" >& {log}'
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam" |
input 'bam' file with umi tags |
| output |
bam |
"alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam" |
unmapped bam file with consensus reads that are hard filtered |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra_call |
string |
parameters that should be forwarded to CallDuplexConsensusReads |
| extra_filter |
string |
parameters that should be forwarded to FilterConsensusReads |
| max_base_error_rate |
string |
mask bases with N in FilterConsensusReads if the fraction of reads that differ from the consensus is higher than the max error rate |
| min_reads_call |
string |
String of three numbers giving the minimum read counts in CallDuplexConsensusReads; reads are filtered out when the number of reads falls below them. The first number applies to reads from both strands, while the second and third numbers apply to the individual strands. The first number must be greater than or equal to the other numbers.
|
| min_reads_filter |
string |
String of three numbers giving the minimum read counts in FilterConsensusReads; reads are filtered out when the number of reads falls below them. The first number applies to reads from both strands, while the second and third numbers apply to the individual strands. The first number must be greater than or equal to the other numbers.
|
| min_input_base_quality_call |
integer |
only consider bases over min base quality in consensus creation in CallDuplexConsensusReads |
| min_input_base_quality_filter |
integer |
mask bases with N if under min base quality in consensus creation in FilterConsensusReads |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available NOTE: must be at least 8 GB
|
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available RECOMMENDATION: Use multiple threads for decreased run time.
|
| time |
string |
max execution time |
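The constraint on min_reads_call and min_reads_filter described above can be checked before a run is started. The following is a minimal sketch, assuming the value is a whitespace-separated string as in the rule defaults ("1 1 1"). The function `parse_min_reads` is hypothetical and is not part of the module or of fgbio.

```python
def parse_min_reads(value):
    """Parse a min_reads string such as "3 1 1" into three integers."""
    counts = [int(part) for part in value.split()]
    if len(counts) != 3:
        raise ValueError("expected three numbers, got %r" % value)
    both_strands, strand_a, strand_b = counts
    # Constraint from the table above: the first number must be >= the other two.
    if both_strands < strand_a or both_strands < strand_b:
        raise ValueError("first number must be >= the others in %r" % value)
    return both_strands, strand_a, strand_b


print(parse_min_reads("1 1 1"))  # the default used by the rule
print(parse_min_reads("3 2 1"))

try:
    parse_min_reads("1 2 1")
except ValueError as error:
    print("rejected:", error)
```

Checking the string early gives a readable error message instead of a failure inside the piped fgbio commands.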
Group and sort reads based on umi using fgbio in preparation for fgbio_call_and_filter_consensus_reads. Also add mate pair MQ sam tags using samblaster.
Rule
rule fgbio_group_reads_by_umi:
    input:
        bam="alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam",
    output:
        bam="alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam",
        histo="alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.histo.tsv",
    params:
        extra=config.get("fgbio_group_reads_by_umi", {}).get("extra", ""),
        umi_strategy=config.get("fgbio_group_reads_by_umi", {}).get("umi_strategy", "paired"),
    log:
        "alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam.benchmark.tsv",
            config.get("fgbio_group_reads_by_umi", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fgbio_group_reads_by_umi", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fgbio_group_reads_by_umi", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fgbio_group_reads_by_umi", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("fgbio_group_reads_by_umi", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("fgbio_group_reads_by_umi", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fgbio_group_reads_by_umi", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fgbio_group_reads_by_umi", {}).get("container", config["default_container"])
    message:
        "{rule}: group reads by umi in {input.bam} and output umi sorted bam"
    shell:
        "(fgbio GroupReadsByUmi "
        "-i {input.bam} "
        "-o {output.bam} "
        "-f {output.histo} "
        "-s {params.umi_strategy} "
        "{params.extra}) &> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam" |
input bam file |
| output |
bam |
"alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.bam" |
output bam that is umi sorted |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
| umi_strategy |
string |
umi strategy for how the umis should be grouped (paired for duplex umis) |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Call consensus bases from overlapping reads.
Rule
rule fgbio_call_overlapping_consensus_bases:
    input:
        bam="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam",
        ref=config.get("reference", {}).get("fasta", ""),
    output:
        bam=temp("alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.bam"),
        metrics=temp("alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.metrics.txt"),
    params:
        agreement_strategy=config.get("fgbio_call_overlapping_consensus_bases", {}).get("agreement_strategy", "Consensus"),
        disagreement_strategy=config.get("fgbio_call_overlapping_consensus_bases", {}).get("disagreement_strategy", "Consensus"),
        extra=config.get("fgbio_call_overlapping_consensus_bases", {}).get("extra", ""),
        jvm_args=config.get("fgbio_call_overlapping_consensus_bases", {}).get("jvm_args", "-Xmx6g"),
    log:
        "alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.bam.benchmark.tsv",
            config.get("fgbio_call_overlapping_consensus_bases", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fgbio_call_overlapping_consensus_bases", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fgbio_call_overlapping_consensus_bases", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fgbio_call_overlapping_consensus_bases", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("fgbio_call_overlapping_consensus_bases", {}).get(
            "partition", config["default_resources"]["partition"]
        ),
        threads=config.get("fgbio_call_overlapping_consensus_bases", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fgbio_call_overlapping_consensus_bases", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fgbio_call_overlapping_consensus_bases", {}).get("container", config["default_container"])
    message:
        "{rule}: call overlapping consensus bases on {input.bam}"
    shell:
        'sh -c "'
        "fgbio {params.jvm_args} CallOverlappingConsensusBases "
        "--input {input.bam} "
        "--output {output.bam} "
        "--metrics {output.metrics} "
        "--ref {input.ref} "
        "--agreement-strategy {params.agreement_strategy} "
        "--disagreement-strategy {params.disagreement_strategy} "
        '{params.extra}" >& {log}'
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam" |
input query sorted bam file |
| ref |
config.get("reference", {}).get("fasta", "") |
genome reference file |
| output |
bam |
"alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.bam" |
output bam file with corrected overlapping consensus bases |
| metrics |
"alignment/fgbio_call_overlapping_consensus_bases/{sample}_{type}.umi.metrics.txt" |
output metrics file with statistics of the overlapping consensus bases correction |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| agreement_strategy |
string |
set agreement strategy for fgbio_call_overlapping_consensus_bases |
| disagreement_strategy |
string |
set disagreement strategy for fgbio_call_overlapping_consensus_bases |
| extra |
string |
parameters that should be forwarded |
| jvm_args |
string |
set jvm args for fgbio_call_overlapping_consensus_bases |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Copies the UMI at the end of the BAM’s read name to the RX tag using fgbio in preparation for fgbio_group_reads_by_umi
Rule
rule fgbio_copy_umi_from_read_name:
    input:
        bam="alignment/bwa_mem/{sample}_{type}.umi.bam",
    output:
        bam=temp("alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam"),
    params:
        extra=config.get("fgbio_copy_umi_from_read_name", {}).get("extra", ""),
    log:
        "alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam.log",
    benchmark:
        repeat(
            "alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam.benchmark.tsv",
            config.get("fgbio_copy_umi_from_read_name", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fgbio_copy_umi_from_read_name", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fgbio_copy_umi_from_read_name", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fgbio_copy_umi_from_read_name", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("fgbio_copy_umi_from_read_name", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("fgbio_copy_umi_from_read_name", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fgbio_copy_umi_from_read_name", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fgbio_copy_umi_from_read_name", {}).get("container", config["default_container"])
    message:
        "{rule}: Copy UMI from read name to sam tag on {input.bam}"
    shell:
        'sh -c "'
        "(samtools view "
        "-h "
        "-F 0x900 "
        "{input.bam} "
        "| samblaster "
        "--addMateTags "
        "--ignoreUnmated "
        "| fgbio CopyUmiFromReadName "
        "-i /dev/stdin "
        "-o {output.bam} "
        '{params.extra})" &> {log}'
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/bwa_mem/{sample}_{type}.umi.bam" |
input bam file |
| output |
bam |
"alignment/fgbio_copy_umi_from_read_name/{sample}_{type}.umi.bam" |
Output bam file with umi tag added (default tag name RX) extracted from the read name |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available RECOMMENDATION: Use multiple threads for decreased run time.
|
| time |
string |
max execution time |
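What "copying the UMI at the end of the read name to the RX tag" means for a single read can be sketched in plain Python. This is an illustration only, not fgbio's implementation. It assumes the UMI is the last colon-separated field of the read name and that the two halves of a duplex UMI may be joined with `+`, which is written as `-` in the tag. The function name and the read name are made up.

```python
def umi_from_read_name(read_name):
    """Return the UMI stored as the last ':'-separated field of a read name."""
    umi = read_name.split(":")[-1]
    # Assumption: duplex UMIs may be joined with '+', while the RX tag uses '-'.
    umi = umi.replace("+", "-")
    # Reject names whose last field is empty or contains characters other than bases.
    if not umi or set(umi) - set("ACGTN-"):
        raise ValueError("no valid UMI at the end of %r" % read_name)
    return umi


# Made-up read name with a duplex UMI appended as the last field.
name = "A00123:45:HXXXXXXX:1:1101:1000:2000:ACGTAC+TTGACA"
rx = umi_from_read_name(name)
print("RX:Z:%s" % rx)
```

In the rule itself this step is done by fgbio CopyUmiFromReadName on the stream coming from samtools view and samblaster.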
Align long-read sequencing data stored in a bam file to a reference genome and produce a bam file with aligned reads.
Rule
rule minimap2_align:
    input:
        query=lambda wildcards: get_minimap2_query(wildcards),
        target=expand(
            "alignment/minimap2_index/{ref}.{preset}.mmi",
            ref=os.path.basename(config.get("reference", {}).get("fasta", "")),
            preset=config.get("minimap2_align", {}).get("preset", ""),
        ),
    output:
        bam=temp("alignment/minimap2_align/{sample}_{type}_{processing_unit}_{barcode}.bam"),
    params:
        extra=lambda wildcards, input: "%s %s -x %s"
        % (
            config.get("minimap2_align", {}).get("extra", ""),
            config.get("minimap2_align", {}).get("read_group", generate_minimap2_read_group(wildcards, input)),
            config.get("minimap2_align", {}).get("preset", ""),
        ),
        sorting=config.get("minimap2_align", {}).get("sort_order", "coordinate"),
        sort_extra=config.get("minimap2_align", {}).get("sort_extra", ""),
    log:
        "alignment/minimap2_align/{sample}_{type}_{processing_unit}_{barcode}.bam.log",
    benchmark:
        repeat(
            "alignment/minimap2_align/{sample}_{type}_{processing_unit}_{barcode}.bam.benchmark.tsv",
            config.get("minimap2_align", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("minimap2_align", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("minimap2_align", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("minimap2_align", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("minimap2_align", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("minimap2_align", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("minimap2_align", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("minimap2_align", {}).get("container", config["default_container"])
    message:
        "{rule}: run minimap2 to align reads from {input.query} to {input.target}"
    wrapper:
        "v4.3.0/bio/minimap2/aligner"
| Rule parameters |
Key |
Value |
Description |
| input |
target |
expand( "alignment/minimap2_index/{ref}.{preset}.mmi", ref=os.path.basename(config.get("reference", {}).get("fasta", "")), preset=config.get("minimap2_align", {}).get("preset", ""), ) |
a minimap2 index file for the reference genome |
| query |
lambda wildcards: get_minimap2_query(wildcards) |
bam file with unaligned reads |
| output |
bam |
"alignment/minimap2_align/{sample}_{type}_{processing_unit}_{barcode}.bam" |
bam file with aligned reads (Note that the methylation tags will also be present in the aligned bam file) |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
| mmi |
string |
fasta reference mmi file (generated with the same preset as specified in the config) |
| preset |
string |
minimap2 preset options for various types of long-read sequencing data (e.g., map-hifi or map-ont) |
| read_group |
string |
RG string that will be added to the generated bam file. By default the RG string is generated by the function generate_minimap2_read_group defined in common.smk
|
| sort_order |
string |
how the bam file should be sorted (default coordinate) |
| sort_extra |
string |
parameters that should be forwarded to sorting (NB. do not set -@ or --threads here, this is set from {threads}) |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
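The name of the index file that minimap2_align requests depends on both the reference file name and the preset. The sketch below reproduces the path pattern from the rule's `expand(...)` call with ordinary string formatting. The function `minimap2_index_path` and the example reference path are hypothetical, and `expand` in the rule returns a one-element list rather than a single string.

```python
import os


def minimap2_index_path(reference_fasta, preset):
    """Build the .mmi path that minimap2_align expects for a reference and preset."""
    # Only the file name of the reference is used, as in the rule (os.path.basename).
    ref = os.path.basename(reference_fasta)
    return "alignment/minimap2_index/%s.%s.mmi" % (ref, preset)


# Hypothetical reference path; the presets are the two examples named in the table above.
hifi_index = minimap2_index_path("/data/ref/GRCh38.fasta", "map-hifi")
ont_index = minimap2_index_path("/data/ref/GRCh38.fasta", "map-ont")
print(hifi_index)
print(ont_index)
```

Because the preset is part of the file name, changing the preset in the config points the rule at a different index file.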
Prepare reference index file for minimap2.
Rule
rule minimap2_index:
    input:
        target=config.get("reference", {}).get("fasta", ""),
    output:
        mmi=expand(
            "alignment/minimap2_index/{ref}.{preset}.mmi",
            ref=os.path.basename(config.get("reference", {}).get("fasta", "")),
            preset=config.get("minimap2_align", {}).get("preset", ""),
        ),
    params:
        extra=set_minimap2_preset,
    log:
        "alignment/minimap2_index/minimap2_index.log",
    benchmark:
        repeat(
            "alignment/minimap2_index/minimap2_index.benchmark.tsv", config.get("minimap2_index", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("minimap2_index", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("minimap2_index", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("minimap2_index", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("minimap2_index", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("minimap2_index", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("minimap2_index", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("minimap2_index", {}).get("container", config["default_container"])
    message:
        "{rule}: index {input.target} with minimap2"
    wrapper:
        "v4.3.0/bio/minimap2/index"
| Rule parameters |
Key |
Value |
Description |
| input |
target |
config.get("reference", {}).get("fasta", "") |
reference/target genome |
| output |
mmi |
expand( "alignment/minimap2_index/{ref}.{preset}.mmi", ref=os.path.basename(config.get("reference", {}).get("fasta", "")), preset=config.get("minimap2_align", {}).get("preset", ""), ) |
minimap2 index file of the target/reference genome |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Merge minimap2 .bam files from the same sample using samtools merge.
Rule
rule minimap2_merge:
    input:
        bams=lambda wildcards: [
            "alignment/minimap2_align/{sample}_{type}_%s_%s.bam" % (u.processing_unit, u.barcode)
            for u in get_units(units, wildcards)
        ],
    output:
        bam=temp("alignment/minimap2_align/{sample}_{type}.bam"),
    params:
        extra=config.get("minimap2_merge", {}).get("extra", ""),
    log:
        "alignment/minimap2_align/{sample}_{type}.bam.log",
    benchmark:
        repeat(
            "alignment/minimap2_align/{sample}_{type}.bam.benchmark.tsv",
            config.get("minimap2_merge", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("minimap2_merge", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("minimap2_merge", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("minimap2_merge", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("minimap2_merge", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("minimap2_merge", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("minimap2_merge", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("minimap2_merge", {}).get("container", config["default_container"])
    message:
        "{rule}: merge {input.bams} using samtools merge"
    wrapper:
        "v3.9.0/bio/samtools/merge"
| Rule parameters |
Key |
Value |
Description |
| input |
bams |
lambda wildcards: [ "alignment/minimap2_align/{sample}_{type}_%s_%s.bam" % (u.processing_unit, u.barcode) for u in get_units(units, wildcards) ] |
bam files from the same sample (and the same sample type); the list of bam files is obtained using the information in the units.tsv file |
| output |
bam |
"alignment/minimap2_align/{sample}_{type}.bam" |
a sorted merged bam file |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded. RECOMMENDATION: use -c -p to keep only one of the read group IDs when merging files from the same sample and flowcell |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
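The input of the merge rule is a list with one bam path per unit of the sample. The sketch below illustrates how such a list could be built; the Unit tuple and the filtering inside merge_inputs are stand-ins for the units.tsv table and the get_units helper, whose actual implementation is not shown in this document.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of units.tsv.
Unit = namedtuple("Unit", ["sample", "type", "processing_unit", "barcode"])


def merge_inputs(units, sample, sample_type, aligner="minimap2_align"):
    """Return one per-unit bam path for every unit of the given sample and type."""
    return [
        f"alignment/{aligner}/{sample}_{sample_type}_{u.processing_unit}_{u.barcode}.bam"
        for u in units
        # Assumption: units are selected by sample and type.
        if u.sample == sample and u.type == sample_type
    ]


units = [
    Unit("S1", "T", "run1", "bc01"),
    Unit("S1", "T", "run2", "bc01"),
    Unit("S2", "T", "run1", "bc02"),
]

for path in merge_inputs(units, "S1", "T"):
    print(path)
```

The same pattern applies to pbmm2_merge with aligner="pbmm2_align".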
Align long-read sequencing data stored in an unaligned bam file to a reference genome, producing a bam file with aligned reads.
Rule
rule pbmm2_align:
input:
query=lambda wildcards: get_minimap2_query(wildcards),
reference=expand(
"alignment/pbmm2_index/{ref}.{preset}.mmi",
ref=os.path.basename(config.get("reference", {}).get("fasta", "")),
preset=config.get("pbmm2_align", {}).get("preset", ""),
),
output:
bam=temp("alignment/pbmm2_align/{sample}_{type}_{processing_unit}_{barcode}.bam"),
params:
preset=config.get("pbmm2_align", {}).get("preset", ""),
sample=lambda wildcards: f"{wildcards.sample}_{wildcards.type}",
loglevel="INFO",
extra=" --sort %s " % (config.get("pbmm2_align", {}).get("extra", "")),
log:
bam="alignment/pbmm2_align/{sample}_{type}_{processing_unit}_{barcode}.bam.log",
benchmark:
repeat(
"alignment/pbmm2_align/{sample}_{type}_{processing_unit}_{barcode}.bam.benchmark.tsv",
config.get("pbmm2_align", {}).get("benchmark_repeats", 1),
)
threads: config.get("pbmm2_align", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("pbmm2_align", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("pbmm2_align", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("pbmm2_align", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("pbmm2_align", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("pbmm2_align", {}).get("time", config["default_resources"]["time"]),
container:
config.get("pbmm2_align", {}).get("container", config["default_container"])
message:
"{rule}: Align reads in {input.query} against {input.reference}"
wrapper:
"v4.3.0/bio/pbmm2/align"
| Rule parameters |
Key |
Value |
Description |
| input |
query |
lambda wildcards: get_minimap2_query(wildcards) |
bam file of unaligned pacbio reads |
| reference |
expand( "alignment/pbmm2_index/{ref}.{preset}.mmi", ref=os.path.basename(config.get("reference", {}).get("fasta", "")), preset=config.get("pbmm2_align", {}).get("preset", ""), ) |
pbmm2 reference index file |
| output |
bam |
"alignment/pbmm2_align/{sample}_{type}_{processing_unit}_{barcode}.bam" |
bam file with aligned reads (Note that the methylation tags will also be present in the aligned bam file) |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
| preset |
string |
pbmm2 preset options for various types of PacBio sequencing data (e.g., HIFI) |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
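Every resource entry in the rules follows the same lookup pattern: use the value configured for the rule if present, otherwise fall back to default_resources. A minimal sketch of that lookup, assuming the settings are available in a single dictionary, is shown below; the helper name rule_setting and all values are illustrative.

```python
def rule_setting(config, rule, key):
    """Return config[rule][key] if set, else config["default_resources"][key].

    The fallback is evaluated eagerly, as in the rules, so default_resources
    must define the key even when the rule overrides it.
    """
    return config.get(rule, {}).get(key, config["default_resources"][key])


# Illustrative values only.
config = {
    "default_resources": {
        "threads": 1,
        "mem_mb": 6144,
        "mem_per_cpu": 6144,
        "partition": "core",
        "time": "4:00:00",
    },
    "pbmm2_align": {"threads": 8, "mem_mb": 49152},
}

print(rule_setting(config, "pbmm2_align", "threads"))    # rule-specific value
print(rule_setting(config, "pbmm2_align", "partition"))  # falls back to the default
```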
Prepare reference index file for pbmm2.
Rule
rule pbmm2_index:
input:
reference=config.get("reference", {}).get("fasta", ""),
output:
mmi=expand(
"alignment/pbmm2_index/{ref}.{preset}.mmi",
ref=os.path.basename(config.get("reference", {}).get("fasta", "")),
preset=config.get("pbmm2_align", {}).get("preset", ""),
),
params:
preset=config.get("pbmm2_align", {}).get("preset", ""),
extra=config.get("pbmm2_index", {}).get("extra", ""),
log:
"alignment/pbmm2_index/pbmm2_index.log",
benchmark:
repeat("alignment/pbmm2_index/pbmm2_index.benchmark.tsv", config.get("pbmm2_index", {}).get("benchmark_repeats", 1))
threads: config.get("pbmm2_index", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("pbmm2_index", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("pbmm2_index", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("pbmm2_index", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("pbmm2_index", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("pbmm2_index", {}).get("time", config["default_resources"]["time"]),
container:
config.get("pbmm2_index", {}).get("container", config["default_container"])
message:
"{rule}: index {input.reference} with pbmm2"
wrapper:
"v3.9.0/bio/pbmm2/index"
| Rule parameters |
Key |
Value |
Description |
| input |
reference |
config.get("reference", {}).get("fasta", "") |
target/reference genome fasta file |
| output |
mmi |
expand( "alignment/pbmm2_index/{ref}.{preset}.mmi", ref=os.path.basename(config.get("reference", {}).get("fasta", "")), preset=config.get("pbmm2_align", {}).get("preset", ""), ) |
pbmm2 index file of the target/reference genome |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
| preset |
string |
preset for indexing the target genome |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Merge pbmm2 .bam files from the same sample using samtools merge.
Rule
rule pbmm2_merge:
input:
bams=lambda wildcards: [
"alignment/pbmm2_align/{sample}_{type}_%s_%s.bam" % (u.processing_unit, u.barcode)
for u in get_units(units, wildcards)
],
output:
bam=temp("alignment/pbmm2_align/{sample}_{type}.bam"),
params:
extra=config.get("pbmm2_merge", {}).get("extra", ""),
log:
"alignment/pbmm2_align/{sample}_{type}.bam.log",
benchmark:
repeat(
"alignment/pbmm2_align/{sample}_{type}.bam.benchmark.tsv",
config.get("pbmm2_align", {}).get("benchmark_repeats", 1),
)
threads: config.get("pbmm2_merge", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("pbmm2_merge", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("pbmm2_merge", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("pbmm2_merge", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("pbmm2_merge", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("pbmm2_merge", {}).get("time", config["default_resources"]["time"]),
container:
config.get("pbmm2_merge", {}).get("container", config["default_container"])
message:
"{rule}: merge bam file {input} using samtools"
wrapper:
"v3.9.0/bio/samtools/merge"
| Rule parameters |
Key |
Value |
Description |
| input |
bams |
lambda wildcards: [ "alignment/pbmm2_align/{sample}_{type}_%s_%s.bam" % (u.processing_unit, u.barcode) for u in get_units(units, wildcards) ] |
bam files from the same sample (and the same sample type); the list of bam files is obtained using the information in the units.tsv file |
| output |
bam |
"alignment/pbmm2_align/{sample}_{type}.bam" |
a sorted merged bam file |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Generate a bam file for a single chromosome with duplicates marked.
Rule
rule picard_mark_duplicates:
input:
bams="alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam",
output:
bam=temp("alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam"),
metrics=temp("alignment/picard_mark_duplicates/{sample}_{type}_{chr}.metrics.txt"),
params:
extra=config.get("picard_mark_duplicates", {}).get("extra", ""),
log:
"alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam.log",
benchmark:
repeat(
"alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam.benchmark.tsv",
config.get("picard_mark_duplicates", {}).get("benchmark_repeats", 1),
)
threads: config.get("picard_mark_duplicates", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("picard_mark_duplicates", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("picard_mark_duplicates", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("picard_mark_duplicates", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("picard_mark_duplicates", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("picard_mark_duplicates", {}).get("time", config["default_resources"]["time"]),
container:
config.get("picard_mark_duplicates", {}).get("container", config["default_container"])
message:
"{rule}: mark duplicates in {input} using picard"
wrapper:
"v1.25.0/bio/picard/markduplicates"
| Rule parameters |
Key |
Value |
Description |
| input |
bams |
"alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam" |
bam file containing one chromosome |
| output |
bam |
"alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam" |
duplicate marked bam file containing one chromosome |
| metrics |
"alignment/picard_mark_duplicates/{sample}_{type}_{chr}.metrics.txt" |
duplicate statistics for qc |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
Generate a bam file for non-chromosomal contigs and unmapped reads with duplicates marked.
Rule
rule picard_mark_duplicates_non_chr:
input:
bams="alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam",
output:
bam=temp("alignment/picard_mark_duplicates/{sample}_{type}_non_chr.bam"),
metrics=temp("alignment/picard_mark_duplicates/{sample}_{type}_non_chr.metrics.txt"),
params:
extra=config.get("picard_mark_duplicates_non_chr", {}).get("extra", ""),
log:
"alignment/picard_mark_duplicates_non_chr/{sample}_{type}.output.log",
benchmark:
repeat(
"alignment/picard_mark_duplicates_non_chr/{sample}_{type}.output.benchmark.tsv",
config.get("picard_mark_duplicates_non_chr", {}).get("benchmark_repeats", 1),
)
threads: config.get("picard_mark_duplicates_non_chr", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("picard_mark_duplicates_non_chr", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("picard_mark_duplicates_non_chr", {}).get(
"mem_per_cpu", config["default_resources"]["mem_per_cpu"]
),
partition=config.get("picard_mark_duplicates_non_chr", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("picard_mark_duplicates_non_chr", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("picard_mark_duplicates_non_chr", {}).get("time", config["default_resources"]["time"]),
container:
config.get("picard_mark_duplicates", {}).get("container", config["default_container"])
message:
"{rule}: mark duplicates in {input.bams} using picard"
wrapper:
"v1.25.0/bio/picard/markduplicates"
| Rule parameters |
Key |
Value |
Description |
| input |
bams |
"alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam" |
bam file containing non-chromosomal contigs requested in the config and unmapped reads |
| output |
bam |
"alignment/picard_mark_duplicates/{sample}_{type}_non_chr.bam" |
duplicate marked bam file containing non-chromosomal contigs requested in the config and unmapped reads |
| metrics |
"alignment/picard_mark_duplicates/{sample}_{type}_non_chr.metrics.txt" |
duplicate statistics for qc |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
Extract reads from each chromosome and put into separate .bam files using samtools view.
Rule
rule samtools_extract_reads:
input:
bam="alignment/bwa_mem/{sample}_{type}.bam",
bai="alignment/bwa_mem/{sample}_{type}.bam.bai",
output:
bam=temp("alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam"),
params:
extra=config.get("samtools_extract_reads", {}).get("extra", ""),
log:
"alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam.log",
benchmark:
repeat(
"alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam.benchmark.tsv",
config.get("samtools_extract_reads", {}).get("benchmark_repeats", 1),
)
threads: config.get("samtools_extract_reads", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("samtools_extract_reads", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("samtools_extract_reads", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("samtools_extract_reads", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("samtools_extract_reads", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("samtools_extract_reads", {}).get("time", config["default_resources"]["time"]),
container:
config.get("samtools_extract_reads", {}).get("container", config["default_container"])
message:
"{rule}: create bam {output} with only reads from {wildcards.chr}"
shell:
"(samtools view -@ {threads} {params.extra} -b {input} {wildcards.chr} > {output}) &> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/bwa_mem/{sample}_{type}.bam" |
bam file |
| bai |
"alignment/bwa_mem/{sample}_{type}.bam.bai" |
bam index file |
| output |
bam |
"alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam" |
one bam file for each chromosome |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
Extract reads from non-chromosomal contigs, together with unmapped reads, into a separate .bam file using samtools view.
Rule
rule samtools_extract_reads_non_chr:
input:
bam="alignment/bwa_mem/{sample}_{type}.bam",
bai="alignment/bwa_mem/{sample}_{type}.bam.bai",
output:
bam=temp("alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam"),
params:
contigs=get_contig_list,
extra=config.get("samtools_extract_reads_non_chr", {}).get("extra", ""),
log:
"alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam.log",
benchmark:
repeat(
"alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam.benchmark.tsv",
config.get("samtools_extract_reads_non_chr", {}).get("benchmark_repeats", 1),
)
threads: config.get("samtools_extract_reads_non_chr", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("samtools_extract_reads_non_chr", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("samtools_extract_reads_non_chr", {}).get(
"mem_per_cpu", config["default_resources"]["mem_per_cpu"]
),
partition=config.get("samtools_extract_reads_non_chr", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("samtools_extract_reads_non_chr", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("samtools_extract_reads_non_chr", {}).get("time", config["default_resources"]["time"]),
container:
config.get("samtools_extract_reads_non_chr", {}).get("container", config["default_container"])
message:
"{rule}: create bam {output} with only reads from {params.contigs}"
shell:
"(samtools view -@ {threads} {params.extra} -b {input} {params.contigs} '*' > {output}) &> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/bwa_mem/{sample}_{type}.bam" |
bam file |
| bai |
"alignment/bwa_mem/{sample}_{type}.bam.bai" |
bam index file |
| output |
bam |
"alignment/samtools_extract_reads/{sample}_{type}_non_chr.bam" |
one bam file containing non-chromosomal contigs, requested using merge_contigs in config, and unmapped reads |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
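In the shell command of this rule, the contig names are followed by a quoted '*', which samtools view interprets as the region holding the unmapped reads. The sketch below shows how such a region argument string could be assembled; the helper name non_chr_regions and the contig names are assumptions, and the actual get_contig_list function is not shown in this document.

```python
import shlex


def non_chr_regions(contigs):
    """Join contig names and the unmapped-read region '*' into shell-safe arguments."""
    # shlex.quote protects '*' from shell globbing, matching the quoted '*' in the rule.
    return " ".join(shlex.quote(region) for region in list(contigs) + ["*"])


# Illustrative contig names.
print(non_chr_regions(["chrM", "chrUn_KI270742v1"]))
```

Without the quoting, the shell would expand * to the file names in the working directory before samtools sees it.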
Extract reads from each chromosome of the realigned UMI consensus .bam file and put them into separate .bam files using samtools view.
Rule
rule samtools_extract_reads_umi:
input:
bam="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam",
bai="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.bai",
output:
bam=temp("alignment/samtools_extract_reads_umi/{sample}_{type}_{chr}.umi.bam"),
params:
extra=config.get("samtools_extract_reads", {}).get("extra", ""),
log:
"alignment/samtools_extract_reads_umi/{sample}_{type}_{chr}.umi.bam.log",
benchmark:
repeat(
"alignment/samtools_extract_reads_umi/{sample}_{type}_{chr}.bam.benchmark.tsv",
config.get("samtools_extract_reads_umi", {}).get("benchmark_repeats", 1),
)
threads: config.get("samtools_extract_reads_umi", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("samtools_extract_reads_umi", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("samtools_extract_reads_umi", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("samtools_extract_reads_umi", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("samtools_extract_reads_umi", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("samtools_extract_reads_umi", {}).get("time", config["default_resources"]["time"]),
container:
config.get("samtools_extract_reads_umi", {}).get("container", config["default_container"])
message:
"{rule}: create bam {output} with only reads from {wildcards.chr}"
shell:
"(samtools view -@ {threads} {params.extra} -b {input} {wildcards.chr} > {output}) &> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam" |
bam file |
| bai |
"alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.bai" |
bam index file |
| output |
bam |
"alignment/samtools_extract_reads_umi/{sample}_{type}_{chr}.umi.bam" |
one bam file for each chromosome |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
Extract reads from the non-chromosomal contigs specified by merge_contigs, together with unmapped reads, from the UMI consensus .bam file into a single .bam file using samtools view.
Rule
rule samtools_extract_reads_non_chr_umi:
input:
bam="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam",
bai="alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.bai",
output:
bam=temp("alignment/samtools_extract_reads/{sample}_{type}_non_chr.umi.bam"),
params:
contigs=get_contig_list,
extra=config.get("samtools_extract_reads_non_chr_umi", {}).get("extra", ""),
log:
"alignment/samtools_extract_reads_non_chr_umi/{sample}_{type}_non_chr.umi.bam.log",
benchmark:
repeat(
"alignment/samtools_extract_reads_non_chr_umi/{sample}_{type}_non_chr.umi.bam.benchmark.tsv",
config.get("samtools_extract_reads_non_chr_umi", {}).get("benchmark_repeats", 1),
)
threads: config.get("samtools_extract_reads_non_chr_umi", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("samtools_extract_reads_non_chr_umi", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("samtools_extract_reads_non_chr_umi", {}).get(
"mem_per_cpu", config["default_resources"]["mem_per_cpu"]
),
partition=config.get("samtools_extract_reads_non_chr_umi", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("samtools_extract_reads_non_chr_umi", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("samtools_extract_reads_non_chr_umi", {}).get("time", config["default_resources"]["time"]),
container:
config.get("samtools_extract_reads_non_chr_umi", {}).get("container", config["default_container"])
message:
"{rule}: create bam {output} with only reads from {params.contigs}"
shell:
"(samtools view -@ {threads} {params.extra} -b {input} {params.contigs} '*' > {output}) &> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam" |
bam file |
| bai |
"alignment/bwa_mem_realign_consensus_reads/{sample}_{type}.umi.bam.bai" |
bam index file |
| output |
bam |
"alignment/samtools_extract_reads/{sample}_{type}_non_chr.umi.bam" |
one bam file containing non-chromosomal contigs, requested using merge_contigs in config, and unmapped reads |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Convert a bam file to separate fastq files, one for each read in the read pair.
Rule
rule samtools_fastq:
input:
bam="alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam",
output:
fastq1="alignment/samtools_fastq/{sample}_{type}.fastq1.umi.fastq.gz",
fastq2="alignment/samtools_fastq/{sample}_{type}.fastq2.umi.fastq.gz",
params:
sort=config.get("samtools_fastq", {}).get("sort", "-m 4G"),
fastq=config.get("samtools_fastq", {}).get("fastq", "-n"),
log:
"alignment/samtools_fastq/{sample}_{type}.output.log",
benchmark:
repeat(
"alignment/samtools_fastq/{sample}_{type}.output.benchmark.tsv",
config.get("samtools_fastq", {}).get("benchmark_repeats", 1),
)
threads: config.get("samtools_fastq", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("samtools_fastq", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("samtools_fastq", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("samtools_fastq", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("samtools_fastq", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("samtools_fastq", {}).get("time", config["default_resources"]["time"]),
container:
config.get("samtools_fastq", {}).get("container", config["default_container"])
message:
"{rule}: Convert the bam file {input.bam} into a fastq file"
wrapper:
"v2.6.0/bio/samtools/fastq/separate"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/fgbio_call_and_filter_consensus_reads/{sample}_{type}.umi.unmapped_bam" |
input bam file |
| output |
fastq1 |
"alignment/samtools_fastq/{sample}_{type}.fastq1.umi.fastq.gz" |
fastq file with the first read in the read pair |
| fastq2 |
"alignment/samtools_fastq/{sample}_{type}.fastq2.umi.fastq.gz" |
fastq file with the second read in the read pair |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| sort |
string |
parameters that should be forwarded to samtools sort |
| fastq |
string |
parameters that should be forwarded to samtools fastq |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu. NOTE: should be at least the amount given in the sort parameters |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available. RECOMMENDATION: use at least 3 threads so that sort gets at least 1 exclusive thread. NOTE: if multiple threads are used, the memory (mem_mb) must also be increased |
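The note on mem_per_cpu can be checked before a run by comparing it with the -m value in the sort parameters, which samtools sort treats as the approximate memory per thread. The sketch below is one way to do that; both helper names are hypothetical, and it only understands the K, M and G suffixes.

```python
import re

# Conversion factors from the samtools sort -m suffix to MB (no suffix means bytes).
_TO_MB = {"": 1 / (1024 * 1024), "K": 1 / 1024, "M": 1, "G": 1024}


def sort_memory_mb(sort_params):
    """Return the -m value of a samtools sort parameter string in MB, or None if absent."""
    match = re.search(r"-m\s*(\d+)([KMG]?)", sort_params)
    if match is None:
        return None
    return int(match.group(1)) * _TO_MB[match.group(2)]


def mem_per_cpu_is_sufficient(mem_per_cpu_mb, sort_params="-m 4G"):
    """Check that mem_per_cpu covers the per-thread memory requested for sort."""
    needed = sort_memory_mb(sort_params)
    return needed is None or mem_per_cpu_mb >= needed


print(sort_memory_mb("-m 4G"))
print(mem_per_cpu_is_sufficient(6144))
print(mem_per_cpu_is_sufficient(2048))
```

With the default sort setting of -m 4G, a mem_per_cpu below 4096 would fail this check.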
Index .bam files using samtools index.
Rule
rule samtools_index:
input:
bam="{file}.bam",
output:
bai=temp("{file}.bam.bai"),
params:
extra=config.get("samtools_index", {}).get("extra", ""),
log:
"{file}.bam.bai.log",
benchmark:
repeat(
"{file}.bam.bai.benchmark.tsv",
config.get("samtools_index", {}).get("benchmark_repeats", 1),
)
container:
config.get("samtools_index", {}).get("container", config["default_container"])
threads: config.get("samtools_index", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("samtools_index", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("samtools_index", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("samtools_index", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("samtools_index", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("samtools_index", {}).get("time", config["default_resources"]["time"]),
message:
"{rule}: create index for {input.bam}"
wrapper:
"v1.1.0/bio/samtools/index"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"{file}.bam" |
bam file |
| output |
bai |
"{file}.bam.bai" |
bam index file |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
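Because the rule uses the generic {file} wildcard, it can produce an index for any requested path ending in .bam.bai. The sketch below imitates that matching with a regular expression; the function name index_target_to_input and the example path are assumptions, and Snakemake's own wildcard resolution is more involved than this.

```python
import re


def index_target_to_input(target):
    """Map a requested '<file>.bam.bai' target to the '<file>.bam' input, or None."""
    match = re.fullmatch(r"(?P<file>.+)\.bam\.bai", target)
    if match is None:
        return None
    return f"{match.group('file')}.bam"


print(index_target_to_input("alignment/bwa_mem/NA12878_N.bam.bai"))
print(index_target_to_input("alignment/bwa_mem/NA12878_N.bam"))
```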
Merge .bam files from the same sample using samtools merge.
Rule
rule samtools_merge_bam:
input:
bams=get_chrom_bams,
non_chr_bams="alignment/picard_mark_duplicates/{sample}_{type}_non_chr.bam"
if config.get("reference", {}).get("merge_contigs", None) is not None
else [],
output:
bam=temp("alignment/samtools_merge_bam/{sample}_{type}_unsorted.bam"),
params:
extra=config.get("samtools_merge_bam", {}).get("extra", ""),
log:
"alignment/samtools_merge_bam/{sample}_{type}_unsorted.bam.log",
benchmark:
repeat(
"alignment/samtools_merge_bam/{sample}_{type}_unsorted.bam.benchmark.tsv",
config.get("samtools_merge_bam", {}).get("benchmark_repeats", 1),
)
threads: config.get("samtools_merge_bam", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("samtools_merge_bam", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("samtools_merge_bam", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("samtools_merge_bam", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("samtools_merge_bam", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("samtools_merge_bam", {}).get("time", config["default_resources"]["time"]),
container:
config.get("samtools", {}).get("container", config["default_container"])
message:
"{rule}: merge chr bam files, creating {output}"
wrapper:
"v1.1.0/bio/samtools/merge"
| Rule parameters |
Key |
Value |
Description |
| input |
bams |
get_chrom_bams |
list of bam files for all the chromosomes; the list is generated by the function extract_chr defined in the hydra-genetics module |
| output |
bam |
"alignment/samtools_merge_bam/{sample}_{type}_unsorted.bam" |
merged unsorted bam file |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded. RECOMMENDED: use -c -p to keep only one of the read group IDs when merging files from the same sample |
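As a small illustration of how the `extra` setting reaches the rule, the sketch below repeats the lookup from the rule's params section on a plain dictionary. The dictionary stands in for a loaded config.yaml and the helper name `merge_extra` is hypothetical. In samtools merge, `-c` collapses @RG headers that share an ID, and `-p` does the same for @PG headers.

```python
# Hypothetical config, as it might look after config.yaml has been loaded.
config = {"samtools_merge_bam": {"extra": "-c -p"}}


def merge_extra(config):
    """Resolve `extra` the same way the rule's params section does."""
    return config.get("samtools_merge_bam", {}).get("extra", "")


print(repr(merge_extra(config)))  # value taken from the config
print(repr(merge_extra({})))      # falls back to the empty string
```

When the key is missing, the empty default means no additional options are forwarded to samtools merge.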
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | memory in MB used per cpu |
| mem_per_cpu | integer | memory used per cpu |
| partition | string | partition to use on cluster |
| time | string | max execution time |
| threads | integer | number of threads to be available |
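Every resource in the rule is resolved with the same two-step lookup: the rule's own entry first, then `default_resources`. The sketch below shows that pattern in plain Python. It assumes the resources file has already been merged into `config`; the helper name `rule_resource` and all values are hypothetical, only the key names come from the table above.

```python
def rule_resource(config, rule, key):
    """Look up a resource for `rule`, falling back to default_resources."""
    return config.get(rule, {}).get(key, config["default_resources"][key])


# Hypothetical values for illustration only.
config = {
    "default_resources": {
        "threads": 1,
        "mem_mb": 6144,
        "mem_per_cpu": 6144,
        "partition": "core",
        "time": "4:00:00",
    },
    "samtools_merge_bam": {"threads": 4, "time": "8:00:00"},
}

for key in ("threads", "mem_mb", "mem_per_cpu", "partition", "time"):
    print(key, rule_resource(config, "samtools_merge_bam", key))
```

Because Python evaluates the fallback argument before calling `.get`, `default_resources` has to define every key, even for rules that override all of them.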
Sort .bam files using samtools sort.
Rule
rule samtools_sort:
    input:
        bam="{file}_unsorted.bam",
    output:
        bam=temp("{file}.bam"),
    params:
        extra=config.get("samtools_sort", {}).get("extra", ""),
    log:
        "{file}.bam.sort.log",
    benchmark:
        repeat(
            "{file}.bam.sort.benchmark.tsv",
            config.get("samtools_sort", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_sort", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_sort", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_sort", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_sort", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_sort", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_sort", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_sort", {}).get("container", config["default_container"])
    message:
        "{rule}: sort bam file {input.bam} using samtools"
    wrapper:
        "v2.0.0/bio/samtools/sort"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | bam | "{file}_unsorted.bam" | unsorted bam file |
| output | bam | "{file}.bam" | sorted bam file |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | memory in MB used per cpu |
| mem_per_cpu | integer | memory used per cpu |
| partition | string | partition to use on cluster |
| time | string | max execution time |
| threads | integer | number of threads to be available |
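The samtools_sort rule is generic: its output pattern `{file}.bam` matches any requested bam file, and the `file` wildcard then determines which `_unsorted.bam` input is needed. The sketch below imitates that matching with a regular expression, assuming the default Snakemake wildcard pattern `.+`. The helper name `sort_input_for` and the sample name are hypothetical.

```python
import re


def sort_input_for(target):
    """Return the unsorted input that samtools_sort would need for `target`."""
    # "{file}.bam" with the wildcard treated as the regex ".+"
    match = re.fullmatch(r"(?P<file>.+)\.bam", target)
    if match is None:
        return None  # the rule cannot produce this file
    return "%s_unsorted.bam" % match.group("file")


requested = "alignment/samtools_merge_bam/NA12878_N.bam"
print(sort_input_for(requested))
```

With this target the derived input is the `_unsorted.bam` file written by samtools_merge_bam, which is how the merged file ends up sorted before samtools_filter_reads consumes it.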
Sort .bam files using samtools sort. Sort on query name.
Rule
rule samtools_sort:
    input:
        bam="{file}_unsorted.bam",
    output:
        bam=temp("{file}.bam"),
    params:
        extra=config.get("samtools_sort", {}).get("extra", ""),
    log:
        "{file}.bam.sort.log",
    benchmark:
        repeat(
            "{file}.bam.sort.benchmark.tsv",
            config.get("samtools_sort", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_sort", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_sort", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_sort", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_sort", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_sort", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_sort", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_sort", {}).get("container", config["default_container"])
    message:
        "{rule}: sort bam file {input.bam} using samtools"
    wrapper:
        "v2.0.0/bio/samtools/sort"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | bam | "{file}_unsorted.bam" | unsorted bam file |
| output | bam | "{file}.bam" | sorted bam file |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | memory in MB used per cpu |
| mem_per_cpu | integer | memory used per cpu |
| partition | string | partition to use on cluster |
| time | string | max execution time |
| threads | integer | number of threads to be available |
Filter .bam files using samtools view.
Rule
rule samtools_filter_reads:
    input:
        bam="alignment/samtools_merge_bam/{sample}_{type}.bam",
    output:
        bam=temp("alignment/samtools_filter_reads/{sample}_{type}.bam"),
    params:
        extra=config.get("samtools_filter_reads", {}).get("extra", "-f 2"),
    log:
        "alignment/samtools_filter_reads/{sample}_{type}.bam.log",
    benchmark:
        repeat(
            "alignment/samtools_filter_reads/{sample}_{type}.bam.benchmark.tsv",
            config.get("samtools_filter_reads", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("samtools_filter_reads", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("samtools_filter_reads", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("samtools_filter_reads", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("samtools_filter_reads", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("samtools_filter_reads", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("samtools_filter_reads", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("samtools_filter_reads", {}).get("container", config["default_container"])
    message:
        "{rule}: filter reads in {input.bam} with {params.extra}"
    shell:
        "(samtools view -@ {threads} {params.extra} -b {input.bam} > {output.bam}) &> {log}"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | bam | "alignment/samtools_merge_bam/{sample}_{type}.bam" | input bam file |
| output | bam | "alignment/samtools_filter_reads/{sample}_{type}.bam" | filtered bam file |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
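When `extra` is not set, the rule falls back to `-f 2`. In samtools view, `-f` keeps only reads whose FLAG has all the given bits set, and bit 0x2 marks a read as mapped in a proper pair. The sketch below reproduces that bit test in Python to show which reads would survive the default filter. The helper name `passes_required_flags` and the example FLAG values are chosen for illustration.

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned


def passes_required_flags(flag, required=PROPER_PAIR):
    """Return True when every bit in `required` is set in `flag` (like -f)."""
    return (flag & required) == required


# Example FLAG values: 99 and 147 are a typical proper pair,
# 65 is paired but not properly paired, 4 is an unmapped read.
flags = [99, 147, 65, 4]
kept = [flag for flag in flags if passes_required_flags(flag)]
print(kept)
```

Setting `extra` in config.yaml replaces the default entirely, so a custom value has to repeat `-f 2` if the proper-pair filter should still apply.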
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | memory in MB used per cpu |
| mem_per_cpu | integer | memory used per cpu |
| partition | string | partition to use on cluster |
| time | string | max execution time |
| threads | integer | number of threads to be available |
Align .fastq files to a reference genome and generate a .bam file. STAR is a split-read-aware aligner for RNA data.
Rule
rule star:
    input:
        fq1="prealignment/merged/{sample}_{type}_fastq1.fastq.gz",
        fq2="prealignment/merged/{sample}_{type}_fastq2.fastq.gz",
        idx=config.get("star", {}).get("genome_index", ""),
    output:
        bam=temp("alignment/star/{sample}_{type}.bam"),
        sj=temp("alignment/star/{sample}_{type}.SJ.out.tab"),
    params:
        extra=config.get("star", {}).get("extra", "--outSAMtype BAM SortedByCoordinate"),
        idx="{input.idx}",
    log:
        "alignment/star/{sample}_{type}.bam.log",
    benchmark:
        repeat("alignment/star/{sample}_{type}.bam.benchmark.tsv", config.get("star", {}).get("benchmark_repeats", 1))
    threads: config.get("star", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("star", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("star", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("star", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("star", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("star", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("star", {}).get("container", config["default_container"])
    message:
        "{rule}: align with star, creating {output.bam}"
    wrapper:
        "v1.3.2/bio/star/align"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | fq1 | "prealignment/merged/{sample}_{type}_fastq1.fastq.gz" | merged fastq file from read 1 |
| | fq2 | "prealignment/merged/{sample}_{type}_fastq2.fastq.gz" | merged fastq file from read 2 |
| | idx | config.get("star", {}).get("genome_index", "") | STAR reference genome index; the file location is set in config.yaml |
| output | bam | "alignment/star/{sample}_{type}.bam" | aligned bam file |
| | sj | "alignment/star/{sample}_{type}.SJ.out.tab" | junction file with split-read information, useful for interpreting RNA data |
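To make the `sj` output more concrete, here is a small Python sketch that parses one line of an SJ.out.tab file. The column layout follows the STAR manual as I understand it (chromosome, intron start, intron end, strand code, intron motif code, annotation flag, uniquely mapped reads, multi-mapped reads, maximum overhang). The `Junction` type, the `parse_sj_line` helper and the example line are made up for illustration.

```python
from collections import namedtuple

Junction = namedtuple(
    "Junction",
    "chrom start end strand motif annotated unique_reads multi_reads max_overhang",
)

# STAR encodes strand as 0 (undefined), 1 (+) or 2 (-).
STRAND = {"0": ".", "1": "+", "2": "-"}


def parse_sj_line(line):
    """Convert one tab-separated SJ.out.tab line into a Junction."""
    fields = line.rstrip("\n").split("\t")
    return Junction(
        chrom=fields[0],
        start=int(fields[1]),  # first base of the intron, 1-based
        end=int(fields[2]),    # last base of the intron, 1-based
        strand=STRAND[fields[3]],
        motif=int(fields[4]),
        annotated=fields[5] == "1",
        unique_reads=int(fields[6]),
        multi_reads=int(fields[7]),
        max_overhang=int(fields[8]),
    )


# Made-up example line, not taken from real data.
junction = parse_sj_line("chr1\t14830\t14969\t2\t2\t1\t5\t0\t38\n")
intron_length = junction.end - junction.start + 1
print(junction.chrom, junction.strand, intron_length, junction.unique_reads)
```

Filtering on `unique_reads` is a common first step when looking for well-supported junctions, though suitable thresholds depend on the data.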
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded. RECOMMENDATION: --outSAMtype BAM SortedByCoordinate, which outputs a coordinate-sorted bam file instead of a sam file |
| genome_index | string | path to STAR reference index |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | memory in MB used per cpu |
| mem_per_cpu | integer | memory used per cpu |
| partition | string | partition to use on cluster |
| time | string | max execution time |
| threads | integer | number of threads to be available. RECOMMENDATION: use multiple threads to decrease run time. NOTE: if multiple threads are used, the memory (mem_mb) must also be increased |