#!/bin/bash
usage(){
echo "
Written by Brian Bushnell
Last modified September 20, 2022
Description: Filters reads potentially sharing a kmer with a reference.
The more memory, the higher the accuracy. Reads going to outu are guaranteed
to not match the reference, but reads going to outm may or may not
match the reference.
Usage:  bloomfilter.sh in=<input file> outu=<nonmatches> outm=<matches> ref=<reference>
Example:
bloomfilter.sh in=reads.fq outm=human.fq outu=nonhuman.fq k=31 minhits=3 ref=human.fa
Error correction and depth filtering can be done simultaneously.

File parameters:
in=<file>       Primary input, or read 1 input.
in2=<file>      Read 2 input if reads are in two files.
outm=<file>     (out) Primary matched read output.
outm2=<file>    (out2) Matched read 2 output if reads are in two files.
outu=<file>     Primary unmatched read output.
outu2=<file>    Unmatched read 2 output if reads are in two files.
outc=<file>     Optional output stream for kmer counts.
ref=<file>      Reference sequence file, or a comma-delimited list.
                For depth-based filtering, set this to the same as the input.
overwrite=t     (ow) Set to false to force the program to abort rather than
                overwrite an existing file.

Hashing parameters:
k=31            Kmer length.
hashes=2        Number of hashes per kmer. Higher generally reduces
                false positives at the expense of speed.
sw=t            (symmetricwrite) Increases accuracy when bits>1 and hashes>1.
minprob=0.5     Ignore reference kmers with probability of being correct
                below this (affects fastq references only).
memmult=1.0     Fraction of free memory to use for Bloom filter. 1.0 should
                generally work; if the program crashes with an out of memory
                error, set this lower. Higher increases specificity.
cells=          Option to set the number of cells manually. By default this
                will be autoset to use all available memory. The only reason
                to set this is to ensure deterministic output.
seed=0          This will change the hash function used.
bits=           Bits per cell; it is set automatically from mincount.

Reference-matching parameters:
minhits=3       Consecutive kmer hits for a read to be considered matched.
                Higher reduces false positives at the expense of sensitivity.
mincount=1      Minimum number of times a read kmer must occur in the
                reference to be considered a match (or printed to outc).
requireboth=f   Require both reads in a pair to match the ref in order to go
                to outm. By default, pairs go to outm if either matches.

Java Parameters:
-Xmx            This will set Java's memory usage, overriding autodetection.
                -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will
                specify 200 megs. The max is typically 85% of physical memory.
-eoom           This flag will cause the process to exit if an out-of-memory
                exception occurs. Requires Java 8u92+.
-da             Disable assertions.
Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
For documentation and the latest version, visit: https://bbmap.org
"
}
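
# Illustrative helper only: not part of the original launcher and never called
# by it. The usage text says that more hashes and more memory generally reduce
# false positives. As a rough sketch of why, this function evaluates the
# textbook estimate for a plain one-bit Bloom filter,
#     p = (1 - e^(-h*n/m))^h
# where h is the number of hashes, n the number of distinct kmers stored and
# m the number of cells. The function name and its arguments are assumptions
# made for this example. The real filter also applies minhits and multi-bit
# cells, so treat the result as an approximation, not the tool's actual rate.
estimateBloomFalsePositiveRate(){
    # $1 = hashes, $2 = distinct kmers stored, $3 = number of cells
    LC_ALL=C awk -v h="$1" -v n="$2" -v m="$3" \
        'BEGIN { printf "%.6f\n", (1 - exp(-h * n / m)) ^ h }'
}
# Example: estimateBloomFalsePositiveRate 2 1000 10000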
if [ -z "$1" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    usage
    exit
fi
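# Locate the real directory of this script, following symlinks, and set the
# Java classpath (CP) relative to it.
resolveSymlinks(){
    SCRIPT="$(cd "$(dirname "$0")" && pwd)/$(basename "$0")"
    while [ -h "$SCRIPT" ]; do
        DIR="$(dirname "$SCRIPT")"
        SCRIPT="$(readlink "$SCRIPT")"
        # A relative link target is resolved against the link's own directory.
        [ "${SCRIPT#/}" = "$SCRIPT" ] && SCRIPT="$DIR/$SCRIPT"
    done
    DIR="$(cd "$(dirname "$SCRIPT")" && pwd)"
    # Prefer the packaged jar; otherwise fall back to the class directory.
    if [ -f "$DIR/bbtools.jar" ]; then
        CP="$DIR/bbtools.jar"
    else
        CP="$DIR/current/"
    fi
}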
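# Source the helper scripts that sit next to this one, then let them parse the
# Java-related arguments. The defaults listed here come before the user's
# arguments on the parseJavaArgs command line.
setEnv(){
    . "$DIR/javasetup.sh"
    . "$DIR/memdetect.sh"
    parseJavaArgs "--xmx=4000m" "--xms=4000m" "--percent=84" "--mode=auto" "$@"
    setEnvironment
}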
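# Build the java command line, echo it to stderr, and run it.
# Note: the arguments are flattened into a string and re-parsed by eval, so
# values containing spaces or shell metacharacters are not preserved intact.
launch() {
    CMD="java $EA $EOOM $SIMD $XMX $XMS -cp $CP bloom.BloomFilterWrapper $@"
    echo "$CMD" >&2
    eval $CMD
}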
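
# Illustrative alternative only: not part of the original launcher and never
# called by it. launch() passes its arguments through a string and eval, which
# re-splits them. This sketch shows one way to keep each user argument intact
# (for example a file name containing a space) by calling java directly with
# "$@". The function name launchQuoted is an assumption made for this example.
# EA, EOOM, SIMD, XMX and XMS are left unquoted on purpose, so that empty
# values disappear and values holding several flags are split into words.
launchQuoted() {
    echo "java $EA $EOOM $SIMD $XMX $XMS -cp $CP bloom.BloomFilterWrapper $*" >&2
    java $EA $EOOM $SIMD $XMX $XMS -cp "$CP" bloom.BloomFilterWrapper "$@"
}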
resolveSymlinks
setEnv "$@"
launch "$@"