Using all-against-all read mapping, yacrd performs:
1. computation of pile-up coverage for each read
@@ -16,7 +10,7 @@ Chimera detection is done as follows:
1. for each region where coverage is smaller or equal than `min_coverage` (default 0), yacrd creates a _bad region_.
2. if there is a _bad region_ that starts at a position strictly after the beginning of the read and ends strictly before the end of the read, the read is marked as `Chimeric`
3. if total _bad region_ length > 0.8 * read length, the read is marked as `NotCovered`
-4. if read isn't `Chimeric` or `NotCovered` is `NotBad`
+4. if a read isn't `Chimeric` or `NotCovered` is `NotBad`
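
The four rules listed in this hunk can be sketched in Rust as follows. This is a minimal illustration and not yacrd's actual code: the names `ReadKind`, `bad_regions` and `classify` are hypothetical, coverage is modelled as one value per base, and the rules are assumed to be checked in the order they are listed (so a read matching both rule 2 and rule 3 is reported as `Chimeric`).

```rust
/// Labels taken from the README text; the enum itself is hypothetical.
#[derive(Debug, PartialEq)]
enum ReadKind {
    Chimeric,
    NotCovered,
    NotBad,
}

/// Rule 1: collect half-open intervals [start, end) where
/// coverage <= min_coverage.
fn bad_regions(coverage: &[u64], min_coverage: u64) -> Vec<(usize, usize)> {
    let mut regions = Vec::new();
    let mut start: Option<usize> = None;
    for (pos, &cov) in coverage.iter().enumerate() {
        match (cov <= min_coverage, start) {
            // A bad region opens here.
            (true, None) => start = Some(pos),
            // The current bad region closes just before `pos`.
            (false, Some(s)) => {
                regions.push((s, pos));
                start = None;
            }
            _ => {}
        }
    }
    // A bad region that runs up to the end of the read.
    if let Some(s) = start {
        regions.push((s, coverage.len()));
    }
    regions
}

/// Rules 2 to 4, applied in the listed order (an assumption).
fn classify(coverage: &[u64], min_coverage: u64, not_covered: f64) -> ReadKind {
    let len = coverage.len();
    let regions = bad_regions(coverage, min_coverage);

    // Rule 2: a bad region strictly inside the read.
    if regions.iter().any(|&(s, e)| s > 0 && e < len) {
        return ReadKind::Chimeric;
    }

    // Rule 3: total bad length above the given fraction of the read length.
    let bad: usize = regions.iter().map(|&(s, e)| e - s).sum();
    if bad as f64 > not_covered * len as f64 {
        return ReadKind::NotCovered;
    }

    // Rule 4: everything else.
    ReadKind::NotBad
}
```

Calling `classify(&coverage, 0, 0.8)` mirrors the defaults quoted above (`min_coverage` of 0 and the 0.8 ratio).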
-For nanopore data, we recommand to use minimap2 with all-vs-all nanopore preset with maximal distance between seeds fixe to 500 (option `-g 500`) to generate overlap. We recommand to run yacrd with minimal coverage fixed to 4 (option `-c`) and minimal coverage of read fixed to 0.4 (option `-n`).
+For nanopore data, we recommend using minimap2 with all-vs-all nanopore preset with a maximal distance between seeds fixe to 500 (option `-g 500`) to generate overlap. We recommend to run yacrd with minimal coverage fixed to 4 (option `-c`) and minimal coverage of read fixed to 0.4 (option `-n`).
-For pacbio P6-C4 data, we recommand to use minimap2 with all-vs-all pacbio preset with maximal distance between seeds fixe to 800 (option `-g 800`) to generate overlap. We recommand to run yacrd with minimal coverage fixed to 4 (option `-c 4`) and minimal coverage of read fixed to 0.4 (option `-n 0.4`).
+For pacbio P6-C4 data, we recommend to use minimap2 with all-vs-all pacbio preset with a maximal distance between seeds fixe to 800 (option `-g 800`) to generate overlap. We recommend to run yacrd with minimal coverage fixed to 4 (option `-c 4`) and minimal coverage of read fixed to 0.4 (option `-n 0.4`).
-For pacbio Sequel data, we recommand to use minimap2 with all-vs-all pacbio preset with maximal distance between seeds fixe to 5000 (option `-g 5000`) to generate overlap. We recommand to run yacrd with minimal coverage fixed to 3 (option `-c 3`) and minimal coverage of read fixed to 0.4 (option `-n 0.4`).
+For pacbio Sequel data, we recommend to use minimap2 with all-vs-all pacbio preset with a maximal distance between seeds fixe to 5000 (option `-g 5000`) to generate overlap. We recommand to run yacrd with minimal coverage fixed to 3 (option `-c 3`) and minimal coverage of read fixed to 0.4 (option `-n 0.4`).
@@ -133,7 +127,7 @@ yacrd use extension to detect format file if your filename contains (anywhere):
#### Compression

-yacrd automaticly detect file if is compress or not (gzip, bzip2 and lzma compression is avaible). For post-detection operation if input is compress output have same compression.
+yacrd automatically detect file if is compress or not (gzip, bzip2 and lzma compression is available). For post-detection operation, if input is compressed output have the same compression format.
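
One common way to implement this kind of detection is to look at the first bytes of the file. The sketch below is an illustration only and is not taken from yacrd: the `Compression` enum and `detect_compression` function are hypothetical names, and the `Lzma` case is assumed to mean the `.xz` container, since raw `.lzma` streams have no dedicated magic number.

```rust
/// Hypothetical enum for the formats named in the README.
#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,
    Bzip2,
    Lzma,
    Uncompressed,
}

/// Guess the compression format from the leading bytes of a file.
fn detect_compression(header: &[u8]) -> Compression {
    if header.starts_with(&[0x1f, 0x8b]) {
        // gzip magic number (RFC 1952)
        Compression::Gzip
    } else if header.starts_with(b"BZh") {
        // bzip2 stream header
        Compression::Bzip2
    } else if header.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        // xz container magic (assumed to be what "lzma" refers to)
        Compression::Lzma
    } else {
        Compression::Uncompressed
    }
}
```

The detected value can then be reused when opening the output, which is one way to give the output the same compression as the input.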

#### Use yacrd report as input
@@ -142,13 +136,13 @@ You can use yacrd report as input in place of overlap file, `ondisk` option are
Yacrd use overlap between reads, to detect 'good' and 'bad' region,
-region with coverage over threshold is 'good' other are 'bad'.
-If read have a 'bad' region in middle this reads is mark as 'Chimeric'.
-If ratio of 'bad' region length on total read length is larger than threshold this reads is mark as 'Not_covered'.
+a region with coverage over the threshold is 'good' others are 'bad'.
+If read has a 'bad' region in middle this reads is mark as 'Chimeric'.
+If the ratio of 'bad' region length on total read length is larger than threshold this reads is mark as 'Not_covered'.

Yacrd can make some other actions:
-- filter: for sequence or overlap file, record with reads marked as Chimeric or Not_covered isn't write in output
-- extract: for sequence or overlap file, record contain reads marked as Chimeric or Not_covered is write in output
-- split: for sequence file bad region in middle of reads are removed, Not_covered read is removed
-- scrubb: for sequence file all bad region are removed, Not_covered read is removed
+- filter: for sequence or overlap file, record with reads marked as Chimeric or NotCovered isn't written in the output
+- extract: for sequence or overlap file, record contains reads marked as Chimeric or NotCovered is written in the output
+- split: for sequence file bad region in the middle of reads are removed, NotCovered read is removed
+- scrubb: for sequence file all bad region are removed, NotCovered read is removed
"
)]
pub struct Command {
    #[structopt(
        short = "i",
        long = "input",
        required = true,
-        help = "path to input file overlap (.paf|.m4) or yacrd report (.yacrd) format audetected input-format overide detection"
+        help = "path to input file overlap (.paf|.m4|.mhap) or yacrd report (.yacrd), format is autodetect and compression input is allowed (gz|bzip2|lzma)"
    )]
    pub input: String,

    #[structopt(
        short = "o",
        long = "output",
        required = true,
-        help = "path output file, yacrd format by default output-format can overide this value"
+        help = "path output file"
    )]
    pub output: String,

-    #[structopt(long = "input-format", possible_values = &["paf","m4","yacrd","json"], help = "set the input-format")]
-    pub input_format: Option<String>,
-
-    #[structopt(long = "output-format", possible_values = &["yacrd","json"], default_value = "yacrd", help = "set the output-format")]
-    pub output_format: String,
-
    #[structopt(
        short = "c",
        long = "coverage",
@@ -73,21 +67,21 @@ pub struct Command {
        short = "n",
        long = "not-coverage",
        default_value = "0.8",
-        help = "if ratio of bad region length on total lengh is lower that this value, all read is mark as bad"
+        help = "if the ratio of bad region length on total length is lower than this value, read is marked as NotCovered"
    )]
    pub not_coverage: f64,

    #[structopt(
        short = "d",
        long = "ondisk",
-        help = "if it set yacrd create tempory file, with value of this parameter as prefix, to reduce memory usage but increase the runtime, warning if prefix contain path separator (`/` for unix or `\\` for windows) directory is delete"
+        help = "yacrd switches to 'ondisk' mode which will reduce memory usage but increase computation time. The value passed as a parameter is used as a prefix for the temporary files created by yacrd. Be careful if the prefix contains path separators (`/` for unix or `\\` for windows) this folder will be deleted"
    )]
    pub ondisk: Option<String>,

    #[structopt(
        long = "ondisk-buffer-size",
        default_value = "64000000",
-        help = "with the default value yacrd in ondisk mode use around 800 MBytes, you can increase to reduce runtime but increase memory usage"
+        help = "with the default value yacrd in 'ondisk' mode use around 1 GBytes, you can increase to reduce runtime but increase memory usage"
    )]
    pub ondisk_buffer_size: String,

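
The `ondisk` help text warns that a prefix containing a path separator points into a directory that will be deleted. A caller could check for that case before running, as in this small sketch. The function name `prefix_has_directory` is hypothetical and this is not how yacrd itself handles the prefix; it only uses `std::path::is_separator` from the standard library.

```rust
/// Returns true when the `ondisk` prefix contains a path separator
/// for the current platform, i.e. when it refers to a directory.
fn prefix_has_directory(prefix: &str) -> bool {
    // `is_separator` accepts `/` everywhere and also `\` on Windows.
    prefix.contains(std::path::is_separator)
}
```

A wrapper script or front end might print a warning, or ask for confirmation, whenever this returns `true`.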
@@ -99,11 +93,11 @@ pub struct Command {
pub enum SubCommand {
    #[structopt(about = "All bad region of read is removed")]
    Scrubb(Scrubb),
-    #[structopt(about = "Record mark as chimeric or Not_covered is filter")]
+    #[structopt(about = "Record mark as chimeric or NotCovered is filter")]
    Filter(Filter),
-    #[structopt(about = "Record mark as chimeric or Not_covered is extract")]
+    #[structopt(about = "Record mark as chimeric or NotCovered is extract")]
    Extract(Extract),
-    #[structopt(about = "Record mark as chimeric or Not_covered is split")]
+    #[structopt(about = "Record mark as chimeric or NotCovered is split")]