Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'master' into DOR-993_fix_auto_batchsize_for_short_chunk…
…_supv5
Showing 46 changed files with 1,406 additions and 391 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
### Custom Adapter and Primer Sequences | ||
|
||
By default, dorado automatically detects and trims any adapter or primer sequences it finds. The specific sequences it searches for depend on the sequencing kit. This applies to both the basecaller subcommand, where the kit name is expected to be embedded in each read of the input pod5 file, and the trim subcommand, where the kit must be specified as a command-line option to dorado. | ||
|
||
In some cases, it may be necessary to find and remove adapter and/or primer sequences that would not normally be associated with the sequencing kit that was used, or you may be working with older data for which the sequencing kit and/or primers being used are no longer directly supported by dorado (for example, anything prior to kit14). In such cases, you can specify a custom adapter/primer file, using the command-line option `--primer-sequences`. | ||
|
||
If this option is used, then the sequences encoded in the specified file will be used instead of the built-in sequences that dorado normally searches for. | ||
|
||
#### Custom adapter/primer file format | ||
|
||
The custom adapter/primer file is a FASTA file containing the desired sequences. However, some additional metadata is needed in each record header so that dorado can interpret how the sequences should be used. | ||
|
||
* The record name for each sequence must be of the form `[id]_front` or `[id]_rear`. | ||
* The `id` part of the record name may occur at most twice in the file: once with `_front` and once with `_rear`. | ||
* Immediately following the record name must be a space, followed by either `type=adapter` or `type=primer`. | ||
* Following the type designator, you can have an additional space, followed by `kits=[kit1],[kit2],[kit3][etc...]`. | ||
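The rules above can be checked before handing a file to dorado. The following is a minimal sketch, not dorado's own parser: `parse_header` and `check_headers` are hypothetical helper names, and the regular expression assumes that the `id` and the kit names contain no whitespace.

```python
import re

# Hypothetical validator (not part of dorado). Assumes ids and kit names
# contain no whitespace.
HEADER_RE = re.compile(
    r"^>(?P<id>\S+)_(?P<side>front|rear)"
    r" type=(?P<type>adapter|primer)"
    r"(?: kits=(?P<kits>\S+))?$"
)


def parse_header(line):
    """Parse one FASTA header line into its id, side, type and kits."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed record header: {line!r}")
    kits = match.group("kits")
    return {
        "id": match.group("id"),
        "side": match.group("side"),
        "type": match.group("type"),
        "kits": kits.split(",") if kits else [],
    }


def check_headers(lines):
    """Validate every header line; reject a repeated id/side pair."""
    seen = set()
    records = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are not checked in this sketch
        record = parse_header(line)
        key = (record["id"], record["side"])
        if key in seen:
            raise ValueError(f"duplicate record: {record['id']}_{record['side']}")
        seen.add(key)
        records.append(record)
    return records
```

Calling `check_headers(open("custom_primers.fasta"))` (the file name is only an illustration) would raise `ValueError` on the first header that breaks one of the rules.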
|
||
The `_front` or `_rear` suffix of the record name tells dorado how to search for the sequence. In the case of adapters, dorado will look for the `front` sequence near the beginning of the read, and for the `rear` sequence near the end of the read. For primers, dorado also looks for the `front` and `rear` sequences at the beginning and end of the read, just as with adapters, but it will additionally look for the reverse-complement of the `rear` sequence near the beginning of the read, and for the reverse-complement of the `front` sequence near the end of the read. | ||
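To make the search orientations concrete, here is a small sketch that lists which sequences would be looked for at each end of a read. It only illustrates the description above and says nothing about how dorado performs the actual matching; `reverse_complement` and `search_plan` are hypothetical names.

```python
# Illustration only: which sequences are searched for at which end of a read.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse-complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def search_plan(front, rear, seq_type):
    """Map each read end to the sequences searched for near it."""
    plan = {"read_start": [front], "read_end": [rear]}
    if seq_type == "primer":
        # Primers are additionally searched for in the opposite orientation.
        plan["read_start"].append(reverse_complement(rear))
        plan["read_end"].append(reverse_complement(front))
    return plan
```

For an adapter pair the plan holds one sequence per read end; for a primer pair it holds two per end, the second being the reverse-complement of the opposite record.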
|
||
The `type` designator is required. It states whether the sequence is an adapter or a primer sequence, so that dorado knows how it should be used. | ||
|
||
The `kits` designator is optional. If provided, then the sequence will only be searched for if the sequencing-kit information in the read matches one of the kit names in the custom file. If the `kits` designator is not provided, then the sequence will be searched for in all reads, regardless of the kit that was used. Note that the kit names are case-insensitive. | ||
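The kit filter described above amounts to a case-insensitive membership test. A minimal sketch follows, with `applies_to_read` as a hypothetical name; dorado's internal logic may differ in detail.

```python
def applies_to_read(record_kits, read_kit):
    """Decide whether a custom sequence should be searched for in a read.

    record_kits: kit names from the record's `kits=` designator
                 (an empty list when the designator is absent).
    read_kit:    sequencing-kit name associated with the read.
    """
    if not record_kits:
        return True  # no `kits` designator: search in all reads
    wanted = {kit.lower() for kit in record_kits}
    return read_kit.lower() in wanted  # kit names are case-insensitive
```

With `kits=SQK-PSK004,SQK-LSK109`, a read whose kit is recorded as `sqk-lsk109` would still match, while a read from any other kit would be skipped for that sequence.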
|
||
#### Example custom adapter/primer file | ||
|
||
The following could be used to detect the PCR_PSK_rev1 and PCR_PSK_rev2 primers, along with the LSK109 adapters, for older data. | ||
|
||
``` | ||
>LSK109_front type=adapter | ||
AATGTACTTCGTTCAGTTACGTATTGCT | ||
>LSK109_rear type=adapter | ||
AGCAATACGTAACTGAACGAAGT | ||
>PCR_PSK_front type=primer | ||
ACTTGCCTGTCGCTCTATCTTCGGCGTCTGCTTGGGTGTTTAACC | ||
>PCR_PSK_rear type=primer | ||
AGGTTAAACACCCAAGCAGACGCCGCAATATCAGCACCAACAGAAA | ||
``` | ||
|
||
In this case, the above adapters and primers would be searched for in all reads, regardless of the sequencing-kit information encoded in the read file or, in the case of dorado trim, regardless of the sequencing kit specified on the command line. To restrict the search so that the primers are only searched for in reads with `SQK-PSK004` as the kit name, and the adapters are only searched for if the kit name is either `SQK-PSK004` or `SQK-LSK109`, the following could be used. | ||
|
||
``` | ||
>LSK109_front type=adapter kits=SQK-PSK004,SQK-LSK109 | ||
AATGTACTTCGTTCAGTTACGTATTGCT | ||
>LSK109_rear type=adapter kits=SQK-PSK004,SQK-LSK109 | ||
AGCAATACGTAACTGAACGAAGT | ||
>PCR_PSK_front type=primer kits=SQK-PSK004 | ||
ACTTGCCTGTCGCTCTATCTTCGGCGTCTGCTTGGGTGTTTAACC | ||
>PCR_PSK_rear type=primer kits=SQK-PSK004 | ||
AGGTTAAACACCCAAGCAGACGCCGCAATATCAGCACCAACAGAAA | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -588,9 +588,6 @@ const polisher::ModelConfig resolve_model(const polisher::BamInfo& bam_info, | |
// Example: dna_r10.4.1_e8.2_400bps_hac@v5.0.0_polish_rl_mv | ||
std::string model_name = basecaller_model + polish_model_suffix; | ||
|
||
// Example: dna_r10.4.1_e8.2_400bps_hac_v5.0.0_polish_rl_mv | ||
std::replace(std::begin(model_name), std::end(model_name), '@', '_'); | ||
|
||
spdlog::info("Downloading model: '{}'", model_name); | ||
model_dir = download_model(model_name); | ||
|
||
|
@@ -613,9 +610,6 @@ const polisher::ModelConfig resolve_model(const polisher::BamInfo& bam_info, | |
// Example: dna_r10.4.1_e8.2_400bps_hac@v5.0.0_polish_rl_mv | ||
std::string model_name = basecaller_model + polish_model_suffix; | ||
|
||
// Example: dna_r10.4.1_e8.2_400bps_hac_v5.0.0_polish_rl_mv | ||
std::replace(std::begin(model_name), std::end(model_name), '@', '_'); | ||
|
||
spdlog::info("Downloading model: '{}'", model_name); | ||
model_dir = download_model(model_name); | ||
|
||
|