Commit 47bcc0e

Improve documentation for findorfs function and NaiveFinder algorithm; clarify usage examples and enhance explanations of keyword arguments and output.
1 parent 2b8b55d commit 47bcc0e

1 file changed

+16
-7
lines changed

docs/src/getstarted.md

Lines changed: 16 additions & 7 deletions
@@ -1,14 +1,17 @@
11
## Finding complete and overlapped ORFIs
22

3-
The main package function is `findorfs`. Under the hood, the `findorfs` function is an interface for different gene finding algorithms that can be plugged using the `finder` keyword argument. By default it uses the `NaiveFinder` algorithm, which is a simple algorithm that finds all (non-outbounded) ORFIs in a DNA sequence (see the [NaiveFinder](https://camilogarciabotero.github.io/GeneFinder.jl/dev/api/#GeneFinder.NaiveFinder-Union{Tuple{Union{BioSequences.LongDNA{N},%20BioSequences.LongSubSeq{BioSequences.DNAAlphabet{N}}}},%20Tuple{N}}%20where%20N) documentation for more details).
3+
The main function in the GeneFinder package is `findorfs`, which serves as an interface to various gene-finding algorithms. By default, `findorfs` uses the `NaiveFinder` algorithm, a simple approach that detects all non-outbounded Open Reading Frames (ORFs) in a DNA sequence. You can also specify a different algorithm by setting the `finder` keyword argument. For more details on the algorithm, see the [NaiveFinder](https://camilogarciabotero.github.io/GeneFinder.jl/dev/api/#GeneFinder.NaiveFinder-Union{Tuple{Union{BioSequences.LongDNA{N},%20BioSequences.LongSubSeq{BioSequences.DNAAlphabet{N}}}},%20Tuple{N}}%20where%20N) documentation.
44

55
!!! note
6-
The `minlen` kwarg in the `NaiveFinder` mehtod has been set to 6nt, so it will catch random ORFIs not necesarily genes thus it might consider `dna"ATGTGA"` -> `aa"M*"` as a plausible ORFI.
6+
The `minlen` keyword argument in `NaiveFinder` is set to 6 nucleotides (nt). As a result, it may identify short ORFs that aren't necessarily genes, such as `dna"ATGTGA"`, which translates to the amino acid sequence `aa"M*"`.
77

8-
Here is an example of how to use the `findorfs` function with the `NaiveFinder` algorithm:
8+
9+
## Usage example
10+
11+
Here's an example of using `findorfs` with the `NaiveFinder` algorithm to identify ORFs in a DNA sequence:
912

1013
```julia
11-
using BioSequences, GeneFinder
14+
julia> using BioSequences, GeneFinder
1215

1316
# > 180195.SAMN03785337.LFLS01000089 -> finds only 1 gene in Prodigal (from Pyrodigal tests)
1417
seq = dna"AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAACAGCACTGGCAATCTGACTGTGGGCGGTGTTACCAACGGCACTGCTACTACTGGCAACATCGCACTGACCGGTAACAATGCGCTGAGCGGTCCGGTCAATCTGAATGCGTCGAATGGCACGGTGACCTTGAACACGACCGGCAATACCACGCTCGGTAACGTGACGGCACAAGGCAATGTGACGACCAATGTGTCCAACGGCAGTCTGACGGTTACCGGCAATACGACAGGTGCCAACACCAACCTCAGTGCCAGCGGCAACCTGACCGTGGGTAACCAGGGCAATATCAGTACCGCAGGCAATGCAACCCTGACGGCCGGCGACAACCTGACGAGCACTGGCAATCTGACTGTGGGCGGCGTCACCAACGGCACGGCCACCACCGGCAACATCGCGCTGACCGGTAACAATGCACTGGCTGGTCCTGTCAATCTGAACGCGCCGAACGGCACCGTGACCCTGAACACAACCGGCAATACCACGCTGGGTAATGTCACCGCACAAGGCAATGTGACGACTAATGTGTCCAACGGCAGCCTGACAGTCGCTGGCAATACCACAGGTGCCAACACCAACCTGAGTGCCAGCGGCAATCTGACCGTGGGCAACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAGC"
@@ -30,10 +33,12 @@ orfs = findorfs(seq, finder=NaiveFinder) # use finder=NaiveCollector as an alter
3033
ORFI{NaiveFinder}(695:706, '+', 2)
3134
```
3235

36+
## Extracting Sequences from ORFIs
37+
3338
The `ORFI` structure displays the location, frame, and strand, but currently does not include the sequence *per se*. To extract the sequence of an `ORFI` instance, you can use the `sequence` method directly on it, or you can also broadcast it over the `orfs` collection using the dot syntax `.`:
3439

3540
```julia
36-
sequence.(orfs)
41+
julia> sequence.(orfs)
3742

3843
12-element Vector{LongSubSeq{DNAAlphabet{4}}}:
3944
ATGCAACCCTGA
@@ -50,10 +55,12 @@ sequence.(orfs)
5055
ATGCAACCCTGA
5156
```
5257

58+
## Translating ORFIs to Amino Acid Sequences
59+
5360
Similarly, you can extract the amino acid sequences of the ORFIs using the `translate` function.
5461

5562
```julia
56-
translate.(orfs)
63+
julia> translate.(orfs)
5764

5865
12-element Vector{LongAA}:
5966
MQP*
@@ -68,4 +75,6 @@ translate.(orfs)
6875
M*
6976
MCPTAA*
7077
MQP*
71-
```
78+
```
79+
80+
This returns a vector of translated amino acid sequences, allowing for easy interpretation of each ORF's potential protein product.
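
To make the idea of a naive ORF search and the effect of `minlen` more concrete, here is a minimal, self-contained sketch in plain Julia. It is an illustration only and not GeneFinder's implementation: `naive_orfs` is a hypothetical helper, it works on an ordinary `String` instead of a `LongDNA`, and it scans the forward strand only. It assumes an ORF is an `ATG`, followed by in-frame codons, up to the first in-frame stop codon.

```julia
# Illustrative sketch only: `naive_orfs` is a hypothetical helper,
# not part of GeneFinder.jl, and it ignores the reverse strand.
function naive_orfs(seq::AbstractString; minlen::Int = 6)
    # ATG, then as few codons as possible (lazy), then a stop codon.
    # The lazy quantifier makes each match end at the first in-frame stop.
    pattern = r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)"
    found = UnitRange{Int}[]
    # overlap = true restarts the search one position after each match start,
    # so ORFs nested inside or overlapping a previous one are also reported.
    for m in eachmatch(pattern, seq; overlap = true)
        len = ncodeunits(m.match)
        if len >= minlen
            push!(found, m.offset:(m.offset + len - 1))
        end
    end
    return found
end

naive_orfs("ATGTGA")                 # [1:6], the minimal start + stop case
naive_orfs("ATGATGTAA")              # [1:9, 4:9], two overlapping ORFs
naive_orfs("ATGATGTAA"; minlen = 9)  # [1:9], the shorter ORF is filtered out
```

Raising `minlen` in this sketch drops the shortest matches, which is the same trade-off the note above describes: a 6 nt threshold keeps even `ATGTGA`, while a larger threshold discards ORFs that are unlikely to be genes.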
