added orthologer tool #7604
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| name: orthologer | ||
| owner: iuc | ||
| description: Ortholog detection. | ||
| long_description: Ortholog detection for comparative genomics and fast functional annotation behind OrthoDB and BUSCO. | ||
| categories: | ||
| - Sequence Analysis | ||
| - Phylogenetics | ||
| homepage_url: https://orthologer.ezlab.org | ||
| remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/orthologer | ||
| type: unrestricted |
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,67 @@ | ||||||||
| <tool id="ODB-mapper" name="Map to orthology" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | ||||||||
| <description>Map FASTA to OrthoDB orthology</description> | ||||||||
| <macros> | ||||||||
| <import>macros.xml</import> | ||||||||
| </macros> | ||||||||
| <expand macro="requirements"/> | ||||||||
| <expand macro="version_command"/> | ||||||||
| <command detect_errors="exit_code"><![CDATA[ | ||||||||
| ## set the number of threads to be used | ||||||||
| export NTHREADS=\${GALAXY_SLOTS:-1} && | ||||||||
| ODB-mapper SETUP && | ||||||||
| ODB-mapper MAP odbmap | ||||||||
| odbmap:'$fasta' | ||||||||
| #if '$node': | ||||||||
| '$node' | ||||||||
| #end if | ||||||||
| ## | ||||||||
| ## softlink result files - the directory name depends on OrthoDB version used in the mapping | ||||||||
| && results_path="\$(ODB-mapper CONFIG project)/Results" | ||||||||
|
Member
Out of interest, where is this saved? Can we not provide this path up front?
Author
As in a previous comment, the path depends on the OrthoDB API version used. In principle the mapper can be run on different OrthoDB versions, although I have not implemented it here.
Author
In principle the whole result path could be interesting. In a Galaxy context, how can one provide a whole path with its content?
Member
You can use PWD/CWD and put the results in ./results.
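A minimal shell sketch of this suggestion, assuming the only goal is to expose the whole result directory under the job working directory. The temporary directory is a stand-in for whatever `ODB-mapper CONFIG project` prints in the command block above. The real tool is not invoked, and the file content is made up for the demo.

```shell
# Stand-in for the project directory reported by the mapper (assumption for this demo).
project_dir="$(mktemp -d)"
mkdir -p "$project_dir/Results"
echo "demo summary" > "$project_dir/Results/odbmap.summary.txt"

# Link the whole result directory into the current working directory as ./results.
# -n replaces an existing ./results link instead of descending into it.
ln -sfn "$project_dir/Results" ./results

# Files are now reachable through a fixed relative path.
ls ./results
```

A fixed relative path means later steps need not know the version-dependent directory name.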
Member
ping ...
Contributor
You mean that you want to provide it as an output? For this the |
||||||||
| && grep -v '^#' \$results_path/odbmap.og.annotations > odbmap.og.annotations | ||||||||
| && grep -v '^#' \$results_path/odbmap.og.odb > odbmap.og.odb | ||||||||
| && grep -v '^#' \$results_path/odbmap.og.hits | awk '{print$1,$2}' | sort > odbmap.og.hits | ||||||||
| && ln -sf \$results_path/odbmap.summary.txt ./ | ||||||||
| ]]></command> | ||||||||
| <inputs> | ||||||||
| <param type="data" name="fasta" format="fasta" label="Input FASTA file"/> | ||||||||
bgruening marked this conversation as resolved.
|
||||||||
| <param type="integer" name="node" optional="true" label="optional OrthDB node as NCBI taxid - if not given, it uses busco autolineage to establish node"/> | ||||||||
bgruening marked this conversation as resolved.
|
||||||||
| </inputs> | ||||||||
| <outputs> | ||||||||
| <data name="annotations" label="${tool.name} on ${on_string} : mapped genes with annotations" format="tsv" from_work_dir="odbmap.og.annotations"> | ||||||||
ftegenfe marked this conversation as resolved.
|
||||||||
| <actions> | ||||||||
| <action name="column_names" type="metadata" default="query,odb_og,evalue,score,COG_category,Description,GOs_mf,GOs_bp,EC,KEGG_ko,Interpro" /> | ||||||||
| </actions> | ||||||||
| </data> | ||||||||
| <data name="clusters" label="${tool.name} on ${on_string} : clusters with OrthoDB ids" format="txt" from_work_dir="odbmap.og.odb"> | ||||||||
| <actions> | ||||||||
| <action name="column_names" type="metadata" default="odb_og,gene_id,og_type,nvertices,ali_start,ali_end,pid,score,evalue" /> | ||||||||
| </actions> | ||||||||
| </data> | ||||||||
| <data name="hits" label="${tool.name} on ${on_string} : mapped genes and OrthoDB cluster ids" format="txt" from_work_dir="odbmap.og.hits"> | ||||||||
| <actions> | ||||||||
| <action name="column_names" type="metadata" default="odb_og,gene_id" /> | ||||||||
| </actions> | ||||||||
| </data> | ||||||||
| <data name="summary" label="${tool.name} on ${on_string} : clustering stats" format="txt" from_work_dir="odbmap.summary.txt" /> | ||||||||
| </outputs> | ||||||||
| <tests> | ||||||||
| <test expect_num_outputs="4"> | ||||||||
| <param name="fasta" value="example.fs"/> | ||||||||
| <param name="node" value="1489911"/> | ||||||||
| <output name="hits" file="refhits.og" lines_diff="2"/> | ||||||||
|
||||||||
| </test> | ||||||||
| </tests> | ||||||||
| <help><![CDATA[ | ||||||||
|
|
||||||||
| This tool maps a given fasta file against OrthoDB orthology. | ||||||||
|
Contributor
Does this query online data? How much data is transferred?
Author
Yes, this downloads data, and the amount may vary a lot depending on which target level is chosen.
Contributor
I think we should cache the OrthoDB data locally. Potentially downloading gigabytes per job does not seem like a good idea if it can be avoided. Seems that
Author
Ok, yes that is true - not sure how to set it in a Galaxy environment though.
Contributor
Some docs are here: https://docs.galaxyproject.org/en/master/admin/data_tables.html https://docs.galaxyproject.org/en/master/dev/data_managers.html Maybe the busco data table gives a bit of inspiration... |
||||||||
| The level is given as an NCBI taxid (e.g 33208 for Metazoa). | ||||||||
| If no level is given, a level is selected using busco autolineage. | ||||||||
|
||||||||
| <param name="cached_db" label="Cached database with lineage" type="select"> |
BUSCO_OFFLINE=0 && export BUSCO_DATA="$cached_db.fields.path" && at the beginning of the tool script (can you double check if 0 really means offline? It looks strange to me).
Since the reference data is huge we need to trick a bit, see:
tools-iuc/tools/busco/busco.xml
Line 20 in 157d1ec
| ## tool tests can not run with --offline (otherwise we would need to store a lot of data at IUC) |
If a lineage needs to be selected somewhere you can do it like so:
tools-iuc/tools/busco/busco.xml
Line 222 in 157d1ec
| <param argument="--lineage_dataset" type="select" label="Lineage"> |
Hi, BUSCO_OFFLINE=1 means offline.
At the moment I have disabled the possibility to run with auto-lineage. That is, the mapping level parameter is required.
I have been away a bit from work. I will have a look at the data tables documentation links you sent wrt setting MAP_ORTHODB_DATA.
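A rough sketch of how the environment setup discussed in this thread might look at the start of the command block, assuming the variables are read from the environment as the comments suggest. The cache path and the CACHED_DB_PATH name are placeholders. In a Galaxy tool the path would come from the selected data table entry, and whether MAP_ORTHODB_DATA needs a similar setting is still open in this discussion.

```shell
# Placeholder for the path taken from the selected data table entry (hypothetical value).
cached_db_path="${CACHED_DB_PATH:-/tmp/orthodb_cache}"

# Per the author's reply, 1 (not 0) means offline.
export BUSCO_OFFLINE=1
export BUSCO_DATA="$cached_db_path"

echo "BUSCO_OFFLINE=$BUSCO_OFFLINE BUSCO_DATA=$BUSCO_DATA"
```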
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| <macros> | ||
| <token name="@TOOL_VERSION@">3.9.0</token> | ||
| <token name="@VERSION_SUFFIX@">0</token> | ||
| <token name="@PROFILE@">25.0</token> | ||
| <xml name="citations"> | ||
| <citations> | ||
| <citation type="doi">10.1093/nar/gkae987</citation> | ||
| </citations> | ||
| </xml> | ||
| <xml name="version_command"> | ||
| <version_command><![CDATA[ODB-mapper CONFIG]]></version_command> | ||
ftegenfe marked this conversation as resolved.
|
||
| </xml> | ||
| <xml name="requirements"> | ||
| <requirements> | ||
| <requirement type="package" version="@TOOL_VERSION@">orthologer</requirement> | ||
| </requirements> | ||
| </xml> | ||
| </macros> | ||
Shouldn't name and id be orthologer?
Sorry I had missed this comment! True, the orthologer package contains two main tools.
One that computes orthology from a set of fasta files. The other one is based on that one but maps a given fasta file to OrthoDB data. The priority was to first add the mapper tool. My intention is to add the orthologer tool as well.
Maybe another naming scheme would be appropriate?
Thinking about it a bit more - computing orthology is usually done over many fasta files. That's not very suitable in this environment. It will quickly use a lot of resources.
I noticed that FastOMA gives a warning about that particular issue.
In this case we should add suite and auto_tool_repositories to the .shed.yaml file, see e.g. https://github.com/galaxyproject/tools-iuc/blob/main/tools/ampvis2/.shed.yml
How about name="orthologer map" and <description>FASTA to OrthoDB orthology</description>? This would render as orthologer map: FASTA to OrthoDB orthology.
Indeed, in the current setup a separate job would be created for each input fasta, which gives the maximum possible parallelisation (but with the overhead of job creation). What is the processing time per fasta? An alternative would be to use multiple="true" in the fasta input. Then a single job would be created and the loop over the files needs to be done in the tool command. (The user may still create appropriately sized jobs by using collections, but this is then for advanced users.)
Not sure if the fastoma comment applies to your tool (xref. The reasoning there was that the tool wraps a whole workflow instead of implementing the workflow in Galaxy - which limits achievable parallelisation.
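A minimal shell sketch of the single-job alternative, assuming the selected files arrive as a plain list of paths. The map_one function is a hypothetical stand-in that only prints what it would do. The real ODB-mapper call and its labelling scheme for several inputs are not shown here, and the query_N labels are invented for illustration.

```shell
# Hypothetical stand-in for the real mapping call; it only prints what it would do.
map_one() {
    echo "would map $2 as $1"
}

# Run the stand-in once per input file, giving each run its own label.
map_all() {
    i=0
    for fasta in "$@"; do
        i=$((i + 1))
        map_one "query_$i" "$fasta"
    done
}

map_all sample_a.fasta sample_b.fasta
```

The loop runs the files one after another inside a single job, which trades the per-file parallelisation mentioned above for lower job-creation overhead.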
To summarize, the two modes in orthologer are quite different:
Your suggestion of allowing multiple fasta files to ODB-mapper makes sense. I will do that.
I'm working on the second to test locally but would prefer to publish that one later.
Your suggestion for names also makes sense. I will make the changes.
I will also update the .shed.yml according to your suggestion in order to prepare for including the 2nd tool.