25 commits
57601c7
added orthologer tool
ftegenfe Jan 20, 2026
bc871dc
updated category in shed
ftegenfe Jan 20, 2026
9086a62
Update tools/orthologer/ODB-mapper.xml
ftegenfe Jan 21, 2026
5dbdeb7
Update tools/orthologer/ODB-mapper.xml
ftegenfe Jan 21, 2026
af75fe4
Update tools/orthologer/ODB-mapper.xml
ftegenfe Jan 21, 2026
600a161
updated help
ftegenfe Jan 21, 2026
4c5000c
several updates for pull request
ftegenfe Jan 21, 2026
838f730
Update tools/orthologer/ODB-mapper.xml
ftegenfe Jan 21, 2026
6a54b53
Update tools/orthologer/ODB-mapper.xml
ftegenfe Jan 21, 2026
5130974
Update tools/orthologer/ODB-mapper.xml
ftegenfe Jan 21, 2026
3a28088
Update tools/orthologer/macros.xml
ftegenfe Jan 21, 2026
8f17d5d
Update tools/orthologer/macros.xml
ftegenfe Jan 21, 2026
affd0f2
fixes in ODB-mapper
ftegenfe Jan 21, 2026
378ce04
merge
ftegenfe Jan 21, 2026
ce59a4a
Merge branch 'galaxyproject:main' into add_orthologer
ftegenfe Jan 21, 2026
3c9fde9
merge
ftegenfe Jan 21, 2026
1ed7b86
Update tools/orthologer/.shed.yml
ftegenfe Jan 21, 2026
9a5903d
Merge branch 'galaxyproject:main' into add_orthologer
ftegenfe Jan 22, 2026
2cdae52
updated test
ftegenfe Jan 22, 2026
1c5b461
Merge branch 'add_orthologer' of github.com:ftegenfe/tools-iuc-fork i…
ftegenfe Jan 22, 2026
8e14420
added missing test data + cleanup
ftegenfe Jan 22, 2026
21a7313
Update tools/orthologer/ODB-mapper.xml
ftegenfe Jan 23, 2026
ac2611d
new tests, less output data
ftegenfe Jan 23, 2026
c8d9349
added orthologer.xml
ftegenfe Jan 30, 2026
775a323
Merge branch 'galaxyproject:main' into add_orthologer
ftegenfe Feb 2, 2026
10 changes: 10 additions & 0 deletions tools/orthologer/.shed.yml
@@ -0,0 +1,10 @@
name: orthologer
owner: iuc
description: Ortholog detection.
long_description: Ortholog detection for comparative genomics and fast functional annotation behind OrthoDB and BUSCO.
categories:
- Sequence Analysis
- Phylogenetics
homepage_url: https://orthologer.ezlab.org
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/orthologer
type: unrestricted
67 changes: 67 additions & 0 deletions tools/orthologer/ODB-mapper.xml
@@ -0,0 +1,67 @@
<tool id="ODB-mapper" name="Map to orthology" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
Contributor

Shouldn't name and id be orthologer?

Author (ftegenfe, Jan 27, 2026)

Sorry I had missed this comment! True, the orthologer package contains two main tools.
One that computes orthology from a set of fasta files. The other one is based on that one but maps a given fasta file to OrthoDB data. The priority was to first add the mapper tool. My intention is to add the orthologer tool as well.
Maybe another naming scheme would be appropriate?
Thinking about it a bit more - computing orthology is usually done over many fasta files. That's not very suitable in this environment. It will quickly use a lot of resources.
I noticed that FastOMA gives a warning about that particular issue.

Contributor

In this case we should add suite and auto_tool_repositories to the .shed.yml file, see e.g. https://github.com/galaxyproject/tools-iuc/blob/main/tools/ampvis2/.shed.yml

How about name="orthologer map" and <description>FASTA to OrthoDB orthology</description>. This would render as orthologer map: FASTA to OrthoDB orthology.

Contributor

> Thinking about it a bit more - computing orthology is usually done over many fasta files. That's not very suitable in this environment. It will quickly use a lot of resources.

Indeed, in the current setup a separate job would be created for each input fasta, which gives the maximum possible parallelisation (at the cost of job creation overhead). What is the processing time per fasta? An alternative would be to use multiple="true" in the fasta input. Then a single job would be created, and the loop over the files would need to be done in the tool command. (The user may still create appropriately sized jobs by using collections, but this is for advanced users.)

Not sure if the fastoma comment applies to your tool (xref). The reasoning there was that the tool wraps a whole workflow instead of implementing the workflow in Galaxy, which limits the achievable parallelisation.
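
A rough sketch of the multiple="true" variant, showing only the fragments that would change. It assumes that ODB-mapper MAP may be called once per file within a single job, that a per-file label such as odbmap_0 is acceptable, and that node is always set; none of this is confirmed ODB-mapper behaviour. Collecting the per-file results is left out.

```xml
<!-- Fragment of ODB-mapper.xml, not a complete tool file. -->
<command detect_errors="exit_code"><![CDATA[
## one job for all selected inputs; the loop runs inside the tool command
export NTHREADS=\${GALAXY_SLOTS:-1} &&
ODB-mapper SETUP
#for $i, $f in enumerate($fasta)
    ## odbmap_<index> is a hypothetical per-file label (assumption)
    && ODB-mapper MAP odbmap_${i} odbmap_${i}:'$f' '$node'
#end for
]]></command>
<inputs>
    <!-- multiple="true" hands all selected datasets to a single job -->
    <param type="data" name="fasta" format="fasta" multiple="true" label="Input FASTA files"/>
</inputs>
```

With this layout a user who wants one job per file can still map the tool over a collection.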

Author

To summarize, the two modes in orthologer are quite different:

  1. ODB-mapper - maps FASTA to OrthoDB orthology
  2. orthologer - computes orthology among a set of given FASTA files (no input from OrthoDB)

Your suggestion of allowing multiple fasta files to ODB-mapper makes sense. I will do that.
I'm working on the second to test locally but would prefer to publish that one later.

Your suggestion for names also makes sense. I will make the changes.

I will also update the .shed.yml according to your suggestion in order to prepare for including the 2nd tool.

<description>Map FASTA to OrthoDB orthology</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements"/>
<expand macro="version_command"/>
<command detect_errors="exit_code"><![CDATA[
## set the number of threads to be used
export NTHREADS=\${GALAXY_SLOTS:-1} &&
ODB-mapper SETUP &&
ODB-mapper MAP odbmap
odbmap:'$fasta'
#if '$node':
'$node'
#end if
##
## softlink result files - the directory name depends on OrthoDB version used in the mapping
&& results_path="\$(ODB-mapper CONFIG project)/Results"
Member

Out of interest, where is this saved? Can we not provide this path upfront?

Author

As in a previous comment, the path depends on the OrthoDB API version used. In principle the mapper can be run on different OrthoDB versions although I have not implemented it here.

Author

In principle the whole result path could be interesting. In a galaxy context, how can one provide a whole path with its content?

Member

You can use PWD/CWD and put the results in ./results.

Member

ping ...

Contributor

> In principle the whole result path could be interesting. In a galaxy context, how can one provide a whole path with its content?

You mean that you want to provide it as an output? For this the directory datatype could help, or tar.gz?
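
One possible shape for the archive option, as a sketch only. It packs the results directory into an uncompressed tar and exposes it as one extra output. The output name results_archive and the file name results.tar are made up for illustration, and a plain tar is used here instead of tar.gz because the matching compressed datatype was not checked.

```xml
<!-- Fragment of ODB-mapper.xml: the lines would be appended to the existing command. -->
<command detect_errors="exit_code"><![CDATA[
## (existing SETUP and MAP steps come first)
results_path="\$(ODB-mapper CONFIG project)/Results"
## pack the whole results directory into the job working directory
&& tar -cf results.tar -C "\$results_path" .
]]></command>
<outputs>
    <!-- hypothetical extra output holding the complete results directory -->
    <data name="results_archive" format="tar" from_work_dir="results.tar" label="${tool.name} on ${on_string}: full results directory"/>
</outputs>
```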

&& grep -v '^#' \$results_path/odbmap.og.annotations > odbmap.og.annotations
&& grep -v '^#' \$results_path/odbmap.og.odb > odbmap.og.odb
&& grep -v '^#' \$results_path/odbmap.og.hits | awk '{print$1,$2}' | sort > odbmap.og.hits
&& ln -sf \$results_path/odbmap.summary.txt ./
]]></command>
<inputs>
<param type="data" name="fasta" format="fasta" label="Input FASTA file"/>
<param type="integer" name="node" optional="true" label="optional OrthoDB node as NCBI taxid - if not given, it uses busco autolineage to establish node"/>
</inputs>
<outputs>
<data name="annotations" label="${tool.name} on ${on_string} : mapped genes with annotations" format="tsv" from_work_dir="odbmap.og.annotations">
<actions>
<action name="column_names" type="metadata" default="query,odb_og,evalue,score,COG_category,Description,GOs_mf,GOs_bp,EC,KEGG_ko,Interpro" />
</actions>
</data>
<data name="clusters" label="${tool.name} on ${on_string} : clusters with OrthoDB ids" format="txt" from_work_dir="odbmap.og.odb">
<actions>
<action name="column_names" type="metadata" default="odb_og,gene_id,og_type,nvertices,ali_start,ali_end,pid,score,evalue" />
</actions>
</data>
<data name="hits" label="${tool.name} on ${on_string} : mapped genes and OrthoDB cluster ids" format="txt" from_work_dir="odbmap.og.hits">
<actions>
<action name="column_names" type="metadata" default="odb_og,gene_id" />
</actions>
</data>
<data name="summary" label="${tool.name} on ${on_string} : clustering stats" format="txt" from_work_dir="odbmap.summary.txt" />
</outputs>
<tests>
<test expect_num_outputs="4">
<param name="fasta" value="example.fs"/>
<param name="node" value="1489911"/>
<output name="hits" file="refhits.og" lines_diff="2"/>
Contributor

Can you add test assertions for the other outputs as well?

Author

There are 4 outputs. Three contain the clusters in different formats:

  1. annotations - mapped genes with details on annotation, with OrthoDB cluster id's
  2. hits - only the mapped genes together with OrthoDB cluster id's
  3. clusters - full clusters, raw output from the mapping
  4. summary - metadata: percentage of genes mapped, resource usage, OrthoDB API version

Looking through this now, it would be enough to keep only 1, as 2 is just the first two columns of 1.
The summary file could also be dropped. However, it records the OrthoDB API version used, which is crucial, so it is useful to keep.

I suggest the following:

  • remove outputs 2 and 3
  • test against annotations file
  • keep summary output
  • can add a test on the summary output using something weak like the nr of lines for the assertion
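
The proposal could translate into a test along these lines. This is a sketch under assumptions: the annotations file is taken to be tab-separated with the 11 columns listed in its column_names metadata, and the summary check only requires that the file is non-empty, since the real line count is not known here.

```xml
<!-- Fragment of the <tests> section, assuming only annotations and summary remain as outputs. -->
<test expect_num_outputs="2">
    <param name="fasta" value="example.fs"/>
    <param name="node" value="1489911"/>
    <output name="annotations">
        <assert_contents>
            <!-- 11 = number of names in the column_names metadata -->
            <has_n_columns n="11"/>
        </assert_contents>
    </output>
    <output name="summary">
        <assert_contents>
            <!-- deliberately weak: at least one line -->
            <has_n_lines min="1"/>
        </assert_contents>
    </output>
</test>
```

A stricter variant could compare annotations against a reference file, as the current test does for hits.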

</test>
</tests>
<help><![CDATA[

This tool maps a given fasta file against OrthoDB orthology.
Contributor

Does this query online data? How much data is transferred?

Author

Yes, this downloads data, and the amount varies a lot depending on which target level is chosen.
It ranges from 1M for the most specific levels to a few GB for the top node (3.6G for the eukaryota root).
Mapping at higher levels also takes longer.

Contributor

I think we should cache the OrthoDB data locally. Potentially downloading GBs per job does not seem to be a good idea if it can be avoided.

It seems that MAP_ORTHODB_DATA allows specifying where the data is saved. So we could provide this via reference data (a data table).

Author

Ok, yes that is true - not sure how to set it in a galaxy environment though.
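
For illustration, this is roughly how a data table could feed MAP_ORTHODB_DATA. The table name orthodb_data, the .loc file name and the parameter name orthodb_cache are hypothetical, and it is an assumption that exporting the variable before ODB-mapper SETUP is enough for the tool to pick it up.

```xml
<!-- tool_data_table_conf.xml.sample: "orthodb_data" is a hypothetical table name -->
<tables>
    <table name="orthodb_data" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/orthodb_data.loc"/>
    </table>
</tables>

<!-- ODB-mapper.xml, inside <inputs>: let the user pick a cached copy -->
<param name="orthodb_cache" type="select" label="Cached OrthoDB data">
    <options from_data_table="orthodb_data"/>
</param>

<!-- ODB-mapper.xml, start of the command block -->
<command detect_errors="exit_code"><![CDATA[
## point the mapper at the admin-managed cache instead of downloading per job
export MAP_ORTHODB_DATA='$orthodb_cache.fields.path' &&
ODB-mapper SETUP
]]></command>
```

The path column would be filled in by the Galaxy admin or by a data manager, so jobs read from shared storage.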

The level is given as an NCBI taxid (e.g 33208 for Metazoa).
If no level is given, a level is selected using busco autolineage.
Contributor

The auto lineage mode transfers quite a bit of data. Can we use busco reference data that is cached in Galaxy already?

Author

I don't know if we can use cached busco data but yes that would be useful.
Another option is to require the user to provide a level hence avoiding autolineage option.

Contributor

Hi @ftegenfe, check the docs: there are two variables that we should definitely use:

  • BUSCO_OFFLINE set to 0 if BUSCO should run offline - if so it will look in BUSCO_DATA for files
  • BUSCO_DATA BUSCO data install directory

Author

If I do not allow auto lineage, those two will not be used.
But yes if we do allow it, BUSCO_DATA could point to some storage where the BUSCO lineage files reside.

Contributor

My guess would be that you need to do something like:

  • setup a select for a BUSCO DB:
    <param name="cached_db" label="Cached database with lineage" type="select">
  • add BUSCO_OFFLINE=0 && export BUSCO_DATA="$cached_db.fields.path" && at the beginning of the tool script (can you double check if 0 really means offline -- looks strange to me)

Since the reference data is huge we need to trick a bit, see:

## tool tests can not run with --offline (otherwise we would need to store a lot of data at IUC)

If a lineage needs to be selected somewhere you can do it like so:
<param argument="--lineage_dataset" type="select" label="Lineage">

Author

Hi, BUSCO_OFFLINE=1 means offline.
At the moment I have disabled the possibility to run with auto-lineage. That is, the mapping level parameter is now required.
I have been away from work for a bit. I will have a look at the data tables documentation links you sent with respect to setting MAP_ORTHODB_DATA.
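
Should auto-lineage be re-enabled later, the cached BUSCO setup discussed above might look like the sketch below. It uses BUSCO_OFFLINE=1 for offline mode, as stated in this thread. The data table name busco_database_options is an assumption about what the Galaxy server provides and needs to be checked against the existing BUSCO wrapper.

```xml
<!-- ODB-mapper.xml, inside <inputs>: select a cached BUSCO lineage database -->
<param name="cached_db" label="Cached database with lineage" type="select">
    <!-- table name is an assumption; verify it against the BUSCO wrapper -->
    <options from_data_table="busco_database_options"/>
</param>

<!-- ODB-mapper.xml, start of the command block -->
<command detect_errors="exit_code"><![CDATA[
## run BUSCO offline and read lineage files from the cached directory
export BUSCO_OFFLINE=1 &&
export BUSCO_DATA='$cached_db.fields.path' &&
ODB-mapper SETUP
]]></command>
```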


Taxids can be found at `NCBI <https://www.ncbi.nlm.nih.gov/taxonomy>`_ .

For more information
* `orthologer user manual <https://orthologer.ezlab.org>`_

]]></help>
<expand macro="citations"/>
</tool>
18 changes: 18 additions & 0 deletions tools/orthologer/macros.xml
@@ -0,0 +1,18 @@
<macros>
<token name="@TOOL_VERSION@">3.9.0</token>
<token name="@VERSION_SUFFIX@">0</token>
<token name="@PROFILE@">25.0</token>
<xml name="citations">
<citations>
<citation type="doi">10.1093/nar/gkae987</citation>
</citations>
</xml>
<xml name="version_command">
<version_command><![CDATA[ODB-mapper CONFIG]]></version_command>
</xml>
<xml name="requirements">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">orthologer</requirement>
</requirements>
</xml>
</macros>