added orthologer tool #7604

ftegenfe · 2026-01-20T15:53:40Z

FOR CONTRIBUTOR:

- I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
- License permits unrestricted use (educational + commercial)
- This PR adds a new tool or tool collection
- This PR updates an existing tool or tool collection
- This PR does something else (explain below)

tools/orthologer/ODB-mapper.xml

bgruening · 2026-01-20T20:22:50Z

tools/orthologer/ODB-mapper.xml

+    #end if
+    ##
+    ## softlink result files - the directory name depends on OrthoDB version used in the mapping
+    && results_path="\$(ODB-mapper CONFIG project)/Results"


Out of interest, where is this saved? Can we not provide this path upfront?

As in a previous comment, the path depends on the OrthoDB API version used. In principle the mapper can be run on different OrthoDB versions although I have not implemented it here.

In principle the whole result path could be interesting. In a galaxy context, how can one provide a whole path with its content?

You can use PWD/CWD and put the results in ./results.

> In principle the whole result path could be interesting. In a galaxy context, how can one provide a whole path with its content?

You mean that you want to provide it as an output? For this the directory datatype could help, or tar.gz?
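For illustration, the archive option could look roughly like the fragment below. This is only a sketch under assumptions: the output name `results_tar` is hypothetical and not part of this PR, plain `tar` is used instead of tar.gz to keep it short, and the `results_path` line is taken from the diff above.

```xml
<!-- Sketch only: "results_tar" is a hypothetical output name, not part of this PR. -->
<command detect_errors="exit_code"><![CDATA[
    ## ... the existing ODB-mapper call goes here, followed by "&&" ...
    results_path="\$(ODB-mapper CONFIG project)/Results"
    ## pack the whole Results directory into a single file that Galaxy can collect
    && tar -cf '$results_tar' -C "\$results_path" .
]]></command>
<outputs>
    <data name="results_tar" format="tar" label="${tool.name} on ${on_string}: results archive"/>
</outputs>
```

A single archive keeps the version-dependent directory name inside the job, so the wrapper does not need to know it upfront.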


bgruening · 2026-01-22T14:45:44Z

Ok, then please clean up the test comment and fix the last failing test:

Output hits: Test output file (refhits.og) is missing. If you are using planemo, try adding --update_test_data to generate it.

bgruening

I think this one is ready to go ... just one open comment. Would you like to look at this?

bernt-matthias · 2026-01-22T18:38:22Z

tools/orthologer/ODB-mapper.xml

@@ -0,0 +1,67 @@
+<tool id="ODB-mapper" name="Map to orthology" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">


Shouldn't name and id be orthologer?

Sorry, I had missed this comment! True, the orthologer package contains two main tools.
One computes orthology from a set of FASTA files. The other builds on the first but maps a given FASTA file to OrthoDB data. The priority was to add the mapper tool first. My intention is to add the orthologer tool as well.
Maybe another naming scheme would be appropriate?
Thinking about it a bit more - computing orthology is usually done over many fasta files. That's not very suitable in this environment. It will quickly use a lot of resources.
I noticed that FastOMA gives a warning about that particular issue.

In this case we should add suite and auto_tool_repositories to the .shed.yml file, see e.g. https://github.com/galaxyproject/tools-iuc/blob/main/tools/ampvis2/.shed.yml

How about name="orthologer map" and <description>FASTA to OrthoDB orthology</description>. This would render as orthologer map: FASTA to OrthoDB orthology.

> Thinking about it a bit more - computing orthology is usually done over many fasta files. That's not very suitable in this environment. It will quickly use a lot of resources.

Indeed, in the current setup a separate job would be created for each input FASTA, which gives the maximum possible parallelisation (at the cost of job-creation overhead). What is the processing time per FASTA? An alternative would be to use multiple="true" in the FASTA input. Then a single job would be created and the loop over the files would need to be done in the tool command. (The user may still create appropriately sized jobs by using collections, but this is then for advanced users.)
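A rough sketch of the multiple="true" variant, assuming the existing `fasta` parameter name from the test in this PR. The loop body is a placeholder, since the actual ODB-mapper invocation is not shown in this thread, and the `input_N.fs` file names are hypothetical.

```xml
<!-- Sketch only: the loop body is a placeholder for the real mapper call. -->
<inputs>
    <param name="fasta" type="data" format="fasta" multiple="true" label="FASTA files to map"/>
</inputs>
<command detect_errors="exit_code"><![CDATA[
    #for $i, $f in enumerate($fasta)
        ## one symlink per input, so each run sees a predictable file name
        ln -s '$f' 'input_${i}.fs' &&
        ## placeholder: run the mapper on this input here, followed by "&&"
    #end for
    echo "all inputs processed"
]]></command>
```

With this layout one job handles all selected files in sequence; users who want parallel jobs can still pass a collection.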

Not sure if the FastOMA comment applies to your tool (xref). The reasoning there was that the tool wraps a whole workflow instead of implementing the workflow in Galaxy, which limits achievable parallelisation.

To summarize, the two modes in orthologer are quite different:

- ODB-mapper - maps FASTA to OrthoDB orthology
- orthologer - computes orthology among a set of given FASTA files (no input from OrthoDB)

Your suggestion of allowing multiple FASTA files for ODB-mapper makes sense. I will do that.
I'm working on the second tool to test it locally, but would prefer to publish that one later.

Your suggestion for names also makes sense. I will make the changes.

I will also update the .shed.yml according to your suggestion in order to prepare for including the 2nd tool.

tools/orthologer/macros.xml

bernt-matthias · 2026-01-22T18:41:02Z

tools/orthologer/ODB-mapper.xml

+    #end if
+    ##
+    ## softlink result files - the directory name depends on OrthoDB version used in the mapping
+    && results_path="\$(ODB-mapper CONFIG project)/Results"


In principle the whole result path could be interesting. In a galaxy context, how can one provide a whole path with its content?

You mean that you want to provide it as an output? For this the directory datatype could help, or tar.gz?

tools/orthologer/ODB-mapper.xml

bernt-matthias · 2026-01-22T18:44:22Z

tools/orthologer/ODB-mapper.xml

+        <test expect_num_outputs="4">
+            <param name="fasta" value="example.fs"/>
+            <param name="node" value="1489911"/>
+            <output name="hits" file="refhits.og" lines_diff="2"/>


Can you add test assertions for the other outputs as well?

There are 4 outputs. Three contain the clusters in different formats:

1. annotations - mapped genes with annotation details and OrthoDB cluster IDs
2. hits - only the mapped genes together with their OrthoDB cluster IDs
3. clusters - full clusters, raw output from the mapping
4. summary - metadata: percentage of genes mapped, resource usage, OrthoDB API version

Looking through this now, it would be enough to keep only output 1, as output 2 is just the first two columns of output 1.
The summary file can also be ignored. However, it records the OrthoDB API version used, which is crucial, so it is useful to keep.

I suggest the following:

- remove outputs 2 and 3
- test against the annotations file
- keep the summary output
- optionally add a test on the summary output using a weak assertion such as the number of lines
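As a sketch of what such a test could look like: the output names `annotations` and `summary` follow the list above but are assumptions about the final wrapper, the reference file name `annotations.tsv` is hypothetical, and the line threshold is a placeholder rather than a measured value.

```xml
<!-- Sketch only: output names, the reference file name and the threshold are placeholders. -->
<test expect_num_outputs="2">
    <param name="fasta" value="example.fs"/>
    <param name="node" value="1489911"/>
    <output name="annotations" file="annotations.tsv" lines_diff="2"/>
    <output name="summary">
        <assert_contents>
            <!-- weak check: the summary exists and is not empty -->
            <has_n_lines min="1"/>
        </assert_contents>
    </output>
</test>
```

A weak assertion like this avoids comparing resource-usage values that change between runs.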

bernt-matthias · 2026-01-22T18:45:21Z

tools/orthologer/ODB-mapper.xml

+    </tests>
+    <help><![CDATA[
+
+    This tool maps a given fasta file against OrthoDB orthology.


Does this query online data? How much data is transferred?

Yes, this downloads data, and the amount varies a lot depending on which target level is chosen.
It ranges from about 1 MB for the most specific levels to a few GB for the top nodes (3.6 GB for the Eukaryota root).
Mapping at higher levels will also take longer.

I think we should cache OrthoDB locally. Downloading potentially gigabytes per job does not seem like a good idea if it can be avoided.

It seems that MAP_ORTHODB_DATA allows specifying where data is saved, so we could provide this via reference data (a data table).

Ok, yes that is true - not sure how to set it in a galaxy environment though.

Some docs are here: https://docs.galaxyproject.org/en/master/admin/data_tables.html https://docs.galaxyproject.org/en/master/dev/data_managers.html

Maybe the busco data table gives a bit of inspiration...
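A minimal sketch of how a data table could feed MAP_ORTHODB_DATA, modelled loosely on the BUSCO wrapper. The data table name `orthodb`, its `path` column and the parameter name `cached_db` are assumptions for illustration; no such table exists in this PR.

```xml
<!-- Sketch only: the data table "orthodb" and its "path" column are assumptions. -->
<inputs>
    <param name="cached_db" type="select" label="Cached OrthoDB data">
        <options from_data_table="orthodb">
            <validator type="no_options" message="No cached OrthoDB data is installed"/>
        </options>
    </param>
</inputs>
<command detect_errors="exit_code"><![CDATA[
    ## point the mapper at the admin-provided cache instead of downloading per job
    export MAP_ORTHODB_DATA='$cached_db.fields.path'
    ## && ... the existing ODB-mapper call continues here ...
]]></command>
```

The table itself would be filled by an admin or a data manager, as described in the linked docs.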

bernt-matthias · 2026-01-22T18:46:05Z

tools/orthologer/ODB-mapper.xml

+
+    This tool maps a given fasta file against OrthoDB orthology.
+    The level is given as an NCBI taxid (e.g 33208 for Metazoa).
+    If no level is given, a level is selected using busco autolineage.


The auto lineage mode transfers quite a bit of data. Can we use busco reference data that is cached in Galaxy already?

I don't know if we can use cached BUSCO data, but yes, that would be useful.
Another option is to require the user to provide a level, which avoids the auto-lineage option altogether.

Hi @ftegenfe, check the docs. There are two variables that we should definitely use:

- BUSCO_OFFLINE - set to 0 if BUSCO should run offline - if so it will look in BUSCO_DATA for files
- BUSCO_DATA - BUSCO data install directory

If I do not allow auto lineage, those two will not be used.
But yes if we do allow it, BUSCO_DATA could point to some storage where the BUSCO lineage files reside.

My guess would be that you need to do something like:

- Set up a select for a BUSCO DB (tools-iuc/tools/busco/busco.xml, line 164 in 157d1ec):

  <param name="cached_db" label="Cached database with lineage" type="select">

- Add BUSCO_OFFLINE=0 && export BUSCO_DATA="$cached_db.fields.path" && at the beginning of the tool script (can you double check if 0 really means offline? It looks strange to me).

Since the reference data is huge we need to trick a bit, see tools-iuc/tools/busco/busco.xml, line 20 in 157d1ec:

  ## tool tests can not run with --offline (otherwise we would need to store a lot of data at IUC)

If a lineage needs to be selected somewhere you can do it like so (tools-iuc/tools/busco/busco.xml, line 222 in 157d1ec):

  <param argument="--lineage_dataset" type="select" label="Lineage">

Hi, BUSCO_OFFLINE=1 means offline.
At the moment I have disabled the possibility to run with auto-lineage. That is, the mapping level parameter is required.
I have been away from work for a bit. I will have a look at the data table documentation links you sent regarding setting MAP_ORTHODB_DATA.
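Should auto-lineage be re-enabled later, the two BUSCO variables could be set roughly as follows. This is a sketch under assumptions: the data table name `busco_database` is borrowed from the BUSCO wrapper from memory and should be verified, and it is assumed that its `path` column points at a directory usable as BUSCO_DATA.

```xml
<!-- Sketch only: the data table name and the meaning of its "path" column are assumptions. -->
<inputs>
    <param name="cached_db" type="select" label="Cached database with lineage">
        <options from_data_table="busco_database"/>
    </param>
</inputs>
<command detect_errors="exit_code"><![CDATA[
    ## 1 means offline, as clarified above
    export BUSCO_OFFLINE=1 &&
    export BUSCO_DATA='$cached_db.fields.path'
    ## && ... the existing ODB-mapper call continues here ...
]]></command>
```

As with the BUSCO wrapper, tool tests would likely still need an online path, since the cached data is too large to ship as test data.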

Co-authored-by: M Bernt <[email protected]>

ftegenfe · 2026-01-26T13:01:01Z

What is further required in terms of changes? Has this tool been approved for merge with the main branch?

ftegenfe added 2 commits January 20, 2026 16:40

added orthologer tool

57601c7

updated category in shed

bc871dc

bgruening reviewed Jan 20, 2026


ftegenfe and others added 5 commits January 21, 2026 07:10

Update tools/orthologer/ODB-mapper.xml

9086a62

Co-authored-by: Björn Grüning <[email protected]>

Update tools/orthologer/ODB-mapper.xml

5dbdeb7

Co-authored-by: Björn Grüning <[email protected]>

Update tools/orthologer/ODB-mapper.xml

af75fe4

Co-authored-by: Björn Grüning <[email protected]>

updated help

600a161

several updates for pull request

4c5000c

bgruening reviewed Jan 21, 2026


SaimMomin12 reviewed Jan 21, 2026


ftegenfe and others added 9 commits January 21, 2026 13:07

Update tools/orthologer/ODB-mapper.xml

838f730

Co-authored-by: Saim Momin <[email protected]>

Update tools/orthologer/ODB-mapper.xml

6a54b53

Co-authored-by: Saim Momin <[email protected]>

Update tools/orthologer/ODB-mapper.xml

5130974

Co-authored-by: Saim Momin <[email protected]>

Update tools/orthologer/macros.xml

3a28088

Co-authored-by: Saim Momin <[email protected]>

Update tools/orthologer/macros.xml

8f17d5d

Co-authored-by: Saim Momin <[email protected]>

fixes in ODB-mapper

affd0f2

merge

378ce04

Merge branch 'galaxyproject:main' into add_orthologer

ce59a4a

merge

3c9fde9

SaimMomin12 reviewed Jan 21, 2026


ftegenfe and others added 4 commits January 21, 2026 17:04

Update tools/orthologer/.shed.yml

1ed7b86

Co-authored-by: Saim Momin <[email protected]>

Merge branch 'galaxyproject:main' into add_orthologer

9a5903d

updated test

2cdae52

Merge branch 'add_orthologer' of github.com:ftegenfe/tools-iuc-fork into add_orthologer

1c5b461

added missing test data + cleanup

8e14420

bgruening approved these changes Jan 22, 2026


bernt-matthias reviewed Jan 22, 2026


Update tools/orthologer/ODB-mapper.xml

21a7313

Co-authored-by: M Bernt <[email protected]>

new tests, less output data

ac2611d

ftegenfe added 2 commits January 30, 2026 17:14

added orthologer.xml

c8d9349

Merge branch 'galaxyproject:main' into add_orthologer

775a323
