Decide a tabix version #25

Open
koszulordie opened this issue Dec 11, 2024 · 1 comment
Labels
question Further information is requested


@koszulordie
Collaborator

Different tabix versions turn out to have incompatible indexing behavior.

@FedericaBrando we can discuss which tabix version we want the next releases to use, then change the code to make it compatible with the new version if necessary.

@FedericaBrando
Member

FedericaBrando commented Dec 11, 2024

As of now we use the pytabix package. This package reimplements tabix from htslib in C and C++, which makes it very efficient. It also means that pytabix ships its own version of tabix rather than running any of the "official" tabix releases. Looking at one of the bug-fix commits in the repo, I noticed something odd:


slowkow/pytabix@1ec53ff#diff-60f61ab7a8d1910d86d9fda2261620314edcae5894d5aaa236b821c7256badd7

slowkow/pytabix@1ec53ff#diff-6ff1619ffe3ee525abe52c386f3a1ba180384dd06d90f066e0908fd386f8abec

The initial release was 0.1, but the bug-fix release was numbered 0.0.2. This makes no sense: in terms of version ordering, it is like going back in time. What we are using, pytabix==0.0.2, is the actual bug-free pytabix version.

The "old" release, which is actually newer by version ordering (0.1 > 0.0.2), has the bug. This leads to a MAJOR problem: if we do not pin the correct pytabix version, we automatically download the older, buggy release because its version number is higher. The only way to find this out is by looking at the code. The fix should have been released as 0.1.2 or 0.2.
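Here is a minimal sketch that makes the ordering concrete. It shows why an unpinned install prefers 0.1, and how a runtime guard could refuse the buggy release. It assumes plain numeric dotted versions. pip itself compares versions per PEP 440, which gives the same ordering for these two. `version_key`, `newest` and `pytabix_is_safe` are hypothetical helpers, not part of pytabix. The sketch also assumes the installed distribution is named `pytabix`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple

REQUIRED_PYTABIX = "0.0.2"  # the bug-free release discussed in this issue


def version_key(v: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '0.0.2' into a comparable tuple.

    Simplification (assumption): only purely numeric components are handled,
    unlike full PEP 440 parsing.
    """
    return tuple(int(part) for part in v.split("."))


def newest(versions: List[str]) -> str:
    """Return the version an unpinned installer would prefer (the highest)."""
    return max(versions, key=version_key)


def pytabix_is_safe(installed: Optional[str] = None) -> bool:
    """Return True only if the pytabix version is exactly 0.0.2.

    If no version string is passed, look up the installed distribution.
    """
    if installed is None:
        try:
            installed = version("pytabix")
        except PackageNotFoundError:
            return False
    return installed == REQUIRED_PYTABIX


if __name__ == "__main__":
    print(newest(["0.0.2", "0.1"]))  # 0.1: the buggy release wins without a pin
    print(pytabix_is_safe("0.1"))    # False
```

Pinning `pytabix==0.0.2` in the dependency list avoids the problem at install time. A guard like this would only catch environments where the pin was skipped.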

pytabix==0.0.2 correctly returns results when querying positions whose START and END are the same.
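A single-position query (START == END) should still return the record covering that position. The sketch below shows the expected behavior with 1-based, inclusive coordinates. That is the convention tabix uses for region strings such as `chr1:100-100`. Alongside it is one common way an off-by-one error drops such records. This is an illustration only: the records are hypothetical, and the naive half-open check is not taken from pytabix's source. A check along these lines could serve as a regression test whichever tabix version we settle on.

```python
from typing import List, Tuple

# Hypothetical record layout: (chrom, start, end), 1-based and inclusive.
Record = Tuple[str, int, int]


def query_closed(records: List[Record], chrom: str, start: int, end: int) -> List[Record]:
    """Return records overlapping the closed interval [start, end]."""
    return [r for r in records if r[0] == chrom and r[1] <= end and r[2] >= start]


def query_naive_half_open(records: List[Record], chrom: str, start: int, end: int) -> List[Record]:
    """Buggy variant: treats both intervals as half-open [start, end) without
    converting the 1-based inclusive coordinates. As a result, a single-base
    record (START == END) is missed by a single-position query at that base.
    """
    return [r for r in records if r[0] == chrom and r[1] < end and start < r[2]]


if __name__ == "__main__":
    snv = ("chr1", 100, 100)  # a single-base record, START == END
    records = [snv, ("chr1", 200, 250)]
    print(query_closed(records, "chr1", 100, 100))           # [('chr1', 100, 100)]
    print(query_naive_half_open(records, "chr1", 100, 100))  # []
```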

We now have several options for how to proceed:

  1. Document it and make users (and ourselves) aware of this problem. Faster: we keep what we are doing.
  2. Try other packages that do the same job as pytabix but are more up to date and could rely on the official tabix. Effort unknown: it could be faster if the package is easy to adopt, or slower depending on the package we choose.
  3. Avoid using tabix. Slower: a new implementation and solution are needed.
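For option 3, the simplest tabix-free fallback is a linear scan of the compressed file. Here is a minimal sketch under a few assumptions. The input is a bgzip-compressed, tab-separated file. Its chromosome, start and end columns are 1-based and inclusive. The column positions are parameters because they are an assumption; BED files, which are 0-based, would need `start + 1`. Python's `gzip` module can read bgzip output because BGZF is a series of standard gzip members. Without an index every line is read, which is where the "slower" cost comes from. `filter_rows` and `scan_region` are hypothetical names.

```python
import gzip
from typing import Iterable, Iterator, List


def filter_rows(lines: Iterable[str], chrom: str, start: int, end: int,
                chrom_col: int = 0, start_col: int = 1, end_col: int = 2) -> Iterator[List[str]]:
    """Yield tab-separated rows overlapping [start, end] (1-based, inclusive)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if fields[chrom_col] != chrom:
            continue
        row_start, row_end = int(fields[start_col]), int(fields[end_col])
        if row_start <= end and row_end >= start:  # closed-interval overlap
            yield fields


def scan_region(path: str, chrom: str, start: int, end: int, **cols: int) -> List[List[str]]:
    """Read a gzip/bgzip file line by line, with no index, and filter it."""
    with gzip.open(path, "rt") as handle:
        return list(filter_rows(handle, chrom, start, end, **cols))
```

For example, with a hypothetical file path, `scan_region("variants.tsv.gz", "chr1", 100, 100)` would return every row covering position 100 on chr1, including single-base rows. Its cost grows with file size rather than with the size of the region.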

It is also worth noting that pytabix has not been updated in 9 years, so we should check whether it is safe to keep using it.

FedericaBrando added the question (Further information is requested) label on Dec 11, 2024