Figure out what we should or want to do regarding the PLINK functionality #11

Open
gregorgorjanc opened this issue Aug 22, 2023 · 6 comments
Labels
enhancement New feature or request question Further information is requested


@gregorgorjanc
Member

See

https://pypi.org/project/alphaplinkpython/

https://bitbucket.org/hickeyjohnteam/alphaplinkpython

@gregorgorjanc gregorgorjanc added the question Further information is requested label Aug 22, 2023
@gregorgorjanc
Member Author

gregorgorjanc commented Aug 23, 2023

There is a Python package for PLINK BED files, https://pypi.org/project/bed-reader/. Should we just use that and build a converter? I think this is Python-only code, so it might be less efficient than what David wrote for alphaplinkpython, but by using an “established” Python solution we get OS portability, plus speed-ups and fixes from upstream as they arrive.

FYI Python/Rust extension for PLINK at https://towardsdatascience.com/nine-rules-for-writing-python-extensions-in-rust-d35ea3a4ec29, but that should be for the Python package!
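To make concrete what any converter (bed-reader, alphaplinkpython, or our own) has to decode, here is a minimal numpy-only sketch of the PLINK 1 `.bed` layout: three magic bytes, then one block of ceil(n_samples / 4) bytes per variant, with each genotype stored as a 2-bit code. It assumes a variant-major file and that the sample and variant counts are already known from the `.fam` and `.bim` files; `decode_bed` is a hypothetical name, and this is an illustration, not a substitute for a maintained reader.

```python
import numpy as np

# PLINK 1 .bed magic bytes; the third byte 0x01 means variant-major (SNP-major) order.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def decode_bed(raw: bytes, n_samples: int, n_variants: int) -> np.ndarray:
    """Decode variant-major .bed bytes into a (n_samples, n_variants) float32 array.

    Hypothetical helper for illustration. Each genotype is 2 bits, packed four
    per byte starting at the low-order bits. Values count copies of the A1
    allele (first allele in .bim); missing calls become NaN.
    """
    if raw[:3] != BED_MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed file")
    bytes_per_variant = (n_samples + 3) // 4
    expected = 3 + bytes_per_variant * n_variants
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    packed = np.frombuffer(raw, dtype=np.uint8, offset=3).reshape(
        n_variants, bytes_per_variant
    )
    # Shift by 0, 2, 4, 6 bits to pull out the four 2-bit codes in each byte.
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, :, None] >> shifts) & 0b11  # shape (variants, bytes, 4)
    # Flatten to one code per sample and drop the padding in the last byte.
    codes = codes.reshape(n_variants, -1)[:, :n_samples]
    # Codes 00, 01, 10, 11 -> A1 count 2, missing, 1, 0.
    lookup = np.array([2.0, np.nan, 1.0, 0.0], dtype=np.float32)
    return lookup[codes].T  # samples x variants
```

Decoding to samples-by-variants floats with NaN for missing keeps the result in the same shape a converter would receive from a library reader.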

@XingerTang
Contributor

@gregorgorjanc I read about the package, and I think the advantages of using it might be:

  • The package is written partly in Python and partly in Rust, so it is quite efficient; nevertheless, you don't need to install a Rust compiler to use it.
  • The documentation and the code are written in a clear and informative way.
  • alphaplinkpython is written in C++, so I guess it cannot be fixed in the short term.

The disadvantages are:

  • It seems to require Python >= 3.7 (this isn't stated explicitly, but the software is only tested on version 3.7 and above).
  • It also requires other Python packages (pandas>=0.25.1, pooch>=1.4.0, chardet>=5.1.0) that Alphatools doesn't require. The current dependency on numba is already causing problems when using Alphatools in RStudio: Alphatools requires numba, numba requires llvmlite, and llvmlite is the part that is not very compatible with RStudio, although this can be fixed.
  • It has an open issue, "Bed() and Pheno() etc don't like Windows Paths", but that issue actually concerns another package that depends on this one (pysnptools), not bed-reader itself, so I'm not sure why it was opened here. In any case, the package's tests run on all three operating systems and all pass, so there shouldn't be any platform-related issues.

Overall, I think it should be a good option for Alphatools.

@gregorgorjanc gregorgorjanc added the enhancement New feature or request label Aug 30, 2023
@gregorgorjanc
Member Author

@XingerTang thanks for exploring these pros and cons!!!

I am leaning towards using bed-reader, BUT I guess we would still need some sort of converter to our format!?

@XingerTang
Contributor

@gregorgorjanc Yes, we need a converter, but as the data are stored in numpy arrays in both, I guess it won't be too complicated.
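Since both sides hold numpy arrays, the converter could be as small as the sketch below. It assumes, hypothetically, that the Alphatools side wants an int8 matrix of 0/1/2 dosages with 9 as the missing code, and that the input is a samples-by-variants float array with NaN for missing calls; `to_alpha_genotypes` and `MISSING_CODE` are illustrative names, not existing Alphatools API.

```python
import numpy as np

# ASSUMPTION: the Alphatools side stores genotypes as int8 dosages (0/1/2)
# with 9 marking a missing call; adjust MISSING_CODE if the real format differs.
MISSING_CODE = 9


def to_alpha_genotypes(values: np.ndarray, missing_code: int = MISSING_CODE) -> np.ndarray:
    """Convert a (samples, variants) float array with NaN for missing into int8.

    Hypothetical converter for illustration only.
    """
    values = np.asarray(values, dtype=np.float32)
    missing = np.isnan(values)
    observed = values[~missing]
    # Only hard calls are accepted; dosages such as 0.5 would be silently truncated otherwise.
    if not np.all(np.isin(observed, (0.0, 1.0, 2.0))):
        raise ValueError("expected hard calls 0, 1 or 2 (or NaN for missing)")
    out = np.full(values.shape, missing_code, dtype=np.int8)
    out[~missing] = observed.astype(np.int8)
    return out
```

With bed-reader, such an input array would typically come from `open_bed(path).read()`, which by default returns float32 values with NaN for missing calls.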

@CarlKCarlK

CarlKCarlK commented Nov 6, 2023

I saw your reference to bed-reader. Thank you for considering its use!

Inspired by @XingerTang's comments, I've updated the project so that, by default, it depends only on numpy. The dependency on pooch is now optional (needed only if you want to download my sample files). The dependencies on pandas and chardet are gone. There is a new optional dependency on scipy, but it is only needed if you want to create scipy sparse matrices.
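Making a dependency optional, as described above for pooch, usually means importing it lazily inside the one feature that needs it. A minimal sketch of that general pattern follows; it is not bed-reader's actual code, and `require_optional` is a hypothetical helper.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency only when the feature that needs it is used.

    Hypothetical helper for illustration. Raises an ImportError that names the
    feature, so users know why the package is needed and how to get it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} needs the optional package '{module_name}'; "
            f"install it with: pip install {module_name}"
        ) from err
```

A feature would then call, for example, `pooch = require_optional("pooch", "downloading sample files")` at the start of its body, so users who never download sample files never need pooch installed. The pip package name can differ from the import name, so a real helper might take both.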

The new version is on PyPI in beta. You can install it with:

pip install bed-reader==1.0.0b3

The beta documentation is here: https://fastlmm.github.io/bed-reader/beta/.

If you have any questions or suggestions, just let me know.

-- Carl
p.s. With respect to versions, we try to support all officially supported versions of Python, which is currently 3.8 to 3.12. We support Linux/Mac (including ARM)/Windows.

@gregorgorjanc
Member Author

Maybe we should look into sgkit https://github.com/pystatgen/sgkit


3 participants