Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algorithm breaks when a map is supplied using 1kg_extendend dataset #3

Open
Npaffen opened this issue Jan 24, 2023 · 8 comments
Open
Labels
bug Something isn't working

Comments

@Npaffen
Copy link

Npaffen commented Jan 24, 2023

AlphaFamImpute, as all Alphatools who do phasing and imputation I guess, need a map file to recognize multiple chromosomes. I used the 1000 genomes extended dataset featuring around 600 trios added the 23andme data from my family to generate some phasing data to compare this in a plot. In theory this should work with this tool.

Command that was running when the error appeared :
AlphaFamImpute -o /media/oem/ssd_01/alphaimp_data/pa_offspring_map -genotype /media/oem/ssd_01/alphaimp_data/genotype_ll.txt -pedigree /media/oem/ssd_01/alphaimp_data/pedigree.txt -iothreads 12 -map /media/oem/ssd_01/Genomics_prac_guide/map/genetic_map_hg38_withX.txt

Error that appeared:
/home/oem/.local/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice. return _methods._mean(a, axis=axis, dtype=dtype, /home/oem/.local/lib/python3.10/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) Segmentation fault (core dumped)

The last output from Alphafamimpute was (on chr2 I guess):
Imputing family of NA18523 and MotherOfNA18521, using 1 high-density offspring.

When I try to run Alphafamimpute without the map on the full chromosome set it works but should thereby provide false results since the algorithm anticipates only one chromosome.

@gregorgorjanc
Copy link
Member

@Npaffen thanks for reporting this!

I have never ran the Alpha tools with more than one chromosome and I think this is the mode we assume, but should definitely allow for! Is this where the error is coming from? Please test and let us know.

@Npaffen
Copy link
Author

Npaffen commented Jan 24, 2023

I don't know if the mode is the error since it looks like that the first chromosome seems to run without problems. The second then also looks fine until a certain sample, the one I mentioned in the end of my report. Could you look into this? My python knowledge is only mediocre but it looks like either something with the data or with the algorithm produces an empty vector/matrix? I will try to run AlphaFamImpute again using only the second chromosome of the data. If this works I will try to run AlphaFamImpute chromosome wise without a map. If this fixes it then there is some problem with either the map-imput data , which should be correct since the 1000 genomes extendend dataset is by default in hg38?

@Npaffen
Copy link
Author

Npaffen commented Jan 24, 2023

A run just with chr2 seems to work! Will now run AlphaFamImpute chromosome-wise and report back.

@gregorgorjanc
Copy link
Member

gregorgorjanc commented Jan 24, 2023 via email

@Npaffen
Copy link
Author

Npaffen commented Jan 24, 2023

I anticipate you mean a reproducible example of the error. Since I don't want to share my 23andme data I will have to try this just with the 1kg_extendend dataset and see if this happens without my data. I will report back if this error is reproducible just with the 1kg_extendend dataset.

@gregorgorjanc
Copy link
Member

Maybe just a tiny fraction of the dataset - say first few loci?

@Npaffen
Copy link
Author

Npaffen commented Jan 24, 2023

Sure this should be possible. Let me prep something and see if I can reproduce the error with the subset.

@Npaffen
Copy link
Author

Npaffen commented Jan 25, 2023

I finally was able to create a reproducible example based only on the data of the 1000 genomes extended dataset including
~600 trios. I wrote a R-script and added all necessary files to the folder. The map-file is a subset of the genetic_map_hg38_withX_alpha.txt. This subsetted map only contains the autosome-positions and the header is removed so that the format of the documentation is satisfied. Be aware that you need plink 1.9 and convertf from the EIGENSTRAT software package and datamash sudo apt install datamash to run my code. While plink is a standard tool and I guess you are familiar with it, convertf was used to recreate your genotype.txt format. This step was necessary since the option to use plink-format as input failed on my linux system #4

Link to the files : https://e1.pcloud.link/publink/show?code=XZsLGmZsN2waHGnatf49E7M4IauwfV5kVpy

@gregorgorjanc gregorgorjanc added the bug Something isn't working label Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants