CCT no p-values #80
Comments
Hi @spiderduckpig, You are correct that For For your other question, are these the ancestry PCs you used in the null model fitting or the annotation PCs you used in the rare variant association testing? Best,
Hi @xihaoli, thank you so much for the response. For the CCT error, is there any particular input that you think might be associated with it? For example, I reviewed the other issues in this GitHub repository and noticed that another person had problems with the data types of the functional annotations. Could this be associated with the error? Or could it be associated with the sparse relatedness matrix? The PCs are just the ancestry PCs used for null model fitting. I suspect that I may have included too many PCs and am overfitting.
Hi @spiderduckpig, Maybe you included some annotations that have missing values, e.g., you used The number of covariates used in the null model fitting should not affect the rare variant association analysis in terms of variant set selection, as long as the null model can be properly fit. So I am not sure why adding 10 PCs does not generate results but 20 PCs does. Best,
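A minimal sketch of how one might check the missing-value possibility raised above, assuming the annotations have been loaded into a plain data frame. The `anno` table and its values are hypothetical stand-ins, not STAARpipeline objects; only base R is used.

```r
# Hypothetical annotation table; in practice this would be whatever
# data frame holds the functional annotations before analysis.
anno <- data.frame(
  cadd_phred   = c(12.3, NA, 25.1),
  metasvm_pred = c("T", "D", NA),
  stringsAsFactors = FALSE
)

# Count missing values per annotation column.
na_counts <- colSums(is.na(anno))

# Keep only the columns that contain at least one missing value.
na_counts[na_counts > 0]
```

Columns that show up here are candidates to inspect before rerunning the analysis.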
I identified the issue. Several functional annotation columns (clnsigincl, clndnincl, clndisdbincl, metasvm_pred) were mistakenly interpreted by R's read_csv as factor/logical data types instead of strings. This caused an error when the gene-centric analysis tried to compare metasvm_pred to the string "D", because the column had been coerced to "True" and "False" values. I had to modify the read_csv call to make sure it did not coerce these columns.
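The fix described above can be sketched as follows, assuming the annotations are read with `readr::read_csv`. The file contents and the `variant` column are hypothetical; only the annotation column names come from this thread. The point of the sketch is the `col_types` argument, which pins the listed columns to character instead of letting readr guess their types.

```r
library(readr)

# Hypothetical annotation file: metasvm_pred contains only "T" and
# clnsigincl is empty, so type guessing has little to go on.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "variant,metasvm_pred,clnsigincl",
  "1-100-A-G,T,",
  "1-200-C-T,T,",
  "1-300-G-A,T,"
), tmp)

# Default behaviour: column types are guessed from the values seen.
guessed <- read_csv(tmp, show_col_types = FALSE)
print(sapply(guessed, class))

# Explicit behaviour: force the annotation columns to character.
# Columns not listed in cols() are still guessed.
forced <- read_csv(
  tmp,
  col_types = cols(
    metasvm_pred = col_character(),
    clnsigincl   = col_character()
  )
)
print(sapply(forced, class))

# A string comparison against "D" now operates on character data.
forced$metasvm_pred == "D"
```

The other columns named above (clndnincl, clndisdbincl) could be pinned the same way by adding them to `cols()`.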
Thank you for sharing, @spiderduckpig. It seems like you are annotating your data using the FAVOR Full Database instead of the FAVOR Essential Database. Could you please confirm this is the case? Best,
Hi @xihaoli, Yes, I am currently using the FAVOR Full Database.
Got it. We typically assume that one uses the FAVOR Essential Database for STAARpipeline analysis. Your proposed modification looks good to me.
Hello,
I am currently running a WGS analysis on a small sample of 517 patients, and I have noticed that, while running the gene-centric coding analysis, I am getting the following output:
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
I saw in issue 14 that the "Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!" messages are expected and are just warnings of low variant counts. But is the same true for the "Error in CCT" messages?
Additionally, when running the analysis with rare_maf_cutoff = 0.05, rv_num_cutoff = 2, and 10 PCs, I am not getting any data in the Rdata files generated by the gene-centric coding analysis (just null values). I only get data when I run with 20 PCs. Do I simply not have enough variants?
Thank you for your time and help.