CCT no p-values #80

Open
spiderduckpig opened this issue Mar 28, 2025 · 7 comments
@spiderduckpig

spiderduckpig commented Mar 28, 2025

Hello,

I am currently running a WGS analysis on a small sample of 517 patients, and while running the gene-centric coding analysis I am getting the following output:

Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!

I saw in issue 14 that the "Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!" messages are expected and are just warnings of low variant counts. But is the same true for the "Error in CCT" messages?

Additionally, when running the analysis with rare_maf_cutoff = 0.05, rv_num_cutoff = 2, and 10 PCs, the Rdata files generated by the gene-centric coding analysis contain no results (just null values). I only get results when I run with 20 PCs. Do I simply not have enough variants?

Thank you for your time and help.

@xihaoli
Owner

xihaoli commented Mar 28, 2025

Hi @spiderduckpig,

You are correct that Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix! messages are expected and are just warnings of low variant counts.

For Error in CCT(pvalues) : Cannot have NAs in the p-values!, this is indicating that your input p-values for CCT() include NA's. Typically, this shouldn't happen, so you may want to take a look at which specific p-value is NA for your data analysis.
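
As a minimal sketch of that check (base R only, not part of STAARpipeline): the vector below and its test labels are hypothetical stand-ins for whatever p-values are passed to CCT() for one gene and category. Listing the NA entries by name points to the test or annotation that needs a closer look.

```r
# Hypothetical named p-values for a single gene/category; the labels are
# illustrative and are not the names STAAR itself uses.
pvalues <- c(SKAT = 0.012, Burden = NA, ACAT_V = 0.30)

# Names of the entries that are NA (is.na() also flags NaN).
na_tests <- names(pvalues)[is.na(pvalues)]
print(na_tests)

# Only combine p-values when none are missing.
all_present <- !anyNA(pvalues)
print(all_present)
```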

For your other question, are these the ancestry PCs you used in the null model fitting or the annotation PCs you used in the rare variant association testing?

Best,
Xihao

@spiderduckpig
Author

spiderduckpig commented Mar 28, 2025

Hi @xihaoli, thank you so much for the response. For the CCT error, is there any particular input that you think might be associated with it? For example, I reviewed the other issues on this GitHub repository and noticed that another person had issues with the data types of the different functional annotations. Could this be associated with the error? Or could it be associated with the sparse relatedness matrix?

The PCs are just the ancestry PCs used for null model fitting. I suspect that maybe I have included too many PCs, so I am overfitting.

@xihaoli
Owner

xihaoli commented Mar 28, 2025

Hi @spiderduckpig,

Maybe you included some annotations that have missing values, e.g., you used variant_type = "variant" while specifying some input annotations that do not have values for indel variants.
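
One way to look for such gaps, sketched in base R under assumptions: the data frame, its column names, and its values below are hypothetical, standing in for an annotation table that has already been extracted into R. Counting NAs per column, and tabulating the affected rows by variant type, shows whether a score is missing only for indels.

```r
# Hypothetical annotation table; column names and values are made up.
anno <- data.frame(
  variant_type   = c("SNV", "SNV", "Indel"),
  cadd_phred     = c(12.3, 4.1, 8.0),
  snv_only_score = c(0.8, 0.2, NA)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(anno))
print(na_counts[na_counts > 0])

# Variant types of the rows that contain at least one missing value.
incomplete_types <- anno$variant_type[!complete.cases(anno)]
print(table(incomplete_types))
```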

The number of covariates used in the null model fitting should not affect the rare variant association analysis in terms of variant set selection, as long as the null model can be properly fit. So I am not sure why adding 10 PCs does not generate results but 20 PCs does.

Best,
Xihao

@spiderduckpig
Author

I identified the issue. Several functional annotation columns (clnsigincl, clndnincl, clndisdbincl, metasvm_pred) were mistakenly interpreted by R's read_csv as factor/logical data types instead of strings. This caused an error when the gene-centric analysis tried to compare metasvm_pred to the string "D", because the column had been coerced to logical TRUE/FALSE values. I had to modify the read_csv call to make sure it did not coerce these columns.
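
A minimal sketch of that kind of change, assuming read_csv here is the one from the readr package; the exact call used in the thread is not shown, so the file, its contents, and the variant_id column below are hypothetical. Declaring the affected columns with col_character() keeps values such as "T" and "D" as strings, while the remaining columns are still guessed.

```r
library(readr)

# Hypothetical annotation file; only the two annotation column names come
# from the thread, the rows are made up for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "variant_id,metasvm_pred,clnsigincl",
  "1,T,",
  "2,T,",
  "3,D,"
), tmp)

# Force the annotation columns to character so they are never guessed
# as logical; unspecified columns keep readr's default type guessing.
anno <- read_csv(
  tmp,
  col_types = cols(
    metasvm_pred = col_character(),
    clnsigincl   = col_character()
  )
)

# The comparison against "D" now works on strings as intended.
n_damaging <- sum(anno$metasvm_pred == "D")
print(n_damaging)
```

The same col_types specification would be extended to clndnincl and clndisdbincl in a real annotation file.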

@xihaoli
Owner

xihaoli commented Apr 2, 2025

Thank you for sharing, @spiderduckpig.

It seems like you are annotating your data using the FAVOR Full Database instead of the FAVOR Essential Database. Could you please confirm this is the case?

Best,
Xihao

@spiderduckpig
Author

spiderduckpig commented Apr 3, 2025

Hi @xihaoli,

Yes, I am currently using the FAVOR Full database.

@xihaoli
Owner

xihaoli commented Apr 3, 2025

Got it. We typically assume that one uses the FAVOR Essential Database for STAARpipeline analysis. Your proposed modification looks good to me.

2 participants