CCT no p-values #80

spiderduckpig · 2025-03-28T05:38:35Z

Hello,

I am currently running a WGS analysis on a small sample of 517 patients. While running the gene-centric coding analysis, I get the following output:

Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in CCT(pvalues) : Cannot have NAs in the p-values!
Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!

I saw in issue #14 that the "Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!" messages are expected and are just warnings of low variant counts. But is the same true for the "Error in CCT" messages?

Additionally, when running the analysis with rare_maf_cutoff = 0.05, rv_num_cutoff = 2, and 10 PCs, I do not get any data in the Rdata files generated by the gene-centric coding analysis (just null values). I only get data when I run with 20 PCs. Do I simply not have enough variants?

Thank you for your time and help.


xihaoli · 2025-03-28T19:29:43Z

Hi @spiderduckpig,

You are correct that the "Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!" messages are expected and are just warnings of low variant counts.

The "Error in CCT(pvalues) : Cannot have NAs in the p-values!" message indicates that the input p-values passed to CCT() include NAs. Typically, this shouldn't happen, so you may want to check which specific p-value is NA in your analysis.
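As a rough illustration of that check, here is a minimal sketch in R. It assumes a hypothetical named vector of p-values (the names and values are made up), and `cauchy_combine` is a simplified equal-weight Cauchy combination written only for this example; it is not the STAAR package's CCT() code.

```r
# Hypothetical per-test p-values; names and values are illustrative only.
pvalues <- c(burden = 0.03, skat = NA, acat_v = 0.20)

# Identify which entries are NA before they reach the combination step.
na_names <- names(pvalues)[is.na(pvalues)]
print(na_names)

# Simplified equal-weight Cauchy combination (an assumption for illustration,
# not the package implementation). It refuses NA input, as the error message does.
cauchy_combine <- function(p) {
  if (anyNA(p)) stop("Cannot have NAs in the p-values!")
  stat <- mean(tan((0.5 - p) * pi))
  0.5 - atan(stat) / pi
}

# Combining only the non-missing values, purely to show the function runs;
# in a real analysis the source of the NA should be found and fixed instead.
combined <- cauchy_combine(pvalues[!is.na(pvalues)])
print(combined)
```

Dropping the NA here is only a diagnostic convenience. The point is to find which test produced the missing value, since that usually traces back to the inputs.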

For your other question, is this the ancestry PCs you used in the null model fitting or the annotation PCs you used in the rare variant association testing?

Best,
Xihao

spiderduckpig · 2025-03-28T19:35:06Z

Hi @xihaoli, thank you so much for the response. For the CCT error, is there any particular input that you think might be associated with it? For example, I reviewed the other issues on this GitHub repository and noticed that another person had problems with the data types of the functional annotations. Could this be associated with the error? Or could it be associated with the sparse relatedness matrix?

The PCs are just the ancestry PCs used for null model fitting. I suspect that maybe I have included too many PCs, so I am overfitting.

xihaoli · 2025-03-28T19:40:44Z

Hi @spiderduckpig,

Maybe you included some annotations that have missing values, e.g., you used variant_type = "variant" while specifying some input annotations that do not have values for indel variants.
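One way to look for such gaps is to count missing values per annotation column. The sketch below is an assumption-laden example: the data frame `anno` and its column names are hypothetical stand-ins for whatever annotation table is actually being passed to the analysis.

```r
# Hypothetical annotation table; column names and values are made up.
anno <- data.frame(
  cadd_phred = c(12.1, NA, 30.5),
  linsight   = c(0.2, 0.4, 0.9)
)

# Count missing values in each annotation column.
na_counts <- colSums(is.na(anno))

# Keep only the columns that contain at least one NA.
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```

Any column reported here would be a candidate source of NA p-values downstream.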

The number of covariates used in the null model fitting should not affect the rare variant association analysis in terms of variant set selection, as long as the null model can be properly fit. So I am not sure why adding 10 PCs does not generate results but 20 PCs does.

Best,
Xihao

spiderduckpig · 2025-04-02T20:00:04Z

I identified the issue: several functional annotation columns (clnsigincl, clndnincl, clndisdbincl, metasvm_pred) were mistakenly read by R's read_csv as factor/logical types instead of strings. This caused an error when the gene-centric analysis tried to compare metasvm_pred to the string "D", because the column had been coerced to "True" and "False" values. I had to modify the read_csv call to make sure it did not coerce these columns.
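The coercion described here can be reproduced on toy data. The following is a sketch under stated assumptions, not the exact fix used in this thread: the CSV content is made up, and it uses base R's read.csv (whose type guessing also reads a column containing only "T" as logical) so that it runs without extra packages. With readr, the analogous control is the col_types argument of read_csv, for example cols(metasvm_pred = col_character()).

```r
# Hypothetical annotation snippet in which metasvm_pred happens to contain only "T".
csv_text <- "variant,metasvm_pred\nv1,T\nv2,T\nv3,T\n"

# Default type guessing: "T" is interpreted as the logical value TRUE.
guessed <- read.csv(text = csv_text)
print(class(guessed$metasvm_pred))
print(as.character(guessed$metasvm_pred))

# Declaring the column as character preserves the original labels.
# With readr (assumed equivalent, not run here):
#   readr::read_csv(path, col_types = readr::cols(metasvm_pred = readr::col_character()))
fixed <- read.csv(text = csv_text, colClasses = c(metasvm_pred = "character"))
print(class(fixed$metasvm_pred))
print(fixed$metasvm_pred)
```

Declaring the types explicitly for all free-text annotation columns avoids depending on which values happen to appear in the rows used for guessing.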

xihaoli · 2025-04-02T20:37:48Z

Thank you for sharing, @spiderduckpig.

It seems like you are annotating your data using the FAVOR Full Database instead of the FAVOR Essential Database. Could you please confirm that this is the case?

Best,
Xihao

spiderduckpig · 2025-04-03T02:50:40Z

Hi @xihaoli,

Yes, I am currently using the FAVOR Full database.

xihaoli · 2025-04-03T03:12:29Z

Got it. We typically assume that one uses the FAVOR Essential Database for STAARpipeline analysis. Your proposed modification looks good to me.
