AssertionError with the parser #127

Open
PCApple opened this issue Aug 3, 2020 · 1 comment

PCApple commented Aug 3, 2020

Hello,
I am running into a problem while trying to parse my CNV and VCF data. I can parse the VCF data on its own, without the CNV data (using regions=all), but when I add the --cnvs option I keep getting this error:
    Traceback (most recent call last):
      File "create_phylowgs_inputs.py", line 1420, in <module>
        main()
      File "create_phylowgs_inputs.py", line 1388, in main
        grouper.exclude_variants_in_multiple_abnormal_or_unlisted_regions()
      File "create_phylowgs_inputs.py", line 989, in exclude_variants_in_multiple_abnormal_or_unlisted_regions
        self._filter_variants_outside_regions(self._multisamp_cnv.load_cnvs(), 'all_variants', 'within_cn_regions')
      File "create_phylowgs_inputs.py", line 856, in load_cnvs
        abnormal_cnvs = self.load_single_abnormal_state_cnvs()
      File "create_phylowgs_inputs.py", line 811, in load_single_abnormal_state_cnvs
        states_for_all_samples = self._get_abnormal_state_for_all_samples(chrom, cnv)
      File "create_phylowgs_inputs.py", line 773, in _get_abnormal_state_for_all_samples
        assert len(retained_sampidxs) == len(set(retained_sampidxs))
    AssertionError

The failing assertion is accompanied by this comment in the code:

    Sanity check: we should have no duplicate samples. While a given sample
    may report any number of records for a region, above we discarded normal
    regions, and ensured that only one abnormal state exists in all samples.
    Thus, we should have no more than one record per sample for this region.
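To make the reasoning in that comment concrete, here is a minimal, hypothetical Python sketch of the three steps it describes: drop normal records, require a single abnormal state, then assert one record per sample. It is not PhyloWGS's code; the record layout (sample_idx, state), NORMAL_STATE and the helper names are assumptions for illustration. It shows one way the assertion can fire even with a single sample: if that sample reports two records with the same abnormal state for one region, both survive the filtering.

```python
# Hypothetical sketch of the sanity check described in the comment above;
# this is NOT PhyloWGS's implementation.
# Each record is (sample_idx, state); state is assumed to be (major_cn, minor_cn).
NORMAL_STATE = (1, 1)  # assumption: diploid 1/1 counts as "normal"


def retained_sample_indices(records):
    """Mimic the filtering steps, as the comment describes them, for one region."""
    # Step 1: discard normal records.
    abnormal = [(idx, state) for idx, state in records if state != NORMAL_STATE]
    # Step 2: require a single abnormal state across all samples.
    states = {state for _, state in abnormal}
    if len(states) > 1:
        return None  # in this sketch, such a region is simply excluded
    return [idx for idx, _ in abnormal]


def check_region(records):
    sampidxs = retained_sample_indices(records)
    if sampidxs is None:
        return "excluded"
    # Step 3: the sanity check that fails in the traceback.
    assert len(sampidxs) == len(set(sampidxs)), f"duplicate samples: {sampidxs}"
    return "ok"


# One sample, one abnormal record overlapping the region: passes.
print(check_region([(0, (2, 1))]))  # ok

# One sample, two records with the same abnormal state (e.g. overlapping or
# split segments in the CNV input): step 2 passes, but sample 0 is retained
# twice, so the assertion fires.
try:
    check_region([(0, (2, 1)), (0, (2, 1))])
except AssertionError as err:
    print("AssertionError:", err)  # AssertionError: duplicate samples: [0, 0]
```

Under these assumptions, the single-sample case fails only when the sample itself contributes more than one abnormal record to the same region.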

I checked through the CNV data, and as far as I could tell there were no duplicates; I am also only working with one sample. I'm not very proficient on the biology side of this, but at least from the CS side, I could see that in the _get_abnormal_state_for_all_samples() function, retained_sampidxs sometimes picked up 2 entries instead of 1. Do you know why this might be happening?
Thanks in advance.
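One way to look for the kind of input that would give a single sample two records for one region is to scan the CNV file for overlapping segments on the same chromosome. The sketch below assumes a tab-separated file with a header row containing chromosome, start and end columns and inclusive coordinates; those column names are an assumption about the --cnvs input, so adjust them to match your file. find_overlapping_segments is a hypothetical helper, not part of PhyloWGS.

```python
import csv
import io
from collections import defaultdict


def find_overlapping_segments(tsv_text):
    """Report each segment that overlaps an earlier one on the same chromosome.

    Assumes (hypothetically) a tab-separated table whose header includes
    'chromosome', 'start' and 'end', with inclusive coordinates.
    """
    by_chrom = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        by_chrom[row["chromosome"]].append((int(row["start"]), int(row["end"])))

    overlaps = []
    for chrom, segments in by_chrom.items():
        segments.sort()
        widest = segments[0]  # earlier segment that reaches furthest right
        for cur in segments[1:]:
            # With inclusive coordinates, a shared endpoint also counts as overlap.
            if cur[0] <= widest[1]:
                overlaps.append((chrom, widest, cur))
            if cur[1] > widest[1]:
                widest = cur
    return overlaps


example = (
    "chromosome\tstart\tend\tcopy_number\n"
    "1\t100\t500\t3\n"
    "1\t400\t900\t3\n"
    "2\t100\t200\t1\n"
)
print(find_overlapping_segments(example))  # [('1', (100, 500), (400, 900))]
```

Tracking the segment with the largest end so far, rather than only the previous one, also catches a long segment that contains several shorter ones. To scan a real file, pass it the text returned by open(path).read().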

@Ignatiocalvin

Hi,

I'd also like to know whether there is a solution for this.

Thank you.
