'IndexError: list index out of range' when running epi.tl.geneactivity() #104

Open
pabloswfly opened this issue Aug 12, 2021 · 3 comments

@pabloswfly

Hi,

I am getting the following error when I run epi.tl.geneactivity():

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_105708/3365853572.py in <module>
----> 1 epi.tl.geneactivity(episcanpy_atac, gtf_file, key_added="gene_scores")

~/miniconda3/envs/csg.p/lib/python3.9/site-packages/episcanpy/tools/_geneactivity.py in geneactivity(adata, gtf_file, key_added, upstream, feature_type, annotation, layer_name, raw, copy)
     73         line = line.split('_')
     74         if line[0] not in raw_adata_features.keys():
---> 75             raw_adata_features[line[0]] = [[int(line[1]),int(line[2]), feature_index]]
     76         else:
     77             raw_adata_features[line[0]].append([int(line[1]),int(line[2]), feature_index])

IndexError: list index out of range

This comes from the way _geneactivity.py parses the feature names of the AnnData object. Looking at the code, line = line.split('_') is meant to split each feature name into chromosome, start, and end, but the feature names in my case look like this:

chr1:629499-630394
chr1:633580-634634
chr1:778282-779198
chr1:816872-817778
chr1:827063-827952
chr1:844145-844994
chr1:869467-870372
chr1:904350-905199
chr1:920760-921655

So there is no '_' character to split on, and line[1] raises the IndexError.
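The failure can be reproduced outside episcanpy with one of the feature names above. The following is a minimal sketch, not code from the library: parse_feature is a hypothetical helper, and it assumes every name is a chromosome followed by a start and an end, separated either by ':' and '-' or by '_'.

```python
import re

# One of the feature names from the report above.
name = "chr1:629499-630394"

# This mirrors the split in the traceback: with no '_' in the name,
# the result has a single element, so indexing [1] raises IndexError.
fields = name.split("_")

# Hypothetical helper (not part of episcanpy): accepts both
# 'chr1:100-200' and 'chr1_100_200'.
FEATURE_RE = re.compile(r"^(.+)[:_](\d+)[-_](\d+)$")

def parse_feature(feature_name):
    """Return (chromosome, start, end) for a feature name."""
    match = FEATURE_RE.match(feature_name)
    if match is None:
        raise ValueError(f"unrecognised feature name: {feature_name!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)

parsed = parse_feature(name)
```

Anchoring the two numeric fields at the end of the string keeps chromosome names that themselves contain '_' in one piece.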

I tried tweaking the code to extract the start and end position of each feature myself when building the raw_adata_features dictionary, but when I do this I get an empty gene_activity_X matrix later on. I am using the same GTF file as in the example, gencode.v36.annotation.gtf.

Can you help me with this? Thanks!

@HelloWorldLTY

HelloWorldLTY commented Sep 20, 2022

Hi, I think this code will help you:
var_list = []

for i in adata_atac.var_names:
    new_i = i.replace('-', '_').replace(':', '_')
    var_list.append(new_i)
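The snippet above builds the renamed list but does not show it being written back to the object. A hedged sketch of the full round trip follows. underscore_names is a hypothetical helper, and the check for exactly three fields is an assumption based on the traceback, where only line[0], line[1], and line[2] are read.

```python
def underscore_names(var_names):
    """Convert 'chr1:100-200' style names to 'chr1_100_200'.

    Hypothetical helper, not part of episcanpy. Raises ValueError for a
    name that would not split into chromosome, start, and end.
    """
    renamed = []
    for name in var_names:
        new_name = name.replace(":", "_").replace("-", "_")
        fields = new_name.split("_")
        if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            raise ValueError(f"cannot convert feature name: {name!r}")
        renamed.append(new_name)
    return renamed

example = ["chr1:629499-630394", "chr1:633580-634634"]
renamed = underscore_names(example)

# With a real AnnData object the result would be assigned back before
# calling epi.tl.geneactivity, for example:
# adata_atac.var_names = underscore_names(adata_atac.var_names)
```

Names on contigs whose identifier already contains '_' would fail the check here and need separate handling.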

But the number of genes found this way seems quite small...

@DaneseAnna
Collaborator

Hi,

If the number you obtain is quite small, you should check whether the features in adata.var are sorted by genomic coordinates.

Best,
Anna
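One way to read this suggestion is to reorder the features by chromosome, then start, then end. The sketch below is an interpretation, not a confirmed requirement of epi.tl.geneactivity: coordinate_order is a hypothetical helper, it assumes names of the form chr1_629499_630394, and it sorts chromosomes as plain strings because the thread does not say which chromosome order is expected.

```python
def coordinate_order(var_names):
    """Return the indices that sort feature names by (chromosome, start, end).

    Hypothetical helper. Assumes 'chrom_start_end' names; chromosomes are
    compared as strings, so 'chr10' sorts before 'chr2'.
    """
    def sort_key(index):
        chrom, start, end = var_names[index].rsplit("_", 2)
        return chrom, int(start), int(end)

    return sorted(range(len(var_names)), key=sort_key)

names = [
    "chr1_904350_905199",
    "chr1_629499_630394",
    "chr10_100_200",
    "chr2_50_60",
]
order = coordinate_order(names)
sorted_names = [names[i] for i in order]
already_sorted = order == list(range(len(names)))

# With a real AnnData object the same index list could reorder the columns:
# order = coordinate_order(list(adata.var_names))
# adata = adata[:, order].copy()
```

Comparing order against the identity permutation, as already_sorted does, is a quick check of whether any reordering is needed at all.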

@HelloWorldLTY

Hi, I do not quite understand what you mean. Do I need to sort adata.var_names beforehand? Do you have any tutorials covering this part? Thanks.
