Try to create the chicken cisdatabase #50

Open
LJZYaaa opened this issue Jun 26, 2024 · 8 comments
@LJZYaaa

LJZYaaa commented Jun 26, 2024

Hello, I'm trying to create a chicken cisTarget database for my single-cell analysis. I have already created GRCg6a.regions_vs_motifs.rankings.feather from EPD's BED file and the v10_clust motifs. But when I run pyscenic ctx with this feather file and motifs-v10-nr.chicken-m0.00001-o0.0.tbl, it reports the following error:
[image: {A743906C-64B5-4897-BF20-BEFF1F2209C7}]
Can you give me some advice on this error?

@ghuls
Member

ghuls commented Jun 28, 2024

Never seen that error myself, but I assume the error comes from a mismatch between gene/region names in the database and the ones you are requesting in pySCENIC (which seems to be None according to the error message). For pySCENIC you normally make a gene cisTarget database and not a region cisTarget database.
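A quick way to test for the mismatch described above is to compare the names in the database with the names in the expression matrix before running pySCENIC. The following is a minimal sketch, not part of pySCENIC: `name_overlap` and `feather_column_names` are hypothetical helpers, and the gene names are made up for illustration. It assumes the Feather file stores genes (or regions) as columns.

```python
def name_overlap(db_names, query_names):
    """Split query names into those found in the database and those missing."""
    db = set(db_names)
    shared = [name for name in query_names if name in db]
    missing = [name for name in query_names if name not in db]
    return shared, missing


def feather_column_names(path):
    """Return the column names of a Feather file (requires pyarrow)."""
    # Imported lazily so the overlap check itself runs without pyarrow.
    import pyarrow.feather as feather
    return feather.read_table(path).column_names


# Hypothetical names, for illustration only.
db_names = ["GENE_A", "GENE_B", "GENE_C", "motifs"]
expr_names = ["GENE_A", "GENE_X"]

shared, missing = name_overlap(db_names, expr_names)
print(len(shared), "shared;", len(missing), "missing")
```

If almost nothing is shared, the database and the expression matrix are probably using different identifiers (for example region IDs versus gene names).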

@LJZYaaa
Author

LJZYaaa commented Jun 30, 2024

> Never seen that error myself, but I assume the error comes from a mismatch between gene/region names in the database and the ones you are requesting in pySCENIC (which seems to be None according to the error message). For pySCENIC you normally make a gene cisTarget database and not a region cisTarget database.

So glad to receive your reply. I have succeeded in making the genes vs motifs file by adding -g '|ENSGALG[0-9]+|ENSGALT[0-9.]+$' .
But I still have a question about this option. As you know, many chicken genes don't have a proper gene symbol and are only named by their Ensembl ID, e.g. ENSGALG00010029927. What should I do if I want these genes to be included in the genes vs motifs feather file as well?
The picture below shows the gene names in my TSS.fa file:
[image]
Anyway, thanks for your help, hope everything goes well with you! @ghuls
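As far as I understand the -g option, the regex removes the non-gene part of each sequence name so that only the gene ID remains. The sketch below illustrates that idea and one possible fallback for genes without a symbol; it is not confirmed by the maintainers. It assumes FASTA headers of the form `SYMBOL|ENSGALG...|ENSGALT...` with an empty first field when no symbol exists, and it assumes the `|` characters in the pattern are meant to be escaped. `gene_name`, `SOX2` with its IDs, and the transcript IDs are hypothetical; only ENSGALG00010029927 comes from this thread.

```python
import re

# Assumed pattern: the one quoted above, with the "|" characters escaped.
NON_GENE_PART = re.compile(r"\|ENSGALG[0-9]+\|ENSGALT[0-9.]+$")


def gene_name(header):
    """Strip the non-gene suffix; fall back to the Ensembl gene ID if nothing is left."""
    symbol = NON_GENE_PART.sub("", header)
    if symbol:
        return symbol
    # No symbol in front of the IDs: use the Ensembl gene ID instead.
    match = re.search(r"ENSGALG[0-9]+", header)
    return match.group(0) if match else header


# Hypothetical headers, for illustration only.
print(gene_name("SOX2|ENSGALG00010000001|ENSGALT00010000001.1"))
print(gene_name("|ENSGALG00010029927|ENSGALT00010000002.1"))
```

One way to apply this would be to rewrite the FASTA headers beforehand so that every sequence name starts with a non-empty gene name, and keep the same regex for -g.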

@heckern

heckern commented Jul 2, 2024

ENSEMBL switched from GRCg6a (galGal6) to GRCg7b (bGalGal1) in recent releases. ENSGALG00010029927, for example, is an ENSEMBL ID for the GRCg7b assembly. So, the coordinates would not match if you are using GRCg6a.
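To spot mixed assemblies in a gene list, one could count IDs by prefix. This is only a rough sketch under a loud assumption: that GRCg7b gene IDs start with `ENSGALG0001` (as in the example above) and GRCg6a gene IDs start with `ENSGALG0000`. `guess_assembly` is a hypothetical helper, and all IDs except ENSGALG00010029927 are made up.

```python
from collections import Counter


def guess_assembly(gene_id):
    """Guess the assembly from the ID prefix (assumed convention, see lead-in)."""
    if gene_id.startswith("ENSGALG0001"):
        return "GRCg7b"
    if gene_id.startswith("ENSGALG0000"):
        return "GRCg6a"
    return "unknown"  # gene symbols and anything else


# Hypothetical gene list, for illustration only.
gene_ids = ["ENSGALG00010029927", "ENSGALG00000012345", "SOX2"]
counts = Counter(guess_assembly(gene_id) for gene_id in gene_ids)
print(counts)
```

A list that contains both kinds of IDs suggests that the annotation and the genome FASTA should be checked against each other.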

They still provide gene annotations for GRCg6a on their FTP server (at least for ENSEMBL 108), but many gene names are missing. We updated the protein-coding gene names for the GRCg6a version (ENSEMBL 108) for one of our recent data sets, in case that is helpful:

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE262nnn/GSE262321/suppl/GSE262321%5Fgex%5Fgenes%2Etsv%2Egz

@LJZYaaa
Author

LJZYaaa commented Jul 3, 2024

I have made two feather files, one from 500 bp around the TSS and the other from 10 kb around the TSS, and found that neither of them is ideal: only 6 to 7 TFs are detected in the results. What's more, the detected TFs are totally different between the two! I feel sad about the results, hmm...
As your resources include a chicken tbl, I wonder whether your team has ever tried to make chicken cisTarget databases? TAT
Anyway, thanks for your reply, hope you have a good day ^-^

@ghuls
Member

ghuls commented Jul 8, 2024

@LJZYaaa Are you sure that you are using the correct gene annotation (GRCg7b, bGalGal1) together with the matching FASTA file (also GRCg7b, bGalGal1)?

@LJZYaaa
Author

LJZYaaa commented Jul 8, 2024

@ghuls I made the fa file through Ensembl BioMart as below:
[image]
And this is the code I used to create the feather file:
[image]
Is there anything wrong with that? Hoping for your reply ^^

@ghuls
Member

ghuls commented Jul 9, 2024

At first glance it looks OK.

Now that your gene names are from GRCg7b in the Feather database, make sure to convert your expression matrix gene names to GRCg7b too.
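Converting the expression matrix names could look roughly like the sketch below. It assumes you already have a mapping from your current gene names to the names used in the Feather database; building that mapping is not covered here. `rename_genes`, the gene names, and the counts are hypothetical. Genes without a mapping are reported instead of silently dropped, and rows that end up with the same new name are summed, which is a design choice rather than a requirement.

```python
def rename_genes(expression, id_map):
    """Rename the genes of {gene: counts}; return (renamed, unmapped)."""
    renamed, unmapped = {}, []
    for gene, values in expression.items():
        new_name = id_map.get(gene)
        if new_name is None:
            unmapped.append(gene)  # no entry in the mapping
        elif new_name in renamed:
            # Two old names map to the same new name: sum their counts.
            renamed[new_name] = [a + b for a, b in zip(renamed[new_name], values)]
        else:
            renamed[new_name] = list(values)
    return renamed, unmapped


# Hypothetical data, for illustration only.
expression = {"OLD_A": [1, 2], "OLD_B": [3, 4], "OLD_C": [5, 6]}
id_map = {"OLD_A": "NEW_1", "OLD_B": "NEW_1"}

renamed, unmapped = rename_genes(expression, id_map)
print(renamed, unmapped)
```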

For pySCENIC it might be better to use the human motif-to-TF annotation rather than the chicken (GRCg6a) one: https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl. The same goes for the TF list: https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt (prune it so that it only contains TFs that are actually in your database).
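Pruning the TF list can be as simple as keeping the TFs whose names also occur among the gene names of the database. A minimal sketch, assuming both lists use the same naming scheme (the matching is exact and case-sensitive); `prune_tf_list` and the names below are hypothetical:

```python
def prune_tf_list(tf_names, db_gene_names):
    """Keep only TFs that occur among the database gene names, in original order."""
    db = set(db_gene_names)
    return [tf for tf in tf_names if tf in db]


# Hypothetical names, for illustration only.
all_tfs = ["TF_A", "TF_B", "TF_C"]
db_genes = ["TF_C", "GENE_X", "TF_A"]

kept = prune_tf_list(all_tfs, db_genes)
print("\n".join(kept))  # one TF per line, like the original TF list
```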

As we mainly work with scATAC data instead of scRNA, I think we might not have a gene-based chicken cisTarget database internally, but only region-based ones. So there is a chance that the gene-based version does not work very well.

@LJZYaaa
Author

LJZYaaa commented Jul 12, 2024

@ghuls OK, I will try it. It would be a great pity if your tool can't be used for non-model species. Anyway, thanks for your reply.
