---
permalink: /
layout: page
title: About
list_title: AI Hackathon on Genomic Language Model
---

Welcome to the AI Hackathon on Genomic Language Model hosted by the StatBiomed lab at the University of Hong Kong.

Materials are stored in this GitHub repo: https://github.com/StatBiomed/DNA-AI-hackathon

Schedule

  • Full-day intensive hackathon: April 2nd, 2025

  • Tutorial: March 26th, 2025, 2:00-3:30 pm, by Dr Shumin Li

Candidate models

  1. Evo2 (bioRxiv): 7B or 40B parameters, trained on 9.3 trillion DNA base pairs with a 1-million-token context

  2. NucleotideTransformer (paper): models ranging from 50 million to 2.5 billion parameters, integrating information from 3,202 human genomes and 850 genomes from diverse species (see the loading sketch after this list)

  3. Caduceus (arXiv): a MambaDNA-based model

  4. GET (paper): general expression transformer
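
All of these models can be driven through standard deep-learning tooling. As a minimal sketch (assuming the Hugging Face transformers library; the checkpoint name below is an assumption, not an official task choice), here is how one might embed a DNA sequence with a pretrained genomic language model; the same pattern applies to other checkpoints:

```python
# Minimal sketch: embed a DNA sequence with a pretrained genomic language model
# via Hugging Face transformers. The model ID is an assumption -- swap in whichever
# checkpoint (NucleotideTransformer, Caduceus, etc.) you actually use.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_ID = "InstaDeepAI/nucleotide-transformer-500m-human-ref"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
model.eval()

sequence = "ATGCGTACGTTAGCCTAGGCTAACGTTAGCATGCGTACGTTAGC"  # toy cis-regulatory sequence

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Mean-pool the last hidden layer over tokens to get one fixed-size embedding
embedding = outputs.hidden_states[-1].mean(dim=1)  # shape: (1, hidden_size)
print(embedding.shape)
```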

Tasks

Task1: SNP2GEX

See details in ./task1_SNP2GEX/

In this task, we will use an individual's genomic sequence (carrying both common and rare variants) to predict their cis gene expression, either at the gene level (task 1a) or as promoter activities (task 1b).

Success on this task would have a major impact on precision medicine, including the study of the functional effects of rare and somatic variants, which common eQTL studies lack the power to detect.

Multiple labs have been working on this, but it remains an extremely challenging task. We are benchmarking multiple state-of-the-art genomic language models.
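
As a rough illustration of the setup (not the official task interface), the sketch below personalises a reference window with an individual's SNPs and regresses expression from a genomic-LM embedding. The function names, embedding dimension, and head architecture are all placeholders.

```python
# Minimal sketch of the SNP2GEX setup (task 1): personalise a reference window with
# an individual's variants, embed it with a genomic LM, then regress gene expression
# from the embedding. All names here are illustrative; the random embedding stands in
# for whichever genomic language model you choose.
import torch
import torch.nn as nn

def apply_variants(ref_seq: str, variants: list[tuple[int, str]]) -> str:
    """Substitute ALT alleles into the reference window (0-based positions, SNPs only)."""
    seq = list(ref_seq)
    for pos, alt in variants:
        seq[pos] = alt
    return "".join(seq)

class ExpressionHead(nn.Module):
    """Tiny regression head: LM embedding -> predicted cis gene expression."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.regressor = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.regressor(embedding).squeeze(-1)

# Toy usage with a random embedding standing in for the genomic LM output
ref_window = "ACGT" * 16
personal_seq = apply_variants(ref_window, [(3, "A"), (10, "C")])
embedding = torch.randn(1, 512)            # placeholder for an LM embedding of personal_seq
head = ExpressionHead(embed_dim=512)
predicted_expression = head(embedding)     # shape: (1,)
```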

Task2: Seq2CellxTF

See details in ./task2_Seq2CellxTF/

In this task, we will use the reference sequence and its regulatory regions to predict cell type-specific transcription factor binding.

As you can see, the input data is limited: mainly a relatively short sequence, optionally with its cis gene expression (and perhaps chromatin accessibility in the future). The output, however, is extremely large, covering not only many TFs but also diverse cell conditions.

This would be an ideal task to demonstrate the zero-shot capability of a "foundation model".
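
To make the input/output asymmetry concrete, here is a hedged sketch of a multi-label head that maps one sequence embedding to a TF x cell-type grid of binding probabilities; the dimensions and class names are illustrative only, not the task's required interface.

```python
# Minimal sketch of the Seq2CellxTF output structure (task 2): one sequence embedding
# in, a TF x cell-type grid of binding probabilities out. Dimensions are placeholders;
# the point is the many-output, multi-label shape of the task.
import torch
import torch.nn as nn

class TFBindingHead(nn.Module):
    def __init__(self, embed_dim: int, n_tfs: int, n_cell_types: int):
        super().__init__()
        self.n_tfs, self.n_cell_types = n_tfs, n_cell_types
        self.proj = nn.Linear(embed_dim, n_tfs * n_cell_types)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        logits = self.proj(embedding)                       # (batch, n_tfs * n_cell_types)
        return logits.view(-1, self.n_tfs, self.n_cell_types)

# Toy usage: embeddings from a genomic LM -> per-TF, per-cell-type binding probabilities
embedding = torch.randn(2, 512)                             # placeholder LM embeddings
head = TFBindingHead(embed_dim=512, n_tfs=200, n_cell_types=50)
binding_probs = torch.sigmoid(head(embedding))              # shape: (2, 200, 50)
```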

Sponsors

  • HKU's CPOS for the GPU support!
  • HKUMed & School of Biomedical Sciences