Single-Cell Omics Arena (SOAR) is a comprehensive benchmark framework designed to evaluate and improve the performance of instruction-tuned large language models (LLMs) in automated cell type annotation from single-cell omics data.
- [2025-08-03] 🎉 We are excited to provide a command line interface and the pre-built package `soar_benchmark` for easier usage.
- [2025-05-12] 🎉 We are excited to announce that Single-Cell Omics Arena (SOAR) is now open-source! We welcome contributions from the community to help advance automated cell type annotation using LLMs.
- Create an environment with Python >= 3.11
- Clone the repo via `git clone git@github.com:jhliu17/SOAR.git`
- Install `soar_benchmark` via `pip install -e .`
To execute LLMs provided by OpenAI or hosted on Hugging Face Transformers, an env file (`env.toml`) should be set up in the project folder. A template env file is provided as `env_sample.toml`.
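The keys expected inside `env.toml` (for example, API credentials) are defined by the template itself; the snippet below is only a hedged convenience sketch, assuming both files sit in the project root, for copying the template and checking that the result is valid TOML:

```python
# Minimal sketch (not part of SOAR itself): copy the template and make sure
# the resulting env.toml parses before running any annotation command.
import shutil
import tomllib  # standard library on Python >= 3.11
from pathlib import Path

env_file = Path("env.toml")
if not env_file.exists():
    shutil.copy("env_sample.toml", env_file)  # start from the provided template

with env_file.open("rb") as f:
    settings = tomllib.load(f)  # raises tomllib.TOMLDecodeError on malformed files

print(f"Loaded {len(settings)} top-level settings from {env_file}")
```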
To print out all supported LLMs, please run:

```bash
soar annotate -h
```

This will list all available LLM annotation options.
`soar` currently supports the following LLMs for cell type annotation:
- Qwen2 series (1.5B, 7B, 72B)
- Meta Llama-3 70B
- Mixtral-8x7B
- GPT-4o
- GPT-4o-mini
All of these models can perform cell type annotation with either zero-shot prompting or zero-shot chain-of-thought (CoT) prompting.
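As a rough illustration of the two prompting modes (not SOAR's actual prompt templates; the marker genes and instruction wording below are made-up placeholders), zero-shot CoT typically differs from plain zero-shot prompting only by an appended reasoning trigger:

```python
# Illustrative only: these are not the prompts shipped with SOAR.
marker_genes = ["CD3D", "CD3E", "IL7R"]  # hypothetical top marker genes for one cluster

base_prompt = (
    "Identify the cell type of a human PBMC cluster with the following marker genes: "
    + ", ".join(marker_genes)
    + "."
)

zero_shot_prompt = base_prompt
zero_shot_cot_prompt = base_prompt + " Let's think step by step."  # standard zero-shot CoT trigger

print(zero_shot_prompt)
print(zero_shot_cot_prompt)
```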
Each model has specific hardware requirements and configurations (a quick local GPU check is sketched right after this list):
- Smaller models (1.5B-7B): Single GPU with 16GB VRAM
- Larger models (70B-72B): Multi-GPU setup with 4-8 GPUs
- Mixtral-8x7B: 4 GPUs recommended for optimal performance
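To verify that a local machine meets these rough requirements before launching one of the Hugging Face-hosted models, a small PyTorch snippet (an assumption about tooling; it is not part of SOAR) can report the detected GPUs and their VRAM:

```python
# Hedged sketch: check the local GPU count and VRAM against the guidance above.
# Requires PyTorch, which the Hugging Face Transformers backends need anyway.
import torch

if not torch.cuda.is_available():
    print("No CUDA GPUs detected; only API-based models (e.g. GPT-4o) will be practical.")
else:
    n_gpus = torch.cuda.device_count()
    for i in range(n_gpus):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024**3
        print(f"GPU {i}: {props.name}, {vram_gb:.1f} GB VRAM")
    print(f"{n_gpus} GPU(s) total; 70B-class models typically need 4-8 of them.")
```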
For example, to reproduce the SOAR-RNA benchmark result for GPT-4o with zero-shot prompting, run:

```bash
soar annotate soar_rna_with_gpt4_o_zero_shot
```

To apply a provided LLM annotation configuration to your own dataset, pass the dataset path:

```bash
soar annotate soar_rna_with_gpt4_o_zero_shot --config.dataset.json-path YOUR_DATASET_PATH
```

where the custom dataset should follow the same structure as `soar_benchmark/datasets/soar_rna.json`.

One can further fine-tune a preset configuration by overriding individual arguments, for example increasing the limit on newly generated tokens to 2048:

```bash
soar annotate soar_rna_with_gpt4_o_zero_shot --config.generation.max-new-tokens 2048

# To see more tunable options
soar annotate soar_rna_with_gpt4_o_zero_shot -h
```

If you would like to implement a custom annotation configuration, please refer to the detailed configuration settings (including batch sizes, memory requirements, and hardware specifications) in:
- Model configs: `soar_benchmark/configs/cell_type_annotation/experiment_soar_rna.py`
Once you have implemented a custom configuration, you can use it by calling the built-in annotation function:

```python
from soar_benchmark import start_annotation_task

# Your custom configuration
custom_configuration = CellTypeAnnotationTaskConfig(...)

# Start annotation
start_annotation_task(custom_configuration)
```

To run evaluations on annotated results, please refer to:
```bash
python -m analysis.cell_type_annotation.squad_eval \
    --chat_results_path outputs/.../qwen2-72b-instruct.json \
    --squad_eval_results_path outputs/.../few_shot_squad_eval_inflect.json
```

To evaluate the free-format cell type annotations generated by LLMs, we employ seven widely used metrics from natural language processing and question answering: ROUGE (R-1, R-2, R-L) for n-gram and sequence overlap; METEOR for semantic similarity via surface forms, stems, and synonyms; and BLEU (BLEU-1, BLEU-2, and their geometric average) for n-gram overlap, which is particularly suited to short phrases. In addition, Exact Match (EM) and F1 are used to assess token-level precision and recall, ensuring fair evaluation despite label variability and synonym usage.
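SOAR's own evaluation lives in the script above; purely as an illustration, the sketch below shows how the same metric families could be computed with the Hugging Face `evaluate` package (an assumed external tool, not necessarily what the script uses) on toy predictions and references:

```python
# Hedged sketch (pip install evaluate rouge_score nltk): not SOAR's evaluation code,
# just the same metric families applied to toy cell type labels.
import evaluate

predictions = ["cd8+ t cell", "b cell"]      # hypothetical LLM annotations
references = ["cytotoxic t cell", "b cell"]  # hypothetical ground-truth labels

rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bleu = evaluate.load("bleu")

print(rouge.compute(predictions=predictions, references=references))   # rouge1 / rouge2 / rougeL
print(meteor.compute(predictions=predictions, references=references))  # meteor
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references],
                   max_order=2))                                        # BLEU up to 2-grams

# Exact Match and F1 in the SQuAD style (token-level precision/recall)
squad = evaluate.load("squad")
squad_preds = [{"id": str(i), "prediction_text": p} for i, p in enumerate(predictions)]
squad_refs = [
    {"id": str(i), "answers": {"text": [r], "answer_start": [0]}}
    for i, r in enumerate(references)
]
print(squad.compute(predictions=squad_preds, references=squad_refs))    # exact_match, f1
```

The tables below report the ROUGE, METEOR, and BLEU scores for the baseline annotation tools and LLMs evaluated in SOAR.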
| Model | R-1 | R-2 | R-L | MET. | B-1 | B-2 | BLEU |
|---|---|---|---|---|---|---|---|
| CellMarker2.0 | 31.88 | 13.36 | 31.76 | 23.83 | 41.07 | 18.05 | 27.23 |
| SingleR | 16.51 | 5.98 | 16.49 | 2.96 | 24.41 | 0.00 | 0.00 |
| ScType | 12.37 | 3.44 | 12.24 | 20.18 | 21.47 | 6.73 | 10.77 |
| DeepSeek-LLM-67B | 33.13 | 13.47 | 32.74 | 24.27 | 28.27 | 10.07 | 16.87 |
| Qwen2-72B | 32.39 | 14.76 | 32.05 | 29.96 | 18.59 | 6.67 | 11.13 |
| Llama-3-70B | 30.16 | 13.45 | 29.83 | 27.35 | 22.31 | 8.85 | 14.33 |
| Mixtral-8×7B | 20.95 | 13.40 | 20.78 | 16.94 | 17.61 | 7.16 | 10.23 |
| Mixtral-8×22B | 39.85 | 18.19 | 39.95 | 28.60 | 42.06 | 19.40 | 29.18 |
| Cell2Sentence | 26.87 | 11.48 | 26.76 | 19.45 | 25.24 | 11.79 | 17.25 |
| GPT-4o mini | 52.63 | 27.45 | 52.26 | 41.08 | 45.74 | 23.29 | 32.64 |
| GPT-4o | 58.45 | 32.07 | 58.12 | 45.39 | 62.85 | 42.68 | 51.79 |
| Model | R-1 | R-2 | R-L | MET. | B-1 | B-2 | BLEU |
|---|---|---|---|---|---|---|---|
| CellMarker2.0 | - | - | - | - | - | - | - |
| SingleR | - | - | - | - | - | - | - |
| ScType | - | - | - | - | - | - | - |
| DeepSeek-LLM-67B | 40.79 | 17.50 | 40.47 | 31.13 | 33.72 | 13.10 | 21.02 |
| Qwen2-72B | 46.56 | 23.93 | 46.34 | 37.09 | 36.85 | 17.92 | 25.69 |
| Llama-3-70B | 42.25 | 21.24 | 42.02 | 34.09 | 25.94 | 11.64 | 17.38 |
| Mixtral-8×7B | 42.37 | 21.37 | 41.82 | 35.45 | 31.57 | 13.83 | 20.90 |
| Mixtral-8×22B | 51.65 | 26.73 | 51.26 | 41.97 | 40.96 | 19.40 | 28.19 |
| Cell2Sentence | - | - | - | - | - | - | - |
| GPT-4o mini | 51.63 | 26.60 | 51.17 | 40.84 | 50.29 | 27.89 | 37.45 |
| GPT-4o | 57.67 | 31.55 | 57.34 | 45.36 | 55.27 | 32.15 | 42.15 |
| Model | R-1 | R-2 | R-L | MET. | B-1 | B-2 | BLEU |
|---|---|---|---|---|---|---|---|
| Qwen2-72B | 21.83 | 6.02 | 20.57 | 11.54 | 20.77 | 5.32 | 10.51 |
| Llama-3-70B | 27.41 | 11.71 | 27.55 | 17.73 | 27.41 | 10.10 | 16.64 |
| Mixtral-8×7B | 33.41 | 18.28 | 33.65 | 26.11 | 33.09 | 18.45 | 24.71 |
| Mixtral-8×22B | 27.66 | 11.29 | 27.67 | 16.27 | 30.63 | 8.06 | 12.90 |
| Cell2Sentence | 28.03 | 18.29 | 28.42 | 20.63 | 41.35 | 35.29 | 38.20 |
| GPT-4o mini | 39.63 | 21.28 | 39.26 | 29.92 | 37.40 | 21.05 | 28.06 |
| GPT-4o | 41.00 | 23.10 | 41.37 | 30.20 | 43.75 | 26.32 | 33.93 |
| Model | R-1 | R-2 | R-L | MET. | B-1 | B-2 | BLEU |
|---|---|---|---|---|---|---|---|
| Qwen2-72B | 19.55 | 8.64 | 18.43 | 11.80 | 16.07 | 3.79 | 7.80 |
| Llama-3-70B | 31.76 | 13.01 | 31.83 | 19.39 | 29.84 | 10.23 | 17.47 |
| Mixtral-8×7B | 29.71 | 14.95 | 29.36 | 24.21 | 23.08 | 9.77 | 15.02 |
| Mixtral-8×22B | 30.38 | 13.13 | 30.22 | 19.39 | 24.08 | 8.33 | 12.99 |
| Cell2Sentence | 26.54 | 10.07 | 26.45 | 17.29 | 36.04 | 22.67 | 28.58 |
| GPT-4o mini | 38.15 | 17.01 | 37.06 | 28.08 | 35.46 | 17.14 | 24.66 |
| GPT-4o | 38.47 | 16.88 | 38.31 | 25.23 | 38.84 | 21.18 | 28.68 |