Supported Models

OmniGenBench provides plug-and-play evaluation for more than 30 genomic foundation models spanning both RNA and DNA. Highlights include:

Model	Parameters	Pre-training corpus	Highlights
OmniGenome	186M	54B plant RNA+DNA tokens	Multi-modal, structure-aware encoder
Agro-NT-1B	985M	48 edible-plant genomes	Billion-scale DNA LM with NT-V2 k-mer vocabulary
RiNALMo	651M	36M ncRNA sequences	Largest public RNA LM at release; uses FlashAttention-2
DNABERT-2	117M	32B DNA tokens, 136 species (BPE)	Byte-pair encoding; second-generation DNABERT
RNA-FM	96M	23M ncRNA sequences	Strong results on RNA structure-prediction tasks
RNA-MSM	96M	Multi-sequence alignments	MSA-based evolutionary RNA LM
NT-V2	96M	300B DNA tokens (850 species)	Hybrid k-mer vocabulary
HyenaDNA	47M	Human chromosomes	Long-context autoregressive model (up to 1 Mb context)
SpliceBERT	19M	2M pre-mRNA sequences	Fine-grained splice-site recognition
Caduceus	1.9M	Human chromosomes	Ultra-compact DNA LM (RC-equivariant)
RNA-BERT	0.5M	4,000+ ncRNA families	Small BERT with nucleotide masking
...and more	-	-	Includes PlantRNA-FM, UTR-LM, MP-RNA, CALM, and others