Supported Models
OmniGenBench provides plug-and-play evaluation for more than 30 genomic foundation models, covering both RNA and DNA modalities. The table below highlights a selection, ordered by parameter count:

| Model | Params | Pre-training Corpus | Highlights |
|---|---|---|---|
| OmniGenome | 186M | 54B plant RNA+DNA tokens | Multi-modal, structure-aware encoder |
| Agro-NT-1B | 985M | 48 edible-plant genomes | Billion-scale DNA LM with NT-V2 k-mer vocabulary |
| RiNALMo | 651M | 36M ncRNA sequences | Largest public RNA LM; FlashAttention-2 |
| DNABERT-2 | 117M | 32B DNA tokens, 136 species (BPE) | Byte-pair encoding; 2nd-gen DNA BERT |
| RNA-FM | 96M | 23M ncRNA sequences | High performance on RNA structure tasks |
| RNA-MSM | 96M | Multi-sequence alignments | MSA-based evolutionary RNA LM |
| NT-V2 | 96M | 300B DNA tokens (850 species) | Hybrid k-mer vocabulary |
| HyenaDNA | 47M | Human chromosomes | Long-context autoregressive model (1Mb) |
| SpliceBERT | 19M | 2M pre-mRNA sequences | Fine-grained splice-site recognition |
| Caduceus | 1.9M | Human chromosomes | Ultra-compact DNA LM (RC-equivariant) |
| RNA-BERT | 0.5M | 4,000+ ncRNA families | Small BERT with nucleotide masking |
| ...and more | - | - | Includes PlantRNA-FM, UTR-LM, MP-RNA, CALM, etc. |