ProteinGym

Advancing protein engineering and computational biology through structured datasets, reproducible benchmarking, and open-source tools.

Structured, standardized datasets for protein variant effect prediction that replace undocumented CSVs with formal schemas and TOML configurations.

Reproducible benchmarking frameworks for model evaluation, with Docker-containerized workflows that produce consistent, comparable results.

Open-source tools for research and development in protein science that help researchers and ML practitioners collaborate.