ProteinGym

Advancing protein engineering and computational biology through structured datasets, reproducible benchmarking, and open-source tools.

Structured Datasets

Standardized datasets for protein variant effect prediction, described by formal schemas and TOML configurations rather than undocumented CSVs.

Reproducible Benchmarking

Benchmarking frameworks for model evaluation, with workflows containerized in Docker so that results are consistent and comparable.

Open-Source Tools

Tools that support research and development in protein science and help researchers and ML practitioners collaborate.