Association·arcadia
Taxonomic bias alters Foldseek, protein language model, and protein design outcomes
Claim: taxonomic bias in AlphaFold and related datasets alters the outputs of Foldseek and of protein language models (such as ProGen2 and ESM-2), and can negatively affect protein design.
Confidence
80%
active
Evidence Quote
“Taxonomic makeup of AlphaFold and representative proteins used in Foldseek’s clustering workflow reflect biases. Uneven sampling led to systematic biases in the output of protein language models and negatively influenced protein design.”
Relationship
Phylogenetic bias affects Machine learning models for protein design
Connections (3)
Evidence
“Evidence demonstrating evolutionary-scale prediction of atomic-level protein structure using a language model, as described by Lin et al. (2023).”
Lin Z et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. doi:10.1126/science.ade2574