
Taxonomic bias alters Foldseek, protein language model, and protein design outcomes — ARCADIA Knowledge Graph

Association · arcadia


Claim that taxonomic bias in AlphaFold and related datasets alters the outputs of Foldseek and of protein language models (such as ProGen2 and ESM2), and can negatively affect protein design.

Confidence: 80%
Status: active

Evidence Quote

“Taxonomic makeup of AlphaFold and representative proteins used in Foldseek’s clustering workflow reflect biases. Uneven sampling led to systematic biases in the output of protein language models and negatively influenced protein design.”

Relationship

Phylogenetic bias affects Machine learning models for protein design

Arguments

Phylogenetic bias (subject)
Machine learning models for protein design (object)

Connections (3)

Structural similarity often diverges from sequence similarity (Association)
Reasoning: Language models and deep learning in protein structure prediction and design (InferenceChain)
Tree-of-life sampling and algorithmic biases shape the performance of protein language/design models (InferenceChain)

Evidence

“Evidence demonstrating evolutionary-scale prediction of atomic-level protein structure using a language model, as described by Lin et al. (2023).”

Lin Z et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. doi:10.1126/science.ade2574