Association·arcadia
Species abundance causes model bias
Protein language models preferentially generate proteins from abundant species, creating bias.
Confidence
90%
active
Evidence Quote
“Species abundance disparities cause protein language model biases favoring abundant species”
Relationship
"Species abundance disparities causing model bias" causes "pLM performance bias"
Connections (3)
Evidence
“Preprint demonstrating that protein language models reflect biases due to unequal sequence sampling in protein databases across taxa”
Ding F & Steinhardt J (2024). Protein language models are biased by unequal sequence sampling across the tree of life. doi:10.1101/2024.03.07.584001
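The following toy example illustrates how unequal sampling alone can produce this kind of bias. It is a minimal sketch and makes several assumptions. It uses a deliberately simple stand-in for a pLM: a per-position amino-acid frequency profile with add-one smoothing. It is not the model or method used by Ding & Steinhardt (2024). The two sequences, the 90:10 species ratio, and the function names (`train_profile`, `log_likelihood`, `likelihood_gap`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Hypothetical aligned homologs of one protein family (illustrative, not real data)
ABUNDANT_SEQ = "MKTAYIAK"  # stands in for a well-sampled species
RARE_SEQ = "MKSGYLAR"      # stands in for a sparsely sampled species


def train_profile(seqs, pseudocount=1.0):
    """Fit per-position amino-acid frequencies with add-one smoothing.

    A toy stand-in for a protein language model, not a real pLM.
    """
    profile = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = len(seqs) + pseudocount * len(ALPHABET)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile


def log_likelihood(seq, profile):
    """Sum of per-position log probabilities of seq under the profile."""
    return sum(math.log(profile[i][aa]) for i, aa in enumerate(seq))


def likelihood_gap(n_abundant, n_rare):
    """Log-likelihood of the abundant species' sequence minus the rare one's,
    after training on n_abundant and n_rare copies respectively."""
    training = [ABUNDANT_SEQ] * n_abundant + [RARE_SEQ] * n_rare
    profile = train_profile(training)
    return log_likelihood(ABUNDANT_SEQ, profile) - log_likelihood(RARE_SEQ, profile)


print(f"90:10 sampling gap = {likelihood_gap(90, 10):.2f}")
print(f"10:10 sampling gap = {likelihood_gap(10, 10):.2f}")
```

With 90:10 sampling, `likelihood_gap` returns a positive value. With balanced 10:10 sampling, it returns zero. In this toy setting, the preference for the abundant species therefore comes entirely from the sampling imbalance and not from any property of the sequences themselves.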