Factor·arcadia
Species abundance disparities causing model bias
Taxonomic bias in which protein language models preferentially generate proteins from abundant species, because species are unevenly represented in protein databases.
Confidence
90%
active
Source
Ding and Steinhardt 2024 on taxonomic abundance biases in protein language models