Factor·arcadia

Species abundance disparities causing model bias

Taxonomic bias in protein language models: species are unevenly represented in protein sequence databases, so models trained on those databases preferentially generate proteins from the abundant species.
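
As a rough illustration of what an abundance disparity looks like, the sketch below computes each species' share of a sequence collection and the ratio between the most and least represented species. The function names, the species labels, and the counts are all hypothetical toy values chosen for illustration; they are not drawn from any real database or from the cited source.

```python
from collections import Counter


def species_shares(labels):
    """Return each species' fraction of the total number of sequences."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


def max_abundance_ratio(labels):
    """Return the count ratio between the most and least abundant species."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy collection: one species label per sequence (assumed data).
toy_labels = ["E. coli"] * 8 + ["H. sapiens"] * 3 + ["T. thermophilus"] * 1

shares = species_shares(toy_labels)
ratio = max_abundance_ratio(toy_labels)

# List species from most to least represented.
for species, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{species}: {share:.2f}")
print(f"max/min abundance ratio: {ratio:.1f}")
```

A model trained on a collection with a large ratio sees far more examples from the dominant species, which is the disparity this factor describes.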

Confidence
90%
active

Source

Ding and Steinhardt 2024 on taxonomic abundance biases in protein language models