ReasoningCheckpoint·arcadia

Algorithmic curation and model training amplify data bias

Algorithmic steps in data selection, curation, and model training (for example, clustering or generating sequences and structures) can amplify existing sampling biases and further entrench gaps in representation.
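
As a rough illustration of the amplification claim above, the sketch below uses a toy model that is an assumption of this example, not a method from the source. It treats each curation or generation round as a "sharpening" step that raises every group's share to the power 1/temperature and renormalizes. This is loosely analogous to sampling from a generative model at a temperature below 1 and then retraining on its outputs. The clade shares, the `temperature` value, and the function names `sharpen` and `simulate` are all hypothetical.

```python
def sharpen(shares, temperature):
    """Toy amplification rule (an assumption of this sketch):
    raise each share to 1/temperature, then renormalize to sum to 1."""
    powered = [s ** (1.0 / temperature) for s in shares]
    total = sum(powered)
    return [p / total for p in powered]


def simulate(shares, temperature=0.8, rounds=3):
    """Apply the sharpening rule repeatedly and record the shares after each round."""
    history = [list(shares)]
    for _ in range(rounds):
        shares = sharpen(shares, temperature)
        history.append(shares)
    return history


# Made-up starting representation for three hypothetical clades.
initial = [0.7, 0.2, 0.1]
history = simulate(initial)

for round_index, shares in enumerate(history):
    print(round_index, [round(s, 3) for s in shares])
```

Under these assumptions, the best-sampled group gains share in every round while the least-sampled group loses share, which is the entrenchment pattern the claim describes. With `temperature` set to exactly 1 the shares stay unchanged, so in this toy model the amplification comes entirely from the sharpening step rather than from the initial skew.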

Confidence
70%
◑ partial · active

Part of Chain

Tree-of-life sampling and algorithmic biases shape the performance of protein language/design models