ReasoningCheckpoint·arcadia

Uneven sampling across the tree of life distorts sequence/structure space

The overrepresentation of a few phyla or species in sequence and structure databases makes datasets drawn from them non-uniform, biased samples of the tree of life, which distorts the sequence/structure landscape that models can learn from.
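
One way to make this claim concrete is to measure how far a dataset's taxon labels are from a uniform sample. The sketch below is illustrative only and is not taken from the source: the function names, the toy phylum counts, and the choice of metrics (effective number of taxa from Shannon entropy, plus inverse-frequency weights) are all assumptions.

```python
import math
from collections import Counter


def effective_number_of_taxa(labels):
    """Return exp(Shannon entropy) of the taxon label distribution.

    Equals the number of distinct taxa when sampling is perfectly even,
    and shrinks toward 1 as a single taxon dominates.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


def inverse_frequency_weights(labels):
    """Return one weight per entry so that every taxon has equal total weight.

    Weights sum to 1 over the whole dataset. This is a simple illustrative
    correction, not a method described in the source.
    """
    counts = Counter(labels)
    n_taxa = len(counts)
    return [1.0 / (n_taxa * counts[taxon]) for taxon in labels]


# Hypothetical toy dataset: counts are invented for illustration.
labels = ["Proteobacteria"] * 90 + ["Chordata"] * 8 + ["Tardigrada"] * 2

print("distinct taxa:", len(set(labels)))
print("effective number of taxa:", round(effective_number_of_taxa(labels), 2))
print("total weight:", round(sum(inverse_frequency_weights(labels)), 6))
```

On the toy labels, three phyla are present but the effective number of taxa is well below three, which is the kind of unevenness the claim describes. Reweighting of this sort only equalizes the taxa already present; it cannot recover lineages that were never sampled.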

Confidence
70%
◑ partial · active

Part of Chain

Tree-of-life sampling and algorithmic biases shape the performance of protein language/design models