Taxonomic bias reasoning for AFDB and model outcomes

Logical chain connecting taxonomic biases in protein structure databases to their impact on AlphaFold pLDDT, Foldseek clustering, protein language model outputs, and downstream applications. Explains how dataset imbalances and uneven sampling propagate bias, and considers curation as a mitigating intervention.

Confidence
80%
partial · active · complexity: mid

Reasoning Steps (3)

Step 1: Taxonomic bias covaries with pLDDT and Foldseek
Step 2: Sampling bias induces systematic bias in model outputs
Step 3: Intentional dataset curation as remedy

Source

Synthesis for current paper

Connections (7)

Taxonomic biases covary with AlphaFold pLDDT (Association)
Taxonomic bias impacts pLDDT-dependent AlphaFold applications (Association)
AlphaFold and Foldseek show strong taxonomic concordance (Association)
Taxonomic sampling bias affects protein language models (Association)
Dataset curation can mitigate issues from sampling bias (Association)
Balancing curation of AFDB reduces accessible protein universe size (Association)
AFDB is an imbalanced dataset (Association)
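
The trade-off between the last three connections (curation mitigates sampling bias, but balancing shrinks the accessible protein set) can be illustrated with a minimal sketch. It assumes the simplest possible curation rule, down-sampling every taxon to the size of the rarest one; this is only one of many possible balancing strategies and is not taken from the source. The function name `balance_by_taxon`, the taxon labels, and the counts are all hypothetical toy values, not real AFDB figures.

```python
import random
from collections import Counter


def balance_by_taxon(records, seed=0):
    """Down-sample each taxon to the size of the smallest taxon.

    `records` is a list of (taxon, protein_id) pairs. This is an assumed,
    deliberately naive balancing rule used only for illustration.
    """
    rng = random.Random(seed)

    # Group protein ids by their taxon label.
    by_taxon = {}
    for taxon, protein_id in records:
        by_taxon.setdefault(taxon, []).append(protein_id)

    # The rarest taxon sets the per-taxon cap.
    cap = min(len(ids) for ids in by_taxon.values())

    # Draw `cap` proteins without replacement from every taxon.
    balanced = []
    for taxon, ids in by_taxon.items():
        for protein_id in rng.sample(ids, cap):
            balanced.append((taxon, protein_id))
    return balanced


# Hypothetical, imbalanced toy dataset (NOT real AFDB numbers).
toy = (
    [("Bacteria", f"b{i}") for i in range(700)]
    + [("Eukaryota", f"e{i}") for i in range(250)]
    + [("Archaea", f"a{i}") for i in range(50)]
)

balanced = balance_by_taxon(toy)
counts = Counter(taxon for taxon, _ in balanced)

print("per-taxon counts after balancing:", dict(counts))
print("dataset size:", len(toy), "->", len(balanced))
```

In this toy case every taxon ends up with 50 entries, so the balanced set holds 150 of the original 1000 records. The cost of strict balancing is set by the rarest group, which is the tension the "reduces accessible protein universe size" connection describes.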