Factor·arcadia
Phylogenetic bias
Systematic over- or under-representation of certain lineages or taxa in a dataset, producing an uneven phylogenetic distribution.
Confidence
100%
active
Source
Prachee Avasthi et al. (2024). How phylogenetic bias shapes protein databases and models. doi:10.57844/arcadia-570f-5cfb
Connections (16)
Protein structure databases are phylogenetically biased · Association
Protein Data Bank is phylogenetically biased · Association
AlphaFold database is phylogenetically biased · Association
Phylogenetic bias in databases influences protein model outcomes · Association
Phylogenetic bias limits evolutionary diversity in databases · Association
Taxonomic completeness in AFDB assessed using TimeTree · Association
Taxonomic biases covary with AlphaFold pLDDT · Association
Taxonomic bias impacts pLDDT-dependent AlphaFold applications · Association
Taxonomic sampling bias affects protein language models · Association
Dataset curation can mitigate issues from sampling bias · Association
AFDB is an imbalanced dataset · Association
Taxonomic bias in AFDB covaries with pLDDT and impacts Foldseek applications · Association
Taxonomic bias alters Foldseek, protein language model, and protein design outcomes · Association
Phylogenetic biases and non-independence cap model generalizability · Association
Explicit phylogenetic information in models improves generalizability and accuracy · Association
Pseudoreplication and non-independence limit language model generalizability · Association
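
Illustration
The definition of phylogenetic bias given at the top of this entry can be made concrete with a small sketch. It counts how many records each taxon contributes and summarizes the imbalance as Pielou's evenness (Shannon entropy divided by its maximum). This is an illustrative stand-in, not the method used in the cited source: the function name taxon_evenness and the toy species labels are assumptions, and the measure ignores tree structure and taxa that are missing from the dataset entirely.

```python
from collections import Counter
import math


def taxon_evenness(labels):
    """Return (per-taxon counts, Pielou evenness) for a list of taxon labels.

    Evenness is 1.0 when every observed taxon is equally represented and
    falls toward 0.0 as a few taxa dominate. Hypothetical helper for
    illustration only.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if len(counts) < 2:
        # Evenness is undefined for fewer than two taxa; 0.0 is a convention chosen here.
        return counts, 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return counts, entropy / math.log(len(counts))


# Toy data (made up): one heavily sampled species and two sparsely sampled ones.
skewed = ["Homo sapiens"] * 8 + ["Mus musculus"] + ["Chlamydomonas reinhardtii"]
balanced = ["taxon A", "taxon B", "taxon C", "taxon D"]

skewed_counts, skewed_evenness = taxon_evenness(skewed)
balanced_counts, balanced_evenness = taxon_evenness(balanced)

print(skewed_counts.most_common())
print(f"skewed evenness:   {skewed_evenness:.3f}")
print(f"balanced evenness: {balanced_evenness:.3f}")
```

A lower evenness for the skewed list than for the balanced one reflects over-representation of a single lineage. A fuller analysis would weight taxa by their position in a phylogeny rather than treating them as interchangeable labels.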