Association·arcadia

Filtering stringency affects data leakage

More stringent sequence similarity filtering reduces the likelihood of data leakage, which in turn affects measured model performance.

Confidence
90%
active

Evidence Quote

“Stringency of sequence similarity filtering is a proxy for likelihood of data leakage”

Relationship

Sequence similarity filtering stringency decreases Data leakage
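
As a rough illustration of this relationship, the sketch below drops held-out sequences that look too similar to any training sequence. It is not the method behind this entry. The similarity measure (Jaccard overlap of 3-mers), the function names, the cutoff values, and the toy sequences are all assumptions chosen for brevity. Real pipelines typically use dedicated clustering or alignment tools for this step.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def filter_held_out(train, held_out, max_similarity, k=3):
    """Keep held-out sequences whose similarity to every training
    sequence is strictly below max_similarity (hypothetical helper).
    A lower max_similarity means more stringent filtering."""
    return [
        s for s in held_out
        if all(jaccard(s, t, k) < max_similarity for t in train)
    ]


# Toy data, invented for illustration only.
train = ["MKTAYIAKQRQISFVKSHFSRQ"]
held_out = [
    "MKTAYIAKQRQISFVKSHFSRQ",  # exact duplicate of the training sequence
    "MKTAYIAKQRQISFVKSHFARQ",  # one substitution away from it
    "GSHMLEDPVVNWQTLCGRAAEY",  # shares no 3-mers with it
]

lenient = filter_held_out(train, held_out, max_similarity=0.9)
stringent = filter_held_out(train, held_out, max_similarity=0.5)

print(len(lenient), len(stringent))
```

In this toy case the lenient cutoff removes only the exact duplicate, while the stringent cutoff also removes the near-duplicate. The stringent filter leaves fewer chances for training data to leak into evaluation, but it also leaves a smaller held-out set.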

Arguments

Sequence similarity filtering stringency · subject
Data leakage · object

Connections (5)

Data leakage and training data bias impact model performance · InferenceChain
Genome contamination yields HGT false positives · Association
Impact of sequence similarity filtering on data leakage and sequence diversity · InferenceChain
Data leakage and training data biases impact model performance · InferenceChain
Data leakage and biases impact biological foundation model performance · InferenceChain