Association·arcadia
Filtering stringency affects data leakage and sequence diversity
The stringency of sequence similarity filtering affects both the likelihood of data leakage and the amount of sequence diversity retained, and it does so unevenly across protein families. This in turn influences the distribution of model training data and the statistical power available per family.
Confidence
90%
active
Evidence Quote
“Filtering stringency affects the likelihood of data leakage and the distribution of retained sequence diversity across protein families”
Relationship
Sequence similarity filtering stringency affects Effective sequence number
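
To make the relationship concrete, the following is a minimal sketch of how an effective sequence number might respond to the filtering threshold. It assumes one common reweighting definition (each sequence is weighted by the inverse of the number of sequences at or above the identity threshold, itself included), which the source does not specify. The function names, the toy sequences, and the thresholds are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def fractional_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_sequence_number(seqs: Sequence[str], threshold: float) -> float:
    """Sum of per-sequence weights 1/n_i, where n_i counts the sequences
    (including sequence i itself) with identity >= threshold to sequence i.
    This definition is an assumption, not taken from the source."""
    total = 0.0
    for s in seqs:
        neighbours = sum(fractional_identity(s, t) >= threshold for t in seqs)
        total += 1.0 / neighbours
    return total


# Hypothetical toy families: one of near-duplicates, one of unrelated sequences.
families = {
    "family_a": ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKV", "ACDEFGHIRL"],
    "family_b": ["ACDEFGHIKL", "MNPQRSTVWY", "LKIHGFEDCA", "WYVTSRQPNM"],
}

for name, seqs in families.items():
    for threshold in (0.95, 0.8, 0.5):
        neff = effective_sequence_number(seqs, threshold)
        print(f"{name} threshold={threshold} Neff={neff:.2f}")
```

In this toy setup, the near-duplicate family collapses from four effective sequences to one as the threshold drops from 0.95 to 0.8, while the divergent family stays at four throughout. That is one way the same stringency setting could act unevenly across families.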