Association·arcadia

Filtering stringency affects data leakage and sequence diversity

Sequence similarity filtering stringency affects both the likelihood of data leakage and the sequence diversity retained, and it does so unevenly across protein families. This in turn influences the model training data distribution and statistical power.

Confidence
90%
active

Evidence Quote

Filtering stringency affects the likelihood of data leakage and the distribution of retained sequence diversity across protein families

Relationship

Sequence similarity filtering stringency affects Effective sequence number
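
Illustration

The relationship above can be made concrete with a small sketch. It assumes one common convention for effective sequence number, in which each sequence is weighted by the reciprocal of the number of sequences (itself included) at or above an identity threshold, and the weights are summed. The source does not state which definition it uses, so this convention, the function names, and the toy sequences are all assumptions made for illustration only.

```python
def fractional_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_sequence_number(seqs: list[str], threshold: float) -> float:
    """Sum of 1/n_i, where n_i counts sequences with identity >= threshold to sequence i.

    Assumed convention (hypothetical for this card): a sequence always counts
    itself as a neighbor, so n_i >= 1.
    """
    n_eff = 0.0
    for s in seqs:
        neighbors = sum(1 for t in seqs if fractional_identity(s, t) >= threshold)
        n_eff += 1.0 / neighbors
    return n_eff


# Toy data, invented for illustration: one redundant family, one diverse family.
redundant_family = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIKM"]  # 90% pairwise identity
diverse_family = ["ACDEFGHIKL", "MNPQRSTVWY", "LKIHGFEDCA"]    # 0% pairwise identity

for threshold in (0.8, 1.0):
    print(
        threshold,
        effective_sequence_number(redundant_family, threshold),
        effective_sequence_number(diverse_family, threshold),
    )
```

In this toy case, the 0.8 threshold collapses the redundant family to an effective count near 1 while the diverse family keeps all three sequences, whereas at 1.0 both families keep three. This is the uneven effect across families that the claim describes, shown under the stated assumptions rather than with the source's own data or method.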