Association·arcadia
Clustering reduces database size, maintains diversity
Claim: clustering the NCBI non-redundant (nr) protein database by sequence similarity reduces its size by more than half while preserving taxonomic diversity in search results.
Confidence: 90%
Status: active
Evidence quote:
“Clustering the NCBI non-redundant protein database collapses similar sequences, reduces database by over half, and maintains diversity.”
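To make "collapses similar sequences" concrete, here is a toy sketch. It is not the pipeline behind this claim. It uses greedy incremental clustering in the style of CD-HIT: sequences are sorted longest first, and each one either joins the first cluster whose representative is similar enough or starts a new cluster. Several choices here are assumptions. `difflib.SequenceMatcher.ratio()` stands in for real alignment identity, the 0.9 threshold is arbitrary, and the records, IDs, and taxon labels are hypothetical. The sketch also assumes each cluster keeps its full member list. That lets a hit on one representative be expanded back to every member's taxon, which is one way diversity could survive the size reduction.

```python
from difflib import SequenceMatcher

# Hypothetical toy records: (sequence_id, taxon, protein_sequence)
RECORDS = [
    ("a1", "E. coli", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("a2", "S. enterica", "MKTAYIAKQRQISFVKSHFSRE"),
    ("a3", "K. pneumoniae", "MKTAYIAKQRQISFVKSHFSRA"),
    ("b1", "B. subtilis", "GSHMLEDPVAGNWTQLLRKE"),
    ("b2", "B. cereus", "GSHMLEDPVAGNWTQLLRKD"),
]


def similarity(a: str, b: str) -> float:
    # Stand-in for alignment identity (assumption): 2 * matches / (len(a) + len(b))
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(records, threshold=0.9):
    """Greedy incremental clustering; the first member of each cluster is its representative."""
    # Longest sequences first, so representatives tend to be the longest members
    ordered = sorted(records, key=lambda r: len(r[2]), reverse=True)
    clusters = []
    for rec in ordered:
        for cluster in clusters:
            if similarity(rec[2], cluster[0][2]) >= threshold:
                cluster.append(rec)  # similar enough: collapse into this cluster
                break
        else:
            clusters.append([rec])  # no match: this sequence becomes a new representative
    return clusters


clusters = greedy_cluster(RECORDS, threshold=0.9)
reduction = 1 - len(clusters) / len(RECORDS)
rep_taxa = {c[0][1] for c in clusters}               # taxa visible from representatives alone
member_taxa = {r[1] for c in clusters for r in c}    # taxa recoverable via cluster membership
print(len(clusters), f"{reduction:.0%}", len(rep_taxa), len(member_taxa))  # prints "2 60% 2 5"
```

In this toy run, five sequences collapse into two representatives, a 60% reduction. Searching the representatives alone would surface only two taxa. Expanding each hit to its cluster members recovers all five.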