Clustering reduces database size, maintains diversity — ARCADIA Knowledge Graph

Association·arcadia

Claim: clustering the NCBI nr (non-redundant) protein database by sequence similarity reduces its size by more than half while preserving the taxonomic diversity of search results.

Confidence: 90% (active)

Evidence Quote

“Clustering the NCBI non-redundant protein database collapses similar sequences, reduces database by over half, and maintains diversity.”
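
As a rough illustration of what "collapses similar sequences" means, the sketch below greedily groups toy protein sequences under one representative whenever a similarity score reaches a threshold. It is not the procedure used to build any clustered nr database: the sequences, names, and taxa are made up, the 0.9 threshold is an arbitrary assumption, and difflib's SequenceMatcher ratio is only a stand-in for a real alignment-based identity.

```python
from difflib import SequenceMatcher

# Hypothetical toy data: sequence names, residues, and taxa are invented for illustration.
SEQS = {
    "protA_ecoli": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "protA_salmonella": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVK",
    "protA_shigella": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEV",
    "protB_yeast": "MSDNGTWQPLLVACHEYFRKMNDTSG",
}
TAXA = {
    "protA_ecoli": "Escherichia",
    "protA_salmonella": "Salmonella",
    "protA_shigella": "Shigella",
    "protB_yeast": "Saccharomyces",
}


def similarity(a, b):
    """Similarity ratio in [0, 1]; a crude stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Map each representative name to the list of member names it absorbs."""
    clusters = {}
    # Visit longer sequences first so each representative is the longest member.
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        for rep in clusters:
            if similarity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


def taxa_per_cluster(clusters, taxa):
    """Taxa reachable through each representative's members."""
    return {rep: sorted({taxa[m] for m in members}) for rep, members in clusters.items()}


clusters = greedy_cluster(SEQS)
print(len(SEQS), "sequences ->", len(clusters), "representatives")
print(taxa_per_cluster(clusters, TAXA))
```

In this toy case the searchable set shrinks from four sequences to two representatives, while the member lists still link each representative to every taxon it stands for. That link is what lets a hit on a representative be expanded back to all of its taxa.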

Arguments

Clustering of NCBI nr based on sequence similarity (subject)
Size of protein sequence database (object)

Connections (4)

Taxonomic diversity of matched search results (Factor)
Clustering yields faster, leaner searches without taxonomic loss (InferenceChain)
Structural similarity often diverges from sequence similarity (Association)
Reasoning supporting benefits of database clustering (InferenceChain)