Introduction
CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences.
CD-HIT is fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it helps in understanding the structure of a dataset and correcting the bias within it.
The CD-HIT package includes CD-HIT, CD-HIT-2D, CD-HIT-EST, CD-HIT-EST-2D, CD-HIT-454, CD-HIT-PARA, PSI-CD-HIT, CD-HIT-OTU, CD-HIT-LAP, CD-HIT-DUP, and over a dozen scripts.
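To make the idea of sequence clustering concrete, the following is a minimal Python sketch of greedy incremental clustering, the general approach CD-HIT is known for: sequences are processed from longest to shortest, and each one either joins the first cluster whose representative it resembles closely enough or starts a new cluster. This is an illustration only, not CD-HIT's implementation. The function names (identity, greedy_cluster), the threshold value, and the use of difflib.SequenceMatcher as a similarity measure are all assumptions made for this example; CD-HIT itself uses short-word filtering and alignment-based identity.

```python
from difflib import SequenceMatcher


def identity(a, b):
    # Stand-in similarity score in [0, 1]; an assumption for this sketch,
    # not the alignment-based identity that CD-HIT computes.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold=0.9):
    # Each cluster is a (representative, members) pair.
    clusters = []
    # Process the longest sequences first so they become representatives.
    for seq in sorted(sequences, key=len, reverse=True):
        for rep, members in clusters:
            if identity(rep, seq) >= threshold:
                members.append(seq)  # close enough: join this cluster
                break
        else:
            clusters.append((seq, [seq]))  # no match: start a new cluster
    return clusters


# Hypothetical toy input: two near-identical peptides and one unrelated one.
seqs = ["MKTAYIAKQRQISFVKSHFSRQ", "MKTAYIAKQRQISFVKSHFSRA", "GGGGGGGGGG"]
clusters = greedy_cluster(seqs, threshold=0.9)
for rep, members in clusters:
    print(rep, len(members))
```

In this sketch, lowering the threshold merges more sequences into fewer clusters, which is the same trade-off a user controls when choosing an identity cutoff for a real clustering run.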
Programming language: C++
Brief description: A program for clustering and comparing protein or nucleotide sequences.
Recommended Software Version
CD-HIT 4.8.1