Introduction
CPAT is a bioinformatics tool that predicts an RNA's coding probability from its sequence characteristics. To do this, CPAT computes four linguistic features for a set of known protein-coding genes and a set of known non-coding genes:
- ORF size
- ORF coverage
- Fickett TESTCODE
- Hexamer usage bias
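The sketch below shows how three of these features could be computed for a single transcript. It is illustrative only and is not CPAT's own code. It assumes that only the forward strand is scanned, that an ORF runs from ATG to the first in-frame stop codon (TAA/TAG/TGA) with the stop codon included in its length, and that the hexamer score is the mean log ratio of in-frame hexamer frequencies in the coding versus non-coding training sets. Hexamers missing from either table are simply skipped, which is also an assumption. The helper names (`longest_orf`, `orf_features`, `hexamer_freqs`, `hexamer_score`) are hypothetical. Fickett TESTCODE relies on published lookup tables and is not sketched here.

```python
"""Illustrative sketch of ORF size, ORF coverage and a hexamer log-ratio score.

Assumptions (not taken from CPAT's source): forward strand only; an ORF runs from
ATG to the first in-frame stop codon, stop codon included; hexamers absent from
either frequency table are skipped. Fickett TESTCODE is not sketched.
"""
import math
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-initiated ORF that ends at an in-frame stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the previous stop in this frame
            elif start is not None and codon in STOPS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]  # keep the stop codon in the ORF
                start = None  # look for the next ATG downstream
    return best


def orf_features(seq: str) -> tuple[int, float]:
    """Return (ORF size in nt, ORF coverage = ORF size / transcript length)."""
    size = len(longest_orf(seq))
    return size, (size / len(seq) if seq else 0.0)


def hexamer_freqs(seqs: list[str]) -> dict[str, float]:
    """Relative frequencies of in-frame hexamers (step of 3 nt) in a training set."""
    counts = Counter()
    for s in seqs:
        s = s.upper().replace("U", "T")
        for i in range(0, len(s) - 5, 3):
            counts[s[i:i + 6]] += 1
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()} if total else {}


def hexamer_score(orf: str, coding: dict[str, float], noncoding: dict[str, float]) -> float:
    """Mean log(F_coding / F_noncoding) over the ORF's in-frame hexamers."""
    ratios = []
    for i in range(0, len(orf) - 5, 3):
        hexamer = orf[i:i + 6]
        c, n = coding.get(hexamer, 0.0), noncoding.get(hexamer, 0.0)
        if c > 0 and n > 0:  # skip hexamers unseen in either set (assumption)
            ratios.append(math.log(c / n))
    return sum(ratios) / len(ratios) if ratios else 0.0
```

For example, `orf_features("CCATGAAATTTTAGCC")` finds the ORF `ATGAAATTTTAG` in the third frame and returns `(12, 0.75)`. A positive hexamer score means the ORF's hexamers are, on average, more frequent in the coding training set than in the non-coding one.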
CPAT then builds a logistic regression model that uses these four features as predictor variables and protein-coding status as the response variable. Once the model's performance has been evaluated and a probability cutoff chosen, the model can be used to predict the coding status of new RNA sequences.
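The modeling step can be sketched in plain Python as below. This is a minimal illustration of logistic regression and is not how CPAT itself fits its model. It assumes the four feature values (ORF size, ORF coverage, Fickett score, hexamer score) have already been computed and scaled to comparable ranges, fits the weights by batch gradient descent on the mean log-loss, and picks the cutoff that maximizes training accuracy. That cutoff rule is an assumption chosen for brevity; the function names and toy data are hypothetical.

```python
"""Illustrative logistic regression on four pre-scaled features, plus a cutoff.

Assumptions: batch gradient descent on mean log-loss (not necessarily CPAT's
fitting procedure); features already scaled; cutoff chosen by accuracy.
"""
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: list[float], x: list[float]) -> float:
    """Coding probability for one feature vector; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: list[list[float]], y: list[int], lr: float = 0.1,
                 epochs: int = 2000) -> list[float]:
    """Return weights [b0, b1, ..., bk] that reduce the mean log-loss."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(linear predictor)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def best_cutoff(probs: list[float], y: list[int]) -> float:
    """Return the observed probability that, used as a cutoff, gives the best accuracy."""
    def accuracy(t: float) -> float:
        return sum((p >= t) == bool(yi) for p, yi in zip(probs, y)) / len(y)
    return max(sorted(set(probs)), key=accuracy)


# Hypothetical, already-scaled feature vectors: 1 = coding, 0 = non-coding.
X = [[2.0, 1.5, 1.0, 1.2], [1.8, 1.2, 0.9, 1.0], [1.5, 1.0, 1.1, 0.8],
     [-1.5, -1.0, -0.8, -1.2], [-2.0, -1.2, -1.0, -0.9], [-1.7, -0.9, -1.1, -1.0]]
y = [1, 1, 1, 0, 0, 0]
w = fit_logistic(X, y)
probs = [predict_proba(w, x) for x in X]
cutoff = best_cutoff(probs, y)
```

A new sequence would then be called coding when `predict_proba(w, features) >= cutoff`. In practice the cutoff should be chosen on held-out data rather than on the training set, so that the reported performance is not optimistic.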
For more information, visit the CPAT official website.
Programming language: Python
Brief description: CPAT is a bioinformatics tool that predicts an RNA's coding probability from its sequence characteristics.
Open source license: GNU General Public License
Recommended Software Version
CPAT 3.0.4