Introduction
STAR is an ultrafast universal RNA-seq aligner.
Motivation
Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.
Results
To align the large (> 80 billion reads) ENCODE Transcriptome RNA-seq dataset, the Spliced Transcripts Alignment to a Reference (STAR) software was developed. It is based on a previously undescribed RNA-seq alignment algorithm that uses a sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure. STAR outperforms other aligners, mapping 550 million 2 × 76 bp paired-end reads per hour to the human genome on a modest 12-core server while also improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and it can also map full-length RNA sequences. Roche 454 sequencing of reverse transcription polymerase chain reaction (RT-PCR) amplicons experimentally validated 1960 novel intergenic splice junctions with an 80% to 90% success rate, corroborating the high precision of the STAR mapping strategy.
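The idea of a sequential maximum mappable prefix (MMP) search followed by stitching can be illustrated with a minimal sketch. It is not STAR's implementation. It assumes a toy forward-strand genome, exact matches only, a naively built uncompressed suffix array, and a hypothetical `minIntronLength` threshold. All names (`Seed`, `buildSuffixArray`, `maxMappablePrefix`, `sequentialSeeds`, `stitchJunctions`) are illustrative. The real algorithm is considerably more involved: it tolerates mismatches, handles both strands, and scores competing seed combinations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One exact seed: read[readStart, readStart + length) == genome[genomeStart, genomeStart + length).
struct Seed {
    std::size_t readStart;
    std::size_t genomeStart;
    std::size_t length;
};

// Naive uncompressed suffix array: every suffix start, sorted lexicographically.
// O(n^2 log n) construction, acceptable only for a toy genome.
std::vector<std::size_t> buildSuffixArray(const std::string& genome) {
    assert(!genome.empty());
    std::vector<std::size_t> sa(genome.size());
    for (std::size_t i = 0; i < sa.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&genome](std::size_t a, std::size_t b) {
        return genome.compare(a, std::string::npos, genome, b, std::string::npos) < 0;
    });
    return sa;
}

// Length of the common prefix of genome[g..] and read[r..].
std::size_t commonPrefix(const std::string& genome, std::size_t g,
                         const std::string& read, std::size_t r) {
    std::size_t k = 0;
    while (g + k < genome.size() && r + k < read.size() && genome[g + k] == read[r + k]) ++k;
    return k;
}

// MMP of read[start..]: the longest prefix that occurs exactly in the genome.
// In a sorted suffix array, the suffix sharing the longest prefix with the
// query sits immediately next to the query's insertion point.
Seed maxMappablePrefix(const std::string& genome, const std::vector<std::size_t>& sa,
                       const std::string& read, std::size_t start) {
    const std::string query = read.substr(start);
    auto it = std::lower_bound(sa.begin(), sa.end(), query,
        [&genome](std::size_t suffix, const std::string& q) {
            return genome.compare(suffix, std::string::npos, q) < 0;
        });
    const std::size_t pos = static_cast<std::size_t>(it - sa.begin());
    Seed best{start, 0, 0};
    auto consider = [&](std::size_t suffix) {
        const std::size_t len = commonPrefix(genome, suffix, read, start);
        if (len > best.length) best = Seed{start, suffix, len};
    };
    if (pos < sa.size()) consider(sa[pos]);  // first suffix >= query
    if (pos > 0) consider(sa[pos - 1]);      // last suffix < query
    return best;
}

// Sequential MMP search: each new search starts at the first read base
// not covered by the previous seed.
std::vector<Seed> sequentialSeeds(const std::string& genome, const std::vector<std::size_t>& sa,
                                  const std::string& read) {
    std::vector<Seed> seeds;
    std::size_t start = 0;
    while (start < read.size()) {
        const Seed s = maxMappablePrefix(genome, sa, read, start);
        if (s.length == 0) { ++start; continue; }  // base absent from genome: skip it
        seeds.push_back(s);
        start += s.length;
    }
    return seeds;
}

// Toy stitching: seeds that are adjacent in the read but separated in the
// genome by at least minIntronLength bases are joined across a putative
// splice junction. Introns are returned as half-open intervals [start, end).
std::vector<std::pair<std::size_t, std::size_t>> stitchJunctions(const std::vector<Seed>& seeds,
                                                                std::size_t minIntronLength) {
    std::vector<std::pair<std::size_t, std::size_t>> introns;
    for (std::size_t i = 1; i < seeds.size(); ++i) {
        const Seed& prev = seeds[i - 1];
        const Seed& cur = seeds[i];
        const std::size_t prevEnd = prev.genomeStart + prev.length;
        const bool adjacentInRead = cur.readStart == prev.readStart + prev.length;
        if (adjacentInRead && cur.genomeStart >= prevEnd + minIntronLength)
            introns.emplace_back(prevEnd, cur.genomeStart);
    }
    return introns;
}
```

For example, take a genome built as exon `GATTACAGGC`, intron `GTAAGTTTTTTCCCCAG`, exon `CTTGACCATG`, and a read made of the two exons joined together. With `minIntronLength` set to 10, the sketch yields two 10-base seeds, at genome positions 0 and 27, and a single intron `[10, 27)`. Seeding with maximal exact prefixes means that a read spanning a junction naturally breaks at the exon boundary, so no junction database is needed up front.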
Availability and Implementation
STAR is implemented as standalone C++ code. STAR is free, open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/ (the project is now hosted at https://github.com/alexdobin/STAR).
For more information, visit the official STAR website.
Programming language: C++
Brief description: an RNA-seq aligner
Open source license: GPL 3.0
Recommended Software Version
STAR 2.7.1a