WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data.

BMC Bioinformatics

PubMedID: 26335184

Farrant GK, Hoebeke M, Partensky F, Andres G, Corre E, Garczarek L. WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data. BMC Bioinformatics. 2015;16(1):281.

BACKGROUND
The sequencing depth provided by high-throughput sequencing technologies has led to a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision, so many genomes end up being published as drafts. A number of automatic scaffolders have recently been released, improving the overall quality of genomes published in the last few years. Yet none of them matches the efficiency of manual scaffolding.

RESULTS
Here, we present an innovative semi-automatic scaffolder that additionally helps with chimera resolution and generates valuable contig maps and outputs for manual improvement of the automatic scaffolding. This software was tested on the newly sequenced marine cyanobacterium Synechococcus sp. WH8103 as well as on two reference datasets used in previous studies, Rhodobacter sphaeroides and Homo sapiens chromosome 14 (http://gage.cbcb.umd.edu/). The quality of the resulting scaffolds was compared to that of three other stand-alone scaffolders: SSPACE, SOPRA and SCARPA. For all three model organisms, WiseScaffolder produced better results than the other scaffolders in terms of contiguity statistics (number of genome fragments, N50, LG50, etc.) and, in the case of WH8103, the reliability of the scaffolds was confirmed by whole genome alignment against a closely related reference genome. We also propose an efficient computer-assisted strategy for manual improvement of the scaffolding, using outputs generated by WiseScaffolder, as well as for genome finishing, which in our hands led to the circularization of the WH8103 genome.
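
To make the contiguity statistics mentioned above concrete, the following minimal Python sketch computes N50 and the matching fragment count from a list of scaffold lengths. It is an illustration only and is not part of WiseScaffolder. The function name `n50_l50` and the example lengths are hypothetical, and we assume here that LG50 denotes the genome-size-based counterpart of L50 (as NG50 is to N50), which the abstract does not spell out.

```python
from typing import List, Optional, Tuple


def n50_l50(lengths: List[int], genome_size: Optional[int] = None) -> Tuple[int, int]:
    """Return (N50, L50) for a set of scaffold lengths.

    If genome_size is given, half of that size is used as the threshold
    instead of half of the assembly size, so the result is (NG50, LG50)
    under the assumption stated in the lead-in.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    half = total / 2
    running = 0
    # Walk scaffolds from longest to shortest, accumulating their lengths.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half:
            # 'length' is the shortest scaffold needed to cover half the total;
            # 'count' is how many scaffolds were needed to get there.
            return length, count
    # The scaffolds cover less than half of the threshold size (or the list
    # is empty): the statistic is undefined, reported here as (0, 0).
    return 0, 0


if __name__ == "__main__":
    scaffolds = [80, 70, 50, 40, 30, 20]           # hypothetical lengths
    print(n50_l50(scaffolds))                      # (70, 2)
    print(n50_l50(scaffolds, genome_size=400))     # (50, 3)
```

A higher N50 together with a lower fragment count indicates a more contiguous assembly, which is the sense in which scaffolders are compared in terms of contiguity.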

CONCLUSION
Altogether, WiseScaffolder proved more efficient than three other scaffolders for both prokaryotic and eukaryotic genomes and is thus likely applicable to most genome projects. The scaffolding pipeline described here should be of particular interest to biologists wishing to take advantage of the high added value of complete genomes.