Heterogeneous Cloud Framework for Big Data Genome Sequencing.

IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM

PubMedID: 26357087

Chao Wang, Xi Li, Peng Chen, Aili Wang, Xuehai Zhou, Hong Yu. Heterogeneous Cloud Framework for Big Data Genome Sequencing. IEEE/ACM Trans Comput Biol Bioinform. 2015;12(1):166-78.
Next-generation genome sequencing with short (or long) reads is an emerging field in numerous scientific and big data research domains. However, data sizes and ease of access for researchers are both growing, and most current methodologies rely on a single acceleration approach, so they cannot meet the requirements imposed by explosive data scales and complexity. In this paper, we propose a novel FPGA-based acceleration solution that runs the MapReduce framework on multiple hardware accelerators. Combining hardware acceleration with the MapReduce execution flow could greatly accelerate the task of aligning short reads to a known reference genome. To evaluate performance and other metrics, we conducted a theoretical speedup analysis on a MapReduce programming platform, which demonstrates that the proposed architecture has the potential to improve speedup for large-scale genome sequencing applications. As a practical study, we also built a hardware prototype on a real Xilinx FPGA chip and evaluated speedup, sensitivity, mapping quality, error rate, and hardware cost. Experimental results demonstrate that the proposed platform could efficiently accelerate next-generation sequencing with satisfactory accuracy and acceptable hardware cost.
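
The abstract does not spell out the alignment algorithm or the accelerator interface, so the following is only a minimal software sketch of how a MapReduce-style flow for short-read alignment might be organized. It assumes a brute-force mismatch-counting aligner in the map step (the stage that, in a design like the one described, would presumably be offloaded to FPGA accelerators) and a reduce step that keeps the best hit per read. The names `map_read`, `reduce_hits`, `align`, and the `max_mismatches` parameter are hypothetical and are not taken from the paper.

```python
def map_read(read_id, read, reference, max_mismatches=1):
    """Map step (assumed): emit (read_id, (position, mismatches)) per candidate hit.

    Slides the read across the reference and counts mismatches at each offset.
    In a hardware-accelerated design this loop is the part one would offload.
    """
    hits = []
    n = len(read)
    for pos in range(len(reference) - n + 1):
        window = reference[pos:pos + n]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((read_id, (pos, mismatches)))
    return hits


def reduce_hits(pairs):
    """Reduce step (assumed): keep the best hit per read.

    "Best" here means fewest mismatches, with ties broken by leftmost position.
    """
    best = {}
    for read_id, (pos, mm) in pairs:
        if read_id not in best or (mm, pos) < (best[read_id][1], best[read_id][0]):
            best[read_id] = (pos, mm)
    return best


def align(reads, reference, max_mismatches=1):
    """Run the map step over every read, then reduce the emitted pairs."""
    pairs = []
    for read_id, read in reads.items():
        # Each map call is independent, so calls could run on separate workers.
        pairs.extend(map_read(read_id, read, reference, max_mismatches))
    return reduce_hits(pairs)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    reads = {"r1": "CGTT", "r2": "TAGG", "r3": "GGGG"}
    print(align(reads, reference))
```

Because each map call depends only on its own read and the shared reference, the reads can be partitioned across workers without coordination, which is the property that makes the problem a natural fit for MapReduce. Reads with no hit within the mismatch budget (such as `r3` here) are simply absent from the result.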