Browsing by Author "Pal, D."

ReneGENE-DP: Accelerated Parallel Dynamic Programming for Genome Informatics
(Institute of Electrical and Electronics Engineers Inc., 2018) Natarajan, S.; KrishnaKumar, N.; Pavan, M.; Pal, D.; Nandy, S.K.
Parsing a very long genomic string (the human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, discriminating substantial similarities amid the severe noise that evolution introduces into genetic data. Though a significant share of the computation in DP algorithms is accelerated on multiple platforms, the less complex traceback step is still performed on the host, presenting a significant memory and input/output bottleneck. With billions of such alignments required to analyse the genomic big data from Next Generation Sequencing (NGS) platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn is 380.85x faster than the reference design, a GPU-based DP algorithm with traceback on the host. © 2018 IEEE.
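As an illustration of the forward scan and traceback steps named in the abstract above, the following is a minimal software sketch, not the paper's FPGA or GPU implementation. It assumes a Smith-Waterman-style local alignment with a linear gap penalty; the abstract does not name the recurrence, and the function name `local_align` and the scoring values are hypothetical. The sketch records a move pointer for each cell during the forward scan, so the traceback only follows pointers and recomputes no scores.

```python
STOP, DIAG, UP, LEFT = 0, 1, 2, 3  # move codes stored during the forward scan


def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of strings a and b (assumed scoring; illustrative only).

    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    move = [[STOP] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Forward scan: fill the score matrix and remember which neighbour
    # produced each cell's value.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            s = max(0, diag, up, left)
            score[i][j] = s
            if s == 0:
                move[i][j] = STOP
            elif s == diag:
                move[i][j] = DIAG
            elif s == up:
                move[i][j] = UP
            else:
                move[i][j] = LEFT
            if s > best:
                best, best_pos = s, (i, j)

    # Traceback: follow the stored pointers from the best cell until a STOP.
    i, j = best_pos
    out_a, out_b = [], []
    while move[i][j] != STOP:
        if move[i][j] == DIAG:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif move[i][j] == UP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `local_align("TTACGTTT", "GGACGGG")` recovers the shared segment `ACG`. In this sketch the pointer matrix is the memory cost that makes a host-side traceback expensive at the scale of billions of alignments.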
ReneGENE-Novo: Co-designed algorithm-architecture for accelerated preprocessing and assembly of genomic short reads
(Springer Verlag, 2018) Natarajan, S.; KrishnaKumar, N.; Anuchan, H.V.; Pal, D.; Nandy, S.K.
Sufficiently long genome strings, permitting adequate overlaps, are key to producing a quality genome assembly with minimal error rates and high coverage. Next Generation Sequencing (NGS) platforms produce large volumes (terabytes) of short raw genomic strings, or reads (150–600 genomic alphabets, or bases), with minimal error rates. If we are able to increase the lengths of raw short reads computationally before assembly, then the full potential of short reads from NGS and de novo assembly can be harvested. The large data redundancy offered by billions of such raw reads, compounded by the target genome length of billions of bases, requires a complex big data engineering solution. This paper presents a co-designed algorithm-architecture model for ReneGENE de novo assembly (part of a larger ReneGENE-GI Genome Informatics pipeline). This module takes randomly presented short reads from NGS platforms and extends them iteratively to an appropriate length by identifying overlaps among them, aiding high-coverage assembly with minimal error rates. This task is parallelized across multiple processes to allow parallel read assembly with performance scalability. Supported by parallel algorithms, multi-dimensional data structures and fine-grain synchronization, the module realises irregular computing for de novo assembly. A single FPGA realization of this model with 128 de novo compute elements shows a 48.69x improvement in performance when compared to an 8-core implementation on a standard workstation based on Intel Core i7-4770 processors. © Springer International Publishing AG, part of Springer Nature 2018.
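The iterative read extension described in the abstract above can be pictured with a small sequential sketch. This is not the ReneGENE-Novo algorithm or its parallel FPGA design: it assumes exact suffix-prefix overlaps and a greedy choice of the longest overlap, and the names `suffix_prefix_overlap`, `extend_read`, `min_overlap` and `target_len` are hypothetical.

```python
def suffix_prefix_overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Only overlaps of at least `min_overlap` bases (assumed >= 1) count, and
    `right` must contribute at least one new base. Returns 0 if none exists.
    """
    max_len = min(len(left), len(right) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def extend_read(seed, reads, min_overlap=3, target_len=None):
    """Greedily extend `seed` to the right using overlapping reads.

    Each round picks the unused read with the longest overlap and appends
    its non-overlapping tail. Stops when no read overlaps, the reads run
    out, or `target_len` (if given) is reached.
    """
    contig = seed
    unused = [r for r in reads if r != seed]
    while unused and (target_len is None or len(contig) < target_len):
        best_k, best_read = 0, None
        for r in unused:
            k = suffix_prefix_overlap(contig, r, min_overlap)
            if k > best_k:
                best_k, best_read = k, r
        if best_read is None:
            break
        contig += best_read[best_k:]
        unused.remove(best_read)
    return contig
```

With the toy reads `["ACGTAC", "TACGGA", "GGATTC"]` and seed `"ACGTAC"`, the sketch chains two three-base overlaps into one longer string. Real short-read data also has sequencing errors and repeats, which exact matching and a greedy choice do not handle.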