Browsing by Author "Nandy, S.K."

Micro-Architectural support for High Availability of NoC-based MP-SoC
(Institute of Electrical and Electronics Engineers Inc., 2019) Singh, R.; Ranga, S.V.; Patil, S.; Krishna, M.; Mehta, M.; Anoop, M.N.; Nandy, S.K.; Haldar, C.; Narayan, R.; Neumann, F.; Baufreton, P.
In this paper, we focus on increasing the availability of a Multi-Processor System on Chip (MP-SoC) for executing user applications, even when some components of the system are faulty. A Network-on-Chip (NoC) provides a high-bandwidth communication substrate for the multitude of components/modules in such MP-SoCs. The health of such an MP-SoC, and hence its availability, is largely dependent on the health of the NoC. We consider an NoC comprising a bidirectional toroidal mesh interconnection of routers. We use a distributed built-in self-test to identify faulty communication links. We use the information so obtained to determine healthy subsystems that can be made available for executing user applications. This feature is key for enhancing the availability of MP-SoCs. We realize this feature as a micro-architectural enhancement in the MP-SoC that incurs an insignificant hardware overhead of less than 2%. The latency incurred for analyzing the availability of the MP-SoC is also insignificant. We functionally validate our proposal by emulating the system on an FPGA device and demonstrate an increase in the availability of the MP-SoC. © 2019 IEEE.
ReneGENE-DP: Accelerated Parallel Dynamic Programming for Genome Informatics
(Institute of Electrical and Electronics Engineers Inc., 2018) Natarajan, S.; KrishnaKumar, N.; Pavan, M.; Pal, D.; Nandy, S.K.
Parsing a very long genomic string (the human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of the computation in DP algorithms is accelerated on multiple platforms, the less complex traceback step is still performed on the host, presenting a significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from Next Generation Sequencing (NGS) platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which, in turn, is 380.85x faster than the reference design, which is a GPU-based DP algorithm with traceback on the host. © 2018 IEEE.
ReneGENE-Novo: Co-designed algorithm-architecture for accelerated preprocessing and assembly of genomic short reads
(Springer Verlag, 2018) Natarajan, S.; KrishnaKumar, N.; Anuchan, H.V.; Pal, D.; Nandy, S.K.
Sufficiently long genome strings, permitting adequate overlaps, are key to producing a quality genome assembly with minimal error rates and high coverage. Next Generation Sequencing (NGS) platforms produce large volumes (terabytes) of short raw genomic strings or reads (150–600 genomic alphabets or bases) with minimal error rates. If we are able to increase the read lengths of raw short reads computationally before assembly, then the full potential of short reads from NGS and de novo assembly can be harvested. The large data redundancy offered by billions of such raw reads, compounded by the target genome length of billions of bases, requires a complex big data engineering solution. This paper presents a co-designed algorithm-architecture model for ReneGENE de novo assembly (part of a larger ReneGENE-GI Genome Informatics pipeline). This module takes randomly presented short reads from NGS platforms and extends them iteratively to an appropriate length by identifying overlaps among them, aiding high-coverage assembly with minimal error rates. This task is parallelized across multiple processes, to allow parallel read assembly with performance scalability. Supported by parallel algorithms, multi-dimensional data structures and fine-grain synchronization, the module realises irregular computing for de novo assembly. A single FPGA realization of this model with 128 de novo compute elements shows a 48.69x improvement in performance when compared to an 8-core implementation on a standard workstation based on Intel Core i7-4770 processors. © Springer International Publishing AG, part of Springer Nature 2018.