Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
10 results
Item Cache analysis and software optimizations for faster on-chip network simulations (Institute of Electrical and Electronics Engineers Inc., 2016) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
Fast simulations are critical in reducing time to market in CMPs and SoCs. Several simulators have been used to evaluate the performance and power consumed by Network-on-Chips. Researchers and designers rely upon these simulators for design space exploration of NoC architectures. Our experiments show that simulating large NoC topologies takes hours to several days to complete. To speed up the simulations, it is necessary to investigate and optimize the hotspots in the simulator source code. Among the several simulators available, we choose Booksim2.0, as it is extensively used in the NoC community. In this paper, we analyze the cache and memory system behaviour of Booksim2.0 to accurately monitor input-dependent performance bottlenecks. Our measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration having the fewest misses has been identified. We also employ thread parallelization and vectorization to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93× and 3.97× were observed for the Mesh topology with a 30 × 30 network size by employing thread parallelization and vectorization, respectively.
© 2016 IEEE.

Item YaNoC: Yet another network-on-chip simulation acceleration engine using FPGAs (IEEE Computer Society, 2018) Parane, K.; Talawar, B.; Prabhu Prasad, P.
In this paper, we present an FPGA based NoC simulation framework, YaNoC, that supports the creation of standard and custom topologies, design of routing algorithms, generation of various synthetic traffic patterns, and exploration of a full set of microarchitectural parameters. The framework supports all standard minimal routing algorithms for conventional NoCs and implements table based routing to support the creation of new routing algorithms. A custom topology called Diagonal Mesh (DMesh) has been evaluated using table based routing and a modified version of the XY routing algorithm. Mesh and DMesh topologies saturate at injection rates of 45% and 55%, respectively. We find that the table based routing implementation consumes 0.98× fewer hardware resources than the conventional XY routing. We observed a speedup of 2548× compared to the Booksim software simulator. YaNoC achieves speedups of 2.54× and 25× with respect to the CONNECT and DART FPGA based NoC simulators, respectively. © 2018 IEEE.

Item Trace-Driven Simulation and Design Space Exploration of Network-on-Chip Topologies on FPGA (Institute of Electrical and Electronics Engineers Inc., 2018) Sangeetha, G.S.; Radhakrishnan, V.; Prabhu Prasad, P.; Parane, K.; Talawar, B.
Network-on-Chip (NoC) is becoming an extremely important part of the present and future of electronic technology. It is extensively used in Multiprocessor System-on-Chips and in Chip Multiprocessors. With an NoC, the backend wiring involved in an SoC is drastically reduced. Further, SoCs with an NoC interconnect operate at a higher frequency, mainly because the hardware required for switching and routing is simplified.
NoC researchers have relied on performance and power simulators to study the different factors of an NoC, such as the algorithm in place, the topology, the buffer management and location schemes, and the flow control and routing, among others. In this paper, we present a trace-driven NoC architecture that gives the user access to realistic details about the resource utilization of NoC architectures and their individual components. This includes exploration of various design decision parameters of the NoC by modeling them on an FPGA. The paper also presents the performance of these architectures by conducting trace-driven simulations using benchmarks like PARSEC. Different topologies are considered for experimentation purposes with different routing algorithms. © 2018 IEEE.

Item FPGA based NoC Simulation Acceleration Framework Supporting Adaptive Routing (Institute of Electrical and Electronics Engineers Inc., 2018) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
In this paper, we present a fast and parameterized FPGA based Network-on-Chip (NoC) simulation acceleration framework with an automated HDL generation engine. The framework supports NoC architecture design parameters such as topology, routing algorithms, link width, buffer size, flow control and traffic patterns. The parameterized, high performance and lightweight nature of the proposed framework makes it an ideal choice for NoC research studies. Mesh based topologies have been considered for experimentation. A congestion aware adaptive routing algorithm has been proposed along with the conventional XY routing. Also, parameters such as buffer depth, traffic pattern and flit width have been varied to observe the effect on the NoC behavior. The adaptive routing algorithm for Mesh based topologies has negligible FPGA area overhead compared to the conventional XY routing.
Employing the adaptive routing algorithm, the average packet latency is reduced by 55% under the transpose traffic pattern when compared to the XY routing algorithm. A speedup of 2548× has been observed compared to the Booksim software simulator. The proposed framework is 2.54× and 25× faster than the CONNECT and DART FPGA based simulators, respectively. © 2018 IEEE.

Item High-performance NoC simulation acceleration framework employing the Xilinx DSP48E1 blocks (Institute of Electrical and Electronics Engineers Inc., 2019) Prabhu Prasad, B.M.; Parane, K.; Talawar, B.
An FPGA based Network on Chip (NoC) simulation acceleration framework is presented in this paper. The functionality of the crossbar switch of the NoC router is embedded in the hard multiplexers of the Xilinx DSP48E1 slices. A significant reduction in the soft logic (LUT+FF) utilization of the FPGA implementation of the 6 × 6 Torus topology has been observed by employing the hard multiplexers of the DSP48E1 slices in the proposed work. The DSP based crossbar implementation of the 6 × 6 Torus topology consumes 38% fewer LUTs and 45% fewer FFs than the LUT based crossbar implementation. 35% less power consumption has been observed in the DSP based implementation. The proposed work utilizes 76% fewer LUTs compared to the state-of-the-art CONNECT NoC generation tool. A buffered, bi-directional Torus topology with XY routing has been considered in the proposed DSP based implementation, whereas Hoplite-DSP implements a bufferless, unidirectional Torus topology with a deflection routing algorithm. The proposed framework achieves speedups of 2.02× and 2.9× with respect to the LUT-only and CONNECT NoCs, respectively.
© 2019 IEEE.

Item Design of an adaptive and reliable network on chip router architecture using FPGA (Institute of Electrical and Electronics Engineers Inc., 2019) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
We propose an adaptive, low cost, reliable and high performance router implemented on a conventional two stage pipeline. The proposed routing switches to adaptive mode as soon as congestion is detected in the network. We employ fault tolerant strategies for different components of the router, such as the input buffer, route compute unit, virtual channel allocation, switch allocation, and crossbar unit. The proposed router architecture differs from existing reliable routers: our implementation maintains the performance of the fault tolerant router under massive network workloads by influencing the features of the crossbar, routing algorithm and router pipeline optimization. Our router is more reliable than current fault receptive routers such as Wang[1], Vicis[2], BulletProof[3], RoCo[4] and Poluri[5]. The average latency is reduced by 0.69% and increased by 2.0% compared to the fault tolerant and conventional routers, respectively. © 2019 IEEE.

Item High-Performance NoCs Employing the DSP48E1 Blocks of the Xilinx FPGAs (IEEE Computer Society, 2019) Prabhu, P.B.M.; Parane, K.; Talawar, B.
The hard multiplexers of the Xilinx DSP48E1 slices have been employed to support the functionality of the crossbar switch of buffered five port Network-on-Chip (NoC) routers. This is possible because the operating mode of the DSP48E1 slices can be changed dynamically every clock cycle through the multiplexer control signals. As a result, a significant reduction in the soft logic (LUT+FF) utilization of the FPGA implementation of the 6 × 6 Mesh topology has been observed. The DSP based crossbar implementation of the 6 × 6 Mesh topology consumes 36% fewer LUTs and 40% fewer FFs than the LUT based crossbar implementation.
38% less power consumption has been observed in the DSP based implementation. The proposed work utilizes 41% fewer LUTs compared to the state-of-the-art CONNECT NoC generation tool. Latency reductions of 31% and 38% have been achieved by the proposed DSP48E1 based crossbar implementation over the LUT crossbar implementation of the 8 × 8 Mesh topology under the Uniform and Transpose traffic patterns, respectively. Also, the proposed DSP48E1 based implementation achieves saturation throughput improvements of 1.4× and 1.6× over the LUT based implementation under the Uniform and Transpose traffic patterns, respectively. © 2019 IEEE.

Item Hy-BTree: An efficient Tree based topology for FPGA based NoC implementation (Institute of Electrical and Electronics Engineers Inc., 2021) Prabhu Prasad, B.M.; Parane, K.; Talawar, B.
Due to its hierarchical structure, the Binary Tree (BTree) topology can be employed in Network-on-Chip (NoC) applications. Because of its lower bisection bandwidth, performance degradation is observed in communication intensive applications. The Fat Tree topology has been proposed to overcome the disadvantages of the BTree topology. However, the Fat Tree topology's routers become more complex towards the root node of the tree and occupy a large amount of hardware resources compared to the BTree variant. Instead of adopting the Fat Tree topology, the number of hops taken by a packet in the BTree topology can be reduced by introducing new links in the network, which also increases the bisection bandwidth. In this work, we propose a variant of the BTree topology called Hy-BTree, which introduces additional links at the intermediate levels of the network to reduce the number of hops taken for communication. The proposed design is implemented on an FPGA and compared with other topologies from the state-of-the-art FPGA based NoC architectures.
A reduction in average latency and an improvement in throughput have been observed in Hy-BTree with respect to the BTree network with negligible overhead. © 2021 IEEE.

Item Smart Attendance Management System using IoT (Institute of Electrical and Electronics Engineers Inc., 2022) Patil, M.A.; Parane, K.; Sivaprasad, D.D.; Poojara, S.; Lamani, M.R.
Taking student attendance is mandatory in an educational organization, and maintaining attendance records plays a vital role. The conventional way of taking student attendance in any institution is time-consuming and challenging, because the roll call is performed manually by calling student names as per their roll numbers and marking 'absent (A)' or 'present (P)' in the attendance logbook accordingly in every class each day. To improve teaching efficiency and teaching time in classrooms by reducing the time required for roll calls, we have proposed a biometric student attendance system based on IoT. The proposed system records students' attendance using a facial biometric system and stores the attendance details on a server through the internet. In this system, the Raspberry Pi camera captures the students' face images and compares them with the stored images in the database. If the captured image matches the stored image, then the student's attendance is recorded on the remote server as present (P) in class; otherwise, attendance is recorded as absent (A). The developed system has been tested for sample classes, and the results showed that the system is simple, cost-effective, and portable for managing students' attendance. © 2022 IEEE.

Item Hybrid Deep Learning-Based Potato and Tomato Leaf Disease Classification (Springer Science and Business Media Deutschland GmbH, 2024) Patil, M.A.; Manur, M.; Laxuman, C.; Parane, K.; Dodamani, B.M.; Sunkad, G.
Predicting potato and tomato leaf disease is vital to global food security and economic stability.
Potatoes and tomatoes are among the most important staple crops, providing essential nutrition to millions worldwide. However, many tomato and potato leaf diseases can seriously reduce food productivity and yields. We propose a hybrid deep learning model that combines an optimized CNN (OCNN) and an optimized LSTM (OLSTM). The weight values of the LSTM and CNN models are optimized using the modified raindrop optimization (MRDO) algorithm and the modified shark smell optimization (MSSO) algorithm, respectively. The OCNN model is used to extract potato leaf image features, which are then fed into the OLSTM model, which handles data sequences and captures temporal dependencies from the extracted features. Precision, recall, F1-score, and accuracy metrics are used to analyze the output of the proposed OCNN-OLSTM model. The experimental performance is compared against the unoptimized CNN-LSTM model, individual CNN and LSTM models, and existing MobileNet and ResNet50 models. The presented model results are also compared with existing available work. We obtained an accuracy of 99.25% for potato and 99.31% for tomato. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
