Conference Papers
Permanent URI for this collectionhttps://idr.nitk.ac.in/handle/123456789/28506
Browse
7 results
Search Results
Item CSPR: Column only SPARSE matrix representation for performance improvement on GPU architecture(2011) Bayyapu, B.; Raghavendra, P.S.General purpose computation on graphics processing unit (GPU) is prominent in the high performance computing era of this time. Porting or accelerating the data parallel applications onto GPU gives the default performance improvement because of the increased computational units. Better performances can be seen if application specific fine tuning is done with respect to the architecture under consideration. One such very widely used computation intensive kernel is sparse matrix vector multiplication (SPMV) in sparse matrix based applications. Most of the existing data format representations of sparse matrix are developed with respect to the central processing unit (CPU) or multi cores. This paper gives a new format for sparse matrix representation with respect to graphics processor architecture that can give 2x to 5x performance improvement compared to CSR (compressed row format), 2x to 54x performance improvement with respect to COO (coordinate format) and 3x to 10 x improvement compared to CSR vector format for the class of application that fit for the proposed new format. It also gives 10% to 133% improvements in memory transfer (of only access information of sparse matrix) between CPU and GPU. This paper gives the details of the new format and its requirement with complete experimentation details and results of comparison. © 2011 Springer-Verlag.Item Hybrid heuristic shifting bottleneck procedure for parallel-machine job-shop scheduling using GPU(Institute of Electrical and Electronics Engineers Inc., 2015) Vilasagarapu, S.; Guddeti, G.In this paper, we implement the Parallel-Machine Job Shop Scheduling Problem (JSSP) using the modified shifting bottleneck procedure along with the heuristic Tabu search algorithm. JSSP has several real-time applications such as product manufacturing units, real-world train scheduling problem etc., Since JSSP is an NP hard problem, an optimal solution may not exist hence with the help of heuristic algorithm we try to find the approximate solution. Experimental results demonstrate that the modified shifting bottleneck procedure (SBP) shows better results than the existing SBP. And also with the help of meta-heuristic algorithm, the JSSP for larger instances can be solved easily. Results also demonstrate that GPU based Tabu search is on an average 1.8 times faster than CPU based Tabu search. © 2015 IEEE.Item A modified secure version of the Telegram protocol (MTProto)(Institute of Electrical and Electronics Engineers Inc., 2016) Job, J.; Naresh, V.; Chandrasekaran, K.The advent of mobile phones and the spread of the internet have caused a substantial increase in the utilization of these technologies for personal communication. A wide range of mobile applications exist, most of which use their own proprietary protocol. Reports of snooping attacks have prompted the parent organizations and users to guarantee that the encrypted data sent over a public network is decrypted only by the intended recipient. Smart phone operating systems provide GPS data to these applications so that users can tag photos with this information. As these applications mostly run a daemon or service in the background to automatically receive messages, an unattended switched on location service coupled with a weak protocol leaves the user highly vulnerable of being tracked by eavesdroppers. These applications are known to, by observing their behaviour, upload the user's contact list to the server so as identify those contacts using the same application. These are but just two important data that need to be protected by tough security measures during transit. Any loop hole in security protocols will leave the user vulnerable to attacks, even outside the digital world. Online chat protocols such as the Telegram protocol ensure end-to-end security of data. Although the protocol itself has been explained in much detail by the designers, this protocol is disfavored because of its performance drawbacks and its susceptibility to man-in-the-middle attacks. In this paper, we modify the Telegram protocol in an attempt to make it more efficient and secure. © 2015 IEEE.Item Near Optimal Solution for Traveling Salesman Problem using GPU(Institute of Electrical and Electronics Engineers Inc., 2018) Yelmewad, P.; Talawar, B.Traveling Salesman Problem (TSP) is an NP-complete, a combinatorial optimization problem. Finding an optimal solution is intractable due to its time complexity. Therefore, approximation approaches have great importance which gives a good quality solution in a reasonable time. This paper presents the importance of constructing the initial solution using construction heuristic rather than setting up arbitrarily. Proposed GPU based Parallel Iterative Hill Climbing (PIHC) algorithm solves large TSPLIB instances. We demonstrate the efficiency of PIHC approach with the state-of-the-art approximation based and GPU based TSP solvers. PIHC produces 181× speedup over its sequential counterpart and 251× over the state-of-the-art GPU based TSP solver. Moreover, PIHC receives a better cost quality than the state-of-the-art GPU based TSP solvers which has gap rate in range of 0.72 % - 8.06%. © 2018 IEEE.Item MMAS on GPU for Large TSP Instances(Institute of Electrical and Electronics Engineers Inc., 2019) Yelmewad, P.; Kumar, A.; Talawar, B.This paper presents a GPU-based Ant Colony Optimization (ACO) algorithm to solve the large-size Traveling Salesman Problem (TSP). The aim of this study is to provide an effective and efficient GPU-based parallel strategy for the ACO algorithm to solve larger size TSPLIB instances in a reasonable time, preserving its solution quality. Till the date, all GPU-based parallel implementations of the ACO algorithm is only available for TSPLIB instances up to 2392 cities. We have studied the limiting factors of the existing GPU implementations to solve larger instances and designed the parallel model which solves TSPLIB instances up to 33810 cities. Our parallel approach receives speedup up to 60× over its sequential counterpart. In terms of solution quality, our parallel ACO produces the local optimal solution which has error rates in range of 0.52% - 4.97% compared to the optimal solution. © 2019 IEEE.Item GPU-based Parallel Heuristics for Capacited Vehicle Routing Problem(Institute of Electrical and Electronics Engineers Inc., 2020) Yelmewad, P.; Talawar, B.This paper presents the novel GPU-based parallel strategy for the heuristic algorithms to solve the large-scale Capacited Vehicle Routing Problem (CVRP). A combination of five improvement heuristic approaches has been used to improve the constructed feasible solution. It is noticed that a large amount of CPU time is spent in the solution improvement phase while improving a feasible solution. We aim to discover an independent part of the improvement heuristic approaches and make it run over the GPU platform simultaneously. The proposed parallel version has been tested on large-scale instances of up to 20000 customers. The parallel version offers speedup up to 176.12 × compared to the corresponding sequential version. © 2020 IEEE.Item Hummingbird: Leveraging Heterogeneous System Architecture for deploying dynamic NFV chains(Institute of Electrical and Electronics Engineers Inc., 2022) Chaurasia, A.K.; Raman, B.; Gupta, P.K.; Prabhu, O.; Shashank, P.; Garg, A.Network Function Virtualization has gained traction as a network function deployment alternative due to its flexibility and cost benefits. The telecommunication (telecom) operators and infrastructure providers are looking for high throughput, low latency NFV deployment model to avail the benefits of NFV. Moreover, NFV is one of the core technology for the next-generation communication network such as 5G. Furthermore, telecom operators employ groups of network functions(NFs) that process packets in linear order so that the output of one NF becomes an input for another, thus forming the network function chain (NFC). However, these NFCs should be flexible, as all telecom packets do not necessarily need to be processed by the same set of NFs. It has been earlier shown that GPU increases the throughput of NFV chains. To the best of our knowledge, none of the GPU-based frameworks supports dynamic NFV chains. Furthermore, discrete GPUs are expensive and consume a fair amount of energy. This paper presents the design and evaluation of Hummingbird, a framework to support high throughput, dynamically routed NFV chain on Heterogeneous System Architecture (HSA). Though HSAs are affordable and power-efficient, they lack high throughput GPU-CPU synchronization. Furthermore, current technology does not provide a zero-copy mechanism for network IO between GPU and NIC for HSAs. Hummingbird addressed those challenges. As per our knowledge, this is the first such framework that provides high throughput dynamic NFV chaining, with NFs chained across GPU and CPU and designed in conformance to OpenCL 2.0 standard. Hummingbird achieves 6x throughput per-core and 3.5x throughput per unit of energy consumption compared to state-of-the-art NFV deployment framework G-net, which uses powerful and costly discrete GPU. © 2022 IEEE.
