Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
9 results
Search Results
Item: Deep learning architecture for big data analytics in detecting intrusions and malicious URL (Institution of Engineering and Technology, 2019)
Harikrishnan, N.B.; Ravi, R.; Padannayil, K.P.; Poornachandran, P.; Annappa, A.; Alazab, M.
Security attacks are one of the major threats in today’s world. These attacks exploit vulnerabilities in a system or online site for financial gain, causing huge losses in revenue and reputation for both government and private firms. They are generally carried out through malware interception, intrusions, and phishing uniform resource locators (URLs). Existing techniques include signature-based detection, anomaly detection, and stateful protocol analysis for detecting intrusions, and blacklisting for detecting phishing URLs. Even though these techniques claim to thwart cyberattacks, they often fail to detect new attacks or variants of existing attacks. A further reason these techniques fail is the dynamic nature of attacks and the lack of annotated data. In such a situation, we need a system that can capture the changing trends of cyberattacks to some extent. For this, we used supervised and unsupervised learning techniques. The growing problem of intrusions and phishing URLs creates a need for a reliable architectural solution that can efficiently identify both. This chapter aims to provide a comprehensive survey of intrusion and phishing URL detection techniques and deep learning. It presents and evaluates a highly effective deep learning architecture to automate intrusion and phishing URL detection. The proposed method is an artificial intelligence (AI)-based hybrid architecture for an organization, providing supervised and unsupervised solutions for intrusion and phishing URL detection. The prototype model uses various classical machine learning (ML) classifiers and deep learning architectures.
The research specifically focuses on detecting and classifying intrusions and phishing URLs. © The Institution of Engineering and Technology 2020.

Item: Extracting Emotion Quotient of Viral Information Over Twitter (Springer Science and Business Media Deutschland GmbH, 2022)
Kumar, P.; Reji, R.E.; Singh, V.
On social media platforms, viral information or a trending term draws attention, as it reflects the potential of user content towards topics/terms and sentiment flux. In real-time sentiment analysis, this viral information delivers potential insights, as the sentiment and co-located ranges of emotions it encompasses are useful for analysis and decision support. A traditional sentiment analysis tool generates the level of predefined sentiments over social media content for a defined duration but does not extract the emotional impact that content creates. In these settings, precisely estimating the emotional quotient that viral information creates is a multifaceted task. The proposed novel algorithm aims to (i) extract the sentiment and co-located emotion quotient of viral information and (ii) provide utilities for comprehensive comparison of co-occurring viral information and for sentiment analysis over Twitter text data. The generated emotion quotients and micro-sentiments reveal several valuable insights into a viral topic and assist in decision support. A use-case analysis over data extracted in real time yields significant insights, as the generated sentiments and emotional effects reveal correlations caused by viral/trending information. The algorithm also delivers an efficient, robust, and adaptable solution for sentiment analysis.
© 2022, Springer Nature Switzerland AG.

Item: An Adaptive Algorithm for Emotion Quotient Extraction of Viral Information Over Twitter Data (Springer Science and Business Media Deutschland GmbH, 2022)
Kumar, P.; Reji, R.E.; Singh, V.
On social media platforms, viral information or a trending term draws attention, as it reflects the impact of user content on topics/terms. In real-time sentiment analysis, these viral terms could deliver potential insights for analysis and decision support. A traditional sentiment analysis tool generates the level of predefined sentiments over social media content for a defined duration but does not extract the emotional impact that content creates. In these settings, precisely estimating the emotional quotient that viral information creates is a multifaceted task. A novel algorithm is proposed to (i) extract the sentiment and emotion quotient of current viral information on Twitter, (ii) compare co-occurring trending/viral information, and (iii) analyze potential Twitter text data in depth. The generated emotion quotients and micro-sentiments reveal several valuable insights into a viral/trending topic and assist in decision support. A use-case analysis over data extracted in real time yields significant insights, as the generated sentiments and emotional effects reveal correlations caused by viral/trending information. The algorithm also delivers an efficient, robust, and adaptable solution for sentiment analysis. © 2022, Springer Nature Switzerland AG.

Item: An Approach for Efficient Graph Mining from Big Data Using Spark (Springer Science and Business Media Deutschland GmbH, 2023)
Gupta, R.K.; Shetty D, D.; Chakraborty, S.
A huge amount of data has been generated and accumulated over the last decade, so data mining techniques are required to extract usable information from these massive data sets. Identifying important connections between data helps in gaining useful insights.
Depicting relationships between data using a graph-based approach is observed to be a helpful method. It provides an effective way to represent a variety of settings, including biological networks, social networks, Web networks, and so on. Clustering techniques used in graph mining can be helpful for gathering significant information. In this paper, an approach for graph mining from big data in Spark (AGMBS) is proposed on the basis of label propagation. The suggested technique enhances the efficiency of the conventional label propagation algorithm by making it more resilient. In addition, AGMBS employs a sparse matrix as its primary data structure, resulting in faster performance. GraphX is then used to manage the processing of the graph data. The experiments were conducted on two real-world graph data sets, and the suggested AGMBS is observed to give faster results than the best available clustering algorithms. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item: Cloud Computing Enabled Big Multi-Omics Data Analytics (SAGE Publications Inc., 2021)
Koppad, S.; B, A.; Gkoutos, G.V.; Acharjee, A.
High-throughput experiments enable researchers to explore complex multifactorial diseases through large-scale analysis of omics data. Challenges for such high-dimensional data sets include storage, analysis, and sharing. Recent innovations in computational technologies and approaches, especially in cloud computing, offer a promising, low-cost, and highly flexible solution in the bioinformatics domain. Cloud computing is proving increasingly useful in molecular modeling, omics data analytics (e.g., RNA sequencing, metabolomics, or proteomics data sets), and for the integration, analysis, and interpretation of phenotypic data.
We review the adoption of advanced cloud-based and big data technologies for processing and analyzing omics data and provide insights into state-of-the-art cloud bioinformatics applications. © The Author(s) 2021.

Item: Bus passenger demand modelling using time-series techniques-big data analytics (Bentham Science Publishers B.V., 2019)
Cyril, A.; Mulangi, R.H.; George, V.
Background: Public transport demand forecasting is a fundamental process in transport planning. It plays a pivotal role in decision making, policy formulation, and urban transport planning procedures. In this paper, a public bus passenger demand forecasting model is developed using a novel approach. The empirical passenger demand for a bus depot is modelled and forecasted using a data-driven method. The big data generated by Electronic Ticketing Machines (ETM), which are used for issuing tickets and collecting fares, is the data source for demand modelling. This big data is time indexed and hence has potential for time-series applications that were not previously explored. Objectives: This paper studies the application of time-series methods for forecasting public bus passenger demand using ETM-based time-series data. The time-series approach uses four Holt-Winters’ modeling methods. Holt-Winters’ additive and multiplicative models, with and without damping, are empirically compared using data from inter-zonal buses. The data used in the study are part of the ticket sales transactions of the Kerala State Road Transport Corporation (KSRTC), maintained at the Trivandrum City depot in the Indian state of Kerala, for the period between 2010 and 2013. The forecasting performance of the four time-series models is compared using Mean Absolute Percentage Error (MAPE), and the model goodness of fit is determined using information criteria.
Conclusion: The forecasts indicate that multiplicative models with and without damping, which better account for seasonal variations, outperform the additive models. © 2019 Cyril et al.

Item: Real-time big data analytics framework with data blending approach for multiple data sources in smart city applications (West University of Timisoara, 2020)
Manjunatha, S.; Annappa, A.
Advancement in Information Communication Technology (ICT) and the Internet of Things (IoT) has led to the continuous generation of a large amount of data. Smart city projects are being implemented in various parts of the world, where analysis of public data helps in providing a better quality of life. Data analytics plays a vital role in many such data-driven applications. Real-time analytics that finds valuable insights at the right time using smart city data is crucial in making appropriate decisions for city administration. Using multiple data sources as input for the analysis is essential to achieve better and more accurate data-driven solutions and to make appropriate decisions. Public safety is one of the major concerns in any smart city project, and real-time analytics is very useful there for the early detection of valuable data patterns. It is crucial to predict crime-related incidents early and to generate emergency alerts so that appropriate decisions can be made to secure people and city infrastructure. This paper discusses the proposed real-time big data analytics framework with a data blending approach using multiple data sources for smart city applications. Analytics using multiple data sources for a specific data-driven solution helps in finding more data patterns, which in turn increases the accuracy of analytics results. Data preprocessing is a challenging task in data analytics when data are ingested continuously in real time into the analytics system.
The proposed system helps in preprocessing real-time data by blending the multiple data sources used in the analytics. The proposed framework is beneficial when data from multiple sources are ingested in real time as input, and it is flexible enough to accept any additional data source of interest. The experimental work carried out with the proposed framework, using multiple data sources to find crime-related insights in real time, supports public safety solutions in the smart city. The experimental outcome shows a significant increase in the number of identified useful data patterns as the number of data sources increases. A real-time emergency alert system to support the public safety solution is implemented using a machine learning-based classification algorithm with the proposed framework. The experiment is carried out with different classification algorithms, and the results show that Naive Bayes classification performs better in generating emergency alerts. © 2020 SCPE.

Item: Real-time emergency event detection system for public safety using multi-source data (Science and Engineering Research Support Society, 2020)
Manjunatha, S.; Annappa, A.
Public safety is an essential service offered in smart city projects to provide better safety and security for individuals and city infrastructure. Advances in Information Technology and the Internet of Things have created much scope for using smart applications in the city to enhance the quality of service, leading to a better life in cities. This digitization generates a large amount of data within the city from distinct sources such as social media, IoT, sensors, and user-generated content from smart applications. The data generated within the city are analyzed to discover valuable insights for producing better data-driven decisions and predictions, which are crucial for efficient city administration.
Making quick decisions and early predictions of crimes through real-time analysis of data helps the smart policing system provide better services in the city. This paper describes the scope of real-time big data analytics for finding appropriate predictions and making quick decisions for public safety. A real-time big data analytics framework using multiple data sources is proposed for the smart policing service in the smart city environment. The framework is used to design a real-time emergency event detection system that helps city administrators take quick action for the safety of people and city infrastructure. The proposed system achieved an average accuracy of 73% for emergency event classification. © 2020 SERSC.

Item: Fine-grained data-locality aware MapReduce job scheduler in a virtualized environment (Springer Science and Business Media Deutschland GmbH, 2020)
Jeyaraj, R.; Ananthanarayana, V.S.; Paul, A.
Big data has overwhelmed industries and research sectors. Reliable decision making is always a challenging task, which requires cost-effective big data processing tools. Hadoop MapReduce is used to store and process huge volumes of data in a distributed environment. However, due to the huge capital investment and expertise needed to set up an on-premise Hadoop cluster, big data users seek cloud-based MapReduce services over the Internet. Mostly, MapReduce on a cluster of virtual machines is offered as a service on a pay-per-use basis. Virtual machines in a MapReduce virtual cluster reside on different physical machines and are co-located with other non-MapReduce VMs. As a result, they share I/O resources such as disk and network bandwidth, which leads to congestion because most MapReduce jobs are disk and network intensive. In particular, the shuffle phase in the MapReduce execution sequence consumes huge network bandwidth in a multi-tenant environment. This results in increased job latency and bandwidth consumption cost.
Therefore, it is essential to minimize the amount of intermediate data in the shuffle phase rather than supply more network bandwidth, which increases service cost. With this objective, we extended the multi-level per-node combiner for a batch of MapReduce jobs to improve makespan. We observed that makespan improved by up to 32.4% by minimizing the amount of intermediate data in the shuffle phase, compared to classical schedulers with default combiners. © 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
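
The shuffle-reduction idea in the last abstract above can be illustrated with a toy word count. This is only a minimal sketch under stated assumptions, not the authors' scheduler: the function names, the node layout, and the input splits are all hypothetical, and the "per-node" combiner here simply merges the output of every map task on a node before anything is shuffled, in contrast to a default combiner that works on one map task at a time.

```python
from collections import Counter

def map_words(split):
    """Map phase: emit one (word, 1) pair per word in an input split."""
    return [(word, 1) for word in split.split()]

def combine(pairs):
    """Combiner: sum the counts per key before records leave the node."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def shuffle_size(node_splits, per_node_combiner):
    """Count the intermediate records that would be sent to the reducers."""
    shuffled = 0
    for splits in node_splits.values():
        outputs = [map_words(s) for s in splits]
        if per_node_combiner:
            # Hypothetical per-node combining: merge all map outputs on this node.
            merged = [pair for out in outputs for pair in out]
            shuffled += len(combine(merged))
        else:
            # Default behaviour: each map task combines only its own output.
            shuffled += sum(len(combine(out)) for out in outputs)
    return shuffled

# Hypothetical cluster layout: two nodes, two input splits each.
node_splits = {
    "node-1": ["big data big", "data big data"],
    "node-2": ["map reduce map", "reduce map data"],
}

per_task = shuffle_size(node_splits, per_node_combiner=False)
per_node = shuffle_size(node_splits, per_node_combiner=True)
print(per_task, per_node)
```

In this toy input, combining per node sends fewer intermediate records than combining per map task, because keys repeated across map tasks on the same node are merged once. How much a real job saves depends on key overlap between tasks on a node.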
