Heterogeneous data format integration and conversion (HDFIC) using machine learning and IBM-DFDL for IoT

Date

2024

Publisher

Springer Nature

Abstract

The future of the Internet of Things (IoT) demands the integration of synergetic applications to cater to societal needs. Examples of IoT-based confederated applications include Ambient Assisted Living with Active Healthy Ageing, CasAware with Smart Energy, and Smart Gas Distribution Networks with GIS systems. However, data heterogeneity hinders integration because these systems follow different standards, data formats, semantic models, and representations, which in turn leads to data interoperability issues in IoT. The major concern of academia and industry in the smooth integration of heterogeneous applications is interpreting different data formats and representing them in a common schema for further analysis. Existing solutions, such as message payload translation, middleware/cloud format, and Inter-IoT, are complex, time-consuming, and ineffective. Hence, this paper proposes heterogeneous data format integration and conversion (HDFIC), a machine learning-based system that identifies data formats using a Random Forest classifier and integrates them using the Data Format Description Language (DFDL). The content-based data format identification in the proposed HDFIC is trained with the standard features defined in RFC 7111, 8259, and 8996. Subsequently, the data is integrated into a single XML Schema Definition and converted into the required data format using the IBM App Connect Enterprise tool and DFDL. Finally, the performance of HDFIC is evaluated on a synergetic dataset of patient body vitals and room ambiance for accuracy, data integration time, and conversion efficiency. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
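The two stages the abstract describes, content-based format identification followed by integration into a common XML representation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the feature set in `content_features`, the element names `readings` and `reading`, and the sample payloads are all hypothetical, and Python's standard library stands in for IBM App Connect Enterprise and DFDL. The Random Forest step is omitted, so the format label is passed in by hand where a trained classifier would normally supply it.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def content_features(payload):
    """Hypothetical structural features a format classifier might consume.

    These are illustrative only; they are not the feature set from the paper.
    """
    text = payload.strip()
    n = max(len(text), 1)
    lines = text.splitlines() or [""]
    comma_counts = [line.count(",") for line in lines]
    return {
        "starts_with_brace": text[:1] in ("{", "["),
        "starts_with_angle": text[:1] == "<",
        "brace_ratio": sum(text.count(c) for c in "{}[]") / n,
        "angle_ratio": sum(text.count(c) for c in "<>") / n,
        # Several lines with the same non-zero comma count suggest CSV.
        "consistent_commas": (
            len(lines) > 1
            and len(set(comma_counts)) == 1
            and comma_counts[0] > 0
        ),
    }


def to_records(payload, fmt):
    """Parse a payload of a known format into a list of flat dicts."""
    if fmt == "json":
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload.strip())))
    if fmt == "xml":
        root = ET.fromstring(payload)
        return [{child.tag: child.text for child in rec} for rec in root]
    raise ValueError("unsupported format: " + fmt)


def to_common_xml(records, source):
    """Serialize records into one shared XML layout (assumed, not from the paper).

    Assumes every record key is already a valid XML element name.
    """
    root = ET.Element("readings")
    for rec in records:
        node = ET.SubElement(root, "reading", source=source)
        for key, value in rec.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical payloads loosely modelled on body-vitals and room-ambiance data.
json_payload = '{"heart_rate": 72, "spo2": 98}'
csv_payload = "temperature,humidity\n22.5,41\n"

vitals_xml = to_common_xml(to_records(json_payload, "json"), "vitals")
room_xml = to_common_xml(to_records(csv_payload, "csv"), "room")
```

In a fuller pipeline, the dictionaries returned by `content_features` would be turned into numeric vectors and used to train a classifier, and the predicted label would replace the hard-coded `"json"` and `"csv"` arguments.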

Keywords

Data integration, Interoperability, Machine learning, Middleware, Semantics, Ambient assisted living, Data format, Description languages, Gas distribution network, GIS systems, Heterogeneity, Heterogeneous data, Smart energies, Synergetics, Internet of things

Citation

Evolving Systems, 2024, 15, 2, pp. 375-396
