An Approach for Handling Concept Drift and Model Simplification in Log-Based Process Analysis
Date
2018
Authors
Vasantha Kumar, Manoj Kumar Muttyal
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
Process-aware information systems that support operational processes are now
mainstream. Examples include workflow management systems, enterprise application
integration systems, enterprise resource planning systems, and web services. These
systems record detailed information about the history of process execution
in the form of events. Events range from the withdrawal of cash from an ATM,
a doctor adjusting an X-ray machine, a citizen applying for a driver's license, and
the submission of a tax declaration to the receipt of an e-ticket number by a traveler.
The challenge is to extract knowledge from event data in order to improve the original
processes in a meaningful way.
A young research discipline named process mining offers a spectrum of techniques
for analyzing the event data generated in information systems. These methods can be seen
as the amalgamation of computational intelligence and data mining on the one hand,
and process modeling and analysis on the other. Process mining offers a variety of techniques
for discovering, monitoring, and improving processes in numerous application domains.
In this research work, the most significant issues in process mining are
identified and practical solutions are proposed. The techniques addressing these
issues are classified into the following three broad categories.
The first category of techniques addresses the non-stationary learning
problem known as concept drift. Concept drift is the phenomenon in which a process changes
dynamically during the period of execution and/or analysis. Because of this, state-of-the-art
process mining techniques generate inconsistent and obsolete analysis results.
It is therefore necessary to design and implement methods that can efficiently
address concept drift. We propose a Multiple Trace Alignment method for
detecting and localizing concept drift in the control-flow perspective of the operational
process. The proposed method has been tested on a real-life event log and compared
with existing methods for handling concept drift.
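
As a rough illustration of what detecting and localizing drift in the control-flow perspective involves, the sketch below compares the directly-follows relations of two adjacent windows of traces. This is a generic window-comparison baseline, not the Multiple Trace Alignment method proposed in the thesis, which the abstract does not specify. The function names, the window size, the threshold, and the toy log are all assumptions made for the example.

```python
from collections import Counter

def directly_follows(traces):
    """Relative frequency of each directly-follows pair (a, b) in the traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}

def drift_points(log, window=4, threshold=0.6):
    """Trace indices where the windows before and after differ strongly."""
    points = []
    for i in range(window, len(log) - window + 1):
        before = directly_follows(log[i - window:i])
        after = directly_follows(log[i:i + window])
        pairs = set(before) | set(after)
        # Total variation distance between the two pair distributions (0 to 1).
        distance = 0.5 * sum(abs(before.get(p, 0) - after.get(p, 0)) for p in pairs)
        if distance > threshold:
            points.append(i)
    return points

# Hypothetical log: the process changes from a-b-c-d to a-c-b-d after six traces.
log = [list("abcd")] * 6 + [list("acbd")] * 6
print(drift_points(log))
```

On this toy log the flagged indices cluster around the true change point at trace index 6, which is the sense in which a window comparison both detects and localizes a drift. The window size and threshold (both assumed values here) trade detection delay against false alarms.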
The second category of techniques concentrates on developing a notation named the trace
logo for visualizing the control-flow perspective. Traditional control-flow discovery
algorithms in process mining generate a process model consisting of activities and
transitions and ignore all other information. The trace logo overcomes this drawback
of traditional approaches: it is capable of visualizing activities, transitions,
the consensus of the traces, the order of prevalence between activities, the relative occurrence
of every activity, information scores, and the set of conserved and shared activities/sequences
in a single compact graphic.
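
The abstract does not define how the trace logo computes relative occurrences or information scores. One plausible reading, borrowed from sequence logos in bioinformatics, is a per-position profile over aligned traces. The sketch below follows that reading; the assumption that traces are already aligned to equal length, the function name `position_profile`, and the toy traces are all hypothetical.

```python
import math
from collections import Counter

def position_profile(aligned_traces, alphabet):
    """Per-position relative frequencies and information content in bits."""
    profile = []
    for column in zip(*aligned_traces):
        counts = Counter(column)
        freqs = {a: counts[a] / len(column) for a in alphabet if counts[a]}
        # Shannon entropy of the activity distribution at this position.
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        # Information content: maximum possible entropy minus observed entropy.
        info = math.log2(len(alphabet)) - entropy
        profile.append((freqs, info))
    return profile

# Hypothetical aligned traces over four activities a, b, c, d.
traces = ["abcd", "abcd", "acbd", "abcd"]
for position, (freqs, info) in enumerate(position_profile(traces, "abcd")):
    print(position, freqs, round(info, 3))
```

Under this reading, a fully conserved position (every trace has the same activity) scores the maximum of log2 of the alphabet size, and a position where activities vary scores lower. A logo-style graphic would scale the height of each activity symbol by its frequency times the information content at that position.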
The third category of techniques centers on path discovery and complexity reduction
in structured and unstructured processes. If the process is a Lasagna process (structured),
a feature set capturing its control-flow properties is extracted and used. If the process
is a Spaghetti process (unstructured), it is first reduced to a Lasagna process, and features
capturing the control-flow properties are then obtained. The feature sets are systematically
analyzed to find details such as the frequent, infrequent, possible, and impossible paths of
execution in the process.
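
To make the four path categories concrete, the sketch below classifies a candidate path against an event log using trace-variant counts and the observed directly-follows pairs. The abstract does not describe the thesis's feature sets, so the definitions used here (share threshold for "frequent", and "possible" meaning unobserved but built only from observed transitions) are assumptions for illustration, as are the function name and the toy log.

```python
from collections import Counter

def classify_path(path, log, min_share=0.2):
    """Label a path as frequent, infrequent, possible, or impossible."""
    variants = Counter(tuple(trace) for trace in log)
    follows = {pair for trace in log for pair in zip(trace, trace[1:])}
    path = tuple(path)
    if path in variants:
        # Observed path: compare its share of the log with the threshold.
        share = variants[path] / len(log)
        return "frequent" if share >= min_share else "infrequent"
    # Unobserved path: possible only if every step was seen somewhere.
    # (Simplification: start and end activities are not checked.)
    if all(pair in follows for pair in zip(path, path[1:])):
        return "possible"
    return "impossible"

# Hypothetical log of ten traces over activities a, b, c, d.
log = ["abcd"] * 8 + ["acbd", "abd"]
for candidate in ["abcd", "acbd", "acd", "adcb"]:
    print(candidate, classify_path(candidate, log))
```

Here `acd` never occurs in the log but uses only observed transitions, so it is labeled possible, while `adcb` requires a transition from `a` to `d` that was never observed and is labeled impossible.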
All the methods proposed in this thesis are evaluated on a real-life event log taken
from a standard repository, and the results are presented in subsequent chapters. Through
this research work, an effort has been made to advance existing process mining practice
by addressing these diverse categories of its most significant issues.
Keywords
Department of Computer Science & Engineering