Graduate Exam Abstract - Electrical and Computer Engineering

Shashika Muramudalige

Ph.D. Final
Feb 18, 2022, 9:30 am - 11:30 am
Teams: see email to ece_grad,ece_fac for link
AUTOMATING INVESTIGATIVE PATTERN DETECTION USING MACHINE LEARNING & GRAPH PATTERN MATCHING TECHNIQUES

Abstract: Identification and analysis of latent and emergent behavioral patterns are core tasks in investigative domains such as homeland security, counterterrorism, and crime prevention. Developing behavioral trajectory models associated with radicalization, and tracking individuals and groups based on such trajectories, are critical for law enforcement investigations, but both are hampered by the sheer volume and nature of the data that must be mined and processed. The dynamic and complex behaviors of extremists and extremist groups, missing or incomplete information, and a lack of intelligent tools further obstruct counterterrorism efforts. Our research aims to develop state-of-the-art computational tools that build on recent advances in machine learning, natural language processing (NLP), and graph databases.

In this work, we address the challenges of investigative pattern detection by developing algorithms, tools, and techniques primarily aimed at behavioral pattern tracking and identification for domestic radicalization. The methods developed are integrated in a framework, the Investigative Pattern Detection Framework for Counterterrorism (INSPECT). INSPECT includes components for extracting information using NLP techniques, storing the resulting information networks in appropriate databases that support investigative graph searches, and synthesizing data via generative adversarial techniques to overcome the limitations of incomplete and sparse data. These components streamline investigative pattern detection while accommodating various use cases and datasets. Our outcomes benefit law enforcement and counterterrorism applications that counteract the threat of violent extremism; in addition, as the presented results demonstrate, the proposed framework is adaptable to diverse behavioral pattern analysis domains such as consumer analytics, cybersecurity, and behavioral health.

Information on radicalization activity and participant profiles of interest to investigative tasks is mostly found in disparate text sources. We integrate NLP approaches such as named entity recognition (NER), coreference resolution, and multi-label text classification to extract structured information regarding behavioral indicators, temporal details, and other metadata. We further use multiple text pre-processing approaches to improve the accuracy of data extraction. Our training text datasets are intrinsically small and imbalanced across labels, which hinders direct application of NLP techniques. We therefore apply transfer learning, adapting a pre-trained NLP model with our specific datasets, and achieve a noteworthy improvement in information extraction.

The information extracted from text sources represents a rich knowledge network of populations with various types of connections, which needs to be stored, updated, and repeatedly inspected for the emergence of patterns over the long term. We therefore use graph databases as the primary storage option while maintaining the reliability and scalability of behavioral data processing. To query for suspicious and vulnerable individuals or groups, we implement investigative graph search algorithms as custom stored procedures on top of graph databases and verify their ability to operate at scale. We use datasets from different contexts to demonstrate the wide applicability of our investigative graph searches and their enhanced effectiveness in revealing suspicious or latent trends.
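
As a rough illustration of what an inexact investigative search does, the sketch below ranks individuals by the fraction of queried behavioral indicators they exhibit. It is a toy in-memory stand-in, not the stored procedures described above: the function name `inexact_match`, the `min_score` threshold, and the indicator labels are all hypothetical, and the actual matching criteria used in INSPECT are not specified in this abstract.

```python
from typing import Dict, List, Set, Tuple


def inexact_match(
    profiles: Dict[str, Set[str]],
    query: Set[str],
    min_score: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank people by the fraction of queried indicators they exhibit.

    profiles maps a person identifier to the set of indicators observed
    for that person. A person is returned when at least `min_score` of
    the query indicators are present, so partial matches are kept.
    """
    if not query:
        raise ValueError("query must contain at least one indicator")

    results = []
    for person, indicators in profiles.items():
        score = len(indicators & query) / len(query)
        if score >= min_score:
            results.append((person, score))

    # Highest score first; ties are broken by identifier for stable output.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Hypothetical indicator labels, used only for illustration.
example_profiles = {
    "p1": {"ind_a", "ind_b", "ind_c"},
    "p2": {"ind_a"},
    "p3": {"ind_a", "ind_b"},
}
example_query = {"ind_a", "ind_b", "ind_c", "ind_d"}
ranked = inexact_match(example_profiles, example_query, min_score=0.5)
```

In a graph database the same idea would be expressed over nodes and relationships rather than Python sets; the point here is only that a partial match is scored and retained instead of being discarded.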

Investigative data by nature is incomplete and sparse, and the number of cases that may be used for training investigators or machine learning algorithms is small. This is an inherent concern in investigative and many other contexts where data collection is tedious and the available data is limited and may be subject to privacy concerns. Large datasets help social scientists and investigative authorities enhance their skills and achieve greater accuracy and reliability. A sufficiently large volume of training data is also essential for applying the latest machine learning techniques for improved classification and detection.
In this work, we propose a generative adversarial network (GAN) based approach with novel feature mapping techniques to synthesize additional data from a small and sparse dataset while preserving the statistical characteristics of the original data. We also compare our proposed method with two likelihood-based approaches, i.e., multivariate Gaussian and regular-vine copulas. We verify the robustness of the proposed technique using simulated and real-world datasets representing diverse domains.
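
For context, the simplest of the likelihood-based baselines named above, a multivariate Gaussian, can be sketched as follows: fit a mean vector and covariance matrix to the small dataset, then draw new samples from the fitted distribution. This is a generic sketch under stated assumptions, not the implementation evaluated in this work; the function name `synthesize_gaussian` and its parameters are hypothetical, and it treats every feature as continuous, which is exactly the limitation that discrete and sparse investigative data expose.

```python
import numpy as np


def synthesize_gaussian(data: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Draw synthetic rows from a multivariate Gaussian fitted to `data`.

    data has shape (n_rows, n_features). The fitted distribution
    reproduces only the first two moments (means and covariances) of
    the original data.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[0] < 2:
        raise ValueError("data must be 2-D with at least two rows")

    mean = data.mean(axis=0)
    # rowvar=False: columns are features, rows are observations.
    cov = np.atleast_2d(np.cov(data, rowvar=False))

    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Small illustrative dataset (randomly generated, not real data).
_rng = np.random.default_rng(42)
small_data = _rng.normal(size=(200, 3))
synthetic = synthesize_gaussian(small_data, n_samples=1000, seed=0)
```

Because only means and covariances are preserved, a baseline of this kind cannot capture multimodal or strongly non-Gaussian structure, which is one motivation for comparing it against copula-based and GAN-based generators.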

The proposed GAN-based data generation approach is applicable to other domains, as demonstrated with two applications. First, we extend our data generation approach to a computer security application, resulting in improved phishing website detection with synthesized datasets. We merge measured datasets with synthesized samples and re-train models to improve the performance of classification models and mitigate vulnerability to adversarial samples. The second is a video traffic classification application in which the datasets are enhanced while preserving statistical similarity between the actual and synthesized datasets. For video traffic data generation, we modified our data generation technique to capture the temporal patterns in time series data. In this application, we integrate a Wasserstein GAN (WGAN) by using different snapshots of the same video signal with feature-mapping techniques. A trace splitting algorithm is presented for training on video traces that exhibit higher data throughput with high bursts at the beginning of the video session compared to the rest of the session. With synthesized data, we obtain a 5-15% accuracy improvement in classification compared to using only actual traces.

The INSPECT framework is validated primarily by mining detailed forensic biographies of known jihadists, which are extensively used by social and political scientists. Additionally, each component in the framework is extensively validated with a Human-In-The-Loop (HITL) process, which improves the reliability and accuracy of machine learning models, investigative graph algorithms, and other computing tools based on feedback from social scientists. The entire framework is embedded in a modular architecture where the analytical components are implemented independently and are adjustable for different requirements and datasets. We verified the proposed framework's reliability, scalability, and generalizability with datasets in different domains. This research also makes a significant contribution to discrete and sparse data generation in diverse application domains with novel generative adversarial data synthesis techniques.

Adviser: Anura Jayasumana
Co-Adviser: N/A
Non-ECE Member: Haonan Wang, Statistics
Member 3: Kim Ryan, ECE
Additional Members: Ray Indrakshi, ECE

Publications:
C. M. Kattadige, S. R. Muramudalige, G. Jourjon, H. Wang, A. P. Jayasumana, and K. Thilakarathna, “VideoTrain++: GAN-Based Adaptive Framework for Synthetic Video Traffic Generation,” Computer Networks, Elsevier, Vol 206, 2022. doi : 10.1016/j.comnet.2022.108785

C. M. Kattadige, S. R. Muramudalige, K. N. Choi, G. Jourjon, H. Wang, A. P. Jayasumana, and K. Thilakarathna, “VideoTrain: A Generative Adversarial Framework for Synthetic Video Traffic Generation,” in Proc. 22nd IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM 2021), pp. 209-218, Pisa, Italy, Jun. 2021. doi : 10.1109/WoWMoM51794.2021.00034

S. R. Muramudalige, B. W. K. Hung, A. P. Jayasumana, I. Ray, and J. Klausen, “Enhancing investigative pattern detection via inexact matching and graph databases,” IEEE Transactions on Services Computing, 2021. doi : 10.1109/TSC.2021.3073145

S. R. Muramudalige, B. W. K. Hung, A. P. Jayasumana, J. Klausen, E. Moloney, and R. Libretti, “Developing and Detecting Extremist Radicalization Trajectories with Machine Learning Techniques,” accepted for panel at the 2020 American Society of Criminology (ASC) Annual Meeting, Washington, D.C., USA, Nov. 2020.

H. Shirazi, S. R. Muramudalige, I. Ray, and A. P. Jayasumana, “Improved Phishing Detection Algorithms using Adversarial Autoencoder Synthesized Data,” in Proc. IEEE 45th Conference on Local Computer Networks (LCN2020), Sydney, Australia, Nov. 2020. [Best paper finalist] doi : 10.1109/LCN48667.2020.9314775

B. W. K. Hung, S. R. Muramudalige, A. P. Jayasumana, J. Klausen, R. Libretti, E. Moloney, and P. Renugopalakrishnan, “Recognizing Radicalization Indicators in Text Documents Using Human-in-the-Loop Information Extraction and Natural Language Processing Techniques,” in Proc. 2019 IEEE International Symposium on Technologies for Homeland Security (HST), Woburn, MA USA, Nov. 2019. doi : 10.1109/HST47167.2019.9032956

S. R. Muramudalige, B. W. K. Hung, A. P. Jayasumana, and I. Ray, “Investigative Graph Search using Graph Databases,” in Proc. 2019 First International Conference on Graph Computing (GC 2019), pp. 60-67, Laguna Hills, CA, Sept. 2019. doi : 10.1109/GC46384.2019.00017

S. R. Muramudalige and H. M. N. D. Bandara, “Automated Driver Scheduling for Vehicle Delivery,” in: Kováčiková T., Buzna Ľ., Pourhashem G., Lugano G., Cornet Y., Lugano N. (eds) Intelligent Transport Systems – From Research and Development to the Market Uptake. INTSYS 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 222. Springer, Cham, 2018. doi : 10.1007/978-3-319-93710-6_23

M. Amarasinghe, S. R. Muramudalige, S. Kottegoda, A. L. Arachchi, H. M. N. D. Bandara, and A. Azeez, “Cloud-based Driver Monitoring and Vehicle Diagnostic with OBD2 Telematics,” International Journal of Handheld Computing Research (IJHCR), vol. 6(4), pp. 59-75, 2015. doi : 10.4018/IJHCR.2015100104

S. R. Muramudalige, B. W. K. Hung, R. Libretti, J. Klausen, and A. P. Jayasumana, “INSPECT: An Investigative Pattern Detection Framework for Counterterrorism,” To be submitted.

S. R. Muramudalige, A. P. Jayasumana, and H. Wang, “A comparative study of complex data object generation with likelihood and deep generative approaches,” To be submitted.

H. Shirazi, S. R. Muramudalige, I. Ray, A. P. Jayasumana, and H. Wang, “Adversarial Autoencoder Data Synthesis for Enhancing Machine Learning-based Phishing Detection Algorithms,” Under review.

Program of Study:
ECE-656
MATH-580A3
ECE-795
ECE-658
CS-533
ECE-514
N/A
N/A