Graduate Exam Abstract

Vidarshana W. Bandara

Ph.D. Preliminary
March 23, 2012, 09:00am
Engineering B3
Characterizing and Detecting Baselines and Anomalies of Network Data

Abstract: A framework for network data decomposition, modeling, analysis and synthesis is proposed. The framework views network data as composed of three components: a baseline, anomalies and a residual. The baseline component represents the behavior trends under nominal operational conditions, anomalies are deviations of interest, and the residual component accommodates the traffic variations, such as noise, that are not captured by the baseline and anomalies. Each of these components represents a key aspect of network behavior, and such a decomposition is of interest to a wide range of network applications, including analysis, design, control and forensics, as well as to many applications beyond the network domain. However, the lack of a formal, universally agreed definition for any of the components is a major challenge. Nevertheless, a large body of research is available on extracting, analyzing and regenerating network traffic components. These works employ a wide range of mathematical tools that provide decompositions and alternative representations for data. But a decomposition that readily separates a baseline, anomalies and a residual is lacking. A single decomposition tool extracting all the components would reveal their individual features as well as the relationships among these features.

The goal of this work is to develop, innovate and improve mathematical tools for network data decomposition and to demonstrate their utility in applications such as traffic modeling, compressive representation of network traffic, anomaly detection, and network monitoring. The proposed analysis framework builds a comprehensive characterization of each component. Despite the lack of a formal definition for the data components, features of baselines and anomalies can be captured by mathematical properties. For example, the common features of a dataset containing multiple traffic traces from different network links form a low-rank component of the dataset.
Such characterizations, for example, facilitate re-synthesis of realistic traffic. Based on the nature of these characteristics, representations and models for traffic components may be developed. The obtained characterizations further enable procedures for extraction and efficient storage of traffic traces. They also facilitate extensions such as novel monitoring systems. By observing the relationships among all traffic elements, a complete analysis and synthesis framework is realized. Such a complete framework will provide the traffic characterizations demanded by applications, along with the relationships between components, beyond what is currently feasible.

A host of mathematical tools is available for decomposing data based on mathematically identifiable features. For example, Fourier analysis transforms data into its frequency spectrum, wavelet decomposition represents data as a set of scaled and shifted versions of a mother wavelet, Principal Component Analysis (PCA) separates data into orthogonal components of decaying variance, and Robust PCA (RPCA) decomposes data into a low-rank component and a sparse component. Though these decompositions do not necessarily provide the baseline, anomalies and the residual, such mathematical tools can be tailored into suitable extractors for components that have a proper characterization. The results presented show how different characteristics are related to and extracted by different mathematical tools. Frequency components with large amplitudes and the low-rank component of RPCA capture prominent common behaviors in a dataset, which are typically the characteristics of a baseline. Scores based on principal components, the thresholded sparse component, and large deviations in certain frequency bands capture sudden significant changes in data. These are useful in detecting anomalies. Sparsity-inducing norms permit imposing constraints based on patterns.
A set of tools based on sparsity-inducing norms is proposed for datasets with known anomaly structures.

Real datasets and measurements from sources such as Internet2, PlanetLab, DARPA, CAIDA, and Lobster-sensors are used to evaluate the developed techniques. For example, baseline and anomaly behaviors of Internet2 are modeled and characterized. Methods based on PCA and RPCA are used to filter baseline components. A subspace representation of the baseline within the vector-space representation of the data is under investigation. Here, the baseline component is expected to form a concentration ellipsoid in a lower-dimensional subspace. Projection methods for baseline extraction and distance/angle-based techniques for anomaly detection are proposed. A modularized approach for anomaly behavioral modeling is also proposed. Here, Fourier analysis is used to extract anomalies, and wavelet coefficients are used to summarize network activities. Based on summaries such as correlations and wavelet coefficients, a new breed of monitoring systems is developed that is informed of network-wide anomaly activity rather than limited to local measurements.

Component-wise data analysis provides insight into various aspects of data and provides a formal path for modeling, compression and storage. A complete suite of methods, tools and characterizations is brought together in this work to effectively investigate, analyze, characterize and model network traffic data. Each constituent component is comprehensively described and related to the other components. Decomposition of data into components of interest and re-synthesis of realistic traces with specific properties are made possible through this work.
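As an illustration of the low-rank plus sparse decomposition that the abstract attributes to RPCA, the following is a minimal sketch and not the method developed in this work. It assumes the widely used principal component pursuit formulation solved by a fixed-penalty augmented Lagrangian iteration. The function names, the synthetic link-by-time traffic matrix, and the default values of `lam` and `mu` (common heuristics) are all assumptions made for this example.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (proximal map of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Shrink the singular values of x by tau (proximal map of the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def rpca(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split m into low-rank + sparse parts by principal component pursuit.

    lam and mu default to common heuristic choices (an assumption of this
    sketch, not values taken from the abstract).
    """
    m = np.asarray(m, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(m.shape))
    if mu is None:
        mu = m.size / (4.0 * np.abs(m).sum())
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        # Alternate the two proximal updates, then update the multiplier.
        low = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        resid = m - low - sparse
        dual += mu * resid
        if np.linalg.norm(resid) <= tol * norm_m:
            break
    return low, sparse


def synthetic_traffic(n_links=20, n_bins=288, n_spikes=5, seed=0):
    """Hypothetical data: one shared daily pattern per link plus a few spikes."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_bins)
    daily = 10.0 + 5.0 * np.sin(2.0 * np.pi * t / n_bins)
    gains = rng.uniform(0.5, 2.0, size=n_links)
    baseline = np.outer(gains, daily)  # rank-1 common behavior
    anomalies = np.zeros_like(baseline)
    hits = rng.choice(baseline.size, size=n_spikes, replace=False)
    anomalies.flat[hits] = 30.0  # isolated large deviations
    return baseline + anomalies, baseline, anomalies


if __name__ == "__main__":
    traffic, baseline, anomalies = synthetic_traffic()
    low, sparse = rpca(traffic)
    rel_err = np.linalg.norm(low - baseline) / np.linalg.norm(baseline)
    print("relative baseline error:", rel_err)
    print("entries flagged as anomalous:", np.argwhere(np.abs(sparse) > 10.0).tolist())
```

In this sketch the low-rank output plays the role of the baseline and the thresholded sparse output plays the role of the anomalies. With noisy data, whatever remains after subtracting both from the input would correspond to the residual.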

Adviser: Prof. Anura P. Jayasumana
Co-Adviser: Prof. Ali Pezeshki
Non-ECE Member: Prof. Indrajit Ray, Computer Science
Member 3: Prof. J. Rockey Luo, Electrical and Computer Engineering
Additional Members: N/A

Vidarshana Bandara, Ali Pezeshki, and Anura P. Jayasumana, "Modeling spatial and temporal behavior of Internet traffic anomalies," IEEE 35th Conference on Local Computer Networks (LCN), pp. 384-391, 2010.
V. Bandara, and A. P. Jayasumana, "Extracting Baseline Patterns in Internet Traffic Using Robust Principal Components," Proc. 36th Annual IEEE Conference on Local Computer Networks (LCN), Bonn, Germany, Oct. 2011.
V. Bandara, A. P. Jayasumana, A. Pezeshki, T. H. Illangasekare and K. Barnhardt, "Subsurface Plume Tracking Using Sparse Wireless Sensor Networks," Electronic Journal of Structural Engineering (EJSE) - Special Issue: Wireless Sensor Networks and Practical Applications, Dec. 2010.
Thoshitha Gamage, Jayantha Herath, Arjun Roy, and Vidarshana Bandara, "Performance Comparison of Recent Random Number Generators," Journal of Global Information Technology, 2009, 3(1 and 2), pp. 1-14.
V. W. Bandara, A. C. Vidanapathirana, and S. G. Abeyratne, "Contouring with DC Motors - a Practical Experience," Proc. First International Conference on Industrial and Information Systems, Aug. 2006, pp. 474-479.
Asiri Nanayakkara and Vidarshana Bandara, "Asymptotic Behavior of the Eigenenergies of Anharmonic Oscillators V(x) = x^{2N} + bx^2," Canadian Journal of Physics/Revue Canadienne de Physique, vol. 80, issue 9, Sep. 2002, p. 959.
"Characterizing spatial-temporal features of Internet traffic anomalies" (in preparation).
"Generalized Bounds for Compressive Sensing Based Recovery" (in preparation).

Program of Study: