Gunjan Mahindre

Ph.D. FinalSep 29, 2021, 8:00 am - 10:00 am

ECE Conference Room

Efficient Representation, Measurement, and Recovery of Spatial and Social Networks

Abstract: Massive penetration of networks such as social and communication networks and the growth of networked systems are leading to a colossal number of connections and data underneath. Techniques are needed for analysis of network measurements to extract network features and patterns of interest. Network analytics can be used, for example, to predict how a network evolves for customer interaction analysis to provide a better user experience, and to make effective business decisions based on client engagements. Ideally, one should have the complete set of measurements associated with a network so that the analysis would lead to results that are applicable to real-world networks. However, computational, communication, and storage limitations do not allow that for large-scale networks of interest. Thus, a network needs to be sampled for its connectivity or substructures to facilitate analysis. However, extensive sampling can be computationally expensive to carry out for analysis or even unviable due to access restrictions, e.g., due to private or censored nodes. Also, currently practiced sampling techniques such as random-walk are susceptible to slow-mixing problem. Sparse or locally targeted sampling techniques do not allow the recovery of the complete information of the original network while missing information may lead to altered network characteristics and biased research results.

The limitations identified above give rise to the need for techniques to correctly predict missing connectivity data as well as to represent and store networks in coherent yet compressed ways that preserve the original network characteristics for unbiased network experiments. As certain network features are derived from the complete network measurements, missing data or local samples affect the accuracy of such estimates. Efficient prediction and graph representation techniques could facilitate network analytics from smaller sets of measurements with improved accuracy, low storage, ease of access as well as manageable computational costs.

Our goal is to make the most use of the available network measurements and extract information about network characteristics, connectivity, and constraints to make informed predictions about the missing measurements. Accordingly, this research is aimed at deriving and demonstrating novel techniques to efficiently measure and represent networks, and for accurate recovery of network topology from partially measured network structures. The techniques need to be scalable in their computation, and graceful in their degradation with limited measurements.

We tackle the network analysis problem from two sides: a) efficiently representing complete graph data in compact formats and b) predicting missing network measurements using partially observed distances. Specifically, we introduce the novel concepts of reconstruction set and link dimension of graphs for lossless compression and reconstruction. This helps to represent graphs with a small set of path measurements rather than the complete adjacency or distance matrices. We also develop models to learn the network topology via path measurements and explore distance vector properties of the network. This network information is then leveraged to make informed predictions about network connectivity. This helps to retain original network characteristics even while completing graphs from partial data, in terms of links and distances, using deterministic methods such as low-rank matrix completion as well as non-deterministic prediction methods such as neural networks.

We present link dimension, a novel graph dimension based on a subset of nodes for defining distance vectors that completely capture the graph topology. Several definitions for dimensions of a graph exist, such as metric dimension and edge dimension, to identify the least integer ‘n’ such that there exists a representation of the graph in n dimensions. Link dimension is however the only dimension that captures the complete topology of the graph and allows for its exact reconstruction. We also propose a greedy algorithm to find such defining nodes in the network by identifying the nodes that capture most of the information about nodes as well as the links. Several interesting properties of link dimension are also derived along with the bounds for link dimension for several graph families.

Ability to recover or estimate the network topology from a small fraction of pairwise distances among nodes or even a few random distance measurements is essential when the network measurements are expensive or infeasible. We use low-rank matrix completion to recover topology of spatial networks from a very small fraction of distance measurements. We present results for networks of different shapes, with a range of 20% to 80% of observed measurements. This technique is especially useful in sensor networks which are constrained with respect to their storage and computational capabilities, each node has access to only partial information about the network. It is helpful to know the connectivity, boundaries, and the overall topology of the sensor networks especially in applications such as routing, segmentation, and anchor selection.

Not all networks are embedded in 2D or 3D physical spaces. Graphs such as friendship networks, product graphs, and software module connectivity are non-spatial. Such graphs can be measured in terms of inter-node distances, i.e., connected nodes are said to be at one hop distance. We leverage triangle inequality as applied to distances on a graph to compute bounds on the missing distance entries and perform bounded matrix completion on directed as well as undirected social networks. The proposed prediction techniques are evaluated for real-world social networks (such as Facebook, Email, Collaboration networks, Twitter). The low-rank matrix completion based prediction technique is evaluated for 20% to 80% missing distances in a given network. Results for low-rank matrix completion show that even at 40% of missing measurements, the network distances can be accurately predicted while preserving the original network characteristics.

Many real-world networks are known to exhibit scale-free phenomenon, where the degree distribution follows the power law, at least asymptotically. In order to learn this complex relationship between node distances, we turn to neural networks as a foundation of a new prediction model. However, neural networks need complete data to train on which is not available when only a fraction of distances is measured in a network. We use the concepts of domain adaptation to employ a novel technique of using synthetic networks to pre-train the autoencoder. We make use of the scale free phenomenon observed in real-world networks and generate artificial networks that also embody this power law in their degree distribution. Training on these preferentially attached synthetic networks helps in learning about the scale-free networks, even if the training data was not derived from the real-world networks. This helps to make accurate predictions on real-world networks when only ultra-sparse measurements are observed. This aids in two ways: a) we can generate ample amounts of training data and b) we can make sure that the training data is similar in characteristics with the real-world network we want to eventually predict on. This helps to improve the prediction accuracy of our non-deterministic approach over the deterministic one when the fraction of observed measurements decreases. The neural network-based model achieves efficient prediction performance with graceful degradation over ultra-sparsely sampled networks. While Low-rank Matrix Completion predicts missing network distances within 15% of error when 20% of distances are missing, neural network-based model gives an estimation error of within 20% when 90% of entries are missing.

In summary, the lossless graph representation can aid faster processing, compact storage, and novel solutions for networking algorithms. Conversely, the ability to predict missing connectivity information from a relatively small set of distance measurements enables inference of hidden connectivity information and provides a foundation for developing novel network mining algorithms.

The limitations identified above give rise to the need for techniques to correctly predict missing connectivity data as well as to represent and store networks in coherent yet compressed ways that preserve the original network characteristics for unbiased network experiments. As certain network features are derived from the complete network measurements, missing data or local samples affect the accuracy of such estimates. Efficient prediction and graph representation techniques could facilitate network analytics from smaller sets of measurements with improved accuracy, low storage, ease of access as well as manageable computational costs.

Our goal is to make the most use of the available network measurements and extract information about network characteristics, connectivity, and constraints to make informed predictions about the missing measurements. Accordingly, this research is aimed at deriving and demonstrating novel techniques to efficiently measure and represent networks, and for accurate recovery of network topology from partially measured network structures. The techniques need to be scalable in their computation, and graceful in their degradation with limited measurements.

We tackle the network analysis problem from two sides: a) efficiently representing complete graph data in compact formats and b) predicting missing network measurements using partially observed distances. Specifically, we introduce the novel concepts of reconstruction set and link dimension of graphs for lossless compression and reconstruction. This helps to represent graphs with a small set of path measurements rather than the complete adjacency or distance matrices. We also develop models to learn the network topology via path measurements and explore distance vector properties of the network. This network information is then leveraged to make informed predictions about network connectivity. This helps to retain original network characteristics even while completing graphs from partial data, in terms of links and distances, using deterministic methods such as low-rank matrix completion as well as non-deterministic prediction methods such as neural networks.

We present link dimension, a novel graph dimension based on a subset of nodes for defining distance vectors that completely capture the graph topology. Several definitions for dimensions of a graph exist, such as metric dimension and edge dimension, to identify the least integer ‘n’ such that there exists a representation of the graph in n dimensions. Link dimension is however the only dimension that captures the complete topology of the graph and allows for its exact reconstruction. We also propose a greedy algorithm to find such defining nodes in the network by identifying the nodes that capture most of the information about nodes as well as the links. Several interesting properties of link dimension are also derived along with the bounds for link dimension for several graph families.

Ability to recover or estimate the network topology from a small fraction of pairwise distances among nodes or even a few random distance measurements is essential when the network measurements are expensive or infeasible. We use low-rank matrix completion to recover topology of spatial networks from a very small fraction of distance measurements. We present results for networks of different shapes, with a range of 20% to 80% of observed measurements. This technique is especially useful in sensor networks which are constrained with respect to their storage and computational capabilities, each node has access to only partial information about the network. It is helpful to know the connectivity, boundaries, and the overall topology of the sensor networks especially in applications such as routing, segmentation, and anchor selection.

Not all networks are embedded in 2D or 3D physical spaces. Graphs such as friendship networks, product graphs, and software module connectivity are non-spatial. Such graphs can be measured in terms of inter-node distances, i.e., connected nodes are said to be at one hop distance. We leverage triangle inequality as applied to distances on a graph to compute bounds on the missing distance entries and perform bounded matrix completion on directed as well as undirected social networks. The proposed prediction techniques are evaluated for real-world social networks (such as Facebook, Email, Collaboration networks, Twitter). The low-rank matrix completion based prediction technique is evaluated for 20% to 80% missing distances in a given network. Results for low-rank matrix completion show that even at 40% of missing measurements, the network distances can be accurately predicted while preserving the original network characteristics.

Many real-world networks are known to exhibit scale-free phenomenon, where the degree distribution follows the power law, at least asymptotically. In order to learn this complex relationship between node distances, we turn to neural networks as a foundation of a new prediction model. However, neural networks need complete data to train on which is not available when only a fraction of distances is measured in a network. We use the concepts of domain adaptation to employ a novel technique of using synthetic networks to pre-train the autoencoder. We make use of the scale free phenomenon observed in real-world networks and generate artificial networks that also embody this power law in their degree distribution. Training on these preferentially attached synthetic networks helps in learning about the scale-free networks, even if the training data was not derived from the real-world networks. This helps to make accurate predictions on real-world networks when only ultra-sparse measurements are observed. This aids in two ways: a) we can generate ample amounts of training data and b) we can make sure that the training data is similar in characteristics with the real-world network we want to eventually predict on. This helps to improve the prediction accuracy of our non-deterministic approach over the deterministic one when the fraction of observed measurements decreases. The neural network-based model achieves efficient prediction performance with graceful degradation over ultra-sparsely sampled networks. While Low-rank Matrix Completion predicts missing network distances within 15% of error when 20% of distances are missing, neural network-based model gives an estimation error of within 20% when 90% of entries are missing.

In summary, the lossless graph representation can aid faster processing, compact storage, and novel solutions for networking algorithms. Conversely, the ability to predict missing connectivity information from a relatively small set of distance measurements enables inference of hidden connectivity information and provides a foundation for developing novel network mining algorithms.

Adviser: Dr. Anura Jayasumana

Co-Adviser: n/a

Non-ECE Member: Michael Kirby, Mathematics

Member 3: Randy Paffenroth

Addional Members: Anthony Maciejewski

Co-Adviser: n/a

Non-ECE Member: Michael Kirby, Mathematics

Member 3: Randy Paffenroth

Addional Members: Anthony Maciejewski

Publications:

• G. Mahindre, R. Karkare, R. Paffenroth, & A. Jayasumana, "Inference in Social Networks from Ultra-Sparse Distance Measurements via Pretrained Hadamard Autoencoders," 2020 IEEE 45th Conference on Local Computer Networks (LCN), Australia, 2020.

• G. S. Mahindre & A. P. Jayasumana, "Link Dimension for Complete and Unique Reconstruction of Graphs?", Journal of Discrete Applied Mathematics, submitted July 2020, revised June 2021 (in review).

• A. P. Jayasumana, R. Paffenroth, G. S. Mahindre, S. Ramasamy, & K. Gajamannage, "Network topology mapping from partial virtual coordinates and graph geodesics," IEEE/ACM Transactions on Networking, 2019.

• G. S. Mahindre, A. P. Jayasumana, K. Gajamannage & R. Paffenroth, "On Sampling and Recovery of Topology of Directed Social Networks: A Low-Rank Matrix Completion Based Approach," IEEE 44th Conference on Local Computer Networks (LCN), Germany, 2019.

• G. S. Mahindre, & A. P. Jayasumana. "Efficient Representation, Measurement, and Recovery of Large-scale Networks." Ninth International Green and Sustainable Computing Conference (IGSC). IEEE, 2018.

• G. S. Mahindre & A. P. Jayasumana, "Post failure recovery of virtual coordinates in wireless sensor networks," 7th International Conference on Information and Automation for Sustainability, Colombo, 2014.

• R. Karkare, R. Paffenroth, & G. Mahindre, "Blind Image Denoising and Inpainting Using Robust Hadamard Autoencoders." arXiv preprint arXiv:2101.10876 (2021).

• G. S. Mahindre, R. Karkare, R. Paffenroth, & A. P. Jayasumana, "Optimal Pre-training of Autoencoders to Predict Distances in Social Networks. " | to be submitted, 2021

• G. Mahindre, R. Karkare, R. Paffenroth, & A. Jayasumana, "Inference in Social Networks from Ultra-Sparse Distance Measurements via Pretrained Hadamard Autoencoders," 2020 IEEE 45th Conference on Local Computer Networks (LCN), Australia, 2020.

• G. S. Mahindre & A. P. Jayasumana, "Link Dimension for Complete and Unique Reconstruction of Graphs?", Journal of Discrete Applied Mathematics, submitted July 2020, revised June 2021 (in review).

• A. P. Jayasumana, R. Paffenroth, G. S. Mahindre, S. Ramasamy, & K. Gajamannage, "Network topology mapping from partial virtual coordinates and graph geodesics," IEEE/ACM Transactions on Networking, 2019.

• G. S. Mahindre, A. P. Jayasumana, K. Gajamannage & R. Paffenroth, "On Sampling and Recovery of Topology of Directed Social Networks: A Low-Rank Matrix Completion Based Approach," IEEE 44th Conference on Local Computer Networks (LCN), Germany, 2019.

• G. S. Mahindre, & A. P. Jayasumana. "Efficient Representation, Measurement, and Recovery of Large-scale Networks." Ninth International Green and Sustainable Computing Conference (IGSC). IEEE, 2018.

• G. S. Mahindre & A. P. Jayasumana, "Post failure recovery of virtual coordinates in wireless sensor networks," 7th International Conference on Information and Automation for Sustainability, Colombo, 2014.

• R. Karkare, R. Paffenroth, & G. Mahindre, "Blind Image Denoising and Inpainting Using Robust Hadamard Autoencoders." arXiv preprint arXiv:2101.10876 (2021).

• G. S. Mahindre, R. Karkare, R. Paffenroth, & A. P. Jayasumana, "Optimal Pre-training of Autoencoders to Predict Distances in Social Networks. " | to be submitted, 2021

Program of Study:

ECE516

ECE421

ECE658

ECE531

STAT511

MATH532

ECE666

ECE656

ECE516

ECE421

ECE658

ECE531

STAT511

MATH532

ECE666

ECE656