Graduate Exam Abstract

Herath Bandara

Ph.D. Final
October 17, 2012, 1:00 pm
Lory Student Center 211E
Enhancing Collaborative Peer-To-Peer Systems Using Resource Aggregation and Caching: A Multi-Attribute Resource and Query Aware Approach

Abstract: Resource-rich computing devices, decreasing communication cost, and Web 2.0 technologies are fundamentally changing the way distributed applications communicate and collaborate. With these changes, we envision Peer-to-Peer (P2P) systems that will allow for the integration and collaboration of peers with diverse capabilities to a virtual community thereby empowering it to engage in greater tasks beyond what can be accomplished by individual peers, yet are beneficial to all the peers. Collaborations involving application-specific resources and dynamic quality of service goals will stress current P2P architectures that are designed for best-effort environments with pair-wise interactions among nodes with similar resources. These systems will share a variety of resources such as processor cycles, storage capacity, network bandwidth, sensors/actuators, services, middleware, scientific algorithms, and data. However, very little is known about the specific characteristics of real-world resources and queries as well as their impact on resource aggregation in these collaborative P2P systems. We bridge this gap by first characterizing the real-world resources and queries, and then by developing resource aggregation solutions that are more suitable for collaborative P2P systems. The contributions of this research are: (1) a detailed analysis of real-world resource, query, and user characteristics and their impact on P2P-based resource discovery solutions, (2) a tool to generate large synthetic traces of multi-attribute resources and range queries, (3) resource and query aware P2P-based multi-attribute resource discovery solution that is both efficient and load balanced, (4) a community-based caching solution that enhances both the communitywide and system-wide lookup performance in large-scale P2P systems, and (5) demonstrate the applicability of NDN (Named Data Networking) for DCAS (Distributed Collaborative Adaptive Sensing) systems by developing a distributed multi-user, multi-application, and multi-sensor data fusion solution based on CASA (Collaborative Adaptive Sensing of the Atmosphere). Proposed solutions and the analysis are applicable to a wide variety of contexts such as DCAS systems, P2P clouds, grid/cloud computing, GENI (Global Environment for Network Innovations), and mobile social networks. Next, each of the five contributions is described briefly.
First, we derived an equation to capture the cost of multi-attribute resource advertising and querying. The nature of parameters in the equation under different systems is determined by analyzing datasets from PlanetLab, SETI@home, EGI grid, and a distributed campus computing facility. These datasets exhibit several noteworthy features that affect the performance. The attributes of both the resources and queries are highly skewed and correlated. Attribute values have different marginal distributions and change at different rates. Queries are less specific where they tend to specify only a small subset of the available attributes and large ranges of attribute values. These properties of resources and queries are then used to qualitatively and quantitatively evaluate the fundamental design choices for P2P-based multi-attribute resource discovery. Design choices are evaluated based on the cost of advertising/querying, load balancing, and routing table size. Compared to uniform queries, real-world queries are relatively easier to resolve using unstructured, superpeer, and single-attribute-dominated-query-based structured P2P solutions. However, they introduce significant load balancing issues to existing designs. Cost of RD in structured P2P systems is effectively O(N) (N is the number of nodes) as most range queries are less specific.
Second, a set of mechanisms is presented to generate realistic synthetic traces of multi-attribute resources (with both static and dynamic attributes) and range queries using the statistical behavior learned from real-world datasets. Such traces are useful in large-scale performance studies of resource discovery solutions, job schedulers, etc., in collaborative P2P systems as well as grid, cloud, and volunteer computing. Random vectors of static attributes are generated using empirical copulas that capture the entire dependence structure of multivariate distribution of attributes. Time series of dynamic attributes are randomly drawn from a library of multivariate time series segments extracted from the datasets. These segments are identified by detecting the structural changes in time series corresponding to a selected attribute. Time series corresponding to rest of the attributes are split at the same breakpoints and randomly drawn together to preserve their contemporaneous correlation. Correlation among static and dynamic attributes is preserved by grouping the time series segments based on their static attributes. Multi-attribute range queries are generated using a probabilistic finite state machine that preserves the popularity of attributes and correlations among attribute values. A tool is developed to automate the synthetic data generation process. It is independent of the dataset hence data from any other platform may be used as the basis for trace statistics.
Third, a resource and query aware P2P-based multi-attribute resource discovery solution is presented that is both efficient and load balanced. The solution consists of five heuristics that can be executed independently and distributedly. The first heuristic tries to maintain a minimum number of nodes in the overlay while pruning nodes that do not significantly contribute to the range-query resolution. Removing nonproductive nodes reduces the cost (e.g., hops and latency) of advertising resources and resolving queries. The second and third heuristics dynamically balance the key and query load distributions by transferring keys to neighbors as well as by adding new neighbors when existing ones are insufficient. The last two heuristics, namely fragmentation and replication, form cliques of nodes that are placed orthogonal to the overlay ring. Such a node placement dynamically balances the highly skewed key and query loads while reducing the query cost. By applying these heuristics in the presented order, a resource discovery solution that better responds to real-world resource and query characteristics is developed. Efficacy of the solution is demonstrated using a simulation-based analysis under a variety of single and multi-attribute resource and query distributions derived from real workloads.
Fourth, we developed a distributed caching solution that exploits P2P communities to improve the communitywide and system-wide lookup performance. The solution consists of a sub-overlay formation scheme and a Local-Knowledge-based Distributed Caching (LKDC) algorithm. Sub-overlays enable communities to forward queries through their members. While queries are forwarded, LKDC algorithm causes members to identify and cache resources of interests to them, resulting in faster resolution of queries for popular resources within each community. Distributed local caching requires global information (e.g., hop count and popularity of contents) that is difficult and costly to obtain. Moreover, the problem is NP-complete when size of contents/resources varies. However, by relaxing the content size constraint (which is acceptable for improving lookup performance), and by means of an analysis of globally optimal behavior and structural properties of the overlay, we developed the LKDC algorithm that not only relies on purely local information but also provides close-to-optimal caching performance. The caching solution automatically adapts to changing popularity and user interests. It works with any skewed distribution of queries in addition to introducing minimal modifications and overhead to the overlay network. For example, simulations based on Chord overlay show a 40% reduction in average path length using only 20 cache entries per node, and individual communities gained a 17-24% improvement compared to system-wide caching.
Fifth, we present a proof of concept solution that demonstrates the applicability of NDN for multi-user, multi-application, and multi-sensor DCAS systems such as CASA. In this example, a network of weather radars name data based on their geographic location and weather feature (e.g., reflectivity of clouds or wind velocity) independent of the radar(s) that generated them. This enables end users to specify an area of interest for a particular weather feature while being oblivious to the placement of radars and associated computing facilities. Conversely, the DCAS system can use its knowledge about the underlying system to decide the best radar scanning and data processing strategies. Such sensor-independent names also enhance resilience, enable processing data close to the source, and benefit from NDN features such as in-network caching and duplicate query suppression consequently reducing the bandwidth requirements of the DCAS system. We also present two mechanisms to support sensor-specific and event-specific names that are also important in DCAS systems. The solution is implemented as an overlaid NDN enabling the benefits of both the NDN and overlay networks. Simulation-based analysis using reflectivity data from an actual weather event showed 87% reduction in average bandwidth consumption of radars and 95% reduction in query resolution latency.

Adviser: Prof. Anura P. Jayasumana
Co-Adviser: N/A
Non-ECE Member: Dr. Daniel F. Massey, Computer Science
Member 3: Prof. V. Chandrasekar, Electrical and Computer Engineering
Addional Members: Indrajit Ray, Computer Science

Publications:
1. H. M. N. D. Bandara and A. P. Jayasumana, “Community-based caching for enhanced lookup performance in P2P systems,” IEEE Transactions on Parallel and Distributed Systems, 2012, doi: 10.1109/TPDS.2012.270, In press.
2. H. M. N. D. Bandara, A. P. Jayasumana, and M. Zink, “Radar networking in collaborative adaptive sensing of atmosphere: State of the art and research challenges,” In Proc. IEEE GLOBECOM workshop on Radar and Sonar Networks, Dec. 2012, In press.
3. H. M. N. D. Bandara and A. P. Jayasumana, “Collaborative applications over peer-to-peer systems – Challenges and solutions,” Peer-to-Peer Networking and Applications, Springer, 2012, In press.
4. H. M. N. D. Bandara and A. P. Jayasumana, “Resource and query aware, peer-to-peer-based multi-attribute resource discovery,” In Proc. 37th IEEE Conf. on Local Computer Networks (LCN '12), Oct. 2012, In press.
5. P. Lee, A. P. Jayasumana, H. M. N. D. Bandara, S. Lim, and V. Chandrasekar, “A peer-to-peer collaboration framework for multi-sensor data fusion,” Journal of Network and Computer Applications, vol. 35, no. 2, May 2012, pp. 1052-1066.
6. H. M. N. D. Bandara and A. P. Jayasumana, “Evaluation of P2P resource discovery architectures using real-life multi-attribute resource and query characteristics,” In Proc. IEEE Consumer Communications and Networking Conf. (CCNC '12), Jan. 2012.
7. H. M. N. D. Bandara and A. P. Jayasumana, “On characteristics and modeling of P2P resources with correlated static and dynamic attributes,” In Proc. IEEE GLOBECOM '11, Dec. 2011.
8. H. M. N. D. Bandara and A. P. Jayasumana, “Characteristics of multi-attribute resources/queries and implications on P2P resource discovery,” In Proc. 9th ACS/IEEE Int. Conf. On Computer Systems And Applications (AICCSA 2011), Dec. 2011.
9. H. M. N. D. Bandara, A. P. Jayasumana, and T. H. Illangasekare, “A top-down clustering and cluster-tree based routing scheme for wireless sensor networks,” Int. Journal of Distributed Sensor Networks, vol. 2011.
10. H. M. N. D. Bandara and A. P. Jayasumana, ”Exploiting communities for enhancing lookup performance in structured P2P systems,” In Proc. IEEE Int. Conf. on Communications (ICC '11), June 2011.
11. H. M. N. D. Bandara, A. P. Jayasumana, and T. H. Illangasekare, “Cluster tree based self organization of virtual sensor networks,” In Proc. IEEE GLOBECOM workshop on Wireless Mesh and Sensor Networks: Paving the Way to the Future or yet Another...?, Nov. 2008.
12. H. M. N. D. Bandara, A. P. Jayasumana, and I. Ray, “Key pre-distribution based secure backbone design for wireless sensor networks,” In Proc. 3rd IEEE Int. Workshop on Practical Issues in Building Sensor Network Applications (SenseApp), Oct. 2008, pp. 786-793.
13. H. M. N. D. Bandara and A. P. Jayasumana, “An enhanced top-down cluster and cluster tree formation algorithm for wireless sensor networks,” 2nd Int. Conf. on Industrial and Information Systems (ICIIS 2007), Aug. 2007, pp. 565-570.
14. H. M. N. D. Bandara, S. M. R. P. De Silva, and P. W. H. D. Weerasinghe, “The universal biometric system,” 6th Int. Information Technology Conference (IITC 2004), Nov. 2004, ISBN: 955-8974-01-3.
15. H. M. N. D. Bandara and A. P. Jayasumana, "Distributed multi-sensor data fusion over named data networks," In review. 16. P. Lee, A. P. Jayasumana , H. M. N. D. Bandara, S. Doshi, and V. Chandrasekar, "Analysis of multi-sensor, data-fusion latency in Internet-based distributed collaborative adaptive systems," In review.
17. H. M. N. D. Bandara and A. P. Jayasumana, “Multi-attribute resource and query characteristics of real-world systems and implications on peer-to-peer-based resource discovery,” In preparation.
18. H. M. N. D. Bandara and A. P. Jayasumana, “On characteristics and generation of multi-attribute resources and queries with correlated attributes,” In preparation.

Program of Study:
ECE 514
ECE 521
ECE 795
CS557
STAT 521
STAT 560
GRAD 792
BUS 620

Colorado State University

Electrical and Computer Engineering

Graduate Exam Abstract

Herath Bandara