Graduate Exam Abstract

Chepchumba Limo

M.S. Final
May 06, 2015, 9:00 am - 11:00 am
LSC 304
Decentralized and Dynamic Community Formation in P2P Networks and Performance of Community Based Caching

Abstract: Distributed Hash Tables (DHT) are commonly used in large Peer-to-Peer networks to increase the efficiently of resolving queries. Minimizing the resource discovery time inP2P networks is highly desirable to improve system-wide performance. Distributed caching is an approach for reducing the look-up time. File sharing P2P networks have shown that there exists nodes/users who share similar interests based on semantics, geography, etc., and a group of nodes that share similar interests are said to form a community. A Community Based Caching (CBC) algorithm where nodes make caching decisions based on personal interests is investigated. A major contribution of CBC is that it alleviates the issue of nodes being limited to caching resources that are most popular in the entire network. Instead, caching decisions are primarily based on a node's community affiliations and interests. Community discovery algorithms that currently exists either need a centralized source to aid in community discovery or require additional messaging and complicated computations to determine whether to join a group or not. In many cases, nodes are also limited to being members of only one community at a time. A dynamic and decentralized community discovery algorithm, Dynamic Group Discovery (DGD), is proposed. DGD also allows nodes to be members of multiple communities at the same time. DGD's behavior and performance is then evaluated in conjunction with the Community Based Caching algorithm. To aid in group discovery during run time (i.e., dynamically), DGD uses special keys with embedded group identification information. Oversim, a flexible overly network simulation framework is used to evaluate the proposed DGD algorithm. Performance of DGD is compared to Chord and Static Group Allocation (SGA), in which group identification is done only once. Performance is evaluated for different network sizes, community sizes, and asymmetry among communities. Performance results are presented and analyzed when queries are resolved using cache data versus when queries are resolved using non-cache data. The analysis shows that DGD generally improves lookup performance when cache data is used to resolved queries. However, when non-cached data is used, DGD occasionally performs slightly worse than Chord and SGA. For example, in a network with 10,000 nodes, asymmetrical communities and no churn, DGD outperforms Chord by approximately half a hop and 0.1 seconds in latency. When churn was introduced to the same network, DGD performance drops by approximately one hop and 0.15 seconds in latency. The results also show that approximately 90% of the queries are resolved using non-cached data and therefore, even though DGD is guaranteed reduced lookup time when asymmetrical communities are present and cache records are to used to resolve queries, it is often not enough to significantly improve overall system performance. The results however confirm that caching resources based on personal interests really does reduced lookup performance when resolving queries using cache records.
which loosely translates to there being few popular queries and many less popular queries. Therefore,
already existing caching schemes would only favor a subset of nodes in the network.
File sharing P2P networks have shown that there exists nodes/users who share similar interests based
on semantics, geography etc., and a group of nodes that share similar interests are said to form a
community. A Community Based Caching (CBC) algorithm where nodes make caching decisions based
on personal interests is investigated. One of CBCâ€™s major contributions is that it alleviates the issue of
nodeâ€™s being limited to caching resources that are popular relative to the entire network. Instead,
caching decisions are primarily based on a nodeâ€™s community affiliations and interests.
Community discovery algorithms that currently exists either need a centralized source(s) to aid in
community discovery or require additional messaging and complicated computations to determine
whether to join a group or not. In many cases, nodes are also limited to being members of only one
community at a time. A dynamic and decentralized community discovery algorithm, Dynamic Group
Discovery (DGD), is proposed. In addition, DGD also allows nodes to be members of multiple
communities at the same time. DGDâ€™s behavior is then evaluated in conjunction with the Community
Based Caching algorithm.
To aid in group discovery during runtime (i.e., dynamically), DGD uses special keys with embedded
group identification information. Oversim, a flexible overly network simulation framework is used to
evaluate the proposed DGD algorithm. DGDâ€™s performance is compared to Chord and Static Group
Allocation (SGA â€“ i.e., where groups identification is done only once and does not change) networks. For
different cases, the size of the network, size of the communities and symmetry of the communities is
varied. Results are presented and analyzed for performance when queries are resolved using cache
data versus when queries are resolved using non-cache data. The analysis shows that using DGD
generally improves lookup performance when cache data is used to resolved queries. However, when
non-cache data is used, DGD occasionally performs slightly worse than Chord and SGA. For example, in
a network with 10,000 nodes, asymmetrical communities and no churn group churn, DGD outperforms
Chord by approximately half a hop and 0.1 seconds in latency. When churn was introduced to the same
network, DGDâ€™s performance drops by approximately one hop and 0.15 seconds in latency. The results
also show that approximately 90% of the queries are resolved using non-cache data and therefore, even
though DGD is guaranteed reduced lookup time when asymmetrical communities are present and cache
records are to used to resolve queries, it is often not enough to significantly improve overall system
performance. The results however confirm that caching resources based on personal interests really
does reduced lookup performance when resolving queries using cache records.

Adviser: Dr. Anura Jayasumana
Co-Adviser: N/A
Non-ECE Member: christos@cs.colostate.edu
Member 3: lqyang@engr.colostate.edu
Addional Members: N/A

Publications:
N/A

Program of Study:
CS 457
CS 557
ECE 421
ECE 501
ECE 514
ECE 658
GRAD 544C
ECE 699

Colorado State University

Electrical and Computer Engineering

Walter Scott, Jr. College of Engineering