Structures et Algorithmes pour la coopération pair-à-pair

Peer-to-peer overlay networks are distributed systems without any centralized control. Peers form a self-organizing overlay network that runs on top of the Internet. These overlays offer various features such as robust routing, efficient search of data items, selection of nearby peers, redundant storage, anonymity, scalability and fault tolerance. A peer-to-peer network does not have the notion of clients or servers but only equal peer nodes that simultaneously function as both “clients” and “servers” to the other nodes of the network. New challenges such as topology maintenance arise, as nodes can join or leave at any time, and efficient content search, as no node has a complete knowledge of the entire topology. Peer-to-peer overlay networks do not arise from the collaboration between established and connected groups of systems, and without a reliable set of resources to share. The basic units are commodity PCs instead of well-provisioned facilities. The peers have autonomy, i.e., they are able to decide about services they wish to offer to other peers. Peers are assumed to have temporary network addresses, and have to be recognized and reachable even if their network address has changed. That is why issues of scale and redundancy become much more important than in traditional (centralized or distributed) systems. The peers cannot necessarily trust each other because of their autonomy. Indeed attacks in peer-to-peer networks are easy to set up. It is possible to join the network with multiple identities, the so called sybils, and affect the routing of messages, and the publication and lookup of content. This way content can be made invisible, eclipsed, for benign participants. Moreover content can be polluted by bogus publications. Peer-to-peer networks have three main principles, which specify a fully-distributed, cooperative network design with peers building a self-organizing system:
• The principle of sharing resources: all peer-to-peer systems involve an aspect of resource sharing, where resources can be physical resources, such as disk space or network bandwidth, as well as logical resources, such as services or different forms of knowledge. By sharing of resources, applications can be realized that could not be set up by a single node. They scale with the number of participants.
• The principle of decentralization: this is an immediate consequence of sharing of resources. Parts of the system or even the whole system are no longer operated centrally. Decentralization is in particular interesting in order to avoid single points of failure or performance bottlenecks in the system.
• The principle of self-organization: when a peer-to-peer system becomes fully decentralized there exists no longer a node that can centrally coordinate its activities, or a central database to store global information about the system. Nodes have to self-organize themselves, based on whatever local information is available and by interacting with locally reachable nodes (neighbors). The global behavior then emerges as the result of all the local behaviors

The Overlay Nodes Management layer covers the management of peers, which include the operations to join and leave the network, the discovery of peers, and routing algorithms. The neighbors of a peer are chosen depending on the look-up algorithm used. So we can say that the look-up algorithm specifies or defines a peer topeer network.

The peer-to-peer Services layer supports the underlying peer-to-peer infrastructure through scheduling of tasks, content and file management. Meta-data describe the content stored across the peers and the location information. The peer-to-peer Application layer is concerned with tools and applications that are implemented with specific functionalities on top of the underlying peer-to-peer overlay infrastructure. File sharing was the first and oldest application for peer-to-peer networks. Today it has been complemented by a wide variety of other applications using a peer-to-peer infrastructure, such as telephony (e.g., Skype), video streaming (e.g., Zattoo), distributed backup (e.g., Wuala), and games. The peer-to-peer overlay network consists of all the participating peers as network nodes. Every peer has a list of other peers he knows. They are used to forward any kind of messages, for routing, publishing, or searching purposes. There are links between any two peers that know each other: i.e., if a participating peer knows the location of another peer in the network, then there is a directed edge from the former node to the latter in the overlay network. Based on how the nodes in the overlay network are linked to each other, we can classify them as unstructured or structured. Unstructured Peer-to-Peer Networks An unstructured peer-to-peer network is formed when the overlay links are established arbitrarily. A new peer that wants to join the network can copy existing links of another node and then form its own links over time. The network uses flooding as the mechanism to send queries across the overlay, with a limited scope. When a peer receives the flood query, it sends back a list of all the content matching the query to the originating peer. While flooding-based techniques are effective for locating highly replicated items and are resilient to peers joining and leaving the system, they are poorly suited for locating rare items. Clearly this approach is not scalable as the load on each peer grows linearly with the total number of queries and the system size. Thus, unstructured peer-to-peer networks face one basic problem: peers become overloaded, therefore, the system does not scale when handling a high rate of aggregate queries and sudden increase in system size. Structured Peer to-Peer Networks In a structured peer-to-peer system, the neighbor relationship between peers and data locations is strictly defined. Among the structured systems, some implement a distributed hash table (DHT) using different data structures. The structure of a DHT can be decomposed into several main components. The foundation is an abstract key space, which typically consists of large inter values, e.g. the range from 0 to 2¹²⁸ − 1. A random unique key (identifier) from this keyspace is assigned to each participant. Each node stores a partial view of the whole distributed system, i.e., it maintains a small routing table consisting of its neighboring peers. Together these links form the overlay network. Based on this information, the routing procedure traverses several nodes, approaching the destination with each hop until the destination node is reached. This style of routing is sometimes called key based routing.

A data item is mapped to the same keyspace as the participating peers by hashing either the filename or the binary file itself using a consistent hash function. Thus, DHTs implement a proactive strategy to retrieve data by structuring the search space in a deterministic way. The way content is assigned to one or more participant(s) differs among the proposed DHTs. This mapping scheme of the data items to the peers defines the routing strategy on how to find the node who stores the data item. The routing is used in the same way for both basic functions: publish and lookup. In most implementations, this will not be a single node, but every data item is replicated several times on different nodes who are close to the data item in the keyspace.

In theory, DHT-based systems can guarantee that any data object can be located on average in O(log N) number of overlay hops, where N is the number of peers in the system. The underlying network path between two peers can be significantly different from the path on the DHT-based overlay network. Therefore, the lookup latency in DHT-based peer-to-peer overlay networks can be quite high and could adversely affect the performance of the applications running on top.

Table des matières

1 Introduction
1.1 Peer-to-peer networks
1.2 Networked virtual environments
1.3 Organisation of the Thesis
I An Augmented Delaunay Overlay for Decentralized Virtual Worlds
2 Introduction
2.1 Networked Virtual Environments
2.2 Contributions
3 Maintaining a Delaunay-based Overlay
3.1 Model and Definitions
3.2 Peer Insertion
3.3 Peer Deletion
4 Underlay Shortcuts
4.1 Principles
4.2 Related Work
4.3 Algorithm
4.4 Simulation Results and Evaluation
4.5 Conclusion
5 Dynamic and Distributed Clustering
5.1 Principles
5.2 Related Work
5.3 Algorithm
5.4 Simulation Results and Evaluation
5.5 Conclusion
6 Conclusion of Part I
II Measurements of real world peer-to-peer networks
7 Introduction
7.1 Contributions
8 Background and Methodology
8.1 Routing in Kademlia
8.2 Two-Level Publishing in Kademlia
8.3 Crawling Peers with Blizzard
8.4 Spying for Content with Mistral
8.5 Azureus
8.6 Vivaldi Network Coordinates
9 Peer behavior in KAD
9.1 Global View of KAD
9.2 Peer View
9.3 Related Work
9.4 Design Implications
9.5 Conclusion
10 Content in KAD
10.1 The Sybil Attack
10.2 The Content Pollution Attack
10.3 Spy Results
10.4 Reducing the Publish Actions
10.5 Related Work
10.6 Conclusion
Conclusion