P2PSIP Working Group                                             J. Peng
Internet-Draft                                                     L. Le
Intended status: Standards Track                            China Mobile
Expires: May 2, 2012                                             K. Feng
                                         Beijing University of Posts and
                                                      Telecommunications
                                                        October 30, 2011


              One Hop Lookups Algorithm Plugin for RELOAD
                  draft-peng-p2psip-one-hop-plugin-01

Abstract

   This document proposes an implementation of overlay plugin algorithm
   which is called one hop lookups to provide examples and references
   for the research of one hop based RELOAD.  With the development of
   the real time communications, there are high demands for the
   improvement of routing efficiency.  In the one hop algorithm, each
   peer maintains complete membership information which can guarantee
   one hop lookups to improve the routing efficiency.  For the RELOAD,
   using the one hop lookups algorithm to construct Topology Plugin with
   the same RELOAD core can have a better support for VoIP applications,
   and the implementation of one hop lookups algorithm plugin is based
   on the methods provided by RELAOD.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 2, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.




Peng, et al.               Expires May 2, 2012                  [Page 1]


Internet-Draft           One Hop Lookups Plugin             October 2011


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Hash functions . . . . . . . . . . . . . . . . . . . . . . . .  4
   4.  Peer data structure  . . . . . . . . . . . . . . . . . . . . .  4
   5.  Routing  . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
   6.  Joining  . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
   7.  Updates  . . . . . . . . . . . . . . . . . . . . . . . . . . .  8
     7.1.  Update messages definition . . . . . . . . . . . . . . . .  8
       7.1.1.  Update types . . . . . . . . . . . . . . . . . . . . .  9
       7.1.2.  Peer types . . . . . . . . . . . . . . . . . . . . . .  9
       7.1.3.  Event notification types . . . . . . . . . . . . . . .  9
       7.1.4.  PeerContactItem struct . . . . . . . . . . . . . . . .  9
       7.1.5.  EventNotification struct . . . . . . . . . . . . . . . 10
       7.1.6.  DataStructureContent struct  . . . . . . . . . . . . . 13
       7.1.7.  OneHopUpdate message . . . . . . . . . . . . . . . . . 14
     7.2.  Different using strategies . . . . . . . . . . . . . . . . 15
       7.2.1.  One hop lookups  . . . . . . . . . . . . . . . . . . . 15
       7.2.2.  D1HT . . . . . . . . . . . . . . . . . . . . . . . . . 15
       7.2.3.  1h-Calot . . . . . . . . . . . . . . . . . . . . . . . 15
       7.2.4.  Two hop lookups  . . . . . . . . . . . . . . . . . . . 16
   8.  Leaving  . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
   9.  Replication  . . . . . . . . . . . . . . . . . . . . . . . . . 17
   10. Fault tolerance  . . . . . . . . . . . . . . . . . . . . . . . 17
     10.1. First hop fail . . . . . . . . . . . . . . . . . . . . . . 17
     10.2. Leaders fail . . . . . . . . . . . . . . . . . . . . . . . 18
   11. Security Considerations  . . . . . . . . . . . . . . . . . . . 18
   12. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 18
   13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18
   14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
     14.1. Normative References . . . . . . . . . . . . . . . . . . . 18
     14.2. Informative References . . . . . . . . . . . . . . . . . . 19
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19






Peng, et al.               Expires May 2, 2012                  [Page 2]


Internet-Draft           One Hop Lookups Plugin             October 2011


1.  Introduction

   RELOAD [I-D.ietf-p2psip-base] is a peer-to-peer (P2P) signaling
   protocol for use on the Internet.  The Architecture of RELOAD has a
   routing layer which is called the Topology Plugin.  It is responsible
   for implementing the specific overlay algorithm being used.  In the
   RELOAD base protocol, the Topology Plugin is described by Chord, and
   we define another DHT algorithm in order to provide a high rouging
   efficiency which can be used in real time communications for a better
   performance.

   The real time communication has a high demand on the routing
   efficiency, for example, the VoIP usage on the application level of
   RELOAD, it is no doubt that the routing speed or efficiency must be
   very important, and it depends on the looking up efficiency of the
   Overlay Topology.  There are many kinds of structured peer-to-peer
   overlay network algorithms like Chord, Pastry, CAN and so on.  Most
   of them have one in common is that they have a routing table which
   can maintain a small amount of routing state.  But all these
   algorithms need nearly O (log N) steps to find one peer or resource,
   if there are a large number of peers in the overlay, it costs a lot
   time to locate peer or resource which may have a bad influence on the
   application level of RELOAD.  The one hop lookups algorithm gives one
   way to maintain complete membership information at each node in order
   to increase the routing efficiency.

   Being a kind of Topology Plugin, O (1) protocols achieve low latency
   lookups on small or low-churn networks because lookups take only a
   few hups, but incur high maintenance traffic on large or high-churn
   networks.  On contrast, O (log N) protocols incur less maintenance
   traffic on large or high-churn networks but require more lookup hops
   in small networks.  The experiment results show that the routing
   table maintenance bandwidth using is still acceptable when the number
   of overlay nodes equal 10^5 and 10^6 [SingleHopDHT].  So, weighing
   the advantages and disadvantages, in a relative small and stable
   network like a low-churn telecom core net with VoIP applications, the
   O (1) algorithms are very important as a Topology Plugin of RELOAD.

   In the one hop algorithm, each peer maintains a complete description
   of system memberships, and it presents techniques for the peers to
   maintain this information accurately with lower costs of
   communications.  Different one hop algorithm implementations have
   different ways to keep the accuracy of routing states.  For example,
   the one hop lookups for peer-to-peer overlay [One-Hop-Lookups] uses
   event notification mechanism without broadcast to update all the
   peers' routing table during several seconds; the SandStone
   [SandStone] gives a way to collect update information by SPM (Super
   Peer Maintenance) which is called SPM assisted one hop enhancement.



Peng, et al.               Expires May 2, 2012                  [Page 3]


Internet-Draft           One Hop Lookups Plugin             October 2011


   The SandStone overlay is typically deployed as a two-layered DHT,
   including a global DHT and several regional DHT.  There is at least
   one SPM node in each region, and once one peer's state change is
   detected by its neighboring peer, the neighbor will notify its local
   SPM, and then the SPM will disseminate the event to other SPMs,
   finally each SPM will broadcast this notification to all peers under
   the charge of it.  The D1HT [D1HT] algorithm has an EDRA (Event
   Detection and Reporting Algorithm) to implemnt the event notification
   schema.  In the D1HT, all the peers are ordinary, and all of them use
   the EDRA to make the goal of routing table maintenance.  Like the
   D1HT, 1h-Calot [1h-Calot] is also a kind of one hop algorithms in a
   purely peer-to-peer environment, but it informs the event
   notifications by constructing a multicasting tree.  Besides, there is
   also an algorithm called two hop lookups [TwoHopLookups] which is an
   improvement of the one hop lookups to support a larger size overlay.
   We use the one hop lookups algorithm combined with two hop lookups,
   D1HT and 1h-Calot to construct the RELOAD Topology Plugin because
   there is no SPM like peers and two layered topology in RELOAD base.

   In this draft, we first give a simple overview of one hop lookups
   algorithm with some definitions, then fill the overlay specific data
   structures into the RELOAD frame and implements this algorithm with
   topology messages defined in the RELOAD Topology Plugin; at last, the
   replication and fault tolerance strategies are stated.  All of the
   definitions and implementations based on the requirements and methods
   provided by RELOAD.


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


3.  Hash functions

   In this one hop lookups algorithm topology plugin, the size of the
   node ID and resource ID is 128 bits.  The hash of an ID can be
   computed by SHA-1 [RFC3174] and other hash functions.


4.  Peer data structure

   A peer, or a node, is responsible for a particular Resource-ID which
   is less than or equal to its Node-ID and is greater than the Node-ID
   of the previous peer in the neighborhood.  In order to have a high
   routing accuracy and efficiency, each peer must keep and maintain a



Peng, et al.               Expires May 2, 2012                  [Page 4]


Internet-Draft           One Hop Lookups Plugin             October 2011


   series of data structure.

   In the one hop lookups algorithm, we divide the 128-bit circular
   identifier space into k equal contiguous intervals which are called
   slices.  Each slice has a slice leader, which is chosen as the node
   that is the successor of the mid-point of the slice space (other
   choosing methods are list in Section X).  Similarly, each slice also
   can be divided into u equal-sized intervals which are called units;
   each unit has a unit leader, which is chosen as the successor of the
   mid-point of the unit space.  And of course every peer in the overlay
   should know its unit header and slice header.  The slice leader and
   unit leader are both ordinary peers who are chosen dynamically, and
   they also can be configured at first.  The leader choosing strategies
   will be stated in section 10.

   In the SandStone [SandStone], there is at least one SPM node in each
   region, so when one peer detects the state change of overlay in its
   neighborhood, it will notify its local SPM as soon as possible.  The
   local SPM then disseminate this notification to other SPMs, at last,
   each SPM will broadcast this update to all peers under the charge of
   it.  The SPM is a special peer which can collect update information
   and do some other management tasks.  The SPM does not participate in
   the routing procedures and it is pre-configured.  On the other hand,
   the SPM uses the broadcast to spread the notifications.  Only the
   bootstrap peer in RELOAD has the broadcast function.  Using the
   broadcast is a simple and fast way to inform information, but it
   costs a lot of bandwidths and also may cause the notifications
   redundancy.  Using the event based notification mechanism defined in
   one hop lookups can avoid redundancy in the communications: one peer
   will get information only from its neighbor that is one step closer
   to its unit leader.  This implies that within a unit, information is
   always flowing from the unit leader to the ends of the unit.  But its
   disadvantage is obvious: the notifications are spread too slowly
   which may cause the delay of overlay stabilization.

   In the D1HT, all the peers are same; they use the EDRA mechanism to
   make sure the event can be disseminated to all peers.  To disseminate
   the information about the events, each peer p sends up to m
   propagation messages at each time interval.  The D1HT algorithm uses
   the EDRA to spread event notifications.  The details of EDRA can be
   found in [D1HT].

   In the 1h-Calot, a purely peer-to-peer architecture is preferable, in
   which nodes assume equal roles and share load evently.  It uses
   overlay multicast to efficiently replicate complete O (n) routing
   tables on every node.  And the peer in 1h-Calot also informs other
   peers of its arrival or departure by multicasting a notification
   through a tree.  The details of multicasting tree construction can be



Peng, et al.               Expires May 2, 2012                  [Page 5]


Internet-Draft           One Hop Lookups Plugin             October 2011


   found in [1h-Calot].

   In the two hop algorithm, the architecture is same as one hop
   algorithm with slice leaders and unit leaders.  In addition, every
   slice leader chooses a group of its own nodes for each other slice;
   the group may be chosen randomly.  Then the slice leader sends
   routing information about one group to exactly one other slice
   leader.  The infornation about the group is then disseminated to all
   members of that slice as in the one hop event notification schema.
   After this, every node not only keeps full routing information about
   nodes in its own slice, but also keeps a table of k-1 (k is the
   number of slice) nodes that are close to it, one from every other
   slice.

   From above, we can conclude the information peer should maintain as
   follows.
   Routing table:
       The routing table of each node keeps all of nodes' address, and
       then the source node can find the destination by just one hop at
       most if the items of routing table are right.  So it is very
       important to keep the correctness of each node's full routing
       table.  In order to achieve the goal, the notification of
       membership change events must reach every node in the overlay
       within a specified amount of time.
   Predecessor and successor information:
       The information especially Node-ID and address about peer's
       predecessor and successor must be kept in order to maintain the
       overlay circle.  A peer will communicate with its predecessor and
       successor with keep-alive message (defined in Update message)
       every several seconds.  Of course, this structure can be added as
       mark-ups or identifiers in the routing table.
   Unit leader information:
       The information about the unit leader who is responsible for the
       unit the peer belongs to.  A peer can communicate with its unit
       leader to receive the Update message.  Of course, this structure
       can be added as mark-ups or identifiers in the routing table.
   Slice leader information:
       The information about the slice leader who is responsible for the
       slice the peer belongs to.  A peer can send event notification
       message (defined in Update message) to its slice leader to inform
       events.  Of course, this structure can be added as mark-ups or
       identifiers in the routing table.

   The four kinds of information are maintained in all of peers, but
   there are still some kinds of other data structures should be kept in
   some special peers like unit leader and slice leader.





Peng, et al.               Expires May 2, 2012                  [Page 6]


Internet-Draft           One Hop Lookups Plugin             October 2011


   Unit boundary identifier:
       This identifier does not keep in all the peers, it is stored on
       the peer who is the boundary of the unit in order to make sure
       the Update message just being transferred within unit.
   Unit boundary peer list:
       The unit leader should know the information about the unit's
       boundary peer in order to know its control area.
   Slice leader list:
       Every slice leader should know the information about all the
       slice leaders in the overlay in order to dispatch the event
       notification messages in time.
   Group member list:
       Every slice leader should know the information about all the
       group members in the slice it is responsible in order to exchange
       group messages with other slice leaders.  This list will be used
       in the two hop lookups algorithm.

   Each peer with these data structures can compose a well running
   overlay topology based on the one hop lookups algorithm.


5.  Routing

   The routing table of the peer has full information of all the nodes
   in the overlay.  If the peer is not responsible for a resource whose
   ID is r, it will query the routing table in order to find the first
   peer whose id is greater than r, which means that the peer should be
   responsible for this resource; and then sends a request message to
   the destination node.  If no such node is found, it finds the largest
   Node-ID in the interval between the peer and Resource-ID r.

   In the two hop algorithm, when a peer wants to query the successor of
   a resource, it sends a lookup request to its chosen peer in the slice
   containing the key of this resource.  The chosen peer then examines
   its own routing table to identify the successor of the key and
   forwards the request to that peer.  So, one peer can find the
   resource at most two steps.


6.  Joining

   The Join message defined in [I-D.ietf-p2psip-base] with a Node-ID has
   contained enough information to construct a joining request.

   The joining process for a joining peer (JP) with Node-ID n is as
   follows (using the one hop lookups algorithm as an example).





Peng, et al.               Expires May 2, 2012                  [Page 7]


Internet-Draft           One Hop Lookups Plugin             October 2011


   (1)   JP MUST connect to its chosen or preconfigured bootstrap node.
   (2)   JP SHOULD send an Attach request to its admitting peer (AP) for
         Node-ID n.  The "send_update" flag should be used to acquire
         the routing table and other information for AP.
   (3)   JP MUST send a Join to AP, the AP sends the response to the
         Join.
   (4)   AP MUST do a series of Store requests to JP to store the data
         that JP will be responsible for.
   (5)   JP SHOULD send Attach request to its predecessor in order to
         let it knows the arriving of new peer; the predecessor should
         update its successor information at once.
   (6)   The predecessor detects a change in its routing table for
         example it has a new successor, it MUST send an event
         notification message which is a kind of Update messages to its
         slice leader directly.  Of course, the Update message can also
         be sent by the AD.
   (7)   The slice leader MUST collect all notifications it receives
         from the peers in his slice and sends Update messages to other
         slice leaders every several seconds to inform these events.
   (8)   Other slice leaders MUST dispatch the Update messages they
         received to all the unit leaders in their respective slices.
   (9)   A unit leader SHOULD piggyback this information on its Update
         message to its successor and predecessor.
   (10)  Other nodes propagate this information from their successors;
         they SHOULD send it to their successors and vice versa.
   (11)  Peers at unit boundaries SHOULD NOT send information to their
         neighboring peers outside their unit.

   The Update messages depending on the RELOAD message formats will be
   defined in the next section.  The peer can find its successor by
   sending a Find request with the Resource-ID equals its Node-ID.


7.  Updates

   The update message is the primary overlay-specific maintenance
   message which can be used by the sender to notify other peers the
   current state of overlay.  Every peer in the overlay can maintain and
   keep the correctess of routing table by sending and receiving update
   messages.

7.1.  Update messages definition

   The update message defined in this section will be used by the one
   hop algorithm, D1HT, 1h-Calot and the two hop algorithm.  It is a
   template for one hop like algorithms.





Peng, et al.               Expires May 2, 2012                  [Page 8]


Internet-Draft           One Hop Lookups Plugin             October 2011


7.1.1.  Update types

   enum { eventUpdate (0), dataStructureUpdate (1), (255) }
     UpdateType;


   The UpdateType gives an enumeration of update message which includes
   eventUpdate and dataStructureUpdate.  The eventUpdate message will
   take a list of event notifications which will have an influence on
   peer's routing table.  The dataStructureUpdate message will take a
   list of data structure information wich includes the routing table
   items.  This structure allows for unknown option types.

7.1.2.  Peer types

   enum { ordinaryPeer (0), unitLeaderPeer (1),
     sliceLeaderPeer (2), (255) } PeerType;


   The PeerType gives an enumeration of peer type which can be used by
   different one hop like algorithms.  For example, the concepts of unit
   leader and slice leader are used in heterogeneous DHTs like the one
   hop lookups and two hop lookups.  And in D1HT and 1h-Calot, all the
   peers are ordinary.

7.1.3.  Event notification types

   enum { peerJoin (0), peerLeave (1), groupPeerInformation (3), (255) }
   EventType;


   The EventType gives an enumeration of event kinds, the common events
   include peer joining and peer leaving.  Almost all of the algorithms
   need the two events.  The third event type is groupPeerInformation
   which is used in the two hop algorithm to spread group members'
   information.

7.1.4.  PeerContactItem struct

   struct {
     Node-ID               peer_id
     IpAddressPort         peer_addr_port
   } PeerContactItem;


   The struct of PeerContactItem is used to construct the information
   about peers.  The Node-ID and IpAddressPort are defined in
   [I-D.ietf-p2psip-base] which represents the ID of the peer and peer's



Peng, et al.               Expires May 2, 2012                  [Page 9]


Internet-Draft           One Hop Lookups Plugin             October 2011


   IP address and port.

7.1.5.  EventNotification struct

   struct {
     EventType                                                     event_type;
     PeerType                                                      peer_type;
     uint32                                                                length;

     select (event_type) {

       case peerJoin:
         select (PeerType) {
           case ordinaryPeer: case unitLeaderPeer: case sliceLeaderPeer:
             uint32                                                          slice_number;
             uint32                                                          unit_number;
             PeerContactItem                               joining_peer;
             PeerContactItem                               replaced_peer;
         }

       case peerLeave:
         select (PeerType) {
           case ordinaryPeer: case unitLeaderPeer: case sliceLeaderPeer:
             uint32                                                          slice_number;
             uint32                                                          unit_number;
             PeerContactItem                               leaving_peer;
         }

       case groupPeerInformation:
         select (PeerType) {
           case ordinaryPeer: case unitLeaderPeer: case sliceLeaderPeer:
                   uint32                                                        group_size;
                   uint32                                                        slice_number;
             PeerContactItem <0...n>                       group_peer_list;
         }
     }
   } EventNotificationItem;


   The structure of EventNotificationItem is an item of
   EventNotification message.  The peer who receives this item can
   refresh its routing table and other important data structures by
   analysing the information it takes.

   The EventNotificationItem structure contains parameters as follows.






Peng, et al.               Expires May 2, 2012                 [Page 10]


Internet-Draft           One Hop Lookups Plugin             October 2011


   event_type
       The type of event the message will take, its value is one of the
       types defined in the EventType.  This structure allows for
       unknown event types.
   peer_type
       The type of peer who wants to get this information, its value is
       one of the types defined in the PeerType.  This structure allows
       for unknown peer types.
   slice_number, unit_number
       The number of slice and unit the peer who caused this event
       belongs to.  These two parameters are only used in the one hop
       lookup algorithm and the two hop lookup algorithm.
   joining_peer
       The contact information of the peer's who is going to be one of
       the overlay members.
   replaced_peer
       The contact information of the peer's who is replaced by the new
       arrival one.  This variable is only used in the one hop lookup
       algorithm and the two hop lookup algorithm when the slice leader
       or unit leader changes.
   leaving_peer
       The contact information of the peer's who is going to leave the
       overlay.
   group_size
       This variable means the number of peers in one group.  This
       variable is only used in the two hop lookup algorithm.
   group_peer_list
       The representatives or entry points' contact information for
       other slices.  This list is only used in the two hop lookup
       algorithm.
   length
       This variable means the length of the remainder of this message.

   typedf EventNotificationItem <0...n>                    EventNotificationItemList;

   struct {
     uint32                                                                          uptime;
     PeerContactItem                                               detect_peer;
     uint32                                                                          D1HT_TTL;
     uint32                                                                          multicast_range_start;
     uint32                                                                          multicast_range_end;
     uint32                                                                          event_notification_item_counts;
     EventNotificationItemList                             event_notification_item_list;
   } EventNotification;


   The EventNotificaton struct is the most important message body in
   Updates.  This message can be used in the one hop lookup algorithm,



Peng, et al.               Expires May 2, 2012                 [Page 11]


Internet-Draft           One Hop Lookups Plugin             October 2011


   D1HT, 1h-Calot and the two hop lookups algorithm.  But different
   algorithms will use different variables.  In the D1HT and 1h-Calot,
   all the peers are ordinary, so both of them will not use the
   variables related to unit or slice.

   The EventNotification structure contains parameters as follows.
   uptime
       The time this peer has been up in seconds.
   detect_peer
       This variable means the peer's contact information who detects
       these events.
   D1HT_TTL
       This variable is only used in the D1HT algorithm to be a mark for
       message spreading.  Other algorithms can set it as zero.
   multicast_range_start, multicast_range_end
       These two variables mean the multicast range marked by the root
       of a mulcasting tree.  They are only used in the 1h-Calot.
   event_notification_item_counts
       This variable means the number of events one update message will
       take.  In the 1h-Calot algorithm, every peer will construct the
       multicasting tree to spread notifications immediately when it
       detects the overlay changes.  So, the variable can be set as 1 in
       the 1h-Calot.
   event_notification_item_list
       This variable means a list of event notifications.  In the one
       hop lookup algorithm, D1HT and the two hop lookup algorithm, this
       variable can reduce some network traffic.
























Peng, et al.               Expires May 2, 2012                 [Page 12]


Internet-Draft           One Hop Lookups Plugin             October 2011


7.1.6.  DataStructureContent struct

   struct {
     PeerType                                                        peer_type;
     PeerContactItem                                       predecessor_peer;
     PeerContactItem                                       successor_peer;
     PeerContactItem                                       routing_table_list <0...n>;
     uint32                                                                  length;

     select (peer_type) {
       case ordinaryPeer:
         uint32                                                      unit_number;
         uint32                                                      slice_number;
         PeerContactItem                           unit_leader;
         PeerContactItem                           slice_leader;

       case unitLeaderPeer:
         PeerContactItem                           boundary_peers <2>;

       case sliceLeaderPeer:
         PeerContactItem                           slice_leaders_list <0...n>;
         PeerContactItem                           group_peers_list <0...n>;
     }
   } DataStructureContent


   The structure DataStructureContent mainly used in the transferring of
   data structure during the peer joining procedure or receiving a
   message contains the send_update flag [I-D.ietf-p2psip-base].  The
   length of the structure changes in different peer type because
   different peer has different data structure.  The ordinary peer only
   has predecessor, successor, unit leader, slice leader and routing
   table information.  And the unit leader and slice leader should also
   know some other information.

   The DataStructureContent structure contains parameters as follows.
   routing_table_list
       This variable represents the peer's routing table which has all
       of the peers' information in the overlay.  In the two hop lookup
       algorithm, the routing table contains all of the same slice
       peers's information and other slices' entry peers information.
   boundary_peers
       This variable represents the information of peers' who are the
       boundary of unit.  It is used in the one hop lookup algorithm and
       the two hop lookup algorithm.






Peng, et al.               Expires May 2, 2012                 [Page 13]


Internet-Draft           One Hop Lookups Plugin             October 2011


   slice_leaders_list
       All of the slice leaders' information will be stored in this
       list, and it can be transferred from one slice leader to another
       or from the old slice leader to the new one.  It is used in the
       one hop lookup algorithm and the two hop lookup algorithm.

7.1.7.  OneHopUpdate message

   typedef EventNotification<0...n>        EventNotificationList;
   typedef DataStructureContent<0...n>     DataStructureContentList;

   struct {
     UpdateType           update_type;
     uint32               length;

     select (update_type) {
     case eventUpdate:
       EventNotificationList       event_notification_list;

     case dataStructureUpdate:
       DataStructureContentList    data_structure_list;
     }
   } OneHopUpdateReq;


   This structure is the Update message used in the O (1) protocols.
   The update message is composed by two kinds of list.  Using list can
   take more information.  One of them is the EventNotification
   structures, and the other is the DataStructureContent structures.
   All the peers in the overlay can use this update request to inform
   event notification or transport routing information.

   struct {
     uint32          update_response;
   } OneHopUpdateAns;


   The structure OneHopUpdateAns is used to be a response to the
   OneHopUpdateReq message.  It is only has a response number to
   represent the receiver's attitude which may include success, fail and
   error.  This number also can be defined as a type.
   update_response
       The response of the Update message, different number has
       different meaning includes success, error and so on.







Peng, et al.               Expires May 2, 2012                 [Page 14]


Internet-Draft           One Hop Lookups Plugin             October 2011


7.2.  Different using strategies

   The update message defiend above can cover 4 kinds of O (1)
   protocols, they are the one hop lookup protocol, D1HT, 1h-Calot and
   the two hop lookup protocol.  Different protocol has different using
   strategies for the same update message.

7.2.1.  One hop lookups

   In the one hop lookups, event notifications can be disseminated in a
   hierarchical architecture.  The slice leader collects all the events
   happened in its responsible slice and spreads this event notification
   list to all the other slice leaders.  The notifications are composed
   of peer joining and leaving without group information.  It also does
   not use the variable D1HT_TTL which is designed for D1HT specially.
   Other slice leaders receive the notification list and then send it to
   the unit leaders it is responsible for.  At last, the unit leader
   spread this list to the ordinary peers via the keep-alive messages.

7.2.2.  D1HT

   In the D1HT, event can be disseminated by the EDRA (Event Detection
   and Reporting Algorithm), which is able to notify an event to whe
   whole system in logartithmic time and yet to have good load-balance
   properties coupled with very low bandwitdth overhead.  This algorithm
   will use the D1HT_TLL variable defined in the EventNotification
   structure.  In the EDRA, each message will have a Time-To-Live (TTL)
   counter and will be addressed to its successor.  The details of this
   algorithm can be found in [D1HT].  The update message of D1HT is far
   different from the one hop lookups algorithm, because all the peers
   in D1HT are ordinary, so it just use the messages related to the
   ordinary peers.  Other protocols will set the D1HT_TTL as zero or a
   negative number.

7.2.3.  1h-Calot

   The overlay architecture of 1h-Calot is same as D1HT because all the
   peers are ordinary.  In fact, the principles of 1h-Calot are
   extremely simple: multicast maintains the routing tables; information
   in the routing table is then used to guide multicast and routing.  In
   the 1h-Calot, a resource is stored on the node whose identifier is
   the closet to the resource's key in absolute distacnce, regardless of
   the direction (clockwise or counterclockwise), this is the basic of
   the routing table maintenance algorithm.  A new arrival peer informs
   other peers of its arrival by multicasting a notification through a
   tree rooted at the new arrival peer.  Every message will take a range
   variable to tell the next peer its responsible informing boundary.
   1h-Calot multicasts each notification through a different tree



Peng, et al.               Expires May 2, 2012                 [Page 15]


Internet-Draft           One Hop Lookups Plugin             October 2011


   expanded just in time according to the current content of the routing
   tables.

   This algorithm will use the multicast_range_start and
   multicast_range_end variables defined in the EventNotification
   structure to carry the range information.  Beacause the 1h-Calot peer
   will notify and construct its multicasting tree as soon as the peer
   detects the overlay changes, so there is only one update message on a
   multicasting tree which means that the variable
   event_notification_item_counts will always be one.

7.2.4.  Two hop lookups

   Although the one hop lookups is a very efficient DHT algorithm, for
   systems of larger size, the bandwidth requirements of this scheme may
   become too large for a significant fraction of peers.  The two hop
   lookups schema keeps a fixed fraction of the total routing state on
   each peer and consumes much less bandwidth, and thus scales to a
   larger system size.

   The overlay architecture of two hop lookups is as same as the one hop
   lookups.  In addition, every slice leader chooses a group of its own
   peers for each other slice; the group may be chosen randomly or they
   may be based on proximity metrics.  The slice leader sends routing
   information about one group which can be carried in the
   group_information_list defined in the EventNotificationItem structure
   to exactly one other slice leader.  The information about the group
   is then dissemanated to all members of that slice as in the one hop
   lookups.  At las, each peer not only keeps full routing information
   about peers in its own slice, but also has routing information for
   the peers in every other slice.  The update message dissemanating is
   still accomplished by the event notification schema used in the one
   hop lookups.


8.  Leaving

   The leaving action is also a kind of event, so it can be expressed by
   the update message.  When a peer leaves the overlay peacefully, it
   needs to send a leaving message to its predecessor or successor which
   may just include some traditional information like the Node-ID and
   some security proves like the certificate.  And the predecessor or
   succeor who receives this leaving message will verify its identity
   and address a update message contains the peer leaving information to
   its slice leader.  The data transfer can also use the update
   messages, but it sometimes depends on the replication strategies.

   If a peer leaves the overlay without any notification, the neighbor



Peng, et al.               Expires May 2, 2012                 [Page 16]


Internet-Draft           One Hop Lookups Plugin             October 2011


   peer (predecessor or successor) must inform this event to the leader
   by the update messages, and then the backup peer of the leaving one
   should transfer the backup data to the new one who is charging the
   leaving peer's namespace.  When a unit leader or slice leader leaves
   this overlay, the overlay should choose a new one to replace the old
   one.


9.  Replication

   The replica placement policy of the one hop algorithm can use the way
   defined in SandStone.  It is designed to maximize data reliability
   and availability, and to maximize network bandwidth utilization.  In
   the SandStone, the subscriber data can be replicated on multiple
   peers, three on default.  We can use a SandStone like replica
   replacement policy with three replications.  The first replica is
   stored in the primary peer's successor; the second is saved in a peer
   of different unit but in same slice; and the third is put in a peer
   from different slice.

   Above all, one of the most important issues is the choosing of the
   replica peer.  Maybe we can use a special function whose input is
   peer's Node-ID to calculate the three replica peers, and of course
   the root peer should know about the replication method in order to be
   able to search the data.


10.  Fault tolerance

10.1.  First hop fail

   In the one hop algorithm, if a query fails on its first attempt it
   does not return an error to an application.  Instead, queries can be
   rerouted with the RELOAD message RouteQueryAns.  We can define the
   RouteQueryAns as follows.

   struct {
      uint32               response_state;
      PeerContactItem      destination_peer;
   } RouteQueryAns;

   The structure RouteQueryAns is composed by a state number and a
   Node-ID.  The state number tells the query state and the
   PeerContactItem gives a peer's information that is closer to the
   destination.  And in most of cases, two hops are enough to locate one
   peer or resource.  If the two hops are still wrong, then we can know
   the lookup procedure is fail.




Peng, et al.               Expires May 2, 2012                 [Page 17]


Internet-Draft           One Hop Lookups Plugin             October 2011


   response_state
       The response mark of the RouteQueryReq message's; and it may
       include success, failure and errors.
   destination_peer
       If the query is failed, the application can send RouteQueryReq
       message to this peer again for finding resources.

   If a lookup query from peer A to peer B because the routing table is
   outdate or the peer B has left, peer A can retry the query by sending
   the RouteQueryReq to the peer C who is the successor of peer B. And
   at that time joining a peer D to be peer B's new successor and peer
   C's predecessor, then peer C can reply RouteQueryAns message with the
   redirect_peer of peer D's, and peer A can contact it in this routing
   step.

10.2.  Leaders fail

   The most important issues are how to keep the correct functioning of
   unit leaders and slice leaders; we need to recover from their
   failure.  When a slice or unit leader fails, its successor soon
   detects the failure and becomes the new leader; the successor of a
   failed slice leader will communicate with its unit leaders and other
   slice leaders to recover information about the missed events.


11.  Security Considerations

   There is no specific security consideration associated with this
   draft.


12.  IANA Considerations

   There are no IANA considerations associated to this memo.


13.  Acknowledgements


14.  References

14.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [I-D.ietf-p2psip-base]
              Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and



Peng, et al.               Expires May 2, 2012                 [Page 18]


Internet-Draft           One Hop Lookups Plugin             October 2011


              H. Schulzrinne, "REsource LOcation And Discovery (RELOAD)
              Base Protocol", draft-ietf-p2psip-base-15 (work in
              progress), May 2011.

14.2.  Informative References

   [RFC3174]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
              (SHA1)", RFC 3174, September 2001.

   [One-Hop-Lookups]
              Gupta, A., Liskov, B., and R. Rodrigues, "One Hop Lookups
              for Peer-to-Peer Overlays", June 2003.

   [SandStone]
              Shi, G., Chen, J., Gong, H., Fan, L., Xue, H., Lu, Q., and
              L. Liang, "SandStone: A DHT based Carrier Grade
              Distributed Storage System", September 2009.

   [TwoHopLookups]
              Gupta, A., Liskov, B., and R. Rodrigues, "Efficient
              Routing for Peer-to-Peer Overlays".

   [D1HT]     Monnerat, L. and C. Amorim, "D1HT: A Distributed One Hop
              Hash Table".

   [1h-Calot]
              Tang, C., Buco, M., Chang, R., Dwarkadas, S., Luan, L.,
              Edward, E., and C. Ward, "On the Tradeoff among Capacity,
              Routing Hops, and Being Peer-to-Peer in the Design of
              Structured Overlay Networks".

   [SingleHopDHT]
              Monnerat, L. and C. Amorim, "Peer-to-Peer Single Hop
              Distributed Hash Tables".


Authors' Addresses

   Jin Peng
   China Mobile
   Unit 2, 28 Xuanwumenxi Ave,
   Xuanwu District
   Beijing  100053
   P.R.China

   Email: Penjin@chinamobile.com





Peng, et al.               Expires May 2, 2012                 [Page 19]


Internet-Draft           One Hop Lookups Plugin             October 2011


   Lifeng Le
   China Mobile
   Unit 2, 28 Xuanwumenxi Ave,
   Xuanwu District
   Beijing  100053
   P.R.China

   Email: Lelifeng@chinamobile.com


   Kai Feng
   Beijing University of Posts and Telecommunications
   10 Xi Tu Cheng Rd.
   Haidian District
   Beijing  100876
   P.R.China

   Email: fengkai_sunny@139.com

































Peng, et al.               Expires May 2, 2012                 [Page 20]