IETF SIPPING Working Group                                       C. Shen
Internet-Draft                                            H. Schulzrinne
Intended status: Standards Track                             Columbia U.
Expires: May 25, 2010                                           A. Koike
                                                                     NTT
                                                       November 21, 2009


  A Mechanism for Session Initiation Protocol (SIP) Avalanche Restart
                            Overload Control
            draft-shen-sipping-avalanche-restart-overload-00

Abstract

   When a large number of clients register with a SIP registrar server
   at approximately the same time, the server may become overloaded.
   Near-simultaneous floods of SIP SUBSCRIBE and PUBLISH requests may
   have similar effects.  Such request avalanches can occur, for
   example, after a power failure and recovery in a metropolitan area.
   This document describes how to avoid such overload situations.  Under
   this mechanism, a server estimates an avalanche restart backoff
   interval during its normal operation and conveys this interval to its
   clients through a new Register-Restart header in registration
   responses.  Once an avalanche restart actually occurs, the clients
   perform backoff based on the previously received Register-Restart
   header value before sending out the first registration attempt.
   Thus, the mechanism spreads all the initial client registration
   requests and prevents them from overloading the registrar server.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at



Shen, et al.              Expires May 25, 2010                  [Page 1]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 25, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

































Shen, et al.              Expires May 25, 2010                  [Page 2]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 5
   3.  Register-Restart Header for Registration Responses  . . . . . . 5
     3.1.  Generating the Register-Restart Header  . . . . . . . . . . 5
     3.2.  Determining the Register-Restart Header Value . . . . . . . 5
     3.3.  Processing the Register-Restart Header  . . . . . . . . . . 6
     3.4.  Using the Register-Restart Header . . . . . . . . . . . . . 6
   4.  Syntax  . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
   5.  Backward Compatibility  . . . . . . . . . . . . . . . . . . . . 7
   6.  Security Considerations . . . . . . . . . . . . . . . . . . . . 8
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 8
     8.1.  Normative References  . . . . . . . . . . . . . . . . . . . 8
     8.2.  Informative References  . . . . . . . . . . . . . . . . . . 8
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . . 9


































Shen, et al.              Expires May 25, 2010                  [Page 3]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


1.  Introduction

   A Session Initiation Protocol (SIP) [RFC3261] server can be
   overloaded for a number of different reasons.  One of them is
   avalanche restart, which is described in [RFC5390] as follows:

      Avalanche Restart: One of the most troubling sources of overload
      is avalanche restart.  This happens when a large number of clients
      all simultaneously attempt to connect to the network with a SIP
      registration.  Avalanche restart can be caused by several events.
      One is the "Manhattan Reboots" scenario, where there is a power
      failure in a large metropolitan area, such as Manhattan.  When
      power is restored, all of the SIP phones, whether in PCs or
      standalone devices, simultaneously power on and begin booting.
      They will all then connect to the network and register, causing a
      flood of SIP registration messages.  Another cause of avalanche
      restart is failure of a large network connection, for example, the
      access router for an enterprise.  When it fails, SIP clients will
      detect the failure rapidly using the mechanisms in [RFC5626].
      When connectivity is restored, this is detected, and clients re-
      registration, all within a short time period.  Another source of
      avalanche restart is failure of a proxy server.  If clients had
      all connected to the server with TCP, its failure will be
      detected, followed by re-connection and re-registration to another
      server.  Note that [RFC5626] does provide some remedies to this
      case.

   The SIP server avalanche restart overload problem is caused by the
   synchronized, simultaneous initial registration attempts after a
   failure recovery.  If the first round of registration attempts from
   all clients cause server overload, most of those registrations will
   fail.  Those clients will then by default all retry after the same
   amount of time, causing repeated server avalanche restart overload.
   [RFC5626] describes how to alleviate this situation: if the initial
   registration attempt after the boot fails, the clients wait for a
   randomized backoff time before retrying.  This mechanism reduces the
   possibility of repeated avalanche restart.  However, since all
   clients still send registration immediately after boot, it does not
   prevent the initial avalanche restart overload.

   A key method to prevent avalanche restart server overload is to have
   clients backoff before their first registration attempt.  The backoff
   intervals of each client must be carefully selected so that their
   registration attempts are spaced sufficiently far apart not to
   overload the server, and they are also not too conservative which may
   cause unnecessary client registration delays.  An individual client,
   without knowing the state information of all other peer clients and
   the registrar server, is inherently incapable of choosing such an



Shen, et al.              Expires May 25, 2010                  [Page 4]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


   appropriate backoff interval.

   This document specifies a solution to the avalanche restart overload
   problem by allowing the registrar server to instruct the clients how
   long they should wait before the initial registration upon a restart
   event.  Under this mechanism, the server estimates an avalanche
   restart backoff interval during its normal operation.  This interval
   is the minimum period of time that the server needs to serve all the
   expected registration requests after an avalanche restart, assuming
   all the registration requests are properly spaced.  In order for the
   server to convey this interval to its clients, this document defines
   a new SIP Register-Restart header.  The registrar server places the
   avalanche restart backoff interval into the Register-Restart header
   and inserts it into regular responses to client registration
   requests.  When an avalanche restart actually happens, each client is
   required to wait a randomly-chosen time between 0 and the avalanche
   restart backoff interval.  Any PUBLISH or SUBSCRIBE requests after
   the restart must be sent after the registration has been completed.

   This document also defines an algorithm to determine the avalanche
   restart backoff interval based on the server's processing capability
   and the number of clients it is serving.  The effectiveness of this
   algorithm depends on the assumption that both the server processing
   capability and the number of clients the server serves remain similar
   before and after the event that causes the avalanche restart.  This
   assumption holds true in most avalanche restart cases when the
   registrar server before and after the avalanche restart is the same
   one, e.g., in the "Manhattan Reboots" scenario, and the failure and
   recovery of a large network connection scenario.  The cases where the
   clients will all register with a different server after restart is
   more complicated, because those clients will most likely not have an
   avalanche restart backoff interval value from the new registrar
   server a priori.  In those cases, if the initial avalanche restart
   overload still occurs, the mechanisms in [RFC5626] can be used to
   help prevent repeated avalanche restart overload.

   Some devices, especially mobile terminals, may have lower layer
   (e.g., PHY or Data Link layer) backoff or blocking mechanisms during
   avalanche restart or congestion cases.  This document addresses the
   SIP application layer.  Operators can disable this application layer
   avalanche restart protection method if the lower layer has already
   provided similar mechanism.  Such cross-layer optimizations, however,
   are out of scope of this document.

   This document complements other SIP server overload control
   specifications which address different aspects of the SIP server
   overload space, such as [I-D.hilt-sipping-overload],
   [I-D.shen-sipping-load-control-event-package], and



Shen, et al.              Expires May 25, 2010                  [Page 5]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


   [I-D.ietf-sipping-overload-design].


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


3.  Register-Restart Header for Registration Responses

   This document defines the SIP Register-Restart header for
   registration responses.  The value of the Register-Restart header, in
   seconds, denotes the avalanche restart backoff interval, which is the
   minimum time the server needs to successfully service all likely
   client registration requests under an avalanche restart situation,
   assuming all requests are spaced evenly in time.

3.1.  Generating the Register-Restart Header

   A SIP registrar server inserts a Register-Restart header containing
   its most up-to-date avalanche restart backoff interval value in the
   responses to registration requests.

   Example:

       SIP/2.0 200 OK
       Register-Restart: 300

3.2.  Determining the Register-Restart Header Value

   A registrar server computes and updates the Register-Restart header
   values and conveys them to the clients during its normal operations.
   Once an avalanche restart actually happens, the most recent Register-
   Restart header value that the clients have received from the
   registrar server are used.  A registrar server MAY use the following
   algorithm for determining the appropriate Register-Restart header
   value.

   During the normal operation period, the SIP registrar server
   maintains the current count of all its registrants, e.g., assuming
   the number of registered clients is R. The SIP registrar server also
   estimates its processing capacity, e.g., assuming it is C requests
   per second.  The Register-Restart value can be set to (R/C)*(1+k),
   where k is a small coefficient that provides a capacity redundancy.
   A recommended value of k is 0.1.




Shen, et al.              Expires May 25, 2010                  [Page 6]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


   It should be noted that change of either R or C adjusts the server
   computed Register-Restart value.  The value C is usually stable with
   the same server and registration request pattern.  The value R may
   change over time.  The server SHOULD recompute the Register-Restart
   value whenever there is a change in either R or C, unless it is
   considered too expensive to do so, which is normally not the case.
   Since the updated Register-Restart value is only pushed to the
   clients when the client sends in a registration, there might be a
   short period where the server side updated Register-Restart value and
   some client side stored Register-Restart values are not synchronized.
   However, considering that the changing pace of R is slow, and the
   time scale between the possible happenings of avalanche restarts
   (e.g., months) is usually much larger than the interval between
   typical registration renewals (e.g., hours), these short periods of
   discrepancies are not a concern.  Therefore, in general this approach
   provides a sufficiently accurate characterization of the system
   status.  More importantly, the values of R and C are expected to
   remain constant for the same server before and after typical
   avalanche restart events, e.g., a power failure and recovery.

3.3.  Processing the Register-Restart Header

   Before receiving the very first registration response from a new
   registrar server, the client restart backoff value for that registrar
   server is zero, i.e., the restart backoff mechanism is disabled.

   Upon receiving a response to the registration request containing the
   Register-Restart header, a SIP client that supports this
   specification MUST check if there is an existing Register-Restart
   header value for this registrar stored in the system.  If not, it
   stores the newly received Register-Restart header value.  Otherwise,
   it compares the new value with the existing one and updates it if
   they differ.  The value of Register-Restart header MUST be stored
   together with the corresponding identity of the registrar server,
   e.g., the DNS name of the registrar server.  There is no separate
   validity period parameter for Register-Restart.  The validity
   duration of the Register-Restart header is the same as that of the
   corresponding registration operation.

3.4.  Using the Register-Restart Header

   At the client side, avalanche restart backoff is disabled by default,
   unless the client that supports this specification has received a
   positive Register-Restart header value from the corresponding
   registrar server.

   A SIP client always keeps the most updated Register-Restart header
   value.  When this value is positive and if the client detects that it



Shen, et al.              Expires May 25, 2010                  [Page 7]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


   is about to perform the first registration with the same registrar
   server after a power-off reboot or a connection-loss recovery, the
   client SHOULD generate a uniformly distributed random interval
   between 0 and the current Register-Restart value, and wait until the
   end of that interval to send the registration request.  However, the
   client side backoff MAY be manually disabled by a human operator when
   necessary, e.g., when the operator is expecting an urgent call, or
   when the power-off or connection-loss event is known as a local
   incident rather than a global event.

   Furthermore, in order to prevent similar floods of SUBSCRIBE and
   PUBLISH requests after the restart, any PUBLISH and SUBSCRIBE MUST be
   sent after the registration has been completed.

   It should be noted that the power-off reboot case requires that the
   state information about the Register-Restart value and the registrar
   server identity be stored in a memory space that could survive power
   restart.


4.  Syntax

   The new Register-Restart header adds the following lines to the
   existing SIP header definition.


       message-header = Register-Restart

       Register-Restart = "Register-Restart" HCOLON delta-seconds



5.  Backward Compatibility

   If a registrar server supports this specification, but not all of its
   clients are upgraded, then those non-compliant clients will ignore
   the Register-Restart header and not perform backoff.  Although it
   appears that this might give the non-compliant clients an unfair
   advantage over those clients that do perform backoff, since the non-
   compliant clients will send synchronized registration attempts to the
   registrar server and can cause server overload, they will be
   penalized by registration failures.  Depending on the number of the
   non-compliant clients vs. compliant clients, if the registrar server
   can still process requests when it does not receive registration
   storms from all the non-compliant clients, the requests from the
   compliant clients which spread apart, are more likely to succeed.





Shen, et al.              Expires May 25, 2010                  [Page 8]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


6.  Security Considerations

   The Register-Restart header can be used by an attacker to launch a
   possible denial-of-service attack on SIP clients if the attacker can
   insert an infinitely large Register-Restart value in the response
   sent to the clients.  In those situations, the client may generate a
   very large backoff time before it attempts to send a registration
   request, and therefore the client is subject to denial-of-service
   attack.  However, this kind of attack is only applicable after a
   power-cycle reboot or failure and recovery of a large network
   connection, which is rare.  Furthermore, if the attacker can modify
   the registration request or response, that attacker can very easily
   prevent registration in any number of ways, so the Register-Restart
   header does not introduce new types of attacks.  One method to
   prevent the registration request and response from being altered by
   attackers is to use TLS between the client and the registrar server.


7.  IANA Considerations

   [TBD]


8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

8.2.  Informative References

   [I-D.hilt-sipping-overload]
              Hilt, V. and H. Schulzrinne, "Session Initiation Protocol
              (SIP) Overload Control", draft-hilt-sipping-overload-07
              (work in progress), October 2009.

   [I-D.ietf-sipping-overload-design]
              Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design
              Considerations for Session Initiation Protocol (SIP)
              Overload Control", draft-ietf-sipping-overload-design-02
              (work in progress), July 2009.




Shen, et al.              Expires May 25, 2010                  [Page 9]


Internet-Draft   SIP Avalanche Restart Overload Control    November 2009


   [I-D.shen-sipping-load-control-event-package]
              Shen, C., Schulzrinne, H., and A. Koike, "A Session
              Initiation Protocol (SIP) Load Control Event Package",
              draft-shen-sipping-load-control-event-package-03 (work in
              progress), October 2009.

   [RFC5390]  Rosenberg, J., "Requirements for Management of Overload in
              the Session Initiation Protocol", RFC 5390, December 2008.

   [RFC5626]  Jennings, C., Mahy, R., and F. Audet, "Managing Client-
              Initiated Connections in the Session Initiation Protocol
              (SIP)", RFC 5626, October 2009.


Authors' Addresses

   Charles Shen
   Columbia University
   Department of Computer Science
   1214 Amsterdam Avenue, MC 0401
   New York, NY  10027
   USA

   Phone: +1 212 854 3109
   Email: charles@cs.columbia.edu


   Henning Schulzrinne
   Columbia University
   Department of Computer Science
   1214 Amsterdam Avenue, MC 0401
   New York, NY  10027
   USA

   Phone: +1 212 939 7004
   Email: hgs@cs.columbia.edu


   Arata Koike
   NTT Service Integration Labs &
   NTT Washington DC Representative Office
   1100 13th St., NW, Suite 900
   Washington DC,   20005
   USA

   Phone: +1 202 312 1451
   Email: koike.arata@lab.ntt.co.jp




Shen, et al.              Expires May 25, 2010                 [Page 10]