Operations and Management Area Working Group                      P. Fan
Internet-Draft                                                     L. Li
Intended status: Informational                              China Mobile
Expires: January 16, 2014                                  July 15, 2013


  Requirements for IP/MPLS network transmission interruption duration
             draft-fan-opsawg-transmission-interruption-03

Abstract

   The transmission performance of an IP/MPLS network affects the
   services and networks carried over it, but so far the industry has
   reached no consensus on acceptable transmission interruption
   durations for IP/MPLS networks.  This memo studies the requirements
   for interruption duration criteria in several service scenarios.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Fan & Li                Expires January 16, 2014                [Page 1]


Internet-Draft      IP/MPLS transmission interruption          July 2013


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Services and Performance Criteria . . . . . . . . . . . . . .   3
     2.1.  Softswitch  . . . . . . . . . . . . . . . . . . . . . . .   3
     2.2.  SS7 transport . . . . . . . . . . . . . . . . . . . . . .   5
     2.3.  LTE Backhaul  . . . . . . . . . . . . . . . . . . . . . .   6
     2.4.  Ethernet VPN  . . . . . . . . . . . . . . . . . . . . . .   6
     2.5.  IPTV  . . . . . . . . . . . . . . . . . . . . . . . . . .   7
   3.  Other considerations  . . . . . . . . . . . . . . . . . . . .   7
   4.  Security Considerations . . . . . . . . . . . . . . . . . . .   7
   5.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   7
   6.  Appendix: Impact Analysis on Transmission Quality of IP
       Carried Softswitch Voice  . . . . . . . . . . . . . . . . . .   7
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  10
   8.  Informative References  . . . . . . . . . . . . . . . . . . .  10
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction

   Today's IP/MPLS networks are widely used as bearer networks to carry
   diversified packet-switched services.  The transmission quality of
   these services is closely related to the performance of the bearer
   layers, as network failures, delay, congestion and other anomalies
   inevitably cause service interruption and degrade user perception.
   However, so far the industry has reached no consensus on acceptable
   transmission interruption durations for IP/MPLS networks.  This memo
   studies the relationship between service performance and
   transmission interruption duration in several scenarios, and is
   intended to arrive at a list of requirements for these interruption
   duration criteria.

   For a long time the industry has aspired to the so-called golden
   standard for network resilience, namely the 50-millisecond recovery
   threshold.  [HeavyReading] gives a basic introduction to the origin
   of this fast protection legacy, which dates back to the 1980s.  The
   50ms threshold was established informally in the early 1980s, and
   then formally through standardization of the [G.841] recommendation
   on SDH network protection architectures.  The specific requirement
   sets a maximum of 60ms for detecting and restoring a fault, which is
   the sum of a fault detection time of less than 10ms and a protection
   switching time of less than 50ms.  The report also describes the
   original concerns that led to the threshold.  The voice channel
   banks deployed in the early 1980s had limited fault tolerance.
   Failures that lasted longer than 200ms would generate a Carrier
   Group Alarm (CGA), which caused the channel bank to terminate all
   connections over the affected TDM line.  Carriers therefore developed
   an outage budget, and the 50ms standard was employed to protect
   voice services.  However newer channel banks at that time had started
   to implement a CGA timer of 2s, so the 50ms protection was adopted to
   protect a small and diminishing fraction of the digital network.

   Historically, this 50ms fast protection speed has been achieved by
   SDH networks.  Using various fast convergence techniques, IP/MPLS is
   also able to react within 50ms.  The network applications carried
   by optical or packet cores have changed over the past decades,
   accompanied by continuing questions about the need for 50ms
   protection.  Here we list three basic considerations about services
   and their requirements for IP/MPLS: for services like TDM over
   IP/MPLS, the traditional 50ms guarantee should be kept and met; for
   current IP services (e.g. voice, Internet), experience or
   experiments should be provided for guidance; for future services,
   requirements should be proposed early and should take IP/MPLS into
   consideration.

2.  Services and Performance Criteria

   Services delivered by an IP/MPLS network have different transmission
   quality requirements, and thus introduce different performance
   criteria for the bearing IP/MPLS network.  We believe there are two
   principles that need to be considered during network and service
   design, configuration and operation.  The IP/MPLS bearer should
   satisfy the quality requirements of upper-level services and
   applications, while services and applications should also take into
   account the intrinsic capabilities of IP.  In this section we
   describe concerns about the mutual adaptation of IP/MPLS and
   services in several kinds of service scenarios.

2.1.  Softswitch

   From the softswitch point of view, being carried over IP has a
   certain influence on service quality.  In particular, when speech is
   delivered over IP, the communication quality of voice is impaired,
   which in turn places higher requirements on the transmission
   performance of IP.  The following table lists criteria for the
   transmission quality of a typical GSM network, together with the
   impacting factors introduced by the IP bearer.

   +-----------------------------------------+------------------------+
   |            Criteria of GSM              |   Impacting Factors    |
   |         Transmission Quality            |  Brought by IP Bearer  |
   +----------+------------------------------+------------------------+
   |          |Call loss of wireless channel |          None          |
   |          +------------------------------+------------------------+
   |          |  Call loss between switches  |     Failure of Nc/Mc   |
   |Call Loss |    (typical value: <=1%)     | interface carried by IP|
   |          +------------------------------+------------------------+
   |          | Call loss between switch and |          None          |
   |          |  BSC (typical value: <=0.5%) |                        |
   +----------+------------------------------+------------------------+
   |   Call   |      Call cut-off rate       |     Failure of Nc/Mc   |
   | Cut-off  |     (typical value: <1%)     | interface carried by IP|
   +----------+------------------------------+------------------------+
   |          |   Service providing delay    |          None          |
   |          +------------------------------+------------------------+
   |Connection|   Calling party connection   |   IP carried signaling |
   |  Delay   |  delay (typical value: <=4s) |          delay         |
   |          +------------------------------+------------------------+
   |          |Called party connection delay |          None          |
   |          |     (typical value: <=4s)    |                        |
   +----------+------------------------------+------------------------+


   If voice is carried by IP, the communication quality criteria of
   call loss, call cut-off and connection delay are likely to be
   influenced.  This subsection focuses on these three criteria and
   their impacting factors to give requirements for softswitch and IP
   bearer networks, with a detailed analysis given in the appendix.
   Note that the current discussion on softswitch focuses on quality of
   transmission rather than quality of voice.  In other words, the
   scope of discussion is limited to network-related QoS aspects, while
   subjective QoE criteria such as PESQ (Perceptual Evaluation of
   Speech Quality) and MOS (Mean Opinion Score) are left to later
   revisions.

   Call loss related requirement:  The duration of the SCTP interface
      association timer should be shorter than that of the state machine
      message timer of upper layer protocols, and is further recommended
      to be no longer than 6 seconds in order to maintain detection
      sensitivity; the interruption duration of the IP bearer network
      should be as short as possible to avoid call loss, and is further
      recommended to be no longer than 5 seconds.

   Call cut-off related requirement:  The SCTP association should be
      maintained during IP layer interruption to avoid interface
      breakoff alerts.  The requirements are the same as those related
      to call loss.

   Connection delay related requirement:  The IP convergence time should
      be no longer than 3 seconds to ensure that connection delay is
      shorter than 4 seconds.

   The overall requirement for IP/MPLS interruption duration is no
   longer than 3 seconds.
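
   As an illustration only, the requirements above can be collected
   into a small configuration check.  The sketch below is written in
   Python; the class name, the function names and the sample values are
   hypothetical and are not taken from any product.  Only the numeric
   bounds (6, 5 and 3 seconds) come from the text above.

```python
from dataclasses import dataclass
from typing import List

# Bounds taken from the requirements in this section (seconds).
MAX_ASSOC_TIMER_S = 6.0        # SCTP association timer (call loss)
MAX_INTERRUPTION_CALL_S = 5.0  # IP interruption (call loss, cut-off)
MAX_CONVERGENCE_DELAY_S = 3.0  # IP convergence (connection delay)


@dataclass
class SoftswitchBearerConfig:
    """Hypothetical container for the values being checked."""
    assoc_timer_s: float          # SCTP interface association timer
    state_machine_timer_s: float  # upper layer message timer
    ip_interruption_s: float      # worst-case IP/MPLS interruption


def overall_interruption_bound_s() -> float:
    """The strictest per-criterion bound gives the overall bound."""
    return min(MAX_INTERRUPTION_CALL_S, MAX_CONVERGENCE_DELAY_S)


def violations(cfg: SoftswitchBearerConfig) -> List[str]:
    """Return the requirements that cfg does not satisfy."""
    problems = []
    if cfg.assoc_timer_s >= cfg.state_machine_timer_s:
        problems.append("assoc timer not shorter than message timer")
    if cfg.assoc_timer_s > MAX_ASSOC_TIMER_S:
        problems.append("assoc timer longer than 6 s")
    if cfg.ip_interruption_s > overall_interruption_bound_s():
        problems.append("IP/MPLS interruption longer than 3 s")
    return problems


if __name__ == "__main__":
    # Sample values are illustrative assumptions only.
    cfg = SoftswitchBearerConfig(6.0, 10.0, 2.5)
    print(overall_interruption_bound_s())
    print(violations(cfg))
```

   Taking the minimum of the per-criterion bounds mirrors the way the
   overall 3-second figure follows from the 5-second and 3-second
   recommendations.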




2.2.  SS7 transport

   The Signaling System No. 7 (SS7/C7) network is an example of the
   principle that services should take into account the capabilities
   of IP.  The bearer of the SS7 protocol stack has been evolving from
   TDM to IP.  Traditionally the user parts of SS7 (including MAP, CAP,
   BSSAP+, ISUP, etc.) are carried by MTP layers, but the bearer has
   gradually evolved into a packetized form with SIGTRAN (including
   M2PA, M2UA, M3UA, etc.) using SCTP associations over IP.  The change
   requires the transport layer to provide mechanisms that meet the
   demands of SCN signaling, and more importantly it requires protocols
   to adapt to the "best effort" nature of IP.

   SIGTRAN uses an architecture that can be described as standard IP
   plus a unified transport plus diversified adaptation units.  It
   introduces SCTP to realize reliable signaling transport over IP.
   SCTP itself provides reliable transmission mechanisms, such as path
   selection and monitoring, validation and acknowledgment mechanisms,
   and retransmission timer management.

   The unreliable nature of IP makes it necessary for the upper-level
   protocols to be more tolerant of possible instability of the bearer.
   Once a service request from a UE is accepted, the system allocates
   resources and establishes paths for the user.  A breakoff caused by
   IP will result in signaling disconnection or rerouting.  The
   signaling transmission path may also be switched back after the IP
   layer recovers.  Frequent switchovers and disconnections lead to
   unnecessary system cost and service interruption, so parameters
   should be configured to be somewhat "insensitive" in order to
   sustain connections on the control plane.

   One example of parameter configuration is the timer value.  The
   following gives two cases: SCTP on the transport layer and M2PA on
   the adaptation layer.  The values should not be set too small, to
   prevent unnecessary disconnection caused by IP instability.
   However, because upper services of SS7 may also have timeout rules,
   the values should not be set too large either, to avoid violating
   those rules.

   1) SCTP

   SCTP uses a retransmission timeout (RTO) to decide when to
   retransmit data that has not been acknowledged.  The RTO is given
   an initial, a maximum and a minimum value, and is recalculated
   continuously according to a set of management rules.  Other
   parameters are used for fault detection in SCTP.
   Association.Max.Retrans is the upper limit on the number of
   consecutive retransmissions before the peer endpoint is considered
   unreachable.  Path.Max.Retrans is a similar limit used to detect
   path failure.  The
   parameters together characterize the ability of SCTP to tolerate
   the bearer below it and to provide reliable SS7 transport to the
   layers above.  Typical values of the parameters are RTO.Initial =
   0.5 sec, RTO.Min = 0.5 sec, RTO.Max = 1.5 sec, Path.Max.Retrans = 5,
   and Association.Max.Retrans = 10.
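
   To make the effect of these values concrete, the sketch below
   estimates how long SCTP would take to declare a failure if no
   acknowledgement arrives at all.  It is a rough illustration in
   Python under stated assumptions, not a description of any
   implementation: the RTO is assumed to start at RTO.Initial, to
   double after each timeout up to RTO.Max, and a failure is assumed
   to be declared once the number of consecutive timeouts exceeds the
   retransmission limit.  The function name is hypothetical.

```python
def detection_time_s(rto_initial, rto_min, rto_max, max_retrans):
    """Sum the timeouts until the error counter exceeds max_retrans.

    Assumes every transmission is lost, the first timeout equals
    rto_initial clamped to [rto_min, rto_max], and each further
    timeout doubles the RTO, capped at rto_max.
    """
    rto = min(max(rto_initial, rto_min), rto_max)
    total = 0.0
    # The counter exceeds max_retrans on timeout max_retrans + 1.
    for _ in range(max_retrans + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    # Typical values quoted in the text above.
    print(detection_time_s(0.5, 0.5, 1.5, 5))    # Path.Max.Retrans
    print(detection_time_s(0.5, 0.5, 1.5, 10))   # Association.Max.Retrans
```

   Real detection times also depend on heartbeat intervals,
   multi-homing and measured round-trip times, which this sketch
   ignores.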

   2) M2PA

   Although protocols like H.248 and BICC can be carried directly over
   SCTP, the user part protocols of SS7 usually have to be carried by
   SCTP/IP with the help of different adaptation layers.  In this case,
   the attributes of the adaptation layers, e.g. M2PA used between
   STPs, are more important to SS7.  M2PA uses the T7 timer to bound
   the maximum delay of acknowledgement, and starts T7 at the time of
   data transmission.  If no message is acknowledged within the maximum
   waiting time, T7 expires and M2PA sends an out-of-service message to
   the peer end.  Because propagation delays in IP networks are more
   variable than in traditional SS7 networks, the value of T7 should be
   set considering IP propagation delays, as well as acknowledgement
   time, SCTP slow-start algorithms, upper service timers and other
   factors.  A typical value of T7 is 7~10 sec.

   The tolerance to the bearer introduced by parameter configuration
   may have some influence on services, but it avoids service cut-off
   or severe degradation of user perception.  For services like SMS or
   route lookup, some latency may be introduced, but operations can
   still be completed after a short delay.  Because SMS has no strict
   real-time requirement, the impact on the service is limited.  If
   route lookup takes more time due to IP interruption and convergence,
   the user may experience a longer setup delay when dialing.  For the
   location update service, even if the operation fails because the
   bearer is interrupted for too long, the UE has a mechanism to
   initiate the request again.

2.3.  LTE Backhaul

   To be further analyzed.

2.4.  Ethernet VPN

   Ethernet VPNs (e.g. VPLS) are used to provide transparent Ethernet
   type layer 2 connections for customers.  Ethernet frames are treated
   as service payload, and are encapsulated and transported in the
   provider's MPLS network.  The interruption criteria of the IP/MPLS
   bearer should guarantee continuity of the Ethernet service, and an
   IP/MPLS failover should not cause an outage of the Ethernet service.

   [Y.1731] and [IEEE802.1ag] describe in detail OAM functions and
   mechanisms for Ethernet, with specific recommendation on connectivity
   fault management.  Ethernet uses continuity check function to detect
   loss of continuity between any pair of MEPs (MEG End Points) in a
   MEG (Maintenance Entity Group), and this function is realized by
   sending CCMs (Continuity Check Messages) between peer MEPs.  When a
   MEP does not receive a CCM from a peer MEP within a certain
   interval, it detects loss of continuity to that peer MEP.  The
   threshold interval is specified as 3.5 times the CCM transmission
   period, which corresponds to a loss of three consecutive CCMs from
   the peer MEP, and the CCM transmission period is recommended to be
   the default value of 1 second.  So the interruption duration of
   IP/MPLS for Ethernet VPN services should be less than 3 seconds.
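
   The loss-of-continuity rule described above can be sketched as
   follows.  This is only an illustration in Python: the function names
   are hypothetical, and the sketch models nothing beyond the 3.5-times
   rule and the 1-second default period quoted in the text.

```python
LOC_MULTIPLIER = 3.5          # threshold = 3.5 x CCM period
DEFAULT_CCM_PERIOD_S = 1.0    # default period quoted in the text


def loc_threshold_s(ccm_period_s=DEFAULT_CCM_PERIOD_S):
    """Interval without CCMs after which a MEP declares LOC."""
    return LOC_MULTIPLIER * ccm_period_s


def declares_loc(arrival_times_s, now_s,
                 ccm_period_s=DEFAULT_CCM_PERIOD_S):
    """True if no CCM has arrived within the threshold before now_s.

    arrival_times_s lists the times at which CCMs from the peer MEP
    were received; at least one arrival at or before now_s is assumed.
    """
    last_seen = max(t for t in arrival_times_s if t <= now_s)
    return now_s - last_seen >= loc_threshold_s(ccm_period_s)


if __name__ == "__main__":
    # CCMs received at t = 0, 1, 2 s, then the bearer is interrupted.
    received = [0.0, 1.0, 2.0]
    for now in (4.0, 5.0, 5.5, 6.0):
        print(now, declares_loc(received, now))
```

   In this example the MEP still considers the peer reachable 3 seconds
   after the last CCM and declares loss of continuity once 3.5 seconds
   have passed.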

2.5.  IPTV

   To be further analyzed.

3.  Other considerations

   So far this document has focused on use cases and their requirements
   for IP/MPLS; other practical issues are not included in this
   version.  For example, an IP/MPLS packet core is expected to carry a
   variety of services, so the requirements for IP/MPLS may have to
   include additional concerns about this multi-service coexistence
   scenario.  A simple and straightforward approach may be to satisfy
   the most stringent protection time required by any of the services.
   Another issue is related to service awareness.  Whether the service
   type is or can be known by IP/MPLS would influence the ability of
   IP/MPLS to provide a corresponding reliability guarantee.  It seems
   easier to perform service identification on edge devices than in the
   network core.  We believe these kinds of issues need to be taken
   into account, and leave them to be addressed in future revisions.

4.  Security Considerations

   TBD

5.  IANA Considerations

   This memo includes no request to IANA.

6.  Appendix: Impact Analysis on Transmission Quality of IP Carried
    Softswitch Voice

   This section describes the impact on the transmission quality of
   softswitch voice when it is carried by IP, and the resulting
   requirements for IP bearer convergence time.

   1) Call Loss




   Call loss describes the circumstance in which a phone call
   initiated by a subscriber fails to be established due to network
   faults.  In a practical network, the call loss rate is mainly
   affected by the following factors:

   1.  Interfaces, including Nc, Mc and the interface between MSS and
       SG.

   2.  State machine message timer.  If a timeout takes place, the state
       machine releases signaling messages, producing a call loss.  The
       typical value of the BICC timer is 10~15 seconds and that of the
       DTAP timer is about 15 seconds.

   3.  Interface association timer.  Associations break off when the
       timer expires.

   4.  Bearer network convergence time.

   If the configured timer duration of a state machine is shorter than
   the timer duration of the interface association, then although the
   interface association may not be broken off, call loss may still
   occur due to message timer expiration.  If the association timer
   duration is shorter than the IP routing convergence time, the
   association is considered broken off by SCTP.  Message loss at the
   interface between MSS and SG and at interface Nc then results in
   massive call loss, and new call requests cannot be served because
   of the breakoff of interface Mc.  In this case, the call loss rate
   can be calculated as

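   Call Loss Rate = ( IP Convergence Time + Association Restoration
                    Time ) * CAPS / BHCA,

   where CAPS is the number of call attempts per second and BHCA is the
   number of busy hour call attempts.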


   However, if the association timer duration is longer than the IP
   routing convergence time, then the association is considered normal
   by SCTP, and data will be retransmitted.  Although this may cause
   buffer overflow leading to call loss, the call loss rate can be
   approximately zero if the buffer is big enough.
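
   The two cases above can be sketched as a small calculation.  The
   Python below is illustrative only: the function name and the sample
   traffic figures are hypothetical, the buffer overflow case is not
   modelled, and the case where the two durations are equal (which the
   text does not cover) is treated as if the association survives.

```python
def call_loss_rate(assoc_timer_s, ip_convergence_s,
                   assoc_restoration_s, caps, bhca):
    """Call loss rate for one IP interruption, per the text above.

    caps: call attempts per second; bhca: busy hour call attempts.
    Returns 0.0 when the association survives the interruption,
    ignoring possible buffer overflow.
    """
    if assoc_timer_s >= ip_convergence_s:
        # Association is kept; data is retransmitted.
        return 0.0
    # Association breaks off: calls attempted while the bearer
    # converges and the association is restored are lost.
    lost_calls = (ip_convergence_s + assoc_restoration_s) * caps
    return lost_calls / bhca


if __name__ == "__main__":
    # Hypothetical traffic: 100 CAPS sustained over the busy hour.
    caps, bhca = 100, 100 * 3600
    print(call_loss_rate(6.0, 10.0, 5.0, caps, bhca))  # breakoff
    print(call_loss_rate(6.0, 3.0, 5.0, caps, bhca))   # survives
```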

   From the analysis above and practical operational experience, the
   requirements for softswitch and IP bearer are as follows: the
   duration of the SCTP interface association timer should be shorter
   than that of the state machine message timer, and is further
   recommended to be no longer than 6 seconds in order to maintain
   detection sensitivity; the interruption duration of the IP bearer
   network should be as short as possible to avoid call loss during the
   IP layer interruption period, and is further recommended to be no
   longer than 5 seconds.




   2) Call Cut-off

   Call cut-off refers to the abnormal release of a phone call in
   progress for reasons other than intentional release by any of the
   parties involved in the call.  The call cut-off rate is related to:

   1.  Interfaces, including Nc and the interface between MSS and SG.

   2.  Interface association timer.

   3.  Bearer network convergence time.

   If the association timer duration is shorter than the IP routing
   convergence time, established phone calls will be released once
   interruption of interface Nc or of the interface connecting MSS and
   SG is detected.  In the case of association breakoff, the call
   cut-off rate can be calculated as

   Call Cut-off Rate = ( CAPS * Call Duration ) * Busy Hour Association
                       Breakoffs / BHCA.


   If the association is not interrupted, the call cut-off rate can be
   approximately zero.
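
   The cut-off formula above can be evaluated as in the following
   Python sketch.  The function name and the sample figures are
   hypothetical and serve only to show the arithmetic.

```python
def call_cutoff_rate(caps, call_duration_s,
                     busy_hour_breakoffs, bhca):
    """Busy hour call cut-off rate, following the formula above.

    caps * call_duration_s approximates the number of calls in
    progress at any moment; each association breakoff is taken to
    release all of them.
    """
    calls_in_progress = caps * call_duration_s
    return calls_in_progress * busy_hour_breakoffs / bhca


if __name__ == "__main__":
    # Hypothetical traffic: 100 CAPS, 60-second calls, one breakoff.
    print(call_cutoff_rate(100, 60, 1, 100 * 3600))
    # No breakoff in the busy hour gives a rate of zero.
    print(call_cutoff_rate(100, 60, 0, 100 * 3600))
```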

   In conclusion, the SCTP association should be maintained during IP
   layer interruption to avoid interface breakoff alerts.  The
   requirements for softswitch and IP bearer are the same as those
   related to call loss.

   3) Connection Delay

   The connection delay from call initiation by a calling party to the
   PLMN should be no longer than 4 seconds.  This delay is affected by
   the factors below:

   1.  RRC connection setup delay (irrelevant to whether the service is
       carried by IP or not).

   2.  Core network signaling interaction delay.  The number of messages
       at interface Nc/Nb is 6, and at interface Mc it is 8 (calling
       side) or 16 (called side, in case of IP-IP).  Each message has a
       delay of no longer than 50 milliseconds, so the calling message
       delay at interface Nc is no longer than 300 milliseconds.  If a
       long distance call is made through CMN, the message delay is
       increased by the transmission delay of about 5 microseconds per
       km and the CMN processing delay.  So the message delay is likely
       to be 400 milliseconds.




   3.  IP bearer network QoS and load.

   The connection delay is influenced by the delay criterion defined in
   the IP bearer network QoS, and is increased by delay, jitter and
   packet loss caused by network overload.  In addition, if the
   configured timer duration of the interface association is too long,
   SCTP becomes less responsive in retransmitting messages after packet
   loss, which increases connection delay.

   Connection delay is generally expressed as

   Connection Delay = IP convergence time + RRC connection setup delay
                      + Signaling Interaction Delay,


   and is no longer than 4 seconds.  So in the normal working state the
   IP network should be kept within a certain range of load to ensure
   that the per-message delay is shorter than 50 milliseconds, while in
   the interruption state the IP convergence time should be no longer
   than 3 seconds to ensure that the connection delay is shorter than 4
   seconds.
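
   The delay budget above can be checked with simple arithmetic.  The
   Python sketch below is illustrative only; the function name is
   hypothetical, and the RRC connection setup delay of 500 ms is an
   assumed figure, since the text gives no value for it.  The 3-second
   convergence time and the 400 ms signaling delay are taken from the
   text.

```python
CONNECTION_DELAY_LIMIT_MS = 4000   # limit stated in the text


def connection_delay_ms(ip_convergence_ms, rrc_setup_ms,
                        signaling_ms):
    """Connection delay as the sum of its three components."""
    return ip_convergence_ms + rrc_setup_ms + signaling_ms


if __name__ == "__main__":
    # 6 messages at interface Nc, each delayed by at most 50 ms.
    nc_delay_ms = 6 * 50
    print(nc_delay_ms)

    # rrc_setup_ms=500 is an assumption made for this example only.
    total = connection_delay_ms(ip_convergence_ms=3000,
                                rrc_setup_ms=500,
                                signaling_ms=400)
    print(total, total < CONNECTION_DELAY_LIMIT_MS)
```

   Under these assumptions a 3-second convergence time leaves the total
   just inside the 4-second limit, while a longer convergence time
   would exceed it.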

   From the analysis of IP/MPLS performance against the three criteria
   above, we suggest that the transmission interruption duration of an
   IP/MPLS network for softswitch service should be no longer than 3
   seconds.

7.  Acknowledgements

   The authors would like to thank Chris Donley and Melinda Shore for
   their kind help in content enrichment, and Christopher Liljenstolpe,
   Andrew Malis and Adrian Farrel for their helpful comments on the
   document.

8.  Informative References

   [G.841]    ITU-T Recommendation G.841, "Types and characteristics of
              SDH network protection architectures", October 1998.

   [HeavyReading]
              Bennett, G., "Resilience Reliability and OAM in Converged
              Network", Heavy Reading, Vol. 2, No. 6, February 2004.

   [IEEE802.1ag]
              IEEE Std 802.1ag-2007, "IEEE Standard for Local and
              metropolitan area networks, Virtual Bridged Local Area
              Networks, Amendment 5: Connectivity Fault Management",
              December 2007.




   [Y.1731]   ITU-T Recommendation Y.1731, "OAM Functions and Mechanisms
              for Ethernet based Networks", July 2011.

Authors' Addresses

   Peng Fan
   China Mobile
   32 Xuanwumen West Street, Xicheng District
   Beijing  100053
   P.R. China

   Email: fanpeng@chinamobile.com


   Lianyuan Li
   China Mobile
   32 Xuanwumen West Street, Xicheng District
   Beijing  100053
   P.R. China

   Email: lilianyuan@chinamobile.com