IPFIX Working Group                                            L. Peluso
Internet-Draft                                                  T. Zseby
Intended status: Informational                Fraunhofer Institute FOKUS
Expires: December 28, 2007                                  S. D'Antonio
                                         CINI Consortium/ITeM Laboratory
                                                           June 26, 2007


                       Flow selection Techniques
                 draft-peluso-flowselection-tech-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 28, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   Flow selection is the process of selecting a limited number of flows,
   out of all the flows accounted for at an Observation Point, to be
   considered in the measurement process chain.  The flow selection
   process can be enabled at different stages of the monitoring
   reference model, either by acting directly on the Metering Process
   after packet classification has been performed, i.e. flow state
   dependent packet



Peluso, et al.          Expires December 28, 2007               [Page 1]


Internet-Draft          Flow selection Techniques              June 2007


   sampling, or on the Exporting Process by limiting the number of flows
   to be stored and/or exported to the collector applications.  This
   document describes the motivations for performing flow selection and
   proposes a categorization of the related techniques.  Furthermore, it
   provides the basis for the definition of information models for
   configuring flow selection techniques.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


Table of Contents

   1.  Introduction
   2.  Scope
   3.  Terminology
     3.1.  General terminology
     3.2.  Selection process related terminology
   4.  Motivation
   5.  Flow selection techniques
     5.1.  Flow selection on flow record content
     5.2.  Flow selection on flow record arrival time
     5.3.  Flow selection on external events
   6.  Solutions for flow cache data structure
   7.  Information model for flow selection configuration
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1. Normative References
     11.2. Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements


1.  Introduction

   <Text for this section>


2.  Scope

   The main aim of this document is to describe and analyse the flow
   selection that can be performed inside an IPFIX Device.  This
   document does not deal with the flow selection that might result
   from sampling packets in the Metering Process before classification
   is performed.  Although that approach leads to a natural selection
   of flows, since only flows containing at least one sampled packet
   are generated by the classification process, packet sampling
   techniques are analysed extensively in [PSAMP-TECH] and are
   therefore outside the scope of this document.  Instead, this
   document describes the selection techniques that can be used to
   perform flow selection by acting directly at flow level within the
   Metering Process and/or the Exporting Process.


3.  Terminology

   The terminology used here is fully consistent with all terms listed
   in [IPFIX-ARCH] and [PSAMP-TECH] and includes additional terms
   required for the description of flow selection techniques.  For the
   sake of clarity, the definitions of the terms used in this document
   are reproduced below.

3.1.  General terminology

   * Observation Point

      An Observation Point is a location in the network where IP packets
      can be observed.  Examples include:

        (i)   a line to which a probe is attached;

        (ii)  a shared medium, such as an Ethernet-based LAN;

        (iii) a single port of a router, or a set of interfaces
              (physical or logical) of a router;

        (iv)  an embedded measurement subsystem within an interface.

      Note that every Observation Point is associated with an
      Observation Domain, and that one Observation Point may be a
      superset of several other Observation Points.  For example, one
      Observation Point can be an entire line card.  That would be the
      superset of the individual Observation Points at the line card's
      interfaces.

   * Observation Domain

      An Observation Domain is the largest set of Observation Points for
      which Flow information can be aggregated by a Metering Process.
      For example, a router line card may be an Observation Domain if it
      is composed of several interfaces, each of which is an Observation
      Point.  Each Observation Domain presents itself to the Collecting
      Process using an Observation Domain ID to identify the IPFIX
      Messages it generates.  Every Observation Point is associated with
      an Observation Domain.  It is recommended that Observation Domain
      IDs are also unique per IPFIX Device.

   * Observed Packet Stream

      The Observed Packet Stream is the set of all packets observed at
      the Observation Point.

   * IP Traffic Flow or Flow

      There are several definitions of the term 'flow' being used by the
      Internet community.  Within the context of IPFIX we use the
      following definition: A Flow is defined as a set of IP packets
      passing an Observation Point in the network during a certain time
      interval.  All packets belonging to a particular Flow have a set
      of common properties.  Each property is defined as the result of
      applying a function to the values of:

      1.  One or more packet header fields (e.g. destination IP
          address), transport header fields (e.g. destination port
          number), or application header fields;

      2.  One or more characteristics of the packet itself (e.g. number
          of MPLS labels);

      3.  One or more fields derived from packet treatment (e.g. next
          hop IP address, output interface).

      A packet is said to belong to a Flow if it completely satisfies
      all the defined properties of the Flow.  This definition covers
      the range from a Flow containing all packets observed at a network
      interface to a Flow consisting of just a single packet between two
      applications.  It includes packets selected by a sampling
      mechanism.

   * Flow Key

      Each of the fields which

      1.  Belong to the packet header (e.g. destination IP address);

      2.  Are a property of the packet itself (e.g. packet length);

      3.  Are derived from packet treatment (e.g. AS number)

      and which are used to define a Flow are termed Flow Keys.
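
   As an illustration of the Flow Key concept, the following Python
   sketch groups packets into Flows.  It assumes, purely for
   illustration, that the Flow Keys are the IP 5-tuple; the names
   FlowKey, Packet, flow_key and classify are hypothetical and are not
   part of any IPFIX specification.  A real Metering Process may also
   use packet properties or fields derived from packet treatment as
   Flow Keys, as listed above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical Flow Key set: the IP 5-tuple (header fields only)."""
    src_addr: str
    dst_addr: str
    protocol: int
    src_port: int
    dst_port: int


@dataclass
class Packet:
    """Hypothetical observed packet with the fields used in this sketch."""
    src_addr: str
    dst_addr: str
    protocol: int
    src_port: int
    dst_port: int
    length: int  # packet length in bytes (a measured property, not a key)


def flow_key(pkt: Packet) -> FlowKey:
    """Extract the Flow Key values from a packet."""
    return FlowKey(pkt.src_addr, pkt.dst_addr, pkt.protocol,
                   pkt.src_port, pkt.dst_port)


def classify(packets: Iterable[Packet]) -> Dict[FlowKey, int]:
    """Group packets into Flows and sum the bytes observed per Flow.

    Packets with identical values for every Flow Key belong to the same
    Flow; any difference in a key value yields a different Flow.
    """
    octets: Dict[FlowKey, int] = defaultdict(int)
    for pkt in packets:
        octets[flow_key(pkt)] += pkt.length
    return dict(octets)
```

   Two packets that share all five key values are merged into one Flow,
   while a packet differing in any key value starts a separate Flow.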

   * Flow Record

      A Flow Record contains information about a specific Flow that was
      observed at an Observation Point.  A Flow Record contains measured
      properties of the Flow (e.g. the total number of bytes for all the
      Flow's packets) and usually characteristic properties of the Flow
      (e.g. source IP address).

   * Metering Process

      The Metering Process generates Flow Records.  Inputs to the
      process are packet headers and characteristics observed at an
      Observation Point, and packet treatment at the Observation Point
      (for example the selected output interface).  The Metering Process
      consists of a set of functions that includes packet header
      capturing, timestamping, sampling, classifying, and maintaining
      Flow Records.  The maintenance of Flow Records may include
      creating new records, updating existing ones, computing Flow
      statistics, deriving further Flow properties, detecting Flow
      expiration, passing Flow Records to the Exporting Process, and
      deleting Flow Records.

   * Exporting Process

      An Exporting Process sends Flow Records to one or more Collecting
      Processes.  The Flow Records are generated by one or more Metering
      Processes.

   * Exporter

      A device which hosts one or more Exporting Processes is termed an
      Exporter.

   * IPFIX Device

      An IPFIX Device hosts at least one Exporting Process.  It may host
      further Exporting Processes and arbitrary numbers of Observation
      Points and Metering Processes.

   * Collecting Process

      A Collecting Process receives Flow Records from one or more
      Exporting Processes.  The Collecting Process might process or
      store received Flow Records, but such actions are out of scope for
      this document.

   * Collector

      A device which hosts one or more Collecting Processes is termed a
      Collector.

3.2.  Selection process related terminology

   In this section, additional terms are introduced that extend the
   terminology defined in [PSAMP-TECH].

   * Flow Selection Process

      A Flow Selection Process takes the set of accounted Flow Records
      as its input and selects a subset of that set as its output.

   * Flow Selection State

      A Flow Selection Process may maintain state information for its
      own use.  At a given time, the Flow Selection State may depend on
      the flows observed at and before that time, as well as on other
      variables.  Examples include:

        (i)   number of accounted flow records;

        (ii)  number of free entries available for flow recording;

        (iii) state of the pseudorandom number generators;

        (iv)  hash values calculated during selection.

   * Flow Selector

      A Flow Selector defines the action of a Flow Selection Process on
      a single flow of its input.  The Flow Selector can make use of the
      following information in determining whether a flow is selected:

        (i)   the content of the flow record;

        (ii)  any state information related to flow recording;

        (iii) any selection state that may be maintained by the Flow
              Selection Process.
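
   The three inputs listed above can be combined in a single decision
   function.  The following Python sketch is purely illustrative: the
   octet threshold, the doubling rule under memory pressure, the
   selection budget and all class and parameter names
   (ThresholdSelector, RecordingState, low_water, and so on) are
   hypothetical assumptions and are not defined by this document.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    """Hypothetical Flow Record: a Flow Key plus measured properties."""
    key: tuple
    packets: int
    octets: int


@dataclass
class RecordingState:
    """Hypothetical state information related to flow recording."""
    free_entries: int  # free entries left in the flow cache


@dataclass
class SelectionState:
    """Hypothetical Flow Selection State kept across decisions."""
    selected: int = 0  # Flow Records selected so far


@dataclass
class ThresholdSelector:
    """Illustrative Flow Selector using inputs (i), (ii) and (iii).

    (i)   record content: the record's octet count must reach a threshold;
    (ii)  recording state: the threshold doubles when few entries are free;
    (iii) selection state: at most `budget` records are selected in total.
    """
    min_octets: int
    budget: int
    low_water: int = 10  # assumed "nearly full" level of the flow cache
    state: SelectionState = field(default_factory=SelectionState)

    def select(self, rec: FlowRecord, rec_state: RecordingState) -> bool:
        threshold = self.min_octets
        if rec_state.free_entries < self.low_water:
            threshold *= 2  # be stricter under memory pressure
        if rec.octets < threshold or self.state.selected >= self.budget:
            return False
        self.state.selected += 1  # update the Flow Selection State
        return True
```

   Keeping the selection state inside the selector object makes each
   decision depend on earlier ones, as the definition of Flow Selection
   State allows.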


4.  Motivation

   As stated in [PSAMP-TECH], packet selection is in charge of selecting
   a representative subset of packets that allows accurate estimates of
   properties of the unsampled traffic to be formed.  Its main
   application is to perform some form of data reduction on observed
   Internet traffic in order to limit the processing overhead at
   measurement devices.  Although packet selection has proven effective
   at achieving this objective, the mechanism that steers the selection
   process is generally driven by a packet-based decision strategy.
   This means that the basic element on which the selection mechanism
   operates is the packet, and that the decision of which packets to
   select depends mainly on the packets themselves.  As a consequence,
   depending on the specific selection strategy adopted, packet
   selection may not take into account the possible impact of its
   decisions on subsequent measurement components, such as the flow
   recording and exporting processes, which instead operate on a
   higher-level representation, i.e. flows rather than packets.  From
   this perspective, flow selection differs from packet selection in
   that the basic element to which the selection process is applied is
   not a packet but a flow.  In IPFIX, this element is the Flow Record.

   In many networks, the distributions of the number of packets per
   flow and of the number of bytes per flow are heavy-tailed.  That
   means that most flows consist of only a small number of packets,
   while only a few flows have a large number of packets.  These few
   large flows contribute the majority of the overall traffic volume
   [DuLT01a], [DuLT01b].  This observation on the flow size
   distributions in Internet traffic is also referred to as the
   "Quasi-Zipf-Law" [KuXW04] or as the "elephant and mice phenomenon".
   The large flows are referred to as elephant flows or heavy hitters.
   Nevertheless, such observations depend on the flow definition in use
   and can change with the profile of future applications.  For several
   applications, it makes sense to select only the flows of interest.
   [more here].


5.  Flow selection techniques

   Figure 1 shows the IPFIX reference model as defined in [IPFIX-ARCH],
   and extends it in order to point out the functional components where
   flow selection can take place.  As previously mentioned, flow
   selection can be provided at different stages of the measurement
   chain.  One can act at packet level, within the metering process,
   and/or at flow level, by directly operating on the flow recording
   and/or exporting processes.

                       Packet(s) coming in to Observation Point(s)
                         |                                   |
                         v                                   v
        +----------------+-------------------------+   +-----+-------+
        |          Metering Process on an          |   |             |
        |             Observation Point            |   |             |
        |                                          |   |             |
        |   packet header capturing                |   |             |
        |        |                                 |...| Metering    |
        |   timestamping                           |   | Process N   |
        |        |                                 |   |             |
        |   packet selection                       |   |             |
        |        |                                 |   |             |
        |   classification                         |   |             |
        |        |                                 |   |             |
        |   flow state dependent sampling (*)      |   |             |
        |        |                                 |   |             |
        |   aggregation                            |   |             |
        |        |                                 |   |             |
        |   flow recording (*)                     |   |             |
        |        |                                 |   |             |
        |        |        Timing out Flows         |   |             |
        |        |    Handle resource overloads    |   |             |
        +--------|---------------------------------+   +-----|-------+
                 |                                           |
         Flow Records (selected by Observation Domain)  Flow Records
                 |                                           |
                 +----------------------+--------------------+
                                        |
                 +----------------------|---------------+
                 | Exporting Process    v               |
                 |      +---------------+-----------+   |
                 |      |     flow selection (*)    |   |
                 |      +---------------+-----------+   |
                 |                      |               |
                 |      +---------------+-----------+   |
                 |      |        flow export        |   |
                 |      +---------------+-----------+   |
                 |                      |               |
                 +----------------------+---------------+
                                        |
                                        v
                         IPFIX export packet to Collector

      (*) indicates where flow selection can take place.

                                 Figure 1

   Within the Metering Process, flow selection consists of accounting
   for only a subset of all the incoming packets observed at the
   Observation Point.  However, unlike the packet selection performed
   before packet classification, flow selection in the Metering Process
   selects only those incoming packets that satisfy certain conditions
   on the flow state information available from the flow recording
   process.  In accordance with [PSAMP-TECH], which introduces it as
   flow state dependent sampling, this kind of selection is regarded as
   a packet sampling technique.  The state of the stored Flow Records is
   thus taken into account during packet selection, so that the process
   responsible for generating or updating Flow Records can easily be
   influenced by selectively accounting for the packets that feed it.
   From this perspective, unlike the flow selection performed in the
   flow recording and exporting processes, this form of flow selection
   operates at a very early stage with regard to the concept of flow,
   as it acts at packet level.  In this way, one can prevent observed
   packets from forcing the flow recording process to account for, for
   instance, unrepresentative or unexpected Flow Records.

   As clarified above, the flow selection that might be provided in the
   flow recording and/or exporting processes is performed at flow
   level, i.e. after packets have been classified into the
   corresponding flows.  More precisely, flow selection can be carried
   out in the flow recording process either by storing new Flow Records
   only when enough resources are available at the monitoring device to
   maintain them, or by discarding already accounted Flow Records that,
   under certain circumstances and at a certain point in time, may no
   longer be considered representative.  Finally, in the Exporting
   Process it might be required that only a subset of the stored Flow
   Records available for export is actually sent to the Collectors.
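
   A minimal Python sketch of the selection points described above,
   assuming a flow cache with a fixed number of entries: whether a
   packet is accounted depends on the state of the stored Flow Records,
   new Flow Records are created only while free entries remain, and the
   Exporting Process sends at most a configured number of records.  The
   capacity limit, the largest-first export policy and all names
   (BoundedFlowCache, select_for_export) are hypothetical illustrations,
   not mechanisms specified in this document.

```python
from typing import Dict, Hashable, List, Tuple


class BoundedFlowCache:
    """Hypothetical flow cache with a fixed number of entries.

    Packets of flows that already have a Flow Record are always
    accounted; packets of new flows are accounted only while a free
    entry exists.  The accounting decision therefore depends on the
    state of the stored Flow Records (flow state dependent sampling),
    and new records are stored only if resources are available.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.octets: Dict[Hashable, int] = {}  # flow key -> bytes
        self.discarded_packets = 0

    def account(self, key: Hashable, length: int) -> bool:
        """Return True if the packet was accounted, False otherwise."""
        if key in self.octets:
            self.octets[key] += length  # update an existing Flow Record
            return True
        if len(self.octets) < self.capacity:
            self.octets[key] = length  # create a new Flow Record
            return True
        self.discarded_packets += 1  # new flow, but no free entry
        return False


def select_for_export(records: Dict[Hashable, int],
                      max_export: int) -> List[Tuple[Hashable, int]]:
    """Export at most max_export records, largest first (one possible policy)."""
    ranked = sorted(records.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:max_export]
```

   Favouring packets of already recorded flows is only one possible
   policy; the same structure could, for instance, evict records that
   are no longer considered representative instead of discarding
   packets of new flows.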

   We can distinguish the following selection techniques:

   1.  based on flow record content (i.e. all reported flow
       characteristics);

   2.  based on flow record arrival time;

   3.  based on external events like the exhaustion of local resources.
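
   The following Python sketch illustrates how the three categories
   differ in the information each selection decision uses.  The
   concrete rules are hypothetical examples only: an octet threshold
   for content-based selection, a periodic time window (similar in
   spirit to the systematic time-based sampling of [PSAMP-TECH]) for
   arrival-time based selection, and an assumed resource-exhaustion
   flag for event-based selection.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FlowRecord:
    key: str
    octets: int     # reported flow characteristic
    arrival: float  # arrival time at the Flow Selection Process, in s


Predicate = Callable[[FlowRecord], bool]


def by_content(min_octets: int) -> Predicate:
    """1. Content-based: decide on reported flow characteristics."""
    return lambda rec: rec.octets >= min_octets


def by_arrival_time(period: float, on: float) -> Predicate:
    """2. Arrival-time based: keep records arriving in the first `on`
    seconds of every `period` seconds, regardless of their content."""
    return lambda rec: (rec.arrival % period) < on


def by_external_event(exhausted: Callable[[], bool]) -> Predicate:
    """3. External-event based: drop records while an external condition
    (here, an assumed resource-exhaustion signal) holds."""
    return lambda rec: not exhausted()


def select(records: List[FlowRecord], pred: Predicate) -> List[str]:
    """Return the keys of the records the predicate selects."""
    return [rec.key for rec in records if pred(rec)]
```

   Only the first predicate inspects the record content; the second
   looks only at arrival time and the third only at a condition outside
   the record.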

5.1.  Flow selection on flow record content

   <Text for this section>

5.2.  Flow selection on flow record arrival time

   <Text for this section>

5.3.  Flow selection on external events

   <Text for this section>


6.  Solutions for flow cache data structure

   The flow cache is the component of the flow monitoring system that
   is in charge of storing Flow Records, i.e. the data structures that
   contain the values of predefined metrics related to every observed
   flow.  The efficiency of the flow cache strongly affects the overall
   performance of the flow monitoring system.  This is the most
   challenging component, as it has to search for the Flow Records and
   update the related metrics within the packet interarrival time.
   Elements in the flow cache can be ordered according to a Least
   Recently Used (LRU) algorithm: as a packet arrives at the network
   interface, it is classified, i.e. a flow ID is computed and assigned
   to it.  Solutions for the generation of flow IDs and search
   mechanisms for Flow Records within the flow cache are described in
   [MoCD06].  If a corresponding Flow Record, i.e. a record with that
   flow ID, already exists in the LRU list (a linked list), it is
   updated with packet-related data and moved to the head of the list.
   Otherwise, a new Flow Record is created and inserted at the head of
   the list.  This ordering algorithm addresses two issues.  First,
   timed-out flows can easily be identified by scanning the list from
   the tail and checking, for each record, whether the difference
   between the current time and the last update time exceeds the
   timeout value; since records are ordered by last update time, the
   scan can stop at the first record that has not timed out.  Second,
   records related to active flows transporting a lot of traffic, the
   so-called elephant flows, are frequently moved to the head of the
   list.  Therefore, data about such flows can be found with high
   probability by scanning the LRU list from the head.
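
   The LRU ordering described above can be sketched in Python as
   follows.  The sketch assumes that Python's collections.OrderedDict
   stands in for the linked list: its last position plays the role of
   the list head and its first position the role of the tail, and
   move_to_end() moves an updated record to the head.  Flow IDs are
   treated as opaque hashable values, since their generation (see
   [MoCD06]) is outside the scope of the sketch, and timestamps are
   assumed to be non-decreasing.  The class and method names are
   hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Hashable, List


@dataclass
class FlowRecord:
    packets: int
    octets: int
    last_update: float  # time of the last packet, in seconds


class LRUFlowCache:
    """Flow cache ordered by recency of use (head = most recent)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.records: "OrderedDict[Hashable, FlowRecord]" = OrderedDict()

    def update(self, flow_id: Hashable, length: int, now: float) -> None:
        rec = self.records.get(flow_id)
        if rec is None:
            # No record with this flow ID: create one at the head.
            self.records[flow_id] = FlowRecord(1, length, now)
        else:
            # Existing record: update its metrics and move it to the head.
            rec.packets += 1
            rec.octets += length
            rec.last_update = now
            self.records.move_to_end(flow_id)

    def expire(self, now: float) -> List[Hashable]:
        """Remove timed-out flows, scanning from the tail."""
        expired = []
        while self.records:
            flow_id, rec = next(iter(self.records.items()))  # tail record
            if now - rec.last_update <= self.timeout:
                break  # every remaining record was updated more recently
            del self.records[flow_id]
            expired.append(flow_id)
        return expired

    def head(self, n: int) -> List[Hashable]:
        """Return up to n flow IDs, starting from the head."""
        return list(reversed(self.records))[:n]
```

   Because expire() stops at the first record that has not timed out,
   its cost is proportional to the number of expired records rather
   than to the size of the cache.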


7.  Information model for flow selection configuration

   This section describes the representative parameters of the flow
   selection techniques presented above.  In this regard, it provides
   the basis for an information model to be adopted for configuring the
   Flow Selection Process at an IPFIX Device.


8.  IANA Considerations

   This document makes no request of IANA.


9.  Security Considerations

   <Text for this section>


10.  Acknowledgements

   <Text for this section>


11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

11.2.  Informative References

   [DuLT01a]  Duffield, N., Lund, C., and M. Thorup, "Charging from
              Sampled Network Usage", ACM Internet Measurement Workshop
              IMW 2001, San Francisco, USA, November 2001.

   [DuLT01b]  Duffield, N., Lund, C., and M. Thorup, "Properties and
              Prediction of Flow Statistics from Sampled Packet
              Streams", ACM SIGCOMM Internet Measurement Workshop 2002,
              November 2002.

   [DuLT01c]  Duffield, N., Lund, C., and M. Thorup, "Learn More, Sample
              Less: Control of Volume and Variance in Network
              Measurement", IEEE Transactions on Information Theory,
              May 2005.

   [DuLT01d]  Duffield, N., Lund, C., and M. Thorup, "Flow Sampling
              under Hard Resource Constraints", ACM IFIP Conference on
              Measurement and Modeling of Computer Systems SIGMETRICS,
              June 2004.

   [EsVa01]   Estan, C. and G. Varghese, "New Directions in Traffic
              Measurement and Accounting: Focusing on the Elephants,
              Ignoring the Mice", ACM SIGCOMM Internet Measurement
              Workshop 2001, San Francisco, USA, November 2001.

   [FeGL98]   Feldmann, A., Rexford, J., and R. Caceres, "Efficient
              Policies for Carrying Web Traffic over Flow-Switched
              Networks", IEEE/ACM Transactions on Networking,
              December 1998.

   [IPFIX-ARCH]
              Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
              "Architecture for IP Flow Information Export", Internet
              Draft draft-ietf-ipfix-architecture-12.txt, work in
              progress, September 2006.

   [IPFIX-INFO]
              Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
              Meyer, "Information Model for IP Flow Information Export",
              Internet Draft draft-ietf-ipfix-info-15.txt, work in
              progress, February 2007.

   [KuXW04]   Kumar, A., Xu, J., Wang, J., Spatscheck, O., and L. Li,
              "Space-Code Bloom Filter for Efficient Per-Flow Traffic
              Measurement", INFOCOM 2004, Twenty-third Annual Joint
              Conference of the IEEE Computer and Communications
              Societies, March 2004.

   [MoCD06]   Molina, M., Chiosi, A., D'Antonio, S., and G. Ventre,
              "Design principles and algorithms for effective high-speed
              IP flow monitoring", September 2006.

   [Moli03a]  Molina, M., "A scalable and efficient methodology for flow
              monitoring in the Internet", International Teletraffic
              Congress (ITC-18), Berlin, September 2003.

   [PSAMP-TECH]
              Zseby, T., Molina, M., Raspall, F., Duffield, N., and S.
              Niccolini, "Sampling and Filtering techniques for IP
              Packet Selection", Internet Draft
              draft-ietf-psamp-sample-tech-10.txt, work in progress,
              June 2007.


Authors' Addresses

   Lorenzo Peluso
   Fraunhofer Institute FOKUS
   Kaiserin-Augusta-Allee 31
   Berlin  10589
   Germany

   Phone: +49 30 3463 7171
   Email: lpeluso@fokus.fraunhofer.de


   Tanja Zseby
   Fraunhofer Institute FOKUS
   Kaiserin-Augusta-Allee 31
   Berlin  10589
   Germany

   Phone: +49 30 3463 7153
   Email: zseby@fokus.fraunhofer.de


   Salvatore D'Antonio
   CINI Consortium/ITeM Laboratory
   Monte S.Angelo, Via Cinthia
   Napoli  80126
   Italy

   Phone: +39 081 679944
   Email: saldanto@unina.it


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).
