IPFIX Working Group                                         G. Cheng
Internet Draft                                               J. Gong
Intended status: Standards Track                            W. Zhang
Expires: Dec 23,2011                                           H. Wu
                                                Southeast University
                                                       June 22, 2011


                   A Composite IP Packet Selector
              draft-cheng-ipfix-packet-selector-00.txt

Abstract

This document specifies a composite IP packet selector for the
Metering Process of the IP Flow Information Export (IPFIX) protocol.
The composite selector combines a sampling selector, which uses a
systematic or random sampling technique, with a subsequent hash-based
filtering selector, which computes a hash function over the 5-tuple
flow information (source and destination IP address, source and
destination port number, and transport protocol). By taking flow
sampling into account during packet selection, the composite selector
mitigates the loss of short flows that occurs with a simple
systematic or random sampling selector.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time.  It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts is at
http://datatracker.ietf.org/drafts/current/.

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on December 22, 2011.



Cheng, et al              Expires December 23, 2011            [Page 1]Internet-Draft         A Composite IP Packet Selector         June 2011


Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Terminology
3. Composite selector
   3.1. Architecture
   3.2. Simple sampling selector
   3.3. Hash-based filtering selector
   3.4. Algorithm
   3.5. Hash Function
4. Formal Syntax
5. Security Considerations
References
Acknowledgments
Author's Addresses

1. Introduction

As network data rates increase and the need for fine-grained traffic
measurement grows, sustained capture of network traffic at line rate
is difficult to perform even with expensive specialized measurement
hardware. Therefore, some form of data reduction at the point of
measurement is necessary. This can be achieved by intelligent packet
selection through Sampling or Filtering, as well as by the use of
aggregation techniques. The motivation for Sampling is to select a
representative subset of packets that allows accurate estimates of
properties of the unsampled traffic. The motivation for Filtering is
to remove all packets that are not of interest. The motivation for
aggregation is to combine data and allow compact pre-defined views of
the traffic. Flow-based IP traffic measurement combines packet
selection and aggregation techniques to capture network traffic at
line rate on backbone links.

The IPFIX working group gives a brief description of the systematic
and random sampling techniques used for packet selection in the
metering process (Section 5.2 of RFC 3917). With good use of packet
sampling, these techniques can efficiently reduce the amount of data
captured at the observation point. However, simple sampling
techniques have an inherent disadvantage in the capture of short
flows. When each packet is selected with equal probability, long
flows with a large number of packets are more likely to be captured
than short flows with a relatively small number of packets.
Therefore, at a low sampling probability, simple sampling techniques
may lead to serious loss of short flows in flow-based IP traffic
measurements. Furthermore, short flows are often produced by
anomalous network events such as DDoS attacks. In short, simple
sampling techniques have a high loss rate for short flows, which
could otherwise be used to detect and analyze anomalous network
events.
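
The effect can be illustrated with a small calculation. The sketch
below rests on an assumption that this document does not state
explicitly: each packet is selected independently with the same
probability p. Under that assumption, a flow of n packets is missed
entirely with probability (1 - p)^n. The function name and the value
of p are illustrative only.

```python
def flow_miss_probability(p: float, n: int) -> float:
    """Probability that none of the n packets of a flow is sampled.

    Assumes (for illustration only) that each packet is selected
    independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.01  # illustrative sampling probability (assumption)
    for n in (1, 3, 10, 1000):
        miss = flow_miss_probability(p, n)
        print(f"flow of {n:5d} packets: missed with probability {miss:.4f}")
```

With p = 0.01, a 3-packet flow is missed with probability of about
0.97, whereas a 1000-packet flow almost always contributes at least
one sampled packet.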

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC-2119.

2. Terminology

The terminology used here is fully consistent with the terms defined
in [RFC5474] and [RFC5475], but includes additional terms required
for the description of the specific filtering selector.

In addition, this document defines the following terms:

* Filtering: A filter is a Selector that selects a packet
deterministically based on the Packet Content, or its treatment, or
functions of these occurring in the Selection State. Two examples
are:

(i) Property Match Filtering: A packet is selected if a specific
field in the packet equals a predefined value.

(ii) Hash-based Selection: A Hash Function is applied to the Packet
Content, and the packet is selected if the result falls in a
specified range.

* Sampling: A Selector that is not a filter is called a Sampling
operation. This reflects the intuitive notion that if the selection
of a packet cannot be determined from its content alone, there must
be some type of Sampling taking place. Sampling operations can be
divided into two subtypes:

(i) Content-independent Sampling, which does not use Packet Content
in reaching Sampling decisions. Examples include systematic Sampling,
and uniform pseudorandom Sampling driven by a pseudorandom number
whose generation is independent of Packet Content. Note that in
content independent Sampling, it is not necessary to access the
Packet Content in order to make the selection decision.

(ii) Content-dependent Sampling, in which the Packet Content is used
in reaching selection decisions. An application is pseudorandom
selection according to a probability that depends on the contents of
a packet field, e.g., Sampling packets with a probability dependent
on their TCP/UDP port numbers. Note that this is not a Filter.

* Hash Domain: A Hash Domain is a subset of the Packet Content and
the packet treatment, viewed as an N-bit string for some positive
integer N.

* Hash Range: A Hash Range is a set of M-bit strings for some
positive integer M that defines the range of values that the result
of the hash operation can take.

* Hash Function: A Hash Function defines a deterministic mapping
from the Hash Domain into the Hash Range.

* Hash Selection Range: A Hash Selection Range is a subset of the
Hash Range. The packet is selected if the action of the Hash
Function on the Hash Domain for the packet yields a result in the
Hash Selection Range.

* Hash-based Selection: A Hash-based Selection is Filtering
specified by a Hash Domain, a Hash Function, a Hash Range, and a
Hash Selection Range.
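
The relationship between the hash-related terms above can be
sketched as follows. This is a minimal illustration, not a normative
definition: the 32-bit Hash Range, the use of CRC-32 as the Hash
Function, and all function names are assumptions made only for this
example.

```python
import zlib

M_BITS = 32                     # Hash Range: all 32-bit values (assumption)
HASH_RANGE_SIZE = 1 << M_BITS


def hash_function(hash_domain: bytes) -> int:
    """Deterministic mapping from the Hash Domain into the Hash Range.

    CRC-32 is only a stand-in chosen because it ships with Python.
    """
    return zlib.crc32(hash_domain) & 0xFFFFFFFF


def make_selection_range(fraction: float) -> range:
    """Hash Selection Range: the lowest `fraction` of the Hash Range."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return range(0, int(fraction * HASH_RANGE_SIZE))


def hash_based_selection(hash_domain: bytes, selection_range: range) -> bool:
    """Select the packet if its hash value lies in the Hash Selection Range."""
    return hash_function(hash_domain) in selection_range
```

Because the mapping is deterministic, the same Hash Domain bytes
always lead to the same selection decision, which is what makes
Hash-based Selection a Filter rather than a Sampling operation.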

* Observed Packet Stream: The Observed Packet Stream is the set of
all packets observed at the Observation Point.

* Selected Packet Stream: A Selected Packet Stream denotes a set of
packets from the Observed Packet Stream that flows past some
specified point within the Metering Process. An example of a
Selected Packet Stream is the output of the selection process. Note
that packets selected from a stream, e.g., by Sampling, do not
necessarily possess a property by which they can be distinguished
from packets that have not been selected. For this reason, the term
"stream" is favored over "flow", which is defined as a set of
packets with common properties [RFC3917].

* Non Selected Packet Stream: A Non Selected Packet Stream denotes a
set of packets from the Observed Packet Stream that does not flow
past some specified point within the Metering Process. An example of
a Non Selected Packet Stream is the stream of packets dropped by the
selection process. Note that packets not selected from a stream
likewise do not necessarily possess a property by which they can be
distinguished from packets that have been selected.

* Packet Content: The Packet Content denotes the union of the packet
header (which includes link layer, network layer, and other
encapsulation headers) and the packet payload. At some Observation
Points, the link header information may not be available.

* 5-tuple flow information: Basic information in the packet header:
source IP address, destination IP address, source port number,
destination port number, and transport protocol.


3. Composite selector

The composite selector aims to reduce the high loss rate in the
capture of short flows that occurs with simple sampling techniques.
It combines a sampling selector, which uses a systematic or random
sampling technique, with a subsequent hash-based filtering selector,
which computes a hash function on the 5-tuple flow information. This
section describes its architecture and each component in detail.


3.1. Architecture

            +----------------------------------------+
            | +--------+                             |
            | |        |---Selected Packet Stream ----->
            | |        |                             |
            | | simple |                             |
            | |sampling| Non       +----------+      |
   Observed | |selector| Selected  |hash-based|      |  Selected
   Packet---->|        |-Packet--> |filtering |-------> Packet
   Stream   | |        | Stream    |selector  |      |  Stream
            | +--------+           +----------+      |
            |     Composite Selector                 |
            +----------------------------------------+
          Figure 1: Architecture of A Composite Selector

The composite selector is composed of two cascaded selectors: a
simple sampling selector followed by a specific hash-based filtering
selector. The latter takes the non selected packet stream of the
former as its input.

In the first stage, the sampling selector uses a simple systematic
or random sampling technique to select packets from the observed
packet stream. If a packet is selected, it is exported; otherwise,
it is forwarded to the filtering selector.

In the second stage, the filtering selector computes a hash function
on the 5-tuple flow information of each packet coming from the
sampling selector, and selects each packet whose hash key matches
one of the predefined patterns.

The input of the composite selector is the observed packet stream,
while the output consists of two parts. One is the packet stream
selected in the first stage; the other is the stream of packets not
selected in the first stage but selected in the second stage.
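
The two-stage cascade described above can be sketched as a small
generator. This is an illustrative outline only: the names
composite_select, sampler, and hash_filter are hypothetical, and both
stages are passed in as plain predicates so that any sampling or
filtering technique can be plugged in.

```python
from typing import Callable, Iterable, Iterator, Tuple

Packet = bytes
Selector = Callable[[Packet], bool]

STAGE_SAMPLING = 1   # packet selected by the simple sampling selector
STAGE_FILTERING = 2  # packet selected by the hash-based filtering selector


def composite_select(
    observed: Iterable[Packet],
    sampler: Selector,
    hash_filter: Selector,
) -> Iterator[Tuple[Packet, int]]:
    """Yield (packet, stage) for every packet in the combined output.

    Packets selected by the sampler are output directly. Only packets
    the sampler did not select are offered to the hash-based filter.
    Packets rejected by both stages are dropped.
    """
    for packet in observed:
        if sampler(packet):
            yield packet, STAGE_SAMPLING
        elif hash_filter(packet):
            yield packet, STAGE_FILTERING
```

The stage number is returned only to make the two parts of the output
visible in the example; an implementation would not need to expose
it.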


3.2. Simple sampling selector

A sampling selector is targeted at the selection of a representative
subset of packets. The subset is used to infer knowledge about the
whole set of observed packets without processing them all. The
selection can depend on packet position, and/or on Packet Content,
and/or on (pseudo) random decisions.

Because the sampling selector used here is the same as the one
described by the IPFIX working group in RFC 3917, this document does
not describe it further.
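
For orientation only, the two simple techniques can be sketched as
follows. The class names, the 1-in-N form of systematic sampling, and
the parameter names are assumptions made for this example and are not
defined by this document.

```python
import random
from typing import Optional


class OneInNSampler:
    """Systematic count-based sampling: select one packet out of every n."""

    def __init__(self, n: int, offset: int = 0) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.offset = offset % n
        self.position = 0  # number of packets seen so far

    def __call__(self, packet: bytes) -> bool:
        selected = self.position % self.n == self.offset
        self.position += 1
        return selected


class UniformProbabilisticSampler:
    """Random sampling: select each packet independently with probability p."""

    def __init__(self, p: float, rng: Optional[random.Random] = None) -> None:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, packet: bytes) -> bool:
        # Content-independent: the packet bytes are never inspected.
        return self.rng.random() < self.p
```

Neither sampler looks at the Packet Content, which matches the notion
of content-independent Sampling in Section 2.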

3.3. Hash-based filtering selector

A normal hash-based filtering selector uses a Hash Function h to map
the Packet Content c, or some portion of it, onto a Hash Range R.
The packet is selected if h(c) is an element of S, which is a subset
of R called the Hash Selection Range.

To reduce the high loss rate in the capture of short flows, the
hash-based filtering selector here should take flow sampling into
account in packet filtering. That is, it computes the Hash Function
over the 5-tuple flow information.

3.4. Algorithm

First, the algorithm predefines a pattern set, that is, a set of one
or more patterns, where each pattern defines a range of hash values.

On receiving a packet, the filtering selector computes the hash key
of the packet's 5-tuple.

Then, it selects the packet if the hash key matches any pattern in
the set.
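
The three steps above can be sketched as follows. This is a minimal
illustration under several assumptions: the 5-tuple is taken to be
source IP address, destination IP address, source port, destination
port, and transport protocol; each pattern is modelled as an
inclusive range of hash keys; and CRC-32 stands in for the hash
function. All names are hypothetical.

```python
import ipaddress
import struct
import zlib
from typing import NamedTuple, Sequence, Tuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


Pattern = Tuple[int, int]  # inclusive (low, high) range of hash keys


def five_tuple_key(ft: FiveTuple) -> int:
    """Hash key of a packet's 5-tuple (CRC-32 is only a stand-in)."""
    data = (
        ipaddress.ip_address(ft.src_ip).packed
        + ipaddress.ip_address(ft.dst_ip).packed
        + struct.pack("!HHB", ft.src_port, ft.dst_port, ft.protocol)
    )
    return zlib.crc32(data) & 0xFFFFFFFF


def matches_any(key: int, patterns: Sequence[Pattern]) -> bool:
    """True if the key falls inside at least one pattern of the set."""
    return any(low <= key <= high for low, high in patterns)


def filter_select(ft: FiveTuple, patterns: Sequence[Pattern]) -> bool:
    """Select the packet if its 5-tuple hash key matches any pattern."""
    return matches_any(five_tuple_key(ft), patterns)
```

Because the key depends only on the 5-tuple, all packets of a flow
that reach the filtering selector receive the same decision, so the
filter selects or rejects whole flows rather than individual packets.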

3.5. Hash Function

Because Hash-based Selection is applied, the BOB function MUST be
used for packet selection operations in order to be compliant with
PSAMP (RFC 5475).

If a Hash-based Selection with the BOB function is used with IPv4
traffic, the following input bytes MUST be used.
   - IP identification field
   - Flags field
   - Fragment offset
   - Source IP address
   - Destination IP address
   - A configurable number of bytes from the IP payload, starting at
     a configurable offset
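
How these input bytes might be gathered from a raw IPv4 packet is
sketched below. The sketch only assembles the hash input; it does not
implement the BOB function. The function name, the parameter names,
and the default of 8 payload bytes are assumptions made for this
example.

```python
def ipv4_hash_input(
    packet: bytes, payload_offset: int = 0, payload_len: int = 8
) -> bytes:
    """Concatenate the IPv4 fields listed above into one byte string.

    `packet` is expected to start at the first byte of the IPv4 header.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < 20 or len(packet) < header_len:
        raise ValueError("malformed IPv4 header")

    identification = packet[4:6]
    flags_and_fragment_offset = packet[6:8]  # 3 flag bits + 13 offset bits
    source = packet[12:16]
    destination = packet[16:20]

    start = header_len + payload_offset
    payload = packet[start:start + payload_len]  # may be shorter than asked

    return (identification + flags_and_fragment_offset
            + source + destination + payload)
```

The Flags and Fragment offset fields share bytes 6 and 7 of the IPv4
header, so they are copied together.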

Due to the lack of suitable IPv6 packet traces, all candidate Hash
Functions in RFC5476 were evaluated only for IPv4. Due to the IPv6
header fields and address structure, it is expected that there is
less randomness in IPv6 packet headers than in IPv4 headers.
Nevertheless, the randomness of IPv6 traffic has not yet been
evaluated sufficiently to get any evidence. In addition to this,
IPv6 traffic profiles may change significantly in the future when
IPv6 is used by a broader community.

If a Hash-based Selection with the BOB function is used with IPv6
traffic, the following input bytes MUST be used.
   - Payload length (2 bytes)
   - Byte numbers 10, 11, 14, 15, and 16 of the IPv6 source address
   - Byte numbers 10, 11, 14, 15, and 16 of the IPv6 destination
     address
   - A configurable number of bytes from the IP payload, starting at
     a configurable offset. It is recommended to use at least 4 bytes
     from the IP payload.
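
A corresponding sketch for IPv6 follows. It makes two assumptions
that the text does not state: the byte numbers are counted from 1
(the list goes up to 16 for a 16-byte address), and the payload is
taken to start directly after the 40-byte fixed header, so extension
headers are not parsed. The function and parameter names are
hypothetical, and the BOB function itself is not implemented.

```python
ADDRESS_BYTE_NUMBERS = (10, 11, 14, 15, 16)  # assumed to be 1-based


def ipv6_hash_input(
    packet: bytes, payload_offset: int = 0, payload_len: int = 4
) -> bytes:
    """Concatenate the IPv6 fields listed above into one byte string.

    `packet` is expected to start at the first byte of the IPv6 header.
    """
    if len(packet) < 40 or packet[0] >> 4 != 6:
        raise ValueError("not an IPv6 packet")

    payload_length = packet[4:6]
    source = packet[8:24]
    destination = packet[24:40]

    source_part = bytes(source[n - 1] for n in ADDRESS_BYTE_NUMBERS)
    destination_part = bytes(destination[n - 1] for n in ADDRESS_BYTE_NUMBERS)

    start = 40 + payload_offset  # simplification: no extension headers
    payload = packet[start:start + payload_len]

    return payload_length + source_part + destination_part + payload
```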

The payload itself does not change along the path. Even if some
routers process some extension headers, they do not strip them from
the packet. Therefore, the payload length is invariant along the
path. Furthermore, it usually differs between packets. The IPv6
address has 16 bytes. The first part is the network part and shows
low variation. The second part is the host part and shows higher
variation. Therefore, the second part of the address is used.
Nevertheless, the uniformity has not been checked for IPv6 traffic.


4. Formal Syntax

The following syntax specification uses the augmented Backus-Naur
Form (BNF) as described in RFC-2234 [2].

5. Security Considerations

Security considerations concerning the choice of a Hash Function for
Hash-based Selection are discussed in RFC 5475. That discussion
covers a number of potential attacks that craft Packet Streams that
are disproportionately detected and/or discover the Hash Function
parameters, the vulnerabilities of different Hash Functions to these
attacks, and practices to minimize these vulnerabilities.

In addition to this, a user can gain knowledge about the start and
stop triggers in time-based systematic Sampling, e.g., by sending
test packets. This knowledge might allow users to modify their send
schedule in a way that their packets are disproportionately selected
or not selected.

For random Sampling, a cryptographically strong random number
generator should be used in order to prevent an adversary from
predicting the selection decision.
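
As an illustration of this recommendation, the sketch below draws
the selection decision from the operating system's cryptographic
random source through Python's standard secrets module. The function
name and the 32-bit resolution are assumptions made for this example.

```python
import secrets

_RESOLUTION = 1 << 32  # granularity of the probability (assumption)


def secure_random_select(p: float) -> bool:
    """Select a packet with probability p using a CSPRNG.

    secrets.randbelow(n) returns an unpredictable integer in [0, n),
    so past decisions do not help an observer predict future ones.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    threshold = int(p * _RESOLUTION)
    return secrets.randbelow(_RESOLUTION) < threshold
```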

Further security threats can occur when Sampling parameters are
configured or communicated to other entities. The configuration and
reporting of Sampling parameters are out of scope of this document.
Therefore, the security threats that originate from this kind of
communication cannot be assessed with the information given in this
document.

Some of these threats can probably be addressed by keeping
configuration information confidential and by authenticating
entities that configure Sampling. Nevertheless, a full analysis and
assessment of threats for configuration and reporting has to be done
if configuration or reporting methods are proposed.


References

[1] Bradner, S., "The Internet Standards Process -- Revision 3",
BCP 9, RFC 2026, October 1996.
[2] Crocker, D., Ed., and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, Internet Mail Consortium and Demon
Internet Ltd, November 1997.
[3] Quittek, J., Zseby, T., Claise, B., and S. Zander, "Requirements
for IP Flow Information Export (IPFIX)", RFC 3917, October 2004.
[4] Duffield, N., Ed., "A Framework for Packet Selection and
Reporting", RFC 5474, March 2009.
[5] Zseby, T., Molina, M., Duffield, N., Niccolini, S., and F.
Raspall, "Sampling and Filtering Techniques for IP Packet
Selection", RFC 5475, March 2009.
[6] Claise, B., Ed., "Packet Sampling (PSAMP) Protocol
Specifications", RFC 5476, March 2009.
[7] Sekar, V., Reiter, M., and H. Zhang, "Revisiting the Case for a
Minimalist Approach for Network Flow Monitoring", Proc. IMC,
November 2010.

Acknowledgments

This work is materially supported by the National Key Technology
Program of China under Grant No. 2008BAH37B04, the National Grand
Fundamental Research 973 Program of China under Grant No.
2009CB320505, and the National Natural Science Foundation of China
under Grant No. 60973123.


Author's Addresses


Guang Cheng
School of Computer Science and Engineering
Southeast University
Sipailou No.2, Nanjing, P.R.China
Phone: +86 25 83794000
Email: gcheng@njnet.edu.cn

Jian Gong
School of Computer Science and Engineering
Southeast University
Sipailou No.2, Nanjing, P.R.China
Phone: +86 25 83794000
Email: jgong@njnet.edu.cn

Weiwei Zhang
School of Computer Science and Engineering
Southeast University
Sipailou No.2, Nanjing, P.R.China
Phone: +86 25 83794000
Email: wwzhang@njnet.edu.cn

Hua Wu
School of Computer Science and Engineering
Southeast University
Sipailou No.2, Nanjing, P.R.China
Phone: +86 25 83794000
Email: hwu@njnet.edu.cn
