IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                             Hitachi Europe
Expires: January 15, 2009                                  July 14, 2008


                     IP Flow Anonymisation Support
                     draft-boschi-ipfix-anon-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 15, 2009.

Abstract

   This document describes anonymisation techniques for IP flow data.
   It provides a categorization of common anonymisation schemes and
   defines the parameters needed to describe them.  It describes support
   for anonymization within the IPFIX protocol, providing the basis for
   the definition of information models for configuring anonymisation
   techniques within an IPFIX Metering or Exporting Process, and for
   reporting the technique in use to an IPFIX Collecting Process.








Boschi & Trammell       Expires January 15, 2009                [Page 1]


Internet-Draft        IP Flow Anonymisation Support            July 2008


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  IPFIX Protocol Overview  . . . . . . . . . . . . . . . . .  3
     1.2.  IPFIX Documents Overview . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Categorisation of Anonymisation Techniques . . . . . . . . . .  4
   4.  Anonymisation of IP Flow Data  . . . . . . . . . . . . . . . .  5
     4.1.  IP Address Anonymisation . . . . . . . . . . . . . . . . .  6
       4.1.1.  Truncation . . . . . . . . . . . . . . . . . . . . . .  7
       4.1.2.  Random Permutations  . . . . . . . . . . . . . . . . .  7
       4.1.3.  Prefix-preserving Pseudonymisation . . . . . . . . . .  7
     4.2.  Timestamp Anonymisation  . . . . . . . . . . . . . . . . .  7
       4.2.1.  Precision Degradation  . . . . . . . . . . . . . . . .  7
       4.2.2.  Enumeration  . . . . . . . . . . . . . . . . . . . . .  7
       4.2.3.  Random Time Shifts . . . . . . . . . . . . . . . . . .  8
     4.3.  Counter Anonymisation  . . . . . . . . . . . . . . . . . .  8
       4.3.1.  Precision Degradation  . . . . . . . . . . . . . . . .  8
       4.3.2.  Binning  . . . . . . . . . . . . . . . . . . . . . . .  8
       4.3.3.  Random Noise Addition  . . . . . . . . . . . . . . . .  8
     4.4.  Anonymisation of Other Flow Fields . . . . . . . . . . . .  9
   5.  Parameters for the Description of Anonymisation Techniques . .  9
   6.  Anonymisation Support in IPFIX . . . . . . . . . . . . . . . .  9
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     9.1.  Normative References . . . . . . . . . . . . . . . . . . .  9
     9.2.  Informative References . . . . . . . . . . . . . . . . . . 10
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10
   Intellectual Property and Copyright Statements . . . . . . . . . . 12





















Boschi & Trammell       Expires January 15, 2009                [Page 2]


Internet-Draft        IP Flow Anonymisation Support            July 2008


1.  Introduction

   The standardisation of an IP flow information export protocol
   [RFC5101] and associated representations removes a technical barrier
   to the sharing of IP flow data across organizational boundaries and
   with network operations, security, and research communities for a
   wide variety of purposes.  However, with wider dissemination comes
   greater risks to the privacy of the users of networks under
   measurement, and to the security of those networks.  While it is not
   a complete solution to the issues posed by distribution of IP flow
   information, anonymisation is an important tool for the protection of
   privacy within network measurement infrastructures.

   This document presents a mechanism for representing anonymised data
   within IPFIX and guidelines for using it.  It begins with a
   categorization of anonymisation techniques.  It then describes
   applicability of each technique to commonly anonymisable fields of IP
   flow data, organized by information element data type and semantics
   as in [RFC5102]; enumerates the parameters required by each of the
   applicable anonymisation techniques; and provides guidelines for the
   use of each of these techniques in accordance with best practices in
   data protection.  Finally, it specifies a mechanism for exporting
   anonymised data and binding anonymisation metadata to templates using
   IPFIX Options.

1.1.  IPFIX Protocol Overview

   In the IPFIX protocol, { type, length, value } tuples are expressed
   in templates containing { type, length } pairs, specifying which {
   value } fields are present in data records conforming to the
   Template, giving great flexibility as to what data is transmitted.
   Since Templates are sent very infrequently compared with Data
   Records, this results in significant bandwidth savings.  Various
   different data formats may be transmitted simply by sending new
   Templates specifying the { type, length } pairs for the new data
   format.  See [RFC5101] for more information.

   The IPFIX information model [RFC5102] defines a large number of
   standard Information Elements which provide the necessary { type }
   information for Templates.  The use of standard elements enables
   interoperability among different vendors' implementations.
   Additionally, non-standard enterprise-specific elements may be
   defined for private use.

1.2.  IPFIX Documents Overview

   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
   Flow Information" [RFC5101] and its associated documents define the



Boschi & Trammell       Expires January 15, 2009                [Page 3]


Internet-Draft        IP Flow Anonymisation Support            July 2008


   IPFIX Protocol, which provides network engineers and administrators
   with access to IP traffic flow information.

   "Architecture for IP Flow Information Export" [I-D.ietf-ipfix-arch]
   defines the architecture for the export of measured IP flow
   information out of an IPFIX Exporting Process to an IPFIX Collecting
   Process, and the basic terminology used to describe the elements of
   this architecture, per the requirements defined in "Requirements for
   IP Flow Information Export" [RFC3917].  The IPFIX Protocol document
   [RFC5101] then covers the details of the method for transporting
   IPFIX Data Records and Templates via a congestion-aware transport
   protocol from an IPFIX Exporting Process to an IPFIX Collecting
   Process.

   "Information Model for IP Flow Information Export" [RFC5102]
   describes the Information Elements used by IPFIX, including details
   on Information Element naming, numbering, and data type encoding.
   Finally, "IPFIX Applicability" [I-D.ietf-ipfix-as] describes the
   various applications of the IPFIX protocol and their use of
   information exported via IPFIX, and relates the IPFIX architecture to
   other measurement architectures and frameworks.

   This document references the Protocol and Architecture documents for
   terminology and extends the IPFIX Information Model to provide new
   Information Elements for anonymisation metadata.


2.  Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [RFC5101] document are to be
   interpreted as defined there.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


3.  Categorisation of Anonymisation Techniques

   Anonymisation modifies a data set in order to protect the identity of
   the people or entities described by the data set from disclosure.
   With respect to network traffic data, anonymisation generally
   attempts to preserve some set of properties of the network traffic
   useful for a given application or applications, while ensuring the
   data cannot be traced back to the specific networks, hosts, or users
   generating the traffic.




Boschi & Trammell       Expires January 15, 2009                [Page 4]


Internet-Draft        IP Flow Anonymisation Support            July 2008


   Anonymisation may be broadly split into three categories:
   generalisation and reversible or irreversible substitution.  When
   generalisation is used, identifying information is grouped in sets,
   and one single value is used to identify each set element.  In
   effect, this causes multiple records to become indistinguishable,
   thereby aggregating them together.  Generalisation is an irreversible
   operation, in that the information needed to identify a single record
   from its "generalised value" is lost.

   Substitution (or pseudonymization) maps the real space of identifiers
   or values into a separate, replacement space, using some substitution
   function.  If the substitution function is invertible or can
   otherwise be reversed, then the substitution is reversible, and a
   real identifier can be recovered from a given replacement identifier.
   This allows to keep different elements distinguishable from each
   other: the number of different elements in the real and the
   replacement space is the same.

   Irreversible substitution results when a randomising or one-way
   function is used to map the value space; real identifiers cannot be
   recovered in an irreversible substitution.  The number of different
   elements in the real and replacement spaces is not necessarily the
   same.


4.  Anonymisation of IP Flow Data

   Due to the restricted semantics of IP flow data, there are a
   relatively limited set of specific anonymisation techniques available
   on flow data, though each falls into the broad categories above.
   Each type of field that may commonly appear in a flow record may have
   its own applicable specific techniques.

   While anonymisation is generally applied at the resolution of single
   fields within a flow record, attacks against anonymisation use entire
   flows and relationships between hosts and flows within a given data
   set.  Therefore, fields which may not necessarily be identifying by
   themselves may be anonymised in order to increase the anonymity of
   the data set as a whole.

   Of all the fields in an IP flow record, only IP addresses directly
   identify entities in the real world.  Each IP address is associated
   with an interface on a network host, and can potentially be
   identified with a single user.  Additionally, IP addresses are
   structured identifiers; that is, partial IP address prefixes may be
   used to identify networks just as full IP addresses identify hosts.
   This makes anonymisation of IP addresses particularly important.




Boschi & Trammell       Expires January 15, 2009                [Page 5]


Internet-Draft        IP Flow Anonymisation Support            July 2008


   Port numbers identify abstract entities (applications) as opposed to
   real-world entities, but they can be used to classify hosts and user
   behavior.  Passive port fingerprinting, both of well-known and
   ephemeral ports, can be used to determine the operating system
   running on a host.  Relative data volumes by port can also be used to
   determine the host's function (workstation, web server, etc.); this
   information can be used to identify hosts and users.

   While not identifiers in and of themselves, timestamps and counters
   can reveal the behavior of the hosts and users on a network.  Any
   given network activity is recognizable by a pattern of relative time
   differences and data volumes in the associated sequence of flows,
   even without host address information.  They can therefore be used to
   identify hosts and users.  Timestamps and counters are also
   vulnerable to traffic injection attacks, where traffic with a known
   pattern is injected into a network under measurement, and this
   pattern is later identified in the anonymised data set.

   The simplest and most extreme form of anonymisation, which can be
   applied to any field of a flow record, is black-marker anonymisation,
   or complete deletion of a given field.  While black-marker
   anonymisation completely protects the data in the deleted fields from
   the risk of disclosure, it also reduces the utility of the anonymised
   data set as a whole.  Techniques that retain some information while
   reducing (though not eliminating) the disclosure risk will be
   extensively discussed in the following sections; note that the
   techniques specifically applicable to IP addresses, timestamps, and
   counters will be discussed in separate sections.

4.1.  IP Address Anonymisation

   The following table gives an overview of the schemes for IP address
   anonymization described in this document and their categorization.

   +----------------------------------+----------------+---------------+
   | Scheme                           | Action         | Reversibility |
   +----------------------------------+----------------+---------------+
   | Truncation                       | Generalisation | N             |
   | Random Permutation               | Substitution   | Y/N           |
   | Prefix-preserving                | Substitution   | Y             |
   | Pseudonymisation                 |                |               |
   +----------------------------------+----------------+---------------+

   Note that random permutations might be either reversible or not,
   depending on the function used.






Boschi & Trammell       Expires January 15, 2009                [Page 6]


Internet-Draft        IP Flow Anonymisation Support            July 2008


4.1.1.  Truncation

   Truncation removes "n" of the least significant bits from an IP
   Address.  Note that truncating 8 bits would replace an IP Address
   with the corresponding class C network address.

4.1.2.  Random Permutations

   When random permutations are used, each IP Address is replaced with a
   random permutation on the set of possible IP Addresses.  The
   permutation function can be implemented using hash tables.

4.1.3.  Prefix-preserving Pseudonymisation

   Prefix-preserving pseudonymisation preserves the structure of IP
   Addresses.  If two IP Addresses match on a prefix of "n" bits, their
   anonymised versions will match on a prefix of "n" bits too.

4.2.  Timestamp Anonymisation

   [TODO: introductory text]

        +-----------------------+----------------+---------------+
        | Scheme                | Action         | Reversibility |
        +-----------------------+----------------+---------------+
        | Precision Degradation | Generalisation | N             |
        | Enumeration           | Substitution   | Y             |
        | Random Shifts         | Substitution   | Y             |
        +-----------------------+----------------+---------------+

4.2.1.  Precision Degradation

   Precision Degradation removes the most precise components of a
   timestamp, accounting all events occurring in each given interval
   (e.g. one millisecond for millisecond level degradation) as
   simultaneous.  This has the effect of potentially collapsing many
   timestamps into one.  With this technique time precision is reduced,
   and sequencing may be lost, but the information at which time the
   event happened is kept.

4.2.2.  Enumeration

   Enumeration keeps the chronological order in which events occurred
   while eliminating time information.  Timestamps are substituted by
   equidistant timestamps (or numbers) starting from an rendomly chosen
   start value.





Boschi & Trammell       Expires January 15, 2009                [Page 7]


Internet-Draft        IP Flow Anonymisation Support            July 2008


4.2.3.  Random Time Shifts

   Random Time Shifts keep the information on how far apart two events
   are from each other.  This is achieved by shifting all timestamps by
   the same random number.  Note that random time shifts also preserve
   chronological order.

4.3.  Counter Anonymisation

   Counters (such as packet and octet volumes per flow) are subject to
   fingerprinting and injection attacks against anonymisation, as
   timestamps are, but relative magnitudes of activity can be useful for
   certain analysis tasks.  [TODO: more intro text]

        +-----------------------+----------------+---------------+
        | Scheme                | Action         | Reversibility |
        +-----------------------+----------------+---------------+
        | Precision Degradation | Generalisation | N             |
        | Binning               | Generalisation | N             |
        | Random noise addition | Substitution   | N             |
        +-----------------------+----------------+---------------+

4.3.1.  Precision Degradation

   As with precision degradation in timestamps, precision degradation of
   counters removes lower-order bits of the counters, treating all the
   counters in a given range as having the same value.  Depending on the
   precision reduction, this loses information about the relationships
   between sizes of similarly-sized flows, but keeps relative magnitude
   information.

4.3.2.  Binning

   Binning can be seen as a special case of precision degradation; the
   operation is identical, except for in precision degradation the
   counter ranges are uniform, and in binning they need not be.  For
   example, a common counter binning scheme for packet counters could be
   to bin values 1-2 together, and 3-infinity together, thereby
   separating potentially completely-opened TCP connections from
   unopened ones.  Binning schemes are generally chosen to keep
   precisely the amount of information required in a counter for a given
   analysis task

4.3.3.  Random Noise Addition

   Random noise addition adds a random amount to a counter in each flow;
   this is used to keep relative magnitude information and minimize the
   disruption to size relationship information while avoiding



Boschi & Trammell       Expires January 15, 2009                [Page 8]


Internet-Draft        IP Flow Anonymisation Support            July 2008


   fingerprinting attacks against anonymization.

4.4.  Anonymisation of Other Flow Fields

   [TODO: as section 4.1]


5.  Parameters for the Description of Anonymisation Techniques

   [TODO: see corresponding section of draft-ietf-psamp-sample-tech for
   the proposed structure of this section.]


6.  Anonymisation Support in IPFIX

   [TODO: Here we'll describe how the information specified above can be
   transmitted on the wire using an option template.  The idea is to
   scope the option to the Template ID and for each field specify which
   are anonymised, providing info on the output characteristics of the
   technique, and which ones aren't.]

   [EDITOR'S NOTE: Multiple anon. techniques applied on an IE at the
   same time is indicated with multiple elements of the same type (in
   application order as in PSAMP)]

   [EDITOR'S NOTE: for blackmarking we'll recommend not to export the
   information at all following the data protection law principle that
   only necessary information should be exported.]


7.  Security Considerations

   [TODO: write this section.]


8.  IANA Considerations

   This document contains no actions for IANA.


9.  References

9.1.  Normative References

   [RFC5101]  Claise, B., "Specification of the IP Flow Information
              Export (IPFIX) Protocol for the Exchange of IP Traffic
              Flow Information", RFC 5101, January 2008.




Boschi & Trammell       Expires January 15, 2009                [Page 9]


Internet-Draft        IP Flow Anonymisation Support            July 2008


   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
              Meyer, "Information Model for IP Flow Information Export",
              RFC 5102, January 2008.

9.2.  Informative References

   [I-D.ietf-ipfix-arch]
              Sadasivan, G. and N. Brownlee, "Architecture Model for IP
              Flow Information Export", draft-ietf-ipfix-arch-02 (work
              in progress), October 2003.

   [I-D.ietf-ipfix-as]
              Zseby, T., "IPFIX Applicability", draft-ietf-ipfix-as-12
              (work in progress), July 2007.

   [I-D.ietf-ipfix-architecture]
              Sadasivan, G., "Architecture for IP Flow Information
              Export", draft-ietf-ipfix-architecture-12 (work in
              progress), September 2006.

   [I-D.ietf-ipfix-reducing-redundancy]
              Boschi, E., "Reducing Redundancy in IP Flow Information
              Export (IPFIX) and Packet  Sampling (PSAMP) Reports",
              draft-ietf-ipfix-reducing-redundancy-04 (work in
              progress), May 2007.

   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
              "Requirements for IP Flow Information Export (IPFIX)",
              RFC 3917, October 2004.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.


Authors' Addresses

   Elisa Boschi
   Hitachi Europe
   c/o ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 57
   Email: elisa.boschi@hitachi-eu.com






Boschi & Trammell       Expires January 15, 2009               [Page 10]


Internet-Draft        IP Flow Anonymisation Support            July 2008


   Brian Trammell
   Hitachi Europe
   c/o ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 13
   Email: brian.trammell@hitachi-eu.com










































Boschi & Trammell       Expires January 15, 2009               [Page 11]


Internet-Draft        IP Flow Anonymisation Support            July 2008


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.











Boschi & Trammell       Expires January 15, 2009               [Page 12]