Internet Draft
  Document: <draft-niccolini-hash-descr-00.txt>
  Expires: April 2004
                                                             S. Niccolini
                                                       University of Pisa
                                                                M. Molina
                                                          NEC Europe Ltd.
                                                            Nick Duffield
                                                     AT&T Labs -
                                                                - Research
  
                                                             October 2003
  
  
     Hash functions description for packet selection
  
  
  Status of this Memo
  
     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC2026.
  
     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups. Note that
     other groups may also distribute working documents as Internet-
     Drafts. Internet-Drafts are draft documents valid for a maximum of
     six months and may be updated, replaced, or obsoleted by other
     documents at any time. It is inappropriate to use Internet-Drafts
     as reference material or to cite them other than as "work in
     progress."
  
     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt
  
     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.
  
  Abstract
  
     This document is concerned with the specification and properties of
     Hash Functions for packet selection in PSAMP. Having a detailed and
     exhaustive description of the Hash Function, in terms of input
     parameters, output value, selection range and internal operations
     is fundamental for coherent configuration of the same hash based
     selection at different points in the network. This coherency is
     fundamental for all network measurement applications needing
     consistent packet selection.
  
  Table of Contents
  
     1.   Introduction.................................................2
     1.1  Hash-based Packet Selection..................................2
     1.2  Consistent Packet Selection..................................3
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 1]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
     2.   Criteria for PSAMP Hash Functions............................3
     3.   Hash function description....................................4
     3.1  MMH (Multilinear Modular Hashing) function...................4
     3.1.1 Input parameters.............................................5
     3.1.2 Built in parameters..........................................5
     3.1.3 Output.......................................................5
     3.1.4 Functioning..................................................5
     3.2  "Bob" hash function..........................................7
     3.2.1 Input Parameters.............................................7
     3.2.2 Built in parameters..........................................7
     3.2.3 Output.......................................................8
     3.2.4 Functioning..................................................8
     4.   References..................................................10
     5.   Author's Addresses..........................................11
     6.   Intellectual Property Statement.............................11
     7.   Full Copyright Statement....................................11
  
  
  1. Introduction
  
     This document is concerned with the specification and properties of
     Hash Functions for packet selection in PSAMP. Having a detailed and
     exhaustive description of the Hash Function, in terms of input
     parameters, output value, selection range and internal operations
     is fundamental for coherent configuration of the same hash based
     selection at different points in the network. This coherency is
     fundamental for all network measurement applications needing
     consistent packet selection.
  
  1.1 Hash-based Packet Selection
  
  
     The PSAMP framework document for passive packet measurement [DuFw]
     divides packet selection techniques in two broad classes: sampling
     and filtering. Filtering is defined as follows:
  
          ææa filter is a selection operation that selects a packet
          deterministically based on the packet content, its treatment,
          and functions of these occurring in the selection state.
          Examples include match/mask filtering, and hash-based
          selection.ÆÆ
  
     The deterministic property is of particular importance for
     measurement applications that require the consistent selection of
     packet at multiple points in the network. This is described more
     fully in the next section.
  
     This document is concerned with the specification and properties of
     hash functions to be used for packet selection in PSAMP. We start
  
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 2]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
     by recalling the following definitions of the components of hash-
     based selection from [DuFw].
  
       * Hash-based selection: a filter specified by a hash domain, a
       hash function, and hash range and a hash selection range.
  
       * Hash domain: a subset of the packet content and the packet
       treatment, viewed as an N-bit string for some positive integer N.
  
       * Hash range: a set of M-bit strings for some positive integer M.
  
       * Hash function: a deterministic map from the hash domain into
       the hash range.
  
       * Hash selection range: a subset of the hash range. The packet is
       selected if the action of the hash function on the hash domain
       for the packet yields a result in the hash selection range.
  
  1.2 Consistent Packet Selection
  
     Hash functions are deterministic in nature, in the sense that they
     always produce the same output from a given input. This is a useful
     property for some network measurement applications, since it
     enables the consistent selection of a given packet at different
     points in the network, in a sense we now describe.
     Suppose that a hash-based selector operates at multiple points on a
     packetÆs path. Consider the following conditions:
  
          1. The same hash function is employed at each point
          2. The hash domain is the same at each point and contains only
             invariant portions of the packet header and/or payload,
             i.e., those portions that do not change from point to point
          3. the hash selection range is the same at each point.
  
     If, and only if, conditions 1,2, and 3 are all satisfied, then the
     hash-based selectors at each point are consistent in the sense that
     a given packet is selected at either all the points or at none.
     Consistent packet selection using hash functions was proposed in
     [DuGr01].
  
     This document will specify (in its final form) certain hash
     functions suitable to be used in the PSAMP protocol.
  
  2. Criteria for PSAMP Hash Functions.
  
     The number of hash functions that could conceivably be used for
     consistent packet selection is virtually infinite. However, for the
     applications considered in [DuFw], two properties are particularly
     important: computation speed and mixing strength. The latter means
     that small changing in the input should produce big changes in the
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 3]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
     output (the hash value). Jointly satisfying these two properties is
     difficult, therefore the choice of a good hash function often
     implies a trade off decision.
  
     The purpose of this document is to give a formal description of
     some hash function rather that indicate which is the best one.
     Nevertheless, we have included some that, according to tests
     performed [NiTa03], are candidates to satisfy both properties.
     Other hash functions may be added in later version of this draft,
     along with better indications about their relative performances.
  
     The description of each hash functions that we propose in this
     draft comprises the following items:
  
        1. input parameters;
        2. built-in parameters;
        3. output (including selection range);
        4. hash function operation;
  
     The first three items need to be formalized in the information
     model for packet selection [ZeTech] and the PSAMP MIB [DiRoCla], in
     order that the hash function, consistently configured, can be
     activated in the desired measurement points. The last item provides
     a consistent reference for the implementation of the hash function
     by defining the precise mapping that the hash function makes from
     its input to its output. Some reference code is included for this
     purpose.
  
  3. Hash function description
  
  3.1  MMH (Multilinear Modular Hashing) function
  
     MMH is a hash function suitable for fast software implementation
     able to hash a string of variable size [HaKr97].
     In order to achieve fast software implementation while preserving
     the low collision probability, the MMH hash function is built
     implementing a division-less modular reduction. For sake of
     simplicity, we refer to the implementation done on 32-bit word
     machines. The generalization of such implementation may be
     considered for adapting the MMH to different word length machines,
     but the reference code presented in 3.1.4 is optimized for 32-bit
     word machines and requires a compiler that supports the
     multiplication of two 32-bit words and feeds the result into a 64
     bit string (long long type)
  
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 4]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
  3.1.1 Input parameters
  
     The only input parameter of the MMH is:
  
     - the length of the input string (key) to be hashed, in bytes. The
        MMH operates on elementary blocks of 32 bits, therefore the key
        length must be a multiple of 4 (if the key is not a multiple of
        4, it must be zero-padded)
  
  3.1.2 Built in parameters
  
     The MMH uses the following built-in parameters:
  
     - ro: a prime number in the range of [2^32 ; 2^32 + 2^16]. It is
        used to implement the division-less modular reduction;
     - A vector of (different) 32-bit numbers used to implement the so
        called inner-product operation. A good choice is to take only
        prime numbers;
     - The number of elements of such a vector, which must be at least
        equal to the key length divided by 4.
  
  3.1.3 Output
  
     The output of the MMH function is a 32-bit number. It should be
     specified:
     - A 32 bit mask to apply to the output
     - The selection range as a list of non overlapping intervals [start
        value, end value] where value is in [0,2^32]
  
  3.1.4 Functioning
  
     To obtain the hash value, MMH first computes two quantities called
     inner-product-low and inner-product-high, respectively. These are
     obtained adding together the results of the multiplication of the
     words of the input key by the built-in vector.
     The two inner products computed are 64-bit long, they need to be
     firstly modular reduced to the prime number range ([0 ; ro]) and
     then to the 32-bit range ([0 ; 2^32]). The output of the reduction
     (and of the function itself) is the hash value.
  
     Reference code:
  
  /*----------------------mmh hash function ----------------------------
  -*/
  
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 5]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
  /*
  */
  /* The mmh_table size limits the maximum key length that can be hashed
  */
  /* With a table of size T we can hash keys up to 4*T bytes long keys
  */
  /*
  */
  /*--------------------------------------------------------------------
  -*/
  
  #define DO(i) sum += key2[i] * (unsigned long long) mmh_table[i]
  
  /* mmh signature table with the first 40 prime numbers: */
  /* we can hash string up to 160 char long               */
  static unsigned long mmh_table[40] =
    {   2,      3,      5,      7,     11,
       13,     17,     19,     23,     29,
       31,     37,     41,     43,     47,
       53,     59,     61,     67,     71,
       73,     79,     83,     89,     97,
      101,    103,    107,    109,    113,
      127,    131,    137,    139,    149,
      151,    157,    163,    167,    173};
  
  /**
   **  FUNCTION: mmh
   **  PARAMETERS:
   **  - key: pointer to the char string to be hashed
   **  - len: length of the char string to be hashed
   **  RETURN: hashvalue in an unsigned long variable
   **  PURPOSE: Accepts a pointer to a string to be hashed
   **  and returns an unsigned long.
   **  NOTE: using this function requires that
   **  the key is a 4-byte multiple otherwise
   **  it gives erroneus results. The padding with 0x0 bytes
   **  of keys non multiple of 4 must be performed
   **  before calling this function!
   **/
  unsigned long mmh(char *key, unsigned short len)
    {
      unsigned short      i;
      signed long long    stmp;
      unsigned long long  utmp;
  
      unsigned long long  sum = 0LL;  /*running sum*/
  
      unsigned long       ret;        /*return value*/
  
      unsigned long       *key2 = (unsigned long *)key;
  
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 6]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
  
      for (i=0; i<(len/4); i++)
        DO(i);
  
      /*********return (sum % 0x10000000fLL);***************/
  
      stmp = (sum & 0xffffffffLL) - ((sum >> 32) * 15);       /*lo - hi
  * 15 */
      utmp = (stmp & 0xffffffffLL) - ((stmp >> 32) * 15);     /*lo - hi
  * 15 */
  
      ret = utmp & 0xffffffff;
  
      if (utmp > 0x10000000fLL)     /* if larger than p - subtract 15
  again */
        ret -=15;
  
      return ret;
    }
  
  3.2  "Bob" hash function
  
     "Bob" hash function is a hash function designed for having each bit
     of the input affecting every bit of the return value and using both
     1-bit and 2-bit deltas to achieve the so called avalanche effect
     [Jenk97]. This function was originally built for hash table lookup
     with fast software implementation.
  
  3.2.1 Input Parameters
  
     The input parameters of such a function are:
     - the length of the input string (key) to be hashed, in bytes. The
        elementary input blocks of Bob hash are the single bytes,
        therefore no padding is needed.
     - an init value (an arbitrary 32-bit number).
  
  3.2.2 Built in parameters
  
     The Bob Hash uses the following built-in parameter:
  
     - the golden ratio (an arbitrary 32-bit number used in the hash
        function computation: its purpose is to avoid mapping all zeros
        to all zeros);
  
     Note: the mix sub-function (see mix (a,b,c) macro in the reference
     code in 3.2.4) has a number of parameters governing the shifts in
     the registers. The one presented is not the only possible choice.
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 7]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
     It is an open point whether these may be considered additional
     built-in parameters to specify at function configuration.
  
  3.2.3 Output
  
     The output of the MMH function is a 32-bit number. It should be
     specified:
     - A 32 bit mask to apply to the output
     - The selection range as a list of non overlapping intervals [start
        value, end value] where value is in [0,2^32]
  
  3.2.4 Functioning
  
     The hash value is obtained computing first an initialization of an
     internal state (composed of 3 32-bit numbers, called a, b, c in the
     reference code below), then, for each input byte of the key the
     internal state is combined by addition and mixed using the mix sub-
     function. Finally, the internal state mixed one last time and the
     third number of the state (c) is chosen as the return value.
  
  typedef  unsigned long  int  ub4;   /* unsigned 4-byte quantities */
  typedef  unsigned       char ub1;   /* unsigned 1-byte quantities */
  
  #define hashsize(n) ((ub4)1<<(n))
  #define hashmask(n) (hashsize(n)-1)
  
  /*
  --------------------------------------------------------------------
  mix -- mix 3 32-bit values reversibly.
  For every delta with one or two bits set, and the deltas of all three
    high bits or all three low bits, whether the original value of a,b,c
    is almost all zero or is uniformly distributed,
  * If mix() is run forward or backward, at least 32 bits in a,b,c
    have at least 1/4 probability of changing.
  * If mix() is run forward, every bit of c will change between 1/3 and
    2/3 of the time.  (Well, 22/100 and 78/100 for some 2-bit deltas.)
  mix() was built out of 36 single-cycle latency instructions in a
    structure that could supported 2x parallelism, like so:
        a -= b;
        a -= c; x = (c>>13);
        b -= c; a ^= x;
        b -= a; x = (a<<8);
        c -= a; b ^= x;
        c -= b; x = (b>>13);
        ...
    Unfortunately, superscalar Pentiums and Sparcs can't take advantage
    of that parallelism.  They've also turned some of those single-cycle
    latency instructions into multi-cycle latency instructions
  --------------------------------------------------------------------
  */
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 8]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
  #define mix(a,b,c)  \
  { \
    a -= b; a -= c; a ^= (c>>13); \
    b -= c; b -= a; b ^= (a<<8); \
    c -= a; c -= b; c ^= (b>>13); \
    a -= b; a -= c; a ^= (c>>12);  \
    b -= c; b -= a; b ^= (a<<16); \
    c -= a; c -= b; c ^= (b>>5); \
    a -= b; a -= c; a ^= (c>>3);  \
    b -= c; b -= a; b ^= (a<<10); \
    c -= a; c -= b; c ^= (b>>15); \
  }
  
  /*
  --------------------------------------------------------------------
  hash() -- hash a variable-length key into a 32-bit value
    k       : the key (the unaligned variable-length array of bytes)
    len     : the length of the key, counting by bytes
    initval : can be any 4-byte value
  Returns a 32-bit value.  Every bit of the key affects every bit of
  the return value.  Every 1-bit and 2-bit delta achieves avalanche.
  About 6*len+35 instructions.
  
  The best hash table sizes are powers of 2.  There is no need to do
  mod a prime (mod is sooo slow!).  If you need less than 32 bits,
  use a bitmask.  For example, if you need only 10 bits, do
    h = (h & hashmask(10));
  In which case, the hash table should have hashsize(10) elements.
  
  If you are hashing n strings (ub1 **)k, do it like this:
    for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);
  
  By Bob Jenkins, 1996.  bob_jenkins@burtleburtle.net.  You may use this
  code any way you wish, private, educational, or commercial.  It's free.
  
  See http://burtleburtle.net/bob/hash/evahash.html
  Use for hash table lookup, or anything where one collision in 2^^32 is
  acceptable.  Do NOT use for cryptographic purposes.
  --------------------------------------------------------------------
  */
  
  ub4 bob_hash(k, length, initval)
  register ub1 *k;        /* the key */
  register ub4  length;   /* the length of the key */
  register ub4  initval;  /* an arbitrary value */
  {
     register ub4 a,b,c,len;
  
     /* Set up the internal state */
     len = length;
     a = b = 0x9e3779b9;  /* the golden ratio; an arbitrary value */
     c = initval;         /* another arbitrary value */
  
     /*------------------------------------ handle most of the key */
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 9]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
     while (len >= 12)
     {
        a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) +((ub4)k[3]<<24));
        b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) +((ub4)k[7]<<24));
        c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16)+((ub4)k[11]<<24));
        mix(a,b,c);
        k += 12; len -= 12;
     }
  
     /*---------------------------- handle the last 11 bytes */
     c += length;
     switch(len)       /* all the case statements fall through */
     {
     case 11: c+=((ub4)k[10]<<24);
     case 10: c+=((ub4)k[9]<<16);
     case 9 : c+=((ub4)k[8]<<8);
        /* the first byte of c is reserved for the length */
     case 8 : b+=((ub4)k[7]<<24);
     case 7 : b+=((ub4)k[6]<<16);
     case 6 : b+=((ub4)k[5]<<8);
     case 5 : b+=k[4];
     case 4 : a+=((ub4)k[3]<<24);
     case 3 : a+=((ub4)k[2]<<16);
     case 2 : a+=((ub4)k[1]<<8);
     case 1 : a+=k[0];
       /* case 0: nothing left to add */
     }
     mix(a,b,c);
     /*-------------------------------- report the result */
     return c;
  }
  
  
  4. References
  
     [DuFw] N. Duffield: A Framework for Passive Packet Measurement,
              Internet Draft, <draft-ietf-psamp-framework-03.txt>, work
              in progress, June 2003
  
     [DuGr01] N. G. Duffield and M. Grossglauser, Trajectory Sampling
              for Direct Traffic Observation, IEEE/ACM Trans. on
              Networking, 9(3), 280-292, June 2001.
  
     [ZeTech] T.Zseby, M.Molina, F.Raspall, N.Duffield: Sampling and
              Filtering Techniques for IP Packet Selection, Internet
              Draft < draft-ietf-psamp-sample-tech-02.txt>, work in
              progress, June 2003
  
     [DiRoCla] T.Dietz, D.Romascanu, B. Claise: Definitions of Managed
              Objects for Packet Sampling, Internet Draft <draft-ietf-
              psamp-mib-00.txt>, work in progress, June 2003
  
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 10]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
     [Jenk97] B. Jenkins: Algorithm Alley, Dr. Dobb's Journal, September
              1997. http://burtleburtle.net/bob/hash/doobs.html
  
     [HaKr97] S. Halevi, H. Krawczyk : MMH: Software Message
              Authentication in the Gbit/second Rates, LNCS vol. 1267,
              Springer, 1997. Pages 172-189
  
     [NiTa03] S. Niccolini, S. Tartarelli, F. Raspall, M. Molina : On
              time synchronization and hashing for passive One-Way Delay
              measurements, work in progress, available at
              http://www.ccrle.nec.de/mmolina/papers/paper_hash_sync.pdf
  
  5. Author's Addresses
  
     Saverio Niccolini
     University of Pisa
     Via Caruso 1
     56122 Pisa
     Italy
     Phone: +39 050 2217-592
     EMail: s.niccolini@iet.unipi.it
  
     Maurizio Molina
     NEC Europe Ltd., Network Laboratories
     Adenauerplatz 6
     69115 Heidelberg
     Germany
     Phone: +49 6221 90511-18
     Email: molina@ccrle.nec.de
  
     Nick Duffield
     AT&T Labs - Research
     Room B-139
     180 Park Ave
     Florham Park NJ 07932, USA
     Phone: +1 973-360-8726
     Email: duffield@research.att.com
  
  6. Intellectual Property Statement
  
     AT&T Corporation may own intellectual property applicable to this
     contribution. The IETF has been notified of AT&T's licensing intent
     for the specification contained in this document. See
     http://www.ietf.org/ietf/IPR/ATT-GENERAL.txt for AT&T's IPR
  
     statement.
  7. Full Copyright Statement
  
     Copyright (C) The Internet Society (2002). All Rights Reserved.
     This document and translations of it may be copied and furnished to
     others, and derivative works that comment on or otherwise explain
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 11]


  Internet Draft Hash functions description for packet selection,
  October 2003
  
  
     it or assist in its implementation may be prepared, copied,
     published and distributed, in whole or in part, without restriction
     of any kind, provided that the above copyright notice and this
     paragraph are included on all such copies and derivative works.
     However, this document itself may not be modified in any way, such
     as by removing the copyright notice or references to the Internet
     Society or other Internet organizations, except as needed for the
     purpose of developing Internet standards in which case the
     procedures for copyrights defined in the Internet Standards process
     must be followed, or as required to translate it into languages
     other than
     English.
  
     The limited permissions granted above are perpetual and will not be
     revoked by the Internet Society or its successors or assigns.
  
     This document and the information contained herein is provided on
     an
     "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
     TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
     BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
     HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
     MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  Niccolini, Molina, Duffield   Expires April 2004     [Page 12]