Internet Draft
Document: <draft-niccolini-hash-descr-00.txt>
Expires: April 2004
S. Niccolini
University of Pisa
M. Molina
NEC Europe Ltd.
Nick Duffield
AT&T Labs -
- Research
October 2003
Hash functions description for packet selection
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document is concerned with the specification and properties of
Hash Functions for packet selection in PSAMP. Having a detailed and
exhaustive description of the Hash Function, in terms of input
parameters, output value, selection range and internal operations
is fundamental for coherent configuration of the same hash based
selection at different points in the network. This coherency is
fundamental for all network measurement applications needing
consistent packet selection.
Table of Contents
1. Introduction.................................................2
1.1 Hash-based Packet Selection..................................2
1.2 Consistent Packet Selection..................................3
Niccolini, Molina, Duffield Expires April 2004 [Page 1]
Internet Draft Hash functions description for packet selection,
October 2003
2. Criteria for PSAMP Hash Functions............................3
3. Hash function description....................................4
3.1 MMH (Multilinear Modular Hashing) function...................4
3.1.1 Input parameters.............................................5
3.1.2 Built in parameters..........................................5
3.1.3 Output.......................................................5
3.1.4 Functioning..................................................5
3.2 "Bob" hash function..........................................7
3.2.1 Input Parameters.............................................7
3.2.2 Built in parameters..........................................7
3.2.3 Output.......................................................8
3.2.4 Functioning..................................................8
4. References..................................................10
5. Author's Addresses..........................................11
6. Intellectual Property Statement.............................11
7. Full Copyright Statement....................................11
1. Introduction
This document is concerned with the specification and properties of
Hash Functions for packet selection in PSAMP. Having a detailed and
exhaustive description of the Hash Function, in terms of input
parameters, output value, selection range and internal operations
is fundamental for coherent configuration of the same hash based
selection at different points in the network. This coherency is
fundamental for all network measurement applications needing
consistent packet selection.
1.1 Hash-based Packet Selection
The PSAMP framework document for passive packet measurement [DuFw]
divides packet selection techniques in two broad classes: sampling
and filtering. Filtering is defined as follows:
ææa filter is a selection operation that selects a packet
deterministically based on the packet content, its treatment,
and functions of these occurring in the selection state.
Examples include match/mask filtering, and hash-based
selection.ÆÆ
The deterministic property is of particular importance for
measurement applications that require the consistent selection of
packet at multiple points in the network. This is described more
fully in the next section.
This document is concerned with the specification and properties of
hash functions to be used for packet selection in PSAMP. We start
Niccolini, Molina, Duffield Expires April 2004 [Page 2]
Internet Draft Hash functions description for packet selection,
October 2003
by recalling the following definitions of the components of hash-
based selection from [DuFw].
* Hash-based selection: a filter specified by a hash domain, a
hash function, and hash range and a hash selection range.
* Hash domain: a subset of the packet content and the packet
treatment, viewed as an N-bit string for some positive integer N.
* Hash range: a set of M-bit strings for some positive integer M.
* Hash function: a deterministic map from the hash domain into
the hash range.
* Hash selection range: a subset of the hash range. The packet is
selected if the action of the hash function on the hash domain
for the packet yields a result in the hash selection range.
1.2 Consistent Packet Selection
Hash functions are deterministic in nature, in the sense that they
always produce the same output from a given input. This is a useful
property for some network measurement applications, since it
enables the consistent selection of a given packet at different
points in the network, in a sense we now describe.
Suppose that a hash-based selector operates at multiple points on a
packetÆs path. Consider the following conditions:
1. The same hash function is employed at each point
2. The hash domain is the same at each point and contains only
invariant portions of the packet header and/or payload,
i.e., those portions that do not change from point to point
3. the hash selection range is the same at each point.
If, and only if, conditions 1,2, and 3 are all satisfied, then the
hash-based selectors at each point are consistent in the sense that
a given packet is selected at either all the points or at none.
Consistent packet selection using hash functions was proposed in
[DuGr01].
This document will specify (in its final form) certain hash
functions suitable to be used in the PSAMP protocol.
2. Criteria for PSAMP Hash Functions.
The number of hash functions that could conceivably be used for
consistent packet selection is virtually infinite. However, for the
applications considered in [DuFw], two properties are particularly
important: computation speed and mixing strength. The latter means
that small changing in the input should produce big changes in the
Niccolini, Molina, Duffield Expires April 2004 [Page 3]
Internet Draft Hash functions description for packet selection,
October 2003
output (the hash value). Jointly satisfying these two properties is
difficult, therefore the choice of a good hash function often
implies a trade off decision.
The purpose of this document is to give a formal description of
some hash function rather that indicate which is the best one.
Nevertheless, we have included some that, according to tests
performed [NiTa03], are candidates to satisfy both properties.
Other hash functions may be added in later version of this draft,
along with better indications about their relative performances.
The description of each hash functions that we propose in this
draft comprises the following items:
1. input parameters;
2. built-in parameters;
3. output (including selection range);
4. hash function operation;
The first three items need to be formalized in the information
model for packet selection [ZeTech] and the PSAMP MIB [DiRoCla], in
order that the hash function, consistently configured, can be
activated in the desired measurement points. The last item provides
a consistent reference for the implementation of the hash function
by defining the precise mapping that the hash function makes from
its input to its output. Some reference code is included for this
purpose.
3. Hash function description
3.1 MMH (Multilinear Modular Hashing) function
MMH is a hash function suitable for fast software implementation
able to hash a string of variable size [HaKr97].
In order to achieve fast software implementation while preserving
the low collision probability, the MMH hash function is built
implementing a division-less modular reduction. For sake of
simplicity, we refer to the implementation done on 32-bit word
machines. The generalization of such implementation may be
considered for adapting the MMH to different word length machines,
but the reference code presented in 3.1.4 is optimized for 32-bit
word machines and requires a compiler that supports the
multiplication of two 32-bit words and feeds the result into a 64
bit string (long long type)
Niccolini, Molina, Duffield Expires April 2004 [Page 4]
Internet Draft Hash functions description for packet selection,
October 2003
3.1.1 Input parameters
The only input parameter of the MMH is:
- the length of the input string (key) to be hashed, in bytes. The
MMH operates on elementary blocks of 32 bits, therefore the key
length must be a multiple of 4 (if the key is not a multiple of
4, it must be zero-padded)
3.1.2 Built in parameters
The MMH uses the following built-in parameters:
- ro: a prime number in the range of [2^32 ; 2^32 + 2^16]. It is
used to implement the division-less modular reduction;
- A vector of (different) 32-bit numbers used to implement the so
called inner-product operation. A good choice is to take only
prime numbers;
- The number of elements of such a vector, which must be at least
equal to the key length divided by 4.
3.1.3 Output
The output of the MMH function is a 32-bit number. It should be
specified:
- A 32 bit mask to apply to the output
- The selection range as a list of non overlapping intervals [start
value, end value] where value is in [0,2^32]
3.1.4 Functioning
To obtain the hash value, MMH first computes two quantities called
inner-product-low and inner-product-high, respectively. These are
obtained adding together the results of the multiplication of the
words of the input key by the built-in vector.
The two inner products computed are 64-bit long, they need to be
firstly modular reduced to the prime number range ([0 ; ro]) and
then to the 32-bit range ([0 ; 2^32]). The output of the reduction
(and of the function itself) is the hash value.
Reference code:
/*----------------------mmh hash function ----------------------------
-*/
Niccolini, Molina, Duffield Expires April 2004 [Page 5]
Internet Draft Hash functions description for packet selection,
October 2003
/*
*/
/* The mmh_table size limits the maximum key length that can be hashed
*/
/* With a table of size T we can hash keys up to 4*T bytes long keys
*/
/*
*/
/*--------------------------------------------------------------------
-*/
#define DO(i) sum += key2[i] * (unsigned long long) mmh_table[i]
/* mmh signature table with the first 40 prime numbers: */
/* we can hash string up to 160 char long */
static unsigned long mmh_table[40] =
{ 2, 3, 5, 7, 11,
13, 17, 19, 23, 29,
31, 37, 41, 43, 47,
53, 59, 61, 67, 71,
73, 79, 83, 89, 97,
101, 103, 107, 109, 113,
127, 131, 137, 139, 149,
151, 157, 163, 167, 173};
/**
** FUNCTION: mmh
** PARAMETERS:
** - key: pointer to the char string to be hashed
** - len: length of the char string to be hashed
** RETURN: hashvalue in an unsigned long variable
** PURPOSE: Accepts a pointer to a string to be hashed
** and returns an unsigned long.
** NOTE: using this function requires that
** the key is a 4-byte multiple otherwise
** it gives erroneus results. The padding with 0x0 bytes
** of keys non multiple of 4 must be performed
** before calling this function!
**/
unsigned long mmh(char *key, unsigned short len)
{
unsigned short i;
signed long long stmp;
unsigned long long utmp;
unsigned long long sum = 0LL; /*running sum*/
unsigned long ret; /*return value*/
unsigned long *key2 = (unsigned long *)key;
Niccolini, Molina, Duffield Expires April 2004 [Page 6]
Internet Draft Hash functions description for packet selection,
October 2003
for (i=0; i<(len/4); i++)
DO(i);
/*********return (sum % 0x10000000fLL);***************/
stmp = (sum & 0xffffffffLL) - ((sum >> 32) * 15); /*lo - hi
* 15 */
utmp = (stmp & 0xffffffffLL) - ((stmp >> 32) * 15); /*lo - hi
* 15 */
ret = utmp & 0xffffffff;
if (utmp > 0x10000000fLL) /* if larger than p - subtract 15
again */
ret -=15;
return ret;
}
3.2 "Bob" hash function
"Bob" hash function is a hash function designed for having each bit
of the input affecting every bit of the return value and using both
1-bit and 2-bit deltas to achieve the so called avalanche effect
[Jenk97]. This function was originally built for hash table lookup
with fast software implementation.
3.2.1 Input Parameters
The input parameters of such a function are:
- the length of the input string (key) to be hashed, in bytes. The
elementary input blocks of Bob hash are the single bytes,
therefore no padding is needed.
- an init value (an arbitrary 32-bit number).
3.2.2 Built in parameters
The Bob Hash uses the following built-in parameter:
- the golden ratio (an arbitrary 32-bit number used in the hash
function computation: its purpose is to avoid mapping all zeros
to all zeros);
Note: the mix sub-function (see mix (a,b,c) macro in the reference
code in 3.2.4) has a number of parameters governing the shifts in
the registers. The one presented is not the only possible choice.
Niccolini, Molina, Duffield Expires April 2004 [Page 7]
Internet Draft Hash functions description for packet selection,
October 2003
It is an open point whether these may be considered additional
built-in parameters to specify at function configuration.
3.2.3 Output
The output of the MMH function is a 32-bit number. It should be
specified:
- A 32 bit mask to apply to the output
- The selection range as a list of non overlapping intervals [start
value, end value] where value is in [0,2^32]
3.2.4 Functioning
The hash value is obtained computing first an initialization of an
internal state (composed of 3 32-bit numbers, called a, b, c in the
reference code below), then, for each input byte of the key the
internal state is combined by addition and mixed using the mix sub-
function. Finally, the internal state mixed one last time and the
third number of the state (c) is chosen as the return value.
typedef unsigned long int ub4; /* unsigned 4-byte quantities */
typedef unsigned char ub1; /* unsigned 1-byte quantities */
#define hashsize(n) ((ub4)1<<(n))
#define hashmask(n) (hashsize(n)-1)
/*
--------------------------------------------------------------------
mix -- mix 3 32-bit values reversibly.
For every delta with one or two bits set, and the deltas of all three
high bits or all three low bits, whether the original value of a,b,c
is almost all zero or is uniformly distributed,
* If mix() is run forward or backward, at least 32 bits in a,b,c
have at least 1/4 probability of changing.
* If mix() is run forward, every bit of c will change between 1/3 and
2/3 of the time. (Well, 22/100 and 78/100 for some 2-bit deltas.)
mix() was built out of 36 single-cycle latency instructions in a
structure that could supported 2x parallelism, like so:
a -= b;
a -= c; x = (c>>13);
b -= c; a ^= x;
b -= a; x = (a<<8);
c -= a; b ^= x;
c -= b; x = (b>>13);
...
Unfortunately, superscalar Pentiums and Sparcs can't take advantage
of that parallelism. They've also turned some of those single-cycle
latency instructions into multi-cycle latency instructions
--------------------------------------------------------------------
*/
Niccolini, Molina, Duffield Expires April 2004 [Page 8]
Internet Draft Hash functions description for packet selection,
October 2003
#define mix(a,b,c) \
{ \
a -= b; a -= c; a ^= (c>>13); \
b -= c; b -= a; b ^= (a<<8); \
c -= a; c -= b; c ^= (b>>13); \
a -= b; a -= c; a ^= (c>>12); \
b -= c; b -= a; b ^= (a<<16); \
c -= a; c -= b; c ^= (b>>5); \
a -= b; a -= c; a ^= (c>>3); \
b -= c; b -= a; b ^= (a<<10); \
c -= a; c -= b; c ^= (b>>15); \
}
/*
--------------------------------------------------------------------
hash() -- hash a variable-length key into a 32-bit value
k : the key (the unaligned variable-length array of bytes)
len : the length of the key, counting by bytes
initval : can be any 4-byte value
Returns a 32-bit value. Every bit of the key affects every bit of
the return value. Every 1-bit and 2-bit delta achieves avalanche.
About 6*len+35 instructions.
The best hash table sizes are powers of 2. There is no need to do
mod a prime (mod is sooo slow!). If you need less than 32 bits,
use a bitmask. For example, if you need only 10 bits, do
h = (h & hashmask(10));
In which case, the hash table should have hashsize(10) elements.
If you are hashing n strings (ub1 **)k, do it like this:
for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);
By Bob Jenkins, 1996. bob_jenkins@burtleburtle.net. You may use this
code any way you wish, private, educational, or commercial. It's free.
See http://burtleburtle.net/bob/hash/evahash.html
Use for hash table lookup, or anything where one collision in 2^^32 is
acceptable. Do NOT use for cryptographic purposes.
--------------------------------------------------------------------
*/
ub4 bob_hash(k, length, initval)
register ub1 *k; /* the key */
register ub4 length; /* the length of the key */
register ub4 initval; /* an arbitrary value */
{
register ub4 a,b,c,len;
/* Set up the internal state */
len = length;
a = b = 0x9e3779b9; /* the golden ratio; an arbitrary value */
c = initval; /* another arbitrary value */
/*------------------------------------ handle most of the key */
Niccolini, Molina, Duffield Expires April 2004 [Page 9]
Internet Draft Hash functions description for packet selection,
October 2003
while (len >= 12)
{
a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) +((ub4)k[3]<<24));
b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) +((ub4)k[7]<<24));
c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16)+((ub4)k[11]<<24));
mix(a,b,c);
k += 12; len -= 12;
}
/*---------------------------- handle the last 11 bytes */
c += length;
switch(len) /* all the case statements fall through */
{
case 11: c+=((ub4)k[10]<<24);
case 10: c+=((ub4)k[9]<<16);
case 9 : c+=((ub4)k[8]<<8);
/* the first byte of c is reserved for the length */
case 8 : b+=((ub4)k[7]<<24);
case 7 : b+=((ub4)k[6]<<16);
case 6 : b+=((ub4)k[5]<<8);
case 5 : b+=k[4];
case 4 : a+=((ub4)k[3]<<24);
case 3 : a+=((ub4)k[2]<<16);
case 2 : a+=((ub4)k[1]<<8);
case 1 : a+=k[0];
/* case 0: nothing left to add */
}
mix(a,b,c);
/*-------------------------------- report the result */
return c;
}
4. References
[DuFw] N. Duffield: A Framework for Passive Packet Measurement,
Internet Draft, <draft-ietf-psamp-framework-03.txt>, work
in progress, June 2003
[DuGr01] N. G. Duffield and M. Grossglauser, Trajectory Sampling
for Direct Traffic Observation, IEEE/ACM Trans. on
Networking, 9(3), 280-292, June 2001.
[ZeTech] T.Zseby, M.Molina, F.Raspall, N.Duffield: Sampling and
Filtering Techniques for IP Packet Selection, Internet
Draft < draft-ietf-psamp-sample-tech-02.txt>, work in
progress, June 2003
[DiRoCla] T.Dietz, D.Romascanu, B. Claise: Definitions of Managed
Objects for Packet Sampling, Internet Draft <draft-ietf-
psamp-mib-00.txt>, work in progress, June 2003
Niccolini, Molina, Duffield Expires April 2004 [Page 10]
Internet Draft Hash functions description for packet selection,
October 2003
[Jenk97] B. Jenkins: Algorithm Alley, Dr. Dobb's Journal, September
1997. http://burtleburtle.net/bob/hash/doobs.html
[HaKr97] S. Halevi, H. Krawczyk : MMH: Software Message
Authentication in the Gbit/second Rates, LNCS vol. 1267,
Springer, 1997. Pages 172-189
[NiTa03] S. Niccolini, S. Tartarelli, F. Raspall, M. Molina : On
time synchronization and hashing for passive One-Way Delay
measurements, work in progress, available at
http://www.ccrle.nec.de/mmolina/papers/paper_hash_sync.pdf
5. Author's Addresses
Saverio Niccolini
University of Pisa
Via Caruso 1
56122 Pisa
Italy
Phone: +39 050 2217-592
EMail: s.niccolini@iet.unipi.it
Maurizio Molina
NEC Europe Ltd., Network Laboratories
Adenauerplatz 6
69115 Heidelberg
Germany
Phone: +49 6221 90511-18
Email: molina@ccrle.nec.de
Nick Duffield
AT&T Labs - Research
Room B-139
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-8726
Email: duffield@research.att.com
6. Intellectual Property Statement
AT&T Corporation may own intellectual property applicable to this
contribution. The IETF has been notified of AT&T's licensing intent
for the specification contained in this document. See
http://www.ietf.org/ietf/IPR/ATT-GENERAL.txt for AT&T's IPR
statement.
7. Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain
Niccolini, Molina, Duffield Expires April 2004 [Page 11]
Internet Draft Hash functions description for packet selection,
October 2003
it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages
other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on
an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Niccolini, Molina, Duffield Expires April 2004 [Page 12]