IPSec Working Group T. Krovetz, Intel
INTERNET-DRAFT J. Black, UNR
Expires April 2001 S. Halevi, IBM
A. Hevia, U.C. San Diego
H. Krawczyk, Technion
P. Rogaway, U.C. Davis
October 2000
UMAC: Message Authentication Code using Universal Hashing
<draft-krovetz-umac-00.txt>
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This specification describes how to generate an authentication tag
(also called a "MAC") using the UMAC message authentication code.
UMAC is designed to be very fast to compute, in software, on
contemporary processors. Measured speeds are as low as 1.0 cycles
per byte. The heart of UMAC is a universal hash function, UHASH,
which relies on addition and multiplication of 16-bit, 32-bit, or
64-bit numbers, operations well-supported by contemporary machines.
To generate the authentication tag on a given message, UHASH is
applied to the message and key to produce a short, fixed-length, hash
value, and this hash value is then XOR-ed with a key-derived
pseudorandom pad. UMAC enjoys a rigorous security analysis and its
only "cryptographic" use is a block cipher, AES, to generate the
pseudorandom pads and internal key material.
Krovetz et al. Expires April 2001 [Page 0]
INTERNET-DRAFT UMAC October 2000
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1 Organization . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Named parameter sets: UMAC16 and UMAC32 . . . . . . . . . . . . . 5
2.1 Named parameters . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Alternative instantiations . . . . . . . . . . . . . . . . . 6
2.3 Naming convention . . . . . . . . . . . . . . . . . . . . . 7
3 Notation and basic operations . . . . . . . . . . . . . . . . . . 7
3.1 Operations on strings . . . . . . . . . . . . . . . . . . . 8
3.2 Operations on integers . . . . . . . . . . . . . . . . . . . 10
3.3 String-Integer conversion operations . . . . . . . . . . . . 10
3.4 Mathematical operations on strings . . . . . . . . . . . . . 11
4 Key and pad derivation functions . . . . . . . . . . . . . . . . 12
4.1 KDF: Key derivation function . . . . . . . . . . . . . . . . 12
4.2 PDF: Pad-derivation function . . . . . . . . . . . . . . . . 13
5 UHASH-32: Universal hash function for a 32-bit word size . . . . 15
5.1 NH-32: NH hashing with a 32-bit word size . . . . . . . . . 16
5.2 L1-HASH-32: First-layer hash . . . . . . . . . . . . . . . . 17
5.3 POLY: Polynomial hash . . . . . . . . . . . . . . . . . . . 19
5.4 L2-HASH-32: Second-layer hash . . . . . . . . . . . . . . . 20
5.5 L3-HASH-32: Third-layer hash . . . . . . . . . . . . . . . . 22
5.6 UHASH-32: Three-layer universal hash . . . . . . . . . . . . 23
6 UHASH-16: Universal hash function for a 16-bit word size . . . . 24
6.1 NH-16: NH hashing with a 16-bit word size . . . . . . . . . 24
6.2 L1-HASH-16: First-layer hash . . . . . . . . . . . . . . . . 25
6.3 L2-HASH-16: Second-layer hash . . . . . . . . . . . . . . . 27
6.4 L3-HASH-16: Third-layer hash . . . . . . . . . . . . . . . . 28
6.5 UHASH-16: Three-layer universal hash . . . . . . . . . . . . 29
7 UMAC tag generation . . . . . . . . . . . . . . . . . . . . . . . 30
7.1 Interface . . . . . . . . . . . . . . . . . . . . . . . . . 30
7.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 30
8 Security considerations . . . . . . . . . . . . . . . . . . . . . 31
8.1 Resistance to cryptanalysis . . . . . . . . . . . . . . . . 31
8.2 Tag lengths and forging probability . . . . . . . . . . . . 31
8.3 Selective-assurance authentication . . . . . . . . . . . . . 33
8.4 Nonce considerations . . . . . . . . . . . . . . . . . . . . 34
8.5 Guarding against replay attacks . . . . . . . . . . . . . . 35
9 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 36
10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
11 Author contact information . . . . . . . . . . . . . . . . . . . 37
A Suggested application programming interface (API) . . . . . . . . 38
B Reference code and test vectors . . . . . . . . . . . . . . . . . 39
Krovetz et al. Expires April 2001 [Page 1]
INTERNET-DRAFT UMAC October 2000
1 Introduction
This specification describes how to generate an authentication tag
(also called a "MAC") using the UMAC message authentication code.
Typically the authentication tag will be transmitted along with a
message and a nonce to allow the receiver of the message to verify
the message's authenticity. Generation and verification of the
authentication tag depends on the message, the nonce, and on a secret
key (typically, shared by sender and receiver).
UMAC is designed to be very fast to compute, in software, on
contemporary processors. The heart of UMAC is a universal hash
function, UHASH, which relies on addition and multiplication of
16-bit, 32-bit, and 64-bit numbers. These operations are supported
well by contemporary machines.
For many applications, especially ones with short-lived
authentication needs, sufficient speed is already obtained by
algorithms such as HMAC-SHA1 [2, 9] or the CBC-MAC of a block cipher
[1, 8]. But for the most speed-demanding applications, UMAC may be a
better choice: An optimized implementation of UMAC can achieve peak
performance which is about an order of magnitude faster than what can
be achieved with HMAC or CBC-MAC. Moreover, UMAC offers a tradeoff
between forging probability and speed (it is possible to trade
forging probability for speed). UMAC has been designed so that
computing the prefix of a tag can be done faster than computing the
entire tag. This feature allows for a receiver to verify the
authenticity of a message to various levels of assurance depending on
its needs and resources. Finally, UMAC enjoys better analytical
security properties than many other constructions.
Closely associated to this specification are the papers [3, 4, 10,
11]. See those papers for descriptions of the ideas which underlie
this algorithm, for performance data, and for proofs of the
correctness and maximal forging probability of UMAC.
The UMAC algorithms described in the papers [3, 4] are
"parameterized". This means that various low-level choices, like the
endian convention and the underlying cryptographic primitive, have
not been fixed. One must choose values for these parameters before
the authentication tag generated by UMAC (for a given message, key,
and nonce) becomes fully-defined. In this document we provide two
collections of parameter settings, and have named the sets UMAC16 and
UMAC32. The parameter sets have been chosen based on experimentation
and provide good performance on a wide variety of processors. UMAC16
is designed to excel on processors which provide small-scale SIMD
parallelism of the type found in Intel's MMX and Motorola's AltiVec
instruction sets, while UMAC32 is designed to do well on processors
Krovetz et al. Expires April 2001 [Page 2]
INTERNET-DRAFT UMAC October 2000
with good 32- and 64- bit support. UMAC32 may take advantage of SIMD
parallelism in future processors.
UMAC has been designed to allow implementations which accommodate
"on-line" authentication. This means that pieces of the message may
be presented to UMAC at different times (but in correct order) and an
on-line implementation will be able to process the message correctly
without the need to buffer more than a few dozen bytes of the
message. For simplicity, the algorithms in this specification are
presented as if the entire message being authenticated were available
at once.
The ideas which underlie UMAC go back to Wegman and Carter [12]. The
sender and receiver share a secret key (the MAC key) which
determines:
* The key for a "universal hash function". This hash function is
"non-cryptographic", in the sense that it does not need to have any
cryptographic "hardness" property. Rather, it needs to satisfy
some combinatorial property, which can be proven to hold without
relying on unproven hardness assumptions. The concept of a
universal hash function (family) is due to [5].
* The key for a pseudorandom function. This is where one needs a
cryptographic hardness assumption. The pseudorandom function may
be obtained (for example) from a block cipher or cryptographic hash
function. The concept of a pseudorandom function (family) is due
to [6].
To authenticate a message, Msg, one first applies the universal hash
function, resulting in a string which is typically much shorter than
the original message. The pseudorandom function is applied to a
nonce, and the result is used in the manner of a Vernam cipher: the
authentication tag is the xor of the output from the hash function
and the output from the pseudorandom function. Thus, an
authentication tag is generated as
AuthTag = f(Nonce) xor h(Msg).
Here f is the pseudorandom function shared between the sender and the
receiver, and h is a universal hash function shared by the sender and
the receiver. In UMAC, a shared key is used to key the pseudorandom
function f, and then f is used for both tag generation and internally
to generate all of the bits needed by the universal hash function.
For a general discussion of the speed and assurance advantages of
this approach see, for example, [3, 7].
The universal hash function that we use is called UHASH. It combines
Krovetz et al. Expires April 2001 [Page 3]
INTERNET-DRAFT UMAC October 2000
several software-optimized algorithms into a multi-layered structure.
The algorithm is moderately complex. Some of this complexity comes
from extensive speed optimizations.
For the pseudorandom function we use the block cipher of the Advanced
Encryption Standard (AES). (At the time of this working draft, the
AES definition process is still in progress. Here AES refers to the
final blok cipher defined by this process.) Any block cipher with
the same block-length (128 bits) and key-length (128 bits) could
trivially be substituted in place of what we call AES. With slightly
more effort one can define UMAC using a pseudorandom function other
than a block cipher.
One unusual feature of UMAC is that authentication-tag generation
depends on a nonce (in addition to depending on the message and key)
It is imperative that the nonce not be reused when generating
authentication tags under the same key. Thus the nonce will normally
be implemented by a counter, though any other way to achieve a non-
repeating value (or almost certainly non-repeating value) is
acceptable.
This document specifies the procedure for generating the
authentication tag from the message, key and nonce. The exact way in
which the message, nonce and authentication tag are transmitted
between sender and receiver is not specified here. It is the
responsibility of the particular applications using UMAC to specify
how the message, nonce and tag are transmitted. For example, an
application may choose to send the three values concatenated by some
encoding scheme while others may choose not to transmit the nonce at
all if it is known to both parties (e.g., when the nonce is a shared
state used to detect replay of messages), or to send only part of the
bits of the nonce.
Section 8 discusses security considerations that are important for
the proper understanding and use of UMAC.
To the authors' knowledge no ideas utilized in UMAC have been or will
be patented. To the best of the authors' knowledge, it should be
possible to use UMAC immediately, without any intellectual property
concerns.
Public-domain reference code for UMAC is available from the UMAC
homepage: http://www.cs.ucdavis.edu/~rogaway/umac/ Other information,
like timing data and papers, are distributed from the same URL.
Krovetz et al. Expires April 2001 [Page 4]
INTERNET-DRAFT UMAC October 2000
1.1 Organization
The rest of this document is organized as follows: In Section 2
parameters of the named parameter sets UMAC16 and UMAC32 are
described. In Section 3 we introduce the basic notations used
throughout the rest of the document. Section 4 describes the methods
used for generating the Vernam pad and the pseudorandom strings
needed internally for hashing. In Sections 5 and 6 the universal
hash function is described. Finally, in Section 7 we describe how
all these components fit together in the UMAC construction. Some
readers may prefer to read sections 4-7 backwards, in order to get a
top-down description. Section 8 describes some security
considerations in the use of UMAC.
2 Named parameter sets: UMAC16 and UMAC32
As described in [3, 4], a concrete instantiation of UMAC requires the
setting of many parameters. We have chosen two sets of values for
all of these parameters which allow for good performance on a wide
variety of processors. For maximum performance we offer UMAC16 which
is designed to exploit the vector-parallel instructions on the Intel
MMX and Motorola AltiVec instruction sets. For good performance on
processors which support 32- and 64-bit quantities well, we offer
UMAC32.
2.1 Named parameters
Throughout the algorithms described in this document, we have
integrated most of the parameters required for a concrete UMAC
instantiation as unnamed numeric constants. However, we have named
six parameters and assign them the following values depending on
whether one wishes to use UMAC16 or UMAC32.
UMAC16 UMAC32
------ ------
WORD-LEN 2 4
UMAC-OUTPUT-LEN 8 8
L1-KEY-LEN 1024 1024
UMAC-KEY-LEN 16 16
ENDIAN-FAVORITE LITTLE LITTLE
L1-OPERATIONS-SIGN SIGNED UNSIGNED
Here we give a brief explanation of the role each named parameter
plays.
Krovetz et al. Expires April 2001 [Page 5]
INTERNET-DRAFT UMAC October 2000
WORD-LEN: Specifies the size in bytes of a "word". UMAC
will be significantly faster in execution if
the executing machine supports well certain
operations on datatypes of this size. Note
that WORD-LEN is not necessarily the native
wordsize of the target machine (and on some
machines a smaller value turns out to be
preferable).
UMAC-OUTPUT-LEN: Specifies the length of the authentication tag
generated by UMAC, in bytes.
L1-KEY-LEN: Specifies the "block length," in bytes, on
which the hash-function initially operates.
This much storage (and then some) will be
needed in the run-time environment for UMAC's
internal keys.
UMAC-KEY-LEN: Specifies the length in bytes of the user-sup-
plied UMAC key.
ENDIAN-FAVORITE: Specifies which endian-orientation will be fol-
lowed in the reading of data to be hashed.
This need not be equal to the native endianess
of any specific machine running UMAC.
L1-OPERATIONS-SIGN: Specifies whether the strings manipulated in
the hash-function are to be initially consid-
ered as signed or unsigned integers.
2.2 Alternative instantiations
Although this document only specifies two named parameter sets, the
named parameters could be altered to suit specific authentication
needs which are not adequately served by either UMAC16 or UMAC32.
Below, we list alternatives that are supported by this specification
for each of the named parameters.
WORD-LEN ::= 2 | 4
UMAC-OUTPUT-LEN ::= 1 | 2 | ... | 31 | 32
L1-KEY-LEN ::= 32 | 64 | 128 | 256 | ... | 2^28
UMAC-KEY-LEN ::= 16 | 32
ENDIAN-FAVORITE ::= BIG | LITTLE
L1-OPERATIONS-SIGN ::= SIGNED | UNSIGNED
Roughly speaking, doubling UMAC-OUTPUT-LEN approximately doubles
execution time and squares (ie. decreases) the probability of MAC
Krovetz et al. Expires April 2001 [Page 6]
INTERNET-DRAFT UMAC October 2000
forgery. Setting ENDIAN-FAVORITE to BIG causes UMAC to perform
better on big-endian processors rather than little-endian processors.
Setting L1-OPERATIONS-SIGN to UNSIGNED slightly increases UMAC
security at the expense of complicating implementations on systems
which do not support unsigned integers well. This effectively
disallows the use of Intel's MMX instructions which only support
signed integers. Finally, increasing L1-KEY-LEN tends to speed tag
generation on large messages, but requires more memory for processing
and could potentially slow the processor by overflowing its cache.
2.3 Naming convention
A concise shorthand may be used to specify an instance of UMAC. The
word "UMAC" followed by up to six parameters specifies unambiguously
an instance of UMAC. If only a prefix of the six parameters are
written, it is implicitly specified that those missing parameters
take on default values listed below. The format of the shorthand is
"UMAC-w/l/n/k/s/e", and the meaning of the letters (and their
defaults) is as follows:
w = WORD-LEN (4)
l = UMAC-OUTPUT-LEN (8)
n = L1-KEY-LEN (1024)
k = UMAC-KEY-LEN (16)
s = L1-OPERATIONS-SIGN (UNSIGNED)
e = ENDIAN-FAVORITE (LITTLE)
Some examples
UMAC-4/8/1024/16/UNSIGNED/LITTLE (Same as named set "UMAC32" )
UMAC-2/8/1024/16/SIGNED/LITTLE (Same as named set "UMAC16" )
UMAC-4/12 ("UMAC32" with 96-bit output)
UMAC-2/8/4096 ("UMAC16" with 4K L1-key and)
(unsigned L1-OPERATIONS )
3 Notation and basic operations
The specification of UMAC involves the manipulation of both strings
and numbers. String variables are denoted with initial capitals
(upper-case), whereas numeric variables are denoted in all lower-
case. Global parameters are denoted in all capital letters. Simple
functions, like those for string-length and string-xor, are written
with all lower-case, while the algorithms of UMAC are named in all
upper-case.
Whenever a variable is followed by an underscore ("_"), the
Krovetz et al. Expires April 2001 [Page 7]
INTERNET-DRAFT UMAC October 2000
underscore is intended to denote a subscript, with the subscripted
expression needing to be evaluated to resolve the meaning of the
variable. For example, if i=2, then M_{2 * i} refers to the variable
M_4.
We now define some basic operations for manipulating strings and
numbers, and for converting between the two.
3.1 Operations on strings
In this specification, we view the messages to be hashed (as well as
the keys used for hashing) as strings of bytes. A "byte" is an 8-bit
string. The algorithms have been designed so that they are easily
extendable to allow arbitrary bit-strings, if necessary. We use the
following notation to manipulate these strings.
length(S): The length of string S in bytes.
zeroes(n): The string made of n zero-bytes.
S xor T: The string which is the bitwise exclusive-or of S and
T. Strings S and T must have the same length.
S and T: The string which is the bitwise conjunction of S and
T. Strings S and T must have the same length.
S[i]: The i-th byte of the string S (indices begin at 1).
S[i..j]: The substring of S consisting of bytes i through j.
S || T: The string S concatenated with string T.
zeropad(S,n): The string S, padded with zero-bytes to the nearest
non-zero multiple of n bytes. Formally, zeropad(S,n)
= S || zeroes(i), where i is the smallest nonnegative
integer such that S || zeroes(i) is non-empty and n
divides length(S)+i.
3.1.1 ENDIAN-SWAP: Adjusting endian orientation
This routine is used to make the data input to UMAC conform to the
ENDIAN-FAVORITE global parameter.
Krovetz et al. Expires April 2001 [Page 8]
INTERNET-DRAFT UMAC October 2000
3.1.1.1 Discussion
The most time consuming portion of many UMAC computations involves
the reading of key and message data from memory. Because big- and
little-endian computers will read these bytes differently, specifying
a particular endian-orientation for UMAC could have significant
performance ramifications. If necessary, the key-bytes can be
preprocessed once during key setup to eliminate the need for their
reorientation during performance-critical tag generation. But,
message data presumably cannot be preprocessed. Any reorientation
needed for each message must be done during tag generation,
introducing a significant penalty to computers whose native endian-
orientation is opposite to that specified for UMAC. Therefore, UMAC
defines a parameter, ENDIAN-FAVORITE, which allows UMAC to be
specified to favor big- or little-endian memory conventions. If the
parameter is set to favor little-endian computers, then we specify
the reversal of the bytes of every word in the input message using
the following support function. By reversing the data in the
specification, an implementation on a little-endian machine would in
fact do nothing but read the input data using native-endian word
loads. The loads would automatically reverse the bytes within each
word, fulfilling the requirements of the specification. Any other
endian reorientation needed to comply with the specification requires
an insignificant amount of time during each tag calculation.
3.1.1.2 Interface
Function Name:
ENDIAN-SWAP
Input:
S, string with length divisible by WORD-LEN bytes.
Output:
T, string S with each word endian-reversed.
3.1.1.3 Algorithm
Compute T using the following algorithm.
//
// Break S into word-size chunks
//
n = length(S) / WORD-LEN
Let S_1, S_2, ..., S_n be strings of length WORD-LEN bytes
so that S_1 || S_2 || .. || S_n = S.
//
// Byte-reverse each chunk, and build-up T
Krovetz et al. Expires April 2001 [Page 9]
INTERNET-DRAFT UMAC October 2000
//
T = <empty string>
for i = 1 to n do
Let W_1, W_2, ..., W_{WORD-LEN} be bytes
so that W_1 || W_2 || ... || W_{WORD-LEN} = S_i
SReversed_i = W_{WORD-LEN} || W_{WORD-LEN - 1} || ... || W_1
T = T || SReversed_i
Return T
3.2 Operations on integers
In this specification, we generally use standard notation for
mathematical operations, such as "*" for multiplication, "+" for
addition and "mod" for modular reduction. Some less standard
notations are defined here.
a^i: The integer a raised to the integer i-th power.
lg a: The base-2 logarithm of integer a.
floor(x): The largest integer less than or equal to x.
ceil(x): The smallest integer greater than or equal to x.
prime(n): The largest prime number less than 2^n.
The prime numbers used in UMAC are:
+-----+--------------------+---------------------------------------+
| x | prime(x) [Decimal] | prime(x) [Hexadecimal] |
+-----+--------------------+---------------------------------------+
| 19 | 2^19 - 1 | 0x0007FFFF |
| 32 | 2^32 - 5 | 0xFFFFFFFB |
| 36 | 2^36 - 5 | 0x0000000F FFFFFFFB |
| 64 | 2^64 - 59 | 0xFFFFFFFF FFFFFFC5 |
| 128 | 2^128 - 159 | 0xFFFFFFFF FFFFFFFF FFFFFFFF FFFFFF61 |
+-----+--------------------+---------------------------------------+
3.3 String-Integer conversion operations
We convert between strings and integers using the following
functions. Each function treats initial bits as more significant
than later ones.
Krovetz et al. Expires April 2001 [Page 10]
INTERNET-DRAFT UMAC October 2000
bit(S,n): Returns the integer 1 if the n-th bit of the string
S is 1, otherwise returns the integer 0 (indices
begin at 1). Here n must be between 1 and the bit-
length of S.
str2uint(S): The non-negative integer whose binary representation
is the string S. More formally, if S is t bits long
then str2uint(S) = 2^{t-1} * bit(S,1) + 2^{t-2} *
bit(S,2) + ... + 2^{1} * bit(S,t-1) + bit(S,t).
uint2str(n,i): The i-byte string S such that str2uint(S) = n. If
no such string exists then uint2str(n,i) is unde-
fined.
str2sint(S): The integer whose binary representation in two's-
complement is the string S. More formally, if S is
t bits long then str2sint(S) = -2^{t-1} * bit(S,1) +
2^{t-2} * bit(S,2) + ... + 2^{1} * bit(S,t-1) +
bit(S,t).
sint2str(n,i): The i-byte string S such that str2sint(S) = n. If
no such string exists then sint2str(n,i) is unde-
fined.
3.4 Mathematical operations on strings
One of the primary operations in the universal hashing part of UMAC
is repeated application of addition and multiplication on strings.
We use "+_n" and "*_n" to denote the string operations which are
equivalent to addition and multiplication modulo 2^n, respectively.
These operations correspond exactly with the addition and
multiplication operations which are performed efficiently on
registers by modern computers. So, when n is 16, 32 or 64, these
operations can be preformed by computers very quickly.
There are two interpretations of the operators depending on whether
the strings are interpreted as signed or unsigned integers. The
global parameter L1-OPERATIONS-SIGN determines which interpretation
is made.
If strings S and T are interpreted as signed integers (that is,
L1-OPERATIONS-SIGN == SIGNED) then
"S *_n T" as uint2str(str2sint(S) * str2sint(T) mod 2^n, n/8), and
"S +_n T" as uint2str(str2sint(S) + str2sint(T) mod 2^n, n/8).
Krovetz et al. Expires April 2001 [Page 11]
INTERNET-DRAFT UMAC October 2000
If strings S and T are interpreted as unsigned integers (that is,
L1-OPERATIONS-SIGN == UNSIGNED) then we define
"S *_n T" as uint2str(str2uint(S) * str2uint(T) mod 2^n, n/8), and
"S +_n T" as uint2str(str2uint(S) + str2uint(T) mod 2^n, n/8).
In any case, the number n must be divisible by 8. In this document
we use S *_16 T, S *_32 T, S *_64 T, S +_32 T and S +_64 T,
corresponding to multiplication of 2, 4 and 8 byte numbers, and the
addition of 4 and 8 byte numbers.
4 Key and pad derivation functions
UMAC, as described in this document, requires either a 16- or 32-byte
key which is used with a key-derivation function (KDF) to produce
pseudorandom bits needed within the universal hash function.
4.1 KDF: Key derivation function
Stretching the user-supplied key into pseudorandom bits used
internally by UMAC is done with a key-derivation function (KDF). In
this section we define a KDF which is efficiently instantiated with a
block cipher. The Advanced Encryption Standard (AES) is used in
output-feedback mode to produce the required bits. If UMAC-KEY-LEN
is 16, then the 128-bit key/128-bit block-length variant of AES is
used, and if UMAC-KEY-LEN is 32, then the 256-bit key/128-bit block-
length variant is used. The KDF requires an "index" parameter.
Using the same key, but different indices, generates different
pseudorandom outputs.
4.1.1 Interface
Function Name:
KDF
Input:
K, string of length UMAC-KEY-LEN bytes // key to AES
index, a non-negative integer less than 256.
numbytes, a positive integer.
Output:
Y, string of length numbytes bytes.
Krovetz et al. Expires April 2001 [Page 12]
INTERNET-DRAFT UMAC October 2000
4.1.2 Algorithm
Compute Y using the following algorithm.
//
// Calculate number of AES iterations, set indexed starting point
//
n = ceil(numbytes / 16)
T = zeroes(15) || uint2str(index, 1)
Y = <empty string>
//
// Build Y using AES in a feedback mode
//
for i = 1 to n do
T = AES(K, T)
Y = Y || T
Y = Y[1..numbytes]
Return Y
4.2 PDF: Pad-derivation function
The Wegman-Carter MAC scheme used in UMAC requires the exclusive-or
of a pseudorandom string with the output from the universal hash
function. The pseudorandom string is obtained by applying a pad-
derivation function (PDF) to a nonce which, for security reasons,
must change with each authentication-tag computation. Nonces may be
any number of bytes from 1 to 16, but all nonces in a single
authentication session must be of equal length. In this section we
define a PDF which is efficiently instantiated with a block cipher.
Again we use AES with either 16- or 32-bytes keys depending on the
value of UMAC-KEY-LEN.
4.2.1 Discussion
The PDF output is exclusive-or'd with the result of the universal
hash function. AES, however, may provide more or fewer bits per
invocation than are needed for this purpose. For example, UMAC-
OUTPUT-LEN is normally 8 bytes and AES produces an output of 16
bytes. It would save processing time if half of the AES output bits
could be used to generate one tag, and then the second half of the
same AES output could be used for the tag of the next message. For
this reason, we include an optimization which allows the use of
different substrings of the same AES output. This optimization is
Krovetz et al. Expires April 2001 [Page 13]
INTERNET-DRAFT UMAC October 2000
effective only when nonces are sequential. We do so by using the low
bits of the nonce as an index into the AES output, which is generated
using the higher bits of the nonce which are not used for indexing.
This speeds message authentication by reducing the average time spent
by AES for each authentication. Note that if a counter-variable is
used to exploit this optimization, and the variable is stored in
memory, then the variable must be treated as big-endian. If UMAC-
OUTPUT-LEN is larger than 16, then two AES invocations are required
to produce a sufficient number of bits.
4.2.2 Interface
Function Name:
PDF
Input:
K, string of length UMAC-KEY-LEN bytes // key for AES
Nonce, string of length 1 to 16 bytes.
Output:
Y, string of length UMAC-OUTPUT-LEN bytes.
4.2.3 Algorithm
Compute Y using the following algorithm.
//
// Make Nonce 16 bytes by prepending zeroes
//
Nonce = Nonce || zeroes(16 - length(Nonce))
//
// If one AES invocation is enough for more than one
// PDF invocation.
//
if (UMAC-OUTPUT-LEN <= 8) then
//
// Compute number of index bits needed
//
i = floor(16 / UMAC-OUTPUT-LEN)
numlowbits = floor(lg(i))
//
// Extract index bits and zero low bits of Nonce
//
nlowbitsnum = str2uint(Nonce) mod 2^numlowbits
Nonce = Nonce xor uint2str(nlowbitsnum, 16)
Krovetz et al. Expires April 2001 [Page 14]
INTERNET-DRAFT UMAC October 2000
//
// Generate subkey, AES and extract indexed substring
//
K' = KDF(K, 128, UMAC-KEY-LEN)
T = AES(K', Nonce)
Y = T[ nlowbitsnum * UMAC-OUTPUT-LEN + 1 ..
(nlowbitsnum + 1) * UMAC-OUTPUT-LEN]
else
//
// Repeated AES calls to build length
//
K_1 = KDF(K, 128, UMAC-KEY-LEN)
K_2 = KDF(K, 129, UMAC-KEY-LEN)
if (UMAC-OUTPUT-LEN <= 16)
Y = AES(K_1, Nonce)
else
Y = AES(K_1, Nonce) || AES(K_2, Nonce)
Y = Y[1..UMAC-OUTPUT-LEN]
Return Y
5 UHASH-32: Universal hash function for a 32-bit word size
UHASH is a keyed hash function, which takes as input a string of
arbitrary length, and produces as output a string of fixed length
(such as 8 bytes). The actual output length depends on the parameter
UMAC-OUTPUT-LEN.
UHASH has been shown to be epsilon-ASU ("Almost Strongly Universal"),
where epsilon is a small (parameter-dependent) real number.
Informally, saying that a keyed hash function is epsilon-ASU means
that for any two distinct fixed input strings, the two outputs of the
hash function with a random key "look almost like a pair of random
strings". The number epsilon measures how non-random the output
strings may be. For details, see [3, 4, 11].
UHASH has been designed to be fast by exploiting several
architectural features of modern commodity processors. It was
specifically designed for use in UMAC. But UHASH is useful beyond
that domain, and can be easily adopted for other purposes.
UHASH does its work in three layers. First, a hash function called
NH [3] is used to compress input messages into strings which are
typically many times smaller than the input message. Second, the
compressed message is hashed with an optimized "polynomial hash
Krovetz et al. Expires April 2001 [Page 15]
INTERNET-DRAFT UMAC October 2000
function" into a fixed-length 16-byte string. Finally, the 16-byte
string is hashed using an "inner-product hash" into a string of
length WORD-LEN bytes. These three layers are repeated (with a
modified key) until the outputs total UMAC-OUTPUT-LEN bytes.
Note: Because the repetitions of the three-layer scheme are
independent (aside from sharing some internal key), it follows that
each "word" of the final output can be computed independently.
Hence, to compute a prefix of a UMAC tag, one can simply repeat the
three-layer scheme fewer times. Thus, computing a prefix of the tag
can be done significantly faster than computing the whole tag.
5.1 NH-32: NH hashing with a 32-bit word size
The first of the three hash-layers that UHASH uses is the NH hash
function [3]. More than any other part of UHASH, NH is designed to
be fast on modern processors, because it is where the bulk of the
UHASH work is done. The NH universal hash function hashes an input
string M using a key K by considering M and K to be arrays of
integers, each WORD-LEN bytes in length, and performing a sequence of
arithmetic operations on them. See [3] for definitions, proofs and
rationale relating to NH.
The NH-32 algorithm is designed to perform well on processors which
support well multiplications of 32-bit operands into 64-bit results.
NH-32 is also designed to exploit the recent trend of including
instructions for small-scale vector parallelism in uniprocessor CPUs.
Intel's Streaming SIMD 2 instruction set is a good example of this
trend. It supports an instruction, which multiplies two pairs of
32-bit operands into two 64-bit results, which can be used by
UHASH-32 for accelerated hashing. To accommodate this parallelism,
NH-32 accesses data-words in pairs which are 4 words (16 bytes)
apart.
5.1.1 Interface
Function Name:
NH-32
Input:
K, string of length L1-KEY-LEN bytes.
M, string with length divisible by 32 bytes.
Output:
Y, string of length 8 bytes.
Krovetz et al. Expires April 2001 [Page 16]
INTERNET-DRAFT UMAC October 2000
5.1.2 Algorithm
Compute Y using the following algorithm.
//
// Break M and K into 4-byte chunks
//
t = length(M) / 4
Let M_1, M_2, ..., M_t be 4-byte strings
so that M = M_1 || M_2 || .. || M_t.
Let K_1, K_2, ..., K_t be 4-byte strings
so that K_1 || K_2 || .. || K_t is a prefix of K.
//
// Perform NH hash on the chunks, pairing words for multiplication
// which are 4 apart to accommodate vector-parallelism.
//
Y = zeroes(8)
i = 1
while (i < t) do
Y = Y +_64 ((M_{i+0} +_32 K_{i+0}) *_64 (M_{i+4} +_32 K_{i+4}))
Y = Y +_64 ((M_{i+1} +_32 K_{i+1}) *_64 (M_{i+5} +_32 K_{i+5}))
Y = Y +_64 ((M_{i+2} +_32 K_{i+2}) *_64 (M_{i+6} +_32 K_{i+6}))
Y = Y +_64 ((M_{i+3} +_32 K_{i+3}) *_64 (M_{i+7} +_32 K_{i+7}))
i = i + 8
Return Y
5.2 L1-HASH-32: First-layer hash
To limit the length of key required in the first layer of hashing,
L1-HASH-32 breaks the input message into chunks no longer than
L1-KEY-LEN and NH hashes each with a key of the same length.
5.2.1 Discussion
The NH hash function requires a key which is just as long as the
message being hashed. To limit the amount of key used in the NH
hashing layer, we use a key of fixed length (defined by the parameter
L1-KEY-LEN), and process the message in chunks of this length (or
less). The L1-HASH-32 algorithm takes an input message and breaks it
into chunks of L1-KEY-LEN bytes (except the last chuck, which may be
shorter and may need to be zero-padded to an appropriate length).
Each chunk is hashed with NH-32, and the outputs from all the NH
invocations are annotated with some length information and
concatenated to produce the final L1-HASH-32 result.
Krovetz et al. Expires April 2001 [Page 17]
INTERNET-DRAFT UMAC October 2000
If ENDIAN-FAVORITE is LITTLE, then each word in the input message is
required to be endian reversed.
5.2.2 Interface
Function Name:
L1-HASH-32
Input:
K, string of length L1-KEY-LEN bytes.
M, string of length less than 2^64 bytes.
Output:
Y, string of length (8 * ceil(length(M)/L1-KEY-LEN)) bytes.
5.2.3 Algorithm
Compute Y using the following algorithm.
//
// Break M into L1-KEY-LEN byte chunks (final chunk may be shorter)
//
t = ceil(length(M) / L1-KEY-LEN)
Let M_1, M_2, ..., M_t be strings so that M = M_1 || M_2 || .. ||
M_t, and length(M_i) = L1-KEY-LEN for all 0 < i < t.
//
// For each chunk, except the last: endian-adjust, NH hash
// and add bit-length. Use results to build Y.
//
Len = uint2str(L1-KEY-LEN * 8, 8)
Y = <empty string>
for i = 1 to t-1 do
if (ENDIAN-FAVORITE == LITTLE) then // See endian discussion
ENDIAN-SWAP(M_i) // in section 3.1.1
Y = Y || (NH-32(K, M_i) +_64 Len)
//
// For the last chunk: pad to 32-byte boundary, endian-adjust,
// NH hash and add bit-length. Concatenate the result to Y.
//
Len = uint2str(length(M_t) * 8, 8)
M_t = zeropad(M_t, 32)
if (ENDIAN-FAVORITE == LITTLE) then
ENDIAN-SWAP(M_t)
Y = Y || (NH-32(K, M_t) +_64 Len)
return Y
Krovetz et al. Expires April 2001 [Page 18]
INTERNET-DRAFT UMAC October 2000
5.3 POLY: Polynomial hash
The output from L1-HASH is a string which is shorter than, but still
proportional to, that of its input. The POLY hash algorithm takes an
arbitrary message and hashes it to a fixed length.
5.3.1 Discussion
Polynomial hashing treats an input message as a sequence of
coefficients of a polynomial, and the hash-key is the point at which
this polynomial is evaluated. The security guarantee assured by
polynomial hashing degrades linearly in the length of the message
being hashed. If two messages of n words are hashed, then the
probability they collide when hashed by POLY with a prime modulus of
p is no more than n / p. For more information on the polynomial
hashing schemes used in UMAC see [10].
The parameter 'wordbits' specifies the prime modulus used in the
polynomial as well as the granularity (length of words) in which the
input message should be broken. Because some strings of length
wordbits are greater than prime(wordbits), a mechanism is needed to
fix words which are not in the range 0 .. prime(wordbits) - 1. To
this end, any word larger than 'maxwordrange' is split into two words
guaranteed to be in range, and each is hashed by the polynomial hash.
5.3.2 Interface
Function Name:
POLY
Input:
wordbits, positive integer divisible by 8.
maxwordrange, positive integer less than 2^wordbits.
k, integer in the range 0 .. prime(wordbits) - 1.
M, string with length divisible by (wordbits / 8) bytes.
Output:
y, integer in the range 0 .. prime(wordbits) - 1.
5.3.3 Algorithm
Compute y using the following algorithm.
//
// Define constants used for fixing out-of-range words
//
wordbytes = wordbits / 8
Krovetz et al. Expires April 2001 [Page 19]
INTERNET-DRAFT UMAC October 2000
p = prime(wordbits)
offset = 2^wordbits - p
marker = p - 1
//
// Break M into chunks of length wordbytes bytes
//
n = length(M) / wordbytes
Let M_1, M_2, ..., M_n be strings of length wordbytes bytes
so that M = M_1 || M_2 || .. || M_n
//
// For each input word, compare it with maxwordrange. If larger
// then hash the words 'marker' and (m - offset), both in range.
//
y = 1
for i = 1 to n do
m = str2uint(M_i)
if (m >= maxwordrange) then
y = (k * y + marker) mod p
y = (k * y + (m - offset)) mod p
else
y = (k * y + m) mod p
Return y
5.4 L2-HASH-32: Second-layer hash
Because L1-HASH may produce quite long strings, and POLY's security
guarantee degrades linearly, a scheme is required to allow long
strings while ensuring that the collision probability never grows
beyond a certain pre-set bound. This is accomplished by dynamically
increasing the prime modulus used in the polynomial hashing as the
collision probability bound is approached.
5.4.1 Discussion
The probability of two n-word messages hashing to the same result
when polynomially hashed with prime modulus p is as much as (n / p).
To maintain a limit on the maximum collision probability, a scheme is
needed to disallow (n / p) growing too large. The scheme used here
hashes a number of words n_1 under modulus p_1 until (n_1 / p_1)
reaches a critical point. The result of the hash-so-far is prepended
to the remaining message needing to be hashed, and the hashing
continues, but under a prime modulus p_2 which is substantially
larger than p_1. Hashing continues for n_2 more words until (n_2 /
Krovetz et al. Expires April 2001 [Page 20]
INTERNET-DRAFT UMAC October 2000
p_2) also reaches a critical point, at which time a new larger prime
p_3 could be used.
Because polynomial hashing under a small prime modulus is often
faster than hashing under a large one, this dynamic ramping-up of the
polynomial's modulus provides a hash function which is faster on
short messages, but still accommodates long ones.
The keys used for polynomial hashing are restricted to particular
subsets to allow for faster implementations on 32-bit architectures.
The restrictions allow an implementor to disregard some potential
arithmetic carries during computation.
For more information see [10].
5.4.2 Interface
Function Name:
L2-HASH-32
Input:
K, string of length 24 bytes.
M, string of length less than 2^64 bytes.
Output:
Y, string of length 16 bytes.
5.4.3 Algorithm
Compute y using the following algorithm.
//
// Extract keys and restrict to special key-sets
//
Mask64 = uint2str(0x01ffffff01ffffff, 8)
Mask128 = uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16)
k64 = str2uint(K[1..8] and Mask64)
k128 = str2uint(K[9..24] and Mask128)
//
// If M no more than 2^17 bytes, hash under 64-bit prime,
// otherwise, hash first 2^17 bytes under 64-bit prime and
// remainder under 128-bit prime.
//
if (length(M) <= 2^17) then // 2^14 64-bit words
//
// View M as an array of 64-bit words, and use POLY modulo
Krovetz et al. Expires April 2001 [Page 21]
INTERNET-DRAFT UMAC October 2000
// prime(64) (and with bound 2^64 - 2^32) to hash it.
//
y = POLY(64, 2^64 - 2^32, k64, M)
else
M_1 = M[1 .. 2^17]
M_2 = M[2^17 + 1 .. length(M)]
M_2 = zeropad(M_2 || uint2str(0x80,1), 16)
y = POLY(64, 2^64 - 2^32, k64, M_1)
y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_2)
Y = uint2str(y, 16)
Return Y
5.5 L3-HASH-32: Third-layer hash
The output from L2-HASH-32 is 16 bytes long. This final hash
function hashes the 16-byte string to a fixed length of 4 bytes using
a simple inner-product hash with affine translation. A 36-bit prime
modulus is used to improve security.
5.5.1 Interface
Function Name:
L3-HASH-32
Input:
K1, string of length 64 bytes.
K2, string of length 4 bytes.
M, string of length 16 bytes.
Output:
Y, string of length 4 bytes.
5.5.2 Algorithm
Compute Y using the following algorithm.
y = 0
//
// Break M and K1 into 8 chunks and convert to integers
//
for i = 1 to 8 do
M_i = M [(i - 1) * 2 + 1 .. i * 2]
K_i = K1[(i - 1) * 8 + 1 .. i * 8]
Krovetz et al. Expires April 2001 [Page 22]
INTERNET-DRAFT UMAC October 2000
m_i = str2uint(M_i)
k_i = str2uint(K_i) mod prime(36)
//
// Inner-product hash, extract last 32 bits and affine-translate
//
y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(36)
y = y mod 2^32
Y = uint2str(y, 4)
Y = Y xor K2
Return Y
5.6 UHASH-32: Three-layer universal hash
The hash functions L1-HASH, L2-HASH and L3-HASH are used together in
a straightforward manner. A message is first hashed by L1-HASH, its
output is then hashed by L2-HASH, whose output is then hashed by
L3-HASH. If the message being hashed is no longer than L1-KEY-LEN
bytes, then L2-HASH is skipped as an optimization. Because L3-HASH
outputs a string whose length is only WORD-LEN bytes long, multiple
iterations of this three-layer hash are used, with different keys
each time, until UMAC-OUTPUT-LEN have been generated. To reduce
memory requirements, L1-HASH and L3-HASH both reuse most of their
key-material between iterations.
5.6.1 Interface
Function Name:
UHASH-32
Input:
K, string of length UMAC-KEY-LEN bytes.
M, string of length less than 2^64 bytes.
Output:
Y, string of length UMAC-OUTPUT-LEN bytes.
5.6.2 Algorithm
Compute Y using the following algorithm.
//
// Calculate iterations needed to make UMAC-OUTPUT-LEN bytes
//
streams = ceil(UMAC-OUTPUT-LEN / WORD-LEN)
//
Krovetz et al. Expires April 2001 [Page 23]
INTERNET-DRAFT UMAC October 2000
// Define total key needed for all iterations using KDF.
// L1Key and L3Key1 both reuse most key between iterations.
//
L1Key = KDF(K, 0, L1-KEY-LEN + (streams - 1) * 16)
L2Key = KDF(K, 1, streams * 24)
L3Key1 = KDF(K, 2, streams * 64)
L3Key2 = KDF(K, 3, streams * 4)
//
// For each iteration, extract key and three-layer hash.
// If length(M) <= L1-KEY-LEN, then skip L2-HASH.
//
Y = <empty string>
for i = 1 to streams do
L1Key_i = L1Key [(i-1) * 16 + 1 .. (i-1) * 16 + L1-KEY-LEN]
L2Key_i = L2Key [(i-1) * 24 + 1 .. i * 24]
L3Key1_i = L3Key1[(i-1) * 64 + 1 .. i * 64]
L3Key2_i = L3Key2[(i-1) * 4 + 1 .. i * 4]
A = L1-HASH-32(L1Key_i, M)
if (length(M) <= L1-KEY-LEN) then
B = zeroes(8) || A
else
B = L2-HASH-32(L2Key_i, A)
C = L3-HASH-32(L3Key1_i, L3Key2_i, B)
Y = Y || C
Y = Y[1 .. UMAC-OUTPUT-LEN]
Return Y
6 UHASH-16: Universal hash function for a 16-bit word size
See Section 5 (UHASH-32) for general discussion of the UHASH
algorithm. Each sub-section of Section 6 will note only differences
between UHASH-32 and UHASH-16.
6.1 NH-16: NH hashing with a 16-bit word size
The NH-16 algorithm is designed to exploit the recent trend of
including instructions for small-scale vector parallelism in
uniprocessor CPUs. Intel's MMX and Mororola's AltiVec instruction
sets are good examples of this trend. Both support single-
instruction multiply-add instructions on vectors of 16-bit words
which can be used by UHASH-16 for accelerated hashing. To
accommodate this parallelism, NH-16 accesses data-words in pairs
which are 8 words (16 bytes) apart.
Krovetz et al. Expires April 2001 [Page 24]
INTERNET-DRAFT UMAC October 2000
6.1.1 Interface
Function Name:
NH-16
Input:
K, string of length L1-KEY-LEN bytes.
M, string with length divisible by 32 bytes.
Output:
Y, string of length 4 bytes.
6.1.2 Algorithm
Compute Y using the following algorithm.
//
// Break M and K into 2-byte chunks
//
t = length(M) / 2
Let M_1, M_2, ..., M_t be 2-byte strings
so that M = M_1 || M_2 || .. || M_t.
Let K_1, K_2, ..., K_t be 2-byte strings
so that K_1 || K_2 || .. || K_t is a prefix of K.
//
// Perform NH hash on the chunks, pairing words for multiplication
// which are 8 apart to accommodate vector-parallelism.
//
Y = zeroes(4)
i = 1
while (i < t) do
Y = Y +_32 ((M_{i+0} +_16 K_{i+0}) *_32 (M_{i+ 8} +_16 K_{i+ 8}))
Y = Y +_32 ((M_{i+1} +_16 K_{i+1}) *_32 (M_{i+ 9} +_16 K_{i+ 9}))
Y = Y +_32 ((M_{i+2} +_16 K_{i+2}) *_32 (M_{i+10} +_16 K_{i+10}))
Y = Y +_32 ((M_{i+3} +_16 K_{i+3}) *_32 (M_{i+11} +_16 K_{i+11}))
Y = Y +_32 ((M_{i+4} +_16 K_{i+4}) *_32 (M_{i+12} +_16 K_{i+12}))
Y = Y +_32 ((M_{i+5} +_16 K_{i+5}) *_32 (M_{i+13} +_16 K_{i+13}))
Y = Y +_32 ((M_{i+6} +_16 K_{i+6}) *_32 (M_{i+14} +_16 K_{i+14}))
Y = Y +_32 ((M_{i+7} +_16 K_{i+7}) *_32 (M_{i+15} +_16 K_{i+15}))
i = i + 16
Return Y
6.2 L1-HASH-16: First-layer hash
To limit the length of key required in the first layer of hashing,
L1-HASH-16 breaks the input message into chunks no longer than
Krovetz et al. Expires April 2001 [Page 25]
INTERNET-DRAFT UMAC October 2000
L1-KEY-LEN bytes and NH hashes each with a key of that same length.
6.2.1 Interface
Function Name:
L1-HASH-16
Input:
K, string of length L1-KEY-LEN bytes.
M, string of length less than 2^64 bytes.
Output:
Y, string of length (4 * ceil(length(M)/L1-KEY-LEN)) bytes.
6.2.2 Algorithm
Compute Y using the following algorithm.
//
// Break M into L1-KEY-LEN byte chunks (final chunk may be shorter)
//
t = ceil(length(M) / L1-KEY-LEN)
Let M_1, M_2, ..., M_t be strings so that M = M_1 || M_2 || .. ||
M_t, and length(M_i) = L1-KEY-LEN for all 0 < i < t.
//
// For each chunk, except the last: endian-adjust, NH hash
// and add bit-length. Use results to build Y.
//
Len = uint2str(L1-KEY-LEN * 8, 4)
Y = <empty string>
for i = 1 to t-1 do
if (ENDIAN-FAVORITE == LITTLE) then // See endian discussion
ENDIAN-SWAP(M_i) // in section 3.1.1
Y = Y || (NH-16(K, M_i) +_32 Len)
//
// For the last chunk: pad to 32-byte boundary, endian-adjust,
// NH hash and add bit-length. Concatenate the result to Y.
//
Len = uint2str(length(M_t) * 8, 4)
M_t = zeropad(M_t, 32)
if (ENDIAN-FAVORITE == LITTLE) then
ENDIAN-SWAP(M_t)
Y = Y || (NH-16(K, M_t) +_32 Len)
return Y
Krovetz et al. Expires April 2001 [Page 26]
INTERNET-DRAFT UMAC October 2000
6.3 L2-HASH-16: Second-layer hash
L2-HASH-16 differs from L2-HASH-32 by beginning the ramped hash with
a smaller prime modulus. See Section 5.3 for the definition of POLY.
6.3.1 Interface
Function Name:
L2-HASH-16
Input:
K, string of length 28 bytes.
M, string of length less than 2^64 bytes.
Output:
Y, string of length 16 bytes.
6.3.2 Algorithm
Compute Y using the following algorithm.
//
// Extract keys and restrict to special key-sets
//
Mask32 = uint2str(0x1fffffff, 4)
Mask64 = uint2str(0x01ffffff01ffffff, 8)
Mask128 = uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16)
k_32 = str2uint(K[1..4] and Mask32)
k64 = str2uint(K[5..12] and Mask64)
k128 = str2uint(K[13..28] and Mask128)
//
// If M no more than 2^11 bytes, hash under 32-bit prime.
// Otherwise, hash under increasingly long primes.
//
if (length(M) <= 2^11) then // 2^9 32-bit words
y = POLY(32, 2^32 - 6, k_32, M)
else if (length(M) <= 2^33) then // 2^31 32-bit words
M_1 = M[1 .. 2^11]
M_2 = M[2^11 + 1 .. length(M)]
M_2 = zeropad(M_2 || uint2str(0x80,1), 8)
y = POLY(32, 2^32 - 6, k_32, M_1)
y = POLY(64, 2^64 - 2^32, k64, uint2str(y, 8) || M_2)
else
Krovetz et al. Expires April 2001 [Page 27]
INTERNET-DRAFT UMAC October 2000
M_1 = M[1 .. 2^11]
M_2 = M[2^11 + 1 .. 2^33]
M_3 = M[2^33 + 1 .. length(M)]
M_3 = zeropad(M || uint2str(0x80,1), 16)
y = POLY(32, 2^32 - 6, k_32, M_1)
y = POLY(64, 2^64 - 2^32, k64, uint2str(y, 8) || M_2)
y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_3)
Y = uint2str(y, 16)
Return Y
6.4 L3-HASH-16: Third-layer hash
The L3-HASH-16 algorithm differs from L3-HASH-32 by hashing under a
19-bit prime modulus (instead of a 36-bit prime modulus) and then
returning a 2-byte result (instead of a 4-byte result).
6.4.1 Interface
Function Name:
L3-HASH-16
Input:
K1, string of length 32 bytes.
K2, a string of length 2 bytes.
M, a string of length 16 bytes.
Output:
Y, a string of length 2 bytes.
6.4.2 Algorithm
Compute Y using the following algorithm.
y = 0
//
// Break M and K1 into 8 chunks and convert to integers
//
for i = 1 to 8 do
M_i = M[(i - 1) * 2 + 1 .. i * 2]
K_i = K1[(i - 1) * 4 + 1 .. i * 4]
m_i = str2uint(M_i)
k_i = str2uint(K_i) mod prime(19)
//
Krovetz et al. Expires April 2001 [Page 28]
INTERNET-DRAFT UMAC October 2000
// Inner-product hash, extract last 32 bits and affine-translate
//
y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(19)
y = y mod 2^16
Y = uint2str(y, 2)
Y = Y xor K2
Return Y
6.5 UHASH-16: Three-layer universal hash
The algorithm UHASH-16 differs from UHASH-32 only in the size of its
keys generated, and in that it refers to the 16-bit variants of the
three-layer hash functions.
6.5.1 Interface
Function Name:
UHASH-16
Input:
K, string of length UMAC-KEY-LEN bytes.
M, string of length less than 2^64 bytes.
Output:
Y, string of length UMAC-OUTPUT-LEN bytes.
6.5.2 Algorithm
Compute Y using the following algorithm.
//
// Calculate iterations needed to make UMAC-OUTPUT-LEN bytes
//
streams = ceil(UMAC-OUTPUT-LEN / WORD-LEN)
//
// Define total key needed for all iterations using KDF.
// L1Key and L3Key1 both reuse most key between iterations.
//
L1Key = KDF(K, 0, L1-KEY-LEN + (streams - 1) * 16)
L2Key = KDF(K, 1, streams * 28)
L3Key1 = KDF(K, 2, streams * 32)
L3Key2 = KDF(K, 3, streams * 2)
//
// For each iteration, extract key and three-layer hash.
Krovetz et al. Expires April 2001 [Page 29]
INTERNET-DRAFT UMAC October 2000
// If length(M) <= L1-KEY-LEN, then skip L2-HASH.
//
Y = <empty string>
for i = 1 to streams do
L1Key_i = L1Key [(i-1) * 16 + 1 .. (i-1) * 16 + L1-KEY-LEN]
L2Key_i = L2Key [(i-1) * 28 + 1 .. i * 28]
L3Key1_i = L3Key1[(i-1) * 32 + 1 .. i * 32]
L3Key2_i = L3Key2[(i-1) * 2 + 1 .. i * 2]
A = L1-HASH-16(L1Key_i, M)
if (length(M) <= L1-KEY-LEN) then
B = zeroes(12) || A
else
B = L2-HASH-16(L2Key_i, A)
C = L3-HASH-16(L3Key1_i, L3Key2_i, B)
Y = Y || C
Y = Y[1 .. UMAC-OUTPUT-LEN]
Return Y
7 UMAC tag generation
Tag generation for UMAC proceeds as follows. Use UHASH to hash the
message and apply the PDF to the nonce to produce a string to xor
with the UHASH output. The resulting string is the authentication-
tag.
7.1 Interface
Function Name:
UMAC
Input:
K, string of length UMAC-KEY-LEN bytes.
M, string of length less than 2^64 bytes.
Nonce, string of length 1 to 16 bytes.
Output:
AuthTag, string of length UMAC-OUTPUT-LEN bytes.
7.2 Algorithm
Compute AuthTag using the following algorithm.
if (WORD-LEN == 2) then
HashedMessage = UHASH-16(K, M)
else
HashedMessage = UHASH-32(K, M)
Krovetz et al. Expires April 2001 [Page 30]
INTERNET-DRAFT UMAC October 2000
Pad = PDF(K, Nonce)
AuthTag = Pad xor HashedMessage
Return AuthTag
8 Security considerations
As a specification of a message authentication code, this entire
document is about security. Here we describe some security
considerations important for the proper understanding and use of
UMAC.
8.1 Resistance to cryptanalysis
The strength of UMAC depends on the strength of its underlying
cryptographic functions: the key-derivation function (KDF) and the
pad-derivation function (PDF). In this specification it is assumed
that both operations are implemented using the Advanced Encryption
Standard (AES). However, the full design and specification allow for
the replacement of these components. Indeed, it is straightforward
to use other block ciphers or other cryptographic objects, such as
SHA-1 or HMAC for the realization of the KDF or PDF.
The core of the UMAC design, the UHASH function, does not depend on
any "cryptographic assumptions": its strength is specified by a
purely mathematical property stated in terms of collision
probability, and this property is proven in an absolute sense. In
this way, the strength of UHASH is guaranteed regardless of future
advances in cryptanalysis.
The analysis of UMAC [3, 4] shows this scheme to have "provable
security", in the sense of modern cryptography, by way of tight
reductions. What this means is that an adversarial attack on UMAC
which forges with probability significantly exceeding the established
collision probability will give rise to an attack of comparable
complexity which breaks the AES, in the sense of distinguishing AES
from a family of random permutations. This design approach
essentially obviates the need for cryptanalysis on UMAC:
cryptanalytic efforts might as well focus on AES, the results imply.
8.2 Tag lengths and forging probability
A MAC algorithm is used between two parties that share a secret MAC
key, K. Messages transmitted between these parties are accompanied
by authentication tags computed using K and a given nonce. Breaking
Krovetz et al. Expires April 2001 [Page 31]
INTERNET-DRAFT UMAC October 2000
the MAC means that the attacker is able to generate, on its own, a
new message M (i.e. one not previously transmitted between the
legitimate parties) and to compute on M a correct authentication tag
under the key K. This is called a forgery. Note that if the
authentication tag is specified to be of length t then the attacker
can trivially break the MAC with probability 1/2^t. For this the
attacker can just generate any message of its choice and try a random
tag; obviously, the tag is correct with probability 1/2^t. By
repeated guesses the attacker can increase linearly its probability
of success.
UMAC is designed to make this guessing strategy the best possible
attack against UMAC as long as the attacker does not invest the
computational effort needed to break the underlying cipher, e.g. AES,
used to produce the one time pads used in UMAC computation. More
precisely, under the assumed strength of this cipher UMAC provides
for close-to-optimal security with regards to forgery probability as
represented in the next table.
--------------------------------------------------------------------
UHASH-OUTPUT-LEN Forging probability Approximate actual forging
(bytes) using a random tag probability in UMAC
(using a clever tag)
2 2^-16 2^-15
4 2^-32 2^-30
8 2^-64 2^-60
16 2^-128 2^-120
--------------------------------------------------------------------
Recall that the parameter UHASH-OUTPUT-LEN specifies the length of
the UMAC authentication tag. The above table states, for example,
for the case of an 8-byte tag that the ideal forging probability
would be 2^-64 while UMAC would withstand an actual forging
probability of 2^-60. Note that under this tag length (which is the
default length in UMAC) the probability of forging a message is well
under the chance that a randomly guessed DES key is correct. DES is
now widely seen as vulnerable, but the problem has never been that
some extraordinarily lucky attacker might, in a single guess, find
the right key. Instead, the problem is that large amounts of
computation can be thrown at the problem until, off-line, the
attacker finds the right key.
With UMAC, off-line computation aimed at exceeding the forging
probability is hopeless, regardless of tag length, as long as the
underlying cipher is not broken. The only way to forge is to
interact with the entity that verifies the MAC and to try a huge
amount of forgeries before one is likely to succeed. The system
Krovetz et al. Expires April 2001 [Page 32]
INTERNET-DRAFT UMAC October 2000
architecture will determine the extent to which this is possible. In
a well-architected system there should not be any high-bandwidth
capability for presenting forged MACs and determining if they are
valid. In particular, the number of authentication failures at the
verifying party should be limited. If a large number of such
attempts are detected the session key in use should be dropped and
the event be recorded in an audit log.
Let us reemphasize: a forging probability of 1 / 2^60 does not mean
that there is an attack that runs in 2^60 time - as long as AES
maintains its believed security there is no such attack for UMAC.
Instead, a 1 / 2^60 forging probability means that if an attacker
could try out 2^60 MACs, then the attacker would probably get one
right. But the architecture of any real system should render this
infeasible. One can certainly imagine an attacker having a high
bandwidth channel (e.g., 1 Gbit/second or more) over which it can
continually present attempted forgeries, the attacker being signaled
when a correct tag is found, but realizing such a scenario in a real
system would represent a major lapse in the security architecture.
It should be pointed out that once an attempted forgery is
successful, it is entirely possible that all subsequent messages
under this key may be forged, too. This is important to
understanding in gauging the severity of a successful forgery.
In conclusion, the default 64-bit tags seem appropriate for most
security architectures and applications. In cases where when the
consequences of an authentication failure are not extremely severe,
and when the system architecture is designed to conscientiously limit
the number of forgery attempts before a session is torn down, 32-bit
authentication tags may be adequate. For the paranoid, or if an
attacker is allowed a fantastic number of forgery tests, 96- or
128-bits may be utilized.
8.3 Selective-assurance authentication
We have already remarked about the flexibility built into UMAC to use
authentication tags of various lengths: shorter tags are faster to
compute and one needs to transmit fewer bits, but the forging
probability is higher. There is an additional degree of flexibility
built into the design of UMAC: even if the sender generates and
transmits a tag of 8 bytes, say, a receiver may elect to verify only
the first 4 bytes of the tag, and computing that 4-byte prefix by the
receiver will be substantially faster than computing what the full
8-byte tag would be. Indeed when WORD-LEN is 2 one can more quickly
check the 2-byte prefix of the tag than the 4-byte prefix of the tag,
one can more quickly check the 4-byte prefix of the tag than the
Krovetz et al. Expires April 2001 [Page 33]
INTERNET-DRAFT UMAC October 2000
6-byte prefix of the tag, and so forth. When WORD-LEN is 4 one can
more quickly check the 4-byte prefix of the tag than an entire 8-byte
tag, and so forth. This type of flexibility allows different parties
who receive a MAC (as in a multicast setting) to authenticate the
transmission to the extent deemed necessary and to the extent
consistent with any computational limits of the receiver.
In a scenario where receivers are allowed to verify short prefixes of
longer tags, it is even more important that conservative policies are
followed when a bad tag is presented to the receiver. Because short
prefixes are easier to forge than are long ones, an attacker may
attempt to forge short prefixes and then leverage information gained
from these attacks to forge longer tags. If the attacker can learn
which short tags are good and which are bad, the attacker may be able
to learn enough to allow longer forgeries.
One salient feature of the security-performance trade-off offered by
UMAC is its usability in contexts where performance is severely
constrained. In such cases, using a mild-security authentication tag
can be of significant value especially if the alternative would be
not to use authentication at all (a possible such scenario could be
the high-speed transmission of real-time multimedia streams).
Another potential scenario where short and fast-to-compute tags can
be useful is for fast detection of data forgery intended as a denial
of service attack. In this case, even a moderate increase in the
attacker's difficulty to produce forgeries may suffice to make the
attack worthless for the attacker. Moreover, being able to detect
just a portion of attempted forgeries may be enough to identify the
attack.
8.4 Nonce considerations
The function UMAC (Section 7) requires a nonce with length in the
range 1 to 16 bytes. All nonces in an authentication session must be
equal in length. For secure operation, no nonce value should be
repeated within the life of a single UMAC session-key.
To authenticate messages over a duplex channel (where two parties
send messages to each other), a different key could be used for each
direction. If the same key is used in both directions, then it is
crucial that all nonces be distinct. For example, one party can use
even nonces while the other party uses odd ones. The receiving party
must verify that the sender is using a nonce of the correct form.
This specification does not indicate how nonce values are created,
updated, or communicated between the entity producing a tag and the
entity verifying a tag. The following exemplify some of the
Krovetz et al. Expires April 2001 [Page 34]
INTERNET-DRAFT UMAC October 2000
possibilities:
1. The nonce is an eight-byte [resp., four-byte] unsigned number,
Counter, which is initialized to zero, which is incremented by
one following the generation of each authentication tag, and
which is always communicated along with the message and the
authentication tag. An error occurs at the sender if there is an
attempt to authenticate more than 2^64 [resp., 2^32] messages
within a session.
2. The nonce is a 16-byte unsigned number, Counter, which is
initialized to zero and which is incremented by one following the
generation of each authentication tag. The Counter is not
explicitly communicated between the sender and receiver.
Instead, the two are assumed to communicate over a reliable
transport, and each maintains its own counter so as to keep track
of what the current nonce value is.
3. The nonce is a 16-byte random value. (Because repetitions in a
random n-bit value are expected at around 2^(n/2) trials, the
number of messages to be communicated in a session using n-bit
nonces should not be allowed to approach 2^(n/2).)
We emphasize that the value of the nonce need not be kept secret.
When UMAC is used within a higher-level protocol there may already be
a field, such as a sequence number, which can be co-opted so as to
specify the nonce needed by UMAC. The application will then specify
how to construct the nonce from this already-existing field.
Note that if the nonce starts at zero and is incremented with each
message then an attacker can easily ascertain the number of messages
which have been sent during a session. If this is information which
one wishes to deny the attacker then one might have the sender
initialize the nonce to a random value, rather than to zero.
Inspecting the current nonce will no longer reveal to the attacker
the number of messages which have been sent during this session.
This is a computationally cheaper approach than enciphering the
nonce.
8.5 Guarding against replay attacks
A replay attack entails the attacker repeating a message, nonce, and
authentication tag. In systems, replay attacks may be quite
damaging, and many applications will want to guard against them. In
UMAC, this would normally be done at the receiver by having the
receiver check that no nonce value is used twice. On a reliable
Krovetz et al. Expires April 2001 [Page 35]
INTERNET-DRAFT UMAC October 2000
connection, when the nonce is a counter, this is trivial. On an
unreliable connection, when the nonce is a counter, one would
normally cache some "window" of recent nonces. Out-of-order message
delivery in excess of what the window allows will result in rejecting
otherwise valid authentication tags.
We emphasize that it is up to the receiver when a given message,
nonce and tag will be deemed authentic. Certainly the tag should be
valid for the message and nonce, as determined by UMAC, but the
message may still be deemed inauthentic because the nonce is detected
to be a replay.
9 Acknowledgments
Thanks are due to David Balenson and David Carman of NAI Labs, who
suggested the advantages of allowing a receiver to verify
authentication tags to various forgery probabilities. Thanks are
also due to David McGrew and Scott Fluhrer of Cisco Systems for
encouraging us to improve UMAC performance on short messages.
Phillip Rogaway, John Black, and Ted Krovetz were supported in this
work under Rogaway's NSF CAREER Award CCR-962540, and under MICRO
grants 97-150, 98-129, and 99-103 funded by RSA Data Security, Inc.,
and ORINCON Corporation. Much of Rogaway's work was carried out
during two sabbatical visits to Chiang Mai University. Special thanks
to Prof. Darunee Smawatakul for helping to arrange these visits.
10 References
[1] ANSI X9.9, "American National Standard for Financial
Institution Message Authentication (Wholesale)", American
Bankers Association, 1981. Revised 1986.
[2] M. Bellare, R. Canetti, and H. Krawczyk, "Keyed hash functions
and message authentication", Advances in Cryptology - CRYPTO
'96, LNCS vol. 1109, pp. 1-15. Full version available from
http://www.research.ibm.com/security/keyed-md5.html/
[3] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway,
"UMAC: Fast and provably secure message authentication",
Advances in Cryptology - CRYPTO '99, LNCS vol. 1666, pp.
216-233. Full version available from
http://www.cs.ucdavis.edu/~rogaway/umac
[4] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway,
"The UMAC message authentication code", work in progress, 2000.
Krovetz et al. Expires April 2001 [Page 36]
INTERNET-DRAFT UMAC October 2000
To be available from http://www.cs.ucdavis.edu/~rogaway/umac
[5] L. Carter and M. Wegman, "Universal classes of hash functions",
Journal of Computer and System Sciences, 18 (1979), pp.
143-154.
[6] O. Goldreich, S. Goldwasser and S. Micali, "How to construct
random functions", Journal of the ACM, 33, No. 4 (1986), pp.
210-217.
[7] S. Halevi and H. Krawczyk, "MMH: Software message
authentication in the Gbit/second rates", Fast Software
Encryption, LNCS Vol. 1267, Springer-Verlag, pp. 172-189, 1997.
[8] ISO/IEC 9797-1, "Information technology - Security techniques -
Data integrity mechanism using a cryptographic check function
employing a block cipher algorithm", International Organization
for Standardization, 1999.
[9] H. Krawczyk, M. Bellare, and R. Canetti, "HMAC: Keyed-hashing
for message authentication", RFC-2104, February 1997.
[10] T. Krovetz, and P. Rogaway, "Fast universal hashing with small
keys and no preprocessing", work in progress, 2000. To be
available from http://www.cs.ucdavis.edu/~rogaway/umac
[11] T. Krovetz, and P. Rogaway, "Variationally universal hashing",
work in progress, 2000. To be available from
http://www.cs.ucdavis.edu/~rogaway/umac
[12] M. Wegman and L. Carter, "New hash functions and their use in
authentication and set equality", Journal of Computer and
System Sciences, 22 (1981), pp. 265-279.
11 Author contact information
Authors' Addresses
John Black
Department of Computer Science
University of Nevada
Reno NV 89557
USA
EMail: jrb@cs.unr.edu
Shai Halevi
Krovetz et al. Expires April 2001 [Page 37]
INTERNET-DRAFT UMAC October 2000
IBM T.J. Watson Research Center
P.O. Box 704
Yorktown Heights NY 10598
USA
EMail: shaih@watson.ibm.com
Alejandro Hevia
Department of Computer Science & Engineering
University of California at San Diego
La Jolla CA 92093
USA
EMail: ahevia@cs.ucsd.edu
Hugo Krawczyk
Deprtment of Electrical Engineering
Technion
Haifa 32000
ISRAEL
EMail: hugo@ee.technion.ac.il
Ted Krovetz
Intel Corporation
1900 Prairie City Road
Folsom CA 95630
USA
EMail: tdk@acm.org
Phillip Rogaway
Department of Computer Science
University of California
Davis CA 95616
USA
EMail: rogaway@cs.ucdavis.edu
A Suggested application programming interface (API)
/* umac.h */
typedef struct UMAC_CTX *umac_ctx_t;
umac_ctx_t umac_alloc(char key[]);
/* Dynamically allocate UMAC_CTX struct */
Krovetz et al. Expires April 2001 [Page 38]
INTERNET-DRAFT UMAC October 2000
/* initialize variables and generate */
/* subkeys for default parameters. */
int umac_free(umac_ctx_t ctx);
/* Deallocate the context structure. */
int umac_set_params(umac_ctx_t ctx, void *params);
/* After default initialization, */
/* optionally set parameters to */
/* different values and reset for */
/* new message. */
int umac_update(umac_ctx_t ctx, char *input, long len);
/* Incorporate len bytes pointed to by */
/* input into context ctx. */
int umac_final(umac_ctx_t ctx, char tag[], char nonce[]);
/* Incorporate nonce value and return */
/* tag. Reset ctx for next message. */
int umac(umac_ctx_t ctx, char *input, long len,
char tag[], char nonce[]);
/* All-in-one (non-incremental) */
/* implementation of the functions */
/* umac_update() and umac_final(). */
Each routine returns zero if unsuccessful.
B Reference code and test vectors
See the UMAC World Wide Web homepage for reference code and test
vectors.
http://www.cs.ucdavis.edu/~rogaway/umac/
Krovetz et al. Expires April 2001 [Page 39]