Network Working Group Rolf Blom, Ericsson
INTERNET-DRAFT Elisabetta Carrara, Ericsson
Expires: April 2001 Karl Norrman, Ericsson
Mats Naslund, Ericsson
Sweden
November 15, 2000
RTP Encryption for 3G Networks
<draft-blom-rtp-encrypt-00.txt>
Status of this memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/lid-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
This document describes a method for confidentiality protection
(encryption) of the payload in conversational multimedia applications
running over the Real-time Transport Protocol [RTP].
The proposal is based on the 3GPP (3rd Generation Partnership
Proposal) confidentiality algorithm "f8", and the new Advanced
Encryption Standard (AES). The proposed scheme satisfies all the
requirements put forward in [CMSec], such as being error-robust and
allowing for bandwidth-saving header compression. Most important, the
solution is based on a security mechanism that has undergone public
scrutiny, and is widely accepted to be secure.
Blom, Carrara, Norrman, Naslund [Page 1]
INTERNET-DRAFT RTP-encrypt November 15, 2000
TABLE OF CONTENTS
1. Introduction..................................................2
1.1. Conventions.................................................3
1.2. Notation....................................................3
2. Background....................................................4
2.1. 3GPP Confidentiality, the f8-algorithm......................4
2.1.1 f8-mode of operation.......................................4
2.1.2. Misty/Kasumi..............................................7
2.1.3. Performance...............................................7
2.2. AES: an alternative to Kasumi...............................7
3. Proposal for RTP Payload Encryption...........................8
3.1. F8-mode for RTP Payload Encryption..........................8
3.1.1. Parameter negotiation.....................................9
3.1.2. Cipher Initialization....................................10
3.1.3. IV Calculation...........................................10
3.1.4. Encryption at sending end................................11
3.1.5. Decryption at receiving end..............................11
3.1.6. Multicast................................................11
4. Key Management...............................................11
5. Security Considerations......................................13
5.1. Confidentiality of the RTP Payload.........................13
5.2. Confidentiality of the RTP Header..........................14
5.3. Message Integrity..........................................14
6. Implementation experience and simulation results.............16
7. Conclusions..................................................17
8. Intellectual Property Rights Statement.......................17
9. Acknowledgments..............................................17
10. Authors addresses...........................................18
11. References..................................................18
Appendix A. Example Test-vectors................................20
1. Introduction
As discussed in [CMSec], there are a number of requirements that
immediately arise when designing an encryption scheme for packet-
switched, real-time data sent on wireless (unreliable) media. The
requirements are:
- The encryption scheme should avoid error-propagation (error-
robustness)
- The mechanism must be fast, and must be implemented efficiently in
thin clients
- The encryption scheme has to show a "fast-forward/rewind" property
Blom, Carrara, Norrman, Naslund [Page 2]
INTERNET-DRAFT RTP-encrypt November 15, 2000
- The encryption scheme should not expand the message size
- To allow for header compression over the air link, e.g. [ROHC], it
is necessary to avoid end-to-end (e2e) encryption of RTP headers
(the security aspects of this are discussed below, in Section 5.)
Therefore, as noted in [CMSec], this leads to the choice of employing
e2e encryption only on the RTP payload, using either a block cipher
operating in a suitable feedback mode, or a pure stream cipher with a
random-access property into different locations of the keystream. We
propose a scheme of the former type, and motivate this as follows.
In terms of security, the components of the proposal have undergone a
fair amount of public scrutiny without the detection of any
weaknesses, see Section 5.
In a feedback type mode, the central ingredient is the block-cipher
used. As far as modern cryptology knows, the security basically
stands (and falls) with the security of the block cipher if
implemented wisely. This means that if a weakness is found, replacing
the block cipher with a new one will most likely remedy the security
problems.
Good stream ciphers available for public use are rare: there are far
more (presumed) secure block ciphers than stream ciphers in this
cathegory. In particular, the "random-access" property into the
keystream that we desire, disqualifies all but a few stream ciphers
such as the Software Encryption Algorithm [SEAL].
Finally, the encryption mechanism (mode of operation) we propose was
tailor-made for confidentiality protection in 3G networks.
In section 2, we describe a solution that is used in a similar
environment. In section 3, we apply a modified version of this
solution in the context of RTP payload encryption, and propose to use
the AES algorithm as the cryptographic core of this solution.
1.1. Conventions
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" that
appear in this document are to be interpreted as described in [RFC-
2119].
1.2. Notation
Except when otherwise noted, we use the same nomenclature as in
[CMSec].
Blom, Carrara, Norrman, Naslund [Page 3]
INTERNET-DRAFT RTP-encrypt November 15, 2000
The symbol || denotes concatenation of two binary strings, and XOR
denotes bitwise addition modulo 2.
The prefix 0x is used to denote hexadecimal numbers. Data and
variables are presented with their most significant bytes (or bits)
to the left, and bit (and byte) indices are 0,1,2,.. counting left to
right. In general, X[i] denotes the ith bit of X, i.e. X = X[0] ||
X[1] || X[2] ...
2. Background
2.1. 3GPP Confidentiality, the f8-algorithm
To encrypt UMTS data, the 3GPP has developed a solution (see [ES3D])
known as the f8-algorithm. On a high level, the proposed scheme is a
variant of Output Feedback Mode (OFB), see [HAC], with a more
elaborate initialization and feedback function. As in normal OFB, the
core consists of a block cipher. 3GPP specifies this block cipher to
be Kasumi, see [ES3D], which is a derivative of Matsui's Misty
algorithm, [MAT]. Kasumi may, in principle, be replaced by any secure
block cipher (possibly after adjusting some parameters).
In the next sections, we describe f8, Kasumi, its precursor Misty and
the recently selected AES.
2.1.1. f8-Mode of Operation
Figure 1 shows the structure of an arbitrary b-bit block size cipher,
E, running in what we shall call "f8-mode of operation". (In the 3GPP
specifiaction, E is the 64-bit block cipher "Kasumi", see below).
Blom, Carrara, Norrman, Naslund [Page 4]
INTERNET-DRAFT RTP-encrypt November 15, 2000
IV
|
|
\|/
+------+
| |
--->| E |
| | |
| +------+
| |
m --> * |--------------------------- ... -------|
_____ | IV' | | | |
| | ct=1 --> * ct=2 --> * ... ct=L-1 --> *
| | | | |
| | --> * --> * ... --> *
| \|/ | \|/ | \|/ | \|/
| +------+ | +------+ | +------+ | +------+
| | | | | | | | | | | |
k -------->| E | | | E | | | E | | | E |
| | | | | | | | | | |
+------+ | +------+ | +------+ | +------+
| | | | | | |
|------ |-------- | ... ---- |
| | | |
\|/ \|/ \|/ \|/
S(0) S(1) S(2) . . . S(L-1)
Figure 1. In the figure, asterisk, *, denotes bitwise XOR.
Let E(k,B) be the b-bit output of E when applied to the l-bit key k
and plaintext block B. Let ct, IV, IV', and S() denote b-bit
quantities, and m be an l-bit string (determined below). The
encryption of an n-bit plaintext P = P[0] || P[1] || .. || P[n-1], is
then performed as follows.
Set IV' = E(k XOR m, IV), and ct = S(-1) = 00..0. For j = 0,1,.., L-1
where L = n/b(rounded up to nearest integer) compute
S(j) = E(k,IV' XOR ct XOR S(j-1)), (Eq. 1)
ct = ct + 1 mod 2^b (Eq. 2)
Blom, Carrara, Norrman, Naslund [Page 5]
INTERNET-DRAFT RTP-encrypt November 15, 2000
Let S = S(0) || S(1) || .. || S(L-1), the concatenation of successive
outputs from E. Then, the ciphertext, C, is determined from the
plaintext P and this S by
C[i] = S[i] XOR P[i], i = 0,1,..,n-1 (Eq. 3)
i.e. the normal bitwise XOR.
Thus, in the figure, k is the key, and IV is the initialization
vector. Notice that the IV is not used directly. Instead it is fed
through E under another key to produce an internal, "salted" value
(denoted IV'). The motivation for this is explained below. Thus, the
input supplied to each consecutive application of E (besides the key
k) consists of the salted IV', XORed with the b-bit block-counter,
ct, and the previously generated keystream block (if available). The
value m is the fixed mask 0x555...(repeated to give as many bits as
the key-length). The IV is by 3GPP determined as
IV = COUNT || BEARER || DIR || 00..0
(padded with as many zeros as needed to fill the block length of E).
COUNT is a 32-bit value derived from a layer 1 frame counter and a
"hyper-frame" counter (to avoid reuse of key stream). BEARER is a 4-
bit bearer identifier, and DIR, finally, is a single bit,
distinguishing the up-link (terminal to radio base station) from the
down-link (base station to terminal).
If the supplied key k is shorter than what is prescribed by E, the
key bit-pattern is repeated in the least significant bits to the
desired length.
It is clear that the f8-mode of operation has the same properties as
normal OFB mode as far as error propagation and synchronization is
concerned. From a security point of view, the modified feedback has
been included to improve security. Basically, the first application
of E to the supplied IV is done to reduce effects of the fact that
the IV may be known or perhaps even controlled by an adversary. In
fact, even if this is the case, without knowledge of the key, it is
hard to predict the internal IV', that is then the initialization
vector actually used. Thus, this so called "whitening" or "salting"
makes it hard for an adversary to obtain known input-output pairs to
E, and makes analysis more difficult. The incorporation of the block
counter is there to prevent reuse of keystream (the sequence will not
have a short period). (It is assumed by 3GPP that no data of length >
5114 bits are to encrypted, and hence, in practice the block counter
will never wrap modulo 2^64. In fact, ct may then be implemented as a
7-bit counter.)
We next discuss Kasumi and its precursor, Misty.
Blom, Carrara, Norrman, Naslund [Page 6]
INTERNET-DRAFT RTP-encrypt November 15, 2000
2.1.2. Misty/Kasumi
Misty is a block cipher having block length 64 bits and a 128-bit
key. Thus, in OFB-type modes, 192 bits are needed to specify the key
and the IV. The design criteria of Misty were speed and, most
importantly, provable security against the most common cryptanalytic
attacks on block ciphers: linear and differential attacks. See [HAC]
for description of these attacks. There are currently no known
feasible attacks against Misty.
The differences between Misty and Kasumi are mainly that
modifications were made to improve hardware gate count and
performance, see [ES3E]. No real security weaknesses have been found,
arising from these modifications, again see [ES3E].
2.1.3. Performance
In [ES3D] it is claimed that the f8-design meets the following
hardware requirements:
- implementable in hardware in less than 10000 gates (an actual
figure of 3000 gates for Kasumi is mentioned)
- at least 2Mb/s encryption speed at 20MHz clock rate
- (re-)initialization within a 10ms time frame
An important thing to note is that most mobile terminals will have
the Kasumi based f8-algorithm implemented in hardware, offering very
good performance. It should be noted, however, that if we use f8 also
for application layer encryption, then we may be competing with the
UMTS air-link for the hardware resources. In particular, a fast
context switch is needed.
We give some actual performance figures in Section 6.
2.2. AES: an alternative to Kasumi
NIST has recently selected the new Advanced Encryption Standard, AES.
This is a block cipher known as "Rijndael" which has a block size of
128 bits, and allows for 128, 192, and 256 bit keys. For further
technical details, see [NIST,Ri]. We see AES/Rijndael as an excellent
alternative to Kasumi for the cryptographic core of the f8-mode of
operation.
- During the last three years, it has undergone extensive cryptologic
analysis, without revealing weaknesses
- It is a very fast algorithm (see Section 6)
Blom, Carrara, Norrman, Naslund [Page 7]
INTERNET-DRAFT RTP-encrypt November 15, 2000
- Its larger block and key sizes have advantages (discussed later)
3. Proposal for RTP Payload Encryption
With the above discussion in mind, we see an f8-type scheme as the
best solution for confidentiality protection of conversational
multimedia carried by the RTP protocol.
1) There is no error propagation. Concerning the error rate of the
channel, the only assumption is that packets that do arrive at the
application layer have no errors in their headers. The error control
is typically taken care of by the header compression scheme, [ROHC],
and possibly UDP Lite [UDPL]. (Thus, the negligible residual bit
error rate over the air link is even further decreased by the header
compression.)
2)There is no message expansion since there is no padding. Hence,
bandwidth is conserved.
3)As the mechanism is used on the application layer, additional IP
level security association numbers and sequence numbers are avoided.
4)Synchronization is easy to achieve via the use of the IV, giving
the desired random-access property.
5)It is flexible, as it can be based on any secure block cipher with
varying key and block sizes. In particular the AES, Rijndael [Ri] is
a good choice. Though slower than a designated stream cipher, speeds
of several Mb/s are possible both in hardware and software based on
benchmarks done on popular block ciphers, see e.g. [BOS,LIP] and
Section 6.
6)It is highly probable that hardware support for Rijndael will exist
in terminals.
Next, we discuss the adaptation of the f8-mode of operation for use
with RTP.
3.1. F8-Mode for RTP Payload Encryption
Assumptions:
1. We assume that a key exchange/parameter negotiation protocol
exists (see also Section 4).
2. For simplicity, we assume the communication to be unicast.
(Extension to multicast is mainly a key management problem and is
discussed below.)
Blom, Carrara, Norrman, Naslund [Page 8]
INTERNET-DRAFT RTP-encrypt November 15, 2000
3. The transport layer uses UDP. Some modifications to the
initialization steps may be needed if some other protocol is used,
and we discuss them later.
4. We assume that the application has knowledge of the 16-bit
port_number, on which the RTP packets are received. Typically, port
numbers are negotiated via a signalling protocol, such as SIP. We
furthermore assume that the combination (port_number, SSRC), where
SSRC is the 32-bit RTP Synchronization Source, is unique for each
media flow belonging to the same multimedia session between the two
communicating parties. This must be assured by the signalling
protocol during set-up, otherwise security is compromised (see
discussion below). Observe that the "key" may then be the same for
all media types/flows belonging to the same multimedia session
between the two parties, since the distinction responsibility lies on
the (port_number, SSRC)-pair.
In case the transport layer does not use UDP, a 16-bit quantity
serving as "port_number" and having the uniqueness properties
mentioned above must be agreed upon during initial set-up.
The encryption shall be done using the following steps.
3.1.1. Parameter negotiation
Using the key-exchange protocol, the following parameters shall be
agreed:
cipher: the algorithm E. The AES algorithm must be supported, other
algorithms, with block size at least 64, may be supported.
block_size: almost always determined uniquely by "cipher". It is 64
in the case of Kasumi, and usually 128 for AES (though Rijndael
supports larger block sizes) and is therefore normally implicit.
Note: as this parameter varies, different numbers of output keystream
bits will be generated per application of "cipher", depending on what
algorithm is used. This means that a 128-bit block algorithm will be
about twice as fast in encrypting a fixed amount of data as a 64-bit
algorithm. Notice that since AES must be supported, its definition
implies that block_size 128 must be supported in connection with AES.
Note that the internal block counter (this is "ct" in Figure 1)
should be implemented as a counter of block_size bits. However, if it
is known that the application never generates RTP packets of
(payload) size exceeding 2^t times the block size (bits), ct may be
implemented as a t-bit counter and then extended with leading zeros
to fill the block_size before use. In almost all cases, t = 32 or
even 16 will suffice.
Blom, Carrara, Norrman, Naslund [Page 9]
INTERNET-DRAFT RTP-encrypt November 15, 2000
key_size: the length of the key. The default shall be 128.
key: a (pseudo)randomly chosen binary string of key_size bits to be
used as the key. Important: we stress that the key must be random or
pseudo-random, otherwise security is compromised. For details on
pseudo-random generators and guidelines for key generation, see
[HAC,FIPS140].
If the key size is not in the set of key lengths supported by the
cipher, the key shall be appended to itself as many times as neeeded
to obtain, possibly after a final truncation to the most significant
bits, a length which is the set of supported key sizes (this may be
due to export restrictions). For example, if the minimum key size is
16 bits, the key 0x123 (of length 12) is first turned into 0x123123
(of length 24), and then truncated to 0x1231 which has the desired
length.
3.1.2. Cipher Initialization
Given the key and the IV, initialization is performed as in Figure 1,
computing the internal IV'. If the cipher allows, pre-computing the
cipher's key-schedules (one with the key, and one with key XOR
0x555...) may offer performance enhancement on the actual encryptions
that is follow. Thus, it remains to specify how the IV is formed.
3.1.3. IV Calculation
For each RTP packet to be transmitted, the following encryption shall
be performed. The payload of each RTP packet (i.e. starting
immediately after its RTP header) shall be encrypted by taking
information from the RTP header of said RTP packet, and the 16-bit
receiving port number (as negotiated inside the signalling protocol),
thereby forming the IV:
IV = port_number || SSRC || SEQ || TS || M || PT || 00...0
The SEQ (Sequence Number, 16 bits), SSRC (Synchronization Source, 32
bits), TS (Timestamp, 32 bits), PT (Payload Type, 7 bits), and M
(Marker Bit, 1 bit) fields are as specified by [RTP]. If the
block_size is between 64 and 104, the IV shall be truncated after the
corresponding number of bits in the generic IV (for example, if the
block_size is 64, then IV = port_number || SSRC || SEQ). If the
block_size is greater than 104, as many zeros as needed to fill the
block_size shall be appended.
By this choice of IV and the assumptions on the (port_number,SSRC)-
pairs made above, the flow from A -> B will automatically be
encrypted by a keystream distinct from the flow in the opposite
direction, B -> A. Some care is thus needed in avoiding reuse of the
Blom, Carrara, Norrman, Naslund [Page 10]
INTERNET-DRAFT RTP-encrypt November 15, 2000
same IV inside a multimedia session, in case the latter shares the
same key between its RTP sessions. See Sections 4 and 5 for further
considerations.
This IV shall then be entered as the IV to the f8-scheme and the
internal IV' is computed as described earlier (c.f. Figure 1).
3.1.4. Encryption at sending end
The RTP-payload only (not the RTP-header) shall then be encrypted in
accordance with Eq. 1, 2, and 3 above, i.e. be bitwise XORed with the
output keystream of the f8-scheme. That is, if the payload length is
n, n/"block_size" (rounded up) applications of "cipher" is needed to
produce sufficient keystream data. If the packet length is not an
integer multiple of block_size, any extra keystream bits generated
shall be ignored.
3.1.5. Decryption at receiving end
As the XOR operation is an involution, decryption shall be performed
in exactly the same way by the receiver.
3.1.6. Multicast
Observe that in a multicast scenario, each packet belonging to a
media flow will be sent to the same port_number for all intended
receivers. Thus, as long as the receiving parties share a key with
the sender, the above method (IV formation) may be used.
This concludes the formal description of the proposed scheme. Example
test vectors of the AES based implementation can be found in Appendix
A.
4. Key Management
Though beyond the scope of this draft, we make the following
observations that must be taken into account when finding a suitable
key-exchange protocol.
First note that as discussed, the IV forming allows us (if desired)
to use only one key for the bi-directional flows, and for all media
types in the overall multimedia session. Again, we stress that this
is due to the fact that we assume that the (port_number/SSRC)-pairs
are chosen uniquely during initialization, see above. If not, some
media flow may be encrypted by the same keystream as another, and
security is compromised. Distinct keys for different flows may of
Blom, Carrara, Norrman, Naslund [Page 11]
INTERNET-DRAFT RTP-encrypt November 15, 2000
course also be used, though efficiency will then be reduced due to
frequent context switches.
Key refresh: If a feedback type construction is used (as we propose
to do), one must be aware that generating a very long keystream from
the same IV/key pair, will eventually lead into "degeneration" of the
keystream, making it possible to distinguish it from a random string.
Whence, security might be compromised. The maximum allowable length
(for a good block cipher) depends only on the block_size, b.
Specifically, such bad behaviour will start to occur after about
2^(b/2) iterative applications of the cipher. With b = 128 (for AES)
or even b = 64 (Kasumi), this will in practice never happen, as
typical packet sizes for, say conversational audio are typically 32
bytes only. (The 3GPP f8-solution was designed with a maximum data
length of 5000 bits in mind and thus, a large security margin
exists.) If a shorter keystream is generated, there is still a slight
chance of having collisions in the keystream. However, the
probability of this occurring while encrypting a normal size packet
is (with b = 64) on the order 2^(-52) (see [ES3E]).
In [ES3E], it is also advised that the same key should not be used
together with more than about 50 million IVs. It is noted that this
corresponds to about 40 hours of data at 2Mbit/s. However, in our
case, since the sequence number (SEQ) is only 16 bits in length,
after that many IVs (packets) the keystream will start to repeat
itself. Therefore, if only the sequence number is used (as is the
case for a 64-bit block size), the key must be refreshed at least for
every 2^16 packets. Some practical implications of this can be
examined. Consider for instance an audio stream with 50 packets/sec.
The SEQ field will then wrap modulo 2^16 once every 20 minutes or so.
If a larger block size is used, then also the 32-bit timestamp (TS)
is included in the IV, thereby extending the "effective" sequence
numbers to a (theoretical) maximum of 48-bits. This maximum may not
always be achieved however. For instance, for an audio application
with 20 ms samples, the RTP timestamp typically increments by 160 for
each increment in SEQ. The combination SEQ || TS will then have the
form (j mod 2^16) || (160j mod 2^32), j = 0,1,.., and hence have
period 2^27 (the least common multiple of 2^16 and
2^32/gcd(2^32,160)) rather than 2^48. Still, this is very large
(about twice the recommended maximum of 50 million), and the need to
refresh the key due to exhaustion of the IV-space should be rare. For
other applications, e.g. video, the relation between SEQ and TS may
look different.
In summary, assisted by the key management functionality, the
application must keep track of when a key-refresh is needed, either
due to the fact that the RTP Sequence Number (possibly in combination
with the RTP Timestamp) wraps/re-cycles, or because so many IVs (e.g.
> 50 million) have been used with the same key that security may
start to be endangered. Obviously, the mechanism must ensure that
synchronization of the key refresh is obtained between
Blom, Carrara, Norrman, Naslund [Page 12]
INTERNET-DRAFT RTP-encrypt November 15, 2000
sender/receiver. It seems to be a reasonable solution to have the key
exchange protocol exchange a "master secret", from which consecutive
keys can be derived pseudo-randomly with the master secret as a seed,
similar to what is done in [TLS, WTLS].
5. Security Considerations
For general security overview of f8 (and Kasumi), we refer to the
analysis in [ES3E] and for the AES algorithm, see [Ri,AES]. (If
another block cipher is used, the security of that algorithm will of
course be an issue.)
However, there are some specific issues that arise in this
application that we discuss in the next paragraphs.
In a real-time scenario, key management must not be overloading, both
for time and bandwidth reasons. To use a fixed key for all the RTP
sessions which belong to the same multimedia session, special
attention has to be placed on possible collisions in the way the IV
is formed (see also above). The requirement is not to end up with two
identical IVs, for the same key. The RTP standard defines some
recommendations on how to use SSRC (unique inside a single RTP
session, also accompanied by an anti-collision algorithm, and port
numbers are recommended for demultiplexing). However, without special
collision detection, there may be unlucky (port_number,SSRC)-
combinations. We therefore believe it is up to the implementation not
to end up in such a situation, and as noted above, we recommend a
careful choice of the receiving ports (being negotiated inside the
call control protocol, e.g. SIP). Again, the required property is
that port numbers should be chosen to avoid the same combination of
SSRC/port number inside RTP sessions which belong to the same
multimedia session.
Notice that the uniqueness requirement on the (port_number,SSRC)-
combination allows for a theoretical maximum of 2^48 paralell flows,
belonging to the same multimedia session.
Security may, of course, also be obtained by asserting that distinct
keys are used for the different streams (and their two directions)
belonging to the same RTP session if one is willing to perform more
frequent context switches.
5.1. Confidentiality of the RTP Payload
It is important to be aware that, as with any stream cipher, the
exact length of the payload is revealed by the encryption. This means
that it may be possible to deduce certain "formatting bits" of the
payload, as the length of the CODEC output might vary due to certain
parameter settings etc. This, in turn, implies that the corresponding
Blom, Carrara, Norrman, Naslund [Page 13]
INTERNET-DRAFT RTP-encrypt November 15, 2000
bit of the keystream can be deduced. However, if the stream cipher is
secure, knowledge of a few bits of the keystream will not aid an
attacker in predicting the following keystream bits. Thus, the
payload length (and information deducible from this) will leak, but
nothing else.
5.2. Confidentiality of the RTP Header
With our proposal, RTP headers are sent in the clear to allow for
header compression. This means that data such as payload type,
synchronization source identifier, and timestamp are available to an
eavesdropper. Moreover, since RTP allows for future extensions of
headers, we cannot foresee what kind of possibly sensitive
information might also be "leaked".
Our proposal is a low-cost method, which allows header compression to
reduce bandwidth. It is up to the endpoints policies to decide about
the security scheme to employ. If the header compression is omitted,
other solutions might be applicable, e.g. [IPsec]. In other words, we
provide a solution that works in the most demanding scenario:
conversational multimedia over low-bandwidth, unreliable media. Of
course the solution will then also work in less restricted
environments, but we suggest that if one really needs to protect
headers, and is allowed to do so by the surrounding environment, then
he should also look at alternatives. In addition, we strongly
recommend the use of profiles to select the right trade-off for the
required level of security.
5.3. Message Integrity
The purpose of this draft is only to treat confidentiality, not
integrity. However, we realize the importance of this cryptographic
primitive, and a few things can be said.
Adding a "full strength" message authentication code (MAC) to each
packet is unacceptable for bandwidth reasons. For instance, a 160-bit
HMAC [HAC] field will in principle double the data size for a typical
audio packet. If size is reduced to bandwidth-acceptable values by
truncating the MAC, perhaps to one or two bytes, a very low security-
level is obtained.
Moreover, due to possible errors induced by the transmission medium,
an integrity check will put down packets that only have minor errors
inflicted "non-adversarialy" by the channel, and will reduce the
quality of the received signal.
If bandwidth considerations allow it, adding a standard integrity
mechanism is of course the right thing to do, but again, if we are in
Blom, Carrara, Norrman, Naslund [Page 14]
INTERNET-DRAFT RTP-encrypt November 15, 2000
this more favourable situation, one can perhaps use another, more
complete security mechanism, e.g. [IPsec].
Recall that we are aiming at providing security for conversational
multimedia. What kind of "useful" attacks can be mounted to violate
data integrity in such cases?
First, notice that some amount of integrity for the RTP headers is
obtained for the parts that enter into the IV-formation (SSRC, SEQ,
TS, PT, and M). Namely, should any of these be modified during
transmisson, the decryption will only produce "random garbage",
reducing an attack to the integrity of these fields to pure denial-
of-service (DoS) aspects.
If an attacker records a session for a later replay-attack,
possibilities of obtaining something useful seem remote. First of
all, we may assume that the replay takes place at a later time within
the same session. (Otherwise, with overwhelming probability, the new
session will be running with a new distinct key and the receiver
would then decrypt with the, for the attacker, wrong key and only
"garbage" data is produced). But if the replay occurs within the same
session, that would imply that the attacker needs to modify the
header sequence numbers and timestamps. As these fields enter the
encryption algorithm, an uncorrelated keystream will then be used by
the receiver, and garbage will again be produced. We conclude that
attempting to do a replay attack, and any attempt in altering
important parts of the RTP header will mainly have DoS effects.
In addition, due to the real-time aspects, there will be a natural
window mechanism implemented by the application, making alterations
of synchronization data (e.g. timestamps) to have mainly DoS aspects.
If a window mechanism and/or buffering for received packets is used,
a perhaps more serious attack (also of DoS type), would be to modify
the sequence number and timestamp fields, causing the receiver to
move his window too far away. The worst case seems to be if this
attack is launched at the beginning of the session.
Another attack could be to try to modify bits of the encrypted
payload in real-time. Due to the absence of error propagation in the
decryption process, it is indeed possible for an attacker to "flip"
individual bits for the receiver. However, he does not know the
outcome of the flip. Minor alterations (one or two bits) will
probably still allow audio/video decoders to reconstruct data at an
acceptable quality level for the receiver. However, we cannot exclude
the fact that flipping bits will be of some use to an attacker, for
instance if those bits have some "simple" meaning to the CODEC. We
are aiming at a general solution for RTP data, and one therefore has
to study the format of each possible payload type to understand
exactly what can be gained by an attacker in each case. Moreover, we
cannot foresee possible future formats, not yet defined.
Blom, Carrara, Norrman, Naslund [Page 15]
INTERNET-DRAFT RTP-encrypt November 15, 2000
Again, in most conversational multimedia applications we foresee, we
feel that it will be hard to do harmful attacks against the
integrity, since data is presented in real-time to the users, and
since there is also real-time interaction between the communicating
parties. One could say that there is, to some extent, an automatic
and "manual" sanity-check done by the users as the data has direct
meaning and can be interpreted by them.
In summary, integrity protection may be needed, but it is to a large
extent up to the application to decide this, taking bandwidth costs
and quality into the budget. The only application-independent
observation we can make at this point is that an attacker
manipulating the data will be able to launch DoS attacks. However,
adding a MAC field does not prevent this, it merely provides a
detection method before data enters the CODEC.
6. Implementation experience and simulation results
We have implemented the f8-mode of operation using the AES as the
underlying block cipher with 128 bit block size and 128 bit key size.
Both algorithms were implemented in C (using the Microsoft Visual C++
Compiler).
The implementation of AES was made compact, that is, we only used
tables for the S-box, its inverse and the round constants, for a
total of 552 bytes. Everything else was calculated in real-time. With
this approach we were able to encrypt at approximatly 22 Mbit/s on a
Pentium II with a clock frequency of 266 Mhz.
When measuring the encryption-speed of the f8-algorithm we used
simulated RTP-packets with a typical 96-bit header (which was not
encrypted) and a 32, 64 and 128 octet random payload (i.e 256, 512
and 1024 bits). With this scenario we need one block encryption to
set the internal IV, and two, four and eight block encryptions
respectively, to produce the keystream for each packet. Heuristicly,
this implies that the f8-algorithm should be able to encrypt
somewhere around 16.5 Mbit/s for 32-byte packets. The actual
throughput will be a bit less because of some overhead. Since
encryption and decryption are the same operation, the same figures
hold for decryption.
We encrypted 10^6 packets of each size and took the average
encryption times, which are presented in the table below.
Blom, Carrara, Norrman, Naslund [Page 16]
INTERNET-DRAFT RTP-encrypt November 15, 2000
Payload Size (bits) Encrypt/Decrypt (Mbit/s)
256 13.6
512 15.9
1024 17.9
Table 1: Performance of AES-based f8.
Calculation of the internal IV (copying of some data from the packet
header and one block encryption) takes around 6 microseconds.
Key-refresh and initialization of the keystream generator (we view
initialization as a key-refresh from a non-existent key) takes
approximatly 12 microseconds.
For comparison, 20Mb/s at 100MHz is reported for Misty in [MIT]. This
is about 3 times faster than typical (optimized) 3DES
implementations.
Estimates for hardware performance for AES can be obtained from
[AES,Ri]. Similar estimates for Kasumi and f8 can be found in [ES3E].
7. Conclusions
We have made a proposal for a secure, fast, and flexible encryption
scheme that has the necessary robustness properties in terms of
fault-tolerance needed in conversational multimedia applications over
low-bandwidth, unreliable media.
8. Intellectual Property Rights Statement
Pursuant to the provisions of [RFC-2026], the authors represent that
they have disclosed the existence of any proprietary or intellectual
property rights in the contribution that are reasonably and
personally known to the authors. The authors do not represent that
they personally know of all potentially pertinent proprietary and
intellectual property rights owned or claimed by the organizations
they represent or third parties.
9. Acknowledgments
We thank Krister Svanbro and Vicknesan Ayadurai for instructing us on
the inner workings of header compression, and Andras Mehes and Jari
Arkko for helpful comments on this draft. We are grateful to Morgan
Blom, Carrara, Norrman, Naslund [Page 17]
INTERNET-DRAFT RTP-encrypt November 15, 2000
Lindqvist, Johan Sjoberg, Torbjorn Einarsson, and Magnus Westerlund
for sorting out various RTP questions.
10. Authors' addresses
Rolf Blom Tel: +46 8 58531707
Ericsson Research
Stockholm, Sweden EMail: rolf.blom@era.ericsson.se
Elisabetta Carrara Tel: +46 8 50877040
Ericsson Research
Stockholm, Sweden EMail: elisabetta.carrara@era.ericsson.se
Karl Norrman Tel: +46 8 58531225
Ericsson Research
Stockholm, Sweden EMail: karl.norrman@era.ericsson.se
Mats Naslund Tel: +46 8 58533739
Ericsson Research
Stockholm, Sweden EMail: mats.naslund@era.ericsson.se
11. References
[AES] NIST, "Advanced Encryption Standard (AES)",
http://csrc.nist.gov/encryption/aes/
[BOS] Bosselaers, A., "Fast Implementations on the Pentium",
http://www.esat.kuleuven.ac.be/~bosselae/fast.html
[ES3D] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
Algorithms Group of Experts (SAGE); General Report on the Design,
Specification and Evaluation of 3GPP Standard Confidentiality and
Integrity Algorithms", Public report, Draft Version 1.0, Dec 1999.
[ES3E] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
Algorithms Group of Experts (SAGE) Report on the Evaluation of 3GPP
Standard Confidentiality and Integrity Algorithms", Public report,
Draft Version 1.0, Dec 1999.
[FIPS140] NIST, "Security Requirements for Cryptographic Modules",
FIPS PUB 140-1
[HAC] Menezes, A., Van Oorschot, P., and Vanstone, S., "Handbook of
Applied Cryptography", CRC Press, 1997, ISBN 0-8493-8523-7.
Blom, Carrara, Norrman, Naslund [Page 18]
INTERNET-DRAFT RTP-encrypt November 15, 2000
[IPsec] McGrew, D., Fluhrer, S., Peyravian, M., "The Stream Cipher
Encapsulating Security Payload", Internet Draft, July 2000
[LIP] Lipmaa, H., "AES Ciphers: speed",
http://www.tml.hut.fi/~helger/aes/
[MAT] Matsui, M., "New Block Encryption Algorthm MISTY". In Eli
Biham (Ed.): Fast Software Encryption, 4th International Workshop,
FSE '97. Proceedings. Lecture Notes in Computer Science, Vol. 1267,
Springer-Verlag 1997, pp. 54-68.
[MIT] Mistubishi Electric, "MISTY",
http://www.mitsubishi.com/ghp_japan/misty/
[RFC-2026] Bradner, S., "The Internet Standards Process -- Revision
3", RFC2026, October 1996.
[RFC-2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC-2119, March 1997.
[Ri] Daemen, J. and Rijmen, V.: "AES Proposal: Rijndael", available
at http://www.esat.kuleuven.ac.be/~rijmen/rijndael
[ROHC] Burmeister, C., Clanton, C., Degermark, M., Fukushima, H.,
Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T., Le, K., Liu, Z.,
Martensson, A., Miyazaki, A., Svanbro, K., Wiebke, T., Zheng, H.,
"RObust Header Compression (ROHC)", Internet Draft, October 2000
[RTP] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, V.:
"RTP: a Transport Protocol for Real-Time Applications", RFC 1889,
Jan. 1996.
[CMSec] Blom, R., Carrara, E., and Naslund, M.: "conversational
Multimedia Security in 3G Networks", IETF Draft, November 2000.
[SEAL] Rogaway, P. and Coppersmith, D.: "A Software-Optimized
Encryption Algorithm", Journal of Cryptology, vol 11(4), 1998, 273-
287.
[TLS] Dierks, T., Allen, C., "The TLS Protocol", RFC 1998, Internet
Draft, November 12
[WTLS] Wireless Application Forum: "WAP WTLS, Wireless Application
Protocol Wireless Transport Layer Security Specification", Version
18-Feb-2000.
Blom, Carrara, Norrman, Naslund [Page 19]
INTERNET-DRAFT RTP-encrypt November 15, 2000
Appendix A. Example Test-vectors
Here is an example of the intermediate values during an encryption
using the AES algorithm in f8-mode.
The data encrypted is a single RTP-packet with a 256-bit (pseudo-
randomly generated) payload. All values are in hex. Refer to Figure 1
for notation.
key:
000102030405060708090a0b0c0d0e0f
key XORed with 555... :
55545756515053525d5c5f5e59585b5a
Rijndael-internal expanded key:
00010203 04050607 08090a0b 0c0d0e0f
d6aa74fd d2af72fa daa678f1 d6ab76fe
b692cf0b 643dbdf1 be9bc500 6830b3fe
b6ff744e d2c2c9bf 6c590cbf 0469bf41
47f7f7bc 95353e03 f96c32bc fd058dfd
3caaa3e8 a99f9deb 50f3af57 adf622aa
5e390f7d f7a69296 a7553dc1 0aa31f6b
14f9701a e35fe28c 440adf4d 4ea9c026
47438735 a41c65b9 e016baf4 aebf7ad2
549932d1 f0855768 1093ed9c be2c974e
13111d7f e3944a17 f307a78b 4d2b30c5
Rijndael-internal expanded value of (key XOR 555_):
55545756 51505352 5d5c5f5e 59585b5a
3e6de99d 6f3dbacf 3261e591 6b39becb
2ec3f6e2 41fe4c2d 739fa9bc 18a61777
0e33034f 4fcd4f62 3c52e6de 24f4f1a9
b992d079 f65f9f1b ca0d79c5 eef9886c
30568051 c6091f4a 0c04668f e2fdeee3
447e91c9 82778e83 8e73e80c 6c8e06ef
1d114e99 9f66c01a 11152816 7d9b2ef9
8920d766 1646177c 07533f6a 7ac81193
7aa20bbc 6ce41cc0 6bb723aa 117f3239
9e81193e f26505fe 99d22654 88ad146d
RTP-packet header fields:
version = 2
padding = 0
extension = 0
CSRC count = 0
marker bit = 0
payload type = 0
sequence no. = 3e7a
Blom, Carrara, Norrman, Naslund [Page 20]
INTERNET-DRAFT RTP-encrypt November 15, 2000
timestamp = 7e5d40a7
SSRC = 7b777a8f
Receiver port number:
abcd
IV:
abcd7b777a8f3e7a7e5d40a7003e0000
IV':
7f25578863921e41120b09ebfdd43f1c
Encryption of bits 0 to 127 of the payload
ct: 0
S(-1) : 00000000000000000000000000000000
S(-1) XOR IV' : 7f25578863921e41120b09ebfdd43f1c
plain text P[0..127] : 9979b51c83ac87a3330a6178cc3b1aa6
final keystream S(0) : b57d2b3337b281f04645bed7082af95d
cipher text C[0..127] : 2c049e2fb41e0653754fdfafc411e3fb
Encryption of bits 128 to 255 of the payload
ct: 1
S(0) : b57d2b3337b281f04645bed7082af95c
S(0) XOR IV' : ca587cbb54209fb1544eb73cf5fec640
plain text P[128..255] : 79cddb405384255385f11619ad46e86f
final keystream S(1) : e7d836679304e5c7f0881b067e3d682c
cipher text C[128..255]: 9e15ed27c080c09475790d1fd37b8043
This Internet-Draft expires in April 2001.
Blom, Carrara, Norrman, Naslund [Page 21]