Network Working Group Steven M. Bellovin
Internet Draft AT&T Labs Research
Expiration Date: June 2002 December 2001
Using Bloom Filters for Authenticated Yes/No Answers in the DNS
draft-bellovin-dnsext-bloomfilt-00.txt
1. Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet- Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
2. Abstract
Some aspects of DNSSEC, such as NXDOMAIN error messages, require an
authenticated answer. Producing this answer requires complex
mechanisms, online storage of the zone's secret key, expensive online
computations, or massive zone files. As an alternative, we propose
storage of authenticated pointers to Bloom filters. This scheme
provides large reductions in the size of, and computational expense
to produce, partially-signed zone files.
Bellovin FORMFEED[Page 1]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
3. Introduction
Some aspects of DNSSEC [RFC2535], such as NXDOMAIN error messages,
require an authenticated answer. Producing this answer requires
complex mechanisms, online storage of the zone's secret key,
expensive online computations, or massive zone files. The current
scheme relies on NXT records, which have a number of troublesome
properties. Among these are the space required to store and transmit
them; additionally, some people have objected that NXT records make
it possible to dump a zone by chaining through NXT records.
The expense of storing DNSSEC records for a zone as large as .COM has
led some people to suggest an "opt-in" process: only those parties
who wish to have a signed record will have one. This raises the
question of how to receive an authenticated message saying that a
given RRset is not supposed to be signed. Two mechanisms, NOKEY and
NXT records, have been proposed; both have their disadvantages.
Both problems could be solved if the DNS server were to digitally
sign its answers. But this is too expensive in CPU time (and exposes
the server to denial of service attacks), and requires that the
signing key be available online, a serious security risk for major
zones such as the root and .COM.
We suggest using Bloom filters [Bloom70] to store yes/no answers to
such questions. The filter can be precomputed and presigned, but
queries to it are quite efficient. The total amount of storage and
computation required appear to be significantly less than is needed
for today's solutions. Furthermore, it is not possible to recover
names from a filter, thus protecting privacy.
One caveat must be mentioned. The filter needed for a zone such as
.COM is far too large to ship around in DNS responses. Accordingly,
we propose using indirect references to such filters. While this
seems to be an inconvenience, in fact using indirection allows load-
shifting and load-balancing.
Bellovin FORMFEED[Page 2]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
4. Bloom Filters
A Bloom filter is a very efficient way to store information about the
existence of a record in a database. It is susceptible to false
positives; however, the probability of a false positive can be made
as small as desired.
A Bloom filter is an array of m bits, initialized to zero. It
requires a set of k hash functions that are independent and produce
uniformly distributed output in the range [0,m-1] on the possible
inputs.
To add an entry R to the filter, calculate
b[1] = H1(R)
b[2] = H2(R)
...
b[k] = Hk(R)
and set bits b[i] to 1 in the array.
To see if a record exists, calculate the same b[i] and check the bit
values. If all k bits are 1, the record exists; if even a single bit
is 0, the record does not exist.
In a database of any reasonable size, it is not possible to determine
the input records from the bit array. Many different records can set
any one bit; there is no way to tell which records actually did.
If there are n records stored in the Bloom filter, the probability of
a false positive is given by
P = (1 - (1 - 1/m)**(k*n) )**k
which may be approximated by
P = (1 - exp(-k*n/m))**k
From the equations, it is clear that it is the ratio of the number of
records to the bit array size is what is important, rather than the
absolute size of either. Further, there is an optimal range for k;
values that are too small or too large will produce too many false
positives. The exact set of parameters for any filter need to be
chosen with some care. Some possible values for use in the DNS are
given in Appendix A.
Bellovin FORMFEED[Page 3]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
5. A Bloom Filter Resource Record
As mentioned above, a Bloom filter for the DNS will be large: for the
.COM zone, at least 75 Mbytes. We clearly do not wish to transmit
such bit arrays on a routine basis! Accordingly, the filter RR
contains a URL pointing to the actual filter.
In order to query the filter, clients need to know k and m; they also
need to know what hash functions were used. We put these values in
the RR, with m in units of kilobytes. (Note: our initial value for
m will be around 600,000-800,000,000 bits. This is close enough to
what can be held in a 32-bit field that we prefer to buy our factor
of 8192 in advance.)
As is discussed below, it may be desirable to divide the filter into
chunks. It is therefore necessary to include the chunk size in the
RR as well.
Open issue: a public key (or certificate containing a public key) is
necessary to validate the filter. Where should this key be stored?
In a KEY record? In the RR? In some other record? It is not
necessary that this key be the same as the one used to sign the zone;
in one variant discussed below, it is better that the two not be the
same.
Since Bloom filters are specific to any given instance of a zone, the
SOA serial number is an essential part of the authentication process.
Accordingly, name servers that return a Bloom filter RR SHOULD also
return the relevant SOA record in the Additional Information section.
6. Using Bloom Filter RRs
A client who wishes to authenticate an NXDOMAIN response to a secure
query first obtains and authenticates a signed Bloom filter record.
It then calculates the b[i] values for the desired name, and checks
the bit positions in the Bloom filter. Finally, it authenticates the
content of the Bloom filter itself. We present two options for how
to perform these last two steps.
6.1. Using TLS for Filter Queries
To avoid the need to transmit a large bit array, one option is to
query a Bloom filter server via TLS [rfctls]. The client calculates
the bit positions, based on the domain name, k, and m, and then
queries the URL specified in the filter RR:
Bellovin FORMFEED[Page 4]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
https://bloomfilter.ns.example.com?324+3248+23980+89732+...
The server responds with a simple yes/no answer.
To avoid attacks, the client MUST check that the public key in the
server's TLS certificate matches that returned by the RR.
Unfortunately, this scheme requires expensive public key operations
by the server for each query. Furthermore, it requires that a
private signature key be online. Fortunately, there is no need to
make this key the same as the zone-signing key. CPU load concerns
can be ameliorated by replicating the server, using any standard
load-sharing technique. Again, note that in general the contents of
the bit array need not be kept confidential.
6.2. Retrieving Filter Chunks
To avoid the need for online, server-based cryptography, we present
an alternate scheme based on bit array downloads. For this version,
the server divides the bit array into chunks. Each chunk is
digitally signed (the signature is calculated over the bit array
chunk, the metadata in the filter RR, and the SOA serial number).
The chunk size, and hence the number of chunks, is determined by the
tradeoff between download size and the number of chunks that must be
signed by the server.
A client that connects to the specified URL will receive the chunks
that contain the requested bit positions. (Open issue: should the
URL contain bit position numbers or chunk numbers? I suspect that
the former is better, since it means that both options can be
implemented using the same query syntax.)
A client need not query all k values. It can trade off its own need
for certainty, up to the limits set by k, against download time.
This determination can be done dynamically, since the chunk size and
k are in the Bloom filter RR.
Since the content of any given chunk is static during the lifetime of
that zone instance, chunk URLs can be cached or distributed by any
content distribution network. To avoid confusion and cache coherency
issues at zone change time, we recommend that the SOA serial number
for the zone be included in the filename portion of the URL.
Bellovin FORMFEED[Page 5]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
7. Dealing with False Positives
As noted, false positives are possible with Bloom Filters. The
implications differ for different possible uses of the technology.
For authenticating NXDOMAIN requests, there is no ready recourse. A
false positive means that a name server has returned an error report
for a domain that the Bloom filter claims exists. This could be a
false positive, or it could be the exact form of attack that the
Bloom filter mechanism is intended to prevent. Palliative measures
include retrying the query from a client not believed to be under
attack, or waiting for a new instance of the zone file, and hence a
different bit array (see below).
The situation is brighter for opt-in secure zones. In this latter
case, the bit array represents zones for which signature records
should exist. A server building a bit array can check the remaining
names in the zone to see if there are any false positives. If so, a
NOKEY record can be generated for such names. This greater tolerance
of false positives permits selection of filter parameters that yield
smaller bit array sizes and/or fewer hash function calculations. Of
course, that must be traded off against the extra signed NOKEY
records.
8. Hash Function Families
The behavior of Bloom filters depends strongly on the quality of the
hash functions that are used. One good choice is to use SHA2-512
[SHA2], which produces 512 bits of uniformly distributed output.
This can be divided into 16 32-bit hash values, which in turn can be
reduced modulo m to represent the output of 16 hash functions. If
more hash functions are needed, a counter can be concatenated with
the the name:
SHA2("1" || name)
SHA2("2" || name)
...
If we set k to 8, we can use SHA2-256, which is significantly
cheaper.
To avoid persistent problems from false positives, it may be
desirable to change the hash function for each new zone instance.
This is most easily done by including the zone serial number in the
hash:
SHA2("1" || name || serial)
Bellovin FORMFEED[Page 6]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
SHA2("2" || name || serial)
...
Open issue: is this desirable? It leads to more unpredictability in
behavior from day to day. On the other hand, addition of any new
records to the zone can generate new false positives.
It is not clear that the cryptographic properties of SHA2 are helpful
here. There does not appear to be any advantage to an enemy who can
deliberately cause collisions. Accordingly, it might make sense to
investigate cheaper hash functions.
9. Performance Issues
To process a zone as suggested above, one or two invocations of
SHA2-512 per name are needed. Signing an RRset would require a
single invocation of a cheaper hash function (probably SHA1
[RFC3174]) per name, plus a very expensive digital signature
operation. Until the percentage of signed RRsets becomes quite high,
it is clear that Bloom filters are much cheaper.
Chunk size determination relies on the tradeoff between the number of
chunk signatures that must be computed versus the bandwidth needed to
retrieve chunks. In general, one should opt for more signatures and
smaller chunks; the signing operations happen once per zone, while
the chunks are retrieved frequently. A 1 KB chunk size is not
unreasonable, but that would require 75,000 signatures for a 75 MB
bit array.
Bit array size per se is not a major issue, unless many replicas of
the data are desired.
10. Dynamic Update
Bloom filters are not compatible with certain forms of dynamic
updates. However, the problem caused is bounded and often
manageable. Ultimately, the acceptability will depend on the rate of
dynamic updates and the number of chunks used.
Deleting records is easy: do nothing. A deleted record will still
appear in the bit array, but that manifests itself as a false
positive, a problem inherent to Bloom filters. A new filter should
be calculated and distributed when the number of deletions has raised
the false positive response probability to an unacceptable level. If
necessary, the initial selection of k and m can be adjusted to
compensate for some predicted rate of deletions.
Bellovin FORMFEED[Page 7]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
Additions are trickier. Conceptually, all that is necessary is to
calculate the new bit positions that need to be set to 1; however,
the existence of signed chunks complicates the matter. The exact
behavior will depend on the addition rate and upon the number of
chunks.
Without trying to calculate the exact probability distribution, it is
clear that the number of chunks changed per unit time is bounded by
the product of k and the number of update operations. As long as the
computer signing the chunks can keep up with that rate of operations,
there should not be a problem. And the network bandwidth required is
minimal; all that needs to be sent out is a set of new signatures,
plus the bit positions that must be turned on in each chunk. In
particular, it is not necessary to redistribute the entire bit array.
11. IANA Considerations
New families of hash functions may be defined and registered with
IANA. Registration may be done only by means of a standards-track
RFC.
12. Security Considerations
If the signatures specified here are checked, there do not appear to
be any correctness issues. However, the chunk retrieval protocol may
be abused to flood the server. Note that this server need not be co-
located with the zone servers; doing that would limit the effect of
such an attack.
As with other aspects of DNSSEC, replay attacks are possible. An
enemy could return a stale -- but signed -- Bloom filter RR in
response to a query. Similar attacks can be carried out against the
chunk retrieval protocol, but not against the TLS variant.
Bellovin FORMFEED[Page 8]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
13. References
[Bloom70] "Space/time trade-offs in hash coding with allowable
errors". Bloom, B. H. Communications of ACM 13, 7 (July 1970),
pp. 422-426.
[RFC2535] "Domain Name System Security Extensions". D. Eastlake.
March 1999.
[RFC3174] "US Secure Hash Algorithm 1 (SHA1)". D. Eastlake. September
2001.
[SHA2] "Secure Hash Standard", Draft Federal Information
Processing Standards Publication 180-2, May 2001.
14. Author Information
Steven M. Bellovin
AT&T Labs Research
Shannon Laboratory
180 Park Avenue
Florham Park, NJ 07932
Phone: +1 973-360-8656
email: bellovin@acm.org
Bellovin FORMFEED[Page 9]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
A. Appendix: Bloom Filter Parameters
Below is a table showing the false positive probability p for various
values of k and ratios of m/n. We set n -- the number of entries in
the zone -- to 25,000,000, the approximate size of .COM at this time.
While the actual choices dependon the acceptable false positive rate,
the choices of (8,32), (8,40), (16,32), and (16,40) seem to produce
favorable ratios of false positive rate to chunk size and bit array
size.
p k m/n
--------------------------
0.000000005065 24 40
0.000000005112 32 40
0.000000010765 40 40
0.000000019475 16 40
0.000000216758 24 32
0.000000330046 16 32
0.000000422277 32 32
0.000001165717 8 40
0.000001366603 40 32
0.000005731505 8 32
0.000009874368 16 24
0.000016565253 24 24
0.000041690856 8 24
0.000055936473 32 24
0.000230939749 40 24
0.000574496205 8 16
0.000649828308 16 16
0.002335383727 24 16
0.009530761107 32 16
0.025491730341 8 8
0.032516117391 40 16
0.097625617676 16 8
0.293563779663 24 8
0.553477428654 32 8
0.763051324818 40 8
The following (ugly) ASCII graph summarizes the calculations:
Bellovin FORMFEED[Page 10]
Internet Draft draft-bellovin-dnsext-bloomfilt-00.txt December 2001
1 ++--+----------+----------+---------+----------+------***************
+ + + + **************(1 - exp(-k/8))**k ****** +
0.1 ++ ************** (1 - exp(-k/16))**k ######++
+************** (1 - exp(-k/24))**k $$$$####
0.01 ** (1 - exp(-k/32))**k######%++
+ ###(1 - exp(-k/40))**k @@@@@@ +
+ ############# +
0.001 ########################## ++
+ $$$$$$
0.0001 ++ $$$$$$$$$$$$$ ++
$$$$$$ $$$$$$$$$$$$$$$ +
1e-05 ++ $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ ++
%%%% +
1e-06 @@ %%%%%% %%%%%%%
+@@@ %%%%%%%%%% %%%%%%%%%%%%%%%%% +
+ @@@@@ %%%%%%%%%%%%%%%%%%%%%%%%%% +
1e-07 ++ @@@@@ ++
+ @@@@@@@@ +
1e-08 ++ @@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@
+ + + + +@@@@@@@@@@@ + +
1e-09 ++--+----------+----------+---------+----------+---------+---------++
10 15 20 25 30 35 40
Bellovin FORMFEED[Page 11]