C-DNS: A DNS Packet Capture Format
draft-ietf-dnsop-dns-capture-format-10

Summary: Has enough positions to pass.

Warren Kumari Yes

Ignas Bagdonas No Objection

Deborah Brungard No Objection

Ben Campbell No Objection

Comment (2018-11-20 for -08)
I support Benjamin's DISCUSS point about privacy considerations.

§2: Is there a reason not to use the boilerplate from RFC 8174? There are a number of lower case instances of normative keywords. These are ambiguous under the RFC 2119 boilerplate.

Alissa Cooper No Objection

Comment (2018-11-20 for -08)
I support Benjamin's first DISCUSS point. In addition to documenting the privacy considerations, I think it's important for this document to be crystal clear about who is meant to be doing the data collection -- namely, the server operator. There are some statements in the document that otherwise could be construed to be encouraging third-party passive monitoring of DNS traffic without explaining why, which seems like a problem:

Section 1:

"There has long been a need to collect DNS queries and responses on
   authoritative and recursive name servers for monitoring and analysis."

Section 3:

"In an ideal world, it would be optimal to collect full packet
   captures of all packets going in or out of a name server."

Spencer Dawkins No Objection

Benjamin Kaduk (was Discuss) No Objection

Comment (2018-12-02 for -09)
Thank you for addressing (or being in the process of addressing) my Discuss points!
Adding IANA registries for the bitmaps will address that concern, so I am proactively
clearing my position on the basis that those registry creations are in the works.

[Original COMMENT section preserved below]

Section 2

Please consider using the RFC 8174 version of the BCP 14 boilerplate.

Section 3

   Because of these considerations, a major factor in the design of the
   format is minimal storage size of the capture files.

maybe "storage and transmission"?

Section 6

In Figure 2, the Query name is marked as "(q)" (only present if there is a
query), but the running text in Section 4 (bullet 1) says that the Question
section from the response can be used as an identifying QNAME if there is a
response with no corresponding query.  Am I misexpanding QNAME here, or is
there a disagreement between these two parts of the text?  In particular, I
do not see a part of Figure 2 that would correspond to a Question section
in the response, given the various "(q)"/"(r)" markings.

Section 6.2.2

   Messages with OPCODES known to the recording application but not
   listed in the Storage Parameters are discarded (regardless of whether
   they are malformed or not).

(Do we need to say anything that the "discarded" is only w.r.t. the capture
process, and not meant to imply that DNS queries would not get a normal
response?)

Section 6.2.4

Please consider using IPv6 examples, per
https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .

Section 7.2

   o  The column T gives the CBOR data type of the item.

      *  U - Unsigned integer

      *  I - Signed integer

This is venturing a bit far from my normal area of expertise, but my
understanding is that CBOR native major types are only provided for
unsigned integer and negative integer, with "signed integer" being an
abstraction at a slightly higher layer that needs to be managed in the
application.  Do we need to add any clarifying text here or will the
meaning be clear to the reader?

Section 7.4

Should probably forward-reference section 8 for the format version numbers'
semantics.

Section 7.4.1.1

We should we reference the IANA registries by name for any of these fields
(e.g., opcodes, rr-types, etc.).  (Also in Section 7.5.3.1, etc.)

Are the storage flags going to be allocated in sequence by updating
standards-track documents, or some other mechanism?  (Is a registry
necessary?)

For the various address prefix fields, do we need to specify that the full
addresses are stored when the corresponding prefix field is absent?

Section 7.4.1.1.1

Am I parsing the "query-response-hints" text correctly to say that a bit is
set in the bitmap if the corresponding field is recorded (if present) by
the collecting implementation?  The causality of "if the field is omitted
the bit is unset" goes in a direction that is not what I expected.
(Similarly for the other fields in this table.)

Section 7.4.2

Do we need a reference for "promiscuous mode"?

Just to check: in "server-addresses", I just infer the IP version from the
length of the byte string?

Do we need to say more about where the vlan-ids identifiers are taken from?

Is the "generator-id" string intended to only be human readable?  Only
within a specific (administrative) context?

Section 7.5.1

Does "earliest-time" include leap seconds?

Section 7.5.3

The "ip-address" description seems to imply that very short ipv6 prefix
lengths could cause confusion as to the address type being indicated (e.g.,
setting to 32 when no ipv4 prefix length is set, or setting to the same
value as the ipv4 prefix length).  Do we need to restrict the ipv6 prefix
lengths to being 33 or larger?

Are the "name-rdata" contents in wire format or presentation format?

Section 7.5.3.2

What's the allocation policy/procedure for the remaining
qr-transport-flags transport values?  For additional bits in any/all of the
flags fields listed here?

Something of a side note, what's the mnemonic for the "sig" in
"qr-sig-flags"?  That is, what is it a signature of or over (it doesn't
seem like it's a cryptographic signature, which may be what is confusing
me)?

For "query-rcode"/"response-rcode", should there be a reference for "OPT",
and/or for any of the EDNS stuff in here?  (The Terminology section only
mentions using the naming from RFC 1035, that I can see.)

The "mm-transport-flags" here bear a striking resemblance to the
"qr-transport-flags" from Section 7.5.3.2; should there be a shared
registry for their contents?  (I guess the TransportFlags CDDL to some
extent serves this function.)

Section 7.7

How is the value of the "ae-code" determined?

Appendix A

We could perhaps apply some constraints on (e.g.) the address-prefex length
fields to be .le the relevant lengths.

Appendix C.6

                                           Using a strong compression,
   block sizes over 10,000 query/response pairs would seem to offer
   limited improvements.

nit: Using a strong compression scheme

Suresh Krishnan No Objection

Comment (2018-11-21 for -08)
* Section 7.4.1.1. 

Looks like you can limit the {client,server}-address-prefix-{ipv4,ipv6} fields to one byte to restrict the range. e.g.

client-address-prefix-ipv6 => uint .size 1

Similar restrictions can be used for port (2) and TTL/hop limit (1) fields.

Alexey Melnikov (was Discuss) No Objection

Comment (2018-11-30 for -09)
Thank you for addressing my DISCUSS and comments.

Eric Rescorla No Objection

Comment (2018-11-20 for -08)
Rich version of this review at:
https://mozphab-ietf.devsvcdev.mozaws.net/D4670


I support Ben's DISCUSS.

Am I understanding the design correctly in that you can't have
literals in the individual per-query values, but instead references to
the block tables, so you can't stream inside a block?

IMPORTANT
S 7.5.1.
>      | earliest-time    | O | A | A timestamp (2 unsigned integers,      |
>      |                  |   |   | "Timestamp") for the earliest record   |
>      |                  |   |   | in the "Block" item. The first integer |
>      |                  |   |   | is the number of seconds since the     |
>      |                  |   |   | Posix epoch ("time_t"). The second     |
>      |                  |   |   | integer is the number of ticks since   |

I don't know what a "tick" is, so you need some definitionf or this.

COMMENTS
S 7.2.
>   
>      o  The column O marks whether items in a map are optional.
>   
>         *  O - Optional.  The item may be omitted.
>   
>         *  M - Mandatory.  The item must be present.

FWIW, I think you might be happier with a mandatory "X" and optional
nothing, but that's up to you.


S 7.4.1.1.1.
>   
>      +------------------+---+---+----------------------------------------+
>      | Field            | O | T | Description                            |
>      +------------------+---+---+----------------------------------------+
>      | query-response   | M | U | Hints indicating which "QueryResponse" |
>      | -hints           |   |   | fields are omitted, see section        |

Nit: generally, you would say "indicating which fields are included"
if 1 means included.


S 7.5.3.
>      +---------------------+---+---+-------------------------------------+
>   
>   7.5.3.  "BlockTables"
>   
>      Arrays containing data referenced by individual "QueryResponse" or
>      "MalformedMessage" items in this "Block".  Each element is an array

This is not a sentence.


S 7.5.3.
>      | qrr               | O | A | Array of type "Question". Each entry  |
>      |                   |   |   | is the contents of a single question, |
>      |                   |   |   | where a question is the second or     |
>      |                   |   |   | subsequent question in a query. See   |
>      |                   |   |   | Section 7.5.3.3.                      |
>      |                   |   |   |                                       |

So if I understand this correctly:

QRR is a map of integers to question
QLIST is a map of integers to question lists, with each question list
consisting of a set of indexes int o QRR?



S 7.5.3.2.
>   
>      +--------------------+---+---+--------------------------------------+
>      | Field              | O | T | Description                          |
>      +--------------------+---+---+--------------------------------------+
>      | server-address     | O | U | The index in the item in the "ip-    |
>      | -index             |   |   | address" array of the server IP      |

So am I understanding correctly that you can't put the server address
literally in here. It has to be in the block tables?


S 7.5.3.2.
>      +--------------------+---+---+--------------------------------------+
>      | server-address     | O | U | The index in the item in the "ip-    |
>      | -index             |   |   | address" array of the server IP      |
>      |                    |   |   | address. See Section 7.5.3.          |
>      |                    |   |   |                                      |
>      | server-port        | O | U | The server port.                     |

Isn't the server port generally constant? It seems like you might save
some bits by having this attached to the server and then indixed
abvoe.



S 7.5.3.2.
>      |                    |   |   | used to service the query.           |
>      |                    |   |   | Bit 0. IP version. 0 if IPv4, 1 if   |
>      |                    |   |   | IPv6                                 |
>      |                    |   |   | Bit 1-4. Transport. 4 bit unsigned   |
>      |                    |   |   | value where 0 = UDP, 1 = TCP, 2 =    |
>      |                    |   |   | TLS, 3 = DTLS. Values 4-15 are       |

You might want to specify DoH


S 7.5.3.5.
>      |                    |   |   | Bit 0. IP version. 0 if IPv4, 1 if   |
>      |                    |   |   | IPv6                                 |
>      |                    |   |   | Bit 1-4. Transport. 4 bit unsigned   |
>      |                    |   |   | value where 0 = UDP, 1 = TCP, 2 =    |
>      |                    |   |   | TLS, 3 = DTLS. Values 4-15 are       |
>      |                    |   |   | reserved for future use.             |

Again, probably want to specify DoH.


S 17.3.
>      [18] https://github.com/dns-stats/draft-dns-capture-
>           format/blob/master/file-size-versus-block-size.svg
>   
>   Appendix A.  CDDL
>   
>      This appendix gives a CDDL [I-D.ietf-cbor-cddl] specification for

Is this a normative appendix?

Alvaro Retana No Objection

Comment (2018-11-19 for -08)
I support Benjamin's DISCUSS.

Adam Roach No Objection

Comment (2018-11-20 for -08)

I support Benjamin's DISCUSS regarding a treatment of the privacy issues related
to this capture format.

---------------------------------------------------------------------------

id-nits reports:

  ** There are 11 instances of too long lines in the document, the longest
     one being 9 characters in excess of 72.

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses.  Maybe
     there should be IPv6 examples, too?

(See https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ for more information
about the second issue)

---------------------------------------------------------------------------

§5:

>  o  CBOR is an IETF standard and familiar to IETF participants.  It is

While CBOR is standards-track, it's nowhere near standard yet. Suggest:
"...is an IETF specification..." (See BCP 9)

---------------------------------------------------------------------------

§9.1:

>  DNS style name compression is used on the individual names within the

Nit: "DNS-style"

---------------------------------------------------------------------------

Appendix A:

>   file-type-id  : tstr .regexp "C-DNS",

I'm far from a CDDL expert, but I just read through that specification, and it
seems to me that this is a bit overwrought. I think you can accomplish the same
with the much simpler production:

    file-type-id  : "C-DNS",

Similarly:

>   major-format-version => uint .eq 1,
>   minor-format-version => uint .eq 0,

would seem to mean the same as the simpler:

>   major-format-version => 1,
>   minor-format-version => 0,

---------------------------------------------------------------------------

Appendix B:

>  The next name added is bar.com.  This is matched against example.com.

bar.com is allocated to a private individual who has already had to contend with
a lot of unwanted traffic (see https://www.bar.com/ for details). We should
consider not making things worse for them. Please use an RFC 2606 address
instead.

Martin Vigoureux No Objection

Comment (2018-11-21 for -08)
few nits:

s/types are are omitted/types are omitted/
s/type is are omitted/type is omitted/
s/common collection and and storage parameters/common collection and storage parameters/

Terry Manderson Recuse

Comment (2018-11-20 for -08)
I'm a co-author and financial sponsor of this work - recusing.

Mirja Kühlewind No Record