Network Time Protocol Best Current Practices
RFC 8633

Note: This ballot was opened for revision 09 and is now closed.

(Eric Rescorla) Discuss

Discuss (2018-12-19 for -10)
Rich version of this review at:
https://mozphab-ietf.devsvcdev.mozaws.net/D3844

I notice that a number of the recommendations here differ from those
in NDSS16. In particular the following recommendations from that paper
do not seem to appear:

- Do not put INIT in the reference ID on restart
- Do not send KoD
- Disable fragmentation
- Randomize source ports

I'm not saying that all of these recommendations need to be in this
document, but I am curious why they are not and would tend to think
that one should document why they are not.



DETAIL
S 2.1.
>      response to a small query, which makes it more attractive as a vector
>      for distributed denial-of-service attacks.  (NTP Control messages are
>      discussed further in Section 3.4).  One documented instance of such
>      an attack can be found here [1], and further discussion in [IMC14]
>      and [NDSS14].  Mitigating source address spoofing attacks should be a
>      priority of anyone administering NTP.

What does this text mean? What practices can I engage in as an NTP
server that mitigate source spoofing attacks? The next paragraph seems
to largely talk about traffic *sources*. Is the assumption that the
NTP server is supposed to do BCP 38 filtering? That seems not that
effective.

As a smaller point, I see that this text says "should", not SHOULD. Is
that a mistake or is this intended not to have any normative force?
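For reference, BCP 38 ingress filtering is applied by the operator of the network where a spoofing host sits, at that network's edge, rather than by the NTP server being abused. A minimal sketch of the per-packet check follows. It assumes the edge router knows which prefixes are assigned to the customer link; the function name and the prefixes (documentation ranges) are hypothetical.

```python
import ipaddress
from typing import Iterable


def ingress_permits(src: str, customer_prefixes: Iterable[str]) -> bool:
    """BCP 38-style check: accept a packet from a customer link only if its
    source address lies inside one of the prefixes assigned to that link."""
    addr = ipaddress.ip_address(src)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-family prefix lists are handled correctly.
    return any(addr in ipaddress.ip_network(p) for p in customer_prefixes)


# Hypothetical prefixes assigned to one customer link.
prefixes = ["198.51.100.0/24", "2001:db8:100::/48"]

print(ingress_permits("198.51.100.7", prefixes))  # True
print(ingress_permits("203.0.113.9", prefixes))   # False: spoofed source
```

This only stops hosts inside the filtering network from forging addresses; it does nothing for an NTP server receiving spoofed traffic from networks that do not filter, which is the limitation raised above.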


S 3.2.
>      [RFC5905].  It is RECOMMENDED that that NTP users select an
>      implementation that is actively maintained.  Users should keep up to
>      date on any known attacks on their selected implementation, and
>      deploy updates containing security fixes as soon as practical.
>   
>   3.2.  Use enough time sources

I note that you don't seem to be recommending that people use Chronos
(http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/02/ndss2018_02A-2_Deutsch_paper.pdf),
which, as I understand it, is compatible with existing NTP servers but
far more resistant to spoofing. Is there a reason why? Assuming that
there is a good reason, that seems like it should be covered here.


S 3.2.
>      See Section 3.7.1 for more information.
>   
>      Operators SHOULD monitor all of the time sources that are in use.  If
>      time sources do not generally agree, find out the cause and either
>      correct the problems or stop using defective servers.  See
>      Section 3.5 for more information.

It's not really possible to evaluate this advice without any
description of the threat model, which is pretty sketchily covered
below. In particular, if an attacker controls my network, then it's
basically like having one NTP server, no matter how many people I am
talking to, and even an inaccurate but secure server (e.g., tlsdate)
is superior.



S 11.3.
>      [10] https://support.ntp.org/bin/view/Support/ConfiguringNTP
>   
>   Appendix A.  NTP Implementation by the Network Time Foundation
>   
>      The Network Time Foundation (NTF) provides the reference
>      implementation of NTP, well-known under the name "ntpd".  It is

What makes this the reference implementation? Generally, the IETF does
not bless specific implementations as reference implementations unless
they themselves appear in the RFC (as with Opus).

Comment (2018-12-19 for -10)
S 3.1.
>      implementations, on many different platforms.  The practices in this
>      document are meant to apply generally to any implementation of
>      [RFC5905].  It is RECOMMENDED that that NTP users select an
>      implementation that is actively maintained.  Users should keep up to
>      date on any known attacks on their selected implementation, and
>      deploy updates containing security fixes as soon as practical.

This text is kind of hard to follow. It seems like it is making two
entirely separate points:

1. It is important to have accurate time.
2. It is important to have an up-to-date implementation of NTP.

I agree with both these claims, but they don't seem that closely
connected. It's true that an out-of-date version of NTP might lead to
inaccurate time, but it might also lead to (for instance) arbitrary
code execution on the client. For this reason, I would suggest that it
would be wise to separate these two paragraphs.


S 3.2.
>   
>      o  If there are 2 sources of time and they agree well enough, then
>         the best time can be calculated easily.  But if one source fails,
>         then the solution degrades to the single-source solution outlined
>         above.  And if the two sources don't agree, then it's impossible
>         to know which one is correct by simply looking at the time.

This isn't strictly true. Consider the case where I have an iPhone and
the onboard clock reads 2018-12-19 and the NTP server reads 2001. I
know the NTP server is wrong because iPhones didn't exist in 2001.
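To make the quoted argument concrete, the sketch below uses a deliberately simplified majority vote. It is not the selection algorithm of RFC 5905, which intersects the correctness intervals of candidate sources in the style of Marzullo's algorithm. A source counts as agreeing with another if their offsets differ by at most a tolerance; `majority_offset` and the tolerance value are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def majority_offset(offsets: Sequence[float], tolerance: float) -> Optional[float]:
    """Return a combined clock offset (seconds) if a strict majority agree.

    Two offsets "agree" if they differ by at most `tolerance` seconds.
    Returns None when no strict majority exists.
    """
    n = len(offsets)
    for candidate in offsets:
        # Collect every source that agrees with this candidate (including itself).
        agreeing = [o for o in offsets if abs(o - candidate) <= tolerance]
        if len(agreeing) * 2 > n:
            return median(agreeing)
    return None


# One source: accepted, but there is nothing to cross-check it against.
print(majority_offset([0.5], tolerance=0.1))               # 0.5
# Two sources that disagree: no way to tell which one is right.
print(majority_offset([0.5, 5.0], tolerance=0.1))          # None
# Three sources, one wrong: the two that agree outvote it.
print(majority_offset([0.5, 0.5625, 5.0], tolerance=0.1))  # 0.53125
```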


S 3.4.
>      optionally authenticated control of NTP and its configuration.  Used
>      properly, these facilities provide vital debugging and performance
>      information and control.  Used improperly, these facilities can be an
>      abuse vector.  For this reason, it is RECOMMENDED that publicly-
>      facing NTP servers should block mode 6 queries from outside their
>      organization.

Why are these facilities an abuse vector?
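For context, the quoted blocking rule can be enforced by any filter that can read the first byte of the NTP header, whose low three bits carry the mode; the usual concern is the large-response-to-small-query property quoted under S 2.1 above. A minimal sketch, assuming the filter sees the UDP payload and a list of the organization's internal prefixes (function names and addresses are hypothetical):

```python
import ipaddress
from typing import Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

CONTROL_MODE = 6  # NTP mode 6: control messages, as used by ntpq


def ntp_mode(packet: bytes) -> int:
    """The low three bits of the first NTP header byte carry the mode."""
    return packet[0] & 0x07


def should_drop(packet: bytes, src: str, internal_nets: Sequence[Network]) -> bool:
    """Drop mode 6 packets whose source lies outside the organization."""
    if not packet:
        return True  # empty datagram: nothing to process
    if ntp_mode(packet) != CONTROL_MODE:
        return False  # ordinary time traffic is not affected by this rule
    addr = ipaddress.ip_address(src)
    return not any(addr in net for net in internal_nets)


internal = [ipaddress.ip_network("10.0.0.0/8")]
mode6_query = bytes([0x16]) + bytes(11)  # LI=0, VN=2, mode=6, 12-byte header
mode3_query = bytes([0x23]) + bytes(47)  # LI=0, VN=4, mode=3 client request

print(should_drop(mode6_query, "203.0.113.5", internal))  # True
print(should_drop(mode6_query, "10.1.2.3", internal))     # False
print(should_drop(mode3_query, "203.0.113.5", internal))  # False
```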


S 3.5.
>   
>      If a system starts getting unexpected time replies from its time
>      servers, that can be an indication that the IP address of the system
>      is being forged in requests to its time server.  The goal of this
>      attack is to convince the time server to stop serving time to the
>      system whose address is being forged.

Why would this work? Is there some sort of rate limit on the server?
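One likely mechanism, assumed here rather than stated in the quoted text: servers often rate-limit each source address and stop answering (or send KoD) once the limit is exceeded. Because the server cannot tell forged requests from real ones, an attacker who floods it with requests carrying the victim's address uses up the victim's allowance. A minimal sketch with a hypothetical per-address token bucket (class name and parameters are illustrative, not taken from any NTP implementation):

```python
from typing import Dict, Tuple


class PerSourceRateLimiter:
    """Hypothetical per-source-address token bucket, as a server might keep."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        # source address -> (tokens remaining, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, src_addr: str, now: float) -> bool:
        tokens, last = self.buckets.get(src_addr, (self.capacity, now))
        # Refill according to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[src_addr] = (tokens, now)
        return allowed


limiter = PerSourceRateLimiter(capacity=4, refill_per_sec=1 / 64)
victim = "192.0.2.10"  # documentation address

# The attacker sends a burst of requests with the victim's address forged.
for i in range(10):
    limiter.allow(victim, now=i * 0.1)

# The victim's own legitimate poll shortly afterwards is now refused.
print(limiter.allow(victim, now=2.0))  # False
```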


S 3.5.
>      attack is to convince the time server to stop serving time to the
>      system whose address is being forged.
>   
>      If a system is a broadcast client and its system log shows that it is
>      receiving early time messages from its server, that is an indication
>      that somebody may be forging packets from a broadcast server.

You need to provide citations for broadcast client and broadcast
server, even if they are just to some section of the NTP spec.


S 3.5.
>      receiving early time messages from its server, that is an indication
>      that somebody may be forging packets from a broadcast server.
>   
>      If a server's system log shows messages that indicates it is
>      receiving timestamps that are earlier than the current system time,
>      then either the system clock is unusually fast or somebody is trying

Why do you say "unusually fast"? My understanding is that it's
actually quite common to be seconds off.


S 4.1.
>      periodically.  However, NTP does not provide a mechanism to assist in
>      doing so.
>   
>      [RFC5905] specifies a hash which must be supported for calculation of
>      the MAC, but other algorithms may be supported as well.  The MD5 hash
>      is now considered to be too weak.  Implementations will soon be

This comment about MD5 kind of comes out of nowhere. Some context for
why I would think I should use MD5 would help.


S 4.1.
>   
>      [RFC5905] specifies a hash which must be supported for calculation of
>      the MAC, but other algorithms may be supported as well.  The MD5 hash
>      is now considered to be too weak.  Implementations will soon be
>      available based on AES-128-CMAC [I-D.ietf-ntp-mac], and users are
>      encouraged to use that when it is available.

Do you want to use 8174 language here? Also, I-D.ietf-ntp-mac has
already been approved, so it seems like given the long latency between
here and the RFC, we should write this in the present tense rather
than the future tense.



S 4.1.
>      inclusive, and a label which indicates the chosen digest algorithm.
>      Each communication partner adds this information to its own key file.
>   
>      Some implementations store the key in clear text.  Therefore it
>      SHOULD only be readable by the NTP process.  Different keys are added
>      line by line to the key file.

Does *every* implementation have a key file like this? I'm not sure
what the point of this sentence is.


S 5.2.
>      o  Configure the ntp client to only ignore the panic threshold in a
>         cold start situation.
>   
>      o  Add 'minsane' and 'minclock' parameters to the ntp.conf file so
>         ntpd waits until enough trusted sources of time agree on the
>         correct time.

This seems pretty implementation specific.


S 5.4.
>      when asked to do so by a server.  It is even more important for an
>      embedded device, which may not have an exposed control interface for
>      NTP.
>   
>      That said, a client MUST only accept a KoD packet if it has a valid
>      origin timestamp.  Once a RATE packet is accepted, the client should

What's a RATE packet? It's not defined here or cited.
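For context (RFC 5905, Section 7.4), a KoD packet is a reply with stratum 0 whose Reference ID carries a four-character ASCII kiss code; "RATE" asks the client to reduce its polling rate. In client/server mode the server copies the client's transmit timestamp into the origin timestamp of its reply, so the quoted MUST amounts to the check sketched below. It assumes the 48-byte RFC 5905 header layout; `kiss_code_if_valid` and `make_reply` are hypothetical helpers.

```python
import struct
from typing import Optional

NTP_HEADER_LEN = 48


def kiss_code_if_valid(reply: bytes, sent_transmit_ts: int) -> Optional[str]:
    """Return the kiss code (e.g. "RATE") of an acceptable KoD reply, else None.

    `sent_transmit_ts` is the 64-bit transmit timestamp from the client's own
    request. A KoD is honoured only if the reply echoes it in the origin
    timestamp field; otherwise it may be forged and is ignored.
    """
    if len(reply) < NTP_HEADER_LEN:
        return None
    stratum = reply[1]
    refid = reply[12:16]
    (origin_ts,) = struct.unpack("!Q", reply[24:32])
    if stratum != 0:
        return None  # not a Kiss-o'-Death packet
    if origin_ts != sent_transmit_ts:
        return None  # origin timestamp does not match our request
    return refid.decode("ascii", errors="replace")


def make_reply(stratum: int, refid: bytes, origin_ts: int) -> bytes:
    """Build a test server reply with only the fields this sketch reads."""
    first = (0 << 6) | (4 << 3) | 4  # LI=0, VN=4, mode=4 (server)
    return struct.pack("!BBbb4s4s4sQQQQ", first, stratum, 0, 0,
                       b"\0" * 4, b"\0" * 4, refid, 0, origin_ts, 0, 0)


sent = 0xE0A1B2C300000000  # transmit timestamp we put in our request
print(kiss_code_if_valid(make_reply(0, b"RATE", sent), sent))      # RATE
print(kiss_code_if_valid(make_reply(0, b"RATE", sent + 1), sent))  # None
```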


S 6.2.
>      Vendors are encouraged to invest resources into providing their own
>      time servers for their devices to connect to.
>   
>      Vendors should read [RFC4085], which advises against embedding
>      globally-routable IP addresses in products, and offers several better
>      alternatives.

This seems to kind of duplicate S 4.5.



S 6.2.1.
>      The NTP Pool Project offers a program where vendors can obtain their
>      own subdomain that is part of the NTP Pool.  This offers vendors the
>      ability to safely make use of the time distributed by the Pool for
>      their devices.  Vendors are encouraged to support the pool if they
>      participate.  For more information, visit http://www.pool.ntp.org/en/
>      vendors.html [8] .

This too, duplicates 4.5.


S 7.
>      own potential issues.  It means each client will likely use a single
>      time server source.  A key element of a robust NTP deployment is each
>      client using multiple sources of time.  With multiple time sources, a
>      client will analyze the various time sources, selecting good ones,
>      and disregarding poor ones.  If a single Anycast address is used,
>      this analysis will not happen.

I'm not sure I'm following this. The idea here seems to be that a
client would ordinarily be configured with N addresses, but with
anycast it will be configured with 1? Or that all the anycast
addresses will go to the same place? Presumably all the servers in an
anycast group are run by the same entity, in which case is there a
good reason to believe that whatever errors they have will be
independent? In this case, having unicast addresses would seem not to
help.

Separately, how many clients *actually* use >1 server?


S 7.
>      anycast servers may arbitrarily enter and leave the network, the
>      server a particular client is connected to may change.  This may
>      cause a small shift in time from the perspective of the client when
>      the server it is connected to changes.  It is RECOMMENDED that
>      anycast only be deployed in environments where these small shifts can
>      be tolerated.

Who is this guidance to? It seems like the clients might well not
know, but they are the ones who tolerate the shift (or not).


S 11.3.
>   
>      The Network Time Foundation (NTF) provides the reference
>      implementation of NTP, well-known under the name "ntpd".  It is
>      actively maintained and developed by NTF's NTP Project, with help
>      from volunteers and NTF's supporters.  This NTP software can be
>      downloaded from <http://www.ntp.org/downloads.html>

You probably want to explain why the rest of this section follows. For
instance "The remainder of this section describes how to implement
many of the recommendations in this document using that software."


S 11.3.
>      downloaded from <http://www.ntp.org/downloads.html>
>   
>   A.1.  Use enough time sources
>   
>      In addition to the recommendation given in Section Section 3.2 the
>      ntpd implementation provides the 'pool' directive.  Starting with

Where does this directive go? Some conf file, one assumes.


S 11.3.
>      ntp-4.2.6, this directive will spin up enough associations to provide
>      robust time service, and will disconnect poor servers and add in new
>      servers as-needed.  If you have good reason, you may use the
>      'minclock' and 'maxclock' options of the 'tos' command to override
>      the default values of how many servers are discovered through the
>      'pool' directive.

What would those good reasons be?


S 11.3.
>   
>      restrict default -4 nomodify notrap nopeer noquery
>      restrict default -6 nomodify notrap nopeer noquery
>   
>      restrict source nomodify notrap noquery
>      # nopeer is OK if you don't use the 'pool' directive

I assume this is a comment? What is it doing right below a line that
doesn't mention "nopeer"?

(Spencer Dawkins) Yes

Comment (2018-12-17 for -10)
I understand every sentence in this text 

   Many network security mechanisms rely on time as part of their
   operation.  If attackers can spoof the time, they may be able to
   bypass or neutralize other security elements.  For example, incorrect
   time can disrupt the ability to reconcile logfile entries on the
   affected system with events on other systems.  An application which
   is secure today could be insecure tomorrow once an unknown bug (or a
   known behavior) is exploited in the right way.  Even our definition
   of what is secure has evolved over the years, so code which was
   considered secure when it was written may turn out to be insecure
   after some time.

but don't understand how 

   An application which
   is secure today could be insecure tomorrow once an unknown bug (or a
   known behavior) is exploited in the right way.  Even our definition
   of what is secure has evolved over the years, so code which was
   considered secure when it was written may turn out to be insecure
   after some time.

relates to an attack on NTP-provided time. Could you help me understand how this is tied together? 

Is "users" the right term in 

3.6.  Using Pool Servers

   It only takes a small amount of bandwidth and system resources to
   synchronize one NTP client, but NTP servers that can service tens of
   thousands of clients take more resources to run.  Users who want to
   synchronize their computers SHOULD only synchronize to servers that
   they have permission to use.

? If I'm a user, I don't think I've ever consciously chosen an NTP server.

You might consider moving that paragraph lower in 3.6 - the section is about using pool servers, but the explanation about pool servers is in the second paragraph. 

Is the choice of lower case "should" in 

3.7.1.  Leap Smearing

   Some NTP installations make use of a technique called Leap Smearing.
   With this method, instead of introducing an extra second (or
   eliminating a second) on a leap second event, NTP time will be slewed
   in small increments over a comparably large window of time (called
   the smear interval) around the leap second event.  The smear interval
   should be large enough to make the rate that the time is slewed
   small,

intentional? It seemed close enough to some of the SHOULDs in this document that I wanted to ask ...
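For readers unfamiliar with the technique in the quoted text, here is a minimal sketch of a linear smear of a positive leap second. It assumes a smear interval of W seconds centred on the leap event, so the served clock runs slow by a rate of 1/W; real deployments differ in window length, placement, and curve shape. The function names and the constants `W` and `LEAP` are hypothetical.

```python
def smear_fraction(t: float, leap_at: float, window: float) -> float:
    """Fraction (0..1) of a positive leap second absorbed at time t.

    `t` and `leap_at` are on a continuous count of SI seconds; the smear
    is assumed to be linear and centred on the leap event.
    """
    start = leap_at - window / 2
    frac = (t - start) / window
    return min(1.0, max(0.0, frac))


def smeared_time(t: float, leap_at: float, window: float) -> float:
    # The smearing server runs slow across the window, so by the end it has
    # absorbed the inserted second instead of repeating or stepping it.
    return t - smear_fraction(t, leap_at, window)


W = 86400.0         # hypothetical 24-hour smear interval
LEAP = 1_000_000.0  # hypothetical time of the leap event

print(smear_fraction(LEAP, LEAP, W))    # 0.5 (halfway through the window)
print(smeared_time(LEAP + W, LEAP, W))  # 1086399.0, i.e. one second absorbed
print(f"slew rate: {1e6 / W:.2f} ppm")  # slew rate: 11.57 ppm
```

Under this centred, linear assumption, a smeared and a non-smeared server differ by up to half a second around the leap event, which bears on the later question about clients seeing a mixture of the two.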

Is it obvious how a system administrator would detect a mixture of smeared and non-smeared servers, as in 

   System Administrators are advised to be aware of impending leap
   seconds and how the servers (inside and outside their organization)
   they are using deal with them.  Individual clients MUST NOT be
   configured to use a mixture of smeared and non-smeared servers.  If a
   client uses smeared servers, the servers it uses must all have the
   same leap smear configuration.

? I'm asking for the case where you carefully choose your servers so they aren't mixed, but you are using servers you don't control, and the server administrator changes the server behavior.

I don't think 

   Operators SHOULD be aware that when operating with the above two
   conditions, the panic threshold offers no protection from attacks.

needs BCP14 requirements language. When would operators make an informed decision to be unaware? 

In this text, 

   In addition, implementations SHOULD prevent the NTP daemon from
   taking time steps that set the clock to a time earlier than the
   compile date of the NTP daemon.

it would be helpful to me, to explain why this requirement is included. I can imagine a couple of reasons, but I'm guessing.
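One plausible rationale, assumed here because the quoted text does not state one: the software cannot be running before it was built, so its build date is a trustworthy lower bound on the current time, and refusing steps below it stops a forged or broken source from winding the clock back (for example, to a time when now-expired certificates were still valid). A minimal sketch; `BUILD_TIME` and `accept_step` are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical build timestamp baked into the daemon at compile time.
BUILD_TIME = datetime(2018, 12, 1, tzinfo=timezone.utc)


def accept_step(proposed: datetime, build_time: datetime = BUILD_TIME) -> bool:
    """Reject any clock step that would land before the daemon was built."""
    return proposed >= build_time


print(accept_step(datetime(2019, 1, 15, tzinfo=timezone.utc)))  # True
print(accept_step(datetime(2001, 6, 1, tzinfo=timezone.utc)))   # False
```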

I wonder if the SUIT working group has any drafts that are stable enough to be used as an informative reference in Section 6.1, "Updating Embedded Devices".

Suresh Krishnan Yes

Mirja Kühlewind Yes

Ignas Bagdonas No Objection

Deborah Brungard No Objection

(Ben Campbell) No Objection

Comment (2018-12-19 for -10)
Hi, thanks for this work. I'm balloting "no objection", but have a few comments/questions:

*** Substantive Comments ***

General observation: I was surprised to find that a lot of the recommendations here don't seem especially specific to NTP. (E.g. keeping implementations up to date.) But I don't suppose that's really a problem, so I don't expect action here.

§2.1, last paragraph: "It is RECOMMENDED that
large corporate networks (and ISP’s of any size) implement ingress
and egress filtering."

Is that a new normative requirement, or an existing requirement from BCP38? If the latter, please consider using descriptive language rather than normative keywords.

§3.3: This section recommends that operators choose time servers with different implementations/technology. Are time sources expected to publicize that sort of information?

§3.4: Am I correct to assume that "control messages" and "mode 6 messages" are the same thing? Please use consistent terminology.

§3.6:
- First Paragraph: "Users who want to
synchronize their computers SHOULD only synchronize to servers that
they have permission to use."
Why not MUST?

- 2nd paragraph: Is the NTP Pool stable enough for a plug like this in an RFC? Remember, RFCs live "forever". (see also §6.2.1)

§3.7: "Note well that NTPv4’s longest polling
interval exceeds one day and thus a leap second announcement may be
missed."
Is that okay? Is there any action recommended due to this?
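The quoted figure follows from RFC 5905, which sets the maximum poll exponent MAXPOLL to 17 (in log2 seconds):

```latex
2^{17}\,\mathrm{s} = 131072\,\mathrm{s} \approx 36.4\,\mathrm{h} > 24\,\mathrm{h}
```

So a client polling at that limit can go more than a day between exchanges with a given server.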

§3.7.1, last paragraph: How does a client know if the server does leap second smearing?

§4.1: 
- "Therefore, for each association, keys SHOULD be exchanged securely by external means, and they SHOULD be protected from disclosure."
Why not MUST (both times)?

- "Implementations will soon be available based on AES-128-CMAC [I-D.ietf-ntp-mac], and users are encouraged to use that when it is available."
Is that worth a normative requirement?

- "Some implementations store the key in clear text"
Wouldn't the better practice be not to do that?

§5.1: 
- 3rd paragraph: "A host that is not supposed to act as an NTP server that provides
timing information to other hosts MAY additionally log and drop
incoming mode 3 timing queries from unexpected sources."

I don't understand the point. Also, is the upper-case "MAY" intended as permission to do that?

- last paragraph: "Note well that proper monitoring of an NTP server instance includes
checking the time of that NTP server instance."
Should there be normative guidance here? (Also, the sentence seems out of place.)

§6.1: "Vendors of embedded devices MUST pay attention"
Can you recommend something more concrete (and verifiable) than "pay attention"?

*** Editorial Comments ***

§2.1: "more susceptible to spoofing attacks then other connection-oriented protocols":
s/then/than
Also, it seems like "other" is not descriptive here, since UDP is not a connection-oriented protocol.

§3.4:
-  first paragraph: The last sentence will not hold up well to the passage of time. Please consider adding something to the effect of "At the time of this writing..."
- Last paragraph: The last sentence seems redundant to the section on BCP38.

§5.1, 3rd paragraph: "It is recommended that operators SHOULD filter mode 3 queries
at the edge"
"recommended that...SHOULD" is redundant. Please consider just saying "Operators SHOULD..."

§5.4: Is a KoD packet and a RATE packet the same thing? (Please use consistent terminology)

§11.2: Is there a reason [BCP38INFO] is here and not in the URL references?

Alissa Cooper (was Discuss, No Objection) No Objection

Comment (2019-02-12 for -12)
Thanks for addressing my DISCUSS. Previous COMMENT:

Section 2.1: 

s/BCP 38 [RFC2827] was approved/BCP 38 [RFC2827] was published/ (presumably the approval was not the seminal thing)

"It is RECOMMENDED that large corporate networks (and ISP's of any size) implement ingress and egress filtering."

I'm not really sure what the parentheses are meant to imply here. If this is a normative recommendation for both ISPs and large corporate networks, why doesn't it say "ISPs and large corporate networks"?

Section 3.2:

"If time sources do not generally agree, find out the cause and either
   correct the problems or stop using defective servers."

It seems odd to frame this as a directive, especially in a paragraph where the subject is made explicit ("operators"). I think this would make more sense if it said "operators should find out" or "operators ought to find out."

Section 3.3:

Please fix the sentence highlighted by the Gen-ART reviewer.

Section 3.4:

"To provide protection for such abuse NTP server
   operators on large networks SHOULD deploy ingress filtering in
   accordance with BCP 38 [RFC2827]."

Why is this recommendation limited to large networks, whereas the normative recommendation to do ingress and egress filtering in Section 2.1 applies to ISPs of any size?

Section 3.6:

I agree with the Gen-ART reviewer that the use of "you" is inappropriate here and should be replaced by a noun (e.g., "operators").

Section 4.1:

"Therefore, for each association, keys SHOULD be exchanged securely by external
   means, and they SHOULD be protected from disclosure."
   
I recognize that this is outside the bounds of the protocol, but if this document is a BCP that is going to make these normative recommendations for what they're worth, shouldn't they be MUSTs? If not, what are the exceptional cases where the exchange of these keys shouldn't be secure and confidential?

Section 4.2:

Same question as Section 4.1.

Section 5:

Same comment as Section 3.2. The subject to which the directive is being given should be named.

Section 5.2:

"It is likely to become the default behavior in other
       systems as they migrate legacy init scripts to process
       supervisors such as systemd."
    
For posterity it may be better to say, "At the time of this writing, it appears likely to ..."

"Operators SHOULD be aware that when operating with the above two
   conditions, the panic threshold offers no protection from attacks."

I don't think it's appropriate to use normative language about being aware.

Section 6.1:

"Vendors of embedded devices MUST pay attention to the current state
   of protocol security issues and bugs in their chosen implementation."

Same comment as 5.2, it's inappropriate to normatively require paying attention.

Section 6.2.1:

"For more information, visit ..."

Same comment as 3.2 and 5 -- this sentence needs a subject.

Benjamin Kaduk (was Discuss) No Objection

Comment (2019-01-17 for -11)
Thank you for addressing my Discuss points!

Warren Kumari No Objection

Comment (2018-12-19 for -10)
Thank you for writing this - I found it a helpful, interesting and pleasant read. 

I do have a few comments - these are just comments, feel free to ignore, etc. 

Firstly, thanks for mentioning BCP-38 -- it feels like you are tilting at windmills here, but I appreciate your optimism :-)

'NTF' should be expanded on first use. 

I don't have any text to suggest, but "For several hours before and after the June 2015 leap second, several operators implemented leap smearing while others did not, ..." sounds like in June 2015 a whole bunch of operators sat down at their workstations and wrote code to implement leap smearing (the word "implemented" makes this less than clear). Perhaps "performed leap smearing" would be clearer?

Section 4.1.  Pre-Shared Key Approach
"The MD5 hash is now considered to be too weak." -- too weak for what? (I agree, but you seem to be missing words).

Much of the text of the document seems to be "motherhood and apple pie" type advice (e.g.: "Users should keep up to date on any known attacks on their selected implementation, and deploy updates containing security fixes as soon as practical."), but since this is a BCP, that doesn't seem unreasonable :-)

(Terry Manderson) No Objection

Alexey Melnikov No Objection

Alvaro Retana No Objection

Adam Roach No Objection

Comment (2018-12-17 for -10)
Thanks to everyone who worked on this document! It's well-written and easy to
understand. I offer a handful of editorial suggestions below.

---------------------------------------------------------------------------

§1:

>  This document also contains information for protocol implementors who
>  want to develop their own RFC 5905 compliant implementations.

Nit: "RFC-5905-compliant" or "[RFC5905]-compliant".

---------------------------------------------------------------------------

§2.1:

>  UDP-based protocols such as NTP are generally more
>  susceptible to spoofing attacks then other connection-oriented
>  protocols.

Nit: "...than..."

The use of "other" implies that NTP is a connection-oriented protocol, which
doesn't match my understanding. I think you want to simply remove "other".

---------------------------------------------------------------------------

§3:

>  This section provides Best Practices for NTP configuration and
>  operation.  Best Practices that are specific to the NTF
>  implementation are compiled in Appendix A.

Please expand "NTF" on first use.

---------------------------------------------------------------------------

§4.2:

Maybe cite RFC 5906 here?

---------------------------------------------------------------------------

§5.2:

Some of the mitigations in here seem specific to one implementation of an NTP
daemon (e.g., reference to "minsane" and "minclock" parameters and to the
"ntp.conf" file). As the remainder of the advice in the document to this point
appears to be generic, I propose that these practices either be described in an
implementation-neutral way; or, if that is not possible, moved to Appendix A.

---------------------------------------------------------------------------

§7:

>  With anycast, a single IP address is assigned to multiple interfaces,
>  and routers direct packets to the closest active interface.

This is kind of a confusing use of the word "interface" -- a simple reading of
this sentence is that you have a single server with, say, multiple network
cards, and the router is deciding which of those cards to send a packet to.
If I didn't already know the meaning of "anycast," this description would
leave me scratching my head.  Perhaps use the term "node" or "server" instead.

---------------------------------------------------------------------------

§7:

>  As
>  anycast servers may arbitrarily enter and leave the network, the
>  server a particular client is connected to may change.

It might be worth noting in the document that these changes can happen due to
factors other than NTP servers coming online and offline, such as changes in
routing tables.  In more extreme cases -- e.g., flapping routes --  this could
result in clients switching between two different servers rapidly.

---------------------------------------------------------------------------

§A.7:

>  (This is easy to do
>  because the origin timestamp on broadcast mode packets is not
>  validated by the client.  By contrast, client/server and symmetric
>  modes do require origin timestamp validation, making it more
>  difficult to spoof packets [CCR16].

Nit: This is missing a closing parenthesis.

Martin Vigoureux No Objection