GeneRic Autonomic Signaling Protocol Application Program Interface (GRASP API)
RFC 8991
Yes
Robert Wilton
No Objection
Alvaro Retana
(Barry Leiba)
(Deborah Brungard)
Note: This ballot was opened for revision 07 and is now closed.
Robert Wilton
Yes
Alvaro Retana
No Objection
Erik Kline
No Objection
Comment
(2020-12-01 for -08)
[[ questions ]]

[ section 2.3.2.4 ]

* It looks like the URI may contain an IP address or FQDN as well as a port number? If so, is there a validation requirement about the presence or value of the port field in the ASA_locator in relation to the port number in the URI?

[ section 2.3.3 ]

* For deregister_asa(), if the ASA name is redundant, does that mean that a call like deregister_asa(asa_nonce=valid_nonce, name="") should succeed? I suppose one ASA can deregister other ASAs by cycling through the 32-bit number space?

* For register_objective(), what happens if overlap=False for an objective already registered with overlap=True? And what about the inverse? I guess, what is the trust model of multiple ASAs sharing a GRASP core (i.e. on the same node)?

[ section 2.3.4 ]

* For objectives that other ASAs on the same node might be trying to discover(), is the cache kept separate per-ASA or shared? If shared, it seems like the TTL<minTTL entries should be ignored and not deleted, maybe? (I haven't read any text describing cache implementation requirements or guidance yet.)

* For asynchronous mechanisms, is the callback (if used) called multiple times, as locators are discovered, or are they accumulated until the timeout is reached and returned in one callback invocation? If the former, is there one final callback with, if necessary, an empty list to indicate the timeout was reached (as a convenience)?

[[ nits ]]

[ abstract ]

* "adapted to the support for" -> "adapted to add support for", perhaps

[ section 2.3.2.4 ]

* Perhaps replace "ifi...probably no use to a normal ASA" with something like "probably only of use to an ASA on a node with multiple active interfaces"?

[ section 2.3.6 ]

* s/caches all flooded objectives that it receive/... that it receives/
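Erik's remark about cycling through the 32-bit number space, and the later comments on nonce and session ID size, can be put in numbers. The sketch below assumes nonces are drawn uniformly at random, which the draft does not mandate. It computes the mean number of guesses needed to hit one specific nonce and the birthday-bound probability that at least two of `count` random nonces collide. The function names are illustrative only.

```python
import math


def expected_guesses(bits: int) -> float:
    """Mean number of trials to hit one specific uniformly random value
    when guessing without repetition: (N + 1) / 2 for N = 2**bits."""
    n = 2 ** bits
    return (n + 1) / 2


def collision_probability(count: int, bits: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    `count` uniformly random `bits`-bit values are equal:
    1 - exp(-count * (count - 1) / (2 * 2**bits))."""
    n = 2 ** bits
    # expm1 keeps precision when the exponent is tiny (64 or 128 bits).
    return -math.expm1(-count * (count - 1) / (2 * n))


if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(f"{bits}-bit: about {expected_guesses(bits):.3g} guesses; "
              f"P(collision among 100000) = "
              f"{collision_probability(100_000, bits):.3g}")
```

Both figures are approximations that hold only under the uniform-randomness assumption, and the guessing figure ignores any rate limiting of failed calls by the GRASP core.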
Murray Kucherawy
No Objection
Comment
(2020-11-30 for -08)
This might be an implementation detail, but I would like to draw attention to it for clarification. Reading this as a guide for API implementers, I'm confused by one aspect of this document. In some parts of the API specification, some of the returned items are conditional. For example, in Section 2.3.3, the response to "register_asa()" always contains an "errorcode", but it also contains an "asa_nonce" if registration was successful. What does it mean for a response to be sometimes missing a piece of information? I'm thinking, for instance, about Python, where your response might be a single value or a tuple of values depending on success or failure, and I as the consumer will have to handle each case separately.

Wouldn't it be simpler for "asa_nonce" to have a possible sentinel value for use in failures? (Maybe 0, maybe -1, maybe MAXINT; the use of "integer" in the document generally doesn't specify whether it's signed or unsigned or what limits might exist. Or maybe "None".) That way, responses always have the same number of elements, and possibly the same types, irrespective of the function's outcome. For a more extreme example, the response to "request_negotiate()" could have anywhere between one and four elements, of varying types.

It's possible this doesn't matter, though; you're doing the API implementation, so you get to decide, document it, and then deal with user response. But as someone who produces and documents APIs a lot, this stuck out to me.
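The fixed-shape response Murray suggests can be sketched in Python with a NamedTuple whose second field uses None as the failure sentinel. `register_asa_sketch` is a hypothetical stand-in, not the draft's register_asa(); the success code 0 is an assumption, while 5 is dupASA as listed in the Appendix A excerpt quoted in Benjamin Kaduk's comments.

```python
import secrets
from typing import NamedTuple, Optional

OK = 0       # assumed success code for this sketch
DUP_ASA = 5  # "Duplicate ASA name", per the Appendix A excerpt


class RegisterResult(NamedTuple):
    errorcode: int
    asa_nonce: Optional[int]  # None is the sentinel whenever errorcode != OK


def register_asa_sketch(asa_name: str, registry: dict[str, int]) -> RegisterResult:
    """Hypothetical register_asa(): the result always has the same two
    fields, so callers can unpack it identically on every path."""
    if asa_name in registry:
        return RegisterResult(DUP_ASA, None)
    nonce = secrets.randbits(32)  # 32 bits, matching the draft's nonce size
    registry[asa_name] = nonce
    return RegisterResult(OK, nonce)


if __name__ == "__main__":
    registry: dict[str, int] = {}
    err, nonce = register_asa_sketch("ExampleASA", registry)
    err2, nonce2 = register_asa_sketch("ExampleASA", registry)
    print(err, nonce is not None, err2, nonce2)
```

Because both paths return two fields, `err, nonce = register_asa_sketch(...)` works unchanged on success and on failure, and the caller branches on `err` alone.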
Roman Danyliw
No Objection
Comment
(2020-12-01 for -08)
Thanks for responding to the SECDIR reviewer, and thank you to Joseph Salowey for performing the review.

** Since this is an API spec, a few more example pseudocode snippets showing common ASA “tasks” invoking this API from both sides of the connection (like Figure 2) would be very helpful.

** More precise references to draft-ietf-anima-grasp might be helpful to implementers (e.g., in Section 2.3.2.3, “… default GRASP_DEF_LOOPCT, see [I-D.ietf-anima-grasp]” ==> “... see Section 2.6 of [I-D.ietf-anima-grasp]”).

** Section 1. Per “An ASA runs in an ACP node and therefore inherits all its security properties, i.e., message integrity, message confidentiality and the fact that unauthorized nodes cannot join the ACP.”, in the spirit of precision, things like message integrity and message confidentiality are not properties of the ASA or of the ACP _node_ but rather properties of the protocol used on the control plane.

** Section 2.1. Recommend using consistent terminology. In this section, ASAs call a “GRASP module”. However, Section 1 lays out an architecture of GRASP Core + API.

** Section 2.2. I found the placement of this section confusing. It discusses the calling conventions for an API that hasn’t been described yet. IMO, this section should come after Section 2.3. That said, thanks for describing these different calling conventions. Showing them in examples would be very helpful.

** Section 2.2.2.2. Per the definition of TTL, is it worth clarifying here and in the subsequent descriptions that this is an unsigned integer of a particular size (at least unsigned 32-bit) per Section 5 of draft-ietf-anima-grasp?

** Section 2.3.2.3. Is it worth clarifying that loop_count should be between 0 and 255 per Section 5 of draft-ietf-anima-grasp?

** Section 2.3.2.3. Provide a normative reference to which version of C and Python will be used.

** Section 2.3.2.3. If an older C is used, is “char *name” the right way to handle a UTF-8 string?

** Section 2.3.2.3. Per the C data structure of an objective, should loop_count and value_size be unsigned integers of some kind?

** Section 2.3.2.3. Why does the Python implementation set a default value of loop_count but C does not?

** Section 2.3.2.3. Please provide a reference to libcbor.

** The examples in C and Python found in Section 2.3.2.3 were helpful. I was hoping to find them in the other sections as well. A C-style .h file with function prototypes and constants would also be nice (e.g., GRASP_DEF_TIMEOUT, IPPROTO_*, all the error types).

** Section 2.3.4. Typo. s/tiemout/timeout/

** Section 2.3.2.4. The constants IPPROTO_TCP and IPPROTO_UDP aren’t defined here. Recommend a reference to the grasp draft.

** Section 2.3.7. Double checking -- per the info input parameter, is the ASA supposed to provide this content or is this something from GRASP Core?

** Appendix A. This list doesn’t appear to be a complete crosswalk of function to error codes to possible APIs. For example, “NotObj” is listed as a general error code, but would that get returned by register_asa()?

** Per the GENART Review, IMO, Paul makes a number of good points, in particular:
-- a reference or further explanation of the flow for dry run and how this would be used in other API calls
-- additional clarifying language on request_negotiate
-- renaming the “session nonce” to “session handle” (or something like it) might improve clarity so the API doesn’t have to deal with multiple “nonces”
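Several of Roman's points (and Éric's and Benjamin's related ones) concern value ranges that signed C ints leave implicit. Below is a minimal Python sketch of one way to make them explicit. The `Objective` class and the `check_ttl` helper are illustrative only. The default loop count of 6 is an assumption standing in for GRASP_DEF_LOOPCT. The unsigned 32-bit TTL range follows the reading of Section 5 of the GRASP draft that Roman cites. Boolean flags are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Any

UINT32_MAX = 2**32 - 1
# Assumed default; the draft defers to GRASP_DEF_LOOPCT in the GRASP spec.
DEFAULT_LOOP_COUNT = 6


def check_ttl(ttl_ms: int) -> int:
    """Reject TTL values (milliseconds) that do not fit an unsigned 32-bit field."""
    if not 0 <= ttl_ms <= UINT32_MAX:
        raise ValueError(f"ttl {ttl_ms} outside 0..{UINT32_MAX}")
    return ttl_ms


@dataclass
class Objective:
    """Hypothetical objective record with explicit range checks."""
    name: str
    loop_count: int = DEFAULT_LOOP_COUNT
    value: Any = None  # None, not 0, as the "no value yet" placeholder

    def __post_init__(self) -> None:
        # The loop count is an unsigned value in 0..255.
        if not 0 <= self.loop_count <= 255:
            raise ValueError(f"loop_count {self.loop_count} outside 0..255")
        # Objective names are UTF-8 text; encoding raises UnicodeEncodeError
        # (a ValueError subclass) for strings such as lone surrogates.
        self.name.encode("utf-8")


if __name__ == "__main__":
    print(Objective("EX1"))
    print(check_ttl(60_000))
```

In C, the same intent could be expressed with fixed-width unsigned types (for example `uint8_t` for the loop count and `size_t` for the value size), which is the direction Benjamin's comment also suggests.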
Éric Vyncke
No Objection
Comment
(2020-12-01 for -08)
Thank you for the work put into this document. Please find below some non-blocking COMMENT points and one nit.

I have also requested IoT directorate and INT directorate reviews, so you may expect more reviews.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

-- Section 1 --

In Figure 1, is the "GRASP API Library" identical to the "basic GRASP library" mentioned later in the text?

-- Section 2.1 --

May I assume that the bulleted list is not exhaustive? Probably worth stating "For example, ..." if this is the case.

-- Section 2.2.1 --

This whole section looks more like a tutorial than something useful in an IETF document ;-) but no problem to leave it. The same applies to Section 2.2.2 and even to 2.2.3.

-- Section 2.3.2.2 --

Should it be specified that the timeout is an *unsigned* integer? The same applies to "loop_count" in Section 2.3.2.3.

-- Section 2.3 --

Several occurrences of "returned parameters"... would "returned values" be better?

-- Section 2.3.3 --

"All ASAs must use this call." Should it be followed by "before issuing any other API calls"?

"automatically if an ASA crashes", but what about "graceful termination"?

== NITS ==

-- Section 1 --

Suggestion: move Figure 1 earlier in the text to improve readability.
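Éric's question about whether register_asa() must precede other calls, together with his point about graceful termination, can be illustrated with a small stub. Everything here is hypothetical: the `CoreSketch` class, its method bodies and the success code 0 are assumptions, and only noASA = 6 comes from the Appendix A excerpt quoted in Benjamin Kaduk's comments. Every call other than register_asa() is refused until the caller holds a valid asa_nonce. deregister_asa() provides the graceful exit and, as one possible answer to the earlier questions about a redundant name, rejects a name that does not match the nonce.

```python
import secrets

OK = 0      # assumed success code for this sketch
NO_ASA = 6  # "ASA not registered", per the Appendix A excerpt


class CoreSketch:
    """Hypothetical GRASP core stub that enforces registration first."""

    def __init__(self) -> None:
        self._asas: dict[int, str] = {}  # asa_nonce -> ASA name

    def register_asa(self, name: str) -> tuple[int, int]:
        nonce = secrets.randbits(32)
        while nonce in self._asas:  # never hand out a nonce already in use
            nonce = secrets.randbits(32)
        self._asas[nonce] = name
        return OK, nonce

    def deregister_asa(self, asa_nonce: int, name: str) -> int:
        # Graceful termination; a mismatched name is treated as "not registered".
        if self._asas.get(asa_nonce) != name:
            return NO_ASA
        del self._asas[asa_nonce]
        return OK

    def register_objective(self, asa_nonce: int, objective_name: str) -> int:
        # Stands in for any other API call; objective bookkeeping is omitted.
        if asa_nonce not in self._asas:
            return NO_ASA
        return OK


if __name__ == "__main__":
    core = CoreSketch()
    err, nonce = core.register_asa("ExampleASA")
    print(err, core.register_objective(nonce, "EX1"))
    print(core.deregister_asa(nonce, "ExampleASA"))
```

Whether a name mismatch should instead return a distinct code (such as notYourASA) is exactly the kind of choice the ballot comments ask the document to settle.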
Alissa Cooper Former IESG member
No Objection
(2020-12-03 for -08)
There are a few outstanding unresolved comments from the Gen-ART review that it would be useful to resolve, particularly clarifications in Section 2.3.5. In general the Gen-ART review made me wonder if it might be useful to get some more implementation experience and interop testing going before trying to extend or build out much more functionality on top of GRASP, since there are many implementation-specific decisions left unspecified.
Barry Leiba Former IESG member
No Objection
(for -08)
Benjamin Kaduk Former IESG member
No Objection
(2020-12-02 for -08)
I have two comments in particular that I would like to call your attention to: my comment on cache flushing in Section 2.3.4, and my comment on the CBOR data model used for validation in Appendix A.

Section 1

   An ASA runs in an ACP node and therefore inherits all its security properties, i.e., message integrity, message confidentiality and the fact that unauthorized nodes cannot join the ACP. All ASAs within a

I agree with Roman's comment that the "it" whose security properties are inherited is the ACP *node*, not the ACP itself, and thus that some rewording is appropriate.

   The GRASP API library would need to communicate with the GRASP core via an inter-process communication (IPC) mechanism. The details of

Hmm, if the GRASP core is in kernel-space and the API library in userspace, wouldn't we normally refer to that exchange as a system call rather than IPC? (Figure 1 also labels this interaction "IPC".)

Section 2.1

   * Authorization of ASAs is not defined as part of GRASP and is not supported.

Any chance I could interest you in s/not supported/a subject for future work/? It is looking somewhat likely since such a statement is already present in the security considerations...

   * User-supplied explicit locators for an objective are not supported. The GRASP core will supply the locator, using the ACP address of the node concerned.

This would seem to prevent any non-ACP use of GRASP; I suggest adding some language with a caveat about "for example" or similar, unless the intent is to limit the API usage to ACP (or DULL) scenarios.

Section 2.2.1

I think that the possibility for a single outbound message to get a sequence of incoming replies (at different times) further complicates the design of an asynchronous mechanism, and we would do well to discuss how such scenarios (e.g., broadcast discovery messages) would be handled by the implementation and API.
(I see that we do end up using a timeout in practice to resolve this topic, but would probably still mention it as an issue that has been resolved, here.)

Section 2.2.2

   ports rather than a separate port per session. Hence the GRASP design includes a session identifier. Thus, when necessary, a 'session_nonce' parameter is used in the API to distinguish simultaneous GRASP sessions from each other, so that any number of sessions may proceed asynchronously in parallel.

I do see that there was previous discussion on the 'nonce' terminology here, and I am unsure why there is a need to move away from the "session ID" terminology used in GRASP itself. In particular, the "session_nonce" is not a number used *once*; rather, it is used only for one session (but potentially multiple times within that session). That, to me, makes it a (short-lived) identifier, not a nonce. Roman's proposal of 'handle' would resolve this apparent disparity.

Section 2.2.3

   On the first call in a new GRASP session, the API returns a 'session_nonce' value based on the GRASP session identifier. This

What does "based on" mean? Does there need to be a one-to-one correspondence? Or just in one direction? Are we going to be constrained by the (IMO, too limited) 32 bits of randomness limit of the GRASP Session ID?

Section 2.3.2.3

   - Note 3: In a language such as C the preferred implementation may be to represent the Boolean flags as bits in a single byte,

Which aspect(s) of C are relevant for the "such as"?

   An essential requirement for all language mappings and all implementations is that, regardless of what other options exist for a language-specific representation of the value, there is always an option to use a raw CBOR data item as the value. The API will then wrap this with CBOR Tag 24 as an encoded CBOR data item [RFC7049] for transmission via GRASP, and unwrap it after reception.
I'm not sure I understand why the bstr wrapping is mandatory -- I would have thought that the attraction of using a raw encoded CBOR data item would be that it could be used directly, without additional wrapping.

   int loop_count;
   int value_size; // size of value in bytes

Some people might argue for using unsigned types for at least sizes (e.g., size_t), and often for things like loop counts that cannot be negative (though the argument for an unsigned type there is somewhat weaker).

   self.value = 0 # Place holder; any valid Python object

Wouldn't None be a more conventional placeholder in Python?

Section 2.3.2.4

   * The following cover all locator types currently supported by GRASP:
     - is_ipaddress (Boolean) - True if the locator is an IP address
     - is_fqdn (Boolean) - True if the locator is an FQDN
     - is_uri (Boolean) - True if the locator is a URI

Are these mutually exclusive?

Section 2.3.2.6

As for the GRASP session ID, I think that a 32-bit cap is too restrictive. I think we should be in the habit of using 128-bit nonces and needing to justify anything smaller. (64 bits would *probably* be fine here, FWIW, and might make it easier to represent in common language bindings.)

   Section 2.3.2.7). Another possible implementation is to hash the name of the ASA with a locally defined secret key.

I recognize that this is a throwaway line, but the naive keyed hash construction is subject to length-extension attacks (for certain hash constructions such as the Merkle-Damgård family that includes SHA-2); HMAC is more robust for this type of usage and can be phrased in a similarly concise manner ("compute an HMAC of the name of the ASA under a locally defined secret key").

Section 2.3.3

   * deregister_asa()
   [...]
   - Note - the ASA name is strictly speaking redundant in this call, but is present for clarity.

So what happens if the wrong name is passed?

   transmit to other ASAs. It is not necessary to register an objective that is only received by GRASP synchronization or
   [...]
   Registration is not needed for "read-only" operations, i.e., the ASA only wants to receive synchronization or flooded data for the objective concerned.

These seem to have high overlap and thus be candidates for deduplication.

   - The 'ttl' parameter is the valid lifetime (time to live) in milliseconds of any discovery response for this objective. The

(nit?) I'd suggest adding "generated", since it would not apply to any hypothetical received discovery response for the objective in question.

   - If the parameter 'overlap' is True, more than one ASA may register this objective in the same GRASP instance.

Do all ASAs registering this objective have to set it to True, or just the first one, in order for the subsequent registrations to succeed?

Section 2.3.4

   - If the parameter 'minimum_TTL' is greater than zero, any locally cached locators for the objective whose remaining time to live in milliseconds is less than or equal to 'minimum_TTL' are deleted first. Thus 'minimum_TTL' = 0 will flush all entries.

Why does one ASA's request flush entries from the cache shared with other ASAs? I am forced to infer the motivation for including the minimum_TTL parameter in the first place, but it seems like it is useful if the requesting ASA needs to find something that will remain active for a given period of time, but different ASAs may have different needs for the peer's stability, and so flushing the cache in this way could hamper the operation of peer ASAs. If the intent is only to not return those cached locators *for this discovery operation*, then say that, not that they are flushed from the cache entirely.

Section 2.3.5

Thanks for the figure (I probably should have put one into RFC 7546, which is basically this section but for the GSS-API).
I suggest noting in the first paragraph that the negotiation occurs in lockstep, with the initiator starting the negotiation and preparing a message, the responder processing that message and generating a new negotiation message in turn, with at most one negotiation message in flight at any given time. It seems particularly important to note whether this also applies to negotiate_wait() calls/messages, or if those can be made at any time by either entity. (This probably relates to some of the genart reviewer's comments.)

I note that the prospect of the loop count going up (and, thus, risk of infinite looping) was pointed out by the genart review. I share such concerns and am happy to see that improved discussion of this topic (and the related 'lifetime' extension) is planned.

   For this and any other error code, an exponential backoff is recommended before any retry.

Any guidance about whether this should be by doubling vs a different exponent base? I guess the security considerations do say that it's dependent on the semantics of the objective in question, which may be enough (though a pointer or mention here would be appreciated). (Also, any reason to not use the 2119 RECOMMENDED?)

   - This function must be followed by calls to 'negotiate_step' and/or 'negotiate_wait' and/or 'end_negotiate' until the negotiation ends. 'listen_negotiate' may then be called again to await a new negotiation.

We just recommended a few paragraphs previously that listen_negotiate() should be called again *immediately* after the first listen_negotiate() returns; I don't see why it's useful to also say that it might be called again after a given negotiation ends.

   - Executes the next negotation step with the peer. The 'objective' parameter contains the next value being proffered by the ASA in this step. It must also contain the latest 'loop_count' value received from request_negotiate() or negotiate_step().
This is interesting; negotiate_step() must preserve the loop count from the previous call, so only the initial negotiation response (the request_negotiate() 'proffered_objective' output) can increase the loop count, not any arbitrary negotiation step? That seems to limit concerns about infinite looping (as raised by the genart reviewer and apparently acknowledged in the response to the genart review).

   o Threaded implementation: Called in the same thread as the preceding 'request_negotiate' or 'listen_negotiate', with the same value of 'session_nonce'.

IIUC it is *expected* to be called in the same thread as the previous call, but is not strictly speaking *required* to do so, since the session_nonce tracks the library state for the negotiation in question. Or am I mistaken?

   'result' = True for accept (successful negotiation), False for decline (failed negotiation).
   'reason' = optional string describing reason for decline.

What happens if I pass a reason string with result of True?

Section 2.3.6

   - If the 'peer' parameter is null, and the objective is already available in the local cache, the flooded objective is returned immediately in the 'result' parameter. In this case, the 'timeout' is ignored.
   - Otherwise, synchronization with a discovered ASA is performed. If successful, the retrieved objective is returned in the 'result' parameter.

From context this 'otherwise' seems to be the "'peer' parameter is null but the objective is not available in the local cache" case (as opposed to also covering the "'peer' parameter is not null" case). It might be possible to clarify this with formatting and/or rewording.

   * synchronize()
   [...]
   - Since this is essentially a read operation, any ASA can do it, unless an authorization model is added to GRASP in future. Therefore the API checks that the ASA is registered, but the objective does not need to be registered by the calling ASA.
   [...]
   - Since this is essentially a read operation, any ASA can use it.
   Therefore GRASP checks that the calling ASA is registered but the objective doesn't need to be registered by the calling ASA.

These seem redundant and candidates for de-duplication.

   - In the case of failure, an exponential backoff is recommended before retrying.

[same remark as previously]

Section 2.3.7

   'info' = optional diagnostic data. May be raw bytes from the invalid message.

This means it does not have to be well-formed CBOR, and will be wrapped in a bstr by the library? (The GRASP spec suggests that a different CBOR structure would be permitted, though of course the API need not be required to expose such flexibility.)

Section 4

If we're going to keep the 32-bit nonce/handle/etc, it's probably worth a mention of collision/guessing probability.

It might be worth a reference to the RFC 3986 security considerations since we do allow URI locators. This is not really any different than for GRASP itself, but the URI is exposed to the API consumer and so reminding them about it seems worthwhile.

The session_nonce is nominally opaque to (non-ACP, at least) ASAs, but is likely to be implemented in a way that does preserve some state. Is there a risk if an ASA attempts to "peek through the abstraction barrier"? (I am not sure I see one, but you're the expert!)

   GRASP objective concerned. These precautions are intended to assist the detection of malicious denial of service attacks.

I suggest dropping the word "malicious"; such denial of service conditions need not be malicious and can occur by accident.

   As a general precaution, all ASAs able to handle multiple negotiation or synchronization requests in parallel may protect themselves against a denial of service attack by limiting the number of requests they can handle simultaneously and silently discarding excess requests.

I think that best practices would also include some limit on the number of objectives registered by a given ASA and possibly the number of ASAs registered, to protect the core library/kernel resources.
(nit?) I suggest dropping 'can'.

Appendix A

There was some discussion with the genart reviewer about the CBORfail error code as being particularly useful. I note that draft-ietf-cbor-7049bis is in AUTH48 and introduces a hierarchy of "levels of validation" (in the form of different data models). CBOR that is valid in the generic data model might not be valid in the extended data model or a data model specific to a given application. I strongly encourage this document to update to referencing 7049bis and giving an indication of what data model is in use for processing both information received from the peer and any CBOR-encoded data received from the ASA.

   'noSecurity' error will be returned to most calls if GRASP is running in an insecure mode (no ACP), except for the specific DULL usage mode

My understanding of the text in the GRASP spec itself was that non-ACP security services were allowed. Is the API intended to be limited to only ACP usage?

   ASAfull      4 "ASA registry full" (register_asa)
   dupASA       5 "Duplicate ASA name" (register_asa)
   noASA        6 "ASA not registered"
   notYourASA   7 "ASA registered but not by you"

Giving this much detail is making things much easier for malicious ASAs ... but given that the deployment model basically assumes that such things don't exist (even if we do give some small consideration to the possibility in some places), I will not complain about retaining this level of detail in the error messages.

   noDiscReply 17 "No reply to discovery" (req_negotiate)

There is perhaps some explanation to give about the distinction between noReply and noDiscReply, i.e., in the body text. Maybe it is self-explanatory, though, provided that the author of the code notices that noDiscReply exists at all. Likewise for noNegReply, noSynchReply, noValidSynch, and, possibly, noValidStep.
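Returning to the Section 2.3.4 cache-flushing comment, the alternative reading Benjamin proposes is to skip short-lived cached locators for this discovery operation only, rather than deleting them from a cache shared with other ASAs. The minimal sketch below shows that reading. It assumes a cache of (locator, absolute expiry) records. `CachedLocator` and `usable_locators` are hypothetical names, and the draft's special statement about minimum_TTL = 0 is deliberately not modeled.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CachedLocator:
    locator: str          # hypothetical: address, FQDN or URI as text
    expires_at_ms: float  # absolute expiry time in milliseconds


def usable_locators(cache: List[CachedLocator], minimum_ttl_ms: int,
                    now_ms: Optional[float] = None) -> List[CachedLocator]:
    """Return the entries whose remaining lifetime exceeds minimum_ttl_ms.

    The shared cache itself is left untouched, so another ASA with a
    smaller minimum_TTL still sees the entries this caller skipped."""
    if now_ms is None:
        now_ms = time.monotonic() * 1000
    return [c for c in cache if c.expires_at_ms - now_ms > minimum_ttl_ms]


if __name__ == "__main__":
    shared = [CachedLocator("192.0.2.1", 1000.0),
              CachedLocator("192.0.2.2", 5000.0)]
    print([c.locator for c in usable_locators(shared, 2000, now_ms=0.0)])
    print(len(shared))  # still 2: nothing was flushed
```

Genuinely expired entries can still be purged separately by the core, independently of any one ASA's minimum_TTL.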
Deborah Brungard Former IESG member
No Objection
(for -08)
Magnus Westerlund Former IESG member
No Objection
(2020-12-03 for -08)
I didn't have time to read your document in detail, so I may easily have missed something. Hopefully a bit of clarification on what I might have missed will resolve this issue.

I do wonder about one aspect of this API surface. How does it handle the case where the GRASP layer is unable to send the messages in a timely fashion based on the API calls? Looking at GRASP, I understand that it uses either UDP or TCP. The rate limiting of UDP does not appear to be specified any more precisely than by following the RFC 8085 recommendations. So my concern is that there is some risk that the upper layer using this API becomes a bit too active and tries to do everything at once, with the result that TCP congestion control and flow control might block timely transmissions, and, for UDP, the rate limiter or congestion control might hold back the UDP messages. What happens in this API when this occurs?
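One possible answer to Magnus's question is for the API layer to report back-pressure to the calling ASA explicitly, rather than queueing without bound or delaying silently. The sketch below uses a standard token bucket for that purpose. `send_message_sketch`, the `WOULD_BLOCK` code and the success value 0 are hypothetical and are not defined by the draft or by GRASP. The injectable clock exists only so that the behavior can be tested deterministically.

```python
import time
from typing import Callable, List


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.clock = clock
        self.last = clock()

    def try_consume(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


WOULD_BLOCK = -1  # hypothetical code, not one of the draft's error codes


def send_message_sketch(bucket: TokenBucket, queue: List[bytes], msg: bytes) -> int:
    """Hypothetical API-side send: queue the message if the rate allows,
    otherwise tell the ASA immediately instead of blocking it."""
    if not bucket.try_consume():
        return WOULD_BLOCK
    queue.append(msg)
    return 0  # success in this sketch


if __name__ == "__main__":
    bucket = TokenBucket(rate=10.0, burst=3.0)
    outbound: List[bytes] = []
    print([send_message_sketch(bucket, outbound, b"m%d" % i) for i in range(5)])
```

An ASA receiving WOULD_BLOCK could then apply the exponential backoff that the draft already recommends after other failures.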