jsonpath IETF Working Group May 2021 Interim

Date: 2021-05-11
Time: 9:00 - 11:00 UTC

Attendees

  1. Carsten Bormann, TZI

Agenda Bashing

None

Discussion

Terminology - #84

Tim: On email you said we are good on the terminology issue
Stefan: We need to do some work and finish in quite a short time
Tim: If I understand the state of the discussion, we should stick to 8259 terminology where possible, and introduce terms as necessary
Everything is a JSON value, except member names, and it is reasonable to have something shorter; candidates include "node"
Calling for opinions on this?
Carsten: We don't have to start from January, we have a few things in the document and a few open issues for indexing filters and syntactic elements
We have had "indexing", "key" for the first thing
There are a few details we have to understand about selectors and grouping
Carsten: Maybe we need a term for things with indexing, including all three kinds
Maybe it's not helpful to separate the kinds into different categories
Greg: The selector is the operator, the encompassing thing
Index is the bracket selector, indexes are the things in there
The selector is broad, the path is a sequence of selectors
Carsten: I think we have consensus on overall structure of this
Stefan: We need a specific term for index-value pair
Glyn: I have a concern about the use of "index", it carries array connotations
Greg: I see where you're coming from, having the understanding that any data type can be an index makes it more a "general use term"
Tim: "key" has object connotation, I'm okay with either of these
I'm not hearing a strong objection to the alternatives
Marko: This summary doesn't capture that an array slice selector is an index, which can be repeated
The slice expression is a valid way to select things
Carsten: Good idea to make a union a comma-separated list of slices without weakening the term index
Marko: The index cannot be expressed just as generic form as it's not a valid start value
Carsten: There is something to solve at the union selector
Marko: The union selector is index or key, filtered
Greg: A filter selector would also work
Tim: I'm not hearing strong issues of principle here, but an editorial issue and this should be handed to the editors
Carsten: We should give this to the editors to handle, but have a leeway, let's agree direction
Greg: Supposing we call an index a number, the union is a comma-delimited sequence or filters, or slices.
Carsten: You're making a technical change
Marko: Is a wildcard a special case?
Stefan: We are looking for a term for this index value pair I think
Greg: The difference with this terminology is we're not talking about values, only the value or numerical index
Stefan: If we take the normalised expression, brackets are always keys until we find the final value
Stefan: If we call it "index" I'm fine with it, but if Glyn says index is more connected to array
Glyn: I am happy to wait for a PR
Carsten: I'm happy to do it
Tim: I'm not detecting any disagreement on this
Greg: I'll write the terminology PR
Tim: Are we still sticking with the term "node"?
Greg: It is the pairing of the location and the value

Consensus: Sticking with the term "node" implying the location and the value

Action: Greg to PR the terminology
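
Illustration: the "node" consensus above (a pairing of location and value) could be sketched roughly as follows. This is only a sketch under assumptions: the names Location, Node and children are hypothetical and do not come from the draft, and a location is assumed to be the sequence of member names and array indexes leading from the root.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Tuple, Union

# Hypothetical representation: member names (str) and array indexes (int)
# leading from the root to the value.
Location = Tuple[Union[str, int], ...]


@dataclass
class Node:
    """A node pairs a location with the JSON value found there."""
    location: Location
    value: Any


def children(node: Node) -> Iterator[Node]:
    """Yield the child nodes of an object or array value; scalars have none."""
    if isinstance(node.value, dict):
        for name, child in node.value.items():
            yield Node(node.location + (name,), child)
    elif isinstance(node.value, list):
        for index, child in enumerate(node.value):
            yield Node(node.location + (index,), child)


root = Node((), {"a": [10, 20], "b": 10})
kids = list(children(root))
```

Here the nodes at ("a", 0) and ("b",) hold the same value 10 but are different nodes, because their locations differ.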

Relative Path Support - #59

Tim: I think the use of relative paths is not controversial, but the issue is using '@'
Glyn: I thought it was using @ outside the filter expression?
Marko: To supply a dynamic value to an index?
Glyn: The use case is an "escape hatch" to provide current value to the processor
Carsten: Do we have an example?
Glyn: David provides an example in the issue but I don't favour it
Stefan: If you allow query expressions with an @ symbol, that part must be defined elsewhere
Greg: That is David's point, and that some external tool that has navigated partially into an input, and wants to navigate relatively
Stefan: If you define a relative path, will you always need the root path? If not, it would suffice to redefine the root
Greg: Are you suggesting getting rid of the $ and just using @?
Stefan: Just using $ and don't talk about @
Carsten: Is there a difference between notating $ and @? If there is no difference we can always use $
Greg: There's a distinct difference wanting relative and root
On March 12th I gave an example where both are used, in some cases to go back to the root, in others to specify the local node
Marko: What I was trying to write is the difference; we could just re-evaluate with the current node
Where we wouldn't want to do that, is when we need to go back above the current root in a filter expression or such
Stefan: There should be consensus about the fact we cannot go back
Greg: There are data models that don't support navigating to the parent of a value
Marko: It's not about having navigation, it's about having two handles
Carsten: I can imagine a use case where we need to address this
Tim: There's an argument to be made that this is outside our charter; of the known implementations, 25% support them
Greg: Mine does not
We could say the spec says "start with the $", and optionally support starting with an @
Tim: I hate optional behaviours in standards due to interoperability
This smells outside our charter
Carsten: We need to define our extensibility model before we finish this issue
One way is to nail down the feature set once, everything not in that feature set is not valid
Another way is to introduce versions, with linear progression
Both are extremely unattractive, so we should group optional things into a small number of features
Greg: Another option is to just allow and support $, and fail if it doesn't start with $
Glyn: I think this is outside our agenda
Greg: I'm saying we don't move forward with this, but if implementations use it in future and we decide to update the spec, we can include it
Stefan: A very cheap thing might be to always start the query with the $, but the $ addresses not just the root
Tim: It's not obvious that relative path and @ falls into the "common semantics" and other aspects of the existing implementations
Carsten: This does not relieve us of the extensibility model
Glyn: If we make @ at the beginning of a path a syntax error, one model is to allow implementations to override syntax errors
Tim: I haven't seen a specific proposal, Carsten did I miss something?
Carsten: I haven't written this up yet, but we should apply existing IETF experience here
We should identify extension points so it can be done in an orderly way
Tim: One extension point might be the first character of a query?
Carsten: Yes, and an IANA registry of first characters with a registry policy
We should have a defined model for implementors or this WG to define extensions
Tim: I don't get the feeling that anyone on the call strongly feels we should add support for queries to begin with @
Greg: I agree so long as we can update this spec at some later point
Stefan: I think we should only support $ for now

Consensus: Queries should not begin with @, but this is a natural extension point.

Action: Chairs to update the GitHub issue and allow for non-attendees to reply
Action: Write an extensibility model
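
Illustration: the consensus above (queries begin with $, with the first character as a possible extension point) might look like this in an implementation. This is a sketch under assumptions: the table ROOT_IDENTIFIERS and the function check_query_start are hypothetical, and no registry or extensibility model has been defined yet.

```python
# Hypothetical table of permitted first characters. Only "$" is allowed for
# now; an extension mechanism could add entries later.
ROOT_IDENTIFIERS = {"$": "root of the query argument"}


def check_query_start(query: str) -> str:
    """Return the leading identifier, or raise ValueError if it is not permitted."""
    if not query:
        raise ValueError("empty query")
    first = query[0]
    if first not in ROOT_IDENTIFIERS:
        allowed = ", ".join(sorted(ROOT_IDENTIFIERS))
        raise ValueError(f"query must start with one of: {allowed}; got {first!r}")
    return first
```

With this table, "$.store.book" is accepted and "@.price" is rejected; the behaviour would change only by adding an entry to the table, not by altering the check.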

Regular Expressions in Filters - #70

Greg: There are two aspects here: do we support them, and what kind do we support?
Carsten: We should answer the second aspect first
Carsten: Two kinds of regular expressions - matching vs. computing
And that we need a mechanism for matching
Carsten: Need to refer to existing standard and subset
Tim: Is there anyone who wants regular expressions removed?
Marko: That would be a deal breaker, but it would be hard to make something interoperable
Greg: As an implementor my preference would be get a library that does regular expressions for me, matching exactly what we specify
Carsten: For a library developer adding a feature is a win, for a standards developer it is a lose, lose, lose
Greg: This is a big enough feature worth including, we just have to figure out how
Tim: Unless we can have a specification of what we mean, we shouldn't have regular expressions
Those who want regular expressions need to propose exactly what that means
Glyn: If we pick a subset of W3C, any implementation would have to massage that into implementations

Tim: Are there any other IETF RFCs that use regular expressions?
Carsten: YANG for example defines using W3C, but implementations use PCRE
Marko: When we say subsets, does the standard define invalid/malformed expression? Would the standard require the library to refuse it?
Carsten: That is answered by the extensibility concept
Tim (summary): If we get a specific proposal, we can go ahead
Stefan: Computed REs or literal REs only, proposal to stick with literal only now
Carsten: It's likely an extension would come up
Glyn: DoS issues?
Could make our subset a common subset of W3C and RE2 to address that.

Consensus: Regular expressions can be included once we have a specific proposal
Consensus: Only permit literals in regular expressions

Action: Carsten to attempt a PR
Action: Cover DoS in the security considerations of the document

Duplicates in Selector output - #23

Greg: With two of the same value in the input, if it's an extensive object it doesn't make sense to evaluate the remaining portion of the path for both
Questions raised around identifying not only value but location
If somehow the path evaluated the same location in both, those would be removed
... but if the same value but different location, those would remain in the output
Carsten: Can you clarify if by "same" you mean "equal" or "identical"?
Greg: Two nodes with the same value should not be collapsed, but if two selectors return the same node, those should be collapsed
What I wrote on the issue is not what we landed on, what I wrote was strict value equality not node equality.
From a performance perspective I don't want to evaluate the same object twice
Stefan: I agree with this and see overall agreement
Tim: When I'm writing a JSONPath, usually I know what the JSON I'm writing against looks like
I'm having trouble imagining a scenario where I would want duplicate removal to happen
Greg: The argument of identifying a count of something is a really good argument against this
Tim: The removal of duplicates is something that happens at another level of the program
Greg: Maybe this is out of scope, and let the application handle this afterwards
Glyn: I would be very happy to allow duplicates
Carsten: We may have a simplified view of what expressions actually mean, and might assume duplicates being removed
Stefan: I can agree to just allow duplicates for now
Marko: Isn't descendants selector kind of implicitly removing duplicates?
Greg: It wouldn't remove them
Tim: The draft does not well describe this

Consensus: Duplicate removal won't be part of the document

Action: Marko to write the descendants PR
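
Illustration: the distinction Greg drew between "same value" and "same node" can be shown with a small sketch. It assumes results are (location, value) pairs and uses a hypothetical helper, collapse_same_node; per the consensus above, this kind of removal is left to the application rather than being part of the document.

```python
# Each result is a (location, value) pair; locations are tuples of
# member names and array indexes from the root (an assumed representation).
results = [
    (("a",), 1),
    (("b",), 1),  # same value as ("a",) but a different location
    (("a",), 1),  # the same node selected a second time
]


def collapse_same_node(nodes):
    """Drop repeated locations, keeping first occurrences in order."""
    seen = set()
    out = []
    for location, value in nodes:
        if location not in seen:
            seen.add(location)
            out.append((location, value))
    return out


deduplicated = collapse_same_node(results)
```

Node-based collapsing keeps both ("a",) and ("b",) even though their values are equal, whereas leaving results untouched preserves all three entries, which is what counting use cases need.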

Filter Expressions - #64

Carsten: terminology: filter selectors contain filter expressions (vs. indexing selectors that contain indexing expressions, which may or may not be similar)
Stefan: I propose we first do the filter expressions, then return to indexing expressions
I'm not sure if it's possible to do valid parsing without the parentheses
Greg: Lots of questions in the air, we should coalesce the related ones into a single issue
#17, #57, #92
It might help for someone to condense these down into a list of open questions
Carsten: These should be several issues in the end
Stefan: I don't see the necessity in doing them in new issues
Greg: Maybe we can look at closing the other ones, there's a lot to unfold
Stefan: One central discussion point is the filter selector expression syntax, which must resolve to a boolean expression
I would start with just using comparison and boolean operators
Greg: However the current node is equal to some other node that I reference from the root minus 1, I should be able to do that
Tim: We should look at support in existing implementations
Greg: If you're going to be passing boolean operations it's not a big stretch to support these arithmetic operations
Glyn: Numeric operations don't pay for the cost, agree implementation wouldn't be too bad but increased verbiage in the specification
Tim: If it turns out this is not widely supported, I'd be negative about us inventing it
Stefan: Also comparing weak or strong type comparisons, I would prefer weak type
Greg: I imagine loose typing to be divisive, and would prefer strict equality
Glyn: Does anyone support loose equality?
Marko: What we have to define is what happens when input has incompatible types, and how to model the behaviour of ignoring those cases
Greg: We already have verbiage in the specification to that effect
Stefan: Is it our task to make provisions to avoid security issues, or the task of the implementors?
Greg: Either the implementors or the client of the implementation
It's our responsibility to highlight known potential risks
I don't know if it is our responsibility to mitigate those risks
Carsten: If avoiding the risk is feasible, we should avoid it
Glyn: Another consideration is non-scalar operations
Tim: I would like to rule it out, as it's a fruitful source of baffling behaviour
Marko: What about lexicographical sorting or collation? (for > etc.)
Tim: Unicode code points, or madness
Marko: We have to say that explicitly
Stefan: Can we agree to use C-based syntax for operators?
Tim: Do check existing implementations

Action: Stefan to write PR
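
Illustration: several positions voiced above (strict equality, no non-scalar comparisons, string ordering by Unicode code point) can be sketched together. None of this was agreed; the function names are hypothetical, and treating incompatible operands as "false" is an assumption made only for this sketch.

```python
def strict_equal(a, b):
    """Scalar equality without type coercion: 1 and "1" differ, as do 1 and True."""
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    return False  # mixed types or non-scalar operands (assumption)


def less_than(a, b):
    """Scalar ordering; incompatible or non-scalar operands yield False (assumption)."""
    if isinstance(a, bool) or isinstance(b, bool):
        return False
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a < b
    if isinstance(a, str) and isinstance(b, str):
        return a < b  # Python orders str values by Unicode code point
    return False
```

Under code point ordering "Z" sorts before "a", which is why Marko's point about stating the collation rule explicitly matters.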

AOB

Next Interim meeting

Action: Chairs to arrange a Doodle for an interim meeting in mid-June