Minutes IETF110: jsonpath
minutes-110-jsonpath-00

Meeting Minutes JSON Path (jsonpath) WG
Title Minutes IETF110: jsonpath
State Active
Last updated 2021-03-20

jsonpath IETF 110 Working Group

Date: 2021-03-10 Time: 12:00 - 14:00 UTC (to be confirmed)

  • Chair: Tim Bray
  • Chair: James Gruessing
  • Area Director: Murray Kucherawy
  • Scribe: Jeffrey Yasskin

Agenda Bashing

None

Discussion

On the future shape of the working group draft. It’s a bit messy now; what’s our vision of what the IESG -00 looks like?

Update from editors:

  • Glyn: Formed the working group a while ago, with a merged working draft: https://datatracker.ietf.org/doc/html/draft-ietf-jsonpath-base-00.
  • A few editorial changes.
  • Flurry of issues, and discussion going off on tangents. No sign of pull requests: no writing.
  • I intend to work on a reference implementation. Will leave the writing to Stefan and Marco.
  • James: Neither Stefan nor Marco is here. Some issues are more contentious than others. Are there issues worth discussing here?
  • Carsten: I'm not an author or coordinating, but I do intend to provide pull requests. This IETF in particular was challenging, but the silence will not continue. Lots to do editorially. We need to decide some issues before we can make substantial progress.
  • Issue: What will the processing model be? Haven't managed to structure those discussions to get progress. Also some little issues.
  • James: Discuss that now?
  • Carsten: Would be better if discussion were prepared, but we don't have that, so let's do some impromptu discussing.
  • Carsten: 23 open issues, which aren't labeled.

Processing model

Issues: #27, #23, #21, #14, #15

  • Carsten: Sequences, duplicates in selector results, unions, which raise questions about the processing model.
  • JSONPath learned from XPath in its youth, so let's look at the XPath processing model, which is about nodesets; "steps" go through these nodesets. DSSSL came up with a lot of this in the late 1980s: ways to step from the nodes you're sitting on, and ways to select subsets and compute values. It had an expression language that turned into a special thing for XPath and XSLT. The important part is to know where you are and how the steps in the sequence are connected together. For XPath, that's the "nodeset": a subset of the set of positions that are defined in the document.

    • It doesn't record duplicates, so you don't mark a node twice.
    • The set isn't ordered.
  • Carsten: I've called the nodeset-equivalent a "collection".

James adds a processing-model label to the named issues.

  • Carsten: strawman: Let's import the nodeset idea from XPath wholesale, and structure everything around nodesets.
  • Julian and Jeffrey in favor in chat, but admit that they're just tourists.
  • Glyn: It depends on the type of reference that the nodeset makes into the JSON document that's input. How do you see that going?
  • Carsten: It's a reference.
  • Glyn: In my input draft, the processing model was an ordered list of elements, as we go through the selector, and didn't remove duplicates. Current implementations don't remove duplicates. An ordered collection of nodes? References into the tree rather than values make sense, since they're all immutable.
  • Carsten: a "node" is a subtree of the original document, not the value.
  • Glyn: Prefer to allow duplicates.
  • Carsten: Also get order.
  • Glyn: Tradeoff: the user might prefer not to get duplicates. But it matters how quickly they want the result to be returned.

  • Carsten: Next we should go through the issues assuming we have a nodelist, to see if that exposes issues. If this were a classroom, I'd divide people into groups, assign issues, and have folks report back.

  • James: I was assuming editors would attend. But I can do that.
  • Carsten: Too bad meetecho doesn't have breakout rooms.

  • James: Don't want to do that realtime.

  • Carsten: #14

    • https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-jsonpath/issues/14#issuecomment-694687029
    • Side discussion about filtering and recursion (somewhat orthogonal)
    • Strawman (slightly fixed):

      ~~~~
      current = [input]
      for each selector in path
        results = []
        for each item in current
          results.append(selector(item))
        current = results.distinct()

      return current
      ~~~~

    • append is a concatenate operation

    • distinct runs in linear order; removes "duplicates"
      • "duplicate" is a reference to a node that already occurred in that node list
    • So selector gets a single node (item) and returns a nodelist; these are in turn appended to build the result nodelist ("flatmap")

    • open: what exactly does distinct() do here?

    • nicely shifted to selectors: what order does a selector return out of an unordered map (JSON object)

    • Discussion

      • Carsten: #14 derailed into a discussion of nested filters, and that part doesn't affect this.
      • Example above starts with a root item, and is then repeatedly updated. We linearly apply each selector. Each selector starts with an empty result collection. Appends its result to the result list. Proposal says the results are put through "distinct()" which isn't clearly defined, and that becomes the current nodelist. When the expression is completed, that becomes the starting point for the next selector.
      • Selector doesn't get the whole nodelist. It gets each element in turn. So you can't write selectors that reorder nodelists, or build an average. You only ever get selectors that work on a single node.
      • Glyn: "append" means "concatenate".
      • Glyn: Hope that distinct runs through in linear order, and removes the later copy of any duplicate.
      • Carsten: What's a duplicate?
      • Glyn: A reference to the same node. One that was already in the nodelist.
  • Jeffrey: #15

    • The "select all" operators don't have an obvious order for objects... unless JSON objects are semantically ordered?
    • Discussion isn't describing the order of results, just the set. No mention of duplication.
    • If we keep an order here, we have to specify it, which could be ambiguous.
    • It doesn't mention duplicates or de-duplication
    • Discussion:
      • Carsten: Whatever the spec says, the JSON data may not be presented in any particular order.
      • Python reorders a lot. JSON objects don't have order. Arrays, of course, do have order. If you filter an array, the result should have the same order as the original array.
      • Glyn: IIRC, the only selector that cares is '*'. And then the order is non-deterministic. Have to accept that if you use that construct, the result is non-deterministic. It could be made to work.
      • Jeffrey: Make sure to mention it in the spec.
      • Carsten: Could define an order. Do we expect JSONPath to be deterministic? If that's really important, then we'll have to define the order in which * returns its values. Volunteers to file an issue.
      • Glyn: Should not be deterministic, because that'll constrain implementations too much. Doesn't matter either: applying * to an object is rare. Recursive descent maybe can define it.
  • Glyn: #27
    • This refers mainly to #23 (Duplicates in selector output).
    • There isn't a consensus, but #23 is tending towards removing duplicate nodes, but not duplicate values.
    • There was discussion of whether this would affect ordering (i.e. would it make the spec non-deterministic?), but it seems not.
    • Duplicate nodes could be discarded in favour of earlier nodes in a node list.
    • I think this could be done during the processing of the selector stages without affecting the overall outcome
    • Discussion:
      • Might be difficult to distinguish nodes from values in some implementations, especially in languages without pointers.
      • Glyn would like to see a proof that removing duplicates during processing selector stages produces the same result as removing duplicates at the end.
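A minimal Python sketch of the strawman discussed under #14, with duplicates defined as repeated references to the same node rather than equal values. It is an illustration under stated assumptions, not the working group's agreed model: a node is represented here as its path from the root, and the `child` and `indexes` selectors and the `dedup_each_stage` flag are hypothetical names introduced only for this example.

```python
from typing import Any, Callable, List, Tuple

# Assumption: a "node" is identified by its path from the root, standing in
# for "a reference into the tree". Two equal values at different paths are
# therefore different nodes.
Node = Tuple[Any, ...]
Selector = Callable[[Any, Node], List[Node]]


def resolve(root: Any, node: Node) -> Any:
    """Follow a node's path from the root to the value stored there."""
    value = root
    for step in node:
        value = value[step]
    return value


def child(name: str) -> Selector:
    """Hypothetical selector: the object member called `name`, if present."""
    def select(root: Any, node: Node) -> List[Node]:
        value = resolve(root, node)
        return [node + (name,)] if isinstance(value, dict) and name in value else []
    return select


def indexes(*positions: int) -> Selector:
    """Hypothetical selector: array elements at the listed positions, in the
    order listed. Listing a position twice yields the same node twice."""
    def select(root: Any, node: Node) -> List[Node]:
        value = resolve(root, node)
        if not isinstance(value, list):
            return []
        return [node + (p,) for p in positions if 0 <= p < len(value)]
    return select


def distinct(nodes: List[Node]) -> List[Node]:
    """Run through in linear order and drop later copies of the same node."""
    seen = set()
    kept = []
    for n in nodes:
        if n not in seen:
            seen.add(n)
            kept.append(n)
    return kept


def evaluate(root: Any, path: List[Selector], dedup_each_stage: bool = True) -> List[Node]:
    current: List[Node] = [()]  # the nodelist starts with the root node only
    for selector in path:
        results: List[Node] = []
        for item in current:
            # "append" is a concatenate operation (flatmap)
            results.extend(selector(root, item))
        current = distinct(results) if dedup_each_stage else results
    return current


doc = {"a": [1, 1, 2]}
path = [child("a"), indexes(0, 0, 1)]

staged = evaluate(doc, path, dedup_each_stage=True)
at_end = distinct(evaluate(doc, path, dedup_each_stage=False))

print(staged)                             # [('a', 0), ('a', 1)]
print([resolve(doc, n) for n in staged])  # [1, 1]
print(staged == at_end)                   # True
```

The duplicate node `('a', 0)` is dropped, while the two equal values `1` at different positions are both kept. Deduplicating at each stage and deduplicating once at the end agree on this one input; that is a spot check, not the proof Glyn asked for.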

(mjkoster) Could we introduce the notion of an ordered nodeset, which would define an ordering and would be used to process nodesets where ordering is important? There could be an ordered nodeset that represents the source document, as well as for the nodes selected by some processing operation.
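One way to picture the ordered-nodeset suggestion is a pre-order walk that assigns every node a position, which can then be used to sort any nodelist. This is a sketch only: the path-tuple representation of a node and the helper names `walk`, `document_order` and `ordered` are assumptions made for the example. It also relies on Python's `json.loads` keeping object members in the order they appear in the text, which other parsers may not do.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

Node = Tuple[Any, ...]  # assumption: a node is its path from the root


def walk(value: Any, node: Node = ()) -> Iterator[Node]:
    """Yield every node in pre-order: a node first, then its children."""
    yield node
    if isinstance(value, dict):
        for name, member_value in value.items():
            yield from walk(member_value, node + (name,))
    elif isinstance(value, list):
        for index, element in enumerate(value):
            yield from walk(element, node + (index,))


def document_order(root: Any) -> Dict[Node, int]:
    """Map each node of the source document to its pre-order position."""
    return {node: position for position, node in enumerate(walk(root))}


def ordered(nodes: List[Node], order: Dict[Node, int]) -> List[Node]:
    """Sort a nodelist by document order; duplicates are kept, not removed."""
    return sorted(nodes, key=order.__getitem__)


doc = json.loads('{"b": [10, 20], "a": {"c": 30}}')
order = document_order(doc)

print(ordered([("a", "c"), ("b", 1), ("b",)], order))
# [('b',), ('b', 1), ('a', 'c')]
```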

  • James: Are we commenting on the issues?
  • Jeffrey: Suggest that James as chair adds a comment to each issue summarizing this discussion with a link to the minutes.
  • Carsten: Someone should produce a PR with a concrete proposal to make nodelists happen.
  • Glyn: We've been mostly on the mailing list and issues, and the editors haven't been sending PRs.
  • Carsten: I volunteer to send a PR before Easter.
  • James: Will talk to Tim, and leave comments. Will leave the issues open for further discussion. Looking forward to Carsten's PR.
  • Carsten: Go over more issues? The expression language issue?

#53

  • Glyn: Someone should make a choice on each line and send a PR. Was hoping someone else would do that.
  • Carsten: Getting terminology right is important. Some decisions have been made for the current document. That's another PR I'd have a lot of fun writing.
  • Carsten: Is there disagreement or strong opinion?
  • James: Can also reiterate this on the mailing list.
  • Carsten: Better to have a PR that makes the decisions.
  • Carsten: The question of "-" characters in names is asked in the Review section.
  • Carsten: Talk about filter and index expressions, so "script" expressions aren't a term we'd use today.
  • Carsten: Assign me #53, but comment that if someone else wants to do it, that's fine.

#54

  • Carsten: 10 issues in 1!
  • James: Shall we split this into several issues, after the meeting?
  • Carsten: Yes.
  • Carsten: Going through subissues.

Title of the specification

Rough agreement for "JSONPath: Query expressions for JSON"

Terminology

  • Carsten: We've mostly implemented the bullet list at the end of the item; should check this in the document. "Node" and "item" are synonyms, but they emphasize "position in tree" vs "what's at the position". Agreed on "member" for the "thing that's in an object, the key-value pair". Agreed to use "name/value pair" instead of "key". Thing in an array is an "element".
  • Glyn: A mathematical option for "name/value pair" might be "maplet". Has a mathematical history.
  • Carsten: Prefer "member".
  • Carsten: There's an existing terminology section, and this reaffirms that. We haven't defined "node" in the document. Do we want to always talk about "items"? I think we should have "node". Need a (small) PR. Cannot fix the term "union" at this time.

(mjkoster) Would it not make more sense to use "item" for an element of an array, since it's already called that in JSON Schema, and use "element" for the more general reference? i.e. element == node, array(items), map(members)

Differentiation from JSON Pointer

Carsten: Happy with the +1

References to XPath

  • Carsten: The table helped me.
  • James: No strong thoughts.
  • Carsten: Could move it to an appendix.
  • Glyn: Appendix sounds good. Putting the XPath comparison early risks depending on XPath for semantics, when we want to define semantics in our document.

Array Slice Operator

  • Carsten: No opinion.
  • James: AI: split this out into a new issue.

Unions

  • Carsten: Interesting question is around duplicate removal. Daniel disagrees with the processing model discussion.
  • Glyn: Union of node references? The test suite includes assertions of duplicated output.
  • Carsten: "Union" implies a lack of order, but we said we do want an order. AI: Ask for a replacement term on the mailing list?

Duplicates and Ordering

  • Carsten: Focuses on arrays, where ordering is obvious. Push this off to the processing model issue?

Filter Expressions

  • Carsten: Like Glyn's proposal to have them only work on arrays. Makes it easier to deal with ordering, but I don't know how much people are relying on filtering map members. Need a survey. How much damage do we do by adopting Glyn's recommendation?
  • James: Can we learn from the interop tests?
  • Glyn: The interop tests are really all we have.
  • Carsten: They don't tell us whether applications depend on the behavior.
  • James: But if implementations don't support it, they can't be using it.
  • Glyn: There may still be an obscure implementation that lots of apps depend on.
  • James: AI: Split this to another issue.
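A small Python sketch of what "filters only work on arrays" could look like, assuming the proposal is as summarized above. The function name `filter_array`, the path-tuple representation of nodes, and the choice to return an empty nodelist for objects are illustrative assumptions, not text from any draft.

```python
from typing import Any, Callable, List, Tuple

Node = Tuple[Any, ...]  # assumption: a node is its path from the root


def filter_array(value: Any, node: Node, predicate: Callable[[Any], bool]) -> List[Node]:
    """Select the elements of an array that satisfy `predicate`.

    Results keep the order of the original array. Anything that is not an
    array (including an object) yields an empty nodelist in this sketch.
    """
    if not isinstance(value, list):
        return []
    return [node + (i,) for i, element in enumerate(value) if predicate(element)]


books = [{"price": 8}, {"price": 23}, {"price": 9}]
cheap = filter_array(books, ("books",), lambda book: book["price"] < 10)
print(cheap)  # [('books', 0), ('books', 2)]

# Applied to an object, the filter selects nothing under this assumption.
print(filter_array({"x": {"price": 1}}, (), lambda member: True))  # []
```

Restricting filters to arrays sidesteps the ordering question, because array order is already well defined; the open question from the discussion is whether existing applications rely on filtering object members.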

Respect Implementations

  • Carsten: Should add introductory text to explain what the considerations are that went into this process. This is good material, but mostly editorial. Not much disagreement on the principle.
  • James: AI: File an issue to do the editorial work.

Error Handling

  • Carsten: Converged, but have to write it up.
  • James: AI: File another issue to track doing this.
  • Carsten: Then close issue #54, in favor of the replacing issues.
  • James: Except for unions and duplicates, each piece goes into a new issue. WG, please check my and Tim's work doing this.

  • Kristina, via Jabber: Is there any way we could quickly resolve not to change the most basic obj.first.second and other pathing syntax, so that implementers who are using those things today can feel safer that they're not going to get hit with a breaking change?

  • Carsten: Could start an examples appendix where we all agree what the result should be. E.g. $.a.b is one such example where we all agree. But there are questions about what happens when one such item is an array.
  • Kristina volunteers to help with that.
  • James: AI: Create an issue for this.
  • Glyn: Kristina, are you happy with the current draft behavior?
  • Kristina: Yes. We (Microsoft) use it here, in our DIF standard: https://identity.foundation/present…xchange/#jsonpath-syntax-definition
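An entry in such an examples appendix might look like the following Python sketch, which covers only the basic `$.a.b` form over nested objects. The function name `dotted` is hypothetical, returning the result as a one-element list and returning an empty list for a missing member are assumptions, and the array case is deliberately left unimplemented because it was flagged as an open question.

```python
from typing import Any, List


def dotted(path: str, document: Any) -> List[Any]:
    """Evaluate a path of the form $.name.name against nested objects only."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    names = [name for name in path[1:].split(".") if name]
    value = document
    for name in names:
        if isinstance(value, list):
            # Open question from the discussion: what happens when a step
            # meets an array. This sketch does not pick an answer.
            raise NotImplementedError("behavior on arrays is not settled")
        if not isinstance(value, dict) or name not in value:
            return []  # assumption: no match gives an empty result
        value = value[name]
    return [value]


print(dotted("$.a.b", {"a": {"b": 42}}))  # [42]
print(dotted("$.a.x", {"a": {"b": 42}}))  # []
```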

  • Francesca: Chairs should organize an interim meeting. Some of the people in the github issues are in a different timezone. Do a Doodle poll to make sure the right people can be present.

  • James: Would the first week of May be appropriate? Next IETF meeting is in July.
  • Glyn: Second week would be ok.
  • James: AI: Will put it to the mailing list. Will favor the times and dates that suit the editors and chairs.

#55

  • Carsten: Not something we can act on, but it's a statement of interest from another standards body about how they're going to make use of this. Need to think about how to respond to users. Do they have unfulfillable expectations? We should say that. Are there activities that need to be split off? Detect that.
  • James: Don't see this on the mailing list; should we do that?
  • Carsten: Yes.

Issue tracker vs mailing list

  • James: Tim and I made a decision, but if that's not working for the WG, please chime in.
  • Glyn: Should we distill ML discussion into the GH issues? Some people prefer each, and it's disorganized.
  • Carsten: Issues help to increase granularity. When we notice things in a meandering ML discussion that can be solved, they should be converted into issues. Then people can write PRs, which close issues. Some GH issues are less actionable, and those should be moved to the mailing list.
  • Glyn: Flag issues as needing further ML discussion.
  • Carsten: Labels are free.