IETF85 Precis Meeting notes
2012-11-06
Atlanta, GA, USA

MB: Marc Blanchet, co-chair
PR: Pete Resnick, Area director
PSA: Peter Saint-André
YY: Yoshiro Yoneya, co-chair
JH: Joe Hildebrand
JK: John Klensin

1. Administrivia

* MB: For the problem statement, most comments are editorial. There is one from secdir asking for more comprehensive security considerations, but I'm not sure we want to do something about it. Thinking of not adding any specific text.
* PR: I am willing to support you on this if you think it's not something to worry about.
* PR: So long as you give a reasonable explanation, I am willing to defend the decision with the IESG.

2. Document updates / Discussion

NOTE: All documents discussed in one presentation

2.1 WG I-Ds

(1) Framework
https://datatracker.ietf.org/doc/draft-ietf-precis-framework/

* YY: In the mapping document we described width mapping. I think width-mapping is very important for Asian scripts, so it is mandatory for PRECIS. But I'm not sure whether it should be in the framework or in a separate document.
* JH: I would just as soon have it in the framework. At least say that if you are not using NFKC, then you MUST(y) be performing width-mapping.
* PSA: I think it should be MUST.
* PR: One of the reasons that we called out width-mapping in IDNA2008 is that it was important to do, but it was also very easy to do. NFKC does lots of dicey (at best) things. We don't want to start down the path of, say, Arabic base numerals: map those to Latin Arabic numerals? I don't think we want to go down the MUST path everywhere it happens, but we can call it out as a mapping that just about everyone wants to do. This document has been clear about mapping.
* PSA: All this document is saying is that if you're doing mapping, here's what you do.
* PR: Does that include, say, case mapping?
* PSA: We have this mappings bucket, where we've got these other things we're doing, but here's what you do if you're doing mapping.
* JH: Do we have a document that describes the mappings?
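Editorial illustration (not part of the discussion): the distinction drawn above between width-mapping and full NFKC can be sketched as follows. This is a minimal sketch, assuming that "width mapping" means replacing only those characters whose Unicode decomposition is tagged <wide> or <narrow>; the function name width_map is hypothetical and is not taken from any of the drafts.

```python
import unicodedata

def width_map(text: str) -> str:
    """Map fullwidth/halfwidth characters to their decomposition mappings.

    Assumption: only decompositions tagged <wide> or <narrow> are applied;
    every other character (ligatures, superscripts, ...) is left untouched.
    """
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041", or ""
        if decomp.startswith("<wide>") or decomp.startswith("<narrow>"):
            # Everything after the tag is a list of hex code points.
            code_points = decomp.split()[1:]
            out.append("".join(chr(int(cp, 16)) for cp in code_points))
        else:
            out.append(ch)
    return "".join(out)

fullwidth_abc = "\uff21\uff22\uff23"   # FULLWIDTH LATIN CAPITAL LETTERS A, B, C
halfwidth_a = "\uff71"                 # HALFWIDTH KATAKANA LETTER A
ligature_fi = "\ufb01"                 # LATIN SMALL LIGATURE FI

print(width_map(fullwidth_abc))                                 # width-mapped
print(width_map(ligature_fi) == ligature_fi)                    # ligature kept
print(unicodedata.normalize("NFKC", ligature_fi))               # NFKC expands it
```

The point of the comparison is that NFKC also rewrites characters unrelated to width (here a ligature), which is the kind of side effect the speakers describe as dicey, while the narrower mapping leaves them alone.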
* JH: I think it was a mistake for UNICODE to label these as compatibility mappings, but now we have to work around it.
* PR: I'm expressing ambivalence, but I'm ok either way.
* PSA: Do we put the mappings here, or do we point to these other documents? Personally I think we should be more explicit about width mapping.
* JH: I think that's ok, but we should ask someone why they're not doing width-mapping.
* PSA: That will be part of our ongoing review.
* JH:
* MB: I've been trying to get some time from people that are doing discs (i.e. iSCSI) that are customers of stringprep/PRECIS. I want another customer of PRECIS that is doing something other than JABBER to verify the usability of the framework.
* JH: The message that I sent this morning brings up a separate thing that we do that is more registry-like. I don't want two things that are named confusably to be present. Do I keep a normalized mapping, but never transmit the confusable version?
* PR: (no hat) Suggestion: you have normalization, case-mapping, etc. Normalization is just that. But mappings are a little different. Maybe you should have a section on mappings for the template, and include the well-known mappings, and point them to the other mapping documents. That I think sends the right message, and doesn't push well-known mappings to an exalted level. I can see why you might not want to do width-mapping, but you'll probably want the mappings most of the time.
* JH: Like the top two, but maybe add one more for confusable collision. Maybe I don't want to register the same exception for width-mapping or case-mapping or (etc).

nickname-04: Open issue on the list, not captured in presentation

* JH: What I'm suggesting is this new concept, to ensure some kind of uniqueness (confusables, etc etc) and have that separate from other things you would want separate from comparison.
* JK: Reviewing documents brings back a thought I had. NFKC is a problem because it combines various different things.
It maps characters that people thought were the same but aren't really. But it also maps things that really ought to be in NFC, and maybe we should fix that problem here, and start thinking about an NFI. We have sort of been backing into that one character at a time. But if you start doing it, then you start going down the road of creating a normalization form. Instead of having different per-app rules, we bring them together, or it will drive app writers crazy.
* JH: I don't disagree, and we will probably have to do this eventually, but I hope this is not the day. But if we did that, we would need two (comparison vs. confusable). The two are relatively separate.
* JK: The confusable issue at the character level is a rat hole; trying to deal with it will make things worse.
* JH: Yes, and that's why I want to separate this. Say we'll only get about 10% of the way on confusability, but cover most of comparison.
* JK: The confusables problem is particularly troublesome.
* JH: I agree, and that's why I want to have a table of things that might be confusable, and how we can track it over time.
* JK: ISO-10646 -- The mandate was to create a single universal character set, where any character can be represented in one and only one way. There are two ways to get confusable: by derivation and by representation. But the derivation problem wasn't the case at the time. But over time, the UNICODE spec brought this about.
* MB: I want to get back to actual work, and come back to this later.
* PSA: That we don't do this in the document, but maybe in a separate document.

(2) Preparation and Comparison of Nicknames
https://datatracker.ietf.org/doc/draft-ietf-precis-nickname

2.2 Individual Documents

(1) Mapping characters for PRECIS classes
https://datatracker.ietf.org/doc/draft-yoneya-precis-mappings/

* PR: (ind) The list of case mappings, those are really it? Those are all the special locale-specific characters?
* (nodding of heads)
* PR: This document defines the list of ones to be done.
I think you need to be more clear that this document defines the ones where you will need to do work, and PRECIS will not do the work for you. I can maybe suggest text, but you should start with "here is where you need to do work, and we are not doing the work for you". Does that help?
* YY: For this document, we exhausted the specialcasing.txt and this is the result. We also put the list in the IANA registry.
* JH: I think what we want to do is point to specialcasing.txt, and not duplicate it.
* PR: The document might say "As of this date, this is what it means".
* JH: I would rather put in two or three examples, because implementers might just cut-n-paste.
* JK: I want everyone to be sure this is a seriously moving target. For instance, how many speak ? (no hands)
* What happened is most of these were first in Arabic, then converted to Latin in the 1920's, then converted to Cyrillic in the 1940's. Some have converted to Latin, but not using the same table. This is the 1930's table, and not the 2010's table, and many of these countries are still sorting it out. This is a rapidly moving target, that is wrong according to various governments. There are two- or three- or four-script languages. I'm not trying to derail, but I want the WG to understand the depth of this work.
* JH: I think there are a couple of actions: 1) One input is "at least" the locale you are operating in (plus maybe some other factors). 2) Some sort of implementation guidelines that point out very explicitly that these are rapidly changing, and you need to be ready for it. 3) The tools that some people have started to write, so that we as a community look for incompatibilities as UNICODE revs to call out, and maybe have a registry of "here's a change, and maybe we should check it more".
* JK: If an identifier is actually a personal name, the common Internet user will be astonished if that name does not compare equal in multiple scripts.
I don't think you should go there, but for user experience you should call out that these might not be equal and be ready for blowback.
* JH: I agree, but the only place I might want to take this on is if we take on confusability; then we might want to have more added to the UNICODE table. But that's a problem for another day.
* PSA: I think this example is wrong (slide 6).
* PR: Simple typo in the slide (0069 should be 0049). The conclusion is correct.
* PSA: To Joe's point, if we have a registry for this, then the designated expert will need to stay on top of this every time there's a UNICODE point release.
* JH: I recommend that we don't do this particular registry, but rather have one of test cases, and ones that have known outputs today. Maybe that's another document, or it might be a separate registry.
* PSA: I think a separate registry of testing, because different OSes will support different versions, et al.
* JH: Instead of a registry, we have a wiki page that documents the changes at the moment of time.
* JK: There are other cases where we want people to reference our standards, not copy them. If we want a profile of things, then this is not really valuable. But as part of the profile, it lists the characters for which you should go back to the tables to determine the mappings.
* JH: I disagree, but I think we're going to the same place. If we're going to have a profile, then we should just reference the table. In your explorations, did you find any characters that were not adequately captured in the specialcasing.txt?
* (presenter): NO
* JK: Example where we might get in trouble. Consider that suddenly there's a huge Internet population in one of these converting areas, and they start participating in the IETF. And they complain that the UNICODE people are screwy because they're out-of-date. I want to make sure we don't trap ourselves into a place where we are caught by UNICODE.

(missed discussion)

* JK: What we found is that if we're in this case, then we have an exception list, but hoped we never had anything in it.
* JH: I understand what you're saying, but as an implementer I would just not do it.
* JK: As long as the list is NULL, I'm conformant. The decision you're making is trading a few lines now against opening it later to put in a lot of lines.
* JH: I'm fine with things either way.
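Editorial illustration (not part of the discussion): the locale-specific case mappings debated above can be illustrated with the Turkish and Azeri rules in Unicode's SpecialCasing.txt, where U+0049 (I) lowercases to U+0131 (dotless i) and U+0130 (I with dot above) lowercases to U+0069 (i). This is a minimal sketch, not the content of the slide: the function name lower_for_locale and its lang parameter are hypothetical, and the conditional rules involving U+0307 COMBINING DOT ABOVE are deliberately ignored.

```python
# Locale-tailored lowercasing for the dotted/dotless I pair.
# Assumption: only the two unconditional tr/az lowercase rules are handled;
# the context-dependent rules for U+0307 are out of scope for this sketch.
TURKIC_LOWER = {
    0x0049: 0x0131,  # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
    0x0130: 0x0069,  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN SMALL LETTER I
}

def lower_for_locale(text: str, lang: str) -> str:
    """Lowercase text, applying the tr/az tailoring when lang asks for it."""
    if lang in ("tr", "az"):
        # Apply the tailored mappings first, then the default lowercasing.
        text = text.translate(TURKIC_LOWER)
    return text.lower()

word = "DIYARBAKIR"
print(lower_for_locale(word, "en"))
print(lower_for_locale(word, "tr"))
print(lower_for_locale(word, "en") == lower_for_locale(word, "tr"))
```

The same input string yields two different lowercase forms depending on the locale, which is why the speakers treat locale as a required input and why two implementations that disagree about locale may fail to compare the same identifier as equal.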
* MB: I have a hard time thinking that we'd do that kind of duplication. We should do it by reference.
* JH: To be clear - do we want to preserve a place for where we willfully violate the UNICODE spec? I hope we never have that table.
* PSA: Because there's so much potential for change, and it's a complicated point of discussion, point out that these are a set of special mappings, and make this optional.
* JK: I don't care about the outcome, as long as they're in the minutes. Of the longest set of issues we've got, the most serious problem is messing around with case mapping with the assumption that everything is the same. We shouldn't try to do this in an algorithmic manner, and should just get people used to things being all lower case (for our interpretation of lower case), and users will adapt. This is a serious design tradeoff between a very stable framework that might not track UNICODE, or something that tracks closely but will astonish users over time. I strongly recommend you not do any sort of case mapping/folding/etc.
* Andrew Sullivan: I might have been the person responsible for the claim that they'll just learn these things (laughter). I think we need to make a decision about what length of time we are considering for the users of this protocol. The short term requires case-folding; there's a whole world that expects it to work. However, we've got a potentially much larger group who aren't using this. Would it be better to not do this and break it for those today so that the future won't deal with the problem (18-36 months vs 25 years)?
* JH: I think this ship has sailed, and it's too late to stop case folding/mapping. If we were to try to say that all these are lower case, then we'll have more difficulty with those wanting to use this for user interfaces. I understand that's not the problem we're solving, but we've got building blocks that might help solve it.
* JK: The population that needs to use these specialcasing.txt mappings is not yet on the Internet.
When they come in and we need to create an identifier that doesn't conform, and we say "NO", this is the beginning of an education problem. For existing versus future databases, I don't know what the balance is, but I think the working group needs to think about that balance.
* PSA: I think this topic is connected to the line between interface and protocol. For instance, in XMPP we do not preserve case; you might type in upper-case, but we throw it out. When someone contacts support, they might spell it out case-preserved, but we don't store it that way. For technologies that don't preserve case this doesn't seem to be an issue.
* JK: XMPP is actually in a more complicated world because you assume you know the conversions from upper case to lower case. If you really rejected everything that was upper case, then you'd be out of this mess. If you're in a situation where you're mapping, then you're in this mess. If you're rejecting instead, then you're not in the mess. In the rejection model, we say some things just don't work, while the case folding model says they might work based on your interpretation of the model.
* PSA: When you registered your XMPP address 5 years ago, we lost your original value, and we're already into this mess. If you decide to go down the branch of preserving case and doing case compare, you'll have one class of problems. But if you went down the XMPP route, then you've got a different class of problems.
* JK: The worst case will be with scripts you haven't seen yet, and how those identifiers map, versus something that shows up in undecorated Latin. This WG should be very clear what path it takes, but I don't know what path is right.
* MB: My summary is that we don't want to do this registry/table, and that we might want to add some words about mapping, and we are looking to the speakers to provide text.
* PSA: That is the correct summary, and we are surfacing some issues that we've not encountered before, and we need to have some text go out on the list to discuss and provide realistic guidance for technologies that exist and those that will come.
* MB: The other point is where this text goes. We have the framework (WG item) and the mapping document (ID). One way is to have everything in mapping, and have the framework document clearly point to the mapping document.
* PSA: The framework document talks about case mapping, but not special mapping. I think we need to explore the topic on the list.
* PSA: I think, given the complexity, that this should be a WG item, so we can try and come to consensus. Even though we're pointing to it in optional ways, we should have WG review.
* PR: We've had chairs editing documents before, and it's fine. If you want to find another editor, then you might want to consider that, to allow more time to chair and not edit.
* MB: My interpretation is that this is appropriate for the WG. Query for those who have read it (several hands). Does anyone want this to be a WG item (weak hum); anyone NOT want this to be a WG item (no hums).
* PSA: I think this is a good start, but it needs a lot of work.

(2) Username and Password Preparation Algorithms
https://datatracker.ietf.org/doc/draft-melnikov-precis-saslprepbis/

* PR: Does it need working group adoption?
* MB: It was within the charter to consider profiles of PRECIS.
* PR: Is this going to fill a slot in our charter, or is there something else that will fill the charter? But that does mean we need to bring in reviewers.
* JK: We're better off reviewing the content of the other documents.
* MB: Is anybody opposed to adopting saslprepbis as a WG document? No objections.

3. Next steps

OPEN MIC:

* JH: I think there's a set of confusable problems that we can deal with. It is a small set of the problem that we will know about 10 years from now.
If we know less and less, the problem gets easier (-:
* PR: I'm confused about what confusable problem we have the ability to solve.
* JH: We now have the confusables table that we didn't know about, but it's not very complete. But the stuff that *is* there, about the languages I know anything about, has interesting things (e.g. lower-case L vs. numeral one). For, say, chatroom nicknames, I'm just trying to make sure there are not two that are close in the room's "registry". The case folding, space mapping, and confusable mapping might be useful for comparison.
* AS: In what context do you want to do this? All of these things are kind of grasping at the feature you're talking about. What I'm concerned about is that we're offering a facility that isn't very reliable, and that seems worse than not doing anything.
* JH: That is a compelling argument, and I think the question is exactly that: Is this better or worse than doing nothing? I don't know what the right path is. There are some things that implementations are doing and can do that might reduce user surprise.
* JK: Now that I understand it better, I know why I'm unhappy about it. You have some cases that won't work for a user, where the first thing that is tried doesn't work, because it changes. Say you now have something in the database that is "abcl", and another user comes along to register "aye bee see one" and the system says "NO". It's worse than if you say "NO" to all upper-case, to "AYE BEE SEE EL".
* PR: It is exactly the human factor that caught my attention. This is not something that is ideally a protocol thing to do. It is at some level a user-interface problem. As soon as you make it a protocol thing, you hide it from the user, while if you make it a user-interface thing it is visible. If you make it a registration check, it becomes a policy issue. I don't want this WG to do much policy stuff, documented like it's protocol.
* JK: If Joe wants to reject an entry in their implementation, it's fine.
But if you wrap the protocol around this, we'll be in trouble. It's a bigger problem if one server does one thing and another does something else.
* JH: 1) I'm fine being wrong here. 3) I'm wary of declaring it a purely user-interface problem, because it punts the problem to the implementers and could cause problems down the line with the protocol.
* PR: As for #3, informational documents get published all the time, and that's fine. If you are saying we should have an informational document on this, that's fine. But I don't want this WG to take this on until the other items are done. I also worry about WGs taking this one on as a task. I think it's something we can think about and talk about, but there's a difference between an informational document and combining it with a protocol document.
* TM: For a specific example, in Chinese there are three or four versions of the characters in the confusable table that look the same but mean completely different things.
* AS: It occurred to me that someone in the room is involved in a project dealing with user expectation. I wonder if that project has any evidence or insights?
* MB: Off-topic.
* AS: It's not really. If we're making claims about the (dis)utility of confusables, then one thing that would be useful would be to look at other attempts to solve this problem. I don't know if we've got more than anecdotal evidence, and I would like more rigorous evidence.
* MB: This is out-of-scope for PRECIS. A study done by ICANN is working on the experience with locale-variant TLDs. We have not touched that specific topic.
* JK: That report has contributed to my education over the last several months.
* PSA: IFF we decide to do something, it would need to be based on experimentation and deployment. I'm operating under the operation of chat rooms in XMPP. At an implementation/policy/deployment level, we need to crack down on the masqueraders. We might try some stuff with these confusables things.
* PR: Is this an issue that only a single server needs to deal with, or does it impact multiple servers?
* PSA: It's a problem for single servers. For instance, we don't want someone to impersonate John Klensin. Others (e.g. military) might have different policies. I think it's too particular to the various deployments to make decisions now.
* JK: One observation from the ICANN study. One thing to consider doing is to come to your users and ask if there are strings that might be confusing, and have them register "blocked synonyms", for some definition of that term. That kind of facility puts the onus onto the users to protect themselves.
* PSA: For instance, on XMPP, I'm "stpeter". I've had people masquerade as me, and I'd like to close down people that are doing that. I would like some sort of database of spoofers, so that I as a user don't have to know what's going on. If I click on "Joe Hildebrand" as one of those, then I would want to know what's going on.
* JH: XMPP has a particular problem because of its real-time nature, which allows us to gather information more rapidly, and we don't necessarily have users involved.
* JK: And you have some idea of credentials.
* Alan DeKok: As for pushing off to the users: for instance, my name is listed under the "D"s in the US, but under the "K"s in Holland.
* JK: Qualifying prefixes that can be dropped without changing the name are particularly messy.
* PSA: My takeaway is "this is hard, and there are some domains that might be able to experiment, and we don't know enough to do something right now".
* MB: And now everyone is using FB IDs for their identifiers.
* PR: It's OK if the ones doing the experimenting tell the IETF people to come over here. We weren't that successful the last time, but maybe we can try and look at the problem over there.
* AD: I've been following this for a while, and it's starting to make some sense to me.
And this all needs to be documented in a very simple way, so that we don't all go through the same pain again and again.
* JK: One other observation -- PRECIS should not become the hammer that makes everything look like a nail. One thing we've known about in the IETF is things precisely like: something registered needs credentials or you can't believe me. That might be a potential solution here, and we should not assume every WG has the hammer.