Network Working Group                                         P. Hoffman
Request for Comments: 3454                                    IMC & VPNC
Category: Standards Track                                    M. Blanchet
                                                                Viagenie
                                                           December 2002
Preparation of Internationalized Strings ("stringprep")
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
This document describes a framework for preparing Unicode text
strings in order to increase the likelihood that string input and
string comparison work in ways that make sense for typical users
throughout the world. The stringprep protocol is useful for protocol
identifier values, company and personal names, internationalized
domain names, and other text strings.
This document does not specify how protocols should prepare text
strings. Protocols must create profiles of stringprep in order to
fully specify the processing options.
Table of Contents
1. Introduction....................................................3
1.1 Terminology..................................................4
1.2 Using stringprep in protocols................................4
2. Preparation Overview............................................6
3. Mapping.........................................................7
3.1 Commonly mapped to nothing...................................7
3.2 Case folding.................................................8
4. Normalization...................................................9
5. Prohibited Output..............................................10
5.1 Space characters............................................11
5.2 Control characters..........................................11
5.3 Private use.................................................12
5.4 Non-character code points...................................12
5.5 Surrogate codes.............................................13
5.6 Inappropriate for plain text................................13
5.7 Inappropriate for canonical representation..................13
5.8 Change display properties or deprecated.....................13
5.9 Tagging characters..........................................14
6. Bidirectional Characters.......................................14
7. Unassigned Code Points in Stringprep Profiles..................15
7.1 Categories of code points...................................16
7.2 Reasons for difference between stored strings and queries...17
7.3 Versions of applications and stored strings.................18
8. References.....................................................19
8.1 Normative references........................................19
8.2 Informative references......................................19
9. Security Considerations........................................19
9.1 Stringprep-specific security considerations.................19
9.2 Generic Unicode security considerations.....................20
10. IANA Considerations...........................................21
11. Acknowledgements..............................................22
A. Unicode repertoires............................................23
A.1 Unassigned code points in Unicode 3.2.......................23
B. Mapping Tables.................................................31
B.1 Commonly mapped to nothing..................................31
B.2 Mapping for case-folding used with NFKC.....................32
B.3 Mapping for case-folding used with no normalization.........61
C. Prohibition tables.............................................78
C.1 Space characters............................................78
C.1.1 ASCII space characters..................................78
C.1.2 Non-ASCII space characters..............................79
C.2 Control characters..........................................79
C.2.1 ASCII control characters................................79
C.2.2 Non-ASCII control characters............................79
C.3 Private use.................................................80