Internet Engineering Task Force (IETF)                        J. Klensin
Request for Comments: 5891                                   August 2010
Obsoletes: 3490, 3491
Updates: 3492
Category: Standards Track
ISSN: 2070-1721


    Internationalized Domain Names in Applications (IDNA): Protocol

Abstract

   This document is the revised protocol definition for
   Internationalized Domain Names (IDNs).  The rationale for changes,
   the relationship to the older specification, and important
   terminology are provided in other documents.  This document specifies
   the protocol mechanism, called Internationalized Domain Names in
   Applications (IDNA), for registering and looking up IDNs in a way
   that does not require changes to the DNS itself.  IDNA is only meant
   for processing domain names, not free text.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc5891.

Klensin                      Standards Track                    [Page 1]

RFC 5891                    IDNA2008 Protocol                August 2010

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Requirements and Applicability
     3.1.  Requirements
     3.2.  Applicability
       3.2.1.  DNS Resource Records
       3.2.2.  Non-Domain-Name Data Types Stored in the DNS
   4.  Registration Protocol
     4.1.  Input to IDNA Registration
     4.2.  Permitted Character and Label Validation
       4.2.1.  Input Format
       4.2.2.  Rejection of Characters That Are Not Permitted
       4.2.3.  Label Validation
       4.2.4.  Registration Validation Requirements
     4.3.  Registry Restrictions
     4.4.  Punycode Conversion
     4.5.  Insertion in the Zone
   5.  Domain Name Lookup Protocol
     5.1.  Label String Input
     5.2.  Conversion to Unicode
     5.3.  A-label Input
     5.4.  Validation and Character List Testing
     5.5.  Punycode Conversion
     5.6.  DNS Name Resolution
   6.  Security Considerations
   7.  IANA Considerations
   8.  Contributors
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Summary of Major Changes from IDNA2003

1.  Introduction

   This document supplies the protocol definition for Internationalized
   Domain Names in Applications (IDNA), with the version specified here
   known as IDNA2008.  Essential definitions and terminology for
   understanding this document and a road map of the collection of
   documents that make up IDNA2008 appear in a separate Definitions
   document [RFC5890].  Appendix A discusses the relationship between
   this specification and the earlier version of IDNA (referred to here
   as "IDNA2003").  The rationale for these changes, along with
   considerable explanatory material and advice to zone administrators
   who support IDNs, is provided in another document, known informally
   in this series as the "Rationale document" [RFC5894].

   IDNA works by allowing applications to use certain ASCII [ASCII]
   string labels (beginning with a special prefix) to represent
   non-ASCII name labels.  Lower-layer protocols need not be aware of
   this; therefore, IDNA does not change any infrastructure.  In
   particular, IDNA does not depend on any changes to DNS servers,
   resolvers, or DNS protocol elements, because the ASCII name service
   provided by the existing DNS can be used for IDNA.

   IDNA applies only to a specific subset of DNS labels.  The base DNS
   standards [RFC1034] [RFC1035] and their various updates specify how
   to combine labels into fully-qualified domain names and parse labels
   out of those names.

   This document describes two separate protocols, one for IDN
   registration (Section 4) and one for IDN lookup (Section 5).  These
   two protocols share some terminology, reference data, and operations.

2.  Terminology

   As mentioned above, terminology used as part of the definition of
   IDNA appears in the Definitions document [RFC5890].  It is worth
   noting that some of this terminology overlaps with, and is consistent
   with, that used in Unicode or other character set standards and the
   DNS.  Readers of this document are assumed to be familiar with the
   associated Definitions document and with the DNS-specific terminology
   in RFC 1034 [RFC1034].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

3.  Requirements and Applicability

3.1.  Requirements

   IDNA makes the following requirements:

   1.  Whenever a domain name is put into a domain name slot that is not
       IDNA-aware (see Section 2.3.2.6 of the Definitions document
       [RFC5890]), it MUST contain only ASCII characters (i.e., its
       labels must be either A-labels or NR-LDH labels), unless the DNS
       application is not subject to historical recommendations for
       "hostname"-style names (see RFC 1034 [RFC1034] and
       Section 3.2.1).

   2.  Labels MUST be compared using equivalent forms: either both
       A-label forms or both U-label forms.  Because A-labels and
       U-labels can be transformed into each other without loss of
       information, these comparisons are equivalent (however, in
       practice, comparison of U-labels requires first verifying that
       they actually are U-labels and not just Unicode strings).  A pair
       of A-labels MUST be compared as case-insensitive ASCII (as with
       all comparisons of ASCII DNS labels).  U-labels MUST be compared
       as-is, without case folding or other intermediate steps.  While
       it is not necessary to validate labels in order to compare them,
       successful comparison does not imply validity.  In many cases,
       not limited to comparison, validation may be important for other
       reasons and SHOULD be performed.

   3.  Labels being registered MUST conform to the requirements of
       Section 4.  Labels being looked up and the lookup process MUST
       conform to the requirements of Section 5.
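
   The comparison rule in requirement 2 can be illustrated with a
   minimal Python sketch.  The function name labels_equal is
   hypothetical, and the sketch assumes that an all-ASCII string is an
   A-label (or other ASCII label) and that any other string is a
   putative U-label; it does not validate either form.

```python
def labels_equal(a: str, b: str) -> bool:
    """Compare two labels that are in the same form (illustrative only)."""
    a_is_ascii, b_is_ascii = a.isascii(), b.isascii()
    if a_is_ascii != b_is_ascii:
        # One A-label and one U-label: the caller must convert first.
        raise ValueError("convert both labels to the same form before comparing")
    if a_is_ascii:
        # ASCII DNS labels, including A-labels, compare case-insensitively.
        return a.lower() == b.lower()
    # U-labels compare as-is, with no case folding or other mapping.
    return a == b
```

   Raising an error for mixed forms, rather than converting silently,
   is a design choice of this sketch; it keeps the conversion step
   explicit in the caller.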

3.2.  Applicability

   IDNA applies to all domain names in all domain name slots in
   protocols except where it is explicitly excluded.  It does not apply
   to domain name slots that do not use the LDH syntax rules as
   described in the Definitions document [RFC5890].

   Because it uses the DNS, IDNA applies to many protocols that were
   specified before it was designed.  IDNs occupying domain name slots
   in those older protocols MUST be in A-label form until and unless
   those protocols and their implementations are explicitly upgraded to
   be aware of IDNs and to accept the U-label form.  IDNs actually
   appearing in DNS queries or responses MUST be A-labels.

   IDNA-aware protocols and implementations MAY accept U-labels,
   A-labels, or both as those particular protocols specify.  IDNA is not
   defined for extended label types (see RFC 2671 [RFC2671], Section 3).

3.2.1.  DNS Resource Records

   IDNA applies only to domain names in the NAME and RDATA fields of DNS
   resource records whose CLASS is IN.  See the DNS specification
   [RFC1035] for precise definitions of these terms.

   The application of IDNA to DNS resource records depends entirely on
   the CLASS of the record, and not on the TYPE except as noted below.
   This will remain true, even as new TYPEs are defined, unless a new
   TYPE defines TYPE-specific rules.  Special naming conventions for SRV
   records (and "underscore labels" more generally) are incompatible
   with IDNA coding as discussed in the Definitions document [RFC5890],
   especially Section 2.3.2.3.  Of course, underscore labels may be part
   of a domain that uses IDN labels at higher levels in the tree.

3.2.2.  Non-Domain-Name Data Types Stored in the DNS

   Although IDNA enables the representation of non-ASCII characters in
   domain names, that does not imply that IDNA enables the
   representation of non-ASCII characters in other data types that are
   stored in domain names, specifically in the RDATA field for types
   that have structured RDATA format.  For example, an email address
   local part is stored in a domain name in the RNAME field as part of
   the RDATA of an SOA record (e.g., hostmaster@example.com would be
   represented as hostmaster.example.com).  IDNA does not update the
   existing email standards, which allow only ASCII characters in local
   parts.  Even though work is in progress to define
   internationalization for email addresses [RFC4952], changes to the
   email address part of the SOA RDATA would require action in, or
   updates to, other standards, specifically those that specify the
   format of the SOA RR.
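
   The RNAME representation mentioned above can be sketched as follows.
   The function name mailbox_to_rname is hypothetical.  The sketch
   assumes the master-file convention of RFC 1035, in which a dot
   inside the local part is escaped with a backslash, and it rejects
   non-ASCII input because IDNA does not extend to the local part.

```python
def mailbox_to_rname(address: str) -> str:
    """Render an ASCII mailbox in SOA RNAME presentation form (sketch)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@domain")
    if not address.isascii():
        # IDNA does not make non-ASCII local parts representable here.
        raise ValueError("RNAME sketch accepts ASCII only")
    # Dots in the local part are escaped so they are not read as label
    # separators; the first unescaped dot then replaces the "@".
    return local.replace(".", "\\.") + "." + domain
```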

4.  Registration Protocol

   This section defines the model for registering an IDN.  The model is
   implementation independent; any sequence of steps that produces
   exactly the same result for all labels is considered a valid
   implementation.

   Note that, while the registration (this section) and lookup protocols
   (Section 5) are very similar in most respects, they are not
   identical, and implementers should carefully follow the steps
   described in this specification.

4.1.  Input to IDNA Registration

   Registration processes, especially processing by entities (often
   called "registrars") who deal with registrants before the request
   actually reaches the zone manager ("registry") are outside the scope
   of this definition and may differ significantly depending on local
   needs.  By the time a string enters the IDNA registration process as
   described in this specification, it MUST be in Unicode and in
   Normalization Form C (NFC [Unicode-UAX15]).  Entities responsible for
   zone files ("registries") MUST accept only the exact string for which
   registration is requested, free of any mappings or local adjustments.
   They MAY accept that input in any of three forms:

   1.  As a pair of A-label and U-label.

   2.  As an A-label only.

   3.  As a U-label only.

   The first two of these forms are RECOMMENDED because the use of
   A-labels avoids any possibility of ambiguity.  The first is normally
   preferred over the second because it permits further verification of
   user intent (see Section 4.2.1).
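
   As an illustration of the NFC precondition, a registry front end
   might check the candidate string rather than normalize it, since the
   registry is to accept only the exact string requested.  This is a
   minimal sketch; the function name require_nfc is hypothetical, and
   the result depends on the Unicode version built into the Python
   runtime.

```python
import unicodedata


def require_nfc(candidate: str) -> str:
    """Return the candidate unchanged, or raise if it is not in NFC."""
    if unicodedata.normalize("NFC", candidate) != candidate:
        # Reject instead of silently normalizing: no mappings or local
        # adjustments are applied to registration input.
        raise ValueError("registration input must already be in NFC")
    return candidate
```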

4.2.  Permitted Character and Label Validation

4.2.1.  Input Format

   If both the U-label and A-label forms are available, the registry
   MUST ensure that the A-label form is in lowercase, perform a
   conversion to a U-label, perform the steps and tests described below
   on that U-label, and then verify that the A-label produced by the
   step in Section 4.4 matches the one provided as input.  In addition,
   the U-label that was provided as input and the one obtained by
   conversion of the A-label MUST match exactly.  If, for some reason,
   these tests fail, the registration MUST be rejected.
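
   The pair check described above can be sketched with the "punycode"
   codec from the Python standard library.  The function name
   verify_pair is hypothetical.  The sketch lowercases the supplied
   A-label (a stricter registry could reject mixed case instead) and
   omits the character and label tests of Sections 4.2.2 and 4.2.3,
   which would run on the decoded string before re-encoding.

```python
ACE_PREFIX = "xn--"


def verify_pair(a_label: str, u_label: str) -> bool:
    """Check that a supplied A-label and U-label describe the same label."""
    a_lower = a_label.lower()
    if not a_lower.startswith(ACE_PREFIX):
        return False
    try:
        # A-label -> Unicode string via Punycode decoding.
        decoded = a_lower[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Unicode string -> A-label again, as in Section 4.4.
        reencoded = ACE_PREFIX + decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        return False
    # Both directions must agree exactly with what was submitted.
    return decoded == u_label and reencoded == a_lower
```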

   If only an A-label was provided and the conversion to a U-label is
   not performed, the registry MUST still verify that the A-label is
   superficially valid, i.e., that it does not violate any of the rules
   of Punycode encoding [RFC3492] such as the prohibition on trailing
   hyphen-minus, the requirement that all characters be ASCII, and so
   on.  Strings that appear to be A-labels (e.g., they start with
   "xn--") and strings that are supplied to the registry in a context
   reserved for A-labels (such as a field in a form to be filled out),
   but that are not valid A-labels as described in this paragraph, MUST
   NOT be placed in DNS zones that support IDNA.

   If only an A-label is provided, the conversion to a U-label is not
   performed, but the superficial tests described in the previous
   paragraph are performed, registration procedures MAY, and usually
   will, bypass the tests and actions in the balance of Section 4.2 and
   in Sections 4.3 and 4.4.

4.2.2.  Rejection of Characters That Are Not Permitted

   The candidate Unicode string MUST NOT contain characters that appear
   in the "DISALLOWED" and "UNASSIGNED" lists specified in the Tables
   document [RFC5892].

4.2.3.  Label Validation

   The proposed label (in the form of a Unicode string, i.e., a string
   that at least superficially appears to be a U-label) is then examined
   using tests that require examination of more than one character.
   Character order is considered to be the on-the-wire order.  That
   order may not be the same as the display order.

4.2.3.1.  Hyphen Restrictions

   The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
   the third and fourth character positions and MUST NOT start or end
   with a "-" (hyphen).

4.2.3.2.  Leading Combining Marks

   The Unicode string MUST NOT begin with a combining mark or combining
   character (see The Unicode Standard, Section 2.11 [Unicode] for an
   exact definition).
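
   The two tests in Sections 4.2.3.1 and 4.2.3.2 can be sketched as
   follows.  The function names are hypothetical.  The sketch assumes
   that "combining mark" corresponds to Unicode General_Category M (Mn,
   Mc, or Me), which is a common reading rather than something this
   document states, and the result depends on the Unicode version of
   the Python runtime.

```python
import unicodedata


def hyphens_ok(label: str) -> bool:
    """Registration-side hyphen test (Section 4.2.3.1)."""
    if label[2:4] == "--":
        # Hyphens in the third and fourth positions are reserved.
        return False
    return not (label.startswith("-") or label.endswith("-"))


def starts_with_combining_mark(label: str) -> bool:
    """Leading combining mark test (Section 4.2.3.2), assuming category M."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```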

4.2.3.3.  Contextual Rules

   The Unicode string MUST NOT contain any characters whose validity is
   context-dependent, unless the validity is positively confirmed by a
   contextual rule.  To check this, each code point identified as
   CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a
   non-null rule.  If such a code point is missing a rule, the label is
   invalid.  If the rule exists but the result of applying the rule is
   negative or inconclusive, the proposed label is invalid.

4.2.3.4.  Labels Containing Characters Written Right to Left

   If the proposed label contains any characters from scripts that are
   written from right to left, it MUST meet the Bidi criteria [RFC5893].

4.2.4.  Registration Validation Requirements

   Strings that contain at least one non-ASCII character, have been
   produced by the steps above, whose contents pass all of the tests in
   Section 4.2.3, and are 63 or fewer characters long in
   ASCII-compatible encoding (ACE) form (see Section 4.4), are U-labels.

   To summarize, tests are made in Section 4.2 for invalid characters,
   invalid combinations of characters, for labels that are invalid even
   if the characters they contain are valid individually, and for labels
   that do not conform to the restrictions for strings containing
   right-to-left characters.

4.3.  Registry Restrictions

   In addition to the rules and tests above, there are many reasons why
   a registry could reject a label.  Registries at all levels of the
   DNS, not just the top level, are expected to establish policies about
   label registrations.  Policies are likely to be informed by the local
   languages and the scripts that are used to write them and may depend
   on many factors including what characters are in the label (for
   example, a label may be rejected based on other labels already
   registered).  See the Rationale document [RFC5894], Section 3.2, for
   further discussion and recommendations about registry policies.

   The string produced by the steps in Section 4.2 is checked and
   processed as appropriate to local registry restrictions.  Application
   of those registry restrictions may result in the rejection of some
   labels or the application of special restrictions to others.

4.4.  Punycode Conversion

   The resulting U-label is converted to an A-label (defined in Section
   2.3.2.1 of the Definitions document [RFC5890]).  The A-label is the
   encoding of the U-label according to the Punycode algorithm [RFC3492]
   with the ACE prefix "xn--" added at the beginning of the string.  The
   resulting string must, of course, conform to the length limits
   imposed by the DNS.  This document does not update or alter the
   Punycode algorithm specified in RFC 3492 in any way.  RFC 3492 does
   make a non-normative reference to the information about the value and
   construction of the ACE prefix that appears in RFC 3490 or Nameprep
   [RFC3491].  For consistency and reader convenience, IDNA2008
   effectively updates that reference to point to this document.  That
   change does not alter the prefix itself.  The prefix, "xn--", is the
   same in both sets of documents.

   With the exception of the maximum string length test on Punycode
   output, the failure conditions identified in the Punycode encoding
   procedure cannot occur if the input is a U-label as determined by the
   steps in Sections 4.1 through 4.3 above.
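
   The conversion in this section, including the length test, can be
   sketched with the "punycode" codec from the Python standard library.
   The function name to_a_label and the constant names are
   hypothetical, and the sketch assumes its input has already passed
   the steps in Sections 4.1 through 4.3.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit on the length of a single label


def to_a_label(u_label: str) -> str:
    """Encode an already-validated U-label as an A-label (sketch)."""
    a_label = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    if len(a_label) > MAX_LABEL_LENGTH:
        # The one failure that can still occur for a valid U-label.
        raise ValueError("A-label exceeds 63 characters")
    return a_label
```

   For example, to_a_label applied to "bücher" (with a precomposed
   U+00FC) yields "xn--bcher-kva".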

4.5.  Insertion in the Zone

   The label is registered in the DNS by inserting the A-label into a
   zone.

5.  Domain Name Lookup Protocol

   Lookup is different from registration and different tests are applied
   on the client.  Although some validity checks are necessary to avoid
   serious problems with the protocol, the lookup-side tests are more
   permissive and rely on the assumption that names that are present in
   the DNS are valid.  That assumption is, however, a weak one because
   the presence of wildcards in the DNS might cause a string that is not
   actually registered in the DNS to be successfully looked up.

5.1.  Label String Input

   The user supplies a string in the local character set, for example,
   by typing it, clicking on it, or copying and pasting it from a
   resource identifier, e.g., a Uniform Resource Identifier (URI)
   [RFC3986] or an Internationalized Resource Identifier (IRI)
   [RFC3987], from which the domain name is extracted.  Alternately,
   some process not directly involving the user may read the string from
   a file or obtain it in some other way.  Processing in this step and
   the one specified in Section 5.2 are local matters, to be
   accomplished prior to actual invocation of IDNA.

5.2.  Conversion to Unicode

   The string is converted from the local character set into Unicode, if
   it is not already in Unicode.  Depending on local needs, this
   conversion may involve mapping some characters into other characters
   as well as coding conversions.  Those issues are discussed in the
   mapping-related sections (Sections 4.2, 4.4, 6, and 7.3) of the
   Rationale document [RFC5894] and in the separate Mapping document
   [IDNA2008-Mapping].  The result MUST be a Unicode string in NFC form.

5.3.  A-label Input

   If the input to this procedure appears to be an A-label (i.e., it
   starts in "xn--", interpreted case-insensitively), the lookup
   application MAY attempt to convert it to a U-label, first ensuring
   that the A-label is entirely in lowercase (converting it to lowercase
   if necessary), and apply the tests of Section 5.4 and the conversion
   of Section 5.5 to that form.  If the label is converted to Unicode
   (i.e., to U-label form) using the Punycode decoding algorithm, then
   the processing specified in those two sections MUST be performed, and
   the label MUST be rejected if the resulting label is not identical to
   the original.  See Section 8.1 of the Rationale document [RFC5894]
   for additional discussion on this topic.

   Conversion from the A-label and testing that the result is a U-label
   SHOULD be performed if the domain name will later be presented to the
   user in native character form (this requires that the lookup
   application be IDNA-aware).  If those steps are not performed, the
   lookup process SHOULD at least test to determine that the string is
   actually an A-label, examining it for the invalid formats specified
   in the Punycode decoding specification.  Applications that are not
   IDNA-aware will obviously omit that testing; others MAY treat the
   string as opaque to avoid the additional processing at the expense of
   providing less protection and information to users.
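
   The lookup-side handling of an apparent A-label can be sketched as
   a decode followed by a round-trip comparison.  The function name
   a_label_to_u_label is hypothetical.  The sketch uses the "punycode"
   codec from the Python standard library and leaves out the tests of
   Section 5.4, which a complete implementation would apply to the
   decoded string before re-encoding it.

```python
ACE_PREFIX = "xn--"


def a_label_to_u_label(label: str) -> str:
    """Decode an apparent A-label for display, or raise ValueError."""
    lowered = label.lower()  # ensure lowercase before decoding
    if not lowered.startswith(ACE_PREFIX):
        raise ValueError("label does not start with the ACE prefix")
    try:
        decoded = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        round_trip = ACE_PREFIX + decoded.encode("punycode").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"invalid Punycode: {exc}") from exc
    # A U-label has at least one non-ASCII character, and re-encoding it
    # must reproduce the (lowercased) input exactly.
    if decoded.isascii() or round_trip != lowered:
        raise ValueError("label does not round-trip to a U-label")
    return decoded
```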

5.4.  Validation and Character List Testing

   As with the registration procedure described in Section 4, the
   Unicode string is checked to verify that all characters that appear
   in it are valid as input to IDNA lookup processing.  As discussed
   above and in the Rationale document [RFC5894], the lookup check is
   more liberal than the registration one.  Labels that have not been
   fully evaluated for conformance to the applicable rules are referred
   to as "putative" labels as discussed in Section 2.3.2.1 of the
   Definitions document [RFC5890].  Putative U-labels with any of the
   following characteristics MUST be rejected prior to DNS lookup:

   o  Labels that are not in NFC [Unicode-UAX15].

   o  Labels containing "--" (two consecutive hyphens) in the third and
      fourth character positions.

   o  Labels whose first character is a combining mark (see The Unicode
      Standard, Section 2.11 [Unicode]).

   o  Labels containing prohibited code points, i.e., those that are
      assigned to the "DISALLOWED" category of the Tables document
      [RFC5892].

   o  Labels containing code points that are identified in the Tables
      document as "CONTEXTJ", i.e., requiring exceptional contextual
      rule processing on lookup, but that do not conform to those rules.
      Note that this implies that a rule must be defined, not null: a
      character that requires a contextual rule but for which the rule
      is null is treated in this step as having failed to conform to the
      rule.

   o  Labels containing code points that are identified in the Tables
      document as "CONTEXTO", but for which no such rule appears in the
      table of rules.  Applications resolving DNS names or carrying out
      equivalent operations are not required to test contextual rules
      for "CONTEXTO" characters, only to verify that a rule is defined
      (although they MAY make such tests to provide better protection or
      give better information to the user).

   o  Labels containing code points that are unassigned in the version
      of Unicode being used by the application, i.e., in the UNASSIGNED
      category of the Tables document.

      This requirement means that the application must use a list of
      unassigned characters that is matched to the version of Unicode
      that is being used for the other requirements in this section.  It
      is not required that the application know which version of Unicode
      is being used; that information might be part of the operating
      environment in which the application is running.

   In addition, the application SHOULD apply the following test.

   o  Verification that the string is compliant with the requirements
      for right-to-left characters specified in the Bidi document
      [RFC5893].

   This test may be omitted in special circumstances, such as when the
   lookup application knows that the conditions are enforced elsewhere,
   because an attempt to look up and resolve such strings will almost
   certainly lead to a DNS lookup failure except when wildcards are
   present in the zone.  However, applying the test is likely to give
   much better information about the reason for a lookup failure --
   information that may be usefully passed to the user when that is
   feasible -- than DNS resolution failure information alone.

   For all other strings, the lookup application MUST rely on the
   presence or absence of labels in the DNS to determine the validity of
   those labels and the validity of the characters they contain.  If
   they are registered, they are presumed to be valid; if they are not,
   their possible validity is not relevant.  While a lookup application
   may reasonably issue warnings about strings it believes may be
   problematic, applications that decline to process a string that
   conforms to the rules above (i.e., does not look it up in the DNS)
   are not in conformance with this protocol.
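
   Several of the MUST-reject checks in this section can be sketched as
   a single function.  All names are hypothetical.  The category_of
   parameter stands in for a lookup against the Tables document
   [RFC5892], which the Python standard library does not provide, and
   toy_category is a placeholder for illustration only, not real table
   data.  The CONTEXTJ and CONTEXTO rule checks and the Bidi test are
   omitted, and "combining mark" is assumed to mean General_Category M.

```python
import unicodedata
from typing import Callable, Optional


def lookup_rejection_reason(
    label: str, category_of: Callable[[str], str]
) -> Optional[str]:
    """Return why a putative U-label is rejected, or None if it passes."""
    if not label:
        return "empty label"
    if unicodedata.normalize("NFC", label) != label:
        return "not in NFC"
    if label[2:4] == "--":
        return "hyphens in third and fourth positions"
    if unicodedata.category(label[0]).startswith("M"):
        return "leading combining mark"
    for ch in label:
        category = category_of(ch)
        if category in ("DISALLOWED", "UNASSIGNED"):
            return f"U+{ord(ch):04X} is {category}"
    # CONTEXTJ/CONTEXTO handling is intentionally left out of this sketch.
    return None


def toy_category(ch: str) -> str:
    """Placeholder table: treats uppercase letters as DISALLOWED."""
    return "DISALLOWED" if ch.isupper() else "PVALID"
```

   Unlike the registration tests, this sketch does not reject a leading
   or trailing hyphen, because the list above does not include that
   condition.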

5.5.  Punycode Conversion

   The string that has now been validated for lookup is converted to ACE
   form by applying the Punycode algorithm to the string and then adding
   the ACE prefix ("xn--").

5.6.  DNS Name Resolution

   The A-label resulting from the conversion in Section 5.5 or supplied
   directly (see Section 5.3) is combined with other labels as needed to
   form a fully-qualified domain name that is then looked up in the DNS,
   using normal DNS resolver procedures.  The lookup can obviously
   either succeed (returning information) or fail.
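
   Combining labels into a name for lookup can be sketched as follows.
   The function name build_lookup_name and the constants are
   hypothetical.  The 253-character limit is an assumption of this
   sketch: it is the usual presentation-format bound derived from the
   255-octet limit on names in RFC 1035, and it is not stated in this
   document.  The sketch performs no DNS query.

```python
from typing import Sequence

MAX_LABEL_LENGTH = 63
MAX_NAME_LENGTH = 253  # assumed text-form limit, excluding the final dot


def build_lookup_name(labels: Sequence[str]) -> str:
    """Join ASCII labels (A-labels or NR-LDH labels) into an FQDN string."""
    for label in labels:
        if not label.isascii():
            raise ValueError("convert U-labels to A-labels before lookup")
        if not 1 <= len(label) <= MAX_LABEL_LENGTH:
            raise ValueError("each label must be 1 to 63 characters long")
    name = ".".join(labels)
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("domain name is too long")
    # The trailing dot marks the name as fully qualified.
    return name + "."
```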

6.  Security Considerations

   Security Considerations for this version of IDNA are described in the
   Definitions document [RFC5890], except for the special issues
   associated with right-to-left scripts and characters.  The latter are
   discussed in the Bidi document [RFC5893].

   In order to avoid intentional or accidental attacks from labels that
   might be confused with others, special problems in rendering, and so
   on, the IDNA model requires that registries exercise care and
   thoughtfulness about what labels they choose to permit.  That issue
   is discussed in Section 4.3 of this document which, in turn, points
   to a somewhat more extensive discussion in the Rationale document
   [RFC5894].

7.  IANA Considerations

   IANA actions for this version of IDNA are specified in the Tables
   document [RFC5892] and discussed informally in the Rationale document
   [RFC5894].  The components of IDNA described in this document do not
   require any IANA actions.

8.  Contributors

   While the listed editor held the pen, the original versions of this
   document represent the joint work and conclusions of an ad hoc design
   team consisting of the editor and, in alphabetic order, Harald
   Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp.  This document
   draws significantly on the original version of IDNA [RFC3490] both
   conceptually and for specific text.  This second-generation version
   would not have been possible without the work that went into that
   first version and especially the contributions of its authors Patrik
   Faltstrom, Paul Hoffman, and Adam Costello.  While Faltstrom was
735 actively involved in the creation of this version, Hoffman and
736 Costello were not and should not be held responsible for any errors
737 or omissions.
738
9. Acknowledgments

   This revision to IDNA would have been impossible without the
   accumulated experience since RFC 3490 was published and resulting
   comments and complaints of many people in the IETF, ICANN, and other
   communities (too many people to list here). Nor would it have been
   possible without RFC 3490 itself and the efforts of the Working Group
   that defined it. Those people whose contributions are acknowledged
   in RFC 3490, RFC 4690 [RFC4690], and the Rationale document [RFC5894]
   were particularly important.

   Specific textual changes were incorporated into this document after
   suggestions from the other contributors, Stephane Bortzmeyer, Vint
   Cerf, Lisa Dusseault, Paul Hoffman, Kent Karlsson, James Mitchell,
   Erik van der Poel, Marcos Sanz, Andrew Sullivan, Wil Tan, Ken
   Whistler, Chris Wright, and other WG participants and reviewers
   including Martin Duerst, James Mitchell, Subramanian Moonesamy, Peter
   Saint-Andre, Margaret Wasserman, and Dan Winship who caught specific
   errors and recommended corrections. Special thanks are due to Paul
   Hoffman for permission to extract material to form the basis for
   Appendix A from a draft document that he prepared.

10. References

10.1. Normative References

   [RFC1034]  Mockapetris, P., "Domain names - concepts and
              facilities", STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of
              Unicode for Internationalized Domain Names in
              Applications (IDNA)", RFC 3492, March 2003.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document
              Framework", RFC 5890, August 2010.

   [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, August 2010.

   [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
              for Internationalized Domain Names for Applications
              (IDNA)", RFC 5893, August 2010.

   [Unicode-UAX15]
              The Unicode Consortium, "Unicode Standard Annex #15:
              Unicode Normalization Forms", September 2009,
              <http://www.unicode.org/reports/tr15/>.

10.2. Informative References

   [ASCII]    American National Standards Institute (formerly United
              States of America Standards Institute), "USA Code for
              Information Interchange", ANSI X3.4-1968, 1968. ANSI
              X3.4-1968 has been replaced by newer versions with
              slight modifications, but the 1968 version remains
              definitive for the Internet.

   [IDNA2008-Mapping]
              Resnick, P. and P. Hoffman, "Mapping Characters in
              Internationalized Domain Names for Applications (IDNA)",
              Work in Progress, April 2010.

   [RFC2671]  Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
              RFC 2671, August 1999.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications
              (IDNA)", RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
              and Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

   [RFC4952]  Klensin, J. and Y. Ko, "Overview and Framework for
              Internationalized Email", RFC 4952, July 2007.

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, August 2010.

   [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
              5.0", 2007. Boston, MA, USA: Addison-Wesley. ISBN
              0-321-48091-0. This printed reference has now been
              updated online to reflect additional code points. For
              code points, the reference at the time this document was
              published is to Unicode 5.2.

Appendix A. Summary of Major Changes from IDNA2003

   1.   Make the base character set Unicode version agnostic rather
        than tied to Unicode 3.2.

   2.   Separate the definitions for the "registration" and "lookup"
        activities.

   3.   Disallow symbol and punctuation characters except where special
        exceptions are necessary.

   4.   Remove the mapping and normalization steps from the protocol and
        have them, instead, done by the applications themselves,
        possibly in a local fashion, before invoking the protocol.

   5.   Change the way that the protocol specifies which characters are
        allowed in labels from "humans decide what the table of code
        points contains" to "decisions about code points are based on
        Unicode properties plus a small exclusion list created by
        humans".

   6.   Introduce the new concept of characters that can be used only in
        specific contexts.

   7.   Allow typical words and names in languages such as Dhivehi and
        Yiddish to be expressed.

   8.   Make bidirectional domain names (delimited strings of labels,
        not just labels standing on their own) display in a less
        surprising fashion, whether they appear in obvious domain name
        contexts or as part of running text in paragraphs.

   9.   Remove the dot separator from the mandatory part of the
        protocol.

   10.  Make some currently valid labels that are not actually IDNA
        labels invalid.
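
   As a non-normative illustration of items 4 and 9, the sketch below
   shows one way an application might perform its own local mapping and
   label splitting before handing labels to the protocol. The function
   names (local_mapping, to_a_label, prepare_domain) and the choice of
   "lowercase, then NFC" as the mapping are assumptions made for this
   example only; they are not defined by this document. The sketch uses
   the Punycode codec from the Python standard library and performs none
   of the validity checks (code point rules, contextual rules, Bidi)
   that a conforming implementation requires.

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix that marks an A-label


def local_mapping(label: str) -> str:
    """Hypothetical application-side mapping: lowercase, then NFC.

    IDNA2008 leaves mapping to the application (item 4); this
    particular mapping is an assumption chosen for illustration.
    """
    return unicodedata.normalize("NFC", label.lower())


def to_a_label(u_label: str) -> str:
    """Encode an already-mapped label with Punycode (RFC 3492).

    No IDNA2008 validity checks are performed here.
    """
    if u_label.isascii():
        return u_label  # all-ASCII label: nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def prepare_domain(name: str) -> str:
    """Split on U+002E only, then map and encode each label.

    Recognizing other dot-like separators is left to the
    application, since it is no longer mandatory (item 9).
    """
    labels = name.split(".")
    return ".".join(to_a_label(local_mapping(lab)) for lab in labels)


if __name__ == "__main__":
    print(prepare_domain("B\u00fccher.example"))
```

   Under these assumptions, the precomposed and decomposed spellings of
   the same label map to a single A-label, because normalization happens
   in the application before encoding.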

Author's Address

   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA 02140
   USA

   Phone: +1 617 245 1457
   EMail: john+ietf@jck.com