Network Working Group                                   A. Phillips, Ed.
Request for Comments: 5646                                        Lab126
BCP: 47                                                    M. Davis, Ed.
Obsoletes: 4646                                                   Google
Category: Best Current Practice                           September 2009
12
13
                     Tags for Identifying Languages
15
Abstract
17
18 This document describes the structure, content, construction, and
19 semantics of language tags for use in cases where it is desirable to
20 indicate the language used in an information object. It also
21 describes how to register values for use in language tags and the
22 creation of user-defined extensions for private interchange.
23
Status of This Memo
25
26 This document specifies an Internet Best Current Practices for the
27 Internet Community, and requests discussion and suggestions for
28 improvements. Distribution of this memo is unlimited.
29
Copyright Notice
31
32 Copyright (c) 2009 IETF Trust and the persons identified as the
33 document authors. All rights reserved.
34
35 This document is subject to BCP 78 and the IETF Trust's Legal
36 Provisions Relating to IETF Documents in effect on the date of
37 publication of this document (http://trustee.ietf.org/license-info).
38 Please review these documents carefully, as they describe your rights
39 and restrictions with respect to this document.
40
41 This document may contain material from IETF Documents or IETF
42 Contributions published or made publicly available before November
43 10, 2008. The person(s) controlling the copyright in some of this
44 material may not have granted the IETF Trust the right to allow
45 modifications of such material outside the IETF Standards Process.
46 Without obtaining an adequate license from the person(s) controlling
47 the copyright in such materials, this document may not be modified
48 outside the IETF Standards Process, and derivative works of it may
49 not be created outside the IETF Standards Process, except to format
50 it for publication as an RFC or to translate it into languages other
51 than English.
52
53
54
55
56
57
Phillips & Davis         Best Current Practice                  [Page 1]

RFC 5646                      Language Tags               September 2009
61
62
Table of Contents
64
   1. Introduction
   2. The Language Tag
     2.1. Syntax
       2.1.1. Formatting of Language Tags
     2.2. Language Subtag Sources and Interpretation
       2.2.1. Primary Language Subtag
       2.2.2. Extended Language Subtags
       2.2.3. Script Subtag
       2.2.4. Region Subtag
       2.2.5. Variant Subtags
       2.2.6. Extension Subtags
       2.2.7. Private Use Subtags
       2.2.8. Grandfathered and Redundant Registrations
       2.2.9. Classes of Conformance
   3. Registry Format and Maintenance
     3.1. Format of the IANA Language Subtag Registry
       3.1.1. File Format
       3.1.2. Record and Field Definitions
       3.1.3. Type Field
       3.1.4. Subtag and Tag Fields
       3.1.5. Description Field
       3.1.6. Deprecated Field
       3.1.7. Preferred-Value Field
       3.1.8. Prefix Field
       3.1.9. Suppress-Script Field
       3.1.10. Macrolanguage Field
       3.1.11. Scope Field
       3.1.12. Comments Field
     3.2. Language Subtag Reviewer
     3.3. Maintenance of the Registry
     3.4. Stability of IANA Registry Entries
     3.5. Registration Procedure for Subtags
     3.6. Possibilities for Registration
     3.7. Extensions and the Extensions Registry
     3.8. Update of the Language Subtag Registry
     3.9. Applicability of the Subtag Registry
   4. Formation and Processing of Language Tags
     4.1. Choice of Language Tag
       4.1.1. Tagging Encompassed Languages
       4.1.2. Using Extended Language Subtags
     4.2. Meaning of the Language Tag
     4.3. Lists of Languages
     4.4. Length Considerations
       4.4.1. Working with Limited Buffer Sizes
       4.4.2. Truncation of Language Tags
     4.5. Canonicalization of Language Tags
     4.6. Considerations for Private Use Subtags
   5. IANA Considerations
     5.1. Language Subtag Registry
     5.2. Extensions Registry
   6. Security Considerations
   7. Character Set Considerations
   8. Changes from RFC 4646
   9. References
     9.1. Normative References
     9.2. Informative References
   Appendix A. Examples of Language Tags (Informative)
   Appendix B. Examples of Registration Forms
   Appendix C. Acknowledgements
132
1. Introduction
134
135 Human beings on our planet have, past and present, used a number of
136 languages. There are many reasons why one would want to identify the
137 language used when presenting or requesting information.
138
139 The language of an information item or a user's language preferences
140 often need to be identified so that appropriate processing can be
141 applied. For example, the user's language preferences in a Web
142 browser can be used to select Web pages appropriately. Language
143 information can also be used to select among tools (such as
144 dictionaries) to assist in the processing or understanding of content
145 in different languages. Knowledge about the particular language used
146 by some piece of information content might be useful or even required
147 by some types of processing, for example, spell-checking, computer-
148 synthesized speech, Braille transcription, or high-quality print
149 renderings.
150
151 One means of indicating the language used is by labeling the
152 information content with an identifier or "tag". These tags can also
153 be used to specify the user's preferences when selecting information
154 content or to label additional attributes of content and associated
155 resources.
156
157 Sometimes language tags are used to indicate additional language
158 attributes of content. For example, indicating specific information
159 about the dialect, writing system, or orthography used in a document
160 or resource may enable the user to obtain information in a form that
161 they can understand, or it can be important in processing or
162 rendering the given content into an appropriate form or style.
163
164 This document specifies a particular identifier mechanism (the
165 language tag) and a registration function for values to be used to
175 form tags. It also defines a mechanism for private use values and
176 future extensions.
177
178 This document replaces [RFC4646] (which obsoleted [RFC3066] which, in
179 turn, replaced [RFC1766]). This document, in combination with
180 [RFC4647], comprises BCP 47. For a list of changes in this document,
181 see Section 8.
182
183 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
184 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
185 document are to be interpreted as described in [RFC2119].
186
2. The Language Tag
188
189 Language tags are used to help identify languages, whether spoken,
190 written, signed, or otherwise signaled, for the purpose of
191 communication. This includes constructed and artificial languages
192 but excludes languages not intended primarily for human
193 communication, such as programming languages.
194
2.1. Syntax
196
197 A language tag is composed from a sequence of one or more "subtags",
198 each of which refines or narrows the range of language identified by
199 the overall tag. Subtags, in turn, are a sequence of alphanumeric
200 characters (letters and digits), distinguished and separated from
201 other subtags in a tag by a hyphen ("-", [Unicode] U+002D).
202
203 There are different types of subtag, each of which is distinguished
204 by length, position in the tag, and content: each subtag's type can
205 be recognized solely by these features. This makes it possible to
206 extract and assign some semantic information to the subtags, even if
207 the specific subtag values are not recognized. Thus, a language tag
208 processor need not have a list of valid tags or subtags (that is, a
209 copy of some version of the IANA Language Subtag Registry) in order
210 to perform common searching and matching operations. The only
211 exceptions to this ability to infer meaning from subtag structure are
212 the grandfathered tags listed in the productions 'regular' and
213 'irregular' below. These tags were registered under [RFC3066] and
214 are a fixed list that can never change.
215
216 The syntax of the language tag in ABNF [RFC5234] is:
217
 Language-Tag  = langtag             ; normal language tags
               / privateuse          ; private use tag
               / grandfathered       ; grandfathered tags

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

 language      = 2*3ALPHA            ; shortest ISO 639 code
                 ["-" extlang]       ; sometimes followed by
                                     ; extended language subtags
               / 4ALPHA              ; or reserved for future use
               / 5*8ALPHA            ; or registered language subtag

 extlang       = 3ALPHA              ; selected ISO 639 codes
                 *2("-" 3ALPHA)      ; permanently reserved

 script        = 4ALPHA              ; ISO 15924 code

 region        = 2ALPHA              ; ISO 3166-1 code
               / 3DIGIT              ; UN M.49 code

 variant       = 5*8alphanum         ; registered variants
               / (DIGIT 3alphanum)

 extension     = singleton 1*("-" (2*8alphanum))

                                     ; Single alphanumerics
                                     ; "x" reserved for private use
 singleton     = DIGIT               ; 0 - 9
               / %x41-57             ; A - W
               / %x59-5A             ; Y - Z
               / %x61-77             ; a - w
               / %x79-7A             ; y - z

 privateuse    = "x" 1*("-" (1*8alphanum))

 grandfathered = irregular           ; non-redundant tags registered
               / regular             ; during the RFC 3066 era

 irregular     = "en-GB-oed"         ; irregular tags do not match
               / "i-ami"             ; the 'langtag' production and
               / "i-bnn"             ; would not otherwise be
               / "i-default"         ; considered 'well-formed'
               / "i-enochian"        ; These tags are all valid,
               / "i-hak"             ; but most are deprecated
               / "i-klingon"         ; in favor of more modern
               / "i-lux"             ; subtags or subtag
               / "i-mingo"           ; combination
               / "i-navajo"
               / "i-pwn"
               / "i-tao"
               / "i-tay"
               / "i-tsu"
               / "sgn-BE-FR"
               / "sgn-BE-NL"
               / "sgn-CH-DE"

 regular       = "art-lojban"        ; these tags match the 'langtag'
               / "cel-gaulish"       ; production, but their subtags
               / "no-bok"            ; are not extended language
               / "no-nyn"            ; or variant subtags: their meaning
               / "zh-guoyu"          ; is defined by their registration
               / "zh-hakka"          ; and all of these are deprecated
               / "zh-min"            ; in favor of a more modern
               / "zh-min-nan"        ; subtag or sequence of subtags
               / "zh-xiang"

 alphanum      = (ALPHA / DIGIT)     ; letters and numbers

                     Figure 1: Language Tag ABNF
309
310 For examples of language tags, see Appendix A.
311
312 All subtags have a maximum length of eight characters. Whitespace is
313 not permitted in a language tag. There is a subtlety in the ABNF
314 production 'variant': a variant starting with a digit has a minimum
315 length of four characters, while those starting with a letter have a
316 minimum length of five characters.
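
   As an informal illustration (not part of the normative syntax), the
   ABNF in Figure 1 can be transcribed into a regular expression.  The
   Python sketch below checks syntax only; it does not consult the
   registry and does not detect repeated variants or singletons.  The
   names LANGTAG_RE, PRIVATEUSE_RE, IRREGULAR, and is_well_formed are
   illustrative assumptions, not names defined by this document.

```python
import re

# Transcription of the 'langtag' production; each piece mirrors one ABNF rule.
LANGTAG_RE = re.compile(
    r"(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4}|[A-Za-z]{5,8})"  # language
    r"(?:-[A-Za-z]{4})?"                                  # script
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"                     # region
    r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"     # variants
    r"(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*"      # extensions
    r"(?:-[xX](?:-[A-Za-z0-9]{1,8})+)?"                   # private use
)

# Transcription of the 'privateuse' production used as a whole tag.
PRIVATEUSE_RE = re.compile(r"[xX](?:-[A-Za-z0-9]{1,8})+")

# The 'irregular' grandfathered tags, lowercased for case-insensitive lookup.
# The 'regular' ones already match the 'langtag' production.
IRREGULAR = frozenset(
    t.lower()
    for t in (
        "en-GB-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
        "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
        "i-tay", "i-tsu", "sgn-BE-FR", "sgn-BE-NL", "sgn-CH-DE",
    )
)


def is_well_formed(tag: str) -> bool:
    """Return True if 'tag' matches the Language-Tag production."""
    return (
        LANGTAG_RE.fullmatch(tag) is not None
        or PRIVATEUSE_RE.fullmatch(tag) is not None
        or tag.lower() in IRREGULAR
    )
```

   For instance, is_well_formed("de-1996") is true because '1996' is a
   four-character variant that starts with a digit, while
   is_well_formed("en-x") is false because 'x' must be followed by at
   least one private use subtag.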
317
318 Although [RFC5234] refers to octets, the language tags described in
319 this document are sequences of characters from the US-ASCII [ISO646]
320 repertoire. Language tags MAY be used in documents and applications
321 that use other encodings, so long as these encompass the relevant
322 part of the US-ASCII repertoire. An example of this would be an XML
323 document that uses the UTF-16LE [RFC2781] encoding of [Unicode].
324
2.1.1. Formatting of Language Tags
326
327 At all times, language tags and their subtags, including private use
328 and extensions, are to be treated as case insensitive: there exist
329 conventions for the capitalization of some of the subtags, but these
330 MUST NOT be taken to carry meaning.
331
332 Thus, the tag "mn-Cyrl-MN" is not distinct from "MN-cYRL-mn" or "mN-
333 cYrL-Mn" (or any other combination), and each of these variations
343 conveys the same meaning: Mongolian written in the Cyrillic script as
344 used in Mongolia.
345
346 The ABNF syntax also does not distinguish between upper- and
347 lowercase: the uppercase US-ASCII letters in the range 'A' through
348 'Z' are always considered equivalent and mapped directly to their US-
349 ASCII lowercase equivalents in the range 'a' through 'z'. So the tag
350 "I-AMI" is considered equivalent to that value "i-ami" in the
351 'irregular' production.
352
353 Although case distinctions do not carry meaning in language tags,
354 consistent formatting and presentation of language tags will aid
355 users. The format of subtags in the registry is RECOMMENDED as the
356 form to use in language tags. This format generally corresponds to
357 the common conventions for the various ISO standards from which the
358 subtags are derived.
359
360 These conventions include:
361
362 o [ISO639-1] recommends that language codes be written in lowercase
363 ('mn' Mongolian).
364
365 o [ISO15924] recommends that script codes use lowercase with the
366 initial letter capitalized ('Cyrl' Cyrillic).
367
368 o [ISO3166-1] recommends that country codes be capitalized ('MN'
369 Mongolia).
370
371 An implementation can reproduce this format without accessing the
372 registry as follows. All subtags, including extension and private
373 use subtags, use lowercase letters with two exceptions: two-letter
374 and four-letter subtags that neither appear at the start of the tag
375 nor occur after singletons. Such two-letter subtags are all
376 uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four-
377 letter subtags are titlecase (as in the tag "az-Latn-x-latn").
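
   The registry-free formatting rule above can be sketched in Python as
   follows.  This is a minimal illustration that assumes its input is
   an ASCII, hyphen-separated tag; the function name format_tag is an
   assumption made for this example.

```python
def format_tag(tag: str) -> str:
    """Apply the conventional capitalization without using the registry."""
    formatted = []
    seen_singleton = False  # becomes True once any one-character subtag is seen
    for index, subtag in enumerate(tag.split("-")):
        # The two exceptions apply only to subtags that are neither first
        # in the tag nor located after a singleton.
        eligible = index > 0 and not seen_singleton
        if eligible and len(subtag) == 2:
            formatted.append(subtag.upper())        # e.g. region 'CA'
        elif eligible and len(subtag) == 4:
            formatted.append(subtag.capitalize())   # e.g. script 'Latn'
        else:
            formatted.append(subtag.lower())
        if len(subtag) == 1:
            seen_singleton = True
    return "-".join(formatted)
```

   With this sketch, format_tag("MN-cYRL-mn") yields "mn-Cyrl-MN" and
   format_tag("AZ-LATN-X-LATN") yields "az-Latn-x-latn".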
378
379 Note: Case folding of ASCII letters in certain locales, unless
380 carefully handled, sometimes produces non-ASCII character values.
381 The Unicode Character Database file "SpecialCasing.txt"
382 [SpecialCasing] defines the specific cases that are known to cause
383 problems with this. In particular, the letter 'i' (U+0069) in
384 Turkish and Azerbaijani is uppercased to U+0130 (LATIN CAPITAL LETTER
385 I WITH DOT ABOVE). Implementers SHOULD specify a locale-neutral
386 casing operation to ensure that case folding of subtags does not
387 produce this value, which is illegal in language tags. For example,
388 if one were to uppercase the region subtag 'in' using Turkish locale
389 rules, the sequence U+0130 U+004E would result, instead of the
390 expected 'IN'.
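
   One way to obtain a locale-neutral casing operation is to map only
   the 26 US-ASCII letters and leave every other character untouched.
   The Python sketch below does this with translation tables; the
   helper names ascii_lower and ascii_upper are assumptions made for
   illustration.

```python
import string

# Translation tables that touch only 'A'-'Z' and 'a'-'z'.
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_lower(subtag: str) -> str:
    """Lowercase US-ASCII letters only; all other characters pass through."""
    return subtag.translate(_TO_LOWER)


def ascii_upper(subtag: str) -> str:
    """Uppercase US-ASCII letters only; all other characters pass through."""
    return subtag.translate(_TO_UPPER)
```

   Because the mapping is a fixed table, ascii_upper("in") always
   produces "IN", and a stray U+0130 in the input is passed through
   unchanged rather than being folded to an ASCII letter.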
391
2.2. Language Subtag Sources and Interpretation
400
401 The namespace of language tags and their subtags is administered by
402 the Internet Assigned Numbers Authority (IANA) according to the rules
403 in Section 5 of this document. The Language Subtag Registry
404 maintained by IANA is the source for valid subtags: other standards
405 referenced in this section provide the source material for that
406 registry.
407
408 Terminology used in this document:
409
410 o "Tag" refers to a complete language tag, such as "sr-Latn-RS" or
411 "az-Arab-IR". Examples of tags in this document are enclosed in
412 double-quotes ("en-US").
413
414 o "Subtag" refers to a specific section of a tag, delimited by a
415 hyphen, such as the subtags 'zh', 'Hant', and 'CN' in the tag "zh-
416 Hant-CN". Examples of subtags in this document are enclosed in
417 single quotes ('Hant').
418
419 o "Code" refers to values defined in external standards (and that
420 are used as subtags in this document). For example, 'Hant' is an
421 [ISO15924] script code that was used to define the 'Hant' script
422 subtag for use in a language tag. Examples of codes in this
423 document are enclosed in single quotes ('en', 'Hant').
424
425 Language tags are designed so that each subtag type has unique length
426 and content restrictions. These make identification of the subtag's
427 type possible, even if the content of the subtag itself is
428 unrecognized. This allows tags to be parsed and processed without
429 reference to the latest version of the underlying standards or the
430 IANA registry and makes the associated exception handling when
431 parsing tags simpler.
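
   The following Python sketch illustrates how a subtag's type might be
   inferred from length, position, and content alone, with no registry
   lookup.  It assumes the tag is not one of the grandfathered tags and
   performs no validity checks; the function name classify and the
   type labels it returns are assumptions made for this example.

```python
def _alpha(s: str) -> bool:
    return s.isascii() and s.isalpha()


def _digit(s: str) -> bool:
    return s.isascii() and s.isdigit()


def _alnum(s: str) -> bool:
    return s.isascii() and s.isalnum()


def classify(tag: str) -> list:
    """Return (subtag, type) pairs inferred purely from tag structure."""
    parts = tag.split("-")
    out = []
    i = 0
    if parts[0].lower() != "x":
        out.append((parts[0], "language"))
        i = 1
        # Up to three 3-letter extlang positions after a 2-3 letter language.
        if len(parts[0]) <= 3:
            taken = 0
            while (i < len(parts) and taken < 3
                   and len(parts[i]) == 3 and _alpha(parts[i])):
                out.append((parts[i], "extlang"))
                i += 1
                taken += 1
        if i < len(parts) and len(parts[i]) == 4 and _alpha(parts[i]):
            out.append((parts[i], "script"))
            i += 1
        if i < len(parts) and (
            (len(parts[i]) == 2 and _alpha(parts[i]))
            or (len(parts[i]) == 3 and _digit(parts[i]))
        ):
            out.append((parts[i], "region"))
            i += 1
        while i < len(parts) and _alnum(parts[i]) and (
            5 <= len(parts[i]) <= 8
            or (len(parts[i]) == 4 and parts[i][0].isdigit())
        ):
            out.append((parts[i], "variant"))
            i += 1
    # Whatever remains is introduced by a singleton ('x' for private use).
    kind = "unknown"
    for subtag in parts[i:]:
        if len(subtag) == 1 and kind != "privateuse":
            kind = "privateuse" if subtag.lower() == "x" else "extension"
        out.append((subtag, kind))
    return out
```

   For example, classify("zh-Hant-CN") labels 'zh' as a language, 'Hant'
   as a script, and 'CN' as a region, even if none of those values were
   known to the processor.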
432
433 Some of the subtags in the IANA registry do not come from an
434 underlying standard. These can only appear in specific positions in
435 a tag: they can only occur as primary language subtags or as variant
436 subtags.
437
438 Sequences of private use and extension subtags MUST occur at the end
439 of the sequence of subtags and MUST NOT be interspersed with subtags
440 defined elsewhere in this document. These sequences are introduced
441 by single-character subtags, which are reserved as follows:
442
443 o The single-letter subtag 'x' introduces a sequence of private use
444 subtags. The interpretation of any private use subtag is defined
455 solely by private agreement and is not defined by the rules in
456 this section or in any standard or registry defined in this
457 document.
458
459 o The single-letter subtag 'i' is used by some grandfathered tags,
460 such as "i-default", where it always appears in the first position
461 and cannot be confused with an extension.
462
463 o All other single-letter and single-digit subtags are reserved to
464 introduce standardized extension subtag sequences as described in
465 Section 3.7.
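
   As a rough illustration of the single-character subtag rules above,
   the Python sketch below separates a tag into its leading subtags,
   its extension sequences, and its private use sequence.  It assumes
   ASCII input and does no validation (for example, it does not reject
   a repeated singleton); the function name split_sequences is an
   assumption made for this example.

```python
def split_sequences(tag: str):
    """Split a tag into (core, extensions, private_use).

    'core' is the hyphen-joined run of subtags before any singleton,
    'extensions' maps each extension singleton to its subtags, and
    'private_use' lists the subtags that follow 'x'.
    """
    parts = tag.lower().split("-")
    # A leading 'i' marks a grandfathered tag, not an extension.
    if parts[0] == "i":
        return tag.lower(), {}, []
    core = []
    extensions = {}
    private_use = []
    current = None       # subtag list of the extension being read
    in_private = False   # True once 'x' has been seen
    for subtag in parts:
        if in_private:
            private_use.append(subtag)
        elif subtag == "x":
            in_private = True
        elif len(subtag) == 1:
            current = extensions.setdefault(subtag, [])
        elif current is not None:
            current.append(subtag)
        else:
            core.append(subtag)
    return "-".join(core), extensions, private_use
```

   Note that everything after 'x' is treated as private use, including
   any later single-character subtags.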
466
2.2.1. Primary Language Subtag
468
469 The primary language subtag is the first subtag in a language tag and
470 cannot be omitted, with two exceptions:
471
472 o The single-character subtag 'x' as the primary subtag indicates
473 that the language tag consists solely of subtags whose meaning is
474 defined by private agreement. For example, in the tag "x-fr-CH",
475 the subtags 'fr' and 'CH' do not represent the French language or
476 the country of Switzerland (or any other value in the IANA
477 registry) unless there is a private agreement in place to do so.
478 See Section 4.6.
479
480 o The single-character subtag 'i' is used by some grandfathered tags
481 (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other
482 grandfathered tags have a primary language subtag in their first
483 position.)
484
485 The following rules apply to the primary language subtag:
486
487 1. Two-character primary language subtags were defined in the IANA
488 registry according to the assignments found in the standard "ISO
489 639-1:2002, Codes for the representation of names of languages --
490 Part 1: Alpha-2 code" [ISO639-1], or using assignments
491 subsequently made by the ISO 639-1 registration authority (RA) or
492 governing standardization bodies.
493
494 2. Three-character primary language subtags in the IANA registry
495 were defined according to the assignments found in one of these
496 additional ISO 639 parts or assignments subsequently made by the
497 relevant ISO 639 registration authorities or governing
498 standardization bodies:
499
500 A. "ISO 639-2:1998 - Codes for the representation of names of
501 languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2]
502
511 B. "ISO 639-3:2007 - Codes for the representation of names of
512 languages -- Part 3: Alpha-3 code for comprehensive coverage
513 of languages" [ISO639-3]
514
515 C. "ISO 639-5:2008 - Codes for the representation of names of
516 languages -- Part 5: Alpha-3 code for language families and
517 groups" [ISO639-5]
518
519 3. The subtags in the range 'qaa' through 'qtz' are reserved for
520 private use in language tags. These subtags correspond to codes
521 reserved by ISO 639-2 for private use. These codes MAY be used
522 for non-registered primary language subtags (instead of using
523 private use subtags following 'x-'). Please refer to Section 4.6
524 for more information on private use subtags.
525
526 4. Four-character language subtags are reserved for possible future
527 standardization.
528
529 5. Any language subtags of five to eight characters in length in the
530 IANA registry were defined via the registration process in
531 Section 3.5 and MAY be used to form the primary language subtag.
532 An example of what such a registration might include is the
533 grandfathered IANA registration "i-enochian". The subtag
534 'enochian' could be registered in the IANA registry as a primary
535 language subtag (assuming that ISO 639 does not register this
536 language first), making tags such as "enochian-AQ" and "enochian-
537 Latn" valid.
538
539 At the time this document was created, there were no examples of
540 this kind of subtag. Future registrations of this type are
541 discouraged: an attempt to register any new proposed primary
542 language MUST be made to the ISO 639 registration authority.
543 Proposals rejected by the ISO 639 registration authority are
544 unlikely to meet the criteria for primary language subtags and
545 are thus unlikely to be registered.
546
547 6. Other values MUST NOT be assigned to the primary subtag except by
548 revision or update of this document.
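
   The length-based rules above can be summarized in a short Python
   sketch.  It reports only which rule a candidate primary language
   subtag falls under; whether the subtag is actually registered still
   requires the IANA registry.  The function name and the returned
   labels are assumptions made for this example.

```python
def primary_language_kind(subtag: str) -> str:
    """Name the rule that governs a candidate primary language subtag."""
    s = subtag.lower()
    if not (s.isascii() and s.isalpha()):
        return "invalid"
    if len(s) == 2:
        return "ISO 639-1"                      # rule 1
    if len(s) == 3:
        # 'qaa' through 'qtz' are reserved for private use (rule 3).
        if "qaa" <= s <= "qtz":
            return "private use"
        return "ISO 639-2, 639-3, or 639-5"     # rule 2
    if len(s) == 4:
        return "reserved"                       # rule 4
    if 5 <= len(s) <= 8:
        return "registered"                     # rule 5
    return "invalid"                            # rule 6
```

   The comparison "qaa" <= s <= "qtz" relies on ordinary string
   ordering, which matches the private use range for lowercase three-
   letter subtags.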
549
550 When languages have both an ISO 639-1 two-character code and a three-
551 character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only
552 the ISO 639-1 two-character code is defined in the IANA registry.
553
554 When a language has no ISO 639-1 two-character code and the ISO
555 639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic) code
556 for that language differ, only the Terminology code is defined in the
557 IANA registry. At the time this document was created, all languages
558 that had both kinds of three-character codes were also assigned a
567 two-character code; it is expected that future assignments of this
568 nature will not occur.
569
570 In order to avoid instability in the canonical form of tags, if a
571 two-character code is added to ISO 639-1 for a language for which a
572 three-character code was already included in either ISO 639-2 or ISO
573 639-3, the two-character code MUST NOT be registered. See
574 Section 3.4.
575
576 For example, if some content were tagged with 'haw' (Hawaiian), which
577 currently has no two-character code, the tag would not need to be
578 changed if ISO 639-1 were to assign a two-character code to the
579 Hawaiian language at a later date.
580
581 To avoid these problems with versioning and subtag choice (as
582 experienced during the transition between RFC 1766 and RFC 3066), as
583 well as to ensure the canonical nature of subtags defined by this
584 document, the ISO 639 Registration Authority Joint Advisory Committee
585 (ISO 639/RA-JAC) has included the following statement in
586 [iso639.prin]:
587
588 "A language code already in ISO 639-2 at the point of freezing ISO
589 639-1 shall not later be added to ISO 639-1. This is to ensure
590 consistency in usage over time, since users are directed in
591 Internet applications to employ the alpha-3 code when an alpha-2
592 code for that language is not available."
593
2.2.2. Extended Language Subtags
595
596 Extended language subtags are used to identify certain specially
597 selected languages that, for various historical and compatibility
598 reasons, are closely identified with or tagged using an existing
599 primary language subtag. Extended language subtags are always used
600 with their enclosing primary language subtag (indicated with a
601 'Prefix' field in the registry) when used to form the language tag.
602 All languages that have an extended language subtag in the registry
603 also have an identical primary language subtag record in the
604 registry. This primary language subtag is RECOMMENDED for forming
605 the language tag. The following rules apply to the extended language
606 subtags:
607
608 1. Extended language subtags consist solely of three-letter subtags.
609 All extended language subtag records defined in the registry were
610 defined according to the assignments found in [ISO639-3].
611 Language collections and groupings, such as defined in
612 [ISO639-5], are specifically excluded from being extended
613 language subtags.
614
623 2. Extended language subtag records MUST include exactly one
624 'Prefix' field indicating an appropriate subtag or sequence of
625 subtags for that extended language subtag.
626
627 3. Extended language subtag records MUST include a 'Preferred-
628 Value'. The 'Preferred-Value' and 'Subtag' fields MUST be
629 identical.
630
   4. Although the ABNF production 'extlang' permits up to three
      extended language subtags in the language tag, extended language
      subtags MUST NOT include another extended language subtag in
      their 'Prefix'.  That is, the second and third extended language
      subtag positions in a language tag are permanently reserved and
      tags that include those subtags in that position are, and will
      always remain, invalid.
638
639 For example, the macrolanguage Chinese ('zh') encompasses a number of
640 languages. For compatibility reasons, each of these languages has
641 both a primary and extended language subtag in the registry. A few
642 selected examples of these include Gan Chinese ('gan'), Cantonese
643 Chinese ('yue'), and Mandarin Chinese ('cmn'). Each is encompassed
644 by the macrolanguage 'zh' (Chinese). Therefore, they each have the
645 prefix "zh" in their registry records. Thus, Gan Chinese is
646 represented with tags beginning "zh-gan" or "gan", Cantonese with
647 tags beginning either "yue" or "zh-yue", and Mandarin Chinese with
648 "zh-cmn" or "cmn". The language subtag 'zh' can still be used
649 without an extended language subtag to label a resource as some
650 unspecified variety of Chinese, while the primary language subtag
651 ('gan', 'yue', 'cmn') is preferred to using the extended language
652 form ("zh-gan", "zh-yue", "zh-cmn").
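
   The preference for the primary language form can be sketched as a
   simple rewrite.  In the Python example below, EXTLANG_PREFIX is a
   tiny hand-written table holding only the three subtags named in the
   example above; a real implementation would load the 'Prefix' fields
   from the registry.  The table and the function name prefer_primary
   are assumptions made for illustration.

```python
# Sample data only: extlang subtag -> its 'Prefix', taken from the
# Chinese examples in the text.
EXTLANG_PREFIX = {"gan": "zh", "yue": "zh", "cmn": "zh"}


def prefer_primary(tag: str) -> str:
    """Rewrite 'prefix-extlang-...' to the preferred 'extlang-...' form."""
    parts = tag.split("-")
    if len(parts) >= 2:
        extlang = parts[1].lower()
        # Only rewrite when the first subtag is the recorded prefix.
        if EXTLANG_PREFIX.get(extlang) == parts[0].lower():
            return "-".join([extlang] + parts[2:])
    return tag
```

   With the sample table, "zh-yue" becomes "yue" and "zh-cmn-Hant-TW"
   becomes "cmn-Hant-TW", while "zh-Hant" is left unchanged.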
653
2.2.3. Script Subtag
655
656 Script subtags are used to indicate the script or writing system
657 variations that distinguish the written forms of a language or its
658 dialects. The following rules apply to the script subtags:
659
660 1. Script subtags MUST follow any primary and extended language
661 subtags and MUST precede any other type of subtag.
662
663 2. Script subtags consist of four letters and were defined according
664 to the assignments found in [ISO15924] ("Information and
665 documentation -- Codes for the representation of names of
666 scripts"), or subsequently assigned by the ISO 15924 registration
667 authority or governing standardization bodies. Only codes
668 assigned by ISO 15924 will be considered for registration.
669
679 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private
680 use in language tags. These subtags correspond to codes reserved
681 by ISO 15924 for private use. These codes MAY be used for non-
682 registered script values. Please refer to Section 4.6 for more
683 information on private use subtags.
684
685 4. There MUST be at most one script subtag in a language tag, and
686 the script subtag SHOULD be omitted when it adds no
687 distinguishing value to the tag or when the primary or extended
688 language subtag's record in the subtag registry includes a
689 'Suppress-Script' field listing the applicable script subtag.
690
691 For example: "sr-Latn" represents Serbian written using the Latin
692 script.
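
   Two of the script rules lend themselves to a short Python sketch:
   recognizing the private use range, and omitting a script that the
   registry marks as suppressed.  The suppress_script argument stands
   in for 'Suppress-Script' data that would normally come from the
   registry; the sample mapping in the usage note is illustrative only.
   The sketch handles only tags whose script subtag directly follows
   the primary language subtag, and both function names are
   assumptions.

```python
def is_private_use_script(subtag: str) -> bool:
    """True for script subtags in the range 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    return len(s) == 4 and s.isascii() and s.isalpha() and "qaaa" <= s <= "qabx"


def drop_suppressed_script(tag: str, suppress_script: dict) -> str:
    """Remove the script subtag when it matches the language's
    Suppress-Script value in the supplied mapping."""
    parts = tag.split("-")
    if (len(parts) >= 2 and len(parts[1]) == 4
            and parts[1].isascii() and parts[1].isalpha()):
        suppressed = suppress_script.get(parts[0].lower(), "")
        if suppressed.lower() == parts[1].lower():
            del parts[1]
    return "-".join(parts)
```

   For example, given the sample mapping {"en": "Latn"},
   drop_suppressed_script("en-Latn-US", {"en": "Latn"}) returns
   "en-US", while "sr-Latn" is returned unchanged because the sample
   mapping has no entry for 'sr'.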
693
2.2.4. Region Subtag
695
696 Region subtags are used to indicate linguistic variations associated
697 with or appropriate to a specific country, territory, or region.
698 Typically, a region subtag is used to indicate variations such as
699 regional dialects or usage, or region-specific spelling conventions.
700 It can also be used to indicate that content is expressed in a way
701 that is appropriate for use throughout a region, for instance,
702 Spanish content tailored to be useful throughout Latin America.
703
704 The following rules apply to the region subtags:
705
706 1. Region subtags MUST follow any primary language, extended
707 language, or script subtags and MUST precede any other type of
708 subtag.
709
710 2. Two-letter region subtags were defined according to the
711 assignments found in [ISO3166-1] ("Codes for the representation
712 of names of countries and their subdivisions -- Part 1: Country
713 codes"), using the list of alpha-2 country codes or using
714 assignments subsequently made by the ISO 3166-1 maintenance
715 agency or governing standardization bodies. In addition, the
716 codes that are "exceptionally reserved" (as opposed to
717 "assigned") in ISO 3166-1 were also defined in the registry, with
718 the exception of 'UK', which is an exact synonym for the assigned
719 code 'GB'.
720
721 3. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
722 reserved for private use in language tags. These subtags
723 correspond to codes reserved by ISO 3166 for private use. These
724 codes MAY be used for private use region subtags (instead of
725 using a private use subtag sequence). Please refer to
726 Section 4.6 for more information on private use subtags.
727
735 4. Three-character region subtags consist solely of digit (number)
736 characters and were defined according to the assignments found in
737 the UN Standard Country or Area Codes for Statistical Use
738 [UN_M.49] or assignments subsequently made by the governing
739 standards body. Not all of the UN M.49 codes are defined in the
740 IANA registry. The following rules define which codes are
741 entered into the registry as valid subtags:
742
743 A. UN numeric codes assigned to 'macro-geographical
744 (continental)' or sub-regions MUST be registered in the
745 registry. These codes are not associated with an assigned
746 ISO 3166-1 alpha-2 code and represent supra-national areas,
747 usually covering more than one nation, state, province, or
748 territory.
749
750 B. UN numeric codes for 'economic groupings' or 'other
751 groupings' MUST NOT be registered in the IANA registry and
752 MUST NOT be used to form language tags.
753
754 C. When ISO 3166-1 reassigns a code formerly used for one
755 country or area to another country or area and that code
756 already is present in the registry, the UN numeric code for
757 that country or area MUST be registered in the registry as
758 described in Section 3.4 and MUST be used to form language
759 tags that represent the country or region for which it is
760 defined (rather than the recycled ISO 3166-1 code).
761
762 D. UN numeric codes for countries or areas for which there is an
763 associated ISO 3166-1 alpha-2 code in the registry MUST NOT
764 be entered into the registry and MUST NOT be used to form
765 language tags. Note that the ISO 3166-based subtag in the
766 registry MUST actually be associated with the UN M.49 code in
767 question.
768
769 E. For historical reasons, the UN numeric code 830 (Channel
770 Islands), which was not registered at the time this document
771 was adopted and had, at that time, no corresponding ISO
772 3166-1 code, MAY be entered into the IANA registry via the
773 process described in Section 3.5, provided no ISO 3166-1 code
774 with that exact meaning has been previously registered.
775
776 F. All other UN numeric codes for countries or areas that do not
777 have an associated ISO 3166-1 alpha-2 code MUST NOT be
778 entered into the registry and MUST NOT be used to form
779 language tags. For more information about these codes, see
780 Section 3.4.
781
791 5. The alphanumeric codes in Appendix X of the UN document MUST NOT
792 be entered into the registry and MUST NOT be used to form
793 language tags. (At the time this document was created, these
794 values matched the ISO 3166-1 alpha-2 codes.)
795
796 6. There MUST be at most one region subtag in a language tag and the
797 region subtag MAY be omitted, as when it adds no distinguishing
798 value to the tag.
799
800 For example:
801
802 "de-AT" represents German ('de') as used in Austria ('AT').
803
804 "sr-Latn-RS" represents Serbian ('sr') written using Latin script
805 ('Latn') as used in Serbia ('RS').
806
807 "es-419" represents Spanish ('es') appropriate to the UN-defined
808 Latin America and Caribbean region ('419').
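
   As an informal illustration (not part of this specification), the
   shape and private use rules above can be sketched in Python. The
   helper name classify_region is hypothetical. The sketch looks only
   at syntax and at the private use ranges from rule 3; it does not
   consult the registry, so a result of "alpha-2" or "numeric" does not
   mean that the subtag is registered.

```python
import re

# Shapes from the rules above: two letters (ISO 3166-1) or three
# digits (UN M.49).  Registry membership is NOT checked here.
_ALPHA2 = re.compile(r"[A-Za-z]{2}")
_DIGIT3 = re.compile(r"[0-9]{3}")


def classify_region(subtag):
    """Classify a region-shaped subtag (hypothetical helper).

    Returns "numeric", "private-use", "alpha-2", or None when the
    subtag does not have the shape of a region subtag at all.
    """
    if _DIGIT3.fullmatch(subtag):
        return "numeric"
    if not _ALPHA2.fullmatch(subtag):
        return None
    code = subtag.upper()
    # Rule 3: 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are private use.
    if code in ("AA", "ZZ") or "QM" <= code <= "QZ" or code[0] == "X":
        return "private-use"
    return "alpha-2"
```

   Comparison is done on the uppercased subtag because case is not
   significant in language tags.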
809
2.2.5. Variant Subtags
811
812 Variant subtags are used to indicate additional, well-recognized
813 variations that define a language or its dialects that are not
814 covered by other available subtags. The following rules apply to the
815 variant subtags:
816
817 1. Variant subtags MUST follow any primary language, extended
818 language, script, or region subtags and MUST precede any
819 extension or private use subtag sequences.
820
821 2. Variant subtags, as a collection, are not associated with any
822 particular external standard. The meaning of variant subtags in
823 the registry is defined in the course of the registration process
824 defined in Section 3.5. Note that any particular variant subtag
825 might be associated with some external standard. However,
826 association with a standard is not required for registration.
827
828 3. More than one variant MAY be used to form the language tag.
829
830 4. Variant subtags MUST be registered with IANA according to the
831 rules in Section 3.5 of this document before being used to form
832 language tags. In order to distinguish variants from other types
833 of subtags, registrations MUST meet the following length and
834 content restrictions:
835
836 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be
837 at least five characters long.
838
847 2. Variant subtags that begin with a digit (0-9) MUST be at
848 least four characters long.
849
850 5. The same variant subtag MUST NOT be used more than once within a
851 language tag.
852
853 * For example, the tag "de-DE-1901-1901" is not valid.
854
855 Variant subtag records in the Language Subtag Registry MAY include
856 one or more 'Prefix' (Section 3.1.8) fields. Each 'Prefix' indicates
857 a suitable sequence of subtags for forming (with other subtags, as
858 appropriate) a language tag when using the variant.
859
860 Most variants that share a prefix are mutually exclusive. For
861 example, the German orthographic variations '1996' and '1901' SHOULD
862 NOT be used in the same tag, as they represent the dates of different
863 spelling reforms. A variant that can meaningfully be used in
864 combination with another variant SHOULD include a 'Prefix' field in
865 its registry record that lists that other variant. For example, if
866 another German variant 'example' were created that made sense to use
867 with '1996', then 'example' should include two 'Prefix' fields: "de"
868 and "de-1996".
869
870 For example:
871
872 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.
873
874 "de-CH-1996" represents German as used in Switzerland and as
875 written using the spelling reform beginning in the year 1996 C.E.
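
   The length restrictions in rule 4 and the duplicate check in rule 5
   can be sketched as follows. This is a minimal, non-normative Python
   example; the function names are hypothetical, and the upper bound of
   eight characters is taken from the ABNF in Section 2.1 rather than
   from the rules above. Registration status is not checked.

```python
import re

# Letter-initial variants: 5 to 8 characters; digit-initial: 4 to 8.
# The upper bound of eight comes from the ABNF in Section 2.1.
_VARIANT = re.compile(r"[A-Za-z][A-Za-z0-9]{4,7}|[0-9][A-Za-z0-9]{3,7}")


def is_variant_shaped(subtag):
    """True if the subtag meets the variant length/content rules."""
    return _VARIANT.fullmatch(subtag) is not None


def duplicate_variants(variants):
    """Return the variants that occur more than once, ignoring case."""
    seen, dupes = set(), []
    for variant in variants:
        key = variant.lower()
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes
```

   For the tag "de-DE-1901-1901", passing the variant list
   ["1901", "1901"] to duplicate_variants reports '1901' as repeated.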
876
2.2.6. Extension Subtags
878
879 Extensions provide a mechanism for extending language tags for use in
880 various applications. They are intended to identify information that
881 is commonly used in association with languages or language tags but
882 that is not part of language identification. See Section 3.7. The
883 following rules apply to extensions:
884
   1. An extension MUST follow at least a primary language subtag.
      That is, a language tag cannot begin with an extension.
      Extensions extend language tags; they do not override or replace
      them. For example, "a-value" is not a well-formed language tag,
      while "de-a-value" is. Note that extensions cannot be used in
      tags that are entirely private use (that is, tags starting with
      "x-").
892
903 2. Extension subtags are separated from the other subtags defined in
904 this document by a single-character subtag (called a
905 "singleton"). The singleton MUST be one allocated to a
906 registration authority via the mechanism described in Section 3.7
907 and MUST NOT be the letter 'x', which is reserved for private use
908 subtag sequences.
909
910 3. Each singleton subtag MUST appear at most one time in each tag
911 (other than as a private use subtag). That is, singleton subtags
912 MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is
913 invalid because the subtag 'a' appears twice. Note that the tag
914 "en-a-bbb-x-a-ccc" is valid because the second appearance of the
915 singleton 'a' is in a private use sequence.
916
917 4. Extension subtags MUST meet whatever requirements are set by the
918 document that defines their singleton prefix and whatever
919 requirements are provided by the maintaining authority. Note
920 that there might not be a registry of these subtags and
921 validating processors are not required to validate extensions.
922
923 5. Each extension subtag MUST be from two to eight characters long
924 and consist solely of letters or digits, with each subtag
925 separated by a single '-'. Case distinctions are ignored in
926 extensions (as with any language subtag) and normalized subtags
927 of this type are expected to be in lowercase.
928
929 6. Each singleton MUST be followed by at least one extension subtag.
930 For example, the tag "tlh-a-b-foo" is invalid because the first
931 singleton 'a' is followed immediately by another singleton 'b'.
932
933 7. Extension subtags MUST follow all primary language, extended
934 language, script, region, and variant subtags in a tag and MUST
935 precede any private use subtag sequences.
936
937 8. All subtags following the singleton and before another singleton
938 are part of the extension. Example: In the tag "fr-a-Latn", the
939 subtag 'Latn' does not represent the script subtag 'Latn' defined
940 in the IANA Language Subtag Registry. Its meaning is defined by
941 the extension 'a'.
942
943 9. In the event that more than one extension appears in a single
944 tag, the tag SHOULD be canonicalized as described in Section 4.5,
945 by ordering the various extension sequences into case-insensitive
946 ASCII order.
947
948 For example, if an extension were defined for the singleton 'r' and
949 it defined the subtags shown, then the following tag would be a valid
950 example: "en-Latn-GB-boont-r-extended-sequence-x-private".
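
   The extension rules lend themselves to a short, non-normative sketch
   in Python. The helper names below are hypothetical. The code assumes
   that the part of the tag before the first singleton is already
   well-formed, and it applies only rules 3, 5, 6, and 9; it does not
   check whether a singleton has actually been allocated (rule 2).

```python
def split_extensions(tag):
    """Return {singleton: [subtags]} for the extensions in a tag.

    Raises ValueError for a repeated singleton, a singleton with no
    subtags, or an extension subtag of the wrong shape.
    """
    subtags = tag.lower().split("-")
    extensions = {}
    current = None
    for sub in subtags[1:]:            # rule 1: skip the primary language
        if len(sub) == 1:
            if current is not None and not extensions[current]:
                raise ValueError(f"singleton {current!r} has no subtags")
            if sub == "x":             # private use sequence starts here
                break
            if sub in extensions:
                raise ValueError(f"singleton {sub!r} is repeated")
            extensions[sub] = []
            current = sub
        elif current is not None:
            if not (2 <= len(sub) <= 8 and sub.isascii() and sub.isalnum()):
                raise ValueError(f"bad extension subtag {sub!r}")
            extensions[current].append(sub)
    if current is not None and not extensions[current]:
        raise ValueError(f"singleton {current!r} has no subtags")
    return extensions


def canonical_extension_sequences(tag):
    """List the extension sequences in the order described by rule 9."""
    extensions = split_extensions(tag)
    return ["-".join([s] + extensions[s]) for s in sorted(extensions)]
```

   Because the tag is lowercased first, sorting the singletons gives
   the case-insensitive ASCII order that rule 9 asks for.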
951
2.2.7. Private Use Subtags
960
961 Private use subtags are used to indicate distinctions in language
962 that are important in a given context by private agreement. The
963 following rules apply to private use subtags:
964
965 1. Private use subtags are separated from the other subtags defined
966 in this document by the reserved single-character subtag 'x'.
967
968 2. Private use subtags MUST conform to the format and content
969 constraints defined in the ABNF for all subtags; that is, they
970 MUST consist solely of letters and digits and not exceed eight
971 characters in length.
972
973 3. Private use subtags MUST follow all primary language, extended
974 language, script, region, variant, and extension subtags in the
975 tag. Another way of saying this is that all subtags following
976 the singleton 'x' MUST be considered private use. Example: The
977 subtag 'US' in the tag "en-x-US" is a private use subtag.
978
979 4. A tag MAY consist entirely of private use subtags.
980
981 5. No source is defined for private use subtags. Use of private use
982 subtags is by private agreement only.
983
984 6. Private use subtags are NOT RECOMMENDED where alternatives exist
985 or for general interchange. See Section 4.6 for more information
986 on private use subtag choice.
987
988 For example, suppose a group of scholars is studying some texts in
989 medieval Greek. They might agree to use some collection of private
990 use subtags to identify different styles of writing in the texts.
991 For example, they might use 'el-x-koine' for documents in the
992 "common" style while using 'el-x-attic' for other documents that
993 mimic the Attic style. These subtags would not be recognized by
994 outside processes or systems, but might be useful in categorizing
995 various texts for study by those in the group.
996
997 In the registry, there are also subtags derived from codes reserved
998 by ISO 639, ISO 15924, or ISO 3166 for private use. Do not confuse
999 these with private use subtag sequences following the subtag 'x'.
1000 See Section 4.6.
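
   As a non-normative illustration of rules 1 through 4, the Python
   sketch below separates the private use sequence from the rest of a
   tag. The function name split_private_use is hypothetical, and the
   subtags before 'x' are returned without any further checking.

```python
def split_private_use(tag):
    """Split a tag into (other_subtags, private_use_subtags).

    Everything after the first singleton 'x' is private use (rule 3).
    """
    subtags = tag.split("-")
    for index, sub in enumerate(subtags):
        if sub.lower() == "x":
            private = subtags[index + 1:]
            if not private:
                raise ValueError("'x' must be followed by a subtag")
            for p in private:
                # Rule 2: letters and digits only, at most 8 characters.
                if not (1 <= len(p) <= 8 and p.isascii() and p.isalnum()):
                    raise ValueError(f"bad private use subtag {p!r}")
            return subtags[:index], private
    return subtags, []
```

   For "en-x-US" this yields (['en'], ['US']): the subtag 'US' is
   private use and is not interpreted as a region.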
1001
2.2.8. Grandfathered and Redundant Registrations
1003
1004 Prior to RFC 4646, whole language tags were registered according to
1005 the rules in RFC 1766 and/or RFC 3066. All of these registered tags
1006 remain valid as language tags.
1007
1015 Many of these registered tags were made redundant by the advent of
1016 either RFC 4646 or this document. A redundant tag is a grandfathered
1017 registration whose individual subtags appear with the same semantic
1018 meaning in the registry. For example, the tag "zh-Hant" (Traditional
1019 Chinese) can now be composed from the subtags 'zh' (Chinese) and
1020 'Hant' (Han script traditional variant). These redundant tags are
1021 maintained in the registry as records of type 'redundant', mostly as
1022 a matter of historical curiosity.
1023
1024 The remainder of the previously registered tags are "grandfathered".
1025 These tags are classified into two groups: 'regular' and 'irregular'.
1026
1027 Grandfathered tags that (appear to) match the 'langtag' production in
1028 Figure 1 are considered 'regular' grandfathered tags. These tags
1029 contain one or more subtags that either do not individually appear in
1030 the registry or appear but with a different semantic meaning: each
1031 tag, in its entirety, represents a language or collection of
1032 languages.
1033
1034 Grandfathered tags that do not match the 'langtag' production in the
1035 ABNF and would otherwise be invalid are considered 'irregular'
1036 grandfathered tags. With the exception of "en-GB-oed", which is a
1037 variant of "en-GB", each of them, in its entirety, represents a
1038 language.
1039
1040 Many of the grandfathered tags have been superseded by the subsequent
1041 addition of new subtags: each superseded record contains a
1042 'Preferred-Value' field that ought to be used to form language tags
1043 representing that value. For example, the tag "art-lojban" is
1044 superseded by the primary language subtag 'jbo'.
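
   A replacement of this kind amounts to a table lookup. The Python
   sketch below is illustrative only: the table holds just the one
   mapping mentioned above, and the names PREFERRED and
   replace_grandfathered are hypothetical. A real implementation would
   presumably load every 'Preferred-Value' from the registry.

```python
# Sample data only: a real implementation would load each
# 'Preferred-Value' from the registry instead of hard-coding it.
PREFERRED = {"art-lojban": "jbo"}


def replace_grandfathered(tag, preferred=PREFERRED):
    """Return the preferred replacement for a whole tag, if one is known."""
    return preferred.get(tag.lower(), tag)
```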
1045
2.2.9. Classes of Conformance
1047
1048 Implementations sometimes need to describe their capabilities with
1049 regard to the rules and practices described in this document. Tags
1050 can be checked or verified in a number of ways, but two particular
1051 classes of tag conformance are formally defined here.
1052
1053 A tag is considered "well-formed" if it conforms to the ABNF
1054 (Section 2.1). Language tags may be well-formed in terms of syntax
1055 but not valid in terms of content. However, many operations
1056 involving language tags work well without knowing anything about the
1057 meaning or validity of the subtags.
1058
1059 A tag is considered "valid" if it satisfies these conditions:
1060
1061 o The tag is well-formed.
1062
1071 o Either the tag is in the list of grandfathered tags or all of its
1072 primary language, extended language, script, region, and variant
1073 subtags appear in the IANA Language Subtag Registry as of the
1074 particular registry date.
1075
1076 o There are no duplicate variant subtags.
1077
1078 o There are no duplicate singleton (extension) subtags.
1079
1080 Note that a tag's validity depends on the date of the registry used
1081 to validate the tag. A more recent copy of the registry might
1082 contain a subtag that an older version does not.
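
   The conditions above can be sketched in Python as shown below. This
   is a simplified, non-normative example. The regular expression is an
   approximation of the 'langtag' production in Section 2.1; it does
   not handle grandfathered tags or tags that are entirely private use.
   The 'registry' argument is a hypothetical in-memory structure that
   maps a record type to a set of lowercase subtags, standing in for a
   dated copy of the IANA registry.

```python
import re

# Approximation of 'langtag' (Section 2.1), applied to a lowercased tag.
_LANGTAG = re.compile(
    r"(?P<language>[a-z]{2,8})"
    r"(?P<extlang>(?:-[a-z]{3}){0,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variant>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)"
    r"(?P<extension>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)"
    r"(?:-x(?:-[a-z0-9]{1,8})+)?"
)


def is_valid(tag, registry):
    """Check the 'valid' conditions against a registry snapshot."""
    m = _LANGTAG.fullmatch(tag.lower())
    if m is None:
        return False                      # not well-formed
    found = {
        "language": [m["language"]],
        "extlang": m["extlang"].split("-")[1:],
        "script": [m["script"]] if m["script"] else [],
        "region": [m["region"]] if m["region"] else [],
        "variant": m["variant"].split("-")[1:],
    }
    # Every language, extlang, script, region, and variant subtag
    # has to appear in the registry snapshot.
    for kind, subtags in found.items():
        if any(s not in registry.get(kind, set()) for s in subtags):
            return False
    # No duplicate variant subtags.
    if len(set(found["variant"])) != len(found["variant"]):
        return False
    # No duplicate singleton (extension) subtags.
    singletons = re.findall(r"-([0-9a-wy-z])(?=-)", m["extension"])
    return len(set(singletons)) == len(singletons)
```

   Passing a different registry snapshot can change the result for the
   same tag, which mirrors the dependence on the registry date.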
1083
1084 A tag is considered valid for a given extension (Section 3.7) (as of
1085 a particular version, revision, and date) if it meets the criteria
1086 for "valid" above and also satisfies this condition:
1087
1088 Each subtag used in the extension part of the tag is valid
1089 according to the extension.
1090
1091 Older specifications or language tag implementations sometimes
1092 reference [RFC3066]. A wider array of tags was considered well-
1093 formed under that document. Any tags that were valid for use under
1094 RFC 3066 are both well-formed and valid under this document's syntax;
1095 only invalid or illegal tags were well-formed under the earlier
1096 definition but no longer are. The language tag syntax under RFC 3066
1097 was:
1098
   obs-language-tag = primary-subtag *( "-" subtag )
   primary-subtag   = 1*8ALPHA
   subtag           = 1*8(ALPHA / DIGIT)
1102
1103 Figure 2: RFC 3066 Language Tag Syntax
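
   For implementations that still need to recognize the older syntax,
   the production in Figure 2 translates directly into a regular
   expression. The Python sketch below is non-normative, and the
   function name is hypothetical.

```python
import re

# obs-language-tag = primary-subtag *( "-" subtag )
# primary-subtag   = 1*8ALPHA
# subtag           = 1*8(ALPHA / DIGIT)
_OBS_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_well_formed(tag):
    """True if the tag matches the RFC 3066 syntax in Figure 2."""
    return _OBS_TAG.fullmatch(tag) is not None
```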
1104
   Subtags designated for private use as well as private use sequences
   introduced by the 'x' subtag are available for cases in which no
   assigned subtags are available and registration is not a suitable
   option. For example, one might use a tag such as "no-QQ", where 'QQ'
   is one of a range of private use ISO 3166-1 codes, to indicate an
   otherwise undefined region. Users MUST NOT assign language tags that
   use subtags that do not appear in the registry other than in private
   use sequences (such as the subtag 'personal' in the tag
   "en-x-personal"). Besides not being valid, such tags risk collision
   with future assignments or registrations.
1115
1116 Note well: although the 'Language-Tag' production appearing in this
1117 document is functionally equivalent to the one in [RFC4646], it has
1127 been changed to prevent certain errors in well-formedness arising
1128 from the old 'grandfathered' production.
1129
3. Registry Format and Maintenance
1131
   The IANA Language Subtag Registry ("the registry") contains a
   comprehensive list of all of the subtags valid in language tags.
   This gives implementers a straightforward and reliable way to
   validate language tags. The registry will be maintained so that,
   except for extension subtags, it is possible to validate all of the
   subtags that appear in a language tag under the provisions of this
   document or its revisions or successors. In addition, the meaning of
   the various subtags will be unambiguous and stable over time. (The
   meaning of private use subtags, of course, is not defined by the
   registry.)
1142
1143 This section defines the registry along with the maintenance and
1144 update procedures associated with it, as well as a registry for
1145 extensions to language tags (Section 3.7).
1146
3.1. Format of the IANA Language Subtag Registry
1148
1149 The IANA Language Subtag Registry is a machine-readable file in the
1150 format described in this section, plus copies of the registration
1151 forms approved in accordance with the process described in
1152 Section 3.5.
1153
1154 The existing registration forms for grandfathered and redundant tags
1155 taken from RFC 3066 have been maintained as part of the obsolete RFC
1156 3066 registry. The subtags added to the registry by either [RFC4645]
1157 or [RFC5645] do not have separate registration forms (so no forms are
1158 archived for these additions).
1159
3.1.1. File Format
1161
1162 The registry is a [Unicode] text file and consists of a series of
1163 records in a format based on "record-jar" (described in
1164 [record-jar]). Each record, in turn, consists of a series of fields
1165 that describe the various subtags and tags. The actual registry file
1166 is encoded using the UTF-8 [RFC3629] character encoding.
1167
1168 Each field can be considered a single, logical line of characters.
1169 Each field contains a "field-name" and a "field-body". These are
1170 separated by a "field-separator". The field-separator is a COLON
1171 character (U+003A) plus any surrounding whitespace. Each field is
1172 terminated by the newline sequence CRLF. The text in each field MUST
1173 be in Unicode Normalization Form C (NFC).
1174
1183 A collection of fields forms a "record". Records are separated by
1184 lines containing only the sequence "%%" (U+0025 U+0025).
1185
1186 Although fields are logically a single line of text, each line of
1187 text in the file format is limited to 72 bytes in length. To
1188 accommodate this, the field-body can be split into a multiple-line
1189 representation; this is called "folding". Folding is done according
1190 to customary conventions for line-wrapping. This is typically on
1191 whitespace boundaries, but can occur between other characters when
1192 the value does not include spaces, such as when a language does not
1193 use whitespace between words. In any event, there MUST NOT be breaks
1194 inside a multibyte UTF-8 sequence or in the middle of a combining
1195 character sequence. For more information, see [UAX14].
1196
1197 Although the file format uses the Unicode character set and the file
1198 itself is encoded using the UTF-8 encoding, fields are restricted to
1199 the printable characters from the US-ASCII [ISO646] repertoire unless
1200 otherwise indicated in the description of a specific field
1201 (Section 3.1.2).
1202
1203 The format of the registry is described by the following ABNF
1204 [RFC5234]. Character numbers (code points) are taken from Unicode,
1205 and terminals in the ABNF productions are in terms of characters
1206 rather than bytes.
1207
   registry   = record *("%%" CRLF record)
   record     = 1*field
   field      = ( field-name field-sep field-body CRLF )
   field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
   field-sep  = *SP ":" *SP
   field-body = *([[*SP CRLF] 1*SP] 1*CHARS)
   CHARS      = (%x21-10FFFF)      ; Unicode code points
1215
1216 Figure 3: Registry Format ABNF
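
   A reader for this format can be quite small. The Python sketch
   below is non-normative and makes two simplifying assumptions that
   are not stated in this document: it accepts a bare LF as well as
   CRLF, and it unfolds a continuation line by joining it to the
   previous line with a single space. The function name parse_registry
   is hypothetical.

```python
def parse_registry(text):
    """Parse record-jar style text into a list of records.

    Each record is a list of (field-name, field-body) pairs, in the
    order in which the fields appear.
    """
    records, fields = [], []
    for line in text.replace("\r\n", "\n").split("\n"):
        if line == "%%":
            # Record separator: close the current record.
            records.append(fields)
            fields = []
        elif line[:1] == " " and fields:
            # Folded continuation line: append to the previous body.
            name, body = fields[-1]
            fields[-1] = (name, body + " " + line.strip())
        elif line.strip():
            # field-name, field-sep (":" plus spaces), field-body
            name, _, body = line.partition(":")
            fields.append((name.strip(), body.strip()))
    if fields:
        records.append(fields)
    return records
```

   Keeping each record as a list of pairs, rather than a dictionary,
   preserves fields that can legitimately occur more than once.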
1217
1218 The sequence '..' (U+002E U+002E) in a field-body denotes a range of
1219 values. Such a range represents all subtags of the same length that
1220 are in alphabetic or numeric order within that range, including the
1221 values explicitly mentioned. For example, 'a..c' denotes the values
1222 'a', 'b', and 'c', and '11..13' denotes the values '11', '12', and
1223 '13'.
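
   Range expansion can be sketched in Python as follows. The example
   is non-normative and the function name expand_range is
   hypothetical. It assumes that both endpoints are either all digits
   or all letters, and it reproduces the letter case of the first
   endpoint in the generated values.

```python
from itertools import product
from string import ascii_lowercase


def expand_range(body):
    """Expand a 'first..last' field-body into the list of values."""
    first, last = body.split("..")
    if len(first) != len(last):
        raise ValueError("range endpoints must have the same length")
    if first.isdigit() and last.isdigit():
        # Numeric order, zero-padded to the original length.
        return [str(n).zfill(len(first))
                for n in range(int(first), int(last) + 1)]

    def recase(value):
        # Copy the upper/lower pattern of the first endpoint.
        return "".join(c.upper() if t.isupper() else c
                       for c, t in zip(value, first))

    low, high = first.lower(), last.lower()
    values = []
    # Enumerate all letter strings of this length in alphabetic order.
    for letters in product(ascii_lowercase, repeat=len(first)):
        candidate = "".join(letters)
        if low <= candidate <= high:
            values.append(recase(candidate))
    return values
```

   Enumerating every string of the given length is wasteful but easy
   to check by eye; subtags are short enough that this remains cheap.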
1224
1225 All fields whose field-body contains a date value use the "full-date"
1226 format specified in [RFC3339]. For example, "2004-06-28" represents
1227 June 28, 2004, in the Gregorian calendar.
1228
3.1.2. Record and Field Definitions
1240
1241 There are three types of records in the registry: "File-Date",
1242 "Subtag", and "Tag".
1243
1244 The first record in the registry is always the "File-Date" record.
1245 This record occurs only once in the file and contains a single field
1246 whose field-name is "File-Date". The field-body of this record
1247 contains a date (see Section 5.1), making it possible to easily
1248 recognize different versions of the registry.
1249
   File-Date: 2004-06-28
   %%
1252
1253 Figure 4: Example of the File-Date Record
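
   Reading the File-Date record is straightforward once the record has
   been split into fields. In the non-normative Python sketch below, a
   record is assumed to be a list of (field-name, field-body) pairs;
   that representation and the function name parse_file_date are
   choices made for this example only.

```python
from datetime import date


def parse_file_date(record):
    """Return the registry date from the first record of the file."""
    if len(record) != 1 or record[0][0] != "File-Date":
        raise ValueError("first record must contain only 'File-Date'")
    # The field-body uses the RFC 3339 "full-date" form, YYYY-MM-DD.
    return date.fromisoformat(record[0][1])
```

   Since the result is a date object, two copies of the registry can
   be compared directly to see which one is more recent.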
1254
1255 Subsequent records contain multiple fields and represent information
1256 about either subtags or tags. Both types of records have an
1257 identical structure, except that "Subtag" records contain a field
1258 with a field-name of "Subtag", while, unsurprisingly, "Tag" records
1259 contain a field with a field-name of "Tag". Field-names MUST NOT
1260 occur more than once per record, with the exception of the
1261 'Description', 'Comments', and 'Prefix' fields.
1262
1263 Each record MUST contain at least one of each of the following
1264 fields:
1265
1266 o 'Type'
1267
1268 * Type's field-body MUST consist of one of the following strings:
1269 "language", "extlang", "script", "region", "variant",
1270 "grandfathered", and "redundant"; it denotes the type of tag or
1271 subtag.
1272
1273 o Either 'Subtag' or 'Tag'
1274
1275 * Subtag's field-body contains the subtag being defined. This
1276 field MUST appear in all records whose 'Type' has one of these
1277 values: "language", "extlang", "script", "region", or
1278 "variant".
1279
1280 * Tag's field-body contains a complete language tag. This field
1281 MUST appear in all records whose 'Type' has one of these
1282 values: "grandfathered" or "redundant". If the 'Type' is
1283 "grandfathered", then the 'Tag' field-body will be one of the
1284 tags listed in either the 'regular' or 'irregular' production
1285 found in Section 2.1.
1286
1295 o 'Description'
1296
1297 * Description's field-body contains a non-normative description
1298 of the subtag or tag.
1299
1300 o 'Added'
1301
1302 * Added's field-body contains the date the record was registered
1303 or, in the case of grandfathered or redundant tags, the date
1304 the corresponding tag was registered under the rules of
1305 [RFC1766] or [RFC3066].
1306
1307 Each record MAY also contain the following fields:
1308
1309 o 'Deprecated'
1310
1311 * Deprecated's field-body contains the date the record was
1312 deprecated. In some cases, this value is earlier than that of
1313 the 'Added' field in the same record. That is, the date of
1314 deprecation preceded the addition of the record to the
1315 registry.
1316
1317 o 'Preferred-Value'
1318
1319 * Preferred-Value's field-body contains a canonical mapping from
1320 this record's value to a modern equivalent that is preferred in
1321 its place. Depending on the value of the 'Type' field, this
1322 value can take different forms:
1323
1324 + For fields of type 'language', 'Preferred-Value' contains
1325 the primary language subtag that is preferred when forming
1326 the language tag.
1327
1328 + For fields of type 'script', 'region', or 'variant',
1329 'Preferred-Value' contains the subtag of the same type that
1330 is preferred for forming the language tag.
1331
1332 + For fields of type 'extlang', 'grandfathered', or
1333 'redundant', 'Preferred-Value' contains an "extended
1334 language range" [RFC4647] that is preferred for forming the
1335 language tag. That is, the preferred language tag will
1336 contain, in order, each of the subtags that appears in the
1337 'Preferred-Value'; additional fields can be included in a
1338 language tag, as described elsewhere in this document. For
1339 example, the replacement for the grandfathered tag "zh-min-
1340 nan" (Min Nan Chinese) is "nan", which can be used as the
1351 basis for tags such as "nan-Hant" or "nan-TW" (note that the
1352 extended language subtag form such as "zh-nan-Hant" or "zh-
1353 nan-TW" can also be used).
1354
1355 o 'Prefix'
1356
1357 * Prefix's field-body contains a valid language tag that is
1358 RECOMMENDED as one possible prefix to this record's subtag.
1359 This field MAY appear in records whose 'Type' field-body is
1360 either 'extlang' or 'variant' (it MUST NOT appear in any other
1361 record type).
1362
1363 o 'Suppress-Script'
1364
1365 * Suppress-Script's field-body contains a script subtag that
1366 SHOULD NOT be used to form language tags with the associated
1367 primary or extended language subtag. This field MUST appear
1368 only in records whose 'Type' field-body is 'language' or
1369 'extlang'. See Section 4.1.
1370
1371 o 'Macrolanguage'
1372
1373 * Macrolanguage's field-body contains a primary language subtag
1374 defined by ISO 639 as the "macrolanguage" that encompasses this
1375 language subtag. This field MUST appear only in records whose
1376 'Type' field-body is either 'language' or 'extlang'.
1377
1378 o 'Scope'
1379
1380 * Scope's field-body contains information about a primary or
1381 extended language subtag indicating the type of language code
1382 according to ISO 639. The values permitted in this field are
1383 "macrolanguage", "collection", "special", and "private-use".
1384 This field only appears in records whose 'Type' field-body is
1385 either 'language' or 'extlang'. When this field is omitted,
1386 the language is an individual language.
1387
1388 o 'Comments'
1389
1390 * Comments's field-body contains additional information about the
1391 subtag, as deemed appropriate for understanding the registry
1392 and implementing language tags using the subtag or tag.
1393
1394 Future versions of this document might add additional fields to the
1395 registry; implementations SHOULD ignore fields found in the registry
1396 that are not defined in this document.
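
   The structural requirements in this section can be checked
   mechanically. The Python sketch below is non-normative and covers
   only which fields are present, repeated, or misplaced; it does not
   examine field-body contents other than 'Type'. A record is assumed
   to be a list of (field-name, field-body) pairs, and all names in
   the code are hypothetical. Field names that the sketch does not
   know are not rejected.

```python
SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}
REPEATABLE = {"Description", "Comments", "Prefix"}
LANGUAGE_ONLY = {"Suppress-Script", "Macrolanguage", "Scope"}


def check_record(fields):
    """Return a set of problems found in one Subtag or Tag record."""
    names = [name for name, _ in fields]
    rtype = dict(fields).get("Type")
    problems = set()
    for name in ("Type", "Description", "Added"):
        if name not in names:
            problems.add("missing " + name)
    for name in names:
        if names.count(name) > 1 and name not in REPEATABLE:
            problems.add("repeated " + name)
    if rtype in SUBTAG_TYPES:
        if "Subtag" not in names:
            problems.add("missing Subtag")
    elif rtype in TAG_TYPES:
        if "Tag" not in names:
            problems.add("missing Tag")
    elif rtype is not None:
        problems.add("unknown Type")
    if "Prefix" in names and rtype not in ("extlang", "variant"):
        problems.add("misplaced Prefix")
    for name in LANGUAGE_ONLY:
        if name in names and rtype not in ("language", "extlang"):
            problems.add("misplaced " + name)
    return problems
```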
1397
3.1.3. Type Field
1408
1409 The field 'Type' contains the string identifying the record type in
1410 which it appears. Values for the 'Type' field-body are: "language"
1411 (Section 2.2.1); "extlang" (Section 2.2.2); "script" (Section 2.2.3);
1412 "region" (Section 2.2.4); "variant" (Section 2.2.5); "grandfathered"
1413 or "redundant" (Section 2.2.8).
1414
3.1.4. Subtag and Tag Fields
1416
1417 The field 'Subtag' contains the subtag defined in the record. The
1418 field 'Tag' appears in records whose 'Type' is either 'grandfathered'
1419 or 'redundant' and contains a tag registered under [RFC3066].
1420
1421 The 'Subtag' field-body MUST follow the casing conventions described
1422 in Section 2.1.1. All subtags use lowercase letters in the field-
1423 body, with two exceptions:
1424
1425 Subtags whose 'Type' field is 'script' (in other words, subtags
1426 defined by ISO 15924) MUST use titlecase.
1427
1428 Subtags whose 'Type' field is 'region' (in other words, the non-
1429 numeric region subtags defined by ISO 3166-1) MUST use all
1430 uppercase.
1431
1432 The 'Tag' field-body MUST be formatted according to the rules
1433 described in Section 2.1.1.
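
   The casing conventions for the 'Subtag' field can be summarized in
   a few lines of Python. This is a non-normative sketch; the function
   name format_subtag is hypothetical, and 'record_type' is the value
   of the record's 'Type' field.

```python
def format_subtag(subtag, record_type):
    """Apply the registry casing convention for a 'Subtag' field-body."""
    if record_type == "script":
        return subtag.capitalize()   # titlecase, for example 'Latn'
    if record_type == "region":
        return subtag.upper()        # digits are left unchanged
    return subtag.lower()
```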
1434
3.1.5. Description Field
1436
1437 The field 'Description' contains a description of the tag or subtag
1438 in the record. The 'Description' field MAY appear more than once per
1439 record. The 'Description' field MAY include the full range of
1440 Unicode characters. At least one of the 'Description' fields MUST be
1441 written or transcribed into the Latin script; additional
1442 'Description' fields MAY be in any script or language.
1443
1444 The 'Description' field is used for identification purposes.
1445 Descriptions SHOULD contain all and only that information necessary
1446 to distinguish one subtag from others with which it might be
1447 confused. They are not intended to provide general background
1448 information or to provide all possible alternate names or
1449 designations. 'Description' fields don't necessarily represent the
1450 actual native name of the item in the record, nor are any of the
1451 descriptions guaranteed to be in any particular language (such as
1452 English or French, for example).
1453
1463 Descriptions in the registry that correspond to ISO 639, ISO 15924,
1464 ISO 3166-1, or UN M.49 codes are intended only to indicate the
1465 meaning of that identifier as defined in the source standard at the
1466 time it was added to the registry or as subsequently modified, within
1467 the bounds of the stability rules (Section 3.4), via subsequent
1468 registration. The 'Description' does not replace the content of the
1469 source standard itself. 'Description' fields are not intended to be
1470 the localized English names for the subtags. Localization or
1471 translation of language tag and subtag descriptions is out of scope
1472 of this document.
1473
1474 For subtags taken from a source standard (such as ISO 639 or ISO
1475 15924), the 'Description' fields in the record are also initially
1476 taken from that source standard. Multiple descriptions in the source
1477 standard are split into separate 'Description' fields. The source
1478 standard's descriptions MAY be edited or modified, either prior to
1479 insertion or via the registration process, and additional or
1480 extraneous descriptions omitted or removed. Each 'Description' field
1481 MUST be unique within the record in which it appears, and formatting
1482 variations of the same description SHOULD NOT occur in that specific
1483 record. For example, while the ISO 639-1 code 'fy' has both the
1484 description "Western Frisian" and the description "Frisian, Western"
1485 in that standard, only one of these descriptions appears in the
1486 registry.
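
   As an illustration only, the following Python sketch flags
   formatting variants of a description within a single record.  The
   normalization used here (case folding, whitespace collapsing, and
   undoing one "Head, Modifier" inversion) is an assumption made for
   this example; the registry defines no such algorithm, and the
   function names are hypothetical.

```python
def normalize_description(description: str) -> str:
    """Fold case and whitespace, and undo one 'Head, Modifier' inversion."""
    text = " ".join(description.split()).casefold()
    if text.count(",") == 1:
        head, modifier = (part.strip() for part in text.split(","))
        text = f"{modifier} {head}"
    return text


def formatting_variants(descriptions: list[str]) -> list[tuple[str, str]]:
    """Return pairs of descriptions that normalize to the same string."""
    seen: dict[str, str] = {}
    clashes = []
    for description in descriptions:
        key = normalize_description(description)
        if key in seen:
            clashes.append((seen[key], description))
        else:
            seen[key] = description
    return clashes


# The two ISO 639-1 descriptions for 'fy' collapse to one normalized form.
clashes = formatting_variants(["Western Frisian", "Frisian, Western"])
```

   A reviewer could treat any pair reported this way as a candidate
   for omission; the heuristic can produce false positives and is not
   a substitute for human judgment.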
1487
1488 To help ensure that users do not become confused about which subtag
1489 to use, 'Description' fields assigned to a record of any specific
1490 type ('language', 'extlang', 'script', and so on) MUST be unique
1491 within that given record type with the following exception: if a
1492 particular 'Description' field occurs in multiple records of a given
1493 type, then at most one of the records can omit the 'Deprecated'
1494 field. All deprecated records that share a 'Description' MUST have
1495 the same 'Preferred-Value', and all non-deprecated records MUST be
1496 that 'Preferred-Value'. This means that two records of the same type
1497 that share a 'Description' are also semantically equivalent and no
1498 more than one record with a given 'Description' is preferred for that
1499 meaning.
1500
1501 For example, consider the 'language' subtags 'zza' (Zaza) and 'diq'
   (Dimli). It so happens that 'zza' is a macrolanguage encompassing 'diq'
1503 and thus also has a description in ISO 639-3 of "Dimli". This
1504 description was edited to read "Dimli (macrolanguage)" in the
1505 registry record for 'zza' to prevent a collision.
1506
1507 By contrast, the subtags 'he' and 'iw' share a 'Description' value of
1508 "Hebrew"; this is permitted because 'iw' is deprecated and its
1509 'Preferred-Value' is 'he'.
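
   The sharing rule above can be checked mechanically.  The Python
   sketch below is one possible reading of that rule, not a normative
   procedure.  It assumes that each record is a dict whose
   'Description' value is a list of strings; the function name is
   hypothetical.

```python
from collections import defaultdict


def description_conflicts(records: list[dict]) -> list[str]:
    """Report (type, description) groups that break the sharing rule."""
    groups = defaultdict(list)
    for record in records:
        for description in record["Description"]:
            groups[(record["Type"], description)].append(record)

    problems = []
    for (record_type, description), group in groups.items():
        if len(group) < 2:
            continue  # a description used once cannot collide
        active = [r for r in group if "Deprecated" not in r]
        deprecated = [r for r in group if "Deprecated" in r]
        targets = {r.get("Preferred-Value") for r in deprecated}
        ok = (
            len(active) <= 1              # at most one non-deprecated record
            and len(targets) == 1         # all deprecated records agree
            and None not in targets       # and actually name a target
            and all(r["Subtag"] == next(iter(targets)) for r in active)
        )
        if not ok:
            problems.append(f"{record_type}: {description!r}")
    return problems


# Permitted: 'iw' is deprecated and its Preferred-Value is 'he'.
hebrew = [
    {"Type": "language", "Subtag": "he", "Description": ["Hebrew"]},
    {"Type": "language", "Subtag": "iw", "Description": ["Hebrew"],
     "Deprecated": "1989-01-01", "Preferred-Value": "he"},
]
# Not permitted: the unedited ISO 639-3 descriptions would collide.
dimli = [
    {"Type": "language", "Subtag": "zza", "Description": ["Zaza", "Dimli"]},
    {"Type": "language", "Subtag": "diq", "Description": ["Dimli"]},
]
```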
1510
   For records of type 'language', the first 'Description' field
1520 appearing in the registry corresponds whenever possible to the
1521 Reference Name assigned by ISO 639-3. This helps facilitate cross-
1522 referencing between ISO 639 and the registry.
1523
1524 When creating or updating a record due to the action of one of the
1525 source standards, the Language Subtag Reviewer MAY edit descriptions
1526 to correct irregularities in formatting (such as misspellings,
1527 inappropriate apostrophes or other punctuation, or excessive or
1528 missing spaces) prior to submitting the proposed record to the
1529 ietf-languages@iana.org list for consideration.
1530
15313.1.6. Deprecated Field
1532
1533 The field 'Deprecated' contains the date the record was deprecated
1534 and MAY be added, changed, or removed from any record via the
1535 maintenance process described in Section 3.3 or via the registration
1536 process described in Section 3.5. Usually, the addition of a
1537 'Deprecated' field is due to the action of one of the standards
1538 bodies, such as ISO 3166, withdrawing a code. Although valid in
1539 language tags, subtags and tags with a 'Deprecated' field are
1540 deprecated, and validating processors SHOULD NOT generate these
1541 subtags. Note that a record that contains a 'Deprecated' field and
1542 no corresponding 'Preferred-Value' field has no replacement mapping.
1543
1544 In some historical cases, it might not have been possible to
1545 reconstruct the original deprecation date. For these cases, an
1546 approximate date appears in the registry. Some subtags and some
1547 grandfathered or redundant tags were deprecated before the initial
1548 creation of the registry. The exact rules for this appear in Section
1549 2 of [RFC4645]. Note that these records have a 'Deprecated' field
   with an earlier date than the corresponding 'Added' field.
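
   A processor reading these fields might proceed roughly as in the
   Python sketch below.  The record text is hand-copied and believed
   to reflect the registry entry for 'iw'; the values should be
   verified against the current registry.  The parser is a deliberate
   simplification that ignores folded lines and repeated fields.

```python
from datetime import date


def parse_record(text: str) -> dict[str, str]:
    """Parse 'Field: value' lines into a dict (single-valued fields only)."""
    record = {}
    for line in text.strip().splitlines():
        name, _, body = line.partition(":")
        record[name.strip()] = body.strip()
    return record


# Assumed values; check them against the live registry before relying on them.
IW_RECORD = """
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
"""

record = parse_record(IW_RECORD)
added = date.fromisoformat(record["Added"])
deprecated = date.fromisoformat(record["Deprecated"])

# A deprecated record without 'Preferred-Value' has no replacement mapping.
has_replacement = "Preferred-Value" in record
# Deprecation can predate the creation of the registry itself.
predates_registry = deprecated < added
```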
1551
15523.1.7. Preferred-Value Field
1553
1554 The field 'Preferred-Value' contains a mapping between the record in
1555 which it appears and another tag or subtag (depending on the record's
1556 'Type'). The value in this field is used for canonicalization (see
1557 Section 4.5). In cases where the subtag or tag also has a
1558 'Deprecated' field, then the 'Preferred-Value' is RECOMMENDED as the
1559 best choice to represent the value of this record when selecting a
1560 language tag.
1561
1562 Records containing a 'Preferred-Value' fall into one of these four
1563 groups:
1564
1575 1. ISO 639 language codes that were later withdrawn in favor of
1576 other codes. These values are mostly a historical curiosity.
1577 The 'he'/'iw' pairing above is an example of this.
1578
1579 2. Subtags (with types other than language or extlang) taken from
1580 codes or values that have been withdrawn in favor of a new code.
1581 In particular, this applies to region subtags taken from ISO
1582 3166-1, because sometimes a country will change its name or
1583 administration in such a way that warrants a new region code. In
1584 some cases, countries have reverted to an older name, which might
1585 already be encoded. For example, the subtag 'ZR' (Zaire) was
1586 replaced by the subtag 'CD' (Democratic Republic of the Congo)
1587 when that country's name was changed.
1588
1589 3. Tags or subtags that have become obsolete because the values they
1590 represent were later encoded. Many of the grandfathered or
1591 redundant tags were later encoded by ISO 639, for example, and
1592 fall into this grouping. For example, "i-klingon" was deprecated
1593 when the subtag 'tlh' was added. The record for "i-klingon" has
1594 a 'Preferred-Value' of 'tlh'.
1595
1596 4. Extended language subtags always have a mapping to their
1597 identical primary language subtag. For example, the extended
1598 language subtag 'yue' (Cantonese) can be used to form the tag
1599 "zh-yue". It has a 'Preferred-Value' mapping to the primary
1600 language subtag 'yue', meaning that a tag such as
1601 "zh-yue-Hant-HK" can be canonicalized to "yue-Hant-HK".
1602
1603 Records other than those of type 'extlang' that contain a 'Preferred-
1604 Value' field MUST also have a 'Deprecated' field. This field
1605 contains the date on which the tag or subtag was deprecated in favor
1606 of the preferred value.
1607
1608 For records of type 'extlang', the 'Preferred-Value' field appears
1609 without a corresponding 'Deprecated' field. An implementation MAY
1610 ignore these preferred value mappings, although if it ignores the
1611 mapping, it SHOULD do so consistently. It SHOULD also treat the
1612 'Preferred-Value' as equivalent to the mapped item. For example, the
1613 tags "zh-yue-Hant-HK" and "yue-Hant-HK" are semantically equivalent
1614 and ought to be treated as if they were the same tag.
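
   The extlang mapping can be sketched in Python as follows.  The
   table is a hand-written excerpt, and the function names are
   hypothetical.  The sketch assumes well-formed tags that are neither
   grandfathered nor private use, and it performs no other
   canonicalization step from Section 4.5.

```python
# Excerpt of 'extlang' records: subtag -> (Prefix, Preferred-Value).
EXTLANG = {
    "yue": ("zh", "yue"),
    "cmn": ("zh", "cmn"),
}


def canonicalize_extlang(tag: str) -> str:
    """Replace a 'language-extlang' pair with the extlang's Preferred-Value."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        entry = EXTLANG.get(subtags[1].lower())
        if entry and entry[0] == subtags[0].lower():
            return "-".join([entry[1]] + subtags[2:])
    return tag


def same_language(a: str, b: str) -> bool:
    """Treat tags as equal when their extlang-canonical forms agree."""
    return canonicalize_extlang(a).lower() == canonicalize_extlang(b).lower()
```

   An implementation that chooses to ignore the mapping would skip
   canonicalize_extlang() everywhere, but could still use a comparison
   such as same_language() so that both forms are treated alike.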
1615
1616 Occasionally, the deprecated code is preferred in certain contexts.
1617 For example, both "iw" and "he" can be used in the Java programming
1618 language, but "he" is converted on input to "iw", which is thus the
1619 canonical form in Java.
1620
1631 'Preferred-Value' mappings in records of type 'region' sometimes do
1632 not represent exactly the same meaning as the original value. There
1633 are many reasons for a country code to be changed, and the effect
1634 this has on the formation of language tags will depend on the nature
1635 of the change in question. For example, the region subtag 'YD'
1636 (Democratic Yemen) was deprecated in favor of the subtag 'YE' (Yemen)
1637 when those two countries unified in 1990.
1638
1639 A 'Preferred-Value' MAY be added to, changed, or removed from records
1640 according to the rules in Section 3.3. Addition, modification, or
1641 removal of a 'Preferred-Value' field in a record does not imply that
1642 content using the affected subtag needs to be retagged.
1643
1644 The 'Preferred-Value' fields in records of type "grandfathered" and
1645 "redundant" each contain an "extended language range" [RFC4647] that
1646 is strongly RECOMMENDED for use in place of the record's value. In
1647 many cases, these mappings were created via deprecation of the tags
1648 during the period before [RFC4646] was adopted. For example, the tag
1649 "no-nyn" was deprecated in favor of the ISO 639-1-defined language
1650 code 'nn'.
1651
1652 The 'Preferred-Value' field in subtag records of type "extlang" also
1653 contains an "extended language range". This allows the subtag to be
1654 deprecated in favor of either a single primary language subtag or a
1655 new language-extlang sequence.
1656
1657 Usually, the addition, removal, or change of a 'Preferred-Value'
1658 field for a subtag is done to reflect changes in one of the source
1659 standards. For example, if an ISO 3166-1 region code is deprecated
1660 in favor of another code, that SHOULD result in the addition of a
1661 'Preferred-Value' field.
1662
1663 Changes to one subtag can affect other subtags as well: when
1664 proposing changes to the registry, the Language Subtag Reviewer MUST
1665 review the registry for such effects and propose the necessary
1666 changes using the process in Section 3.5, although anyone MAY request
1667 such changes. For example:
1668
1669 Suppose that subtag 'XX' has a 'Preferred-Value' of 'YY'. If 'YY'
1670 later changes to have a 'Preferred-Value' of 'ZZ', then the
1671 'Preferred-Value' for 'XX' MUST also change to be 'ZZ'.
1672
1673 Suppose that a registered language subtag 'dialect' represents a
1674 language not yet available in any part of ISO 639. The later
1675 addition of a corresponding language code in ISO 639 SHOULD result
1676 in the addition of a 'Preferred-Value' for 'dialect'.
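
   The first example amounts to keeping every mapping pointed at its
   final target.  A minimal Python sketch follows, assuming that the
   'Preferred-Value' fields for a single record type have been loaded
   into a dict; 'XX', 'YY', and 'ZZ' are the placeholder subtags from
   the example, and the function names are hypothetical.

```python
def resolve(preferred: dict[str, str], subtag: str) -> str:
    """Follow Preferred-Value links until reaching a subtag with none."""
    seen = {subtag}
    while subtag in preferred:
        subtag = preferred[subtag]
        if subtag in seen:
            raise ValueError(f"Preferred-Value cycle at {subtag!r}")
        seen.add(subtag)
    return subtag


def flatten(preferred: dict[str, str]) -> dict[str, str]:
    """Rewrite every mapping to point directly at its final target."""
    return {source: resolve(preferred, source) for source in preferred}


# After 'YY' gains a Preferred-Value of 'ZZ', 'XX' must follow it.
updated = flatten({"XX": "YY", "YY": "ZZ"})
```

   A registry that already satisfies the rule is left unchanged by
   flatten(), so the function can also serve as a consistency check.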
1677
16873.1.8. Prefix Field
1688
1689 The field 'Prefix' contains a valid language tag that is RECOMMENDED
1690 as one possible prefix to this record's subtag, perhaps with other
1691 subtags. That is, when including an extended language or a variant
1692 subtag that has at least one 'Prefix' in a language tag, the
1693 resulting tag SHOULD match at least one of the subtag's 'Prefix'
1694 fields using the "Extended Filtering" algorithm (see [RFC4647]), and
1695 each of the subtags in that 'Prefix' SHOULD appear before the subtag
1696 itself.
1697
1698 The 'Prefix' field MUST appear exactly once in a record of type
1699 'extlang'. The 'Prefix' field MAY appear multiple times (or not at
1700 all) in records of type 'variant'. Additional fields of this type
1701 MAY be added to a 'variant' record via the registration process,
1702 provided the 'variant' record already has at least one 'Prefix'
1703 field.
1704
1705 Each 'Prefix' field indicates a particular sequence of subtags that
1706 form a meaningful tag with this subtag. For example, the extended
1707 language subtag 'cmn' (Mandarin Chinese) only makes sense with its
1708 prefix 'zh' (Chinese). Similarly, 'rozaj' (Resian, a dialect of
1709 Slovenian) would be appropriate when used with its prefix 'sl'
1710 (Slovenian), while tags such as "is-1994" are not appropriate (and
1711 probably not meaningful). Although the 'Prefix' for 'rozaj' is "sl",
1712 other subtags might appear between them. For example, the tag "sl-
1713 IT-rozaj" (Slovenian, Italy, Resian) matches the 'Prefix' "sl".
1714
1715 The 'Prefix' also indicates when variant subtags make sense when used
1716 together (many that otherwise share a 'Prefix' are mutually
1717 exclusive) and what the relative ordering of variants is supposed to
1718 be. For example, the variant '1994' (Standardized Resian
1719 orthography) has several 'Prefix' fields in the registry ("sl-rozaj",
1720 "sl-rozaj-biske", "sl-rozaj-njiva", "sl-rozaj-osojs", and "sl-rozaj-
1721 solba"). This indicates not only that '1994' is appropriate to use
1722 with each of these five Resian variant subtags ('rozaj', 'biske',
1723 'njiva', 'osojs', and 'solba'), but also that it SHOULD appear
1724 following any of these variants in a tag. Thus, the language tag
1725 ought to take the form "sl-rozaj-biske-1994", rather than "sl-1994-
1726 rozaj-biske" or "sl-rozaj-1994-biske".
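
   The 'Prefix' recommendation can be approximated in Python as shown
   below.  extended_filter_match() follows the Extended Filtering
   steps of [RFC4647], Section 3.3.2; prefix_recommended() is a
   hypothetical helper that applies the match only to the portion of
   the tag that precedes the subtag.  The sketch does not enforce the
   relative ordering of several variants, so it accepts
   "sl-rozaj-1994-biske" even though "sl-rozaj-biske-1994" is the
   intended form.

```python
def extended_filter_match(language_range: str, tag: str) -> bool:
    """Extended Filtering as described in RFC 4647, Section 3.3.2."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")
    if range_subtags[0] not in ("*", tag_subtags[0]):
        return False
    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1
        elif t >= len(tag_subtags):
            return False
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:  # singleton: stop skipping
            return False
        else:
            t += 1
    return True


def prefix_recommended(tag: str, subtag: str, prefixes: list[str]) -> bool:
    """True if some Prefix matches the part of the tag before `subtag`."""
    subtags = tag.lower().split("-")
    if subtag.lower() not in subtags[1:]:
        return False
    position = subtags.index(subtag.lower(), 1)
    head = "-".join(subtags[:position])
    return any(extended_filter_match(prefix, head) for prefix in prefixes)


# 'Prefix' values for the variant '1994', as listed in the text above.
PREFIXES_1994 = ["sl-rozaj", "sl-rozaj-biske", "sl-rozaj-njiva",
                 "sl-rozaj-osojs", "sl-rozaj-solba"]
```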
1727
1728 If a record includes no 'Prefix' field, a 'Prefix' field MUST NOT be
1729 added to the record at a later date. Otherwise, changes (additions,
1730 deletions, or modifications) to the set of 'Prefix' fields MAY be
1731 registered, as long as they strictly widen the range of language tags
1732 that are recommended. For example, a 'Prefix' with the value "be-
1733 Latn" (Belarusian, Latin script) could be replaced by the value "be"
1734 (Belarusian) but not by the value "ru-Latn" (Russian, Latin script)
1743 or the value "be-Latn-BY" (Belarusian, Latin script, Belarus), since
1744 these latter either change or narrow the range of suggested tags.
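
   Read together with the stability rules in Section 3.4, a
   replacement widens the range only when the new value is a leading
   portion of the old one.  The following Python sketch expresses that
   test; the function name is hypothetical.

```python
def widens(old_prefix: str, new_prefix: str) -> bool:
    """True if `new_prefix` is a proper leading portion of `old_prefix`."""
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    return len(new) < len(old) and old[: len(new)] == new


allowed = widens("be-Latn", "be")              # drops the script: wider
changed = widens("be-Latn", "ru-Latn")         # different language
narrowed = widens("be-Latn", "be-Latn-BY")     # adds a region: narrower
```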
1745
1746 The field-body of the 'Prefix' field MUST NOT conflict with any
1747 'Prefix' already registered for a given record. Such a conflict
1748 would occur when no valid tag could be constructed that would contain
1749 the prefix, such as when two subtags each have a 'Prefix' that
1750 contains the other subtag. For example, suppose that the subtag
1751 'avariant' has the prefix "es-bvariant". Then the subtag 'bvariant'
   cannot be assigned a prefix containing 'avariant', for that would require a
1753 tag of the form "es-avariant-bvariant-avariant", which would not be
1754 valid.
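
   A conservative check for the mutual containment described above
   might look like the Python sketch below.  It assumes the 'Prefix'
   fields of all variant records are available as a dict keyed by
   lowercase subtag; the function name is hypothetical.  The check
   flags any pair for review, even when another 'Prefix' on the same
   record would still allow a valid tag to be formed.

```python
def mutual_prefix_conflicts(prefixes: dict[str, list[str]]) -> set[frozenset[str]]:
    """Find pairs of variants whose Prefix fields each contain the other."""
    def requires(variant: str) -> set[str]:
        # Every subtag mentioned in any Prefix of `variant`.
        return {
            subtag
            for prefix in prefixes.get(variant, [])
            for subtag in prefix.lower().split("-")
        }

    conflicts = set()
    for variant in prefixes:
        for other in requires(variant):
            if other != variant and variant in requires(other):
                conflicts.add(frozenset((variant, other)))
    return conflicts


# The hypothetical registration from the example above.
proposed = {"avariant": ["es-bvariant"], "bvariant": ["es-avariant"]}
# Prefixes of the Resian variants do not depend on each other mutually.
resian = {"rozaj": ["sl"], "biske": ["sl-rozaj"],
          "1994": ["sl-rozaj", "sl-rozaj-biske"]}
```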
1755
17563.1.9. Suppress-Script Field
1757
1758 The field 'Suppress-Script' contains a script subtag (whose record
1759 appears in the registry). The field 'Suppress-Script' MUST appear
1760 only in records whose 'Type' field-body is either 'language' or
1761 'extlang'. This field MUST NOT appear more than one time in a
1762 record.
1763
1764 This field indicates a script used to write the overwhelming majority
1765 of documents for the given language. The subtag for such a script
1766 therefore adds no distinguishing information to a language tag and
1767 thus SHOULD NOT be used for most documents in that language.
1768 Omitting the script subtag indicated by this field helps ensure
1769 greater compatibility between the language tags generated according
1770 to the rules in this document and language tags and tag processors or
1771 consumers based on RFC 3066. For example, virtually all Icelandic
1772 documents are written in the Latin script, making the subtag 'Latn'
1773 redundant in the tag "is-Latn".
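
   A tag generator could apply this field as in the Python sketch
   below.  The table holds a single hand-written entry, and the
   function name is hypothetical.  The sketch assumes a well-formed
   tag and locates the script subtag as the first four-letter
   alphabetic subtag that precedes any singleton.

```python
# Excerpt: language subtag -> 'Suppress-Script' value.
SUPPRESS_SCRIPT = {"is": "Latn"}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag when it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
    if suppressed is None:
        return tag
    for i, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            break  # singleton: an extension or private use sequence follows
        if len(subtag) == 4 and subtag.isalpha():
            if subtag.lower() == suppressed.lower():
                del subtags[i]
            break  # at most one script subtag can occur
    return "-".join(subtags)
```

   A script other than the suppressed one carries real information
   and is left in place, as is any script on a language that has no
   'Suppress-Script' field.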
1774
1775 Many language subtag records do not have a 'Suppress-Script' field.
1776 The lack of a 'Suppress-Script' might indicate that the language is
1777 customarily written in more than one script or that the language is
1778 not customarily written at all. It might also mean that sufficient
1779 information was not available when the record was created and thus
1780 remains a candidate for future registration.
1781
17823.1.10. Macrolanguage Field
1783
1784 The field 'Macrolanguage' contains a primary language subtag (whose
1785 record appears in the registry). This field indicates a language
1786 that encompasses this subtag's language according to assignments made
1787 by ISO 639-3.
1788
1789 ISO 639-3 labels some languages in the registry as "macrolanguages".
1790 ISO 639-3 defines the term "macrolanguage" to mean "clusters of
1799 closely-related language varieties that [...] can be considered
1800 distinct individual languages, yet in certain usage contexts a single
1801 language identity for all is needed". These correspond to codes
1802 registered in ISO 639-2 as individual languages that were found to
1803 correspond to more than one language in ISO 639-3.
1804
1805 A language contained within a macrolanguage is called an "encompassed
1806 language". The record for each encompassed language contains a
1807 'Macrolanguage' field in the registry; the macrolanguages themselves
1808 are not specially marked. Note that some encompassed languages have
1809 ISO 639-1 or ISO 639-2 codes.
1810
1811 The 'Macrolanguage' field can only occur in records of type
1812 'language' or 'extlang'. Only values assigned by ISO 639-3 will be
1813 considered for inclusion. 'Macrolanguage' fields MAY be added or
1814 removed via the normal registration process whenever ISO 639-3
1815 defines new values or withdraws old values. Macrolanguages are
1816 informational, and MAY be removed or changed if ISO 639-3 changes the
1817 values. For more information on the use of this field and choosing
1818 between macrolanguage and encompassed language subtags, see
1819 Section 4.1.1.
1820
1821 For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn'
1822 (Norwegian Nynorsk) each have a 'Macrolanguage' field with a value of
1823 'no' (Norwegian). For more information, see Section 4.1.
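
   Because only encompassed languages carry the field, the set of
   languages belonging to a macrolanguage has to be derived by
   inverting it.  A minimal Python sketch follows; the table is a
   hand-written excerpt, and the function names are hypothetical.

```python
# Excerpt: encompassed language subtag -> 'Macrolanguage' value.
MACROLANGUAGE = {"nb": "no", "nn": "no"}


def encompassed_languages(macrolanguage: str) -> list[str]:
    """List subtags whose 'Macrolanguage' field names `macrolanguage`."""
    return sorted(s for s, m in MACROLANGUAGE.items() if m == macrolanguage)


def is_encompassed_by(subtag: str, macrolanguage: str) -> bool:
    """True if the record for `subtag` names `macrolanguage`."""
    return MACROLANGUAGE.get(subtag) == macrolanguage
```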
1824
18253.1.11. Scope Field
1826
1827 The field 'Scope' contains classification information about a primary
1828 or extended language subtag derived from ISO 639. Most languages
1829 have a scope of 'individual', which means that the language is not a
1830 macrolanguage, collection, special code, or private use. That is, it
1831 is what one would normally consider to be 'a language'. Any primary
1832 or extended language subtag that has no 'Scope' field is an
1833 individual language.
1834
1835 'Scope' information can sometimes be helpful in selecting language
1836 tags, since it indicates the purpose or "scope" of the code
1837 assignment within ISO 639. The available values are:
1838
1839 o 'macrolanguage' - Indicates a macrolanguage as defined by ISO
1840 639-3 (see Section 3.1.10). A macrolanguage is a cluster of
1841 closely related languages that are sometimes considered to be a
1842 single language.
1843
1844 o 'collection' - Indicates a subtag that represents a collection of
1845 languages, typically related by some type of historical,
1846 geographical, or linguistic association. Unlike a macrolanguage,
1855 a collection can contain languages that are only loosely related
1856 and a collection cannot be used interchangeably with languages
1857 that belong to it.
1858
1859 o 'special' - Indicates a special language code. These are subtags
1860 used for identifying linguistic attributes not particularly
1861 associated with a concrete language. These include codes for when
1862 the language is undetermined or for non-linguistic content.
1863
1864 o 'private-use' - Indicates a code reserved for private use in the
1865 underlying standard. Subtags with this scope can be used to
1866 indicate a primary language for which no ISO 639 or registered
1867 assignment exists.
1868
1869 The 'Scope' field MAY appear in records of type 'language' or
1870 'extlang'. Note that many of the prefixes for extended language
1871 subtags will have a 'Scope' of 'macrolanguage' (although some will
1872 not) and that many languages that have a 'Scope' of 'macrolanguage'
1873 will have extended language subtags associated with them.
1874
1875 The 'Scope' field MAY be added, modified, or removed via the
1876 registration process, provided the change mirrors changes made by ISO
1877 639 to the assignment's classification. Such a change is expected to
1878 be rare.
1879
1880 For example, the primary language subtag 'zh' (Chinese) has a 'Scope'
   of 'macrolanguage', while its encompassed language 'nan' (Min Nan
1882 Chinese) has a 'Scope' of 'individual'. The special value 'und'
1883 (Undetermined) has a 'Scope' of 'special'. The ISO 639-5 collection
1884 'gem' (Germanic languages) has a 'Scope' of 'collection'.
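
   The default for records without a 'Scope' field can be captured in
   a lookup such as the Python sketch below.  The table is a
   hand-written excerpt that lists only records carrying the field;
   the function name is hypothetical.

```python
# Excerpt: only records that carry a 'Scope' field are listed.
SCOPE = {"zh": "macrolanguage", "und": "special", "gem": "collection"}


def scope_of(subtag: str) -> str:
    """Return the record's 'Scope', defaulting to 'individual' when absent."""
    return SCOPE.get(subtag.lower(), "individual")
```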
1885
18863.1.12. Comments Field
1887
1888 The field 'Comments' contains additional information about the record
1889 and MAY appear more than once per record. The field-body MAY include
1890 the full range of Unicode characters and is not restricted to any
1891 particular script. This field MAY be inserted or changed via the
1892 registration process, and no guarantee of stability is provided.
1893
1894 The content of this field is not restricted, except by the need to
1895 register the information, the suitability of the request, and by
1896 reasonable practical size limitations. The primary reason for the
1897 'Comments' field is subtag identification -- to help distinguish the
1898 subtag from others with which it might be confused as an aid to
1899 usage. Large amounts of information about the use, history, or
1900 general background of a subtag are frowned upon, as these generally
1901 belong in a registration request rather than in the registry.
1902
19113.2. Language Subtag Reviewer
1912
1913 The Language Subtag Reviewer moderates the ietf-languages@iana.org
1914 mailing list, responds to requests for registration, and performs the
1915 other registry maintenance duties described in Section 3.3. Only the
1916 Language Subtag Reviewer is permitted to request IANA to change,
1917 update, or add records to the Language Subtag Registry. The Language
1918 Subtag Reviewer MAY delegate list moderation and other clerical
1919 duties as needed.
1920
1921 The Language Subtag Reviewer is appointed by the IESG for an
1922 indefinite term, subject to removal or replacement at the IESG's
1923 discretion. The IESG will solicit nominees for the position (upon
1924 adoption of this document or upon a vacancy) and then solicit
1925 feedback on the nominees' qualifications. Qualified candidates
1926 should be familiar with BCP 47 and its requirements; be willing to
1927 fairly, responsively, and judiciously administer the registration
1928 process; and be suitably informed about the issues of language
1929 identification so that the reviewer can assess the claims and draw
1930 upon the contributions of language experts and subtag requesters.
1931
1932 The subsequent performance or decisions of the Language Subtag
1933 Reviewer MAY be appealed to the IESG under the same rules as other
1934 IETF decisions (see [RFC2026]). The IESG can reverse or overturn the
1935 decisions of the Language Subtag Reviewer, provide guidance, or take
1936 other appropriate actions.
1937
19383.3. Maintenance of the Registry
1939
1940 Maintenance of the registry requires that, as codes are assigned or
1941 withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
1942 Subtag Reviewer MUST evaluate each change and determine the
1943 appropriate course of action according to the rules in this document.
1944 Such updates follow the registration process described in
1945 Section 3.5. Usually, the Language Subtag Reviewer will start the
1946 process for the new or updated record by filling in the registration
1947 form and submitting it. If a change to one of these standards takes
1948 place and the Language Subtag Reviewer does not do this in a timely
1949 manner, then any interested party MAY submit the form. Thereafter,
1950 the registration process continues normally.
1951
1952 Note that some registrations affect other subtags--perhaps more than
1953 one--as when a region subtag is being deprecated in favor of a new
1954 value. The Language Subtag Reviewer is responsible for ensuring that
1955 any such changes are properly registered, with each change requiring
1956 its own registration form.
1957
1967 The Language Subtag Reviewer MUST ensure that new subtags meet the
1968 requirements elsewhere in this document (and most especially in
1969 Section 3.4) or submit an appropriate registration form for an
1970 alternate subtag as described in that section. Each individual
1971 subtag affected by a change MUST be sent to the
1972 ietf-languages@iana.org list with its own registration form and in a
1973 separate message.
1974
19753.4. Stability of IANA Registry Entries
1976
1977 The stability of entries and their meaning in the registry is
1978 critical to the long-term stability of language tags. The rules in
1979 this section guarantee that a specific language tag's meaning is
1980 stable over time and will not change.
1981
1982 These rules specifically deal with how changes to codes (including
1983 withdrawal and deprecation of codes) maintained by ISO 639, ISO
1984 15924, ISO 3166, and UN M.49 are reflected in the IANA Language
1985 Subtag Registry. Assignments to the IANA Language Subtag Registry
   MUST adhere to the following stability rules:
1987
1988 1. Values in the fields 'Type', 'Subtag', 'Tag', and 'Added' MUST
1989 NOT be changed and are guaranteed to be stable over time.
1990
1991 2. Values in the fields 'Preferred-Value' and 'Deprecated' MAY be
1992 added, altered, or removed via the registration process. These
1993 changes SHOULD be limited to changes necessary to mirror changes
1994 in one of the underlying standards (ISO 639, ISO 15924, ISO
1995 3166-1, or UN M.49) and typically alteration or removal of a
1996 'Preferred-Value' is limited specifically to region codes.
1997
1998 3. Values in the 'Description' field MUST NOT be changed in a way
1999 that would invalidate any existing tags. The description MAY be
2000 broadened somewhat in scope, changed to add information, or
2001 adapted to the most common modern usage. For example, countries
2002 occasionally change their names; a historical example of this is
2003 "Upper Volta" changing to "Burkina Faso".
2004
2005 4. Values in the field 'Prefix' MAY be added to existing records of
2006 type 'variant' via the registration process, provided the
2007 'variant' already has at least one 'Prefix'. A 'Prefix' field
2008 SHALL NOT be registered for any 'variant' that has no existing
2009 'Prefix' field. If a prefix is added to a variant record,
        'Comments' fields MAY be used to explain different usages with
2011 the various prefixes.
2012
2023 5. Values in the field 'Prefix' in records of type 'variant' MAY
2024 also be modified, so long as the modifications broaden the set
2025 of prefixes. That is, a prefix MAY be replaced by one of its
2026 own prefixes. For example, the prefix "en-US" could be replaced
2027 by "en", but not by the prefixes "en-Latn", "fr", or "en-US-
2028 boont". If one of those prefix values were needed, it would
2029 have to be separately registered.
2030
2031 6. Values in the field 'Prefix' in records of type 'extlang' MUST
2032 NOT be added, modified, or removed.
2033
2034 7. The field 'Prefix' MUST NOT be removed from any record in which
2035 it appears. This field SHOULD be included in the initial
2036 registration of any records of type 'variant' and MUST be
2037 included in any records of type 'extlang'.
2038
   8.   The field 'Comments' MAY be added, changed, or removed via the
        registration process or any of the processes or considerations
        described in this section.
2042
2043 9. The field 'Suppress-Script' MAY be added or removed via the
2044 registration process.
2045
2046 10. The field 'Macrolanguage' MAY be added or removed via the
2047 registration process, but only in response to changes made by
2048 ISO 639. The 'Macrolanguage' field appears whenever a language
2049 has a corresponding macrolanguage in ISO 639. That is, the
2050 'Macrolanguage' fields in the registry exactly match those of
2051 ISO 639. No other macrolanguage mappings will be considered for
2052 registration.
2053
2054 11. The field 'Scope' MAY be added or removed from a primary or
2055 extended language subtag after initial registration, and it MAY
2056 be modified in order to match any changes made by ISO 639.
2057 Changes to the 'Scope' field MUST mirror changes made by ISO
2058 639. Note that primary or extended language subtags whose
2059 records do not contain a 'Scope' field (that is, most of them)
2060 are individual languages as described in Section 3.1.11.
2061
2062 12. Primary and extended language subtags (other than independently
2063 registered values created using the registration process) are
2064 created according to the assignments of the various parts of ISO
2065 639, as follows:
2066
2067 A. Codes assigned by ISO 639-1 that do not conflict with
2068 existing two-letter primary language subtags and that have
2069 no corresponding three-letter primary defined in the
2070 registry are entered into the IANA registry as new records
2079 of type 'language'. Note that languages given an ISO 639-1
2080 code cannot be given extended language subtags, even if
2081 encompassed by a macrolanguage.
2082
2083 B. Codes assigned by ISO 639-3 or ISO 639-5 that do not
2084 conflict with existing three-letter primary language subtags
2085 and that do not have ISO 639-1 codes assigned (or expected
2086 to be assigned) are entered into the IANA registry as new
2087 records of type 'language'. Note that these two standards
2088 now comprise a superset of ISO 639-2 codes. Codes that have
2089 a defined 'macrolanguage' mapping at the time of their
2090 registration MUST contain a 'Macrolanguage' field.
2091
2092 C. Codes assigned by ISO 639-3 MAY also be considered for an
2093 extended language subtag registration. Note that they MUST
2094 be assigned a primary language subtag record of type
2095 'language' even when an 'extlang' record is proposed. When
2096 considering extended language subtag assignment, these
2097 criteria apply:
2098
2099 1. If a language has a macrolanguage mapping, and that
2100 macrolanguage has other encompassed languages that are
2101 assigned extended language subtags, then the new
2102 language SHOULD have an 'extlang' record assigned to it
2103 as well. For example, any language with a macrolanguage
2104 of 'zh' or 'ar' would be assigned an 'extlang' record.
2105
2106 2. 'Extlang' records SHOULD NOT be created for languages if
2107 other languages encompassed by the macrolanguage do not
2108 also include 'extlang' records. For example, if a new
2109 Serbo-Croatian ('sh') language were registered, it would
2110 not get an extlang record because other languages
2111 encompassed, such as Serbian ('sr'), do not include one
2112 in the registry.
2113
2114 3. Sign languages SHOULD have an 'extlang' record with a
2115 'Prefix' of 'sgn'.
2116
2117 4. 'Extlang' records MUST NOT be created for items already
2118 in the registry. Extended language subtags will only be
2119 considered at the time of initial registration.
2120
2121 5. Extended language subtag records MUST include the fields
2122 'Prefix' and 'Preferred-Value' with field values
2123 assigned as described in Section 2.2.2.
2124
2125 D. Any other codes assigned by ISO 639-2 that do not conflict
2126 with existing three-letter primary or extended language
2135 subtags and that do not have ISO 639-1 two-letter codes
2136 assigned are entered into the IANA registry as new records
2137 of type 'language'. This type of registration is not
2138 supposed to occur in the future.
2139
2140 13. Codes assigned by ISO 15924 and ISO 3166-1 that do not conflict
2141 with existing subtags of the associated type and whose meaning
2142 is not the same as an existing subtag of the same type are
2143 entered into the IANA registry as new records.
2144
2145 14. Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that are
2146 withdrawn by their respective maintenance or registration
2147 authority remain valid in language tags. A 'Deprecated' field
2148 containing the date of withdrawal MUST be added to the record.
2149 If a new record of the same type is added that represents a
2150 replacement value, then a 'Preferred-Value' field MAY also be
2151 added. The registration process MAY be used to add comments
2152 about the withdrawal of the code by the respective standard.
2153
2154 For example: the region code 'TL' was assigned to the country
2155 'Timor-Leste', replacing the code 'TP' (which was assigned to
2156 'East Timor' when it was under administration by Portugal).
2157 The subtag 'TP' remains valid in language tags, but its
2158 record contains the 'Preferred-Value' of 'TL' and its field
2159 'Deprecated' contains the date the new code was assigned
2160 ('2004-07-06').
2161
2162 15. Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that
2163 conflict with existing subtags of the associated type, including
2164 subtags that are deprecated, MUST NOT be entered into the
2165 registry. The following additional considerations apply to
2166 subtag values that are reassigned:
2167
2168 A. For ISO 639 codes, if the newly assigned code's meaning is
2169 not represented by a subtag in the IANA registry, the
2170 Language Subtag Reviewer, as described in Section 3.5, SHALL
2171 prepare a proposal for entering in the IANA registry, as
2172 soon as practical, a registered language subtag as an
2173 alternate value for the new code. The form of the
2174 registered language subtag will be at the discretion of the
2175 Language Subtag Reviewer and MUST conform to other
2176 restrictions on language subtags in this document.
2177
2178 B. For all subtags whose meaning is derived from an external
2179 standard (that is, by ISO 639, ISO 15924, ISO 3166-1, or UN
2180 M.49), if a new meaning is assigned to an existing code and
2181 the new meaning broadens the meaning of that code, then the
2182 meaning for the associated subtag MAY be changed to match.
2191 The meaning of a subtag MUST NOT be narrowed, however, as
2192 this can result in an unknown proportion of the existing
2193 uses of a subtag becoming invalid. Note: the ISO 639
2194 registration authority (RA) has adopted a similar stability
2195 policy.
2196
2197 C. For ISO 15924 codes, if the newly assigned code's meaning is
2198 not represented by a subtag in the IANA registry, the
2199 Language Subtag Reviewer, as described in Section 3.5, SHALL
2200 prepare a proposal for entering in the IANA registry, as
2201 soon as practical, a registered variant subtag as an
2202 alternate value for the new code. The form of the
2203 registered variant subtag will be at the discretion of the
2204 Language Subtag Reviewer and MUST conform to other
2205 restrictions on variant subtags in this document.
2206
2207 D. For ISO 3166-1 codes, if the newly assigned code's meaning
2208 is associated with the same UN M.49 code as another 'region'
2209 subtag, then the existing region subtag remains as the
2210 preferred value for that region and no new entry is created.
2211 A comment MAY be added to the existing region subtag
2212 indicating the relationship to the new ISO 3166-1 code.
2213
2214 E. For ISO 3166-1 codes, if the newly assigned code's meaning
2215 is associated with a UN M.49 code that is not represented by
2216 an existing region subtag, then the Language Subtag
2217 Reviewer, as described in Section 3.5, SHALL prepare a
2218 proposal for entering the appropriate UN M.49 country code
2219 as an entry in the IANA registry.
2220
2221 F. For ISO 3166-1 codes, if there is no associated UN numeric
2222 code, then the Language Subtag Reviewer SHALL petition the
2223 UN to create one. If there is no response from the UN
2224 within 90 days of the request being sent, the Language
2225 Subtag Reviewer SHALL prepare a proposal for entering in the
2226 IANA registry, as soon as practical, a registered variant
2227 subtag as an alternate value for the new code. The form of
2228 the registered variant subtag will be at the discretion of
2229 the Language Subtag Reviewer and MUST conform to other
2230 restrictions on variant subtags in this document. This
2231 situation is very unlikely to ever occur.
2232
2233 16. UN M.49 has codes for both "countries and areas" (such as '276'
2234 for Germany) and "geographical regions and sub-regions" (such as
2235 '150' for Europe). UN M.49 country or area codes for which
2236 there is no corresponding ISO 3166-1 code MUST NOT be
2237 registered, except as a surrogate for an ISO 3166-1 code that is
2238 blocked from registration by an existing subtag.
2239
2247 If such a code becomes necessary, then the maintenance agency
2248 for ISO 3166-1 SHALL first be petitioned to assign a code to the
2249 region. If the petition for a code assignment by ISO 3166-1 is
2250 refused or not acted on in a timely manner, the registration
2251 process described in Section 3.5 can then be used to register
2252 the corresponding UN M.49 code. This way, UN M.49 codes remain
2253 available as the value of last resort in cases where ISO 3166-1
2254 reassigns a deprecated value in the registry.
2255
2256 17. The redundant and grandfathered entries together form the
2257 complete list of tags registered under [RFC3066]. The redundant
2258 tags are those previously registered tags that can now be formed
2259 using the subtags defined in the registry. The grandfathered
2260 entries include those that can never be legal because they are
2261 'irregular' (that is, they do not match the 'langtag' production
2262 in Figure 1), are limited by rule (subtags such as 'nyn' and
2263 'min' look like the extlang production, but cannot be registered
2264 as extended language subtags), or their subtags are
2265 inappropriate for registration. All of the grandfathered tags
2266 are listed in either the 'regular' or the 'irregular'
2267 productions in the ABNF. Under [RFC4646] it was possible for
2268 grandfathered tags to become redundant. However, all of the
2269 tags for which this was possible became redundant before this
2270 document was produced. So the set of redundant and
2271 grandfathered tags is now permanent and immutable: new entries
2272 of either type MUST NOT be added and existing entries MUST NOT
2273 be removed. The decision-making process about which tags were
2274 initially grandfathered and which were made redundant is
2275 described in [RFC4645].
2276
2277 Many of the grandfathered tags are deprecated -- indeed, they
2278 were deprecated even before [RFC4646]. For example, the tag
2279 "art-lojban" was deprecated in favor of the primary language
2280 subtag 'jbo'. These tags could have been made 'redundant' by
2281 registering some of their subtags as 'variants'. The 'variant-
2282 like' subtags in the grandfathered registrations SHALL NOT be
2283 registered in the future, even with a similar or identical
2284 meaning.
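
The treatment of withdrawn codes in item 14 can be illustrated with a short Python sketch. The 'TP' values below are taken from the example in that item; the dictionary layout and the function names are assumptions made for illustration and are not part of the registry format or of any library.

```python
# Hypothetical in-memory view of two 'region' records. Only the field
# values come from the 'TP'/'TL' example in item 14; the layout is assumed.
REGION_RECORDS = {
    "TL": {"Type": "region", "Subtag": "TL", "Description": ["Timor-Leste"]},
    "TP": {
        "Type": "region",
        "Subtag": "TP",
        "Description": ["East Timor"],
        "Deprecated": "2004-07-06",
        "Preferred-Value": "TL",
    },
}


def is_valid_region(subtag: str) -> bool:
    """A withdrawn code stays valid: deprecation never removes a record."""
    return subtag.upper() in REGION_RECORDS


def preferred_region(subtag: str) -> str:
    """Return the 'Preferred-Value' if the record has one, else the subtag."""
    record = REGION_RECORDS[subtag.upper()]
    return record.get("Preferred-Value", record["Subtag"])
```

In this sketch a tag using 'TP' is still accepted, while a canonicalizing application could substitute 'TL' by consulting `preferred_region`.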
2285
22863.5. Registration Procedure for Subtags
2287
2288 The procedure given here MUST be used by anyone who wants to use a
2289 subtag not currently in the IANA Language Subtag Registry or who
2290 wishes to add, modify, update, or remove information in existing
2291 records as permitted by this document.
2292
2293 Only subtags of type 'language' and 'variant' will be considered for
2294 independent registration of new subtags. Subtags needed for
2303 stability and subtags necessary to keep the registry synchronized
2304 with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
2305 defined by this document also use this process, as described in
2306 Section 3.3 and subject to stability provisions as described in
2307 Section 3.4.
2308
2309 Registration requests are accepted relating to information in the
2310 'Comments', 'Deprecated', 'Description', 'Prefix', 'Preferred-Value',
2311 'Macrolanguage', or 'Suppress-Script' fields in a subtag's record as
2312 described in Section 3.4. Changes to all other fields in the IANA
2313 registry are NOT permitted.
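
As an illustration of the restriction above, the following Python sketch reports which fields of a proposed record differ from the current record without being open to modification. The representation of a record as a dictionary and the name `disallowed_changes` are assumptions for this example only.

```python
# Fields that the paragraph above lists as open to registration requests.
MODIFIABLE_FIELDS = frozenset({
    "Comments", "Deprecated", "Description", "Prefix",
    "Preferred-Value", "Macrolanguage", "Suppress-Script",
})


def disallowed_changes(current: dict, proposed: dict) -> set:
    """Return the names of fields that differ but may not be changed."""
    changed = {
        name
        for name in current.keys() | proposed.keys()
        if current.get(name) != proposed.get(name)
    }
    return changed - MODIFIABLE_FIELDS
```

A request that only adds a 'Comments' field yields an empty set, whereas a request that alters 'Subtag' or 'Type' is flagged.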
2314
2315 Registering a new subtag or requesting modifications to an existing
2316 tag or subtag starts with the requester filling out the registration
2317 form reproduced below. Note that each response is not limited in
2318 size so that the request can adequately describe the registration.
2319 The fields in the "Record Requested" section need to follow the
2320 requirements in Section 3.1 before the record will be approved.
2321
2322 LANGUAGE SUBTAG REGISTRATION FORM
2323 1. Name of requester:
2324 2. E-mail address of requester:
2325 3. Record Requested:
2326
2327 Type:
2328 Subtag:
2329 Description:
2330 Prefix:
2331 Preferred-Value:
2332 Deprecated:
2333 Suppress-Script:
2334 Macrolanguage:
2335 Comments:
2336
2337 4. Intended meaning of the subtag:
2338 5. Reference to published description
2339 of the language (book or article):
2340 6. Any other relevant information:
2341
2342 Figure 5: The Language Subtag Registration Form
2343
2344 Examples of completed registration forms can be found in Appendix B.
2345 A complete list of approved registration forms is online through
2346 http://www.iana.org; readers should note that the Language Tag
2347 Registry is now obsolete and should instead look for the Language
2348 Subtag Registry.
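
For readers who prepare requests programmatically, the form in Figure 5 could be filled in along the following lines. This is a minimal Python sketch: the function name, the argument names, and the indentation of the record fields are assumptions, and the requester details in any call are placeholders.

```python
# Record fields in the order shown in Figure 5.
RECORD_FIELDS = (
    "Type", "Subtag", "Description", "Prefix", "Preferred-Value",
    "Deprecated", "Suppress-Script", "Macrolanguage", "Comments",
)


def render_form(requester, email, record, meaning, reference, other=""):
    """Return the registration form as plain text.

    'record' maps field names to a string or a list of strings; fields
    that are absent are still listed, with an empty value.
    """
    lines = [
        "LANGUAGE SUBTAG REGISTRATION FORM",
        f"1. Name of requester: {requester}",
        f"2. E-mail address of requester: {email}",
        "3. Record Requested:",
        "",
    ]
    for name in RECORD_FIELDS:
        values = record.get(name, "")
        if isinstance(values, str):
            values = [values]
        for value in values:
            lines.append(f"   {name}: {value}".rstrip())
    lines += [
        "",
        f"4. Intended meaning of the subtag: {meaning}",
        "5. Reference to published description",
        f"   of the language (book or article): {reference}",
        f"6. Any other relevant information: {other}".rstrip(),
    ]
    return "\n".join(lines)
```

Repeated fields such as 'Description' or 'Prefix' can be passed as lists, which produces one line per value.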
2349
2359 The subtag registration form MUST be sent to
2360 <ietf-languages@iana.org>. Registration requests receive a two-week
2361 review period before being approved and submitted to IANA for
2362 inclusion in the registry. If modifications are made to the request
2363 during the course of the registration process (such as corrections to
2364 meet the requirements in Section 3.1 or to make the 'Description'
2365 fields unique for the given record type), the modified form MUST also
2366 be sent to <ietf-languages@iana.org> at least one week prior to
2367 submission to IANA.
2368
2369 The ietf-languages list is an open list and can be joined by sending
2370 a request to <ietf-languages-request@iana.org>. The list can be
2371 hosted by IANA or any third party at the request of IESG.
2372
2373 Before forwarding any registration to IANA, the Language Subtag
2374 Reviewer MUST ensure that all requirements in this document are met.
2375 This includes ensuring that values in the 'Subtag' field match case
2376 according to the description in Section 3.1.4 and that 'Description'
2377 fields are unique for the given record type as described in
2378 Section 3.1.5. The Reviewer MUST also ensure that an appropriate
2379 File-Date record is included in the request, to assist IANA when
2380 updating the registry (see Section 5.1).
2381
2382 Some fields in both the registration form and the registry record
2383 itself permit the use of non-ASCII characters. Registration
2384 requests SHOULD use the UTF-8 encoding for consistency and clarity.
2385 However, since some mail clients do not support this encoding, other
2386 encodings MAY be used for the registration request. The Language
2387 Subtag Reviewer is responsible for ensuring that the proper Unicode
2388 characters appear in both the archived request form and the registry
2389 record. In the case of a transcription or encoding error by IANA,
2390 the Language Subtag Reviewer will request that the registry be
2391 repaired, providing any necessary information to assist IANA.
2392
2393 Extended language subtags (type 'extlang'), by definition, are always
2394 encompassed by another language. All records of type 'extlang' MUST,
2395 therefore, contain a 'Prefix' field at the time of registration.
2396 This 'Prefix' field can never be altered or removed, and requests to
2397 do so MUST be rejected.
2398
2399 Variant subtags are usually registered for use with a particular
2400 range of language tags, and variant subtags based on the terminology
2401 of the language to which they apply are encouraged. For example,
2402 the subtag 'rozaj' (Resian) is intended for use with language tags
2403 that start with the primary language subtag "sl" (Slovenian), since
2404 Resian is a dialect of Slovenian. Thus, the subtag 'rozaj' would be
2405 appropriate in tags such as "sl-Latn-rozaj" or "sl-IT-rozaj". This
2406 information is stored in the 'Prefix' field in the registry. Variant
2415 registration requests SHOULD include at least one 'Prefix' field in
2416 the registration form.
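
The notion of a tag that "starts with" a prefix can be sketched in Python as a subtag-by-subtag comparison. This is a simplified reading for illustration; the function name is an assumption, and the sketch compares whole subtags so that, for example, "slv" is not mistaken for "sl".

```python
def starts_with_prefix(tag: str, prefix: str) -> bool:
    """Return True if the leading subtags of 'tag' equal those of 'prefix'.

    The comparison ignores case, since case is not significant in tags.
    """
    tag_subtags = tag.lower().split("-")
    prefix_subtags = prefix.lower().split("-")
    return tag_subtags[:len(prefix_subtags)] == prefix_subtags
```

With the 'rozaj' example, both "sl-Latn-rozaj" and "sl-IT-rozaj" satisfy the prefix "sl", while "de-rozaj" does not.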
2417
2418 Requests to assign an additional record of a given type with an
2419 existing subtag value MUST be rejected. For example, the variant
2420 subtag 'rozaj' already exists in the registry, so adding a second
2421 record of type 'variant' with the subtag 'rozaj' is prohibited.
2422
2423 The 'Prefix' field for a given registered variant subtag exists in
2424 the IANA registry as a guide to usage. Additional 'Prefix' fields
2425 MAY be added by filing an additional registration form. In that
2426 form, the "Any other relevant information:" field MUST indicate that
2427 it is the addition of a prefix.
2428
2429 Requests to add a 'Prefix' field to a variant subtag that imply a
2430 different semantic meaning SHOULD be rejected. For example, a
2431 request to add the prefix "de" to the subtag '1994' so that the tag
2432 "de-1994" represented some German dialect or orthographic form would
2433 be rejected. The '1994' subtag represents a particular Slovenian
2434 orthography, and the additional registration would change or blur the
2435 semantic meaning assigned to the subtag. A separate subtag SHOULD be
2436 proposed instead.
2437
2438 Requests to add a 'Prefix' to a variant subtag that has no current
2439 'Prefix' field MUST be rejected. Variants are registered with no
2440 prefix because they are potentially useful with many or even all
2441 languages. Adding one or more 'Prefix' fields would be potentially
2442 harmful to the use of the variant, since it dramatically reduces the
2443 scope of the subtag (which is not allowed under the stability rules
2444 in Section 3.4), as opposed to broadening the scope of the subtag, which
2445 is what the addition of a 'Prefix' normally does. An example of such
2446 a "no-prefix" variant is the subtag 'fonipa', which represents the
2447 International Phonetic Alphabet, a scheme that can be used to
2448 transcribe many languages.
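
The mechanical part of the rules for adding a 'Prefix' to a variant can be sketched as follows. The table of records is a hypothetical snapshot that reflects only the examples in the text ('rozaj' used with "sl", 'fonipa' registered without a prefix); the function name and the result strings are assumptions. Whether a new prefix changes the meaning of a subtag remains a judgment for the review process and cannot be decided by code like this.

```python
# Hypothetical snapshot of two variant records and their 'Prefix' fields.
VARIANT_PREFIXES = {
    "rozaj": ["sl"],
    "fonipa": [],
}


def review_prefix_addition(subtag: str, new_prefix: str) -> str:
    """Apply the mechanical checks; semantic review stays with the list."""
    prefixes = VARIANT_PREFIXES.get(subtag.lower())
    if prefixes is None:
        return "reject: no such variant record"
    if not prefixes:
        # A first 'Prefix' would narrow a variant registered for general use.
        return "reject: variant was registered without a 'Prefix'"
    return f"needs review: does '{new_prefix}' keep the meaning unchanged?"
```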
2449
2450 The 'Description' fields provided in the request MUST contain at
2451 least one description written or transcribed into the Latin script;
2452 the request MAY also include additional 'Description' fields in any
2453 script or language. The 'Description' field is used for
2454 identification purposes and doesn't necessarily represent the actual
2455 native name of the language or variation. It also doesn't have to be
2456 in any particular language, but SHOULD be both suitable and
2457 sufficient to identify the item in the record. The Language Subtag
2458 Reviewer will check and edit any proposed 'Description' fields so as
2459 to ensure uniqueness and prevent collisions with 'Description' fields
2460 in other records of the same type. If this occurs in an independent
2461 registration request, the Language Subtag Reviewer MUST resubmit the
2462 record to <ietf-languages@iana.org>, treating it as a modification of
2471 a request due to discussion, as described in Section 3.5, unless the
2472 request's sole purpose is to introduce a duplicate 'Description'
2473 field, in which case the request SHALL be rejected.
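
The two checks on 'Description' fields described above can be approximated with the Python sketch below. Detecting Latin script by looking at Unicode character names is a heuristic chosen for this example, and comparing descriptions by exact string equality is likewise an assumption; neither is prescribed by this document.

```python
import unicodedata


def is_latin_text(text: str) -> bool:
    """Heuristic: every alphabetic character has a 'LATIN ...' Unicode name."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def check_descriptions(descriptions, existing_same_type):
    """Return a list of problems found in the proposed 'Description' fields.

    'existing_same_type' holds descriptions already used by other records
    of the same type.
    """
    problems = []
    if not any(is_latin_text(d) for d in descriptions):
        problems.append("no Latin-script description")
    for d in descriptions:
        if d in existing_same_type:
            problems.append(f"duplicate description: {d}")
    return problems
```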
2474
2475 The 'Description' field is not guaranteed to be stable. Corrections
2476 or clarifications of intent are examples of possible changes.
2477 Attempts to provide translations or transcriptions of entries in the
2478 registry (which, by definition, provide no new information) are
2479 unlikely to be approved.
2480
2481 Soon after the two-week review period has passed, the Language Subtag
2482 Reviewer MUST take one of the following actions:
2483
2484 o Explicitly accept the request and forward the form containing the
2485 record to be inserted or modified to <iana@iana.org> according to
2486 the procedure described in Section 3.3.
2487
2488 o Explicitly reject the request because of significant objections
2489 raised on the list or due to problems with constraints in this
2490 document (which MUST be explicitly cited).
2491
2492 o Extend the review period by granting an additional two-week
2493 increment to permit further discussion. After each two-week
2494 increment, the Language Subtag Reviewer MUST indicate on the list
2495 whether the registration has been accepted, rejected, or extended.
2496
2497 Note that the Language Subtag Reviewer MAY raise objections on the
2498 list if he or she so desires. The important thing is that the
2499 objection MUST be made publicly.
2500
2501 Sometimes the request needs to be modified as a result of discussion
2502 during the review period or due to requirements in this document.
2503 The applicant, Language Subtag Reviewer, or others MAY submit a
2504 modified version of the completed registration form, which will be
2505 considered in lieu of the original request with the explicit approval
2506 of the applicant. Such changes do not restart the two-week
2507 discussion period, although an application containing the final
2508 record submitted to IANA MUST appear on the list at least one week
2509 prior to the Language Subtag Reviewer forwarding the record to IANA.
2510 The applicant MAY modify a rejected application with more appropriate
2511 or additional information and submit it again; this starts a new two-
2512 week comment period.
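
The timing rules in this section (a two-week review period, optional two-week extensions, and at least one week between the final record appearing on the list and its forwarding to IANA) can be combined into a small date calculation. The following Python sketch is illustrative only; the function and parameter names are assumptions.

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(weeks=2)        # initial period and each extension
FINAL_RECORD_NOTICE = timedelta(weeks=1)  # final record must be on the list


def earliest_forwarding_date(submitted, final_record_posted=None, extensions=0):
    """Return the first date on which the record could be forwarded to IANA.

    'submitted' is when the request reached the list, 'final_record_posted'
    is when a modified final record (if any) was sent to the list, and
    'extensions' counts the two-week extensions granted by the Reviewer.
    """
    review_ends = submitted + REVIEW_PERIOD * (1 + extensions)
    if final_record_posted is None:
        return review_ends
    return max(review_ends, final_record_posted + FINAL_RECORD_NOTICE)
```

A modification late in the review period does not restart the two-week clock, but it can still push the forwarding date out through the one-week requirement.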
2513
2514 Registrations initiated due to the provisions of Section 3.3 or
2515 Section 3.4 SHALL NOT be rejected altogether (since they have to
2516 ultimately appear in the registry) and SHOULD be completed as quickly
2517 as possible. The review process allows list members to comment on
2518 the specific information in the form and the record it contains and
2527 thus help ensure that it is correct and consistent. The Language
2528 Subtag Reviewer MAY reject a specific version of the form, but MUST
2529 propose a suitable replacement, extending the review period as
2530 described above, until the form is in a format worthy of the
2531 reviewer's approval and meets with rough consensus of the list.
2532
2533 Decisions made by the Language Subtag Reviewer MAY be appealed to the
2534 IESG [RFC2028] under the same rules as other IETF decisions
2535 [RFC2026]. This includes a decision to extend the review period or
2536 the failure to announce a decision in a clear and timely manner.
2537
2538 The approved records appear in the Language Subtag Registry. The
2539 approved registration forms are available online from
2540 http://www.iana.org.
2541
2542 Updates or changes to existing records follow the same procedure as
2543 new registrations. The Language Subtag Reviewer decides whether
2544 there is consensus to update the registration following the two-week
2545 review period; normally, objections by the original registrant will
2546 carry extra weight in forming such a consensus.
2547
2548 Registrations are permanent and stable. Once registered, subtags
2549 will not be removed from the registry and will remain a valid way in
2550 which to specify a specific language or variant.
2551
2552 Note: The purpose of the "Reference to published description" section
2553 in the registration form is to aid in verifying whether a language is
2554 registered or to which language or language variation a particular
2555 subtag refers. In most cases, reference to an authoritative grammar
2556 or dictionary of that language will be useful; in cases where no such
2557 work exists, other well-known works describing that language or in
2558 that language MAY be appropriate. The Language Subtag Reviewer
2559 decides what constitutes "good enough" reference material. This
2560 requirement is not intended to exclude particular languages or
2561 dialects due to the size of the speaker population or lack of a
2562 standardized orthography. Minority languages will be considered
2563 equally on their own merits.
2564
25653.6. Possibilities for Registration
2566
2567 Possibilities for registration of subtags or information about
2568 subtags include:
2569
2570 o Primary language subtags for languages not listed in ISO 639 that
2571 are not variants of any listed or registered language MAY be
2572 registered. At the time this document was created, there were no
2573 examples of this form of subtag. Before attempting to register a
2574 language subtag, there MUST be an attempt to register the language
2583 with ISO 639. Subtags MUST NOT be registered for languages
2584 defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3;
2585 that are under consideration by the ISO 639 registration
2586 authorities; or for which registration with those authorities has
2587 never been attempted. If ISO 639 has previously rejected a
2588 language for registration, it is reasonable to assume that there
2589 must be additional, very compelling evidence of need before it
2590 will be registered as a primary language subtag in the IANA
2591 registry (to the extent that it is very unlikely that any subtags
2592 will be registered of this type).
2593
2594 o Dialect or other divisions or variations within a language, its
2595 orthography, writing system, regional or historical usage,
2596 transliteration or other transformation, or distinguishing
2597 variation MAY be registered as variant subtags. An example is the
2598 'rozaj' subtag (the Resian dialect of Slovenian).
2599
2600 o The addition or maintenance of fields (generally of an
2601 informational nature) in tag or subtag records as described in
2602 Section 3.1 is allowed. Such changes are subject to the stability
2603 provisions in Section 3.4. This includes 'Description',
2604 'Comments', 'Deprecated', and 'Preferred-Value' fields for
2605 obsolete or withdrawn codes, or the addition of 'Suppress-Script'
2606 or 'Macrolanguage' fields to primary language subtags, as well as
2607 other changes permitted by this document, such as the addition of
2608 an appropriate 'Prefix' field to a variant subtag.
2609
2610 o The addition of records and related field value changes necessary
2611 to reflect assignments made by ISO 639, ISO 15924, ISO 3166-1, and
2612 UN M.49 as described in Section 3.4 is allowed.
2613
2614 Subtags proposed for registration that would cause all or part of a
2615 grandfathered tag to become redundant but whose meaning conflicts
2616 with or alters the meaning of the grandfathered tag MUST be rejected.
2617
2618 This document leaves the decision on what subtags or changes to
2619 subtags are appropriate (or not) to the registration process
2620 described in Section 3.5.
2621
2622 Note: Four-character primary language subtags are reserved to allow
2623 for the possibility of alpha4 codes in some future addition to the
2624 ISO 639 family of standards.
2625
2626 ISO 639 defines a registration authority for additions to and changes
2627 in the list of languages in ISO 639. This agency is:
2628
2639 International Information Centre for Terminology (Infoterm)
2640 Aichholzgasse 6/12, AT-1120
2641 Wien, Austria
2642 Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72
2643
2644 ISO 639-2 defines a registration authority for additions to and
2645 changes in the list of languages in ISO 639-2. This agency is:
2646
2647 Library of Congress
2648 Network Development and MARC Standards Office
2649 Washington, DC 20540, USA
2650 Phone: +1 202 707 6237 Fax: +1 202 707 0115
2651 URL: http://www.loc.gov/standards/iso639-2
2652
2653 ISO 639-3 defines a registration authority for additions to and
2654 changes in the list of languages in ISO 639-3. This agency is:
2655
2656 SIL International
2657 ISO 639-3 Registrar
2658 7500 W. Camp Wisdom Rd.
2659 Dallas, TX 75236, USA
2660 Phone: +1 972 708 7400, ext. 2293
2661 Fax: +1 972 708 7546
2662 Email: iso639-3@sil.org
2663 URL: http://www.sil.org/iso639-3
2664
2665 ISO 639-5 defines a registration authority for additions to and
2666 changes in the list of languages in ISO 639-5. This agency is the
2667 same as for ISO 639-2 and is:
2668
2669 Library of Congress
2670 Network Development and MARC Standards Office
2671 Washington, DC 20540, USA
2672 Phone: +1 202 707 6237
2673 Fax: +1 202 707 0115
2674 URL: http://www.loc.gov/standards/iso639-5
2675
2676 The maintenance agency for ISO 3166-1 (country codes) is:
2677
2678 ISO 3166 Maintenance Agency
2679 c/o International Organization for Standardization
2680 Case postale 56
2681 CH-1211 Geneva 20, Switzerland
2682 Phone: +41 22 749 72 33 Fax: +41 22 749 73 49
2683 URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html
2684
2695 The registration authority for ISO 15924 (script codes) is:
2696
2697 Unicode Consortium
2698 Box 391476
2699 Mountain View, CA 94039-1476, USA
2700 URL: http://www.unicode.org/iso15924
2701
2702 The Statistics Division of the United Nations Secretariat maintains
2703 the Standard Country or Area Codes for Statistical Use and can be
2704 reached at:
2705
2706 Statistical Services Branch
2707 Statistics Division
2708 United Nations, Room DC2-1620
2709 New York, NY 10017, USA
2710 Fax: +1-212-963-0623
2711 Email: statistics@un.org
2712 URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm
2713
27143.7. Extensions and the Extensions Registry
2715
2716 Extension subtags are those introduced by single-character subtags
2717 ("singletons") other than 'x'. They are reserved for the generation
2718 of identifiers that contain a language component and are compatible
2719 with applications that understand language tags.
2720
2721 The structure and form of extensions are defined by this document so
2722 that implementations can be created that are forward compatible with
2723 applications that might be created using singletons in the future.
2724 In addition, defining a mechanism for maintaining singletons will
2725 lend stability to this document by reducing the likely need for
2726 future revisions or updates.
2727
2728 Single-character subtags are assigned by IANA using the "IETF Review"
2729 policy defined by [RFC5226]. This policy requires the development of
2730 an RFC, which SHALL define the name, purpose, processes, and
2731 procedures for maintaining the subtags. The maintaining or
2732 registering authority, including name, contact email, discussion list
2733 email, and URL location of the registry, MUST be indicated clearly in
2734 the RFC. The RFC MUST specify or include each of the following:
2735
2736 o The specification MUST reference the specific version or revision
2737 of this document that governs its creation and MUST reference this
2738 section of this document.
2739
2740 o The specification and all subtags defined by the specification
2741 MUST follow the ABNF and other rules for the formation of tags and
2742 subtags as defined in this document. In particular, it MUST
2751 specify that case is not significant and that subtags MUST NOT
2752 exceed eight characters in length.
2753
2754 o The specification MUST specify a canonical representation.
2755
2756 o The specification of valid subtags MUST be available over the
2757 Internet and at no cost.
2758
2759 o The specification MUST be in the public domain or available via a
2760 royalty-free license acceptable to the IETF and specified in the
2761 RFC.
2762
2763 o The specification MUST be versioned, and each version of the
2764 specification MUST be numbered, dated, and stable.
2765
2766 o The specification MUST be stable. That is, extension subtags,
2767 once defined by a specification, MUST NOT be retracted or change
2768 in meaning in any substantial way.
2769
2770 o The specification MUST include, in a separate section, the
2771 registration form reproduced in this section (below) to be used in
2772 registering the extension upon publication as an RFC.
2773
2774 o IANA MUST be informed of changes to the contact information and
2775 URL for the specification.
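
The syntactic constraints on extensions (a singleton other than 'x', followed by subtags that are case-insensitive and no longer than eight characters) can be checked with a regular expression. The Python sketch below follows the 'extension' production of the ABNF in this document, under which each extension subtag has two to eight alphanumeric characters; the names used are assumptions for illustration.

```python
import re

# A singleton (any ASCII letter or digit except 'x') followed by one or
# more subtags of 2 to 8 ASCII alphanumerics. re.ASCII keeps the
# case-insensitive match from accepting non-ASCII look-alikes.
EXTENSION_RE = re.compile(
    r"[0-9a-wy-z](?:-[0-9a-z]{2,8})+", re.IGNORECASE | re.ASCII
)


def is_well_formed_extension(sequence: str) -> bool:
    """Check one extension sequence such as "a-extend1"."""
    return EXTENSION_RE.fullmatch(sequence) is not None
```

This checks form only. Whether the subtags are valid for a given extension depends on the specification that defines that extension.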
2776
2777 IANA will maintain a registry of allocated single-character
2778 (singleton) subtags. This registry MUST use the record-jar format
2779 described by the ABNF in Section 3.1.1. Upon publication of an
2780 extension as an RFC, the maintaining authority defined in the RFC
2781 MUST forward this registration form to <iesg@ietf.org>, who MUST
2782 forward the request to <iana@iana.org>. The maintaining authority of
2783 the extension MUST maintain the accuracy of the record by sending an
2784 updated full copy of the record to <iana@iana.org> with the subject
2785 line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only
2786 the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY
2787 be modified in these updates.
2788
2789 Failure to maintain this record, maintain the corresponding registry,
2790 or meet other conditions imposed by this section of this document MAY
2791 be appealed to the IESG [RFC2028] under the same rules as other IETF
2792 decisions (see [RFC2026]) and MAY result in the authority to maintain
2793 the extension being withdrawn or reassigned by the IESG.
2794
2807 %%
2808 Identifier:
2809 Description:
2810 Comments:
2811 Added:
2812 RFC:
2813 Authority:
2814 Contact_Email:
2815 Mailing_List:
2816 URL:
2817 %%
2818
2819 Figure 6: Format of Records in the Language Tag Extensions Registry
2820
2821 'Identifier' contains the single-character subtag (singleton)
2822 assigned to the extension. The Internet-Draft submitted to define
2823 the extension SHOULD specify which letter or digit to use, although
2824 the IESG MAY change the assignment when approving the RFC.
2825
2826 'Description' contains the name and description of the extension.
2827
2828 'Comments' is an OPTIONAL field and MAY contain a broader description
2829 of the extension.
2830
2831 'Added' contains the date the extension's RFC was published in the
2832 "full-date" format specified in [RFC3339]. For example: 2004-06-28
2833 represents June 28, 2004, in the Gregorian calendar.
2834
2835 'RFC' contains the RFC number assigned to the extension.
2836
2837 'Authority' contains the name of the maintaining authority for the
2838 extension.
2839
2840 'Contact_Email' contains the email address used to contact the
2841 maintaining authority.
2842
2843 'Mailing_List' contains the URL or subscription email address of the
2844 mailing list used by the maintaining authority.
2845
2846 'URL' contains the URL of the registry for this extension.
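
A maintaining authority might check and render a record in the format of Figure 6 along the following lines. This Python sketch makes several assumptions that are not stated in this document: records are held as dictionaries, every field other than 'Comments' is treated as required, and the example record contains placeholder values only (including the 'RFC' value).

```python
import re
from datetime import date

# Field order follows Figure 6.
RECORD_FIELDS = (
    "Identifier", "Description", "Comments", "Added", "RFC",
    "Authority", "Contact_Email", "Mailing_List", "URL",
)
OPTIONAL_FIELDS = frozenset({"Comments"})


def is_full_date(text: str) -> bool:
    """Check the "full-date" shape (YYYY-MM-DD) and calendar validity."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def record_problems(record: dict) -> list:
    """Return human-readable problems found in an extension record."""
    problems = [
        f"missing field: {name}"
        for name in RECORD_FIELDS
        if name not in OPTIONAL_FIELDS and not record.get(name)
    ]
    identifier = record.get("Identifier", "")
    if identifier and not re.fullmatch(r"[0-9A-WY-Za-wy-z]", identifier):
        problems.append("Identifier is not a singleton other than 'x'")
    added = record.get("Added", "")
    if added and not is_full_date(added):
        problems.append("Added is not a full-date")
    return problems


def format_record(record: dict) -> str:
    """Render the record between '%%' separators, as laid out in Figure 6."""
    body = [f"{name}: {record[name]}" for name in RECORD_FIELDS if record.get(name)]
    return "\n".join(["%%", *body, "%%"])


# Placeholder record for illustration; none of these values is registered.
EXAMPLE = {
    "Identifier": "r",
    "Description": "Example extension",
    "Added": "2004-06-28",
    "RFC": "XXXX",
    "Authority": "Example Authority",
    "Contact_Email": "contact@example.org",
    "Mailing_List": "list@example.org",
    "URL": "https://example.org/registry",
}
```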
2847
2848 The determination of whether an Internet-Draft meets the above
2849 conditions and the decision to grant or withhold such authority rests
2850 solely with the IESG and is subject to the normal review and appeals
2851 process associated with the RFC process.
2852
2853 Extension authors are strongly cautioned that many (including most
2854 well-formed) processors will be unaware of any special relationships
2863 or meaning inherent in the order of extension subtags. Extension
2864 authors SHOULD avoid subtag relationships or canonicalization
2865 mechanisms that interfere with matching or with length restrictions
2866 that sometimes exist in common protocols where the extension is used.
2867 In particular, applications MAY truncate the subtags in doing
2868 matching or in fitting into limited lengths, so it is RECOMMENDED
2869 that the most significant information be in the most significant
2870 (left-most) subtags and that the specification gracefully handle
2871 truncated subtags.
2872
2873 When a language tag is to be used in a specific, known protocol, it
2874 is RECOMMENDED that the language tag not contain extensions not
2875 supported by that protocol. In addition, note that some protocols
2876 MAY impose upper limits on the length of the strings used to store or
2877 transport the language tag.
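
 One way to follow the recommendation above is to remove extension
 sequences that the target protocol does not support before emitting
 a tag. The sketch below assumes a hypothetical helper name and
 hypothetical singletons 'a' and 'b'; it treats 'x' as introducing
 private use subtags that run to the end of the tag and leaves them
 untouched.

```python
def strip_unsupported_extensions(tag, supported):
    """Drop extension sequences whose singleton is not in `supported`."""
    supported = {s.lower() for s in supported}
    subtags = tag.split("-")
    kept = [subtags[0]]          # the first subtag is never an extension
    keep_current = True          # are we inside a sequence we keep?
    for i in range(1, len(subtags)):
        sub = subtags[i]
        if len(sub) == 1:
            if sub.lower() == "x":
                # Private use runs to the end of the tag; keep it as-is.
                kept.extend(subtags[i:])
                break
            # A new singleton starts a new extension sequence.
            keep_current = sub.lower() in supported
        if keep_current:
            kept.append(sub)
    return "-".join(kept)


cleaned = strip_unsupported_extensions("en-a-bbb-b-ccc-x-private", {"b"})
```

 Whether private use subtags should also be removed depends on the
 agreement between the parties; this sketch does not decide that.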
2878
28793.8. Update of the Language Subtag Registry
2880
2881 After the adoption of this document, the IANA Language Subtag
2882 Registry needed an update so that it would contain the complete set
2883 of subtags valid in a language tag. [RFC5645] describes the process
2884 used to create this update.
2885
2886 Registrations that are in process under the rules defined in
2887 [RFC4646] when this document is adopted MUST be completed under the
2888 rules contained in this document.
2889
28903.9. Applicability of the Subtag Registry
2891
2892 The Language Subtag Registry is the source of data elements used to
2893 construct language tags, following the rules described in this
2894 document. Language tags are designed for indicating linguistic
2895 attributes of various content, including not only text but also most
2896 media formats, such as video or audio. They also form the basis for
2897 language and locale negotiation in various protocols and APIs.
2898
2899 The registry is therefore applicable to many applications that need
2900 some form of language identification, with these limitations:
2901
2902 o It is not designed to be the sole data source in the creation of a
2903 language-selection user interface. For example, the registry does
2904 not contain translations for subtag descriptions or for tags
2905 composed from the subtags. Sources for localized data based on
2906 the registry are generally available, notably [CLDR]. Nor does
2907 the registry indicate which subtag combinations are particularly
2908 useful or relevant.
2909
2919 o It does not provide information indicating relationships between
2920 different languages, such as might be used in a user interface to
2921 select language tags hierarchically, regionally, or on some other
2922 organizational model.
2923
2924 o It does not supply information about potential overlap between
2925 different language tags, as the notion of what constitutes a
2926 language is not precise: several different language tags might be
2927 reasonable choices for the same given piece of content.
2928
2929 o It does not contain information about appropriate fallback choices
2930 when performing language negotiation. A good fallback language
2931 might be linguistically unrelated to the specified language. The
2932 fact that one language is often used as a fallback language for
2933 another is usually a result of outside factors, such as geography,
2934 history, or culture -- factors that might not apply in all cases.
2935 For example, most people who use Breton (a Celtic language used in
2936 the Northwest of France) would probably prefer to be served French
2937 (a Romance language) if Breton isn't available.
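
 Because the registry carries no fallback data, an application that
 needs fallback has to supply its own table. The following sketch
 shows one possible shape for that; the FALLBACKS table and the
 choose_language function are hypothetical and encode only the
 Breton example given above.

```python
# Application-supplied data; nothing here comes from the registry.
FALLBACKS = {"br": ["fr"]}


def choose_language(requested, available):
    """Return the requested language or its first available fallback."""
    for candidate in [requested, *FALLBACKS.get(requested, [])]:
        if candidate in available:
            return candidate
    return None


served = choose_language("br", {"fr", "en"})
```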
2938
29394. Formation and Processing of Language Tags
2940
2941 This section addresses how to use the information in the registry
2942 with the tag syntax to choose, form, and process language tags.
2943
29444.1. Choice of Language Tag
2945
2946 The guiding principle in forming language tags is to "tag content
2947 wisely." Sometimes there is a choice between several possible tags
2948 for the same content. The choice of which tag to use depends on the
2949 content and application in question, and some amount of judgment
2950 might be necessary when selecting a tag.
2951
2952 Interoperability is best served when the same language tag is used
2953 consistently to represent the same language. If an application has
2954 requirements that make the rules here inapplicable, then that
2955 application risks damaging interoperability. It is strongly
2956 RECOMMENDED that users not define their own rules for language tag
2957 choice.
2958
 Standards, protocols, and applications that reference this document
 normatively but apply rules different from the ones given in this
 section MUST specify how language tag selection varies from the
 guidelines given here.
2963
2964 To ensure consistent backward compatibility, this document contains
2965 several provisions to account for potential instability in the
2966 standards used to define the subtags that make up language tags.
2975 These provisions mean that no valid language tag can become invalid,
2976 nor will a language tag have a narrower scope in the future (it may
2977 have a broader scope). The most appropriate language tag for a given
2978 application or content item might evolve over time, but once applied,
2979 the tag itself cannot become invalid or have its meaning wholly
2980 change.
2981
2982 A subtag SHOULD only be used when it adds useful distinguishing
2983 information to the tag. Extraneous subtags interfere with the
2984 meaning, understanding, and processing of language tags. In
2985 particular, users and implementations SHOULD follow the 'Prefix' and
2986 'Suppress-Script' fields in the registry (defined in Section 3.1):
2987 these fields provide guidance on when specific additional subtags
2988 SHOULD be used or avoided in a language tag.
2989
2990 The choice of subtags used to form a language tag SHOULD follow these
2991 guidelines:
2992
2993 1. Use as precise a tag as possible, but no more specific than is
2994 justified. Avoid using subtags that are not important for
2995 distinguishing content in an application.
2996
2997 * For example, 'de' might suffice for tagging an email written
2998 in German, while "de-CH-1996" is probably unnecessarily
2999 precise for such a task.
3000
3001 * Note that some subtag sequences might not represent the
3002 language a casual user might expect. For example, the Swiss
3003 German (Schweizerdeutsch) language is represented by "gsw-CH"
3004 and not by "de-CH". This latter tag represents German ('de')
3005 as used in Switzerland ('CH'), also known as Swiss High German
3006 (Schweizer Hochdeutsch). Both are real languages, and
3007 distinguishing between them could be important to an
3008 application.
3009
3010 2. The script subtag SHOULD NOT be used to form language tags unless
3011 the script adds some distinguishing information to the tag.
3012 Script subtags were first formally defined in [RFC4646]. Their
3013 use can affect matching and subtag identification for
3014 implementations of [RFC1766] or [RFC3066] (which are obsoleted by
3015 this document), as these subtags appear between the primary
3016 language and region subtags. Some applications can benefit from
3017 the use of script subtags in language tags, as long as the use is
3018 consistent for a given context. Script subtags are never
3019 appropriate for unwritten content (such as audio recordings).
3020 The field 'Suppress-Script' in the primary or extended language
3021 record in the registry indicates script subtags that do not add
3022 distinguishing information for most applications; this field
3031 defines when users SHOULD NOT include a script subtag with a
3032 particular primary language subtag.
3033
3034 For example, if an implementation selects content using Basic
3035 Filtering [RFC4647] (originally described in Section 14.4 of
3036 [RFC2616]) and the user requested the language range "en-US",
3037 content labeled "en-Latn-US" will not match the request and thus
3038 not be selected. Therefore, it is important to know when script
3039 subtags will customarily be used and when they ought not be used.
3040
3041 For example:
3042
3043 * The subtag 'Latn' should not be used with the primary language
3044 'en' because nearly all English documents are written in the
3045 Latin script and it adds no distinguishing information.
3046 However, if a document were written in English mixing Latin
3047 script with another script such as Braille ('Brai'), then it
3048 might be appropriate to choose to indicate both scripts to aid
3049 in content selection, such as the application of a style
3050 sheet.
3051
3052 * When labeling content that is unwritten (such as a recording
3053 of human speech), the script subtag should not be used, even
3054 if the language is customarily written in several scripts.
3055 Thus, the subtitles to a movie might use the tag "uz-Arab"
3056 (Uzbek, Arabic script), but the audio track for the same
3057 language would be tagged simply "uz". (The tag "uz-Zxxx"
3058 could also be used where content is not written, as the subtag
3059 'Zxxx' represents the "Code for unwritten documents".)
3060
3061 3. If a tag or subtag has a 'Preferred-Value' field in its registry
3062 entry, then the value of that field SHOULD be used to form the
3063 language tag in preference to the tag or subtag in which the
3064 preferred value appears.
3065
3066 * For example, use 'jbo' for Lojban in preference to the
3067 grandfathered tag "art-lojban".
3068
3069 4. Use subtags or sequences of subtags for individual languages in
3070 preference to subtags for language collections. A "language
3071 collection" is a group of languages that are descended from a
3072 common ancestor, are spoken in the same geographical area, or are
3073 otherwise related. Certain language collections are assigned
3074 codes by [ISO639-5] (and some of these [ISO639-5] codes are also
3075 defined as collections in [ISO639-2]). These codes are included
3076 as primary language subtags in the registry. Subtags for a
3077 language collection in the registry have a 'Scope' field with a
3078 value of 'collection'. A subtag for a language collection is
3087 always preferred to less specific alternatives such as 'mul' and
3088 'und' (see below), and a subtag representing a language
3089 collection MAY be used when more specific language information is
3090 not available. However, most users and implementations do not
3091 know there is a relationship between the collection and its
3092 individual languages. In addition, the relationship between the
3093 individual languages in the collection is not well defined; in
3094 particular, the languages are usually not mutually intelligible.
3095 Since the subtags are different, a request for the collection
3096 will typically only produce items tagged with the collection's
3097 subtag, not items tagged with subtags for the individual
3098 languages contained in the collection.
3099
3100 * For example, collections are interpreted inclusively, so the
3101 subtag 'gem' (Germanic languages) could, but SHOULD NOT, be
3102 used with content that would be better tagged with "en"
3103 (English), "de" (German), or "gsw" (Swiss German, Alemannic).
3104 While 'gem' collects all of these (and other) languages, most
3105 implementations will not match 'gem' to the individual
3106 languages; thus, using the subtag will not produce the desired
3107 result.
3108
3109 5. [ISO639-2] has defined several codes included in the subtag
3110 registry that require additional care when choosing language
3111 tags. In most of these cases, where omitting the language tag is
3112 permitted, such omission is preferable to using these codes.
3113 Language tags SHOULD NOT incorporate these subtags as a prefix,
3114 unless the additional information conveys some value to the
3115 application.
3116
3117 * The 'mul' (Multiple) primary language subtag identifies
3118 content in multiple languages. This subtag SHOULD NOT be used
3119 when a list of languages or individual tags for each content
 element can be used instead. For example, the
 'Content-Language' header [RFC3282] allows a list of languages to
 be used, not just a single language tag.
3123
3124 * The 'und' (Undetermined) primary language subtag identifies
3125 linguistic content whose language is not determined. This
3126 subtag SHOULD NOT be used unless a language tag is required
3127 and language information is not available or cannot be
3128 determined. Omitting the language tag (where permitted) is
3129 preferred. The 'und' subtag might be useful for protocols
3130 that require a language tag to be provided or where a primary
3131 language subtag is required (such as in "und-Latn"). The
3132 'und' subtag MAY also be useful when matching language tags in
3133 certain situations.
3134
3143 * The 'zxx' (Non-Linguistic, Not Applicable) primary language
3144 subtag identifies content for which a language classification
3145 is inappropriate or does not apply. Some examples might
3146 include instrumental or electronic music; sound recordings
3147 consisting of nonverbal sounds; audiovisual materials with no
3148 narration, dialog, printed titles, or subtitles; machine-
3149 readable data files consisting of machine languages or
3150 character codes; or programming source code.
3151
3152 * The 'mis' (Uncoded) primary language subtag identifies content
3153 whose language is known but that does not currently have a
3154 corresponding subtag. This subtag SHOULD NOT be used.
3155 Because the addition of other codes in the future can render
3156 its application invalid, it is inherently unstable and hence
3157 incompatible with the stability goals of BCP 47. It is always
3158 preferable to use other subtags: either 'und' or (with prior
3159 agreement) private use subtags.
3160
3161 6. Use variant subtags sparingly and in the correct order. Most
3162 variant subtags have one or more 'Prefix' fields in the registry
3163 that express the list of subtags with which they are appropriate.
3164 Variants SHOULD only be used with subtags that appear in one of
3165 these 'Prefix' fields. If a variant lists a second variant in
3166 one of its 'Prefix' fields, the first variant SHOULD appear
3167 directly after the second variant in any language tag where both
3168 occur. General purpose variants (those with no 'Prefix' fields
3169 at all) SHOULD appear after any other variant subtags. Order any
3170 remaining variants by placing the most significant subtag first.
3171 If none of the subtags is more significant or no relationship can
3172 be determined, alphabetize the subtags. Because variants are
3173 very specialized, using many of them together generally makes the
3174 tag so narrow as to override the additional precision gained.
3175 Putting the subtags into another order interferes with
3176 interoperability, as well as the overall interpretation of the
3177 tag.
3178
3179 For example:
3180
3181 * The tag "en-scotland-fonipa" (English, Scottish dialect, IPA
3182 phonetic transcription) is correctly ordered because
3183 'scotland' has a 'Prefix' of "en", while 'fonipa' has no
3184 'Prefix' field.
3185
3186 * The tag "sl-IT-rozaj-biske-1994" is correctly ordered: 'rozaj'
3187 lists "sl" as its sole 'Prefix'; 'biske' lists "sl-rozaj" as
3188 its sole 'Prefix'. The subtag '1994' has several prefixes,
 including "sl-rozaj". However, it follows both 'rozaj' and
 'biske' because one of its 'Prefix' fields is
 "sl-rozaj-biske".
3202
3203 7. The grandfathered tag "i-default" (Default Language) was
3204 originally registered according to [RFC1766] to meet the needs of
3205 [RFC2277]. It is not used to indicate a specific language, but
3206 rather to identify the condition or content used where the
3207 language preferences of the user cannot be established. It
3208 SHOULD NOT be used except as a means of labeling the default
3209 content for applications or protocols that require default
3210 language content to be labeled with that specific tag. It MAY
3211 also be used by an application or protocol to identify when the
3212 default language content is being returned.
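
 Guideline 2 above can be illustrated with a small sketch that drops
 a script subtag when the 'Suppress-Script' field says it adds no
 distinguishing information. The table below is a hand-copied
 excerpt that is believed to match the registry but should be
 checked against it; the function name is hypothetical, and the
 sketch assumes the tag has no extended language subtag, so the
 script (if any) is the second subtag.

```python
# Assumed excerpt of 'Suppress-Script' values, keyed by language subtag.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl"}


def drop_suppressed_script(tag):
    """Remove the script subtag if the registry marks it as suppressed."""
    subtags = tag.split("-")
    suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
    if (suppressed is not None and len(subtags) > 1
            and subtags[1].lower() == suppressed.lower()):
        del subtags[1]
    return "-".join(subtags)


label = drop_suppressed_script("en-Latn-US")
```

 With the script removed, the label "en-US" again matches a request
 for the language range "en-US" under Basic Filtering, while tags
 such as "en-Brai" or "uz-Arab" are left unchanged.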
3213
32144.1.1. Tagging Encompassed Languages
3215
3216 Some primary language records in the registry have a 'Macrolanguage'
3217 field (Section 3.1.10) that contains a mapping from each "encompassed
3218 language" to its macrolanguage. The 'Macrolanguage' mapping doesn't
3219 define what the relationship between the encompassed language and its
3220 macrolanguage is, nor does it define how languages encompassed by the
3221 same macrolanguage are related to each other. Two different
3222 languages encompassed by the same macrolanguage may differ from one
3223 another more than, say, French and Spanish do.
3224
3225 A few specific macrolanguages, such as Chinese ('zh') and Arabic
3226 ('ar'), are handled differently. See Section 4.1.2.
3227
3228 The more specific encompassed language subtag SHOULD be used to form
3229 the language tag, although either the macrolanguage's primary
3230 language subtag or the encompassed language's subtag MAY be used.
3231 This means, for example, tagging Plains Cree with 'crk' rather than
3232 'cr' (Cree), and so forth.
3233
3234 Each macrolanguage subtag's scope, by definition, includes all of its
3235 encompassed languages. Since the relationship between encompassed
3236 languages varies, users cannot assume that the macrolanguage subtag
3237 means any particular encompassed language, nor that any given pair of
3238 encompassed languages are mutually intelligible or otherwise
3239 interchangeable.
3240
3241 Applications MAY use macrolanguage information to improve matching or
3242 language negotiation. For example, the information that 'sr'
3243 (Serbian) and 'hr' (Croatian) share a macrolanguage expresses a
3244 closer relation between those languages than between, say, 'sr'
 (Serbian) and 'mk' (Macedonian). However, this relationship is not
3246 guaranteed nor is it exclusive. For example, Romanian ('ro') and
3255 Moldavian ('mo') do not share a macrolanguage, but are far more
3256 closely related to each other than Cantonese ('yue') and Wu ('wuu'),
3257 which do share a macrolanguage.
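
 An application that wants to use this hint during matching might
 keep a small lookup derived from the 'Macrolanguage' fields. The
 sketch below is illustrative only: the table is a hand-copied
 excerpt that should be verified against the registry, and the
 function name is hypothetical.

```python
# Assumed excerpt: encompassed language subtag -> macrolanguage subtag.
MACROLANGUAGE = {
    "crk": "cr",
    "sr": "sh",
    "hr": "sh",
    "cmn": "zh",
    "yue": "zh",
    "wuu": "zh",
}


def share_macrolanguage(a, b):
    """True if both subtags are encompassed by the same macrolanguage."""
    macro_a = MACROLANGUAGE.get(a.lower())
    return macro_a is not None and macro_a == MACROLANGUAGE.get(b.lower())


related_hint = share_macrolanguage("sr", "hr")
```

 As the text notes, a True result is only a hint: 'yue' and 'wuu'
 share a macrolanguage yet are not mutually intelligible, while 'ro'
 and 'mo' are closely related and share none.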
3258
32594.1.2. Using Extended Language Subtags
3260
3261 To accommodate language tag forms used prior to the adoption of this
3262 document, language tags provide a special compatibility mechanism:
3263 the extended language subtag. Selected languages have been provided
3264 with both primary and extended language subtags. These include
3265 macrolanguages, such as Malay ('ms') and Uzbek ('uz'), that have a
3266 specific dominant variety that is generally synonymous with the
3267 macrolanguage. Other languages, such as the Chinese ('zh') and
3268 Arabic ('ar') macrolanguages and the various sign languages ('sgn'),
3269 have traditionally used their primary language subtag, possibly
3270 coupled with various region subtags or as part of a registered
3271 grandfathered tag, to indicate the language.
3272
3273 With the adoption of this document, specific ISO 639-3 subtags became
3274 available to identify the languages contained within these diverse
3275 language families or groupings. This presents a choice of language
3276 tags where previously none existed:
3277
3278 o Each encompassed language's subtag SHOULD be used as the primary
3279 language subtag. For example, a document in Mandarin Chinese
3280 would be tagged "cmn" (the subtag for Mandarin Chinese) in
3281 preference to "zh" (Chinese).
3282
3283 o If compatibility is desired or needed, the encompassed subtag MAY
3284 be used as an extended language subtag. For example, a document
3285 in Mandarin Chinese could be tagged "zh-cmn" instead of either
3286 "cmn" or "zh".
3287
3288 o The macrolanguage or prefixing subtag MAY still be used to form
3289 the tag instead of the more specific encompassed language subtag.
3290 That is, tags such as "zh-HK" or "sgn-RU" are still valid.
3291
3292 Chinese ('zh') provides a useful illustration of this. In the past,
3293 various content has used tags beginning with the 'zh' subtag, with
3294 application-specific meaning being associated with region codes,
3295 private use sequences, or grandfathered registered values. This is
3296 because historically only the macrolanguage subtag 'zh' was available
3297 for forming language tags. However, the languages encompassed by the
3298 Chinese subtag 'zh' are, in the main, not mutually intelligible when
3299 spoken, and the written forms of these languages also show wide
3300 variation in form and usage.
3301
3311 To provide compatibility, Chinese languages encompassed by the 'zh'
3312 subtag are in the registry both as primary language subtags and as
3313 extended language subtags. For example, the ISO 639-3 code for
3314 Cantonese is 'yue'. Content in Cantonese might historically have
3315 used a tag such as "zh-HK" (since Cantonese is commonly spoken in
3316 Hong Kong), although that tag actually means any type of Chinese as
3317 used in Hong Kong. With the availability of ISO 639-3 codes in the
3318 registry, content in Cantonese can be directly tagged using the 'yue'
3319 subtag. The content can use it as a primary language subtag, as in
3320 the tag "yue-HK" (Cantonese, Hong Kong). Or it can use an extended
3321 language subtag with 'zh', as in the tag "zh-yue-Hant" (Chinese,
3322 Cantonese, Traditional script).
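
 The two spellings described above can be reconciled before
 comparison by rewriting the extended language form to the primary
 language form. The following sketch assumes a hypothetical function
 name and a hand-copied excerpt of extended language 'Prefix' values
 that should be verified against the registry.

```python
# Assumed excerpt: extended language subtag -> its 'Prefix'.
EXTLANG_PREFIX = {
    "cmn": "zh",
    "yue": "zh",
    "wuu": "zh",
    "arb": "ar",
    "ase": "sgn",
}


def to_primary_form(tag):
    """Rewrite 'zh-yue-...' style tags to the 'yue-...' form."""
    subtags = tag.split("-")
    if len(subtags) > 1:
        ext = subtags[1].lower()
        if EXTLANG_PREFIX.get(ext) == subtags[0].lower():
            # Drop the prefix and promote the extended language subtag.
            return "-".join([ext] + subtags[2:])
    return tag


primary = to_primary_form("zh-yue-Hant")
```

 Tags that use only the macrolanguage, such as "zh-HK", pass through
 unchanged because nothing in the tag says which encompassed
 language is meant.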
3323
3324 As noted above, applications can choose to use the macrolanguage
3325 subtag to form the tag instead of using the more specific encompassed
3326 language subtag. For example, an application with large quantities
3327 of data already using tags with the 'zh' (Chinese) subtag might
3328 continue to use this more general subtag even for new data, even
3329 though the content could be more precisely tagged with 'cmn'
3330 (Mandarin), 'yue' (Cantonese), 'wuu' (Wu), and so on. Similarly, an
3331 application already using tags that start with the 'ar' (Arabic)
3332 subtag might continue to use this more general subtag even for new
3333 data, which could be more precisely tagged with 'arb' (Standard
3334 Arabic).
3335
3336 In some cases, the encompassed languages had tags registered for them
3337 during the RFC 3066 era. Those grandfathered tags not already
3338 deprecated or rendered redundant were deprecated in the registry upon
3339 adoption of this document. As grandfathered values, they remain
3340 valid for use, and some content or applications might use them. As
3341 with other grandfathered tags, since implementations might not be
3342 able to associate the grandfathered tags with the encompassed
3343 language subtag equivalents that are recommended by this document,
3344 implementations are encouraged to canonicalize tags for comparison
3345 purposes. Some examples of this include the tags "zh-hakka" (Hakka)
3346 and "zh-guoyu" (Mandarin or Standard Chinese).
3347
3348 Sign languages share a mode of communication rather than a linguistic
3349 heritage. There are many sign languages that have developed
3350 independently, and the subtag 'sgn' indicates only the presence of a
3351 sign language. A number of sign languages also had grandfathered
3352 tags registered for them during the RFC 3066 era. For example, the
3353 grandfathered tag "sgn-US" was registered to represent 'American Sign
3354 Language' specifically, without reference to the United States. This
3355 is still valid, but deprecated: a document in American Sign Language
3356 can be labeled either "ase" or "sgn-ase" (the 'ase' subtag is for the
3357 language called 'American Sign Language').
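
 Canonicalizing such tags for comparison can be as simple as a
 whole-tag lookup. The sketch below is an assumption-laden
 illustration: the table is a hand-copied excerpt of
 'Preferred-Value' mappings that should be checked against the
 registry, and both function names are hypothetical.

```python
# Assumed excerpt: deprecated whole tag (lowercased) -> preferred value.
PREFERRED_VALUE = {
    "zh-hakka": "hak",
    "zh-guoyu": "cmn",
    "sgn-us": "ase",
    "art-lojban": "jbo",
}


def canonical_for_comparison(tag):
    """Map a known deprecated tag to its preferred value."""
    return PREFERRED_VALUE.get(tag.lower(), tag)


def same_language(a, b):
    """Compare two tags case-insensitively after canonicalization."""
    return (canonical_for_comparison(a).lower()
            == canonical_for_comparison(b).lower())


equivalent = same_language("sgn-US", "ase")
```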
3358
33674.2. Meaning of the Language Tag
3368
3369 The meaning of a language tag is related to the meaning of the
3370 subtags that it contains. Each subtag, in turn, implies a certain
3371 range of expectations one might have for related content, although it
3372 is not a guarantee. For example, the use of a script subtag such as
3373 'Arab' (Arabic script) does not mean that the content contains only
3374 Arabic characters. It does mean that the language involved is
3375 predominantly in the Arabic script. Thus, a language tag and its
3376 subtags can encompass a very wide range of variation and yet remain
3377 appropriate in each particular instance.
3378
3379 Validity of a tag is not the only factor determining its usefulness.
3380 While every valid tag has a meaning, it might not represent any real-
3381 world language usage. This is unavoidable in a system in which
 subtags can be combined freely. For example, tags such as
 "ar-Cyrl-CO" (Arabic, Cyrillic script, as used in Colombia) or
 "tlh-Kore-AQ-fonipa" (Klingon, Korean script, as used in Antarctica,
 IPA phonetic transcription) are both valid and unlikely to represent
 a useful combination of language attributes.
3387
3388 The meaning of a given tag doesn't depend on the context in which it
3389 appears. The relationship between a tag's meaning and the
3390 information objects to which that tag is applied, however, can vary.
3391
3392 o For a single information object, the associated language tags
3393 might be interpreted as the set of languages that is necessary for
3394 a complete comprehension of the complete object. Example: Plain
3395 text documents.
3396
3397 o For an aggregation of information objects, the associated language
3398 tags could be taken as the set of languages used inside components
3399 of that aggregation. Examples: Document stores and libraries.
3400
3401 o For information objects whose purpose is to provide alternatives,
3402 the associated language tags could be regarded as a hint that the
3403 content is provided in several languages and that one has to
3404 inspect each of the alternatives in order to find its language or
3405 languages. In this case, the presence of multiple tags might not
 mean that one needs to be multilingual to get complete
 understanding of the document. Example: MIME
 multipart/alternative [RFC2046].
3409
3410 o For markup languages, such as HTML and XML, language information
3411 can be added to each part of the document identified by the markup
3412 structure (including the whole document itself). For example, one
3413 could write <span lang="fr">C'est la vie.</span> inside a German
3414 document; the German-speaking user could then access a French-
3423 German dictionary to find out what the marked section meant. If
3424 the user were listening to that document through a speech
 synthesis interface, this information could be used to signal the
3426 synthesizer to appropriately apply French text-to-speech
3427 pronunciation rules to that span of text, instead of applying the
3428 inappropriate German rules.
3429
3430 o For markup languages and document formats that allow the audience
3431 to be identified, a language tag could indicate the audience(s)
3432 appropriate for that document. For example, the same HTML
3433 document described in the preceding bullet might have an HTTP
3434 header "Content-Language: de" to indicate that the intended
3435 audience for the file is German (even though three words appear
3436 and are identified as being in French within it).
3437
3438 o For systems and APIs, language tags form the basis for most
 implementations of locale identifiers. For example, see the
 Unicode CLDR (Common Locale Data Repository) project and UTS #35
 [UTS35].
3442
3443 Language tags are related when they contain a similar sequence of
3444 subtags. For example, if a language tag B contains language tag A as
3445 a prefix, then B is typically "narrower" or "more specific" than A.
3446 Thus, "zh-Hant-TW" is more specific than "zh-Hant".
3447
 This relationship is not guaranteed in all cases: specifically,
 language varieties whose tags begin with the same sequence of
 subtags are NOT guaranteed to be mutually intelligible, although
 they might be. For
3451 example, the tag "az" shares a prefix with both "az-Latn"
3452 (Azerbaijani written using the Latin script) and "az-Cyrl"
3453 (Azerbaijani written using the Cyrillic script). A person fluent in
3454 one script might not be able to read the other, even though the
3455 linguistic content (e.g., what would be heard if both texts were read
3456 aloud) might be identical. Content tagged as "az" most probably is
3457 written in just one script and thus might not be intelligible to a
3458 reader familiar with the other script.
3459
3460 Similarly, not all subtags specify an actual distinction in language.
3461 For example, the tags "en-US" and "en-CA" mean, roughly, English with
3462 features generally thought to be characteristic of the United States
3463 and Canada, respectively. They do not imply that a significant
 dialectal boundary exists between any arbitrarily selected point in
3465 the United States and any arbitrarily selected point in Canada.
3466 Neither does a particular region subtag imply that linguistic
3467 distinctions do not exist within that region.
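
 The prefix relationship described in this section can be tested
 mechanically. The following sketch uses a hypothetical function
 name and compares case-insensitively on whole-subtag boundaries,
 which is also the comparison that Basic Filtering [RFC4647]
 performs. It says nothing about mutual intelligibility.

```python
def is_narrower_or_equal(tag, prefix):
    """True if `tag` equals `prefix` or extends it by whole subtags."""
    tag, prefix = tag.lower(), prefix.lower()
    # Appending '-' prevents 'az' from matching an unrelated 'azb'.
    return tag == prefix or tag.startswith(prefix + "-")


narrower = is_narrower_or_equal("zh-Hant-TW", "zh-Hant")
```

 The same test shows why an unnecessary script subtag hurts
 matching: "en-Latn-US" does not extend "en-US", so a request for
 "en-US" would not select it.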
3468
34794.3. Lists of Languages
3480
3481 In some applications, a single content item might best be associated
3482 with more than one language tag. Examples of such a usage include:
3483
3484 o Content items that contain multiple, distinct varieties. Often
3485 this is used to indicate an appropriate audience for a given
3486 content item when multiple choices might be appropriate. Examples
3487 of this could include:
3488
3489 * Metadata about the appropriate audience for a movie title. For
3490 example, a DVD might label its individual audio tracks 'de'
3491 (German), 'fr' (French), and 'es' (Spanish), but the overall
3492 title would list "de, fr, es" as its overall audience.
3493
3494 * A French/English, English/French dictionary tagged as both "en"
3495 and "fr" to specify that it applies equally to French and
3496 English.
3497
3498 * A side-by-side or interlinear translation of a document, as is
3499 commonly done with classical works in Latin or Greek.
3500
3501 o Content items that contain a single language but that require
3502 multiple levels of specificity. For example, a library might wish
3503 to classify a particular work as both Norwegian ('no') and as
3504 Nynorsk ('nn') for audiences capable of appreciating the
3505 distinction or needing to select content more narrowly.
3506
4.4. Length Considerations
3508
   There is no defined upper limit on the size of language tags. While
   historically most language tags have consisted of language and region
   subtags with a combined total length of up to six characters, longer
   tags have always been possible and have actually appeared in use.
3514
3515 Neither the language tag syntax nor other requirements in this
3516 document impose a fixed upper limit on the number of subtags in a
3517 language tag (and thus an upper bound on the size of a tag). The
3518 language tag syntax suggests that, depending on the specific
3519 language, more subtags (and thus a longer tag) are sometimes
3520 necessary to completely identify the language for certain
3521 applications; thus, it is possible to envision long or complex subtag
3522 sequences.
3523
4.4.1. Working with Limited Buffer Sizes
3536
3537 Some applications and protocols are forced to allocate fixed buffer
3538 sizes or otherwise limit the length of a language tag. A conformant
3539 implementation or specification MAY refuse to support the storage of
3540 language tags that exceed a specified length. Any such limitation
3541 SHOULD be clearly documented, and such documentation SHOULD include
3542 what happens to longer tags (for example, whether an error value is
3543 generated or the language tag is truncated). A protocol that allows
3544 tags to be truncated at an arbitrary limit, without giving any
3545 indication of what that limit is, has the potential to cause harm by
3546 changing the meaning of tags in substantial ways.
3547
3548 In practice, most language tags do not require more than a few
3549 subtags and will not approach reasonably sized buffer limitations;
3550 see Section 4.1.
3551
3552 Some specifications or protocols have limits on tag length but do not
3553 have a fixed length limitation. For example, [RFC2231] has no
3554 explicit length limitation: the length available for the language tag
3555 is constrained by the length of other header components (such as the
3556 charset's name) coupled with the 76-character limit in [RFC2047].
3557 Thus, the "limit" might be 50 or more characters, but it could
3558 potentially be quite small.
3559
3560 The considerations for assigning a buffer limit are:
3561
3562 Implementations SHOULD NOT truncate language tags unless the
3563 meaning of the tag is purposefully being changed, or unless the
3564 tag does not fit into a limited buffer size specified by a
3565 protocol for storage or transmission.
3566
3567 Implementations SHOULD warn the user when a tag is truncated since
3568 truncation changes the semantic meaning of the tag.
3569
3570 Implementations of protocols or specifications that are space
3571 constrained but do not have a fixed limit SHOULD use the longest
3572 possible tag in preference to truncation.
3573
3574 Protocols or specifications that specify limited buffer sizes for
3575 language tags MUST allow for language tags of at least 35
3576 characters. Note that [RFC4646] recommended a minimum field size
3577 of 42 characters because it included all three elements of the
3578 'extlang' production. Two of these are now permanently reserved,
3579 so a registered primary language subtag of the maximum length of 8
3580 characters is now longer than the longest language-extlang
3581 combination. Protocols or specifications that commonly use
3591 extensions or private use subtags might wish to reserve or
3592 recommend a longer "minimum buffer" size.
3593
3594 The following illustration shows how the 35-character recommendation
3595 was derived:
3596
   language      =  8 ; longest allowed registered value
                      ;   longer than primary+extlang
                      ;   which requires 7 characters
   script        =  5 ; if not suppressed: see Section 4.1
   region        =  4 ; UN M.49 numeric region code
                      ;   ISO 3166-1 codes require 3
   variant1      =  9 ; needs 'language' as a prefix
   variant2      =  9 ; very rare, as it needs
                      ;   'language-variant1' as a prefix

   total         = 35 characters
3608
3609 Figure 7: Derivation of the Limit on Tag Length
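
   As an informal cross-check of Figure 7, the same total can be
   reproduced by summing the longest subtag of each kind and adding one
   hyphen between adjacent subtags. The sketch below is illustrative
   only: the names MAX_SUBTAG_LENGTHS and minimum_buffer_size are not
   defined by this document, and the lengths are the figure's values
   with the preceding hyphen counted separately.

```python
# Longest subtag of each kind, WITHOUT the preceding hyphen. Figure 7
# counts the hyphen together with every subtag after the first.
MAX_SUBTAG_LENGTHS = [
    ("language", 8),  # longest allowed registered primary language subtag
    ("script", 4),    # four-letter script code
    ("region", 3),    # UN M.49 numeric code; ISO 3166-1 codes need only 2
    ("variant1", 8),
    ("variant2", 8),
]


def minimum_buffer_size(parts=MAX_SUBTAG_LENGTHS):
    """Sum the subtag lengths plus one hyphen between adjacent subtags."""
    hyphens = len(parts) - 1
    return sum(length for _, length in parts) + hyphens


print(minimum_buffer_size())  # 35
```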
3610
4.4.2. Truncation of Language Tags
3612
   Truncation of a language tag alters the meaning of the tag, and thus
   SHOULD be avoided. However, truncation of language tags is sometimes
   necessary due to limited buffer sizes. Such truncation MUST NOT chop
   a subtag off in the middle or produce an invalid tag (for example,
   one ending with the "-" character).
3618
3619 This means that applications or protocols that truncate tags MUST do
3620 so by progressively removing subtags along with their preceding "-"
3621 from the right side of the language tag until the tag is short enough
3622 for the given buffer. If the resulting tag ends with a single-
3623 character subtag, that subtag and its preceding "-" MUST also be
3624 removed. For example:
3625
3626 Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
3627 1. zh-Latn-CN-variant1-a-extend1-x-wadegile
3628 2. zh-Latn-CN-variant1-a-extend1
3629 3. zh-Latn-CN-variant1
3630 4. zh-Latn-CN
3631 5. zh-Latn
3632 6. zh
3633
3634 Figure 8: Example of Tag Truncation
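
   The truncation procedure described above could be implemented along
   the following lines. This is a minimal sketch, not a reference
   implementation: the function name truncate_tag, and the choice to
   raise an error when no valid tag fits in the buffer, are assumptions
   of this example.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten `tag` to at most `max_len` characters, whole subtags only."""
    subtags = tag.split("-")
    while len("-".join(subtags)) > max_len and len(subtags) > 1:
        subtags.pop()  # drop the rightmost subtag and its preceding "-"
        # A trailing single-character subtag (such as "a" or "x") would
        # leave an invalid tag, so it is removed as well.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    result = "-".join(subtags)
    if len(result) > max_len or len(result) == 1:
        raise ValueError(f"cannot truncate {tag!r} to {max_len} characters")
    return result
```

   Applied to the tag in Figure 8 with limits of 40, 35, 20, 10, 8, and
   2 characters, this sketch returns the tags numbered 1 through 6 in
   the figure, respectively.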
3635
4.5. Canonicalization of Language Tags
3648
3649 Since a particular language tag can be used by many processes,
3650 language tags SHOULD always be created or generated in canonical
3651 form.
3652
3653 A language tag is in 'canonical form' when the tag is well-formed
3654 according to the rules in Sections 2.1 and 2.2 and it has been
3655 canonicalized by applying each of the following steps in order, using
3656 data from the IANA registry (see Section 3.1):
3657
3658 1. Extension sequences are ordered into case-insensitive ASCII order
3659 by singleton subtag.
3660
3661 * For example, the subtag sequence '-a-babble' comes before
3662 '-b-warble'.
3663
3664 2. Redundant or grandfathered tags are replaced by their 'Preferred-
3665 Value', if there is one.
3666
3667 * The field-body of the 'Preferred-Value' for grandfathered and
3668 redundant tags is an "extended language range" [RFC4647] and
3669 might consist of more than one subtag.
3670
3671 * 'Preferred-Value' fields in the registry provide mappings from
3672 deprecated tags to modern equivalents. Many of these were
3673 created before the adoption of this document (such as the
3674 mapping of "no-nyn" to "nn" or "i-klingon" to "tlh"). Others
3675 are the result of later registrations or additions to the
3676 registry as permitted or required by this document (for
3677 example, "zh-hakka" was deprecated in favor of the ISO 639-3
3678 code 'hak' when this document was adopted).
3679
3680 3. Subtags are replaced by their 'Preferred-Value', if there is one.
3681 For extlangs, the original primary language subtag is also
3682 replaced if there is a primary language subtag in the 'Preferred-
3683 Value'.
3684
3685 * The field-body of the 'Preferred-Value' for extlangs is an
3686 "extended language range" and typically maps to a primary
3687 language subtag. For example, the subtag sequence "zh-hak"
3688 (Chinese, Hakka) is replaced with the subtag 'hak' (Hakka).
3689
       * Most non-extlang subtags that have a 'Preferred-Value' are
         either region subtags whose country name or designation has
         changed or clerical corrections to ISO 639-1.
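
   Steps 2 and 3 above amount to table lookups against the registry.
   The following sketch illustrates them under stated assumptions: the
   three small tables contain only the mappings used as examples in
   this section (a real implementation would load the full set from the
   IANA registry), the function name replace_preferred is hypothetical,
   and the input is assumed to be a well-formed tag. Step 1 (extension
   ordering) is not shown here.

```python
# Illustrative excerpt only: these few mappings are the examples given
# in this section, not the contents of the real registry.
TAG_PREFERRED = {"no-nyn": "nn", "i-klingon": "tlh", "zh-hakka": "hak"}
# extlang subtag -> (Prefix, Preferred-Value)
EXTLANG_PREFERRED = {"hak": ("zh", "hak")}
REGION_PREFERRED = {"BU": "MM"}


def replace_preferred(tag: str) -> str:
    """Apply canonicalization steps 2 and 3 to a well-formed tag."""
    # Step 2: a grandfathered or redundant tag is replaced as a whole.
    whole = TAG_PREFERRED.get(tag.lower())
    if whole is not None:
        return whole

    subtags = tag.split("-")

    # Step 3 (extlang): the primary language subtag and the extlang are
    # both replaced by the extlang's Preferred-Value.
    if len(subtags) > 1:
        entry = EXTLANG_PREFERRED.get(subtags[1].lower())
        if entry is not None and entry[0] == subtags[0].lower():
            subtags[0:2] = [entry[1]]

    # Step 3 (region): in a tag matching the 'langtag' production, a
    # two-character subtag that is not first and that precedes any
    # singleton is a region subtag.
    for i in range(1, len(subtags)):
        if len(subtags[i]) == 1:
            break
        if len(subtags[i]) == 2:
            subtags[i] = REGION_PREFERRED.get(subtags[i].upper(), subtags[i])

    return "-".join(subtags)
```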
3693
3703 The canonical form contains no 'extlang' subtags. There is an
3704 alternate 'extlang form' that maintains or reinstates extlang
3705 subtags. This form can be useful in environments where the presence
3706 of the 'Prefix' subtag is considered beneficial in matching or
3707 selection (see Section 4.1.2).
3708
3709 A language tag is in 'extlang form' when the tag is well-formed
3710 according to the rules in Sections 2.1 and 2.2 and it has been
3711 processed by applying each of the following two steps in order, using
3712 data from the IANA registry:
3713
3714 1. The language tag is first transformed into canonical form, as
3715 described above.
3716
3717 2. If the language tag starts with a primary language subtag that is
3718 also an extlang subtag, then the language tag is prepended with
3719 the extlang's 'Prefix'.
3720
3721 * For example, "hak-CN" (Hakka, China) has the primary language
3722 subtag 'hak', which in turn has an 'extlang' record with a
3723 'Prefix' 'zh' (Chinese). The extlang form is "zh-hak-CN"
3724 (Chinese, Hakka, China).
3725
3726 * Note that Step 2 (prepending a prefix) can restore a subtag
3727 that was removed by Step 1 (canonicalizing).
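
   Step 2 of the 'extlang form' procedure can be sketched as a single
   lookup. The example below is illustrative only: the EXTLANG_PREFIX
   table holds just the 'hak' record mentioned above rather than the
   registry's full set of extlang records, the function name
   to_extlang_form is hypothetical, and the input is assumed to be in
   canonical form already.

```python
# Illustrative excerpt: extlang subtag -> 'Prefix' from its registry
# record. Only the example used in this section is included.
EXTLANG_PREFIX = {"hak": "zh"}


def to_extlang_form(canonical_tag: str) -> str:
    """Prepend the extlang 'Prefix' to a tag that is in canonical form."""
    primary = canonical_tag.split("-", 1)[0].lower()
    prefix = EXTLANG_PREFIX.get(primary)
    if prefix is None:
        return canonical_tag  # primary language is not also an extlang
    return f"{prefix}-{canonical_tag}"
```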
3728
3729 Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in canonical
3730 form, while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and potentially
3731 valid (extensions 'a' and 'b' are not defined as of the publication
3732 of this document) but not in canonical form (the extensions are not
3733 in alphabetical order).
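
   The extension ordering in this example (canonicalization step 1) can
   be sketched as follows. The function name order_extensions is
   hypothetical, and the sketch assumes a well-formed tag: it leaves
   everything before the first singleton untouched, keeps the private
   use sequence (introduced by 'x') at the end, and sorts the remaining
   extension sequences by singleton without changing their contents.

```python
def order_extensions(tag: str) -> str:
    """Sort extension sequences by singleton, case-insensitively."""
    subtags = tag.split("-")
    head = []        # language, script, region, and variant subtags
    extensions = []  # one list per extension: [singleton, subtag, ...]
    private = []     # 'x' and everything after it

    i = 0
    while i < len(subtags) and len(subtags[i]) != 1:
        head.append(subtags[i])
        i += 1
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private = subtags[i:]  # private use always stays last
            break
        if len(subtags[i]) == 1:
            extensions.append([subtags[i]])
        else:
            extensions[-1].append(subtags[i])
        i += 1

    extensions.sort(key=lambda seq: seq[0].lower())
    ordered = head + [s for seq in extensions for s in seq] + private
    return "-".join(ordered)
```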
3734
3735 Example: Although the tag "en-BU" (English as used in Burma)
3736 maintains its validity, the language tag "en-BU" is not in canonical
3737 form because the 'BU' subtag has a canonical mapping to 'MM'
3738 (Myanmar).
3739
   Canonicalization of language tags does not imply anything about the
   use of upper- or lowercase letters when processing or comparing
   subtags (as described in Section 2.1). All comparisons MUST be
   performed in a case-insensitive manner.
3744
3745 When performing canonicalization of language tags, processors MAY
3746 regularize the case of the subtags (that is, this process is
3747 OPTIONAL), following the case used in the registry (see
3748 Section 2.1.1).
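
   Case regularization can be approximated without consulting the
   registry. The sketch below assumes the formatting conventions of
   Section 2.1.1: all subtags are lowercase, except that two-letter
   subtags are uppercase and four-letter subtags are titlecase when
   they neither start the tag nor occur after a singleton. The function
   name regularize_case is hypothetical, and treating every subtag
   after the first singleton as "after a singleton" is an
   interpretation made for this example.

```python
def regularize_case(tag: str) -> str:
    """Apply the conventional casing to each subtag of `tag`."""
    result = []
    seen_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and not seen_singleton and len(subtag) == 2:
            result.append(subtag.upper())       # region, e.g. "CA"
        elif index > 0 and not seen_singleton and len(subtag) == 4:
            result.append(subtag.capitalize())  # script, e.g. "Latn"
        else:
            result.append(subtag.lower())
        if len(subtag) == 1:
            seen_singleton = True
    return "-".join(result)
```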
3749
3759 If more than one variant appears within a tag, processors MAY reorder
3760 the variants to obtain better matching behavior or more consistent
3761 presentation. Reordering of the variants SHOULD follow the
3762 recommendations for variant ordering in Section 4.1.
3763
3764 If the field 'Deprecated' appears in a registry record without an
3765 accompanying 'Preferred-Value' field, then that tag or subtag is
3766 deprecated without a replacement. These values are canonical when
3767 they appear in a language tag. However, tags that include these
3768 values SHOULD NOT be selected by users or generated by
3769 implementations.
3770
3771 An extension MUST define any relationships that exist between the
3772 various subtags in the extension and thus MAY define an alternate
3773 canonicalization scheme for the extension's subtags. Extensions MAY
3774 define how the order of the extension's subtags is interpreted. For
3775 example, an extension could define that its subtags are in canonical
3776 order when the subtags are placed into ASCII order: that is, "en-a-
3777 aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might
3778 define that the order of the subtags influences their semantic
3779 meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
3780 aaa-bbb-ccc"). However, extension specifications SHOULD be designed
3781 so that they are tolerant of the typical processes described in
3782 Section 3.7.
3783
4.6. Considerations for Private Use Subtags
3785
3786 Private use subtags, like all other subtags, MUST conform to the
3787 format and content constraints in the ABNF. Private use subtags have
3788 no meaning outside the private agreement between the parties that
3789 intend to use or exchange language tags that employ them. The same
3790 subtags MAY be used with a different meaning under a separate private
3791 agreement. They SHOULD NOT be used where alternatives exist and
3792 SHOULD NOT be used in content or protocols intended for general use.
3793
3794 Private use subtags are simply useless for information exchange
3795 without prior arrangement. The value and semantic meaning of private
3796 use tags and of the subtags used within such a language tag are not
3797 defined by this document.
3798
3799 Private use sequences introduced by the 'x' singleton are completely
3800 opaque to users or implementations outside of the private use
3801 agreement. So, in addition to private use subtag sequences
3802 introduced by the singleton subtag 'x', the Language Subtag Registry
3803 provides private use language, script, and region subtags derived
3804 from the private use codes assigned by the underlying standards.
3805 These subtags are valid for use in forming language tags; they are
3806 RECOMMENDED over the 'x' singleton private use subtag sequences
3815 because they convey more information via their linkage to the
3816 language tag's inherent structure.
3817
3818 For example, the region subtags 'AA', 'ZZ', and those in the ranges
3819 'QM'-'QZ' and 'XA'-'XZ' (derived from the ISO 3166-1 private use
3820 codes) can be used to form a language tag. A tag such as
3821 "zh-Hans-XQ" conveys a great deal of public, interchangeable
3822 information about the language material (that it is Chinese in the
3823 simplified Chinese script and is suitable for some geographic region
3824 'XQ'). While the precise geographic region is not known outside of
3825 private agreement, the tag conveys far more information than an
3826 opaque tag such as "x-somelang" or even "zh-Hans-x-xq" (where the
3827 'xq' subtag's meaning is entirely opaque).
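
   An implementation that needs to recognize the private use region
   subtags listed above could test for them as shown below. This is an
   illustrative sketch; the function name is_private_use_region is not
   defined by this document, and only the region ranges named in this
   section are covered (private use language and script subtags are
   not).

```python
def is_private_use_region(subtag: str) -> bool:
    """True for 'AA', 'ZZ', 'QM'-'QZ', and 'XA'-'XZ' (any letter case)."""
    code = subtag.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        return False
    return (
        code in ("AA", "ZZ")
        or "QM" <= code <= "QZ"
        or "XA" <= code <= "XZ"
    )
```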
3828
   However, in some cases content tagged with registry-defined private
   use subtags (such as 'XQ') can interact with other systems in a
   different and possibly unsuitable manner compared to tags that use
   opaque, privately defined 'x' sequences, so the choice of the best
   approach sometimes depends on the particular domain in question.
3834
5. IANA Considerations
3836
3837 This section deals with the processes and requirements necessary for
3838 IANA to maintain the subtag and extension registries as defined by
3839 this document and in accordance with the requirements of [RFC5226].
3840
3841 The impact on the IANA maintainers of the two registries defined by
3842 this document will be a small increase in the frequency of new
3843 entries or updates. IANA also is required to create a new mailing
3844 list (described below in Section 5.1) to announce registry changes
3845 and updates.
3846
5.1. Language Subtag Registry
3848
   IANA updated the registry using instructions and content provided in
   a companion document [RFC5645]. The criteria and process for
   selecting the updated set of records are described in that document.
   Producing the updated set of records had no impact on IANA, since
   the work to create it was performed externally.
3854
3855 Future work on the Language Subtag Registry includes the following
3856 activities:
3857
3858 o Inserting or replacing whole records. These records are
3859 preformatted for IANA by the Language Subtag Reviewer, as
3860 described in Section 3.3.
3861
3862 o Archiving and making publicly available the registration forms.
3863
3871 o Announcing each updated version of the registry on the
3872 "ietf-languages-announcements@iana.org" mailing list.
3873
3874 Each registration form sent to IANA contains a single record for
3875 incorporation into the registry. The form will be sent to
3876 <iana@iana.org> by the Language Subtag Reviewer. It will have a
3877 subject line indicating whether the enclosed form represents an
3878 insertion of a new record (indicated by the word "INSERT" in the
3879 subject line) or a replacement of an existing record (indicated by
3880 the word "MODIFY" in the subject line). At no time can a record be
3881 deleted from the registry.
3882
3883 IANA will extract the record from the form and place the inserted or
3884 modified record into the appropriate section of the Language Subtag
3885 Registry, grouping the records by their 'Type' field. Inserted
3886 records can be placed anywhere within the appropriate section; there
3887 is no guarantee that the registry's records will be placed in any
3888 particular order except that they will always be grouped by 'Type'.
3889 Modified records overwrite the record they replace.
3890
3891 Whenever an entry is created or modified in the registry, the 'File-
3892 Date' record at the start of the registry is updated to reflect the
3893 most recent modification date. The date format SHALL be the "full-
3894 date" format of [RFC3339]. The date SHALL be the date on which that
3895 version of the registry was first published by IANA. There SHALL be
3896 at most one version of the registry published in a day. A 'File-
3897 Date' record is also included in each request to IANA to insert or
3898 modify records, indicating the acceptance date of the records in the
3899 request.
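
   Because the 'File-Date' record uses the RFC 3339 "full-date" format
   (YYYY-MM-DD), two copies of the registry can be compared by date
   with very little code. The following sketch is illustrative only:
   the function names are hypothetical, and it assumes that the first
   line of the registry text is the 'File-Date' record.

```python
from datetime import date


def parse_file_date(registry_text: str) -> date:
    """Return the date in the 'File-Date' record that starts the registry."""
    lines = registry_text.splitlines()
    name, _, value = (lines[0] if lines else "").partition(":")
    if name.strip() != "File-Date":
        raise ValueError("registry does not start with a File-Date record")
    # RFC 3339 "full-date" is YYYY-MM-DD, which fromisoformat() accepts.
    return date.fromisoformat(value.strip())


def registry_is_newer(local_text: str, fetched_text: str) -> bool:
    """True if the fetched copy was published after the local copy."""
    return parse_file_date(fetched_text) > parse_file_date(local_text)
```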
3900
3901 The updated registry file MUST use the UTF-8 character encoding, and
3902 IANA MUST check the registry file for proper encoding. Non-ASCII
3903 characters can be sent to IANA by attaching the registration form to
3904 the email message or by using various encodings in the mail message
3905 body (UTF-8 is recommended). IANA will verify any unclear or
3906 corrupted characters with the Language Subtag Reviewer prior to
3907 posting the updated registry.
3908
3909 IANA will also archive and make publicly available from
3910 http://www.iana.org each registration form. Note that multiple
3911 registrations can pertain to the same record in the registry.
3912
3913 Developers who are dependent upon the Language Subtag Registry
3914 sometimes would like to be informed of changes in the registry so
3915 that they can update their implementations. When any change is made
3916 to the Language Subtag Registry, IANA will send an announcement
3917 message to <ietf-languages-announcements@iana.org> (a self-
3918 subscribing list to which only IANA can post).
3919
5.2. Extensions Registry
3928
   The Language Tag Extensions Registry can contain at most 35 records
   (one for each possible singleton: the 26 letters and 10 digits, less
   the private use singleton 'x'), and thus changes to this registry
   are expected to be very infrequent.
3931
3932 Future work by IANA on the Language Tag Extensions Registry is
3933 limited to two cases. First, the IESG MAY request that new records
3934 be inserted into this registry from time to time. These requests
3935 MUST include the record to insert in the exact format described in
3936 Section 3.7. In addition, there MAY be occasional requests from the
3937 maintaining authority for a specific extension to update the contact
3938 information or URLs in the record. These requests MUST include the
3939 complete, updated record. IANA is not responsible for validating the
3940 information provided, only that it is properly formatted. IANA
3941 SHOULD take reasonable steps to ascertain that the request comes from
3942 the maintaining authority named in the record present in the
3943 registry.
3944
6. Security Considerations
3946
3947 Language tags used in content negotiation, like any other information
3948 exchanged on the Internet, might be a source of concern because they
3949 might be used to infer the nationality of the sender, and thus
3950 identify potential targets for surveillance.
3951
3952 This is a special case of the general problem that anything sent is
3953 visible to the receiving party and possibly to third parties as well.
3954 It is useful to be aware that such concerns can exist in some cases.
3955
3956 The evaluation of the exact magnitude of the threat, and any possible
3957 countermeasures, is left to each application protocol (see BCP 72
3958 [RFC3552] for best current practice guidance on security threats and
3959 defenses).
3960
3961 The language tag associated with a particular information item is of
3962 no consequence whatsoever in determining whether that content might
3963 contain possible homographs. The fact that a text is tagged as being
3964 in one language or using a particular script subtag provides no
3965 assurance whatsoever that it does not contain characters from scripts
3966 other than the one(s) associated with or specified by that language
3967 tag.
3968
3969 Since there is no limit to the number of variant, private use, and
3970 extension subtags, and consequently no limit on the possible length
3971 of a tag, implementations need to guard against buffer overflow
3972 attacks. See Section 4.4 for details on language tag truncation,
3973 which can occur as a consequence of defenses against buffer overflow.
3974
3983 To prevent denial-of-service attacks, applications SHOULD NOT depend
3984 on either the Language Subtag Registry or the Language Tag Extensions
3985 Registry being always accessible. Additionally, although the
3986 specification of valid subtags for an extension (see Section 3.7)
3987 MUST be available over the Internet, implementations SHOULD NOT
3988 mechanically depend on those sources being always accessible.
3989
3990 The registries specified in this document are not suitable for
3991 frequent or real-time access to, or retrieval of, the full registry
3992 contents. Most applications do not need registry data at all. For
3993 others, being able to validate or canonicalize language tags as of a
3994 particular registry date will be sufficient, as the registry contents
3995 change only occasionally. Changes are announced to
3996 <ietf-languages-announcements@iana.org>. This mailing list is
3997 intended for interested organizations and individuals, not for bulk
3998 subscription to trigger automatic software updates. The size of the
3999 registry makes it unsuitable for automatic software updates.
4000 Implementers considering integrating the Language Subtag Registry in
4001 an automatic updating scheme are strongly advised to distribute only
4002 suitably encoded differences, and only via their own infrastructure
4003 -- not directly from IANA.
4004
   Changes, or the absence thereof, can also easily be detected by
   looking at the 'File-Date' record at the start of the registry, or by
   using features of the protocol used for downloading, without having
   to download the full registry. At the time of publication of this
   document, IANA is making the Language Subtag Registry available over
   HTTP 1.1. The proper way to update a local copy of the Language
   Subtag Registry using HTTP 1.1 is to use a conditional GET [RFC2616].
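
   One way to issue a conditional GET is with the If-Modified-Since
   request header, as in the sketch below. It uses only the Python
   standard library; the function names are hypothetical, the URL is
   supplied by the caller (this example does not assume any particular
   registry location), and a server could equally support other
   conditional mechanisms that are not shown.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def build_conditional_request(url: str, last_fetched: datetime):
    """Build a GET request that asks for the body only if it has changed."""
    # HTTP dates are expressed in GMT; last_fetched should be
    # timezone-aware.
    stamp = format_datetime(last_fetched.astimezone(timezone.utc), usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})


def fetch_if_modified(url: str, last_fetched: datetime):
    """Return the new body as bytes, or None if the server answers 304."""
    request = build_conditional_request(url, last_fetched)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the local copy is current
            return None
        raise
```

   Keeping the request construction separate from the network call
   allows the header logic to be checked without contacting any server.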
4012
7. Character Set Considerations
4014
4015 The syntax in this document requires that language tags use only the
4016 characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
4017 character sets, so the composition of language tags shouldn't have
4018 any character set issues.
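
   A check that a string uses only this repertoire can be written as a
   single regular expression. The sketch below is illustrative: the
   function name uses_only_tag_characters is hypothetical, and the
   limit of eight characters per subtag is taken from the ABNF in
   Section 2.1. Passing this check does not make a tag well-formed or
   valid; it only rules out characters and hyphen placements that can
   never occur in a language tag.

```python
import re

# One to eight ASCII letters or digits per subtag, subtags joined by
# single hyphens, with no leading or trailing hyphen.
_TAG_CHARS = re.compile(r"[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def uses_only_tag_characters(tag: str) -> bool:
    """True if `tag` consists solely of A-Z, a-z, 0-9, and HYPHEN-MINUS."""
    return _TAG_CHARS.fullmatch(tag) is not None
```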
4019
4020 The rendering of text based on the language tag is not addressed
4021 here. Historically, some processes have relied on the use of
4022 character set/encoding information (or other external information) in
4023 order to infer how a specific string of characters should be
4024 rendered. Notably, this applies to language- and culture-specific
4025 variations of Han ideographs as used in Japanese, Chinese, and
4026 Korean, where use of, for example, a Japanese character encoding such
4027 as EUC-JP implies that the text itself is in Japanese. When language
4028 tags are applied to spans of text, rendering engines might be able to
4029 use that information to better select fonts or make other rendering
4039 choices, particularly where languages with distinct writing
4040 traditions use the same characters.
4041
8. Changes from RFC 4646
4043
4044 The main goal for this revision of RFC 4646 was to incorporate two
4045 new parts of ISO 639 (ISO 639-3 and ISO 639-5) and their attendant
4046 sets of language codes into the IANA Language Subtag Registry. This
4047 permits the identification of many more languages and language
4048 collections than previously supported.
4049
4050 The specific changes in this document to meet these goals are:
4051
   o  Defined the incorporation of ISO 639-3 and ISO 639-5 codes for use
      as primary and extended language subtags. Also permanently
      reserved and disallowed the use of additional 'extlang' subtags.
      The changes necessary to achieve this were:
4056
4057 * Modified the ABNF comments.
4058
4059 * Updated various registration and stability requirements
4060 sections to reference ISO 639-3 and ISO 639-5 in addition to
4061 ISO 639-1 and ISO 639-2.
4062
4063 * Edited the text to eliminate references to extended language
4064 subtags where they are no longer used.
4065
4066 * Explained the change in the section on extended language
4067 subtags.
4068
   o  Changed the ABNF related to grandfathered tags. The irregular
      tags are now listed. Well-formed grandfathered tags are now
      described by the 'langtag' production, and the 'grandfathered'
      production was removed as a result. Also added a description of
      both types of grandfathered tags to Section 2.2.8.
4074
4075 o Added the paragraph on "collections" to Section 4.1.
4076
4077 o Changed the capitalization rules for 'Tag' fields in Section 3.1.
4078
4079 o Split Section 3.1 up into subsections.
4080
4081 o Modified Section 3.5 to allow 'Suppress-Script' fields to be
4082 added, modified, or removed via the registration process. This
4083 was an erratum from RFC 4646.
4084
4085 o Modified examples that used region code 'CS' (formerly Serbia and
4086 Montenegro) to use 'RS' (Serbia) instead.
4087
4095 o Modified the rules for creating and maintaining record
4096 'Description' fields to prevent duplicates, including inverted
4097 duplicates.
4098
4099 o Removed the lengthy description of why RFC 4646 was created from
4100 this section, which also caused the removal of the reference to
4101 XML Schema.
4102
4103 o Modified the text in Section 2.1 to place more emphasis on the
4104 fact that language tags are not case sensitive.
4105
4106 o Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"
4107 and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the
4108 'Suppress-Script' on 'Latn' with 'fr'.
4109
4110 o Changed the requirements for well-formedness to make singleton
4111 repetition checking optional (it is required for validity
4112 checking) in Section 2.2.9.
4113
4114 o Changed the text in Section 2.2.9 referring to grandfathered
4115 checking to note that the list is now included in the ABNF.
4116
4117 o Modified and added text to Section 3.2. The job description was
4118 placed first. A note was added making clear that the Language
4119 Subtag Reviewer may delegate various non-critical duties,
4120 including list moderation. Finally, additional text was added to
4121 make the appointment process clear and to clarify that decisions
4122 and performance of the reviewer are appealable.
4123
4124 o Added text to Section 3.5 clarifying that the
4125 ietf-languages@iana.org list is operated by whomever the IESG
4126 appoints.
4127
4128 o Added text to Section 3.1.5 clarifying that the first Description
4129 in a 'language' record matches the corresponding Reference Name
4130 for the language in ISO 639-3.
4131
4132 o Modified Section 2.2.9 to define classes of conformance related to
4133 specific tags (formerly 'well-formed' and 'valid' referred to
4134 implementations). Notes were added about the removal of 'extlang'
4135 from the ABNF provided in RFC 4646, allowing for well-formedness
4136 using this older definition. Reference to RFC 3066 well-
4137 formedness was also added.
4138
4139 o Added text to the end of Section 3.1.2 noting that future versions
4140 of this document might add new field types to the registry format
4141 and recommending that implementations ignore any unrecognized
4142 fields.
4143
4151 o Added text about what the lack of a 'Suppress-Script' field means
4152 in a record to Section 3.1.9.
4153
4154 o Added text allowing the correction of misspellings and typographic
4155 errors to Section 3.1.5.
4156
4157 o Added text to Section 3.1.8 disallowing 'Prefix' field conflicts
4158 (such as circular prefix references).
4159
4160 o Modified text in Section 3.5 to require the subtag reviewer to
4161 announce his/her decision (or extension) following the two-week
4162 period. Also clarified that any decision or failure to decide can
4163 be appealed.
4164
   o  Modified text in Section 4.1 to include the (heretofore anecdotal)
      guiding principle of tag choice and to clarify the non-use of
      script subtags in non-written applications.
4168
   o  Prohibited multiple use of the same variant in a tag (e.g.,
      "de-1901-1901"). Previously, this was only a recommendation
      ("SHOULD").
4172
4173 o Removed inappropriate [RFC2119] language from the illustration in
4174 Section 4.4.1.
4175
4176 o Replaced the example of deprecating "zh-guoyu" with "zh-
4177 hakka"->"hak" in Section 4.5, noting that it was this document
4178 that caused the change.
4179
   o  Replaced the text in Section 4.1 dealing with "mul"/"und" to
      include the subtags 'zxx' and 'mis', as well as the tag
      "i-default". A normative reference to RFC 2277 was added.
4183
4184 o Added text to Section 3.5 clarifying that any modifications of a
4185 registration request must be sent to the <ietf-languages@iana.org>
4186 list before submission to IANA.
4187
4188 o Changed the ABNF for the record-jar format from using the LWSP
4189 production to use a folding whitespace production similar to obs-
4190 FWS in [RFC5234]. This effectively prevents unintentional blank
4191 lines inside a field.
4192
   o  Revised text in Sections 3.3, 3.5, and 5.1 to clarify that the
      Language Subtag Reviewer sends the complete registration forms to
      IANA, that IANA extracts the record from the form, and that the
      forms must also be archived separately from the registry.
4198
4207 o Added text to Section 5 requiring IANA to send an announcement to
4208 an ietf-languages-announcements list whenever the registry is
4209 updated.
4210
   o  Modified the registry to use UTF-8 as its character encoding. This
      also entails additional instructions to IANA and the Language
      Subtag Reviewer in the registration process.
4214
   o  Modified the rules in Section 2.2.4 so that "exceptionally
      reserved" ISO 3166-1 codes other than 'UK' are included in the
      registry. In particular, this allows the code 'EU' (European
      Union) to be used to form language tags or (more commonly) for
      applications that use the registry for region codes to reference
      this subtag.
4221
4222 o Modified the IANA considerations section (Section 5) to remove
4223 unnecessary normative [RFC2119] language.
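
   The record-jar change listed above (folding whitespace in place of
   LWSP) can be illustrated with a small reader.  The following is a
   minimal sketch, not the registry's normative grammar: it assumes
   records are separated by "%%" lines, that a line beginning with a
   space continues the previous field, and that a blank line inside a
   record is an error.  The function name parse_records and the sample
   text are illustrative only.

```python
def parse_records(text):
    """Split record-jar text into records, each a list of (name, body) pairs.

    Simplified sketch: a line starting with a space is treated as a folded
    continuation of the previous field; blank lines are rejected.
    """
    records, fields = [], []
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: close the current record, if any.
            if fields:
                records.append(fields)
            fields = []
        elif not line.strip():
            raise ValueError("blank line inside a record")
        elif line[0] == " ":
            if not fields:
                raise ValueError("continuation line without a field")
            # Unfold: join the continuation to the previous body with one space.
            name, body = fields[-1]
            fields[-1] = (name, body + " " + line.strip())
        else:
            name, sep, body = line.partition(":")
            if not sep:
                raise ValueError("field without ':' separator")
            fields.append((name.strip(), body.strip()))
    if fields:
        records.append(fields)
    return records


SAMPLE = "\n".join([
    "Type: variant",
    "Subtag: biske",
    "Prefix: sl-rozaj",
    "Comments: The dialect of San Giorgio/Bila is one of the",
    "  four major local dialects of Resian",
])

if __name__ == "__main__":
    for name, body in parse_records(SAMPLE)[0]:
        print(name, "=>", body)
```

   Because a continuation line must contain at least one non-space
   character after its leading space, an accidental empty line cannot
   be mistaken for part of a field body.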
4224
9.  References
4226
9.1.  Normative References
4228
4229 [ISO15924] International Organization for Standardization, "ISO
4230 15924:2004. Information and documentation -- Codes
4231 for the representation of names of scripts",
4232 January 2004.
4233
4234 [ISO3166-1] International Organization for Standardization, "ISO
4235 3166-1:2006. Codes for the representation of names
4236 of countries and their subdivisions -- Part 1:
4237 Country codes", November 2006.
4238
4239 [ISO639-1] International Organization for Standardization, "ISO
4240 639-1:2002. Codes for the representation of names
4241 of languages -- Part 1: Alpha-2 code", July 2002.
4242
4243 [ISO639-2] International Organization for Standardization, "ISO
4244 639-2:1998. Codes for the representation of names
4245 of languages -- Part 2: Alpha-3 code", October 1998.
4246
   [ISO639-3]  International Organization for Standardization, "ISO
               639-3:2007.  Codes for the representation of names
               of languages -- Part 3: Alpha-3 code for
               comprehensive coverage of languages", February 2007.
4251
4263 [ISO639-5] International Organization for Standardization, "ISO
4264 639-5:2008. Codes for the representation of names of
4265 languages -- Part 5: Alpha-3 code for language
4266 families and groups", May 2008.
4267
   [ISO646]    International Organization for Standardization,
               "ISO/IEC 646:1991.  Information technology -- ISO
               7-bit coded character set for information
               interchange", 1991.
4272
4273 [RFC2026] Bradner, S., "The Internet Standards Process --
4274 Revision 3", BCP 9, RFC 2026, October 1996.
4275
4276 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
4277 Requirement Levels", BCP 14, RFC 2119, March 1997.
4278
4279 [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
4280 Languages", BCP 18, RFC 2277, January 1998.
4281
4282 [RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the
4283 Internet: Timestamps", RFC 3339, July 2002.
4284
4285 [RFC4647] Phillips, A. and M. Davis, "Matching of Language
4286 Tags", BCP 47, RFC 4647, September 2006.
4287
4288 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for
4289 Writing an IANA Considerations Section in RFCs",
4290 BCP 26, RFC 5226, May 2008.
4291
4292 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for
4293 Syntax Specifications: ABNF", STD 68, RFC 5234,
4294 January 2008.
4295
   [SpecialCasing]  The Unicode Consortium, "Unicode Character Database,
                    Special Casing Properties", March 2008,
                    <http://unicode.org/Public/UNIDATA/SpecialCasing.txt>.
4299
4300 [UAX14] Freitag, A., "Unicode Standard Annex #14: Line
4301 Breaking Properties", August 2006,
4302 <http://www.unicode.org/reports/tr14/>.
4303
   [UN_M.49]   Statistics Division, United Nations, "Standard
               Country or Area Codes for Statistical Use", Revision
               4 (United Nations publication, Sales No. 98.XVII.9),
               June 1999.
4308
   [Unicode]   Unicode Consortium, "The Unicode Standard, Version 5.0
               (Boston, MA, Addison-Wesley, 2003.  ISBN
               0-321-49081-0)", January 2007.
4322
9.2.  Informative References
4324
4325 [CLDR] "The Common Locale Data Repository Project",
4326 <http://cldr.unicode.org>.
4327
4328 [RFC1766] Alvestrand, H., "Tags for the Identification of
4329 Languages", RFC 1766, March 1995.
4330
4331 [RFC2028] Hovey, R. and S. Bradner, "The Organizations
4332 Involved in the IETF Standards Process", BCP 11,
4333 RFC 2028, October 1996.
4334
4335 [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet
4336 Mail Extensions (MIME) Part Two: Media Types",
4337 RFC 2046, November 1996.
4338
4339 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail
4340 Extensions) Part Three: Message Header Extensions
4341 for Non-ASCII Text", RFC 2047, November 1996.
4342
   [RFC2231]   Freed, N. and K. Moore, "MIME Parameter Value and
               Encoded Word Extensions: Character Sets, Languages,
               and Continuations", RFC 2231, November 1997.
4347
4348 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
4349 Masinter, L., Leach, P., and T. Berners-Lee,
4350 "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616,
4351 June 1999.
4352
4353 [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of
4354 ISO 10646", RFC 2781, February 2000.
4355
4356 [RFC3066] Alvestrand, H., "Tags for the Identification of
4357 Languages", RFC 3066, January 2001.
4358
4359 [RFC3282] Alvestrand, H., "Content Language Headers",
4360 RFC 3282, May 2002.
4361
4362 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing
4363 RFC Text on Security Considerations", BCP 72,
4364 RFC 3552, July 2003.
4365
4375 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
4376 10646", STD 63, RFC 3629, November 2003.
4377
4378 [RFC4645] Ewell, D., "Initial Language Subtag Registry",
4379 RFC 4645, September 2006.
4380
4381 [RFC4646] Phillips, A. and M. Davis, "Tags for Identifying
4382 Languages", BCP 47, RFC 4646, September 2006.
4383
   [RFC5645]   Ewell, D., Ed., "Update to the Language Subtag
               Registry", RFC 5645, September 2009.
4386
4387 [UTS35] Davis, M., "Unicode Technical Standard #35: Locale
4388 Data Markup Language (LDML)", December 2007,
4389 <http://www.unicode.org/reports/tr35/>.
4390
   [iso639.prin]  ISO 639 Joint Advisory Committee, "ISO 639 Joint
               Advisory Committee: Working principles for ISO 639
               maintenance", March 2000,
               <http://www.loc.gov/standards/iso639-2/iso639jac_n3r.html>.
4395
4396 [record-jar] Raymond, E., "The Art of Unix Programming", 2003,
4397 <urn:isbn:0-13-142901-9>.
4398
Appendix A.  Examples of Language Tags (Informative)
4432
4433 Simple language subtag:
4434
4435 de (German)
4436
4437 fr (French)
4438
4439 ja (Japanese)
4440
4441 i-enochian (example of a grandfathered tag)
4442
4443 Language subtag plus Script subtag:
4444
4445 zh-Hant (Chinese written using the Traditional Chinese script)
4446
4447 zh-Hans (Chinese written using the Simplified Chinese script)
4448
4449 sr-Cyrl (Serbian written using the Cyrillic script)
4450
4451 sr-Latn (Serbian written using the Latin script)
4452
4453 Extended language subtags and their primary language subtag
4454 counterparts:
4455
4456 zh-cmn-Hans-CN (Chinese, Mandarin, Simplified script, as used in
4457 China)
4458
4459 cmn-Hans-CN (Mandarin Chinese, Simplified script, as used in
4460 China)
4461
4462 zh-yue-HK (Chinese, Cantonese, as used in Hong Kong SAR)
4463
4464 yue-HK (Cantonese Chinese, as used in Hong Kong SAR)
4465
4466 Language-Script-Region:
4467
4468 zh-Hans-CN (Chinese written using the Simplified script as used in
4469 mainland China)
4470
4471 sr-Latn-RS (Serbian written using the Latin script as used in
4472 Serbia)
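
   The tags in this appendix follow the case conventions described in
   Section 2.1.1: subtags are lowercase, except that two-letter subtags
   are uppercase and four-letter subtags are titlecase when they are
   neither first in the tag nor after a singleton.  Case carries no
   meaning, so this is presentation only.  The sketch below shows one
   way to apply those conventions; the function name format_tag is
   illustrative and the code does not check that the tag is well-formed.

```python
def format_tag(tag):
    """Apply the conventional case formatting to a language tag.

    Sketch only: assumes the input is already a well-formed tag.
    """
    formatted = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.lower()
        if index == 0 or after_singleton:
            # First subtag, and everything after a singleton, stays lowercase.
            formatted.append(lowered)
        elif len(lowered) == 2:
            formatted.append(lowered.upper())       # e.g. region "RS"
        elif len(lowered) == 4:
            formatted.append(lowered.capitalize())  # e.g. script "Latn"
        else:
            formatted.append(lowered)
        if len(lowered) == 1:
            after_singleton = True
    return "-".join(formatted)


if __name__ == "__main__":
    for example in ("SR-LATN-rs", "ZH-hans-cn", "en-ca-x-ca"):
        print(example, "->", format_tag(example))
```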
4473
4487 Language-Variant:
4488
4489 sl-rozaj (Resian dialect of Slovenian)
4490
4491 sl-rozaj-biske (San Giorgio dialect of Resian dialect of
4492 Slovenian)
4493
4494 sl-nedis (Nadiza dialect of Slovenian)
4495
4496 Language-Region-Variant:
4497
4498 de-CH-1901 (German as used in Switzerland using the 1901 variant
4499 [orthography])
4500
4501 sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)
4502
4503 Language-Script-Region-Variant:
4504
4505 hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
4506 used in Italy)
4507
4508 Language-Region:
4509
4510 de-DE (German for Germany)
4511
4512 en-US (English as used in the United States)
4513
4514 es-419 (Spanish appropriate for the Latin America and Caribbean
4515 region using the UN region code)
4516
4517 Private use subtags:
4518
4519 de-CH-x-phonebk
4520
4521 az-Arab-x-AZE-derbend
4522
4523 Private use registry values:
4524
4525 x-whatever (private use using the singleton 'x')
4526
4527 qaa-Qaaa-QM-x-southern (all private tags)
4528
4529 de-Qaaa (German, with a private script)
4530
4531 sr-Latn-QM (Serbian, Latin script, private region)
4532
4533 sr-Qaaa-RS (Serbian, private script, for Serbia)
4534
4543 Tags that use extensions (examples ONLY -- extensions MUST be defined
4544 by revision or update to this document, or by RFC):
4545
4546 en-US-u-islamcal
4547
4548 zh-CN-a-myext-x-private
4549
4550 en-a-myext-b-another
4551
4552 Some Invalid Tags:
4553
      de-419-DE (two region subtags)
4555
4556 a-DE (use of a single-character subtag in primary position; note
4557 that there are a few grandfathered tags that start with "i-" that
4558 are valid)
4559
4560 ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
4561 prefix)
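
   The examples in this appendix can be checked mechanically.  The
   following is a rough sketch, assuming a regular expression modelled
   on the ABNF in Section 2.1; it is not a substitute for that ABNF or
   for checking subtags against the registry.  The names LANGTAG,
   PRIVATEUSE, GRANDFATHERED_SAMPLE, and find_problem are illustrative,
   and the grandfathered set holds only the one tag used above rather
   than the full list.

```python
import re

# Simplified model of the "langtag" production; IGNORECASE because case
# is not significant in language tags.
LANGTAG = re.compile(r"""
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
    (?:-x(?:-[a-z0-9]{1,8})+)?
""", re.VERBOSE | re.IGNORECASE)

# A tag consisting only of private use subtags.
PRIVATEUSE = re.compile(r"x(?:-[a-z0-9]{1,8})+", re.IGNORECASE)

# Illustrative subset only; the complete list is in the ABNF.
GRANDFATHERED_SAMPLE = {"i-enochian"}


def find_problem(tag):
    """Return None if the tag passes these checks, else a short reason."""
    if tag.lower() in GRANDFATHERED_SAMPLE or PRIVATEUSE.fullmatch(tag):
        return None
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return "not well-formed"
    # Singletons are the one-character subtags inside the extension part.
    extension_subtags = match.group("extensions").lower().split("-")
    singletons = [s for s in extension_subtags if len(s) == 1]
    if len(singletons) != len(set(singletons)):
        return "duplicate extension singleton"
    variants = [v for v in match.group("variants").lower().split("-") if v]
    if len(variants) != len(set(variants)):
        return "duplicate variant"
    return None


if __name__ == "__main__":
    for example in ("sr-Latn-RS", "de-419-DE", "a-DE", "ar-a-aaa-b-bbb-a-ccc"):
        print(example, "->", find_problem(example) or "ok")
```

   Note that "de-419-DE" and "a-DE" fail the syntax itself, while
   "ar-a-aaa-b-bbb-a-ccc" matches the syntax and is rejected only by
   the separate duplicate-singleton check.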
4562
Appendix B.  Examples of Registration Forms
4564
4565 LANGUAGE SUBTAG REGISTRATION FORM
4566
4567 1. Name of requester: Han Steenwijk
4568 2. E-mail address of requester: han.steenwijk @ unipd.it
4569 3. Record Requested:
4570
4571 Type: variant
4572 Subtag: biske
4573 Description: The San Giorgio dialect of Resian
4574 Description: The Bila dialect of Resian
4575 Prefix: sl-rozaj
4576 Comments: The dialect of San Giorgio/Bila is one of the
4577 four major local dialects of Resian
4578
4579 4. Intended meaning of the subtag:
4580
4581 The local variety of Resian as spoken in San Giorgio/Bila
4582
4583 5. Reference to published description of the language (book or
4584 article):
4585
4586 -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich
4587 govorov, Varsava - Peterburg: Vende - Kozancikov, 1875.
4588
4599 LANGUAGE SUBTAG REGISTRATION FORM
4600
4601 1. Name of requester: Jaska Zedlik
4602 2. E-mail address of requester: jz53 @ zedlik.com
4603 3. Record Requested:
4604
4605 Type: variant
4606 Subtag: tarask
4607 Description: Belarusian in Taraskievica orthography
4608 Prefix: be
4609 Comments: The subtag represents Branislau Taraskievic's Belarusian
4610 orthography as published in "Bielaruski klasycny pravapis" by
4611 Juras Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
4612 (Vilnia-Miensk 2005).
4613
4614 4. Intended meaning of the subtag:
4615
4616 The subtag is intended to represent the Belarusian orthography as
4617 published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk
4618 Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005).
4619
4620 5. Reference to published description of the language (book or
4621 article):
4622
4623 Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd.
4624 "Bielaruskaha kamitetu", 1929, 5th edition.
4625
4626 Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier.
4627 Bielaruski klasycny pravapis. Vilnia-Miensk, 2005.
4628
4629 6. Any other relevant information:
4630
4631 Belarusian in Taraskievica orthography became widely used, especially
4632 in Belarusian-speaking Internet segment, but besides this some books
4633 and newspapers are also printed using this orthography of Belarusian.
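
   Both forms above request variant records with a 'Prefix' field
   ("sl-rozaj" for 'biske', "be" for 'tarask').  As a rough
   illustration of how an application might use that field, the sketch
   below tests whether the subtags preceding a variant begin with one
   of its registered prefixes.  This is a plain subtag-by-subtag
   comparison chosen for brevity; implementations may need a more
   permissive matching scheme, such as those described in [RFC4647].
   The function name variant_fits_prefix is illustrative only.

```python
def variant_fits_prefix(tag, variant, prefixes):
    """Return True if the subtags before `variant` start with some prefix.

    Sketch only: case-insensitive, subtag-wise comparison.
    """
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    # Only the part of the tag that precedes the variant is compared.
    before = subtags[:subtags.index(variant)]
    for prefix in prefixes:
        wanted = prefix.lower().split("-")
        if before[:len(wanted)] == wanted:
            return True
    return False


if __name__ == "__main__":
    print(variant_fits_prefix("sl-rozaj-biske", "biske", ["sl-rozaj"]))
    print(variant_fits_prefix("be-tarask", "tarask", ["be"]))
    print(variant_fits_prefix("sl-biske", "biske", ["sl-rozaj"]))
```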
4634
Appendix C.  Acknowledgements
4636
4637 Any list of contributors is bound to be incomplete; please regard the
4638 following as only a selection from the group of people who have
4639 contributed to make this document what it is today.
4640
4641 The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
4642 precursors of this document, made enormous contributions directly or
4643 indirectly to this document and are generally responsible for the
4644 success of language tags.
4645
4655 The following people contributed to this document:
4656
4657 Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
4658 Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion
4659 Gunn, Alfred Hoenes, Kent Karlsson, Chris Newman, Randy Presuhn,
4660 Stephen Silver, Shawn Steele, and many, many others.
4661
4662 Very special thanks must go to Harald Tveit Alvestrand, who
4663 originated RFCs 1766 and 3066, and without whom this document would
4664 not have been possible.
4665
4666 Special thanks go to Michael Everson, who served as the Language Tag
4667 Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
4668 the Language Subtag Reviewer since the adoption of RFC 4646.
4669
4670 Special thanks also go to Doug Ewell, for his production of the first
4671 complete subtag registry, his work to support and maintain new
4672 registrations, and his careful editorship of both RFC 4645 and
4673 [RFC5645].
4674
Authors' Addresses
4676
4677 Addison Phillips (editor)
4678 Lab126
4679
4680 EMail: addison@inter-locale.com
4681 URI: http://www.inter-locale.com
4682
4683
4684 Mark Davis (editor)
4685 Google
4686
4687 EMail: markdavis@google.com
4688