Network Working Group                                   A. Phillips, Ed.
Request for Comments: 5646                                        Lab126
BCP: 47                                                    M. Davis, Ed.
Obsoletes: 4646                                                   Google
Category: Best Current Practice                           September 2009


                     Tags for Identifying Languages

Abstract

18 This document describes the structure, content, construction, and

19 semantics of language tags for use in cases where it is desirable to

20 indicate the language used in an information object. It also

21 describes how to register values for use in language tags and the

22 creation of user-defined extensions for private interchange.

Status of This Memo

26 This document specifies an Internet Best Current Practices for the

27 Internet Community, and requests discussion and suggestions for

28 improvements. Distribution of this memo is unlimited.

Copyright Notice

35 This document is subject to BCP 78 and the IETF Trust's Legal

36 Provisions Relating to IETF Documents in effect on the date of

37 publication of this document (http://trustee.ietf.org/license-info).

38 Please review these documents carefully, as they describe your rights

39 and restrictions with respect to this document.

41 This document may contain material from IETF Documents or IETF

42 Contributions published or made publicly available before November

43 10, 2008. The person(s) controlling the copyright in some of this

44 material may not have granted the IETF Trust the right to allow

45 modifications of such material outside the IETF Standards Process.

46 Without obtaining an adequate license from the person(s) controlling

47 the copyright in such materials, this document may not be modified

48 outside the IETF Standards Process, and derivative works of it may

49 not be created outside the IETF Standards Process, except to format

50 it for publication as an RFC or to translate it into languages other

51 than English.

Phillips & Davis          Best Current Practice                 [Page 1]

RFC 5646                     Language Tags                September 2009

Table of Contents

   1. Introduction
   2. The Language Tag
      2.1. Syntax
         2.1.1. Formatting of Language Tags
      2.2. Language Subtag Sources and Interpretation
         2.2.1. Primary Language Subtag
         2.2.2. Extended Language Subtags
         2.2.3. Script Subtag
         2.2.4. Region Subtag
         2.2.5. Variant Subtags
         2.2.6. Extension Subtags
         2.2.7. Private Use Subtags
         2.2.8. Grandfathered and Redundant Registrations
         2.2.9. Classes of Conformance
   3. Registry Format and Maintenance
      3.1. Format of the IANA Language Subtag Registry
         3.1.1. File Format
         3.1.2. Record and Field Definitions
         3.1.3. Type Field
         3.1.4. Subtag and Tag Fields
         3.1.5. Description Field
         3.1.6. Deprecated Field
         3.1.7. Preferred-Value Field
         3.1.8. Prefix Field
         3.1.9. Suppress-Script Field
         3.1.10. Macrolanguage Field
         3.1.11. Scope Field
         3.1.12. Comments Field
      3.2. Language Subtag Reviewer
      3.3. Maintenance of the Registry
      3.4. Stability of IANA Registry Entries
      3.5. Registration Procedure for Subtags
      3.6. Possibilities for Registration
      3.7. Extensions and the Extensions Registry
      3.8. Update of the Language Subtag Registry
      3.9. Applicability of the Subtag Registry
   4. Formation and Processing of Language Tags
      4.1. Choice of Language Tag
         4.1.1. Tagging Encompassed Languages
         4.1.2. Using Extended Language Subtags
      4.2. Meaning of the Language Tag
      4.3. Lists of Languages
      4.4. Length Considerations
         4.4.1. Working with Limited Buffer Sizes
         4.4.2. Truncation of Language Tags
      4.5. Canonicalization of Language Tags
      4.6. Considerations for Private Use Subtags
   5. IANA Considerations
      5.1. Language Subtag Registry
      5.2. Extensions Registry
   6. Security Considerations
   7. Character Set Considerations
   8. Changes from RFC 4646
   9. References
      9.1. Normative References
      9.2. Informative References
   Appendix A. Examples of Language Tags (Informative)
   Appendix B. Examples of Registration Forms
   Appendix C. Acknowledgements

132

1. Introduction

134

135 Human beings on our planet have, past and present, used a number of

136 languages. There are many reasons why one would want to identify the

137 language used when presenting or requesting information.

138

139 The language of an information item or a user's language preferences

140 often need to be identified so that appropriate processing can be

141 applied. For example, the user's language preferences in a Web

142 browser can be used to select Web pages appropriately. Language

143 information can also be used to select among tools (such as

144 dictionaries) to assist in the processing or understanding of content

145 in different languages. Knowledge about the particular language used

146 by some piece of information content might be useful or even required

147 by some types of processing, for example, spell-checking, computer-

148 synthesized speech, Braille transcription, or high-quality print

149 renderings.

150

151 One means of indicating the language used is by labeling the

152 information content with an identifier or "tag". These tags can also

153 be used to specify the user's preferences when selecting information

154 content or to label additional attributes of content and associated

155 resources.

156

157 Sometimes language tags are used to indicate additional language

158 attributes of content. For example, indicating specific information

159 about the dialect, writing system, or orthography used in a document

160 or resource may enable the user to obtain information in a form that

161 they can understand, or it can be important in processing or

162 rendering the given content into an appropriate form or style.

163

164 This document specifies a particular identifier mechanism (the

165 language tag) and a registration function for values to be used to

175 form tags. It also defines a mechanism for private use values and

176 future extensions.

177

178 This document replaces [RFC4646] (which obsoleted [RFC3066] which, in

179 turn, replaced [RFC1766]). This document, in combination with

180 [RFC4647], comprises BCP 47. For a list of changes in this document,

181 see Section 8.

182

183 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",

184 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this

185 document are to be interpreted as described in [RFC2119].

186

2. The Language Tag

188

189 Language tags are used to help identify languages, whether spoken,

190 written, signed, or otherwise signaled, for the purpose of

191 communication. This includes constructed and artificial languages

192 but excludes languages not intended primarily for human

193 communication, such as programming languages.

194

2.1. Syntax

196

197 A language tag is composed from a sequence of one or more "subtags",

198 each of which refines or narrows the range of language identified by

199 the overall tag. Subtags, in turn, are a sequence of alphanumeric

200 characters (letters and digits), distinguished and separated from

201 other subtags in a tag by a hyphen ("-", [Unicode] U+002D).

202

203 There are different types of subtag, each of which is distinguished

204 by length, position in the tag, and content: each subtag's type can

205 be recognized solely by these features. This makes it possible to

206 extract and assign some semantic information to the subtags, even if

207 the specific subtag values are not recognized. Thus, a language tag

208 processor need not have a list of valid tags or subtags (that is, a

209 copy of some version of the IANA Language Subtag Registry) in order

210 to perform common searching and matching operations. The only

211 exceptions to this ability to infer meaning from subtag structure are

212 the grandfathered tags listed in the productions 'regular' and

213 'irregular' below. These tags were registered under [RFC3066] and

214 are a fixed list that can never change.

215

216 The syntax of the language tag in ABNF [RFC5234] is:

217

   Language-Tag  = langtag             ; normal language tags
                 / privateuse          ; private use tag
                 / grandfathered       ; grandfathered tags

   langtag       = language
                   ["-" script]
                   ["-" region]
                   *("-" variant)
                   *("-" extension)
                   ["-" privateuse]

   language      = 2*3ALPHA            ; shortest ISO 639 code
                   ["-" extlang]       ; sometimes followed by
                                       ; extended language subtags
                 / 4ALPHA              ; or reserved for future use
                 / 5*8ALPHA            ; or registered language subtag

   extlang       = 3ALPHA              ; selected ISO 639 codes
                   *2("-" 3ALPHA)      ; permanently reserved

   script        = 4ALPHA              ; ISO 15924 code

   region        = 2ALPHA              ; ISO 3166-1 code
                 / 3DIGIT              ; UN M.49 code

   variant       = 5*8alphanum         ; registered variants
                 / (DIGIT 3alphanum)

   extension     = singleton 1*("-" (2*8alphanum))

                                       ; Single alphanumerics
                                       ; "x" reserved for private use
   singleton     = DIGIT               ; 0 - 9
                 / %x41-57             ; A - W
                 / %x59-5A             ; Y - Z
                 / %x61-77             ; a - w
                 / %x79-7A             ; y - z

   privateuse    = "x" 1*("-" (1*8alphanum))

   grandfathered = irregular           ; non-redundant tags registered
                 / regular             ; during the RFC 3066 era

   irregular     = "en-GB-oed"         ; irregular tags do not match
                 / "i-ami"             ; the 'langtag' production and
                 / "i-bnn"             ; would not otherwise be
                 / "i-default"         ; considered 'well-formed'
                 / "i-enochian"        ; These tags are all valid,
                 / "i-hak"             ; but most are deprecated
                 / "i-klingon"         ; in favor of more modern
                 / "i-lux"             ; subtags or subtag
                 / "i-mingo"           ; combination
                 / "i-navajo"
                 / "i-pwn"
                 / "i-tao"
                 / "i-tay"
                 / "i-tsu"
                 / "sgn-BE-FR"
                 / "sgn-BE-NL"
                 / "sgn-CH-DE"

   regular       = "art-lojban"        ; these tags match the 'langtag'
                 / "cel-gaulish"       ; production, but their subtags
                 / "no-bok"            ; are not extended language
                 / "no-nyn"            ; or variant subtags: their meaning
                 / "zh-guoyu"          ; is defined by their registration
                 / "zh-hakka"          ; and all of these are deprecated
                 / "zh-min"            ; in favor of a more modern
                 / "zh-min-nan"        ; subtag or sequence of subtags
                 / "zh-xiang"

   alphanum      = (ALPHA / DIGIT)     ; letters and numbers

307

308 Figure 1: Language Tag ABNF

309

310 For examples of language tags, see Appendix A.

311

312 All subtags have a maximum length of eight characters. Whitespace is

313 not permitted in a language tag. There is a subtlety in the ABNF

314 production 'variant': a variant starting with a digit has a minimum

315 length of four characters, while those starting with a letter have a

316 minimum length of five characters.
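   As a non-normative illustration, the productions in Figure 1 can be
   transcribed into a regular expression. The Python sketch below is
   one possible transcription; the names is_well_formed, LANGTAG, and
   GRANDFATHERED are choices made for this example and are not defined
   by this document. It checks only the grammar: it does not consult
   the IANA registry or apply any rule beyond the ABNF.

```python
import re

# Regex fragments transcribed from the 'langtag' production in Figure 1.
LANGUAGE = r"(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4}|[A-Za-z]{5,8})"
SCRIPT = r"[A-Za-z]{4}"
REGION = r"(?:[A-Za-z]{2}|[0-9]{3})"
VARIANT = r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})"
EXTENSION = r"[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+"   # singleton, not 'x'
PRIVATEUSE = r"[xX](?:-[A-Za-z0-9]{1,8})+"

LANGTAG = re.compile(
    LANGUAGE
    + rf"(?:-{SCRIPT})?(?:-{REGION})?(?:-{VARIANT})*"
    + rf"(?:-{EXTENSION})*(?:-{PRIVATEUSE})?"
)
PRIVATE_TAG = re.compile(PRIVATEUSE)

# The fixed 'irregular' and 'regular' lists, stored in lowercase.
GRANDFATHERED = frozenset({
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
})


def is_well_formed(tag: str) -> bool:
    """Return True if 'tag' matches the Language-Tag production."""
    if tag.lower() in GRANDFATHERED:
        return True
    return bool(LANGTAG.fullmatch(tag) or PRIVATE_TAG.fullmatch(tag))
```

   For instance, is_well_formed("de-1996") accepts the four-character
   variant because it starts with a digit, while a subtag of nine
   letters is rejected by the eight-character limit.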

317

318 Although [RFC5234] refers to octets, the language tags described in

319 this document are sequences of characters from the US-ASCII [ISO646]

320 repertoire. Language tags MAY be used in documents and applications

321 that use other encodings, so long as these encompass the relevant

322 part of the US-ASCII repertoire. An example of this would be an XML

323 document that uses the UTF-16LE [RFC2781] encoding of [Unicode].

324

2.1.1. Formatting of Language Tags

326

327 At all times, language tags and their subtags, including private use

328 and extensions, are to be treated as case insensitive: there exist

329 conventions for the capitalization of some of the subtags, but these

330 MUST NOT be taken to carry meaning.

331

   Thus, the tag "mn-Cyrl-MN" is not distinct from "MN-cYRL-mn" or
   "mN-cYrL-Mn" (or any other combination), and each of these variations
   conveys the same meaning: Mongolian written in the Cyrillic script as
   used in Mongolia.

345

   The ABNF syntax also does not distinguish between upper- and
   lowercase: the uppercase US-ASCII letters in the range 'A' through
   'Z' are always considered equivalent and mapped directly to their
   US-ASCII lowercase equivalents in the range 'a' through 'z'. So the
   tag "I-AMI" is considered equivalent to the value "i-ami" in the
   'irregular' production.

352

353 Although case distinctions do not carry meaning in language tags,

354 consistent formatting and presentation of language tags will aid

355 users. The format of subtags in the registry is RECOMMENDED as the

356 form to use in language tags. This format generally corresponds to

357 the common conventions for the various ISO standards from which the

358 subtags are derived.

359

360 These conventions include:

361

362 o [ISO639-1] recommends that language codes be written in lowercase

363 ('mn' Mongolian).

364

365 o [ISO15924] recommends that script codes use lowercase with the

366 initial letter capitalized ('Cyrl' Cyrillic).

367

368 o [ISO3166-1] recommends that country codes be capitalized ('MN'

369 Mongolia).

370

371 An implementation can reproduce this format without accessing the

372 registry as follows. All subtags, including extension and private

373 use subtags, use lowercase letters with two exceptions: two-letter

374 and four-letter subtags that neither appear at the start of the tag

375 nor occur after singletons. Such two-letter subtags are all

376 uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four-

377 letter subtags are titlecase (as in the tag "az-Latn-x-latn").
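   The rule above can be sketched in a few lines. The following Python
   function is an illustration only; the name format_tag is
   hypothetical, and the input is assumed to be a well-formed tag
   consisting of ASCII characters.

```python
def format_tag(tag: str) -> str:
    """Apply the recommended capitalization without using the registry."""
    formatted = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        # Only subtags that are neither first nor after a singleton
        # receive special casing.
        special = index > 0 and not after_singleton
        if special and len(subtag) == 2:
            formatted.append(subtag.upper())              # e.g. 'CA'
        elif special and len(subtag) == 4:
            formatted.append(subtag[:1].upper() + subtag[1:].lower())  # 'Latn'
        else:
            formatted.append(subtag.lower())
        if len(subtag) == 1:
            # Everything after a singleton stays lowercase.
            after_singleton = True
    return "-".join(formatted)
```

   With this sketch, format_tag("EN-ca-X-CA") yields "en-CA-x-ca" and
   format_tag("AZ-LATN-x-LATN") yields "az-Latn-x-latn", matching the
   examples in the text.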

378

379 Note: Case folding of ASCII letters in certain locales, unless

380 carefully handled, sometimes produces non-ASCII character values.

381 The Unicode Character Database file "SpecialCasing.txt"

382 [SpecialCasing] defines the specific cases that are known to cause

383 problems with this. In particular, the letter 'i' (U+0069) in

384 Turkish and Azerbaijani is uppercased to U+0130 (LATIN CAPITAL LETTER

385 I WITH DOT ABOVE). Implementers SHOULD specify a locale-neutral

386 casing operation to ensure that case folding of subtags does not

387 produce this value, which is illegal in language tags. For example,

388 if one were to uppercase the region subtag 'in' using Turkish locale

389 rules, the sequence U+0130 U+004E would result, instead of the

390 expected 'IN'.
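   One way to obtain a locale-neutral casing operation is to map only
   the 26 ASCII letters and leave every other character untouched. The
   Python sketch below does this with explicit translation tables; the
   function names are hypothetical. Python's built-in str.lower() and
   str.upper() do not depend on the process locale, but they also map
   non-ASCII letters, so the explicit tables keep the operation within
   the range described above.

```python
import string

# Translation tables covering exactly 'A'-'Z' and 'a'-'z'.
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_lower(text: str) -> str:
    """Lowercase ASCII letters only; other characters are unchanged."""
    return text.translate(_TO_LOWER)


def ascii_upper(text: str) -> str:
    """Uppercase ASCII letters only; other characters are unchanged."""
    return text.translate(_TO_UPPER)


def same_tag(first: str, second: str) -> bool:
    """Compare two language tags without regard to ASCII case."""
    return ascii_lower(first) == ascii_lower(second)
```

   Here ascii_upper("in") always produces "IN", and
   same_tag("mn-Cyrl-MN", "MN-cYRL-mn") is true.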

391

2.2. Language Subtag Sources and Interpretation

400

401 The namespace of language tags and their subtags is administered by

402 the Internet Assigned Numbers Authority (IANA) according to the rules

403 in Section 5 of this document. The Language Subtag Registry

404 maintained by IANA is the source for valid subtags: other standards

405 referenced in this section provide the source material for that

406 registry.

407

408 Terminology used in this document:

409

410 o "Tag" refers to a complete language tag, such as "sr-Latn-RS" or

411 "az-Arab-IR". Examples of tags in this document are enclosed in

412 double-quotes ("en-US").

413

   o "Subtag" refers to a specific section of a tag, delimited by a
     hyphen, such as the subtags 'zh', 'Hant', and 'CN' in the tag
     "zh-Hant-CN". Examples of subtags in this document are enclosed
     in single quotes ('Hant').

418

419 o "Code" refers to values defined in external standards (and that

420 are used as subtags in this document). For example, 'Hant' is an

421 [ISO15924] script code that was used to define the 'Hant' script

422 subtag for use in a language tag. Examples of codes in this

423 document are enclosed in single quotes ('en', 'Hant').

424

425 Language tags are designed so that each subtag type has unique length

426 and content restrictions. These make identification of the subtag's

427 type possible, even if the content of the subtag itself is

428 unrecognized. This allows tags to be parsed and processed without

429 reference to the latest version of the underlying standards or the

430 IANA registry and makes the associated exception handling when

431 parsing tags simpler.
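   To illustrate how length, content, and position identify a subtag's
   type without any registry lookup, the following Python sketch labels
   each subtag of a tag. It is a simplified illustration: the function
   name and the label strings are hypothetical, the input is assumed to
   be a well-formed ASCII tag, and grandfathered tags are not handled.

```python
def label_subtags(tag: str) -> list:
    """Label each subtag of a well-formed, non-grandfathered tag by type."""
    labels = []
    state = "start"
    for subtag in tag.split("-"):
        size = len(subtag)
        if state == "privateuse":
            kind = "privateuse"            # everything after 'x'
        elif size == 1:
            kind = "singleton"
            state = "privateuse" if subtag in ("x", "X") else "extension"
        elif state == "extension":
            kind = "extension"
        elif state == "start":
            kind = state = "language"
        elif state == "language" and size == 3 and subtag.isalpha():
            kind = "extlang"               # state stays 'language'
        elif state == "language" and size == 4 and subtag.isalpha():
            kind = state = "script"
        elif state in ("language", "script") and (
            (size == 2 and subtag.isalpha()) or (size == 3 and subtag.isdigit())
        ):
            kind = state = "region"
        else:
            kind = state = "variant"
        labels.append((subtag, kind))
    return labels
```

   For example, label_subtags("zh-Hant-CN") labels 'zh' as a language,
   'Hant' as a script, and 'CN' as a region, and it would do the same
   for subtags it has never seen before.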

432

433 Some of the subtags in the IANA registry do not come from an

434 underlying standard. These can only appear in specific positions in

435 a tag: they can only occur as primary language subtags or as variant

436 subtags.

437

438 Sequences of private use and extension subtags MUST occur at the end

439 of the sequence of subtags and MUST NOT be interspersed with subtags

440 defined elsewhere in this document. These sequences are introduced

441 by single-character subtags, which are reserved as follows:

442

443 o The single-letter subtag 'x' introduces a sequence of private use

444 subtags. The interpretation of any private use subtag is defined

455 solely by private agreement and is not defined by the rules in

456 this section or in any standard or registry defined in this

457 document.

458

459 o The single-letter subtag 'i' is used by some grandfathered tags,

460 such as "i-default", where it always appears in the first position

461 and cannot be confused with an extension.

462

463 o All other single-letter and single-digit subtags are reserved to

464 introduce standardized extension subtag sequences as described in

465 Section 3.7.
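   The reservations above suggest a simple way to separate the trailing
   extension and private use sequences from the rest of a tag. The
   Python sketch below is illustrative only: split_sequences is a
   hypothetical name, the input is assumed to be well-formed, and
   repeated singletons are merged rather than rejected.

```python
def split_sequences(tag: str):
    """Return (main subtags, extensions by singleton, private use subtags)."""
    main, extensions, private_use = [], {}, []
    target = main
    for position, subtag in enumerate(tag.lower().split("-")):
        if target is private_use:
            # Once 'x' has been seen, every later subtag is private use.
            target.append(subtag)
        elif subtag == "x":
            target = private_use
        elif len(subtag) == 1 and not (position == 0 and subtag == "i"):
            # Any other singleton introduces an extension sequence.
            target = extensions.setdefault(subtag, [])
        else:
            # Includes a leading 'i', which belongs to a grandfathered tag.
            target.append(subtag)
    return main, extensions, private_use
```

   For example, split_sequences("en-a-bbb-x-a-ccc") returns the main
   subtags ['en'], the extension {'a': ['bbb']}, and the private use
   subtags ['a', 'ccc'].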

466

2.2.1. Primary Language Subtag

468

469 The primary language subtag is the first subtag in a language tag and

470 cannot be omitted, with two exceptions:

471

472 o The single-character subtag 'x' as the primary subtag indicates

473 that the language tag consists solely of subtags whose meaning is

474 defined by private agreement. For example, in the tag "x-fr-CH",

475 the subtags 'fr' and 'CH' do not represent the French language or

476 the country of Switzerland (or any other value in the IANA

477 registry) unless there is a private agreement in place to do so.

478 See Section 4.6.

479

480 o The single-character subtag 'i' is used by some grandfathered tags

481 (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other

482 grandfathered tags have a primary language subtag in their first

483 position.)

484

485 The following rules apply to the primary language subtag:

486

487 1. Two-character primary language subtags were defined in the IANA

488 registry according to the assignments found in the standard "ISO

489 639-1:2002, Codes for the representation of names of languages --

490 Part 1: Alpha-2 code" [ISO639-1], or using assignments

491 subsequently made by the ISO 639-1 registration authority (RA) or

492 governing standardization bodies.

493

494 2. Three-character primary language subtags in the IANA registry

495 were defined according to the assignments found in one of these

496 additional ISO 639 parts or assignments subsequently made by the

497 relevant ISO 639 registration authorities or governing

498 standardization bodies:

499

500 A. "ISO 639-2:1998 - Codes for the representation of names of

501 languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2]

502

511 B. "ISO 639-3:2007 - Codes for the representation of names of

512 languages -- Part 3: Alpha-3 code for comprehensive coverage

513 of languages" [ISO639-3]

514

515 C. "ISO 639-5:2008 - Codes for the representation of names of

516 languages -- Part 5: Alpha-3 code for language families and

517 groups" [ISO639-5]

518

519 3. The subtags in the range 'qaa' through 'qtz' are reserved for

520 private use in language tags. These subtags correspond to codes

521 reserved by ISO 639-2 for private use. These codes MAY be used

522 for non-registered primary language subtags (instead of using

523 private use subtags following 'x-'). Please refer to Section 4.6

524 for more information on private use subtags.

525

526 4. Four-character language subtags are reserved for possible future

527 standardization.

528

   5. Any language subtags of five to eight characters in length in the
      IANA registry were defined via the registration process in
      Section 3.5 and MAY be used to form the primary language subtag.
      An example of what such a registration might include is the
      grandfathered IANA registration "i-enochian". The subtag
      'enochian' could be registered in the IANA registry as a primary
      language subtag (assuming that ISO 639 does not register this
      language first), making tags such as "enochian-AQ" and
      "enochian-Latn" valid.

538

539 At the time this document was created, there were no examples of

540 this kind of subtag. Future registrations of this type are

541 discouraged: an attempt to register any new proposed primary

542 language MUST be made to the ISO 639 registration authority.

543 Proposals rejected by the ISO 639 registration authority are

544 unlikely to meet the criteria for primary language subtags and

545 are thus unlikely to be registered.

546

547 6. Other values MUST NOT be assigned to the primary subtag except by

548 revision or update of this document.
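   The length-based rules above can be summarized in code. The Python
   sketch below is an informal illustration; the function name and the
   returned label strings are hypothetical, and the function does not
   check whether a subtag is actually present in the registry.

```python
def classify_primary(subtag: str) -> str:
    """Classify a first subtag by the rules of Section 2.2.1."""
    s = subtag.lower()
    if not (s.isascii() and s.isalpha()):
        return "invalid"
    if s == "x":
        return "private use tag"
    if s == "i":
        return "grandfathered prefix"
    if len(s) == 2:
        return "ISO 639-1"
    if len(s) == 3:
        # 'qaa' through 'qtz' are reserved for private use.
        if "qaa" <= s <= "qtz":
            return "private use"
        return "ISO 639-2, 639-3, or 639-5"
    if len(s) == 4:
        return "reserved"
    if 5 <= len(s) <= 8:
        return "registered"
    return "invalid"
```

   For example, classify_primary("haw") reports a three-letter ISO 639
   code, while classify_primary("qaa") reports private use.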

549

550 When languages have both an ISO 639-1 two-character code and a three-

551 character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only

552 the ISO 639-1 two-character code is defined in the IANA registry.

553

554 When a language has no ISO 639-1 two-character code and the ISO

555 639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic) code

556 for that language differ, only the Terminology code is defined in the

557 IANA registry. At the time this document was created, all languages

558 that had both kinds of three-character codes were also assigned a

567 two-character code; it is expected that future assignments of this

568 nature will not occur.

569

570 In order to avoid instability in the canonical form of tags, if a

571 two-character code is added to ISO 639-1 for a language for which a

572 three-character code was already included in either ISO 639-2 or ISO

573 639-3, the two-character code MUST NOT be registered. See

574 Section 3.4.

575

576 For example, if some content were tagged with 'haw' (Hawaiian), which

577 currently has no two-character code, the tag would not need to be

578 changed if ISO 639-1 were to assign a two-character code to the

579 Hawaiian language at a later date.

580

581 To avoid these problems with versioning and subtag choice (as

582 experienced during the transition between RFC 1766 and RFC 3066), as

583 well as to ensure the canonical nature of subtags defined by this

584 document, the ISO 639 Registration Authority Joint Advisory Committee

585 (ISO 639/RA-JAC) has included the following statement in

586 [iso639.prin]:

587

588 "A language code already in ISO 639-2 at the point of freezing ISO

589 639-1 shall not later be added to ISO 639-1. This is to ensure

590 consistency in usage over time, since users are directed in

591 Internet applications to employ the alpha-3 code when an alpha-2

592 code for that language is not available."

593

2.2.2. Extended Language Subtags

595

596 Extended language subtags are used to identify certain specially

597 selected languages that, for various historical and compatibility

598 reasons, are closely identified with or tagged using an existing

599 primary language subtag. Extended language subtags are always used

600 with their enclosing primary language subtag (indicated with a

601 'Prefix' field in the registry) when used to form the language tag.

602 All languages that have an extended language subtag in the registry

603 also have an identical primary language subtag record in the

604 registry. This primary language subtag is RECOMMENDED for forming

605 the language tag. The following rules apply to the extended language

606 subtags:

607

608 1. Extended language subtags consist solely of three-letter subtags.

609 All extended language subtag records defined in the registry were

610 defined according to the assignments found in [ISO639-3].

611 Language collections and groupings, such as defined in

612 [ISO639-5], are specifically excluded from being extended

613 language subtags.

614

623 2. Extended language subtag records MUST include exactly one

624 'Prefix' field indicating an appropriate subtag or sequence of

625 subtags for that extended language subtag.

626

   3. Extended language subtag records MUST include a
      'Preferred-Value'. The 'Preferred-Value' and 'Subtag' fields MUST
      be identical.

630

   4. Although the ABNF production 'extlang' permits up to three
      extended language subtags in a language tag, extended language
      subtags MUST NOT include another extended language subtag in
      their 'Prefix'. That is, the second and third extended language
      subtag positions in a language tag are permanently reserved, and
      tags that include subtags in those positions are, and will
      always remain, invalid.

638

639 For example, the macrolanguage Chinese ('zh') encompasses a number of

640 languages. For compatibility reasons, each of these languages has

641 both a primary and extended language subtag in the registry. A few

642 selected examples of these include Gan Chinese ('gan'), Cantonese

643 Chinese ('yue'), and Mandarin Chinese ('cmn'). Each is encompassed

644 by the macrolanguage 'zh' (Chinese). Therefore, they each have the

645 prefix "zh" in their registry records. Thus, Gan Chinese is

646 represented with tags beginning "zh-gan" or "gan", Cantonese with

647 tags beginning either "yue" or "zh-yue", and Mandarin Chinese with

648 "zh-cmn" or "cmn". The language subtag 'zh' can still be used

649 without an extended language subtag to label a resource as some

650 unspecified variety of Chinese, while the primary language subtag

651 ('gan', 'yue', 'cmn') is preferred to using the extended language

652 form ("zh-gan", "zh-yue", "zh-cmn").
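   The preference for the primary language subtag can be illustrated
   with a small mapping. The Python sketch below uses only the three
   extended language subtags named in this section; the table
   EXTLANG_PREFIX and the function prefer_primary are hypothetical
   names, and a real implementation would read the 'Prefix' and
   'Preferred-Value' fields from the registry instead.

```python
# Illustrative excerpt: extended language subtag -> its 'Prefix'.
EXTLANG_PREFIX = {"gan": "zh", "yue": "zh", "cmn": "zh"}


def prefer_primary(tag: str) -> str:
    """Rewrite 'zh-yue' style tags to the preferred 'yue' style."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        prefix, extlang = subtags[0].lower(), subtags[1].lower()
        if EXTLANG_PREFIX.get(extlang) == prefix:
            # Drop the prefix; the extlang becomes the primary subtag.
            return "-".join([extlang] + subtags[2:])
    return tag
```

   For example, prefer_primary("zh-yue") returns "yue", while "zh" on
   its own is returned unchanged and continues to denote some
   unspecified variety of Chinese.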

653

2.2.3. Script Subtag

655

656 Script subtags are used to indicate the script or writing system

657 variations that distinguish the written forms of a language or its

658 dialects. The following rules apply to the script subtags:

659

660 1. Script subtags MUST follow any primary and extended language

661 subtags and MUST precede any other type of subtag.

662

663 2. Script subtags consist of four letters and were defined according

664 to the assignments found in [ISO15924] ("Information and

665 documentation -- Codes for the representation of names of

666 scripts"), or subsequently assigned by the ISO 15924 registration

667 authority or governing standardization bodies. Only codes

668 assigned by ISO 15924 will be considered for registration.

669

679 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private

680 use in language tags. These subtags correspond to codes reserved

681 by ISO 15924 for private use. These codes MAY be used for non-

682 registered script values. Please refer to Section 4.6 for more

683 information on private use subtags.

684

685 4. There MUST be at most one script subtag in a language tag, and

686 the script subtag SHOULD be omitted when it adds no

687 distinguishing value to the tag or when the primary or extended

688 language subtag's record in the subtag registry includes a

689 'Suppress-Script' field listing the applicable script subtag.

690

691 For example: "sr-Latn" represents Serbian written using the Latin

692 script.
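   Rules 3 and 4 can be sketched as follows. This Python example is an
   illustration under stated assumptions: both function names are
   hypothetical, the tag is assumed to be well-formed, and the mapping
   passed as suppress_script stands in for 'Suppress-Script' data that
   would be read from the registry. For brevity, only the primary
   language subtag is looked up.

```python
def is_private_use_script(subtag: str) -> bool:
    """True for the private use range 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    return len(s) == 4 and s.isascii() and s.isalpha() and "qaaa" <= s <= "qabx"


def omit_suppressed_script(tag: str, suppress_script: dict) -> str:
    """Drop the script subtag if it is suppressed for the tag's language."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    index = 1
    # Skip extended language subtags (three letters); they precede the script.
    while (index < len(subtags) and len(subtags[index]) == 3
           and subtags[index].isalpha()):
        index += 1
    has_script = (index < len(subtags) and len(subtags[index]) == 4
                  and subtags[index].isalpha())
    suppressed = suppress_script.get(language, "").lower()
    if has_script and subtags[index].lower() == suppressed:
        del subtags[index]
    return "-".join(subtags)
```

   Assuming, for illustration, a table of {"en": "Latn"}, the call
   omit_suppressed_script("en-Latn-US", {"en": "Latn"}) returns
   "en-US", whereas "sr-Latn" is returned unchanged because the script
   subtag carries distinguishing information there.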

693

2.2.4. Region Subtag

695

696 Region subtags are used to indicate linguistic variations associated

697 with or appropriate to a specific country, territory, or region.

698 Typically, a region subtag is used to indicate variations such as

699 regional dialects or usage, or region-specific spelling conventions.

700 It can also be used to indicate that content is expressed in a way

701 that is appropriate for use throughout a region, for instance,

702 Spanish content tailored to be useful throughout Latin America.

703

704 The following rules apply to the region subtags:

705

706 1. Region subtags MUST follow any primary language, extended

707 language, or script subtags and MUST precede any other type of

708 subtag.

709

710 2. Two-letter region subtags were defined according to the

711 assignments found in [ISO3166-1] ("Codes for the representation

712 of names of countries and their subdivisions -- Part 1: Country

713 codes"), using the list of alpha-2 country codes or using

714 assignments subsequently made by the ISO 3166-1 maintenance

715 agency or governing standardization bodies. In addition, the

716 codes that are "exceptionally reserved" (as opposed to

717 "assigned") in ISO 3166-1 were also defined in the registry, with

718 the exception of 'UK', which is an exact synonym for the assigned

719 code 'GB'.

720

721 3. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are

722 reserved for private use in language tags. These subtags

723 correspond to codes reserved by ISO 3166 for private use. These

724 codes MAY be used for private use region subtags (instead of

725 using a private use subtag sequence). Please refer to

726 Section 4.6 for more information on private use subtags.

727

735 4. Three-character region subtags consist solely of digit (number)

736 characters and were defined according to the assignments found in

737 the UN Standard Country or Area Codes for Statistical Use

738 [UN_M.49] or assignments subsequently made by the governing

739 standards body. Not all of the UN M.49 codes are defined in the

740 IANA registry. The following rules define which codes are

741 entered into the registry as valid subtags:

742

743 A. UN numeric codes assigned to 'macro-geographical

744 (continental)' or sub-regions MUST be registered in the

745 registry. These codes are not associated with an assigned

746 ISO 3166-1 alpha-2 code and represent supra-national areas,

747 usually covering more than one nation, state, province, or

748 territory.

749

750 B. UN numeric codes for 'economic groupings' or 'other

751 groupings' MUST NOT be registered in the IANA registry and

752 MUST NOT be used to form language tags.

753

754 C. When ISO 3166-1 reassigns a code formerly used for one

755 country or area to another country or area and that code

756 already is present in the registry, the UN numeric code for

757 that country or area MUST be registered in the registry as

758 described in Section 3.4 and MUST be used to form language

759 tags that represent the country or region for which it is

760 defined (rather than the recycled ISO 3166-1 code).

761

762 D. UN numeric codes for countries or areas for which there is an

763 associated ISO 3166-1 alpha-2 code in the registry MUST NOT

764 be entered into the registry and MUST NOT be used to form

765 language tags. Note that the ISO 3166-based subtag in the

766 registry MUST actually be associated with the UN M.49 code in

767 question.

768

769 E. For historical reasons, the UN numeric code 830 (Channel

770 Islands), which was not registered at the time this document

771 was adopted and had, at that time, no corresponding ISO

772 3166-1 code, MAY be entered into the IANA registry via the

773 process described in Section 3.5, provided no ISO 3166-1 code

774 with that exact meaning has been previously registered.

775

776 F. All other UN numeric codes for countries or areas that do not

777 have an associated ISO 3166-1 alpha-2 code MUST NOT be

778 entered into the registry and MUST NOT be used to form

779 language tags. For more information about these codes, see

780 Section 3.4.

781

791 5. The alphanumeric codes in Appendix X of the UN document MUST NOT

792 be entered into the registry and MUST NOT be used to form

793 language tags. (At the time this document was created, these

794 values matched the ISO 3166-1 alpha-2 codes.)

795

796 6. There MUST be at most one region subtag in a language tag and the

797 region subtag MAY be omitted, as when it adds no distinguishing

798 value to the tag.

799

800 For example:

801

802 "de-AT" represents German ('de') as used in Austria ('AT').

803

804 "sr-Latn-RS" represents Serbian ('sr') written using Latin script

805 ('Latn') as used in Serbia ('RS').

806

807 "es-419" represents Spanish ('es') appropriate to the UN-defined

808 Latin America and Caribbean region ('419').
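
The syntactic part of the rules above can be sketched in Python. This is only an illustration: the names `is_private_use_region` and `classify_region` and the returned labels are hypothetical, and the sketch looks at the shape of a region subtag and the private use ranges from rule 3. It does not consult the IANA registry, so it cannot tell whether a given code is actually registered.

```python
import re


def is_private_use_region(subtag: str) -> bool:
    """True for the private use codes in rule 3: AA, QM-QZ, XA-XZ, ZZ."""
    s = subtag.upper()
    if len(s) != 2 or not s.isascii() or not s.isalpha():
        return False
    return (
        s in ("AA", "ZZ")
        or (s[0] == "Q" and "M" <= s[1] <= "Z")
        or s[0] == "X"
    )


def classify_region(subtag: str) -> str:
    """Return a hypothetical label describing the shape of a region subtag."""
    if re.fullmatch(r"[A-Za-z]{2}", subtag):
        return "private-use" if is_private_use_region(subtag) else "alpha-2"
    if re.fullmatch(r"[0-9]{3}", subtag):
        return "un-m49"
    return "not-a-region"
```

Under these assumptions, `classify_region("AT")` yields "alpha-2", `classify_region("419")` yields "un-m49", and `classify_region("QM")` yields "private-use".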

809

810 2.2.5. Variant Subtags

811

812 Variant subtags are used to indicate additional, well-recognized

813 variations that define a language or its dialects that are not

814 covered by other available subtags. The following rules apply to the

815 variant subtags:

816

817 1. Variant subtags MUST follow any primary language, extended

818 language, script, or region subtags and MUST precede any

819 extension or private use subtag sequences.

820

821 2. Variant subtags, as a collection, are not associated with any

822 particular external standard. The meaning of variant subtags in

823 the registry is defined in the course of the registration process

824 defined in Section 3.5. Note that any particular variant subtag

825 might be associated with some external standard. However,

826 association with a standard is not required for registration.

827

828 3. More than one variant MAY be used to form the language tag.

829

830 4. Variant subtags MUST be registered with IANA according to the

831 rules in Section 3.5 of this document before being used to form

832 language tags. In order to distinguish variants from other types

833 of subtags, registrations MUST meet the following length and

834 content restrictions:

835

836 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be

837 at least five characters long.

838

847 2. Variant subtags that begin with a digit (0-9) MUST be at

848 least four characters long.

849

850 5. The same variant subtag MUST NOT be used more than once within a

851 language tag.

852

853 * For example, the tag "de-DE-1901-1901" is not valid.

854

855 Variant subtag records in the Language Subtag Registry MAY include

856 one or more 'Prefix' (Section 3.1.8) fields. Each 'Prefix' indicates

857 a suitable sequence of subtags for forming (with other subtags, as

858 appropriate) a language tag when using the variant.

859

860 Most variants that share a prefix are mutually exclusive. For

861 example, the German orthographic variations '1996' and '1901' SHOULD

862 NOT be used in the same tag, as they represent the dates of different

863 spelling reforms. A variant that can meaningfully be used in

864 combination with another variant SHOULD include a 'Prefix' field in

865 its registry record that lists that other variant. For example, if

866 another German variant 'example' were created that made sense to use

867 with '1996', then 'example' should include two 'Prefix' fields: "de"

868 and "de-1996".

869

870 For example:

871

872 "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.

873

874 "de-CH-1996" represents German as used in Switzerland and as

875 written using the spelling reform beginning in the year 1996 C.E.
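
The length restrictions and the no-duplicates rule for variants lend themselves to a short sketch. The helper names below are hypothetical, and the eight-character upper bound is the general limit on subtag length rather than something specific to variants. The sketch checks shape only; whether a variant is registered still has to be looked up in the registry.

```python
import re

# A variant that begins with a letter has five to eight characters; one that
# begins with a digit has four to eight.
_VARIANT = re.compile(r"(?:[A-Za-z][A-Za-z0-9]{4,7}|[0-9][A-Za-z0-9]{3,7})")


def looks_like_variant(subtag: str) -> bool:
    """True if the subtag has the shape required of a variant subtag."""
    return _VARIANT.fullmatch(subtag) is not None


def duplicate_variants(variants: list) -> list:
    """Return the variants that occur more than once, ignoring case."""
    seen, dupes = set(), []
    for v in variants:
        key = v.lower()
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes
```

For the invalid tag "de-DE-1901-1901", passing the variant list `["1901", "1901"]` to `duplicate_variants` reports "1901" as repeated.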

876

877 2.2.6. Extension Subtags

878

879 Extensions provide a mechanism for extending language tags for use in

880 various applications. They are intended to identify information that

881 is commonly used in association with languages or language tags but

882 that is not part of language identification. See Section 3.7. The

883 following rules apply to extensions:

884

885 1. An extension MUST follow at least a primary language subtag.

886 That is, a language tag cannot begin with an extension.

887 Extensions extend language tags; they do not override or replace

888 them. For example, "a-value" is not a well-formed language tag,

889 while "de-a-value" is. Note that extensions cannot be used in

890 tags that are entirely private use (that is, tags starting with

891 "x-").

892

903 2. Extension subtags are separated from the other subtags defined in

904 this document by a single-character subtag (called a

905 "singleton"). The singleton MUST be one allocated to a

906 registration authority via the mechanism described in Section 3.7

907 and MUST NOT be the letter 'x', which is reserved for private use

908 subtag sequences.

909

910 3. Each singleton subtag MUST appear at most one time in each tag

911 (other than as a private use subtag). That is, singleton subtags

912 MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is

913 invalid because the subtag 'a' appears twice. Note that the tag

914 "en-a-bbb-x-a-ccc" is valid because the second appearance of the

915 singleton 'a' is in a private use sequence.

916

917 4. Extension subtags MUST meet whatever requirements are set by the

918 document that defines their singleton prefix and whatever

919 requirements are provided by the maintaining authority. Note

920 that there might not be a registry of these subtags and

921 validating processors are not required to validate extensions.

922

923 5. Each extension subtag MUST be from two to eight characters long

924 and consist solely of letters or digits, with each subtag

925 separated by a single '-'. Case distinctions are ignored in

926 extensions (as with any language subtag) and normalized subtags

927 of this type are expected to be in lowercase.

928

929 6. Each singleton MUST be followed by at least one extension subtag.

930 For example, the tag "tlh-a-b-foo" is invalid because the first

931 singleton 'a' is followed immediately by another singleton 'b'.

932

933 7. Extension subtags MUST follow all primary language, extended

934 language, script, region, and variant subtags in a tag and MUST

935 precede any private use subtag sequences.

936

937 8. All subtags following the singleton and before another singleton

938 are part of the extension. Example: In the tag "fr-a-Latn", the

939 subtag 'Latn' does not represent the script subtag 'Latn' defined

940 in the IANA Language Subtag Registry. Its meaning is defined by

941 the extension 'a'.

942

943 9. In the event that more than one extension appears in a single

944 tag, the tag SHOULD be canonicalized as described in Section 4.5,

945 by ordering the various extension sequences into case-insensitive

946 ASCII order.

947

948 For example, if an extension were defined for the singleton 'r' and

949 it defined the subtags shown, then the following tag would be a valid

950 example: "en-Latn-GB-boont-r-extended-sequence-x-private".
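
As a rough illustration of rules 1, 3, 5, 6, and 9, the following Python sketch separates a tag into the subtags before the first singleton, the extension sequences, and the private use subtags. The function name `split_extensions` is hypothetical. The sketch lowercases the whole tag for comparison and does not validate the subtags that precede the first singleton, nor does it check that a singleton has been allocated under Section 3.7.

```python
def split_extensions(tag: str):
    """Split a tag into (prefix subtags, extensions, private use subtags).

    Raises ValueError when one of the extension rules is violated.
    """
    subtags = tag.lower().split("-")
    if len(subtags[0]) == 1:
        # A wholly private use tag ("x-...") carries no extensions.
        if subtags[0] == "x":
            return [], {}, subtags[1:]
        # Rule 1: a tag cannot begin with an extension.
        raise ValueError("tag cannot begin with an extension singleton")

    prefix, extensions, private = [], {}, []
    current = None  # singleton whose subtags are being collected
    for i, st in enumerate(subtags):
        if len(st) == 1:
            # Rule 6: the previous singleton needs at least one subtag.
            if current is not None and not extensions[current]:
                raise ValueError(f"singleton '{current}' has no subtags")
            if st == "x":
                # Everything after 'x' is private use.
                private = subtags[i + 1:]
                current = None
                break
            # Rule 3: a singleton appears at most once.
            if st in extensions:
                raise ValueError(f"singleton '{st}' is repeated")
            extensions[st] = []
            current = st
        elif current is None:
            prefix.append(st)
        else:
            # Rule 5: two to eight letters or digits.
            if not (2 <= len(st) <= 8 and st.isascii() and st.isalnum()):
                raise ValueError(f"malformed extension subtag '{st}'")
            extensions[current].append(st)
    if current is not None and not extensions[current]:
        raise ValueError(f"singleton '{current}' has no subtags")
    # Rule 9: order the extension sequences by singleton.
    return prefix, dict(sorted(extensions.items())), private
```

With this sketch, "en-a-bbb-x-a-ccc" is accepted because the second 'a' falls inside the private use sequence, while "en-a-bbb-a-ccc", "tlh-a-b-foo", and "a-value" each raise ValueError.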

951

959 2.2.7. Private Use Subtags

960

961 Private use subtags are used to indicate distinctions in language

962 that are important in a given context by private agreement. The

963 following rules apply to private use subtags:

964

965 1. Private use subtags are separated from the other subtags defined

966 in this document by the reserved single-character subtag 'x'.

967

968 2. Private use subtags MUST conform to the format and content

969 constraints defined in the ABNF for all subtags; that is, they

970 MUST consist solely of letters and digits and not exceed eight

971 characters in length.

972

973 3. Private use subtags MUST follow all primary language, extended

974 language, script, region, variant, and extension subtags in the

975 tag. Another way of saying this is that all subtags following

976 the singleton 'x' MUST be considered private use. Example: The

977 subtag 'US' in the tag "en-x-US" is a private use subtag.

978

979 4. A tag MAY consist entirely of private use subtags.

980

981 5. No source is defined for private use subtags. Use of private use

982 subtags is by private agreement only.

983

984 6. Private use subtags are NOT RECOMMENDED where alternatives exist

985 or for general interchange. See Section 4.6 for more information

986 on private use subtag choice.

987

988 For example, suppose a group of scholars is studying some texts in

989 medieval Greek. They might agree to use some collection of private

990 use subtags to identify different styles of writing in the texts.

991 For example, they might use 'el-x-koine' for documents in the

992 "common" style while using 'el-x-attic' for other documents that

993 mimic the Attic style. These subtags would not be recognized by

994 outside processes or systems, but might be useful in categorizing

995 various texts for study by those in the group.

996

997 In the registry, there are also subtags derived from codes reserved

998 by ISO 639, ISO 15924, or ISO 3166 for private use. Do not confuse

999 these with private use subtag sequences following the subtag 'x'.

1000 See Section 4.6.
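
A minimal sketch of rules 2 and 3, assuming a hypothetical helper named `private_use_subtags`: everything after the first 'x' singleton is treated as private use, and each such subtag is checked only for the letters-and-digits and eight-character constraints. The meaning of the subtags is, by definition, outside what any such code can check.

```python
def private_use_subtags(tag: str) -> list:
    """Return the subtags after the first 'x' singleton (empty if none)."""
    subtags = tag.split("-")
    for i, st in enumerate(subtags):
        if st.lower() == "x":
            private = subtags[i + 1:]
            for p in private:
                # Rule 2: letters and digits only, at most eight characters.
                if not (1 <= len(p) <= 8 and p.isascii() and p.isalnum()):
                    raise ValueError(f"malformed private use subtag '{p}'")
            return private
    return []
```

For "en-x-US" the sketch returns `["US"]`, reflecting that 'US' is a private use subtag there and not a region subtag.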

1001

1002 2.2.8. Grandfathered and Redundant Registrations

1003

1004 Prior to RFC 4646, whole language tags were registered according to

1005 the rules in RFC 1766 and/or RFC 3066. All of these registered tags

1006 remain valid as language tags.

1007

1015 Many of these registered tags were made redundant by the advent of

1016 either RFC 4646 or this document. A redundant tag is a grandfathered

1017 registration whose individual subtags appear with the same semantic

1018 meaning in the registry. For example, the tag "zh-Hant" (Traditional

1019 Chinese) can now be composed from the subtags 'zh' (Chinese) and

1020 'Hant' (Han script traditional variant). These redundant tags are

1021 maintained in the registry as records of type 'redundant', mostly as

1022 a matter of historical curiosity.

1023

1024 The remainder of the previously registered tags are "grandfathered".

1025 These tags are classified into two groups: 'regular' and 'irregular'.

1026

1027 Grandfathered tags that (appear to) match the 'langtag' production in

1028 Figure 1 are considered 'regular' grandfathered tags. These tags

1029 contain one or more subtags that either do not individually appear in

1030 the registry or appear but with a different semantic meaning: each

1031 tag, in its entirety, represents a language or collection of

1032 languages.

1033

1034 Grandfathered tags that do not match the 'langtag' production in the

1035 ABNF and would otherwise be invalid are considered 'irregular'

1036 grandfathered tags. With the exception of "en-GB-oed", which is a

1037 variant of "en-GB", each of them, in its entirety, represents a

1038 language.

1039

1040 Many of the grandfathered tags have been superseded by the subsequent

1041 addition of new subtags: each superseded record contains a

1042 'Preferred-Value' field that ought to be used to form language tags

1043 representing that value. For example, the tag "art-lojban" is

1044 superseded by the primary language subtag 'jbo'.
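
One way to picture the use of 'Preferred-Value' for superseded tags is a simple lookup. The table below is an illustrative excerpt containing only mappings mentioned in this document, and the names `PREFERRED_VALUE` and `replace_grandfathered` are hypothetical. A real implementation would presumably load every such mapping from the registry.

```python
# Illustrative excerpt only; not a complete list of superseded tags.
PREFERRED_VALUE = {
    "art-lojban": "jbo",
    "zh-min-nan": "nan",
}


def replace_grandfathered(tag: str) -> str:
    """Return the preferred replacement for a superseded tag, else the tag."""
    return PREFERRED_VALUE.get(tag.lower(), tag)
```

Tags without an entry, such as "en-GB-oed" in this excerpt, are returned unchanged.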

1045

1046 2.2.9. Classes of Conformance

1047

1048 Implementations sometimes need to describe their capabilities with

1049 regard to the rules and practices described in this document. Tags

1050 can be checked or verified in a number of ways, but two particular

1051 classes of tag conformance are formally defined here.

1052

1053 A tag is considered "well-formed" if it conforms to the ABNF

1054 (Section 2.1). Language tags may be well-formed in terms of syntax

1055 but not valid in terms of content. However, many operations

1056 involving language tags work well without knowing anything about the

1057 meaning or validity of the subtags.

1058

1059 A tag is considered "valid" if it satisfies these conditions:

1060

1061 o The tag is well-formed.

1070

1071 o Either the tag is in the list of grandfathered tags or all of its

1072 primary language, extended language, script, region, and variant

1073 subtags appear in the IANA Language Subtag Registry as of the

1074 particular registry date.

1075

1076 o There are no duplicate variant subtags.

1077

1078 o There are no duplicate singleton (extension) subtags.

1079

1080 Note that a tag's validity depends on the date of the registry used

1081 to validate the tag. A more recent copy of the registry might

1082 contain a subtag that an older version does not.

1083

1084 A tag is considered valid for a given extension (Section 3.7) (as of

1085 a particular version, revision, and date) if it meets the criteria

1086 for "valid" above and also satisfies this condition:

1087

1088 Each subtag used in the extension part of the tag is valid

1089 according to the extension.

1090

1091 Older specifications or language tag implementations sometimes

1092 reference [RFC3066]. A wider array of tags was considered well-

1093 formed under that document. Any tags that were valid for use under

1094 RFC 3066 are both well-formed and valid under this document's syntax;

1095 only invalid or illegal tags were well-formed under the earlier

1096 definition but no longer are. The language tag syntax under RFC 3066

1097 was:

1098

1099 obs-language-tag = primary-subtag *( "-" subtag )

1100 primary-subtag = 1*8ALPHA

1101 subtag = 1*8(ALPHA / DIGIT)

1102

1103 Figure 2: RFC 3066 Language Tag Syntax
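
The older syntax in Figure 2 is simple enough to transcribe as a regular expression. The following Python sketch does so; the names `OBS_LANGUAGE_TAG` and `is_rfc3066_well_formed` are hypothetical. It tests only the RFC 3066 syntax and says nothing about well-formedness or validity under this document.

```python
import re

# Figure 2: 1*8ALPHA, then any number of "-" 1*8(ALPHA / DIGIT) subtags.
OBS_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_well_formed(tag: str) -> bool:
    """True if the tag matches the obs-language-tag production."""
    return OBS_LANGUAGE_TAG.fullmatch(tag) is not None
```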

1104

1105 Subtags designated for private use as well as private use sequences

1106 introduced by the 'x' subtag are available for cases in which no

1107 assigned subtags are available and registration is not a suitable

1108 option. For example, one might use a tag such as "no-QQ", where 'QQ'

1109 is one of a range of private use ISO 3166-1 codes to indicate an

1110 otherwise undefined region. Users MUST NOT assign language tags that

1111 use subtags that do not appear in the registry other than in private

1112 use sequences (such as the subtag 'personal' in the tag

1113 "en-x-personal"). Besides being invalid, such tags risk collision

1114 with a possible future assignment or registration.

1115

1116 Note well: although the 'Language-Tag' production appearing in this

1117 document is functionally equivalent to the one in [RFC4646], it has

1127 been changed to prevent certain errors in well-formedness arising

1128 from the old 'grandfathered' production.

1129

1130 3. Registry Format and Maintenance

1131

1132 The IANA Language Subtag Registry ("the registry") contains a

1133 comprehensive list of all of the subtags valid in language tags.

1134 This gives implementers a straightforward and reliable way to

1135 validate language tags. The registry will be maintained so that,

1136 except for extension subtags, it is possible to validate all of the

1137 subtags that appear in a language tag under the provisions of this

1138 document or its revisions or successors. In addition, the meaning of

1139 the various subtags will be unambiguous and stable over time. (The

1140 meaning of private use subtags, of course, is not defined by the

1141 registry.)

1142

1143 This section defines the registry along with the maintenance and

1144 update procedures associated with it, as well as a registry for

1145 extensions to language tags (Section 3.7).

1146

1147 3.1. Format of the IANA Language Subtag Registry

1148

1149 The IANA Language Subtag Registry is a machine-readable file in the

1150 format described in this section, plus copies of the registration

1151 forms approved in accordance with the process described in

1152 Section 3.5.

1153

1154 The existing registration forms for grandfathered and redundant tags

1155 taken from RFC 3066 have been maintained as part of the obsolete RFC

1156 3066 registry. The subtags added to the registry by either [RFC4645]

1157 or [RFC5645] do not have separate registration forms (so no forms are

1158 archived for these additions).

1159

1160 3.1.1. File Format

1161

1162 The registry is a [Unicode] text file and consists of a series of

1163 records in a format based on "record-jar" (described in

1164 [record-jar]). Each record, in turn, consists of a series of fields

1165 that describe the various subtags and tags. The actual registry file

1166 is encoded using the UTF-8 [RFC3629] character encoding.

1167

1168 Each field can be considered a single, logical line of characters.

1169 Each field contains a "field-name" and a "field-body". These are

1170 separated by a "field-separator". The field-separator is a COLON

1171 character (U+003A) plus any surrounding whitespace. Each field is

1172 terminated by the newline sequence CRLF. The text in each field MUST

1173 be in Unicode Normalization Form C (NFC).

1182

1183 A collection of fields forms a "record". Records are separated by

1184 lines containing only the sequence "%%" (U+0025 U+0025).

1185

1186 Although fields are logically a single line of text, each line of

1187 text in the file format is limited to 72 bytes in length. To

1188 accommodate this, the field-body can be split into a multiple-line

1189 representation; this is called "folding". Folding is done according

1190 to customary conventions for line-wrapping. This is typically on

1191 whitespace boundaries, but can occur between other characters when

1192 the value does not include spaces, such as when a language does not

1193 use whitespace between words. In any event, there MUST NOT be breaks

1194 inside a multibyte UTF-8 sequence or in the middle of a combining

1195 character sequence. For more information, see [UAX14].

1196

1197 Although the file format uses the Unicode character set and the file

1198 itself is encoded using the UTF-8 encoding, fields are restricted to

1199 the printable characters from the US-ASCII [ISO646] repertoire unless

1200 otherwise indicated in the description of a specific field

1201 (Section 3.1.2).

1202

1203 The format of the registry is described by the following ABNF

1204 [RFC5234]. Character numbers (code points) are taken from Unicode,

1205 and terminals in the ABNF productions are in terms of characters

1206 rather than bytes.

1207

1208 registry = record *("%%" CRLF record)

1209 record = 1*field

1210 field = ( field-name field-sep field-body CRLF )

1211 field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]

1212 field-sep = *SP ":" *SP

1213 field-body = *([[*SP CRLF] 1*SP] 1*CHARS)

1214 CHARS = (%x21-10FFFF) ; Unicode code points

1215

1216 Figure 3: Registry Format ABNF
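
A reader for this format can be quite small. The Python sketch below is one possible approach and not a reference implementation: the name `parse_registry` is hypothetical, the input is assumed to be already decoded from UTF-8, and folded lines are unfolded by joining them with a single space, which is an assumption that suits whitespace-separated text but not necessarily every script.

```python
def parse_registry(text: str) -> list:
    """Parse record-jar text into a list of records.

    Each record is a list of (field-name, field-body) pairs, kept in file
    order because some fields may repeat within a record.
    """
    records, fields = [], []
    for line in text.replace("\r\n", "\n").split("\n"):
        if line == "%%":
            # Record separator: close the current record.
            records.append(fields)
            fields = []
        elif line[:1] == " " and fields:
            # Folded continuation line: append it to the previous field-body.
            name, body = fields[-1]
            fields[-1] = (name, body + " " + line.strip())
        elif line.strip():
            # field-name, field-sep (colon plus surrounding spaces), field-body
            name, _, body = line.partition(":")
            fields.append((name.strip(), body.strip()))
    if fields:
        records.append(fields)
    return records
```

Keeping each record as a list of pairs, not a dictionary, preserves repeated fields such as 'Description'.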

1217

1218 The sequence '..' (U+002E U+002E) in a field-body denotes a range of

1219 values. Such a range represents all subtags of the same length that

1220 are in alphabetic or numeric order within that range, including the

1221 values explicitly mentioned. For example, 'a..c' denotes the values

1222 'a', 'b', and 'c', and '11..13' denotes the values '11', '12', and

1223 '13'.
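
The range notation can be expanded mechanically. The sketch below is a hypothetical helper, `expand_range`, that handles all-digit ranges numerically and all-letter ranges by counting through the letters a-z position by position, reapplying the letter case of the first endpoint. Treating "alphabetic order" this way is an assumption, and ranges that mix letters and digits are not handled.

```python
import string

_LETTERS = string.ascii_lowercase


def _to_int(s: str) -> int:
    """Read a lowercase letter string as a base-26 number."""
    n = 0
    for ch in s:
        n = n * 26 + _LETTERS.index(ch)
    return n


def _to_str(n: int, width: int) -> str:
    """Inverse of _to_int, padded to the given width."""
    chars = []
    for _ in range(width):
        n, r = divmod(n, 26)
        chars.append(_LETTERS[r])
    return "".join(reversed(chars))


def expand_range(body: str) -> list:
    """Expand a 'first..last' field-body into the subtags it denotes."""
    if ".." not in body:
        return [body]
    first, last = body.split("..")
    if len(first) != len(last):
        raise ValueError("range endpoints must have the same length")
    width = len(first)
    if first.isdigit() and last.isdigit():
        return [str(n).zfill(width) for n in range(int(first), int(last) + 1)]
    lo, hi = _to_int(first.lower()), _to_int(last.lower())
    out = []
    for n in range(lo, hi + 1):
        s = _to_str(n, width)
        # Reapply the letter case of the first endpoint, position by position.
        out.append("".join(c.upper() if f.isupper() else c
                           for c, f in zip(s, first)))
    return out
```

Applied to the examples in the text, `expand_range("a..c")` gives 'a', 'b', 'c' and `expand_range("11..13")` gives '11', '12', '13'.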

1224

1225 All fields whose field-body contains a date value use the "full-date"

1226 format specified in [RFC3339]. For example, "2004-06-28" represents

1227 June 28, 2004, in the Gregorian calendar.

1238

1239 3.1.2. Record and Field Definitions

1240

1241 There are three types of records in the registry: "File-Date",

1242 "Subtag", and "Tag".

1243

1244 The first record in the registry is always the "File-Date" record.

1245 This record occurs only once in the file and contains a single field

1246 whose field-name is "File-Date". The field-body of this record

1247 contains a date (see Section 5.1), making it possible to easily

1248 recognize different versions of the registry.

1249

1250 File-Date: 2004-06-28

1251 %%

1252

1253 Figure 4: Example of the File-Date Record

1254

1255 Subsequent records contain multiple fields and represent information

1256 about either subtags or tags. Both types of records have an

1257 identical structure, except that "Subtag" records contain a field

1258 with a field-name of "Subtag", while, unsurprisingly, "Tag" records

1259 contain a field with a field-name of "Tag". Field-names MUST NOT

1260 occur more than once per record, with the exception of the

1261 'Description', 'Comments', and 'Prefix' fields.

1262

1263 Each record MUST contain at least one of each of the following

1264 fields:

1265

1266 o 'Type'

1267

1268 * Type's field-body MUST consist of one of the following strings:

1269 "language", "extlang", "script", "region", "variant",

1270 "grandfathered", or "redundant"; it denotes the type of tag or

1271 subtag.

1272

1273 o Either 'Subtag' or 'Tag'

1274

1275 * Subtag's field-body contains the subtag being defined. This

1276 field MUST appear in all records whose 'Type' has one of these

1277 values: "language", "extlang", "script", "region", or

1278 "variant".

1279

1280 * Tag's field-body contains a complete language tag. This field

1281 MUST appear in all records whose 'Type' has one of these

1282 values: "grandfathered" or "redundant". If the 'Type' is

1283 "grandfathered", then the 'Tag' field-body will be one of the

1284 tags listed in either the 'regular' or 'irregular' production

1285 found in Section 2.1.

1294

1295 o 'Description'

1296

1297 * Description's field-body contains a non-normative description

1298 of the subtag or tag.

1299

1300 o 'Added'

1301

1302 * Added's field-body contains the date the record was registered

1303 or, in the case of grandfathered or redundant tags, the date

1304 the corresponding tag was registered under the rules of

1305 [RFC1766] or [RFC3066].

1306

1307 Each record MAY also contain the following fields:

1308

1309 o 'Deprecated'

1310

1311 * Deprecated's field-body contains the date the record was

1312 deprecated. In some cases, this value is earlier than that of

1313 the 'Added' field in the same record. That is, the date of

1314 deprecation preceded the addition of the record to the

1315 registry.

1316

1317 o 'Preferred-Value'

1318

1319 * Preferred-Value's field-body contains a canonical mapping from

1320 this record's value to a modern equivalent that is preferred in

1321 its place. Depending on the value of the 'Type' field, this

1322 value can take different forms:

1323

1324 + For records of type 'language', 'Preferred-Value' contains

1325 the primary language subtag that is preferred when forming

1326 the language tag.

1327

1328 + For records of type 'script', 'region', or 'variant',

1329 'Preferred-Value' contains the subtag of the same type that

1330 is preferred for forming the language tag.

1331

1332 + For records of type 'extlang', 'grandfathered', or

1333 'redundant', 'Preferred-Value' contains an "extended

1334 language range" [RFC4647] that is preferred for forming the

1335 language tag. That is, the preferred language tag will

1336 contain, in order, each of the subtags that appears in the

1337 'Preferred-Value'; additional subtags can be included in a

1338 language tag, as described elsewhere in this document. For

1339 example, the replacement for the grandfathered tag

1340 "zh-min-nan" (Min Nan Chinese) is "nan", which can be used as the

1351 basis for tags such as "nan-Hant" or "nan-TW" (note that the

1352 extended language subtag form such as "zh-nan-Hant" or

1353 "zh-nan-TW" can also be used).

1354

1355 o 'Prefix'

1356

1357 * Prefix's field-body contains a valid language tag that is

1358 RECOMMENDED as one possible prefix to this record's subtag.

1359 This field MAY appear in records whose 'Type' field-body is

1360 either 'extlang' or 'variant' (it MUST NOT appear in any other

1361 record type).

1362

1363 o 'Suppress-Script'

1364

1365 * Suppress-Script's field-body contains a script subtag that

1366 SHOULD NOT be used to form language tags with the associated

1367 primary or extended language subtag. This field MUST appear

1368 only in records whose 'Type' field-body is 'language' or

1369 'extlang'. See Section 4.1.

1370

1371 o 'Macrolanguage'

1372

1373 * Macrolanguage's field-body contains a primary language subtag

1374 defined by ISO 639 as the "macrolanguage" that encompasses this

1375 language subtag. This field MUST appear only in records whose

1376 'Type' field-body is either 'language' or 'extlang'.

1377

1378 o 'Scope'

1379

1380 * Scope's field-body contains information about a primary or

1381 extended language subtag indicating the type of language code

1382 according to ISO 639. The values permitted in this field are

1383 "macrolanguage", "collection", "special", and "private-use".

1384 This field only appears in records whose 'Type' field-body is

1385 either 'language' or 'extlang'. When this field is omitted,

1386 the language is an individual language.

1387

1388 o 'Comments'

1389

1390 * Comments's field-body contains additional information about the

1391 subtag, as deemed appropriate for understanding the registry

1392 and implementing language tags using the subtag or tag.

1393

1394 Future versions of this document might add additional fields to the

1395 registry; implementations SHOULD ignore fields found in the registry

1396 that are not defined in this document.
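
Several of the structural requirements on "Subtag" and "Tag" records can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the name `check_record` and the wording of its messages are hypothetical, a record is represented as a list of (field-name, field-body) pairs, and only a subset of the rules in this section is covered. Unknown field-names are ignored, in line with the guidance above, and the "File-Date" record is out of scope.

```python
REQUIRED = ("Type", "Description", "Added")
REPEATABLE = {"Description", "Comments", "Prefix"}
SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}


def check_record(fields: list) -> list:
    """Return a list of problems found in one Subtag or Tag record."""
    problems = []
    names = [name for name, _ in fields]
    # Only 'Description', 'Comments', and 'Prefix' may repeat.
    for name in sorted(set(names)):
        if names.count(name) > 1 and name not in REPEATABLE:
            problems.append(f"{name} occurs more than once")
    for name in REQUIRED:
        if name not in names:
            problems.append(f"missing {name}")
    # The 'Type' value decides whether 'Subtag' or 'Tag' is required.
    rtype = dict(fields).get("Type")
    if rtype in SUBTAG_TYPES and "Subtag" not in names:
        problems.append("missing Subtag")
    elif rtype in TAG_TYPES and "Tag" not in names:
        problems.append("missing Tag")
    elif rtype is not None and rtype not in SUBTAG_TYPES | TAG_TYPES:
        problems.append(f"unknown Type {rtype}")
    # Fields restricted to particular record types.
    if "Prefix" in names and rtype not in ("extlang", "variant"):
        problems.append("Prefix not allowed for this Type")
    if "Suppress-Script" in names and rtype not in ("language", "extlang"):
        problems.append("Suppress-Script not allowed for this Type")
    return problems
```

An empty list means that none of the checked rules was violated; it does not mean the record is correct in every respect.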

1406

1407 3.1.3. Type Field

1408

1409 The field 'Type' contains the string identifying the record type in

1410 which it appears. Values for the 'Type' field-body are: "language"

1411 (Section 2.2.1); "extlang" (Section 2.2.2); "script" (Section 2.2.3);

1412 "region" (Section 2.2.4); "variant" (Section 2.2.5); "grandfathered"

1413 or "redundant" (Section 2.2.8).

1414

1415 3.1.4. Subtag and Tag Fields

1416

1417 The field 'Subtag' contains the subtag defined in the record. The

1418 field 'Tag' appears in records whose 'Type' is either 'grandfathered'

1419 or 'redundant' and contains a tag registered under [RFC3066].

1420

1421 The 'Subtag' field-body MUST follow the casing conventions described

1422 in Section 2.1.1. All subtags use lowercase letters in the field-

1423 body, with two exceptions:

1424

1425 Subtags whose 'Type' field is 'script' (in other words, subtags

1426 defined by ISO 15924) MUST use titlecase.

1427

1428 Subtags whose 'Type' field is 'region' (in other words, the non-

1429 numeric region subtags defined by ISO 3166-1) MUST use all

1430 uppercase.

1431

1432 The 'Tag' field-body MUST be formatted according to the rules

1433 described in Section 2.1.1.
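
The casing rules for the 'Subtag' field-body reduce to a three-way choice on the record type. A minimal sketch, with the hypothetical name `registry_case`:

```python
def registry_case(subtag: str, record_type: str) -> str:
    """Return the subtag in the case used for its 'Subtag' field-body."""
    if record_type == "script":
        # Titlecase, as in 'Latn'.
        return subtag[:1].upper() + subtag[1:].lower()
    if record_type == "region":
        # All uppercase; numeric region subtags are unaffected.
        return subtag.upper()
    return subtag.lower()
```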

1434

1435 3.1.5. Description Field

1436

1437 The field 'Description' contains a description of the tag or subtag

1438 in the record. The 'Description' field MAY appear more than once per

1439 record. The 'Description' field MAY include the full range of

1440 Unicode characters. At least one of the 'Description' fields MUST be

1441 written or transcribed into the Latin script; additional

1442 'Description' fields MAY be in any script or language.

1443

1444 The 'Description' field is used for identification purposes.

1445 Descriptions SHOULD contain all and only that information necessary

1446 to distinguish one subtag from others with which it might be

1447 confused. They are not intended to provide general background

1448 information or to provide all possible alternate names or

1449 designations. 'Description' fields do not necessarily represent the

1450 actual native name of the item in the record, nor are any of the

1451 descriptions guaranteed to be in any particular language (such as

1452 English or French, for example).

1462

1463 Descriptions in the registry that correspond to ISO 639, ISO 15924,

1464 ISO 3166-1, or UN M.49 codes are intended only to indicate the

1465 meaning of that identifier as defined in the source standard at the

1466 time it was added to the registry or as subsequently modified, within

1467 the bounds of the stability rules (Section 3.4), via subsequent

1468 registration. The 'Description' does not replace the content of the

1469 source standard itself. 'Description' fields are not intended to be

1470 the localized English names for the subtags. Localization or

1471 translation of language tag and subtag descriptions is out of scope

1472 of this document.

1473

1474 For subtags taken from a source standard (such as ISO 639 or ISO

1475 15924), the 'Description' fields in the record are also initially

1476 taken from that source standard. Multiple descriptions in the source

1477 standard are split into separate 'Description' fields. The source

1478 standard's descriptions MAY be edited or modified, either prior to

1479 insertion or via the registration process, and additional or

1480 extraneous descriptions omitted or removed. Each 'Description' field

1481 MUST be unique within the record in which it appears, and formatting

1482 variations of the same description SHOULD NOT occur in that specific

1483 record. For example, while the ISO 639-1 code 'fy' has both the

1484 description "Western Frisian" and the description "Frisian, Western"

1485 in that standard, only one of these descriptions appears in the

1486 registry.

1487

1488 To help ensure that users do not become confused about which subtag

1489 to use, 'Description' fields assigned to a record of any specific

1490 type ('language', 'extlang', 'script', and so on) MUST be unique

1491 within that given record type with the following exception: if a

1492 particular 'Description' field occurs in multiple records of a given

1493 type, then at most one of the records can omit the 'Deprecated'

1494 field. All deprecated records that share a 'Description' MUST have

1495 the same 'Preferred-Value', and all non-deprecated records MUST be

1496 that 'Preferred-Value'. This means that two records of the same type

1497 that share a 'Description' are also semantically equivalent and no

1498 more than one record with a given 'Description' is preferred for that

1499 meaning.

1500

1501 For example, consider the 'language' subtags 'zza' (Zaza) and 'diq'

1502 (Dimli). It so happens that 'zza' is a macrolanguage enclosing 'diq'

1503 and thus also has a description in ISO 639-3 of "Dimli". This

1504 description was edited to read "Dimli (macrolanguage)" in the

1505 registry record for 'zza' to prevent a collision.

1506

1507 By contrast, the subtags 'he' and 'iw' share a 'Description' value of

1508 "Hebrew"; this is permitted because 'iw' is deprecated and its

1509 'Preferred-Value' is 'he'.
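The uniqueness rule for shared descriptions can be checked mechanically. The following Python sketch is one possible reading of the rule, not a normative algorithm. It assumes that all records passed in have the same 'Type' and that each record is a dictionary keyed by field name, with 'Description' held as a list; the function name `description_conflicts` is hypothetical. It also assumes that deprecated records sharing a description but lacking a 'Preferred-Value' count as a violation.

```python
from collections import defaultdict


def description_conflicts(records):
    """Return the descriptions that break the sharing rule for one record type."""
    by_description = defaultdict(list)
    for record in records:
        for description in record["Description"]:
            by_description[description].append(record)

    conflicts = []
    for description, sharing in by_description.items():
        if len(sharing) < 2:
            continue  # a description used by one record is always fine
        current = [r for r in sharing if "Deprecated" not in r]
        deprecated = [r for r in sharing if "Deprecated" in r]
        targets = {r.get("Preferred-Value") for r in deprecated}
        ok = (
            len(current) <= 1            # at most one record omits 'Deprecated'
            and len(targets) == 1        # deprecated records agree on one target
            and None not in targets      # assumption: a target must be present
            and all(r["Subtag"] in targets for r in current)
        )
        if not ok:
            conflicts.append(description)
    return conflicts
```

Under these assumptions, the 'he'/'iw' pairing produces no conflict, while two non-deprecated records that both carry the description "Dimli" would be reported.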

1518

1519 For records of type 'language', the first 'Description' field

1520 appearing in the registry corresponds whenever possible to the

1521 Reference Name assigned by ISO 639-3. This helps facilitate cross-

1522 referencing between ISO 639 and the registry.

1523

1524 When creating or updating a record due to the action of one of the

1525 source standards, the Language Subtag Reviewer MAY edit descriptions

1526 to correct irregularities in formatting (such as misspellings,

1527 inappropriate apostrophes or other punctuation, or excessive or

1528 missing spaces) prior to submitting the proposed record to the

1529 ietf-languages@iana.org list for consideration.

1530

15313.1.6. Deprecated Field

1532

1533 The field 'Deprecated' contains the date the record was deprecated

1534 and MAY be added, changed, or removed from any record via the

1535 maintenance process described in Section 3.3 or via the registration

1536 process described in Section 3.5. Usually, the addition of a

1537 'Deprecated' field is due to the action of one of the standards

1538 bodies, such as ISO 3166, withdrawing a code. Although valid in

1539 language tags, subtags and tags with a 'Deprecated' field are

1540 deprecated, and validating processors SHOULD NOT generate these

1541 subtags. Note that a record that contains a 'Deprecated' field and

1542 no corresponding 'Preferred-Value' field has no replacement mapping.
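A processor that generates tags might consult the 'Deprecated' and 'Preferred-Value' fields along the following lines. This is a minimal sketch under stated assumptions: records are dictionaries keyed by field name, and the function name `preferred_form` is hypothetical.

```python
from typing import Optional


def preferred_form(record: dict) -> Optional[str]:
    """Return the subtag a generator might emit for `record`, if any."""
    if "Deprecated" not in record:
        # Not deprecated: the record's own subtag can be generated.
        return record["Subtag"]
    # Deprecated: use the replacement mapping when one exists.
    # None signals that the record has no replacement mapping.
    return record.get("Preferred-Value")
```

A result of `None` corresponds to a record that is deprecated and has no 'Preferred-Value'; what to do in that case is left to the caller.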

1543

1544 In some historical cases, it might not have been possible to

1545 reconstruct the original deprecation date. For these cases, an

1546 approximate date appears in the registry. Some subtags and some

1547 grandfathered or redundant tags were deprecated before the initial

1548 creation of the registry. The exact rules for this appear in Section

1549 2 of [RFC4645]. Note that these records have a 'Deprecated' field

1550 with an earlier date than the corresponding 'Added' field!

1551

15523.1.7. Preferred-Value Field

1553

1554 The field 'Preferred-Value' contains a mapping between the record in

1555 which it appears and another tag or subtag (depending on the record's

1556 'Type'). The value in this field is used for canonicalization (see

1557 Section 4.5). In cases where the subtag or tag also has a

1558 'Deprecated' field, then the 'Preferred-Value' is RECOMMENDED as the

1559 best choice to represent the value of this record when selecting a

1560 language tag.

1561

1562 Records containing a 'Preferred-Value' fall into one of these four

1563 groups:

1574

1575 1. ISO 639 language codes that were later withdrawn in favor of

1576 other codes. These values are mostly a historical curiosity.

1577 The 'he'/'iw' pairing above is an example of this.

1578

1579 2. Subtags (with types other than language or extlang) taken from

1580 codes or values that have been withdrawn in favor of a new code.

1581 In particular, this applies to region subtags taken from ISO

1582 3166-1, because sometimes a country will change its name or

1583 administration in such a way that warrants a new region code. In

1584 some cases, countries have reverted to an older name, which might

1585 already be encoded. For example, the subtag 'ZR' (Zaire) was

1586 replaced by the subtag 'CD' (Democratic Republic of the Congo)

1587 when that country's name was changed.

1588

1589 3. Tags or subtags that have become obsolete because the values they

1590 represent were later encoded. Many of the grandfathered or

1591 redundant tags were later encoded by ISO 639, for example, and

1592 fall into this grouping. For example, "i-klingon" was deprecated

1593 when the subtag 'tlh' was added. The record for "i-klingon" has

1594 a 'Preferred-Value' of 'tlh'.

1595

1596 4. Extended language subtags always have a mapping to their

1597 identical primary language subtag. For example, the extended

1598 language subtag 'yue' (Cantonese) can be used to form the tag

1599 "zh-yue". It has a 'Preferred-Value' mapping to the primary

1600 language subtag 'yue', meaning that a tag such as

1601 "zh-yue-Hant-HK" can be canonicalized to "yue-Hant-HK".

1602

1603 Records other than those of type 'extlang' that contain a 'Preferred-

1604 Value' field MUST also have a 'Deprecated' field. This field

1605 contains the date on which the tag or subtag was deprecated in favor

1606 of the preferred value.

1607

1608 For records of type 'extlang', the 'Preferred-Value' field appears

1609 without a corresponding 'Deprecated' field. An implementation MAY

1610 ignore these preferred value mappings, although if it ignores the

1611 mapping, it SHOULD do so consistently. It SHOULD also treat the

1612 'Preferred-Value' as equivalent to the mapped item. For example, the

1613 tags "zh-yue-Hant-HK" and "yue-Hant-HK" are semantically equivalent

1614 and ought to be treated as if they were the same tag.
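The extlang mapping can be illustrated with a short Python sketch. It covers only the replacement of a leading language-extlang pair and is not a full implementation of the canonicalization in Section 4.5. The `EXTLANG` table is a hypothetical two-entry excerpt built from the examples in this document, and the function name `canonicalize_extlang` is an assumption.

```python
# Hypothetical excerpt of 'extlang' records: subtag -> (Prefix, Preferred-Value).
EXTLANG = {
    "yue": ("zh", "yue"),
    "cmn": ("zh", "cmn"),
}


def canonicalize_extlang(tag: str) -> str:
    """Replace a leading language-extlang pair with the extlang's Preferred-Value."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        entry = EXTLANG.get(subtags[1].lower())
        # Only rewrite when the extlang follows its registered 'Prefix'.
        if entry is not None and entry[0] == subtags[0].lower():
            return "-".join([entry[1]] + subtags[2:])
    return tag
```

An implementation that chooses to ignore these mappings would skip this step entirely, but would still treat the two forms as equivalent when comparing tags.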

1615

1616 Occasionally, the deprecated code is preferred in certain contexts.

1617 For example, both "iw" and "he" can be used in the Java programming

1618 language, but "he" is converted on input to "iw", which is thus the

1619 canonical form in Java.

1630

1631 'Preferred-Value' mappings in records of type 'region' sometimes do

1632 not represent exactly the same meaning as the original value. There

1633 are many reasons for a country code to be changed, and the effect

1634 this has on the formation of language tags will depend on the nature

1635 of the change in question. For example, the region subtag 'YD'

1636 (Democratic Yemen) was deprecated in favor of the subtag 'YE' (Yemen)

1637 when those two countries unified in 1990.

1638

1639 A 'Preferred-Value' MAY be added to, changed, or removed from records

1640 according to the rules in Section 3.3. Addition, modification, or

1641 removal of a 'Preferred-Value' field in a record does not imply that

1642 content using the affected subtag needs to be retagged.

1643

1644 The 'Preferred-Value' fields in records of type "grandfathered" and

1645 "redundant" each contain an "extended language range" [RFC4647] that

1646 is strongly RECOMMENDED for use in place of the record's value. In

1647 many cases, these mappings were created via deprecation of the tags

1648 during the period before [RFC4646] was adopted. For example, the tag

1649 "no-nyn" was deprecated in favor of the ISO 639-1-defined language

1650 code 'nn'.

1651

1652 The 'Preferred-Value' field in subtag records of type "extlang" also

1653 contains an "extended language range". This allows the subtag to be

1654 deprecated in favor of either a single primary language subtag or a

1655 new language-extlang sequence.

1656

1657 Usually, the addition, removal, or change of a 'Preferred-Value'

1658 field for a subtag is done to reflect changes in one of the source

1659 standards. For example, if an ISO 3166-1 region code is deprecated

1660 in favor of another code, that SHOULD result in the addition of a

1661 'Preferred-Value' field.

1662

1663 Changes to one subtag can affect other subtags as well: when

1664 proposing changes to the registry, the Language Subtag Reviewer MUST

1665 review the registry for such effects and propose the necessary

1666 changes using the process in Section 3.5, although anyone MAY request

1667 such changes. For example:

1668

1669 Suppose that subtag 'XX' has a 'Preferred-Value' of 'YY'. If 'YY'

1670 later changes to have a 'Preferred-Value' of 'ZZ', then the

1671 'Preferred-Value' for 'XX' MUST also change to be 'ZZ'.

1672

1673 Suppose that a registered language subtag 'dialect' represents a

1674 language not yet available in any part of ISO 639. The later

1675 addition of a corresponding language code in ISO 639 SHOULD result

1676 in the addition of a 'Preferred-Value' for 'dialect'.
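The first example (the 'XX', 'YY', 'ZZ' chain) amounts to keeping every 'Preferred-Value' pointing at a final value. A minimal Python sketch of such a review step follows; it assumes the mappings for one record type are held in a plain dictionary, and the function name `flatten_preferred_values` is hypothetical.

```python
def flatten_preferred_values(preferred: dict[str, str]) -> dict[str, str]:
    """Return a copy in which every mapping points at a final value."""
    flattened = {}
    for subtag, target in preferred.items():
        seen = {subtag}
        # Follow the chain until the target has no mapping of its own.
        # The `seen` set only guards against an accidental cycle.
        while target in preferred and target not in seen:
            seen.add(target)
            target = preferred[target]
        flattened[subtag] = target
    return flattened
```

Comparing the input with the result shows which records would need a registration request to update their 'Preferred-Value'.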

1686

16873.1.8. Prefix Field

1688

1689 The field 'Prefix' contains a valid language tag that is RECOMMENDED

1690 as one possible prefix to this record's subtag, perhaps with other

1691 subtags. That is, when including an extended language or a variant

1692 subtag that has at least one 'Prefix' in a language tag, the

1693 resulting tag SHOULD match at least one of the subtag's 'Prefix'

1694 fields using the "Extended Filtering" algorithm (see [RFC4647]), and

1695 each of the subtags in that 'Prefix' SHOULD appear before the subtag

1696 itself.

1697

1698 The 'Prefix' field MUST appear exactly once in a record of type

1699 'extlang'. The 'Prefix' field MAY appear multiple times (or not at

1700 all) in records of type 'variant'. Additional fields of this type

1701 MAY be added to a 'variant' record via the registration process,

1702 provided the 'variant' record already has at least one 'Prefix'

1703 field.

1704

1705 Each 'Prefix' field indicates a particular sequence of subtags that

1706 form a meaningful tag with this subtag. For example, the extended

1707 language subtag 'cmn' (Mandarin Chinese) only makes sense with its

1708 prefix 'zh' (Chinese). Similarly, 'rozaj' (Resian, a dialect of

1709 Slovenian) would be appropriate when used with its prefix 'sl'

1710 (Slovenian), while tags such as "is-1994" are not appropriate (and

1711 probably not meaningful). Although the 'Prefix' for 'rozaj' is "sl",

1712 other subtags might appear between them. For example, the tag "sl-

1713 IT-rozaj" (Slovenian, Italy, Resian) matches the 'Prefix' "sl".

1714

1715 The 'Prefix' also indicates when variant subtags make sense when used

1716 together (many that otherwise share a 'Prefix' are mutually

1717 exclusive) and what the relative ordering of variants is supposed to

1718 be. For example, the variant '1994' (Standardized Resian

1719 orthography) has several 'Prefix' fields in the registry ("sl-rozaj",

1720 "sl-rozaj-biske", "sl-rozaj-njiva", "sl-rozaj-osojs", and "sl-rozaj-

1721 solba"). This indicates not only that '1994' is appropriate to use

1722 with each of these five Resian variant subtags ('rozaj', 'biske',

1723 'njiva', 'osojs', and 'solba'), but also that it SHOULD appear

1724 following any of these variants in a tag. Thus, the language tag

1725 ought to take the form "sl-rozaj-biske-1994", rather than "sl-1994-

1726 rozaj-biske" or "sl-rozaj-1994-biske".
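A simplified check of a tag against one 'Prefix' value might look like the following Python sketch. It is not the full "Extended Filtering" algorithm of [RFC4647] (for example, it does not treat singletons specially); it only tests that the subtags of the prefix occur, in order, before the subtag in question. The function name `matches_prefix` is hypothetical.

```python
def matches_prefix(tag: str, subtag: str, prefix: str) -> bool:
    """Check that the subtags of `prefix` occur, in order, before `subtag`."""
    subtags = tag.lower().split("-")
    subtag = subtag.lower()
    if subtag not in subtags:
        return False
    before = subtags[: subtags.index(subtag)]
    wanted = prefix.lower().split("-")
    # The primary language subtag has to match in first position.
    if not before or before[0] != wanted[0]:
        return False
    remaining = iter(before[1:])
    # Each later prefix subtag must appear, in order, among the rest;
    # the shared iterator makes this a subsequence test.
    return all(any(s == w for s in remaining) for w in wanted[1:])
```

With this sketch, "sl-IT-rozaj" matches the prefix "sl" for 'rozaj', and "sl-rozaj-biske-1994" matches "sl-rozaj-biske" for '1994', while "sl-1994-rozaj-biske" does not.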

1727

1728 If a record includes no 'Prefix' field, a 'Prefix' field MUST NOT be

1729 added to the record at a later date. Otherwise, changes (additions,

1730 deletions, or modifications) to the set of 'Prefix' fields MAY be

1731 registered, as long as they strictly widen the range of language tags

1732 that are recommended. For example, a 'Prefix' with the value "be-

1733 Latn" (Belarusian, Latin script) could be replaced by the value "be"

1734 (Belarusian) but not by the value "ru-Latn" (Russian, Latin script)

1743 or the value "be-Latn-BY" (Belarusian, Latin script, Belarus), since

1744 these latter either change or narrow the range of suggested tags.
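One way to read the widening rule is that a replacement 'Prefix' has to be a proper leading portion of the value it replaces. The Python sketch below encodes that reading; the function name `widens` is hypothetical, and the check is an interpretation rather than a normative test.

```python
def widens(old_prefix: str, new_prefix: str) -> bool:
    """True if `new_prefix` is a proper leading portion of `old_prefix`."""
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    # Fewer subtags, and all of them identical to the start of the old value.
    return len(new) < len(old) and old[: len(new)] == new
```

Under this reading, "be" is an acceptable replacement for "be-Latn", whereas "ru-Latn" changes the range and "be-Latn-BY" narrows it.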

1745

1746 The field-body of the 'Prefix' field MUST NOT conflict with any

1747 'Prefix' already registered for a given record. Such a conflict

1748 would occur when no valid tag could be constructed that would contain

1749 the prefix, such as when two subtags each have a 'Prefix' that

1750 contains the other subtag. For example, suppose that the subtag

1751 'avariant' has the prefix "es-bvariant". Then the subtag 'bvariant'

1752 cannot be assigned the prefix 'avariant', for that would require a

1753 tag of the form "es-avariant-bvariant-avariant", which would not be

1754 valid.

1755

17563.1.9. Suppress-Script Field

1757

1758 The field 'Suppress-Script' contains a script subtag (whose record

1759 appears in the registry). The field 'Suppress-Script' MUST appear

1760 only in records whose 'Type' field-body is either 'language' or

1761 'extlang'. This field MUST NOT appear more than one time in a

1762 record.

1763

1764 This field indicates a script used to write the overwhelming majority

1765 of documents for the given language. The subtag for such a script

1766 therefore adds no distinguishing information to a language tag and

1767 thus SHOULD NOT be used for most documents in that language.

1768 Omitting the script subtag indicated by this field helps ensure

1769 greater compatibility between the language tags generated according

1770 to the rules in this document and language tags and tag processors or

1771 consumers based on RFC 3066. For example, virtually all Icelandic

1772 documents are written in the Latin script, making the subtag 'Latn'

1773 redundant in the tag "is-Latn".
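A tag generator could apply this field roughly as in the Python sketch below. The `SUPPRESS_SCRIPT` table is a hypothetical one-entry excerpt based on the Icelandic example, the function name `drop_suppressed_script` is an assumption, and the sketch assumes the script subtag directly follows the primary language subtag (that is, the tag contains no extended language subtag).

```python
# Hypothetical excerpt: language subtag -> 'Suppress-Script' value.
SUPPRESS_SCRIPT = {"is": "Latn"}


def drop_suppressed_script(tag: str) -> str:
    """Remove a script subtag that the language's record marks as suppressed."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag
    suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
    if suppressed is not None and subtags[1].lower() == suppressed.lower():
        del subtags[1]
    return "-".join(subtags)
```

Because the field says only that the script SHOULD NOT be used for most documents, a caller might still keep the script subtag where it carries meaning for a particular application.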

1774

1775 Many language subtag records do not have a 'Suppress-Script' field.

1776 The lack of a 'Suppress-Script' might indicate that the language is

1777 customarily written in more than one script or that the language is

1778 not customarily written at all. It might also mean that sufficient

1779 information was not available when the record was created and thus

1780 remains a candidate for future registration.

1781

17823.1.10. Macrolanguage Field

1783

1784 The field 'Macrolanguage' contains a primary language subtag (whose

1785 record appears in the registry). This field indicates a language

1786 that encompasses this subtag's language according to assignments made

1787 by ISO 639-3.

1788

1789 ISO 639-3 labels some languages in the registry as "macrolanguages".

1790 ISO 639-3 defines the term "macrolanguage" to mean "clusters of

1799 closely-related language varieties that [...] can be considered

1800 distinct individual languages, yet in certain usage contexts a single

1801 language identity for all is needed". These correspond to codes

1802 registered in ISO 639-2 as individual languages that were found to

1803 correspond to more than one language in ISO 639-3.

1804

1805 A language contained within a macrolanguage is called an "encompassed

1806 language". The record for each encompassed language contains a

1807 'Macrolanguage' field in the registry; the macrolanguages themselves

1808 are not specially marked. Note that some encompassed languages have

1809 ISO 639-1 or ISO 639-2 codes.

1810

1811 The 'Macrolanguage' field can only occur in records of type

1812 'language' or 'extlang'. Only values assigned by ISO 639-3 will be

1813 considered for inclusion. 'Macrolanguage' fields MAY be added or

1814 removed via the normal registration process whenever ISO 639-3

1815 defines new values or withdraws old values. Macrolanguages are

1816 informational, and MAY be removed or changed if ISO 639-3 changes the

1817 values. For more information on the use of this field and choosing

1818 between macrolanguage and encompassed language subtags, see

1819 Section 4.1.1.

1820

1821 For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn'

1822 (Norwegian Nynorsk) each have a 'Macrolanguage' field with a value of

1823 'no' (Norwegian). For more information, see Section 4.1.
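Since only the encompassed languages carry a 'Macrolanguage' field, finding the languages that belong to a macrolanguage requires an inverse lookup. The Python sketch below shows one way to do that; the `MACROLANGUAGE` table is a hypothetical excerpt containing just the Norwegian example, and the function name `encompassed_languages` is an assumption.

```python
# Hypothetical excerpt: encompassed language subtag -> 'Macrolanguage' value.
MACROLANGUAGE = {"nb": "no", "nn": "no"}


def encompassed_languages(macrolanguage: str) -> list[str]:
    """List the subtags whose 'Macrolanguage' field names `macrolanguage`."""
    return sorted(
        subtag
        for subtag, macro in MACROLANGUAGE.items()
        if macro == macrolanguage
    )
```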

1824

18253.1.11. Scope Field

1826

1827 The field 'Scope' contains classification information about a primary

1828 or extended language subtag derived from ISO 639. Most languages

1829 have a scope of 'individual', which means that the language is not a

1830 macrolanguage, collection, special code, or private use. That is, it

1831 is what one would normally consider to be 'a language'. Any primary

1832 or extended language subtag that has no 'Scope' field is an

1833 individual language.

1834

1835 'Scope' information can sometimes be helpful in selecting language

1836 tags, since it indicates the purpose or "scope" of the code

1837 assignment within ISO 639. The available values are:

1838

1839 o 'macrolanguage' - Indicates a macrolanguage as defined by ISO

1840 639-3 (see Section 3.1.10). A macrolanguage is a cluster of

1841 closely related languages that are sometimes considered to be a

1842 single language.

1843

1844 o 'collection' - Indicates a subtag that represents a collection of

1845 languages, typically related by some type of historical,

1846 geographical, or linguistic association. Unlike a macrolanguage,

1855 a collection can contain languages that are only loosely related

1856 and a collection cannot be used interchangeably with languages

1857 that belong to it.

1858

1859 o 'special' - Indicates a special language code. These are subtags

1860 used for identifying linguistic attributes not particularly

1861 associated with a concrete language. These include codes for when

1862 the language is undetermined or for non-linguistic content.

1863

1864 o 'private-use' - Indicates a code reserved for private use in the

1865 underlying standard. Subtags with this scope can be used to

1866 indicate a primary language for which no ISO 639 or registered

1867 assignment exists.

1868

1869 The 'Scope' field MAY appear in records of type 'language' or

1870 'extlang'. Note that many of the prefixes for extended language

1871 subtags will have a 'Scope' of 'macrolanguage' (although some will

1872 not) and that many languages that have a 'Scope' of 'macrolanguage'

1873 will have extended language subtags associated with them.

1874

1875 The 'Scope' field MAY be added, modified, or removed via the

1876 registration process, provided the change mirrors changes made by ISO

1877 639 to the assignment's classification. Such a change is expected to

1878 be rare.

1879

1880 For example, the primary language subtag 'zh' (Chinese) has a 'Scope'

1881 of 'macrolanguage', while its enclosed language 'nan' (Min Nan

1882 Chinese) has a 'Scope' of 'individual'. The special value 'und'

1883 (Undetermined) has a 'Scope' of 'special'. The ISO 639-5 collection

1884 'gem' (Germanic languages) has a 'Scope' of 'collection'.
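When reading records, the absence of a 'Scope' field can be treated as the value 'individual'. A minimal Python sketch, assuming records are dictionaries keyed by field name (the function name `scope_of` is hypothetical):

```python
def scope_of(record: dict) -> str:
    """Return the record's 'Scope', treating a missing field as 'individual'."""
    return record.get("Scope", "individual")
```

This applies only to primary and extended language subtags; other record types do not carry a 'Scope' field.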

1885

18863.1.12. Comments Field

1887

1888 The field 'Comments' contains additional information about the record

1889 and MAY appear more than once per record. The field-body MAY include

1890 the full range of Unicode characters and is not restricted to any

1891 particular script. This field MAY be inserted or changed via the

1892 registration process, and no guarantee of stability is provided.

1893

1894 The content of this field is not restricted, except by the need to

1895 register the information, the suitability of the request, and by

1896 reasonable practical size limitations. The primary reason for the

1897 'Comments' field is subtag identification -- to help distinguish the

1898 subtag from others with which it might be confused as an aid to

1899 usage. Large amounts of information about the use, history, or

1900 general background of a subtag are frowned upon, as these generally

1901 belong in a registration request rather than in the registry.

1910

19113.2. Language Subtag Reviewer

1912

1913 The Language Subtag Reviewer moderates the ietf-languages@iana.org

1914 mailing list, responds to requests for registration, and performs the

1915 other registry maintenance duties described in Section 3.3. Only the

1916 Language Subtag Reviewer is permitted to request IANA to change,

1917 update, or add records to the Language Subtag Registry. The Language

1918 Subtag Reviewer MAY delegate list moderation and other clerical

1919 duties as needed.

1920

1921 The Language Subtag Reviewer is appointed by the IESG for an

1922 indefinite term, subject to removal or replacement at the IESG's

1923 discretion. The IESG will solicit nominees for the position (upon

1924 adoption of this document or upon a vacancy) and then solicit

1925 feedback on the nominees' qualifications. Qualified candidates

1926 should be familiar with BCP 47 and its requirements; be willing to

1927 fairly, responsively, and judiciously administer the registration

1928 process; and be suitably informed about the issues of language

1929 identification so that the reviewer can assess the claims and draw

1930 upon the contributions of language experts and subtag requesters.

1931

1932 The subsequent performance or decisions of the Language Subtag

1933 Reviewer MAY be appealed to the IESG under the same rules as other

1934 IETF decisions (see [RFC2026]). The IESG can reverse or overturn the

1935 decisions of the Language Subtag Reviewer, provide guidance, or take

1936 other appropriate actions.

1937

19383.3. Maintenance of the Registry

1939

1940 Maintenance of the registry requires that, as codes are assigned or

1941 withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language

1942 Subtag Reviewer MUST evaluate each change and determine the

1943 appropriate course of action according to the rules in this document.

1944 Such updates follow the registration process described in

1945 Section 3.5. Usually, the Language Subtag Reviewer will start the

1946 process for the new or updated record by filling in the registration

1947 form and submitting it. If a change to one of these standards takes

1948 place and the Language Subtag Reviewer does not do this in a timely

1949 manner, then any interested party MAY submit the form. Thereafter,

1950 the registration process continues normally.

1951

1952 Note that some registrations affect other subtags--perhaps more than

1953 one--as when a region subtag is being deprecated in favor of a new

1954 value. The Language Subtag Reviewer is responsible for ensuring that

1955 any such changes are properly registered, with each change requiring

1956 its own registration form.

1966

1967 The Language Subtag Reviewer MUST ensure that new subtags meet the

1968 requirements elsewhere in this document (and most especially in

1969 Section 3.4) or submit an appropriate registration form for an

1970 alternate subtag as described in that section. Each individual

1971 subtag affected by a change MUST be sent to the

1972 ietf-languages@iana.org list with its own registration form and in a

1973 separate message.

1974

19753.4. Stability of IANA Registry Entries

1976

1977 The stability of entries and their meaning in the registry is

1978 critical to the long-term stability of language tags. The rules in

1979 this section guarantee that a specific language tag's meaning is

1980 stable over time and will not change.

1981

1982 These rules specifically deal with how changes to codes (including

1983 withdrawal and deprecation of codes) maintained by ISO 639, ISO

1984 15924, ISO 3166, and UN M.49 are reflected in the IANA Language

1985 Subtag Registry. Assignments to the IANA Language Subtag Registry

1986 MUST adhere to the following stability rules:

1987

1988 1. Values in the fields 'Type', 'Subtag', 'Tag', and 'Added' MUST

1989 NOT be changed and are guaranteed to be stable over time.

1990

1991 2. Values in the fields 'Preferred-Value' and 'Deprecated' MAY be

1992 added, altered, or removed via the registration process. These

1993 changes SHOULD be limited to changes necessary to mirror changes

1994 in one of the underlying standards (ISO 639, ISO 15924, ISO

1995 3166-1, or UN M.49) and typically alteration or removal of a

1996 'Preferred-Value' is limited specifically to region codes.

1997

1998 3. Values in the 'Description' field MUST NOT be changed in a way

1999 that would invalidate any existing tags. The description MAY be

2000 broadened somewhat in scope, changed to add information, or

2001 adapted to the most common modern usage. For example, countries

2002 occasionally change their names; a historical example of this is

2003 "Upper Volta" changing to "Burkina Faso".

2004

2005 4. Values in the field 'Prefix' MAY be added to existing records of

2006 type 'variant' via the registration process, provided the

2007 'variant' already has at least one 'Prefix'. A 'Prefix' field

2008 SHALL NOT be registered for any 'variant' that has no existing

2009 'Prefix' field. If a prefix is added to a variant record,

2010 'Comments' fields MAY be used to explain different usages with

2011 the various prefixes.

2022

2023 5. Values in the field 'Prefix' in records of type 'variant' MAY

2024 also be modified, so long as the modifications broaden the set

2025 of prefixes. That is, a prefix MAY be replaced by one of its

2026 own prefixes. For example, the prefix "en-US" could be replaced

2027 by "en", but not by the prefixes "en-Latn", "fr", or "en-US-

2028 boont". If one of those prefix values were needed, it would

2029 have to be separately registered.

2030

2031 6. Values in the field 'Prefix' in records of type 'extlang' MUST

2032 NOT be added, modified, or removed.

2033

2034 7. The field 'Prefix' MUST NOT be removed from any record in which

2035 it appears. This field SHOULD be included in the initial

2036 registration of any records of type 'variant' and MUST be

2037 included in any records of type 'extlang'.

2038

2039 8. The field 'Comments' MAY be added, changed, or removed

2040 via the registration process or any of the processes or

2041 considerations described in this section.

2042

2043 9. The field 'Suppress-Script' MAY be added or removed via the

2044 registration process.

2045

2046 10. The field 'Macrolanguage' MAY be added or removed via the

2047 registration process, but only in response to changes made by

2048 ISO 639. The 'Macrolanguage' field appears whenever a language

2049 has a corresponding macrolanguage in ISO 639. That is, the

2050 'Macrolanguage' fields in the registry exactly match those of

2051 ISO 639. No other macrolanguage mappings will be considered for

2052 registration.

2053

2054 11. The field 'Scope' MAY be added or removed from a primary or

2055 extended language subtag after initial registration, and it MAY

2056 be modified in order to match any changes made by ISO 639.

2057 Changes to the 'Scope' field MUST mirror changes made by ISO

2058 639. Note that primary or extended language subtags whose

2059 records do not contain a 'Scope' field (that is, most of them)

2060 are individual languages as described in Section 3.1.11.

2061

2062 12. Primary and extended language subtags (other than independently

2063 registered values created using the registration process) are

2064 created according to the assignments of the various parts of ISO

2065 639, as follows:

2066

2067 A. Codes assigned by ISO 639-1 that do not conflict with

2068 existing two-letter primary language subtags and that have

2069 no corresponding three-letter primary defined in the

2070 registry are entered into the IANA registry as new records

2079 of type 'language'. Note that languages given an ISO 639-1

2080 code cannot be given extended language subtags, even if

2081 encompassed by a macrolanguage.

2082

2083 B. Codes assigned by ISO 639-3 or ISO 639-5 that do not

2084 conflict with existing three-letter primary language subtags

2085 and that do not have ISO 639-1 codes assigned (or expected

2086 to be assigned) are entered into the IANA registry as new

2087 records of type 'language'. Note that these two standards

2088 now comprise a superset of ISO 639-2 codes. Codes that have

2089 a defined 'macrolanguage' mapping at the time of their

2090 registration MUST contain a 'Macrolanguage' field.

2091

2092 C. Codes assigned by ISO 639-3 MAY also be considered for an

2093 extended language subtag registration. Note that they MUST

2094 be assigned a primary language subtag record of type

2095 'language' even when an 'extlang' record is proposed. When

2096 considering extended language subtag assignment, these

2097 criteria apply:

2098

2099 1. If a language has a macrolanguage mapping, and that

2100 macrolanguage has other encompassed languages that are

2101 assigned extended language subtags, then the new

2102 language SHOULD have an 'extlang' record assigned to it

2103 as well. For example, any language with a macrolanguage

2104 of 'zh' or 'ar' would be assigned an 'extlang' record.

2105

2106 2. 'Extlang' records SHOULD NOT be created for languages if

2107 other languages encompassed by the macrolanguage do not

2108 also include 'extlang' records. For example, if a new

2109 Serbo-Croatian ('sh') language were registered, it would

2110 not get an extlang record because other languages

2111 encompassed, such as Serbian ('sr'), do not include one

2112 in the registry.

2113

2114 3. Sign languages SHOULD have an 'extlang' record with a

2115 'Prefix' of 'sgn'.

2116

2117 4. 'Extlang' records MUST NOT be created for items already

2118 in the registry. Extended language subtags will only be

2119 considered at the time of initial registration.

2120

2121 5. Extended language subtag records MUST include the fields

2122 'Prefix' and 'Preferred-Value' with field values

2123 assigned as described in Section 2.2.2.

2124

2125 D. Any other codes assigned by ISO 639-2 that do not conflict

2126 with existing three-letter primary or extended language

2135 subtags and that do not have ISO 639-1 two-letter codes

2136 assigned are entered into the IANA registry as new records

2137 of type 'language'. This type of registration is not

2138 supposed to occur in the future.

2139

2140 13. Codes assigned by ISO 15924 and ISO 3166-1 that do not conflict

2141 with existing subtags of the associated type and whose meaning

2142 is not the same as an existing subtag of the same type are

2143 entered into the IANA registry as new records.

2144

2145 14. Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that are

2146 withdrawn by their respective maintenance or registration

2147 authority remain valid in language tags. A 'Deprecated' field

2148 containing the date of withdrawal MUST be added to the record.

2149 If a new record of the same type is added that represents a

2150 replacement value, then a 'Preferred-Value' field MAY also be

2151 added. The registration process MAY be used to add comments

2152 about the withdrawal of the code by the respective standard.

2153

2154 For example: the region code 'TL' was assigned to the country

2155 'Timor-Leste', replacing the code 'TP' (which was assigned to

2156 'East Timor' when it was under administration by Portugal).

2157 The subtag 'TP' remains valid in language tags, but its

2158 record contains the 'Preferred-Value' of 'TL' and its field

2159 'Deprecated' contains the date the new code was assigned

2160 ('2004-07-06').

2161

2162 15. Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that

2163 conflict with existing subtags of the associated type, including

2164 subtags that are deprecated, MUST NOT be entered into the

2165 registry. The following additional considerations apply to

2166 subtag values that are reassigned:

2167

2168 A. For ISO 639 codes, if the newly assigned code's meaning is

2169 not represented by a subtag in the IANA registry, the

2170 Language Subtag Reviewer, as described in Section 3.5, SHALL

2171 prepare a proposal for entering in the IANA registry, as

2172 soon as practical, a registered language subtag as an

2173 alternate value for the new code. The form of the

2174 registered language subtag will be at the discretion of the

2175 Language Subtag Reviewer and MUST conform to other

2176 restrictions on language subtags in this document.

2177

2178 B. For all subtags whose meaning is derived from an external

2179 standard (that is, by ISO 639, ISO 15924, ISO 3166-1, or UN

2180 M.49), if a new meaning is assigned to an existing code and

2181 the new meaning broadens the meaning of that code, then the

2182 meaning for the associated subtag MAY be changed to match.

2191 The meaning of a subtag MUST NOT be narrowed, however, as

2192 this can result in an unknown proportion of the existing

2193 uses of a subtag becoming invalid. Note: the ISO 639

2194 registration authority (RA) has adopted a similar stability

2195 policy.

2196

2197 C. For ISO 15924 codes, if the newly assigned code's meaning is

2198 not represented by a subtag in the IANA registry, the

2199 Language Subtag Reviewer, as described in Section 3.5, SHALL

2200 prepare a proposal for entering in the IANA registry, as

2201 soon as practical, a registered variant subtag as an

2202 alternate value for the new code. The form of the

2203 registered variant subtag will be at the discretion of the

2204 Language Subtag Reviewer and MUST conform to other

2205 restrictions on variant subtags in this document.

2206

2207 D. For ISO 3166-1 codes, if the newly assigned code's meaning

2208 is associated with the same UN M.49 code as another 'region'

2209 subtag, then the existing region subtag remains as the

2210 preferred value for that region and no new entry is created.

2211 A comment MAY be added to the existing region subtag

2212 indicating the relationship to the new ISO 3166-1 code.

2213

2214 E. For ISO 3166-1 codes, if the newly assigned code's meaning

2215 is associated with a UN M.49 code that is not represented by

2216 an existing region subtag, then the Language Subtag

2217 Reviewer, as described in Section 3.5, SHALL prepare a

2218 proposal for entering the appropriate UN M.49 country code

2219 as an entry in the IANA registry.

2220

2221 F. For ISO 3166-1 codes, if there is no associated UN numeric

2222 code, then the Language Subtag Reviewer SHALL petition the

2223 UN to create one. If there is no response from the UN

2224 within 90 days of the request being sent, the Language

2225 Subtag Reviewer SHALL prepare a proposal for entering in the

2226 IANA registry, as soon as practical, a registered variant

2227 subtag as an alternate value for the new code. The form of

2228 the registered variant subtag will be at the discretion of

2229 the Language Subtag Reviewer and MUST conform to other

2230 restrictions on variant subtags in this document. This

2231 situation is very unlikely to ever occur.

2232

2233 16. UN M.49 has codes for both "countries and areas" (such as '276'

2234 for Germany) and "geographical regions and sub-regions" (such as

2235 '150' for Europe). UN M.49 country or area codes for which

2236 there is no corresponding ISO 3166-1 code MUST NOT be

2237 registered, except as a surrogate for an ISO 3166-1 code that is

2238 blocked from registration by an existing subtag.

2239

2247 If such a code becomes necessary, then the maintenance agency

2248 for ISO 3166-1 SHALL first be petitioned to assign a code to the

2249 region. If the petition for a code assignment by ISO 3166-1 is

2250 refused or not acted on in a timely manner, the registration

2251 process described in Section 3.5 can then be used to register

2252 the corresponding UN M.49 code. This way, UN M.49 codes remain

2253 available as the value of last resort in cases where ISO 3166-1

2254 reassigns a deprecated value in the registry.

2255

2256 17. The redundant and grandfathered entries together form the

2257 complete list of tags registered under [RFC3066]. The redundant

2258 tags are those previously registered tags that can now be formed

2259 using the subtags defined in the registry. The grandfathered

2260 entries include those that can never be legal because they are

2261 'irregular' (that is, they do not match the 'langtag' production

2262 in Figure 1), are limited by rule (subtags such as 'nyn' and

2263 'min' look like the extlang production, but cannot be registered

2264 as extended language subtags), or their subtags are

2265 inappropriate for registration. All of the grandfathered tags

2266 are listed in either the 'regular' or the 'irregular'

2267 productions in the ABNF. Under [RFC4646] it was possible for

2268 grandfathered tags to become redundant. However, all of the

2269 tags for which this was possible became redundant before this

2270 document was produced. So the set of redundant and

2271 grandfathered tags is now permanent and immutable: new entries

2272 of either type MUST NOT be added and existing entries MUST NOT

2273 be removed. The decision-making process about which tags were

2274 initially grandfathered and which were made redundant is

2275 described in [RFC4645].

2276

2277 Many of the grandfathered tags are deprecated -- indeed, they

2278 were deprecated even before [RFC4646]. For example, the tag

2279 "art-lojban" was deprecated in favor of the primary language

2280 subtag 'jbo'. These tags could have been made 'redundant' by

2281 registering some of their subtags as 'variants'. The 'variant-

2282 like' subtags in the grandfathered registrations SHALL NOT be

2283 registered in the future, even with a similar or identical

2284 meaning.
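The extended language subtag criteria in item 12.C can be read as a small decision procedure. The Python sketch below is illustrative only: the function name, its parameters, and the returned strings are assumptions made for this example, and the treatment of a language with no macrolanguage is a guess that the criteria themselves do not address.

```python
def extlang_recommendation(macrolanguage, siblings_have_extlang,
                           is_sign_language=False, already_registered=False):
    """Suggest whether an 'extlang' record is called for (hypothetical helper).

    macrolanguage: subtag of the encompassing macrolanguage, or None.
    siblings_have_extlang: True if other languages encompassed by that
        macrolanguage already have 'extlang' records in the registry.
    """
    # Criterion 4: extlang records are considered only at initial registration.
    if already_registered:
        return "MUST NOT"
    # Criterion 3: sign languages get an 'extlang' record with Prefix 'sgn'.
    if is_sign_language:
        return "SHOULD"
    # Assumption of this sketch: without a macrolanguage, criteria 1 and 2
    # simply do not apply.
    if macrolanguage is None:
        return "NOT APPLICABLE"
    # Criteria 1 and 2: follow the practice of the other encompassed languages.
    return "SHOULD" if siblings_have_extlang else "SHOULD NOT"
```

With the examples from the text, a new language encompassed by 'zh' would yield "SHOULD", while a new language encompassed by 'sh' would yield "SHOULD NOT".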

2285
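The 'TP' example in item 14 suggests how a processor might treat withdrawn codes: the subtag stays valid, and its 'Preferred-Value' field names the replacement. A minimal Python sketch follows. The dictionary is a hand-written excerpt modelled on that example rather than the actual registry file, and `preferred_region` is a hypothetical helper.

```python
# Hand-written excerpt modelled on the 'TP'/'TL' example in item 14;
# it is not a copy of the registry.
REGION_RECORDS = {
    "TL": {"Description": "Timor-Leste"},
    "TP": {
        "Description": "East Timor",
        "Deprecated": "2004-07-06",
        "Preferred-Value": "TL",
    },
}


def preferred_region(subtag):
    """Return (preferred subtag, was_deprecated) for a region subtag."""
    key = subtag.upper()
    record = REGION_RECORDS[key]  # raises KeyError if not in this excerpt
    deprecated = "Deprecated" in record
    # A deprecated record does not have to carry a 'Preferred-Value';
    # in that case the original subtag is returned unchanged.
    return record.get("Preferred-Value", key), deprecated
```

The deprecated subtag is never treated as an error here, which mirrors the rule that withdrawn codes remain valid in language tags.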

2286 3.5. Registration Procedure for Subtags

2287

2288 The procedure given here MUST be used by anyone who wants to use a

2289 subtag not currently in the IANA Language Subtag Registry or who

2290 wishes to add, modify, update, or remove information in existing

2291 records as permitted by this document.

2292

2293 Only subtags of type 'language' and 'variant' will be considered for

2294 independent registration of new subtags. Subtags needed for

2303 stability and subtags necessary to keep the registry synchronized

2304 with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits

2305 defined by this document also use this process, as described in

2306 Section 3.3 and subject to stability provisions as described in

2307 Section 3.4.

2308

2309 Registration requests are accepted relating to information in the

2310 'Comments', 'Deprecated', 'Description', 'Prefix', 'Preferred-Value',

2311 'Macrolanguage', or 'Suppress-Script' fields in a subtag's record as

2312 described in Section 3.4. Changes to all other fields in the IANA

2313 registry are NOT permitted.
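As an illustration, a tool that pre-screens modification requests could compare the old and new versions of a record against the field list above. This is a sketch under stated assumptions: records are modelled as plain dictionaries, and `disallowed_changes` is a hypothetical name. Passing this check does not mean a change is acceptable, since the stability rules in Section 3.4 still apply to each permitted field.

```python
# Fields that the preceding paragraph lists as open to registration requests.
MODIFIABLE_FIELDS = frozenset({
    "Comments", "Deprecated", "Description", "Prefix",
    "Preferred-Value", "Macrolanguage", "Suppress-Script",
})


def disallowed_changes(old_record, new_record):
    """Return the sorted names of changed fields that may not be modified."""
    changed = {
        name
        for name in old_record.keys() | new_record.keys()
        if old_record.get(name) != new_record.get(name)
    }
    return sorted(changed - MODIFIABLE_FIELDS)
```

An empty result means that only fields from the permitted list differ between the two versions.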

2314

2315 Registering a new subtag or requesting modifications to an existing

2316 tag or subtag starts with the requester filling out the registration

2317 form reproduced below. Note that each response is not limited in

2318 size so that the request can adequately describe the registration.

2319 The fields in the "Record Requested" section need to follow the

2320 requirements in Section 3.1 before the record will be approved.

2321

2322 LANGUAGE SUBTAG REGISTRATION FORM

2323 1. Name of requester:

2324 2. E-mail address of requester:

2325 3. Record Requested:

2326

2327 Type:

2328 Subtag:

2329 Description:

2330 Prefix:

2331 Preferred-Value:

2332 Deprecated:

2333 Suppress-Script:

2334 Macrolanguage:

2335 Comments:

2336

2337 4. Intended meaning of the subtag:

2338 5. Reference to published description

2339 of the language (book or article):

2340 6. Any other relevant information:

2341

2342 Figure 5: The Language Subtag Registration Form

2343

2344 Examples of completed registration forms can be found in Appendix B.

2345 A complete list of approved registration forms is online through

2346 http://www.iana.org; readers should note that the Language Tag

2347 Registry is now obsolete and should instead look for the Language

2348 Subtag Registry.
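Because the form in Figure 5 is plain text, it can be filled in from structured data. The following Python sketch shows one way to do so. The function name, the indentation used in the output, and any values passed to it are assumptions for illustration; the completed forms in Appendix B remain the reference for real submissions.

```python
# Field order follows the "Record Requested" part of Figure 5.
FORM_FIELDS = ["Type", "Subtag", "Description", "Prefix", "Preferred-Value",
               "Deprecated", "Suppress-Script", "Macrolanguage", "Comments"]


def render_form(name, email, record, meaning="", reference="", other=""):
    """Render the registration form; unset record fields are left blank."""
    lines = [
        "LANGUAGE SUBTAG REGISTRATION FORM",
        f"1. Name of requester: {name}",
        f"2. E-mail address of requester: {email}",
        "3. Record Requested:",
        "",
    ]
    for field in FORM_FIELDS:
        # Fields such as 'Description' or 'Prefix' can repeat, so a list
        # of values is accepted as well as a single string.
        values = record.get(field, "")
        if isinstance(values, str):
            values = [values]
        for value in values:
            lines.append(f"   {field}: {value}")
    lines += [
        "",
        f"4. Intended meaning of the subtag: {meaning}",
        "5. Reference to published description",
        f"   of the language (book or article): {reference}",
        f"6. Any other relevant information: {other}",
    ]
    # Strip the trailing space left behind by empty values.
    return "\n".join(line.rstrip() for line in lines)
```

A call such as `render_form("Example Requester", "requester@example.org", {"Type": "variant", "Subtag": "example"})` uses placeholder values only and does not describe a real request.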

2358

2359 The subtag registration form MUST be sent to

2360 <ietf-languages@iana.org>. Registration requests receive a two-week

2361 review period before being approved and submitted to IANA for

2362 inclusion in the registry. If modifications are made to the request

2363 during the course of the registration process (such as corrections to

2364 meet the requirements in Section 3.1 or to make the 'Description'

2365 fields unique for the given record type), the modified form MUST also

2366 be sent to <ietf-languages@iana.org> at least one week prior to

2367 submission to IANA.
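The two timing rules in this paragraph combine into a simple date calculation. The sketch below assumes calendar dates with no time-zone handling and ignores any extension of the review period by the Language Subtag Reviewer; `earliest_forwarding_date` is a hypothetical name.

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(weeks=2)      # review of the original request
FINAL_FORM_NOTICE = timedelta(weeks=1)  # notice required for a modified form


def earliest_forwarding_date(request_sent, modified_form_sent=None):
    """Earliest date on which the record could be submitted to IANA."""
    earliest = request_sent + REVIEW_PERIOD
    if modified_form_sent is not None:
        # A modified form has to be on the list at least one week beforehand.
        earliest = max(earliest, modified_form_sent + FINAL_FORM_NOTICE)
    return earliest
```

For a request sent on 1 September, a modification posted on 12 September moves the earliest date from 15 September to 19 September, while one posted on 3 September leaves it unchanged.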

2368

2369 The ietf-languages list is an open list and can be joined by sending

2370 a request to <ietf-languages-request@iana.org>. The list can be

2371 hosted by IANA or any third party at the request of IESG.

2372

2373 Before forwarding any registration to IANA, the Language Subtag

2374 Reviewer MUST ensure that all requirements in this document are met.

2375 This includes ensuring that values in the 'Subtag' field match case

2376 according to the description in Section 3.1.4 and that 'Description'

2377 fields are unique for the given record type as described in

2378 Section 3.1.5. The Reviewer MUST also ensure that an appropriate

2379 File-Date record is included in the request, to assist IANA when

2380 updating the registry (see Section 5.1).
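The case check on the 'Subtag' field could be automated along the following lines. The sketch assumes the convention of Section 3.1.4 as summarized here: script subtags in titlecase, region subtags in uppercase, and all other subtags in lowercase. Both function names are hypothetical.

```python
def registry_case(record_type, subtag):
    """Return the subtag in the case the registry is expected to use."""
    if record_type == "script":
        return subtag.capitalize()  # titlecase, for example 'Latn'
    if record_type == "region":
        return subtag.upper()       # uppercase, for example 'IT'
    return subtag.lower()           # every other record type is lowercase


def has_registry_case(record_type, subtag):
    """True if the 'Subtag' value is already in its expected case."""
    return subtag == registry_case(record_type, subtag)
```

Numeric region subtags such as '150' pass unchanged, since case conversion does not affect digits.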

2381

2382 Some fields in both the registration form as well as the registry

2383 record itself permit the use of non-ASCII characters. Registration

2384 requests SHOULD use the UTF-8 encoding for consistency and clarity.

2385 However, since some mail clients do not support this encoding, other

2386 encodings MAY be used for the registration request. The Language

2387 Subtag Reviewer is responsible for ensuring that the proper Unicode

2388 characters appear in both the archived request form and the registry

2389 record. In the case of a transcription or encoding error by IANA,

2390 the Language Subtag Reviewer will request that the registry be

2391 repaired, providing any necessary information to assist IANA.

2392

2393 Extended language subtags (type 'extlang'), by definition, are always

2394 encompassed by another language. All records of type 'extlang' MUST,

2395 therefore, contain a 'Prefix' field at the time of registration.

2396 This 'Prefix' field can never be altered or removed, and requests to

2397 do so MUST be rejected.

2398

2399 Variant subtags are usually registered for use with a particular

2400 range of language tags, and variant subtags based on the terminology

2401 of the language to which they apply are encouraged. For example,

2402 the subtag 'rozaj' (Resian) is intended for use with language tags

2403 that start with the primary language subtag "sl" (Slovenian), since

2404 Resian is a dialect of Slovenian. Thus, the subtag 'rozaj' would be

2405 appropriate in tags such as "sl-Latn-rozaj" or "sl-IT-rozaj". This

2406 information is stored in the 'Prefix' field in the registry. Variant

2415 registration requests SHOULD include at least one 'Prefix' field in

2416 the registration form.
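A tool could use the 'Prefix' field to check whether a variant appears in its intended context. The Python sketch below is a simplification: it only tests whether the tag begins with one of the prefixes on a subtag boundary, and `matches_prefix` is a hypothetical name. The 'Prefix' field is guidance on usage, so a failed match is better treated as a warning than as an error.

```python
def matches_prefix(tag, prefixes):
    """True if the tag begins with one of the variant's 'Prefix' values."""
    tag = tag.lower()  # language tags are not case sensitive
    for prefix in prefixes:
        prefix = prefix.lower()
        # Compare whole subtags only, so that 'sl' does not match 'slx-...'.
        if tag == prefix or tag.startswith(prefix + "-"):
            return True
    return False
```

With the prefix "sl" taken from the 'rozaj' example, both "sl-Latn-rozaj" and "sl-IT-rozaj" match, while "de-rozaj" does not.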

2417

2418 Requests to assign an additional record of a given type with an

2419 existing subtag value MUST be rejected. For example, the variant

2420 subtag 'rozaj' already exists in the registry, so adding a second

2421 record of type 'variant' with the subtag 'rozaj' is prohibited.

2422

2423 The 'Prefix' field for a given registered variant subtag exists in

2424 the IANA registry as a guide to usage. Additional 'Prefix' fields

2425 MAY be added by filing an additional registration form. In that

2426 form, the "Any other relevant information:" field MUST indicate that

2427 it is the addition of a prefix.

2428

2429 Requests to add a 'Prefix' field to a variant subtag that imply a

2430 different semantic meaning SHOULD be rejected. For example, a

2431 request to add the prefix "de" to the subtag '1994' so that the tag

2432 "de-1994" represented some German dialect or orthographic form would

2433 be rejected. The '1994' subtag represents a particular Slovenian

2434 orthography, and the additional registration would change or blur the

2435 semantic meaning assigned to the subtag. A separate subtag SHOULD be

2436 proposed instead.

2437

2438 Requests to add a 'Prefix' to a variant subtag that has no current

2439 'Prefix' field MUST be rejected. Variants are registered with no

2440 prefix because they are potentially useful with many or even all

2441 languages. Adding one or more 'Prefix' fields would be potentially

2442 harmful to the use of the variant, since it dramatically reduces the

2443 scope of the subtag (which is not allowed under the stability rules

2444 in Section 3.4), as opposed to broadening the scope of the subtag, which

2445 is what the addition of a 'Prefix' normally does. An example of such

2446 a "no-prefix" variant is the subtag 'fonipa', which represents the

2447 International Phonetic Alphabet, a scheme that can be used to

2448 transcribe many languages.
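The rules for adding a 'Prefix' field to an existing variant can be summarized in a short sketch. Whether a new prefix preserves the subtag's meaning is a human judgment made during review, so it is modelled here as a boolean input. The function name and the returned strings are assumptions for illustration.

```python
def review_prefix_addition(existing_prefixes, same_meaning):
    """Return the outcome suggested for a request to add a 'Prefix' field."""
    if not existing_prefixes:
        # The variant was registered without a prefix, so adding one
        # would narrow its scope.
        return "MUST reject"
    if not same_meaning:
        # For example, adding 'de' to '1994', which denotes a particular
        # Slovenian orthography; a separate subtag is the better route.
        return "SHOULD reject"
    return "MAY add"
```

A request against a "no-prefix" variant such as 'fonipa' falls into the first branch regardless of the proposed prefix.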

2449

2450 The 'Description' fields provided in the request MUST contain at

2451 least one description written or transcribed into the Latin script;

2452 the request MAY also include additional 'Description' fields in any

2453 script or language. The 'Description' field is used for

2454 identification purposes and doesn't necessarily represent the actual

2455 native name of the language or variation. It also doesn't have to be

2456 in any particular language, but SHOULD be both suitable and

2457 sufficient to identify the item in the record. The Language Subtag

2458 Reviewer will check and edit any proposed 'Description' fields so as

2459 to ensure uniqueness and prevent collisions with 'Description' fields

2460 in other records of the same type. If this occurs in an independent

2461 registration request, the Language Subtag Reviewer MUST resubmit the

2462 record to <ietf-languages@iana.org>, treating it as a modification of

2471 a request due to discussion, as described in Section 3.5, unless the

2472 request's sole purpose is to introduce a duplicate 'Description'

2473 field, in which case the request SHALL be rejected.

2474

2475 The 'Description' field is not guaranteed to be stable. Corrections

2476 or clarifications of intent are examples of possible changes.

2477 Attempts to provide translations or transcriptions of entries in the

2478 registry (which, by definition, provide no new information) are

2479 unlikely to be approved.

2480

2481 Soon after the two-week review period has passed, the Language Subtag

2482 Reviewer MUST take one of the following actions:

2483

2484 o Explicitly accept the request and forward the form containing the

2485 record to be inserted or modified to <iana@iana.org> according to

2486 the procedure described in Section 3.3.

2487

2488 o Explicitly reject the request because of significant objections

2489 raised on the list or due to problems with constraints in this

2490 document (which MUST be explicitly cited).

2491

2492 o Extend the review period by granting an additional two-week

2493 increment to permit further discussion. After each two-week

2494 increment, the Language Subtag Reviewer MUST indicate on the list

2495 whether the registration has been accepted, rejected, or extended.

2496

2497 Note that the Language Subtag Reviewer MAY raise objections on the

2498 list if he or she so desires. The important thing is that the

2499 objection MUST be made publicly.

2500

2501 Sometimes the request needs to be modified as a result of discussion

2502 during the review period or due to requirements in this document.

2503 The applicant, Language Subtag Reviewer, or others MAY submit a

2504 modified version of the completed registration form, which will be

2505 considered in lieu of the original request with the explicit approval

2506 of the applicant. Such changes do not restart the two-week

2507 discussion period, although an application containing the final

2508 record submitted to IANA MUST appear on the list at least one week

2509 prior to the Language Subtag Reviewer forwarding the record to IANA.

2510 The applicant MAY modify a rejected application with more appropriate

2511 or additional information and submit it again; this starts a new two-

2512 week comment period.

2513

2514 Registrations initiated due to the provisions of Section 3.3 or

2515 Section 3.4 SHALL NOT be rejected altogether (since they have to

2516 ultimately appear in the registry) and SHOULD be completed as quickly

2517 as possible. The review process allows list members to comment on

2518 the specific information in the form and the record it contains and

2527 thus help ensure that it is correct and consistent. The Language

2528 Subtag Reviewer MAY reject a specific version of the form, but MUST

2529 propose a suitable replacement, extending the review period as

2530 described above, until the form is in a format worthy of the

2531 reviewer's approval and meets with rough consensus of the list.

2532

2533 Decisions made by the Language Subtag Reviewer MAY be appealed to the

2534 IESG [RFC2028] under the same rules as other IETF decisions

2535 [RFC2026]. This includes a decision to extend the review period or

2536 the failure to announce a decision in a clear and timely manner.

2537

2538 The approved records appear in the Language Subtag Registry. The

2539 approved registration forms are available online from

2540 http://www.iana.org.

2541

2542 Updates or changes to existing records follow the same procedure as

2543 new registrations. The Language Subtag Reviewer decides whether

2544 there is consensus to update the registration following the two-week

2545 review period; normally, objections by the original registrant will

2546 carry extra weight in forming such a consensus.

2547

2548 Registrations are permanent and stable. Once registered, subtags

2549 will not be removed from the registry and will remain a valid way in

2550 which to specify a specific language or variant.

2551

2552 Note: The purpose of the "Reference to published description" section

2553 in the registration form is to aid in verifying whether a language is

2554 registered or to which language or language variation a particular

2555 subtag refers. In most cases, reference to an authoritative grammar

2556 or dictionary of that language will be useful; in cases where no such

2557 work exists, other well-known works describing that language or in

2558 that language MAY be appropriate. The Language Subtag Reviewer

2559 decides what constitutes "good enough" reference material. This

2560 requirement is not intended to exclude particular languages or

2561 dialects due to the size of the speaker population or lack of a

2562 standardized orthography. Minority languages will be considered

2563 equally on their own merits.

2564

2565 3.6. Possibilities for Registration

2566

2567 Possibilities for registration of subtags or information about

2568 subtags include:

2569

2570 o Primary language subtags for languages not listed in ISO 639 that

2571 are not variants of any listed or registered language MAY be

2572 registered. At the time this document was created, there were no

2573 examples of this form of subtag. Before attempting to register a

2574 language subtag, there MUST be an attempt to register the language

2583 with ISO 639. Subtags MUST NOT be registered for languages

2584 defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3;

2585 that are under consideration by the ISO 639 registration

2586 authorities; or that have never been attempted for registration

2587 with those authorities. If ISO 639 has previously rejected a

2588 language for registration, it is reasonable to assume that there

2589 must be additional, very compelling evidence of need before it

2590 will be registered as a primary language subtag in the IANA

2591 registry (to the extent that it is very unlikely that any subtags

2592 will be registered of this type).

2593

2594 o Dialect or other divisions or variations within a language, its

2595 orthography, writing system, regional or historical usage,

2596 transliteration or other transformation, or distinguishing

2597 variation MAY be registered as variant subtags. An example is the

2598 'rozaj' subtag (the Resian dialect of Slovenian).

2599

2600 o The addition or maintenance of fields (generally of an

2601 informational nature) in tag or subtag records as described in

2602 Section 3.1 is allowed. Such changes are subject to the stability

2603 provisions in Section 3.4. This includes 'Description',

2604 'Comments', 'Deprecated', and 'Preferred-Value' fields for

2605 obsolete or withdrawn codes, or the addition of 'Suppress-Script'

2606 or 'Macrolanguage' fields to primary language subtags, as well as

2607 other changes permitted by this document, such as the addition of

2608 an appropriate 'Prefix' field to a variant subtag.

2609

2610 o The addition of records and related field value changes necessary

2611 to reflect assignments made by ISO 639, ISO 15924, ISO 3166-1, and

2612 UN M.49 as described in Section 3.4 is allowed.

2613

2614 Subtags proposed for registration that would cause all or part of a

2615 grandfathered tag to become redundant but whose meaning conflicts

2616 with or alters the meaning of the grandfathered tag MUST be rejected.

2617

2618 This document leaves the decision on what subtags or changes to

2619 subtags are appropriate (or not) to the registration process

2620 described in Section 3.5.

2621

2622 Note: Four-character primary language subtags are reserved to allow

2623 for the possibility of alpha4 codes in some future addition to the

2624 ISO 639 family of standards.

2625

2626 ISO 639 defines a registration authority for additions to and changes

2627 in the list of languages in ISO 639. This agency is:

2638

2639 International Information Centre for Terminology (Infoterm)

2640 Aichholzgasse 6/12, AT-1120

2641 Wien, Austria

2642 Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72

2643

2644 ISO 639-2 defines a registration authority for additions to and

2645 changes in the list of languages in ISO 639-2. This agency is:

2646

2647 Library of Congress

2648 Network Development and MARC Standards Office

2649 Washington, DC 20540, USA

2650 Phone: +1 202 707 6237 Fax: +1 202 707 0115

2651 URL: http://www.loc.gov/standards/iso639-2

2652

2653 ISO 639-3 defines a registration authority for additions to and

2654 changes in the list of languages in ISO 639-3. This agency is:

2655

2656 SIL International

2657 ISO 639-3 Registrar

2658 7500 W. Camp Wisdom Rd.

2659 Dallas, TX 75236, USA

2660 Phone: +1 972 708 7400, ext. 2293

2661 Fax: +1 972 708 7546

2662 Email: iso639-3@sil.org

2663 URL: http://www.sil.org/iso639-3

2664

2665 ISO 639-5 defines a registration authority for additions to and

2666 changes in the list of languages in ISO 639-5. This agency is the

2667 same as for ISO 639-2 and is:

2668

2669 Library of Congress

2670 Network Development and MARC Standards Office

2671 Washington, DC 20540, USA

2672 Phone: +1 202 707 6237

2673 Fax: +1 202 707 0115

2674 URL: http://www.loc.gov/standards/iso639-5

2675

2676 The maintenance agency for ISO 3166-1 (country codes) is:

2677

2678 ISO 3166 Maintenance Agency

2679 c/o International Organization for Standardization

2680 Case postale 56

2681 CH-1211 Geneva 20, Switzerland

2682 Phone: +41 22 749 72 33 Fax: +41 22 749 73 49

2683 URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

2694

2695 The registration authority for ISO 15924 (script codes) is:

2696

2697 Unicode Consortium

2698 Box 391476

2699 Mountain View, CA 94039-1476, USA

2700 URL: http://www.unicode.org/iso15924

2701

2702 The Statistics Division of the United Nations Secretariat maintains

2703 the Standard Country or Area Codes for Statistical Use and can be

2704 reached at:

2705

2706 Statistical Services Branch

2707 Statistics Division

2708 United Nations, Room DC2-1620

2709 New York, NY 10017, USA

2710 Fax: +1-212-963-0623

2711 Email: statistics@un.org

2712 URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm

2713

2714 3.7. Extensions and the Extensions Registry

2715

2716 Extension subtags are those introduced by single-character subtags

2717 ("singletons") other than 'x'. They are reserved for the generation

2718 of identifiers that contain a language component and are compatible

2719 with applications that understand language tags.

2720

2721 The structure and form of extensions are defined by this document so

2722 that implementations can be created that are forward compatible with

2723 applications that might be created using singletons in the future.

2724 In addition, defining a mechanism for maintaining singletons will

2725 lend stability to this document by reducing the likely need for

2726 future revisions or updates.

2727

2728 Single-character subtags are assigned by IANA using the "IETF Review"

2729 policy defined by [RFC5226]. This policy requires the development of

2730 an RFC, which SHALL define the name, purpose, processes, and

2731 procedures for maintaining the subtags. The maintaining or

2732 registering authority, including name, contact email, discussion list

2733 email, and URL location of the registry, MUST be indicated clearly in

2734 the RFC. The RFC MUST specify or include each of the following:

2735

2736 o The specification MUST reference the specific version or revision

2737 of this document that governs its creation and MUST reference this

2738 section of this document.

2739

2740 o The specification and all subtags defined by the specification

2741 MUST follow the ABNF and other rules for the formation of tags and

2742 subtags as defined in this document. In particular, it MUST

2751 specify that case is not significant and that subtags MUST NOT

2752 exceed eight characters in length.

2753

2754 o The specification MUST specify a canonical representation.

2755

2756 o The specification of valid subtags MUST be available over the

2757 Internet and at no cost.

2758

2759 o The specification MUST be in the public domain or available via a

2760 royalty-free license acceptable to the IETF and specified in the

2761 RFC.

2762

2763 o The specification MUST be versioned, and each version of the

2764 specification MUST be numbered, dated, and stable.

2765

2766 o The specification MUST be stable. That is, extension subtags,

2767 once defined by a specification, MUST NOT be retracted or change

2768 in meaning in any substantial way.

2769

2770 o The specification MUST include, in a separate section, the

2771 registration form reproduced in this section (below) to be used in

2772 registering the extension upon publication as an RFC.

2773

2774 o IANA MUST be informed of changes to the contact information and

2775 URL for the specification.
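The syntactic constraints on extensions (a singleton other than 'x', subtags of at most eight characters, case not significant) can be checked mechanically. The sketch below assumes the ABNF shape of a singleton followed by one or more subtags of two to eight letters or digits. It tests well-formedness only; it does not check that the singleton has been assigned or that the subtags are valid under the governing RFC. The function name is hypothetical.

```python
import re

# One letter or digit other than 'x', then one or more hyphen-separated
# subtags of two to eight ASCII letters or digits. re.ASCII keeps the
# case-insensitive match from accepting non-ASCII look-alikes.
EXTENSION = re.compile(r"[0-9a-wy-z](?:-[0-9a-z]{2,8})+",
                       re.ASCII | re.IGNORECASE)


def is_wellformed_extension(text):
    """True if text has the shape of a singleton plus its subtags."""
    return EXTENSION.fullmatch(text) is not None
```

The sequence "a-bbb" is accepted, while "x-private" is rejected because 'x' introduces private use rather than an extension.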

2776

2777 IANA will maintain a registry of allocated single-character

2778 (singleton) subtags. This registry MUST use the record-jar format

2779 described by the ABNF in Section 3.1.1. Upon publication of an

2780 extension as an RFC, the maintaining authority defined in the RFC

2781 MUST forward this registration form to <iesg@ietf.org>, who MUST

2782 forward the request to <iana@iana.org>. The maintaining authority of

2783 the extension MUST maintain the accuracy of the record by sending an

2784 updated full copy of the record to <iana@iana.org> with the subject

2785 line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only

2786 the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY

2787 be modified in these updates.
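
As a rough illustration of the update rule above, the following Python sketch reports which fields differ between a registered record and a proposed update but are not among the four modifiable fields. The function name, the dictionary representation, and every sample value are assumptions made for this example; none of them is defined by this document.

```python
# Fields that an update to an extension record may change, per the
# paragraph above. All other fields are expected to stay identical.
MUTABLE_FIELDS = {"Comments", "Contact_Email", "Mailing_List", "URL"}


def illegal_changes(old: dict, new: dict) -> list:
    """Return the sorted names of changed fields that are not mutable."""
    changed = {
        name
        for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    }
    return sorted(changed - MUTABLE_FIELDS)


# Hypothetical record: every value here is made up for illustration.
current = {
    "Identifier": "r",
    "Description": "Example extension",
    "Added": "2004-06-28",
    "RFC": "0000",
    "Authority": "Example Authority",
    "Contact_Email": "old@example.org",
    "Mailing_List": "list@example.org",
    "URL": "http://example.org/registry",
}
update_ok = dict(current, Contact_Email="new@example.org")
update_bad = dict(current, Identifier="s")

print(illegal_changes(current, update_ok))   # []
print(illegal_changes(current, update_bad))  # ['Identifier']
```

A maintaining authority could run a check of this kind before sending the updated full copy of the record.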

2788

2789 Failure to maintain this record, maintain the corresponding registry,

2790 or meet other conditions imposed by this section of this document MAY

2791 be appealed to the IESG [RFC2028] under the same rules as other IETF

2792 decisions (see [RFC2026]) and MAY result in the authority to maintain

2793 the extension being withdrawn or reassigned by the IESG.

2806

2807 %%

2808 Identifier:

2809 Description:

2810 Comments:

2811 Added:

2812 RFC:

2813 Authority:

2814 Contact_Email:

2815 Mailing_List:

2816 URL:

2817 %%

2818

2819 Figure 6: Format of Records in the Language Tag Extensions Registry

2820

2821 'Identifier' contains the single-character subtag (singleton)

2822 assigned to the extension. The Internet-Draft submitted to define

2823 the extension SHOULD specify which letter or digit to use, although

2824 the IESG MAY change the assignment when approving the RFC.

2825

2826 'Description' contains the name and description of the extension.

2827

2828 'Comments' is an OPTIONAL field and MAY contain a broader description

2829 of the extension.

2830

2831 'Added' contains the date the extension's RFC was published in the

2832 "full-date" format specified in [RFC3339]. For example: 2004-06-28

2833 represents June 28, 2004, in the Gregorian calendar.

2834

2835 'RFC' contains the RFC number assigned to the extension.

2836

2837 'Authority' contains the name of the maintaining authority for the

2838 extension.

2839

2840 'Contact_Email' contains the email address used to contact the

2841 maintaining authority.

2842

2843 'Mailing_List' contains the URL or subscription email address of the

2844 mailing list used by the maintaining authority.

2845

2846 'URL' contains the URL of the registry for this extension.
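
The record layout in Figure 6 and the field descriptions above can be sketched in Python as follows. This is a simplified reader, not a full implementation of the record-jar ABNF: it assumes records are delimited by '%%' lines and that folded lines begin with whitespace. The names parse_records and problems, and the sample record itself, are hypothetical.

```python
import re
from datetime import date

# 'Comments' is OPTIONAL, so it is not listed here.
REQUIRED = ("Identifier", "Description", "Added", "RFC",
            "Authority", "Contact_Email", "Mailing_List", "URL")
FULL_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_records(text: str) -> list:
    """Split record-jar text on '%%' lines into dicts of field -> body."""
    records, current, last = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current, last = {}, None
        elif line[:1] in (" ", "\t") and last:
            # Folded continuation line: append to the previous field body.
            current[last] += " " + line.strip()
        elif ":" in line:
            name, _, body = line.partition(":")
            last = name.strip()
            current[last] = body.strip()
    if current:
        records.append(current)
    return records


def problems(record: dict) -> list:
    """List basic problems: missing fields, bad singleton, bad date."""
    found = ["missing " + name for name in REQUIRED if name not in record]
    ident = record.get("Identifier", "")
    if not (len(ident) == 1 and ident.isascii() and ident.isalnum()):
        found.append("Identifier is not a single letter or digit")
    added = record.get("Added", "")
    try:
        if not FULL_DATE.fullmatch(added):
            raise ValueError(added)
        date.fromisoformat(added)  # rejects impossible dates such as 02-30
    except ValueError:
        found.append("Added is not a full-date")
    return found


# Hypothetical record used only to exercise the sketch.
SAMPLE = """\
%%
Identifier: r
Description: Example extension
Comments: Hypothetical record used
  only for illustration
Added: 2004-06-28
RFC: 0000
Authority: Example Authority
Contact_Email: contact@example.org
Mailing_List: list@example.org
URL: http://example.org/registry
%%
"""
record = parse_records(SAMPLE)[0]
print(record["Comments"])  # Hypothetical record used only for illustration
print(problems(record))    # []
```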

2847

2848 The determination of whether an Internet-Draft meets the above

2849 conditions and the decision to grant or withhold such authority rests

2850 solely with the IESG and is subject to the normal review and appeals

2851 process associated with the RFC process.

2852

2853 Extension authors are strongly cautioned that many (including most

2854 well-formed) processors will be unaware of any special relationships

2863 or meaning inherent in the order of extension subtags. Extension

2864 authors SHOULD avoid subtag relationships or canonicalization

2865 mechanisms that interfere with matching or with length restrictions

2866 that sometimes exist in common protocols where the extension is used.

2867 In particular, applications MAY truncate the subtags in doing

2868 matching or in fitting into limited lengths, so it is RECOMMENDED

2869 that the most significant information be in the most significant

2870 (left-most) subtags and that the specification gracefully handle

2871 truncated subtags.
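
The kind of truncation an extension specification needs to tolerate might look like the Python sketch below. It is one plausible strategy, not a procedure mandated by this document: subtags are removed from the right until the tag fits, and a single-character subtag left at the end is removed as well, since it cannot end a tag. The function name, the length limits, and the extension singletons 'a' and 'b' are hypothetical, and the sketch assumes the tag does not begin with 'x' or 'i'.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Drop right-most subtags until the tag fits in max_len characters.

    The first subtag is never removed, so the result can still exceed
    max_len when the primary language subtag alone is too long.
    """
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_len:
        subtags.pop()
        # A single-character subtag (an extension singleton or 'x')
        # cannot end a tag, so remove it together with what followed it.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


print(truncate_tag("zh-Hant-TW", 8))      # zh-Hant
print(truncate_tag("en-a-aaa-b-ccc", 9))  # en-a-aaa
print(truncate_tag("en-a-aaa-b-ccc", 6))  # en
```

Because the left-most subtags survive longest, an extension that places its most significant information first degrades more gracefully under this kind of processing.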

2872

2873 When a language tag is to be used in a specific, known protocol, it

2874 is RECOMMENDED that the language tag not contain extensions not

2875 supported by that protocol. In addition, note that some protocols

2876 MAY impose upper limits on the length of the strings used to store or

2877 transport the language tag.

2878

28793.8. Update of the Language Subtag Registry

2880

2881 After the adoption of this document, the IANA Language Subtag

2882 Registry needed an update so that it would contain the complete set

2883 of subtags valid in a language tag. [RFC5645] describes the process

2884 used to create this update.

2885

2886 Registrations that are in process under the rules defined in

2887 [RFC4646] when this document is adopted MUST be completed under the

2888 rules contained in this document.

2889

28903.9. Applicability of the Subtag Registry

2891

2892 The Language Subtag Registry is the source of data elements used to

2893 construct language tags, following the rules described in this

2894 document. Language tags are designed for indicating linguistic

2895 attributes of various content, including not only text but also most

2896 media formats, such as video or audio. They also form the basis for

2897 language and locale negotiation in various protocols and APIs.

2898

2899 The registry is therefore applicable to many applications that need

2900 some form of language identification, with these limitations:

2901

2902 o It is not designed to be the sole data source in the creation of a

2903 language-selection user interface. For example, the registry does

2904 not contain translations for subtag descriptions or for tags

2905 composed from the subtags. Sources for localized data based on

2906 the registry are generally available, notably [CLDR]. Nor does

2907 the registry indicate which subtag combinations are particularly

2908 useful or relevant.

2918

2919 o It does not provide information indicating relationships between

2920 different languages, such as might be used in a user interface to

2921 select language tags hierarchically, regionally, or on some other

2922 organizational model.

2923

2924 o It does not supply information about potential overlap between

2925 different language tags, as the notion of what constitutes a

2926 language is not precise: several different language tags might be

2927 reasonable choices for the same given piece of content.

2928

2929 o It does not contain information about appropriate fallback choices

2930 when performing language negotiation. A good fallback language

2931 might be linguistically unrelated to the specified language. The

2932 fact that one language is often used as a fallback language for

2933 another is usually a result of outside factors, such as geography,

2934 history, or culture -- factors that might not apply in all cases.

2935 For example, most people who use Breton (a Celtic language used in

2936 the Northwest of France) would probably prefer to be served French

2937 (a Romance language) if Breton isn't available.

2938

29394. Formation and Processing of Language Tags

2940

2941 This section addresses how to use the information in the registry

2942 with the tag syntax to choose, form, and process language tags.

2943

29444.1. Choice of Language Tag

2945

2946 The guiding principle in forming language tags is to "tag content

2947 wisely." Sometimes there is a choice between several possible tags

2948 for the same content. The choice of which tag to use depends on the

2949 content and application in question, and some amount of judgment

2950 might be necessary when selecting a tag.

2951

2952 Interoperability is best served when the same language tag is used

2953 consistently to represent the same language. If an application has

2954 requirements that make the rules here inapplicable, then that

2955 application risks damaging interoperability. It is strongly

2956 RECOMMENDED that users not define their own rules for language tag

2957 choice.

2958

2959 Standards, protocols, and applications that reference this document

2960 normatively but apply rules different from the ones given in this

2961 section MUST specify how language tag selection varies from the

2962 guidelines given here.

2963

2964 To ensure consistent backward compatibility, this document contains

2965 several provisions to account for potential instability in the

2966 standards used to define the subtags that make up language tags.

2975 These provisions mean that no valid language tag can become invalid,

2976 nor will a language tag have a narrower scope in the future (it may

2977 have a broader scope). The most appropriate language tag for a given

2978 application or content item might evolve over time, but once applied,

2979 the tag itself cannot become invalid or have its meaning wholly

2980 change.

2981

2982 A subtag SHOULD only be used when it adds useful distinguishing

2983 information to the tag. Extraneous subtags interfere with the

2984 meaning, understanding, and processing of language tags. In

2985 particular, users and implementations SHOULD follow the 'Prefix' and

2986 'Suppress-Script' fields in the registry (defined in Section 3.1):

2987 these fields provide guidance on when specific additional subtags

2988 SHOULD be used or avoided in a language tag.

2989

2990 The choice of subtags used to form a language tag SHOULD follow these

2991 guidelines:

2992

2993 1. Use as precise a tag as possible, but no more specific than is

2994 justified. Avoid using subtags that are not important for

2995 distinguishing content in an application.

2996

2997 * For example, 'de' might suffice for tagging an email written

2998 in German, while "de-CH-1996" is probably unnecessarily

2999 precise for such a task.

3000

3001 * Note that some subtag sequences might not represent the

3002 language a casual user might expect. For example, the Swiss

3003 German (Schweizerdeutsch) language is represented by "gsw-CH"

3004 and not by "de-CH". This latter tag represents German ('de')

3005 as used in Switzerland ('CH'), also known as Swiss High German

3006 (Schweizer Hochdeutsch). Both are real languages, and

3007 distinguishing between them could be important to an

3008 application.

3009

3010 2. The script subtag SHOULD NOT be used to form language tags unless

3011 the script adds some distinguishing information to the tag.

3012 Script subtags were first formally defined in [RFC4646]. Their

3013 use can affect matching and subtag identification for

3014 implementations of [RFC1766] or [RFC3066] (which are obsoleted by

3015 this document), as these subtags appear between the primary

3016 language and region subtags. Some applications can benefit from

3017 the use of script subtags in language tags, as long as the use is

3018 consistent for a given context. Script subtags are never

3019 appropriate for unwritten content (such as audio recordings).

3020 The field 'Suppress-Script' in the primary or extended language

3021 record in the registry indicates script subtags that do not add

3022 distinguishing information for most applications; this field

3031 defines when users SHOULD NOT include a script subtag with a

3032 particular primary language subtag.

3033

3034 For example, if an implementation selects content using Basic

3035 Filtering [RFC4647] (originally described in Section 14.4 of

3036 [RFC2616]) and the user requested the language range "en-US",

3037 content labeled "en-Latn-US" will not match the request and thus

3038 not be selected. Therefore, it is important to know when script

3039 subtags will customarily be used and when they ought not be used.

3040

3041 For example:

3042

3043 * The subtag 'Latn' should not be used with the primary language

3044 'en' because nearly all English documents are written in the

3045 Latin script and it adds no distinguishing information.

3046 However, if a document were written in English mixing Latin

3047 script with another script such as Braille ('Brai'), then it

3048 might be appropriate to choose to indicate both scripts to aid

3049 in content selection, such as the application of a style

3050 sheet.

3051

3052 * When labeling content that is unwritten (such as a recording

3053 of human speech), the script subtag should not be used, even

3054 if the language is customarily written in several scripts.

3055 Thus, the subtitles to a movie might use the tag "uz-Arab"

3056 (Uzbek, Arabic script), but the audio track for the same

3057 language would be tagged simply "uz". (The tag "uz-Zxxx"

3058 could also be used where content is not written, as the subtag

3059 'Zxxx' represents the "Code for unwritten documents".)

3060

3061 3. If a tag or subtag has a 'Preferred-Value' field in its registry

3062 entry, then the value of that field SHOULD be used to form the

3063 language tag in preference to the tag or subtag in which the

3064 preferred value appears.

3065

3066 * For example, use 'jbo' for Lojban in preference to the

3067 grandfathered tag "art-lojban".

3068

3069 4. Use subtags or sequences of subtags for individual languages in

3070 preference to subtags for language collections. A "language

3071 collection" is a group of languages that are descended from a

3072 common ancestor, are spoken in the same geographical area, or are

3073 otherwise related. Certain language collections are assigned

3074 codes by [ISO639-5] (and some of these [ISO639-5] codes are also

3075 defined as collections in [ISO639-2]). These codes are included

3076 as primary language subtags in the registry. Subtags for a

3077 language collection in the registry have a 'Scope' field with a

3078 value of 'collection'. A subtag for a language collection is

3087 always preferred to less specific alternatives such as 'mul' and

3088 'und' (see below), and a subtag representing a language

3089 collection MAY be used when more specific language information is

3090 not available. However, most users and implementations do not

3091 know there is a relationship between the collection and its

3092 individual languages. In addition, the relationship between the

3093 individual languages in the collection is not well defined; in

3094 particular, the languages are usually not mutually intelligible.

3095 Since the subtags are different, a request for the collection

3096 will typically only produce items tagged with the collection's

3097 subtag, not items tagged with subtags for the individual

3098 languages contained in the collection.

3099

3100 * For example, collections are interpreted inclusively, so the

3101 subtag 'gem' (Germanic languages) could, but SHOULD NOT, be

3102 used with content that would be better tagged with "en"

3103 (English), "de" (German), or "gsw" (Swiss German, Alemannic).

3104 While 'gem' collects all of these (and other) languages, most

3105 implementations will not match 'gem' to the individual

3106 languages; thus, using the subtag will not produce the desired

3107 result.

3108

3109 5. [ISO639-2] has defined several codes included in the subtag

3110 registry that require additional care when choosing language

3111 tags. In most of these cases, where omitting the language tag is

3112 permitted, such omission is preferable to using these codes.

3113 Language tags SHOULD NOT incorporate these subtags as a prefix,

3114 unless the additional information conveys some value to the

3115 application.

3116

3117 * The 'mul' (Multiple) primary language subtag identifies

3118 content in multiple languages. This subtag SHOULD NOT be used

3119 when a list of languages or individual tags for each content

3120 element can be used instead. For example, the 'Content-

3121 Language' header [RFC3282] allows a list of languages to be

3122 used, not just a single language tag.

3123

3124 * The 'und' (Undetermined) primary language subtag identifies

3125 linguistic content whose language is not determined. This

3126 subtag SHOULD NOT be used unless a language tag is required

3127 and language information is not available or cannot be

3128 determined. Omitting the language tag (where permitted) is

3129 preferred. The 'und' subtag might be useful for protocols

3130 that require a language tag to be provided or where a primary

3131 language subtag is required (such as in "und-Latn"). The

3132 'und' subtag MAY also be useful when matching language tags in

3133 certain situations.

3142

3143 * The 'zxx' (Non-Linguistic, Not Applicable) primary language

3144 subtag identifies content for which a language classification

3145 is inappropriate or does not apply. Some examples might

3146 include instrumental or electronic music; sound recordings

3147 consisting of nonverbal sounds; audiovisual materials with no

3148 narration, dialog, printed titles, or subtitles; machine-

3149 readable data files consisting of machine languages or

3150 character codes; or programming source code.

3151

3152 * The 'mis' (Uncoded) primary language subtag identifies content

3153 whose language is known but that does not currently have a

3154 corresponding subtag. This subtag SHOULD NOT be used.

3155 Because the addition of other codes in the future can render

3156 its application invalid, it is inherently unstable and hence

3157 incompatible with the stability goals of BCP 47. It is always

3158 preferable to use other subtags: either 'und' or (with prior

3159 agreement) private use subtags.

3160

3161 6. Use variant subtags sparingly and in the correct order. Most

3162 variant subtags have one or more 'Prefix' fields in the registry

3163 that express the list of subtags with which they are appropriate.

3164 Variants SHOULD only be used with subtags that appear in one of

3165 these 'Prefix' fields. If a variant lists a second variant in

3166 one of its 'Prefix' fields, the first variant SHOULD appear

3167 directly after the second variant in any language tag where both

3168 occur. General purpose variants (those with no 'Prefix' fields

3169 at all) SHOULD appear after any other variant subtags. Order any

3170 remaining variants by placing the most significant subtag first.

3171 If none of the subtags is more significant or no relationship can

3172 be determined, alphabetize the subtags. Because variants are

3173 very specialized, using many of them together generally makes the

3174 tag so narrow as to override the additional precision gained.

3175 Putting the subtags into another order interferes with

3176 interoperability, as well as the overall interpretation of the

3177 tag.

3178

3179 For example:

3180

3181 * The tag "en-scotland-fonipa" (English, Scottish dialect, IPA

3182 phonetic transcription) is correctly ordered because

3183 'scotland' has a 'Prefix' of "en", while 'fonipa' has no

3184 'Prefix' field.

3185

3186 * The tag "sl-IT-rozaj-biske-1994" is correctly ordered: 'rozaj'

3187 lists "sl" as its sole 'Prefix'; 'biske' lists "sl-rozaj" as

3188 its sole 'Prefix'. The subtag '1994' has several prefixes,

3199 including "sl-rozaj". However, it follows both 'rozaj' and

3200 'biske' because one of its 'Prefix' fields is "sl-rozaj-

3201 biske".

3202

3203 7. The grandfathered tag "i-default" (Default Language) was

3204 originally registered according to [RFC1766] to meet the needs of

3205 [RFC2277]. It is not used to indicate a specific language, but

3206 rather to identify the condition or content used where the

3207 language preferences of the user cannot be established. It

3208 SHOULD NOT be used except as a means of labeling the default

3209 content for applications or protocols that require default

3210 language content to be labeled with that specific tag. It MAY

3211 also be used by an application or protocol to identify when the

3212 default language content is being returned.
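
The interaction between script subtags and Basic Filtering described in guideline 2 can be sketched in Python as follows. The basic_filter function follows the prefix rule of Basic Filtering in [RFC4647] for a single range and a single tag. The SUPPRESS_SCRIPT table is a tiny hand-copied excerpt standing in for real registry data, and drop_suppressed_script is a hypothetical helper that only handles a script subtag placed directly after the primary language subtag.

```python
# Hand-copied excerpt of 'Suppress-Script' values. A real implementation
# would load these from the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl"}


def basic_filter(language_range: str, tag: str) -> bool:
    """Basic Filtering: the range equals the tag or a '-'-bounded prefix."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def drop_suppressed_script(tag: str) -> str:
    """Remove a script subtag that the registry says adds no information."""
    subtags = tag.split("-")
    suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
    if (suppressed and len(subtags) > 1
            and subtags[1].lower() == suppressed.lower()):
        del subtags[1]
    return "-".join(subtags)


print(basic_filter("en-US", "en-Latn-US"))                          # False
print(basic_filter("en-US", drop_suppressed_script("en-Latn-US")))  # True
print(drop_suppressed_script("uz-Arab"))                            # uz-Arab
```

The first call reproduces the mismatch described in guideline 2; the second shows that the same content becomes selectable once the suppressed script subtag is omitted.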

3213

32144.1.1. Tagging Encompassed Languages

3215

3216 Some primary language records in the registry have a 'Macrolanguage'

3217 field (Section 3.1.10) that contains a mapping from each "encompassed

3218 language" to its macrolanguage. The 'Macrolanguage' mapping doesn't

3219 define what the relationship between the encompassed language and its

3220 macrolanguage is, nor does it define how languages encompassed by the

3221 same macrolanguage are related to each other. Two different

3222 languages encompassed by the same macrolanguage may differ from one

3223 another more than, say, French and Spanish do.

3224

3225 A few specific macrolanguages, such as Chinese ('zh') and Arabic

3226 ('ar'), are handled differently. See Section 4.1.2.

3227

3228 The more specific encompassed language subtag SHOULD be used to form

3229 the language tag, although either the macrolanguage's primary

3230 language subtag or the encompassed language's subtag MAY be used.

3231 This means, for example, tagging Plains Cree with 'crk' rather than

3232 'cr' (Cree), and so forth.

3233

3234 Each macrolanguage subtag's scope, by definition, includes all of its

3235 encompassed languages. Since the relationship between encompassed

3236 languages varies, users cannot assume that the macrolanguage subtag

3237 means any particular encompassed language, nor that any given pair of

3238 encompassed languages are mutually intelligible or otherwise

3239 interchangeable.

3240

3241 Applications MAY use macrolanguage information to improve matching or

3242 language negotiation. For example, the information that 'sr'

3243 (Serbian) and 'hr' (Croatian) share a macrolanguage expresses a

3244 closer relation between those languages than between, say, 'sr'

3245 (Serbian) and 'mk' (Macedonian). However, this relationship is not

3246 guaranteed nor is it exclusive. For example, Romanian ('ro') and

3255 Moldavian ('mo') do not share a macrolanguage, but are far more

3256 closely related to each other than Cantonese ('yue') and Wu ('wuu'),

3257 which do share a macrolanguage.
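
One way an application might use 'Macrolanguage' information during matching is sketched below in Python. The mapping is a small hand-copied excerpt of registry values, and the helper names are hypothetical. As the text above cautions, a shared macrolanguage is only a hint: the sketch reports that Cantonese and Wu share one while Romanian and Moldavian do not, even though the latter pair is far more closely related.

```python
# Hand-copied 'Macrolanguage' values for a few primary language subtags.
# A real implementation would read them from the registry.
MACROLANGUAGE = {
    "sr": "sh", "hr": "sh", "bs": "sh",
    "cmn": "zh", "yue": "zh", "wuu": "zh",
    "crk": "cr",
}


def share_macrolanguage(a: str, b: str) -> bool:
    """True if both subtags are encompassed by the same macrolanguage."""
    macro_a = MACROLANGUAGE.get(a.lower())
    macro_b = MACROLANGUAGE.get(b.lower())
    return macro_a is not None and macro_a == macro_b


def with_macrolanguage(subtag: str) -> list:
    """Return the subtag followed by its macrolanguage, if it has one."""
    macro = MACROLANGUAGE.get(subtag.lower())
    return [subtag, macro] if macro else [subtag]


print(share_macrolanguage("sr", "hr"))    # True
print(share_macrolanguage("yue", "wuu"))  # True
print(share_macrolanguage("ro", "mo"))    # False
print(with_macrolanguage("crk"))          # ['crk', 'cr']
```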

3258

32594.1.2. Using Extended Language Subtags

3260

3261 To accommodate language tag forms used prior to the adoption of this

3262 document, language tags provide a special compatibility mechanism:

3263 the extended language subtag. Selected languages have been provided

3264 with both primary and extended language subtags. These include

3265 macrolanguages, such as Malay ('ms') and Uzbek ('uz'), that have a

3266 specific dominant variety that is generally synonymous with the

3267 macrolanguage. Other languages, such as the Chinese ('zh') and

3268 Arabic ('ar') macrolanguages and the various sign languages ('sgn'),

3269 have traditionally used their primary language subtag, possibly

3270 coupled with various region subtags or as part of a registered

3271 grandfathered tag, to indicate the language.

3272

3273 With the adoption of this document, specific ISO 639-3 subtags became

3274 available to identify the languages contained within these diverse

3275 language families or groupings. This presents a choice of language

3276 tags where previously none existed:

3277

3278 o Each encompassed language's subtag SHOULD be used as the primary

3279 language subtag. For example, a document in Mandarin Chinese

3280 would be tagged "cmn" (the subtag for Mandarin Chinese) in

3281 preference to "zh" (Chinese).

3282

3283 o If compatibility is desired or needed, the encompassed subtag MAY

3284 be used as an extended language subtag. For example, a document

3285 in Mandarin Chinese could be tagged "zh-cmn" instead of either

3286 "cmn" or "zh".

3287

3288 o The macrolanguage or prefixing subtag MAY still be used to form

3289 the tag instead of the more specific encompassed language subtag.

3290 That is, tags such as "zh-HK" or "sgn-RU" are still valid.

3291

3292 Chinese ('zh') provides a useful illustration of this. In the past,

3293 various content has used tags beginning with the 'zh' subtag, with

3294 application-specific meaning being associated with region codes,

3295 private use sequences, or grandfathered registered values. This is

3296 because historically only the macrolanguage subtag 'zh' was available

3297 for forming language tags. However, the languages encompassed by the

3298 Chinese subtag 'zh' are, in the main, not mutually intelligible when

3299 spoken, and the written forms of these languages also show wide

3300 variation in form and usage.

3310

3311 To provide compatibility, Chinese languages encompassed by the 'zh'

3312 subtag are in the registry both as primary language subtags and as

3313 extended language subtags. For example, the ISO 639-3 code for

3314 Cantonese is 'yue'. Content in Cantonese might historically have

3315 used a tag such as "zh-HK" (since Cantonese is commonly spoken in

3316 Hong Kong), although that tag actually means any type of Chinese as

3317 used in Hong Kong. With the availability of ISO 639-3 codes in the

3318 registry, content in Cantonese can be directly tagged using the 'yue'

3319 subtag. The content can use it as a primary language subtag, as in

3320 the tag "yue-HK" (Cantonese, Hong Kong). Or it can use an extended

3321 language subtag with 'zh', as in the tag "zh-yue-Hant" (Chinese,

3322 Cantonese, Traditional script).

3323

3324 As noted above, applications can choose to use the macrolanguage

3325 subtag to form the tag instead of using the more specific encompassed

3326 language subtag. For example, an application with large quantities

3327 of data already using tags with the 'zh' (Chinese) subtag might

3328 continue to use this more general subtag even for new data, even

3329 though the content could be more precisely tagged with 'cmn'

3330 (Mandarin), 'yue' (Cantonese), 'wuu' (Wu), and so on. Similarly, an

3331 application already using tags that start with the 'ar' (Arabic)

3332 subtag might continue to use this more general subtag even for new

3333 data, which could be more precisely tagged with 'arb' (Standard

3334 Arabic).

3335

3336 In some cases, the encompassed languages had tags registered for them

3337 during the RFC 3066 era. Those grandfathered tags not already

3338 deprecated or rendered redundant were deprecated in the registry upon

3339 adoption of this document. As grandfathered values, they remain

3340 valid for use, and some content or applications might use them. As

3341 with other grandfathered tags, since implementations might not be

3342 able to associate the grandfathered tags with the encompassed

3343 language subtag equivalents that are recommended by this document,

3344 implementations are encouraged to canonicalize tags for comparison

3345 purposes. Some examples of this include the tags "zh-hakka" (Hakka)

3346 and "zh-guoyu" (Mandarin or Standard Chinese).

3347

3348 Sign languages share a mode of communication rather than a linguistic

3349 heritage. There are many sign languages that have developed

3350 independently, and the subtag 'sgn' indicates only the presence of a

3351 sign language. A number of sign languages also had grandfathered

3352 tags registered for them during the RFC 3066 era. For example, the

3353 grandfathered tag "sgn-US" was registered to represent 'American Sign

3354 Language' specifically, without reference to the United States. This

3355 is still valid, but deprecated: a document in American Sign Language

3356 can be labeled either "ase" or "sgn-ase" (the 'ase' subtag is for the

3357 language called 'American Sign Language').
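
The choices described in this section can be made comparable by mapping each form to a single representative. The Python sketch below is one assumed approach, not the canonicalization procedure of this document: it replaces a few deprecated whole tags by their 'Preferred-Value' and rewrites the extended language form as the primary language form. Both tables are small hand-copied registry excerpts, and the function name is hypothetical.

```python
# Hand-copied registry excerpts (incomplete, for illustration only).
PREFERRED_TAG = {        # deprecated whole tag -> 'Preferred-Value'
    "zh-hakka": "hak",
    "zh-guoyu": "cmn",
    "sgn-us": "ase",
}
EXTLANG_PREFIX = {       # extended language subtag -> its 'Prefix'
    "cmn": "zh", "yue": "zh", "wuu": "zh", "arb": "ar", "ase": "sgn",
}


def to_primary_form(tag: str) -> str:
    """Map deprecated tags and extlang forms to the primary language form."""
    lowered = tag.lower()
    if lowered in PREFERRED_TAG:
        return PREFERRED_TAG[lowered]
    subtags = tag.split("-")
    if (len(subtags) >= 2
            and EXTLANG_PREFIX.get(subtags[1].lower()) == subtags[0].lower()):
        # "zh-yue-Hant" -> "yue-Hant": the extlang replaces its prefix.
        return "-".join([subtags[1].lower()] + subtags[2:])
    return tag


print(to_primary_form("zh-cmn"))       # cmn
print(to_primary_form("zh-yue-Hant"))  # yue-Hant
print(to_primary_form("sgn-US"))       # ase
print(to_primary_form("zh-HK"))        # zh-HK
```

Tags that use only the macrolanguage subtag, such as "zh-HK", pass through unchanged, because nothing in the tag says which encompassed language is meant.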

3366

33674.2. Meaning of the Language Tag

3368

3369 The meaning of a language tag is related to the meaning of the

3370 subtags that it contains. Each subtag, in turn, implies a certain

3371 range of expectations one might have for related content, although it

3372 is not a guarantee. For example, the use of a script subtag such as

3373 'Arab' (Arabic script) does not mean that the content contains only

3374 Arabic characters. It does mean that the language involved is

3375 predominantly in the Arabic script. Thus, a language tag and its

3376 subtags can encompass a very wide range of variation and yet remain

3377 appropriate in each particular instance.

3378

3379 Validity of a tag is not the only factor determining its usefulness.

3380 While every valid tag has a meaning, it might not represent any real-

3381 world language usage. This is unavoidable in a system in which

3382 subtags can be combined freely. For example, tags such as

3383 "ar-Cyrl-CO" (Arabic, Cyrillic script, as used in Colombia) or "tlh-

3384 Kore-AQ-fonipa" (Klingon, Korean script, as used in Antarctica, IPA

3385 phonetic transcription) are both valid and unlikely to represent a

3386 useful combination of language attributes.
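
To illustrate how freely subtags combine, the Python sketch below splits a tag into its components purely by the shape of each subtag. The pattern is a simplification of the 'langtag' syntax: it covers a two- or three-letter language, extended language subtags, script, region, and variants, and it deliberately omits extensions, private use sequences, and grandfathered tags. It checks form only; deciding validity would additionally require looking each subtag up in the registry. The names LANGTAG and split_tag are hypothetical.

```python
import re

# Simplified shape of a language tag (assumption: no extensions,
# no private use subtags, no grandfathered tags).
LANGTAG = re.compile(
    r"""(?P<language>[a-z]{2,3})
        (?P<extlang>(?:-[a-z]{3}){0,3})
        (?:-(?P<script>[a-z]{4}))?
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
        (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)""",
    re.IGNORECASE | re.VERBOSE,
)


def split_tag(tag: str):
    """Return the tag's components as a dict, or None if the shape is wrong."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None
    return {
        "language": m["language"].lower(),
        "extlang": [s.lower() for s in m["extlang"].split("-") if s],
        "script": m["script"].title() if m["script"] else None,
        "region": m["region"].upper() if m["region"] else None,
        "variants": [s.lower() for s in m["variants"].split("-") if s],
    }


print(split_tag("ar-Cyrl-CO"))
print(split_tag("tlh-Kore-AQ-fonipa"))
print(split_tag("not a tag"))  # None
```

Both of the improbable tags from the text parse without complaint, which is the point: a syntactic check says nothing about whether the combination reflects real-world usage.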

3387

3388 The meaning of a given tag doesn't depend on the context in which it

3389 appears. The relationship between a tag's meaning and the

3390 information objects to which that tag is applied, however, can vary.

3391

3392 o For a single information object, the associated language tags

3393 might be interpreted as the set of languages that is necessary for

3394 a complete comprehension of the complete object. Example: Plain

3395 text documents.

3396

3397 o For an aggregation of information objects, the associated language

3398 tags could be taken as the set of languages used inside components

3399 of that aggregation. Examples: Document stores and libraries.

3400

3401 o For information objects whose purpose is to provide alternatives,

3402 the associated language tags could be regarded as a hint that the

3403 content is provided in several languages and that one has to

3404 inspect each of the alternatives in order to find its language or

3405 languages. In this case, the presence of multiple tags might not

3406 mean that one needs to be multilingual to get complete

3407 understanding of the document. Example: MIME multipart/

3408 alternative [RFC2046].

3409

3410 o For markup languages, such as HTML and XML, language information

3411 can be added to each part of the document identified by the markup

3412 structure (including the whole document itself). For example, one

3413 could write <span lang="fr">C'est la vie.</span> inside a German

3414 document; the German-speaking user could then access a French-

3423 German dictionary to find out what the marked section meant. If

3424 the user were listening to that document through a speech

3425 synthesis interface, this information could be used to signal the

3426 synthesizer to appropriately apply French text-to-speech

3427 pronunciation rules to that span of text, instead of applying the

3428 inappropriate German rules.

3429

3430 o For markup languages and document formats that allow the audience

3431 to be identified, a language tag could indicate the audience(s)

3432 appropriate for that document. For example, the same HTML

3433 document described in the preceding bullet might have an HTTP

3434 header "Content-Language: de" to indicate that the intended

3435 audience for the file is German (even though three words appear

3436 and are identified as being in French within it).

3437

3438 o For systems and APIs, language tags form the basis for most

3439 implementations of locale identifiers. For example, see Unicode's

3440 CLDR (Common Locale Data Repository) project (see UTS #35

3441 [UTS35]).

3442

3443 Language tags are related when they contain a similar sequence of

3444 subtags. For example, if a language tag B contains language tag A as

3445 a prefix, then B is typically "narrower" or "more specific" than A.

3446 Thus, "zh-Hant-TW" is more specific than "zh-Hant".

3447

3448 This relationship is not guaranteed in all cases: specifically,

3449 languages whose tags begin with the same sequence of subtags are NOT

3450 guaranteed to be mutually intelligible, although they might be. For

3451 example, the tag "az" shares a prefix with both "az-Latn"

3452 (Azerbaijani written using the Latin script) and "az-Cyrl"

3453 (Azerbaijani written using the Cyrillic script). A person fluent in

3454 one script might not be able to read the other, even though the

3455 linguistic content (e.g., what would be heard if both texts were read

3456 aloud) might be identical. Content tagged as "az" most probably is

3457 written in just one script and thus might not be intelligible to a

3458 reader familiar with the other script.

3459

3460 Similarly, not all subtags specify an actual distinction in language.

3461 For example, the tags "en-US" and "en-CA" mean, roughly, English with

3462 features generally thought to be characteristic of the United States

3463 and Canada, respectively. They do not imply that a significant

3464 dialectal boundary exists between any arbitrarily selected point in

3465 the United States and any arbitrarily selected point in Canada.

3466 Neither does a particular region subtag imply that linguistic

3467 distinctions do not exist within that region.

3478

34794.3. Lists of Languages

3480

3481 In some applications, a single content item might best be associated

3482 with more than one language tag. Examples of such a usage include:

3483

3484 o Content items that contain multiple, distinct varieties. Often

3485 this is used to indicate an appropriate audience for a given

3486 content item when multiple choices might be appropriate. Examples

3487 of this could include:

3488

3489 * Metadata about the appropriate audience for a movie title. For

3490 example, a DVD might label its individual audio tracks 'de'

3491 (German), 'fr' (French), and 'es' (Spanish), but the overall

3492 title would list "de, fr, es" as its overall audience.

3493

3494 * A French/English, English/French dictionary tagged as both "en"

3495 and "fr" to specify that it applies equally to French and

3496 English.

3497

3498 * A side-by-side or interlinear translation of a document, as is

3499 commonly done with classical works in Latin or Greek.

3500

3501 o Content items that contain a single language but that require

3502 multiple levels of specificity. For example, a library might wish

3503 to classify a particular work as both Norwegian ('no') and as

3504 Nynorsk ('nn') for audiences capable of appreciating the

3505 distinction or needing to select content more narrowly.

3506

35074.4. Length Considerations

3508

3509 There is no defined upper limit on the size of language tags. While

3510 historically most language tags have consisted of language and region

3511 subtags with a combined total length of up to six characters, larger

3512 tags have always been possible and have actually appeared in

3513 use.

3514

3515 Neither the language tag syntax nor other requirements in this

3516 document impose a fixed upper limit on the number of subtags in a

3517 language tag (and thus an upper bound on the size of a tag). The

3518 language tag syntax suggests that, depending on the specific

3519 language, more subtags (and thus a longer tag) are sometimes

3520 necessary to completely identify the language for certain

3521 applications; thus, it is possible to envision long or complex subtag

3522 sequences.

3534

35354.4.1. Working with Limited Buffer Sizes

3536

3537 Some applications and protocols are forced to allocate fixed buffer

3538 sizes or otherwise limit the length of a language tag. A conformant

3539 implementation or specification MAY refuse to support the storage of

3540 language tags that exceed a specified length. Any such limitation

3541 SHOULD be clearly documented, and such documentation SHOULD include

3542 what happens to longer tags (for example, whether an error value is

3543 generated or the language tag is truncated). A protocol that allows

3544 tags to be truncated at an arbitrary limit, without giving any

3545 indication of what that limit is, has the potential to cause harm by

3546 changing the meaning of tags in substantial ways.

3547

3548 In practice, most language tags do not require more than a few

3549 subtags and will not approach reasonably sized buffer limitations;

3550 see Section 4.1.

3551

3552 Some specifications or protocols have limits on tag length but do not

3553 have a fixed length limitation. For example, [RFC2231] has no

3554 explicit length limitation: the length available for the language tag

3555 is constrained by the length of other header components (such as the

3556 charset's name) coupled with the 76-character limit in [RFC2047].

3557 Thus, the "limit" might be 50 or more characters, but it could

3558 potentially be quite small.

3559

3560 The considerations for assigning a buffer limit are:

3561

3562 Implementations SHOULD NOT truncate language tags unless the

3563 meaning of the tag is purposefully being changed, or unless the

3564 tag does not fit into a limited buffer size specified by a

3565 protocol for storage or transmission.

3566

3567 Implementations SHOULD warn the user when a tag is truncated since

3568 truncation changes the semantic meaning of the tag.

3569

3570 Implementations of protocols or specifications that are space

3571 constrained but do not have a fixed limit SHOULD use the longest

3572 possible tag in preference to truncation.

3573

3574 Protocols or specifications that specify limited buffer sizes for

3575 language tags MUST allow for language tags of at least 35

3576 characters. Note that [RFC4646] recommended a minimum field size

3577 of 42 characters because it included all three elements of the

3578 'extlang' production. Two of these are now permanently reserved,

3579 so a registered primary language subtag of the maximum length of 8

3580 characters is now longer than the longest language-extlang

3581 combination. Protocols or specifications that commonly use

3591 extensions or private use subtags might wish to reserve or

3592 recommend a longer "minimum buffer" size.
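One way a size-limited implementation might apply these considerations is sketched below. It is an illustration only: the names `MIN_TAG_BUFFER`, `TagTooLongError`, and `store_tag` are assumptions made for this example. The sketch documents its limit, refuses limits smaller than 35 characters, and reports an error for an over-long tag rather than truncating it silently.

```python
MIN_TAG_BUFFER = 35  # smallest limit a size-limited protocol may specify


class TagTooLongError(ValueError):
    """Raised when a tag does not fit the documented buffer limit."""


def store_tag(tag: str, limit: int = MIN_TAG_BUFFER) -> str:
    """Accept `tag` for storage, or raise instead of silently truncating it.

    Hypothetical helper: `limit` is the documented maximum tag length.
    """
    if limit < MIN_TAG_BUFFER:
        raise ValueError(
            f"a limit of {limit} is below the {MIN_TAG_BUFFER}-character minimum"
        )
    if len(tag) > limit:
        raise TagTooLongError(
            f"{tag!r} is {len(tag)} characters; the documented limit is {limit}"
        )
    return tag


print(store_tag("zh-Hant-TW"))
```

Reporting an error is only one of the documented behaviors permitted here; an implementation that truncates instead would follow the procedure in Section 4.4.2.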

3593

3594 The following illustration shows how the 35-character recommendation

3595 was derived:

3596

3597 language = 8 ; longest allowed registered value

3598 ; longer than primary+extlang

3599 ; which requires 7 characters

3600 script = 5 ; if not suppressed: see Section 4.1

3601 region = 4 ; UN M.49 numeric region code

3602 ; ISO 3166-1 codes require 3

3603 variant1 = 9 ; needs 'language' as a prefix

3604 variant2 = 9 ; very rare, as it needs

3605 ; 'language-variant1' as a prefix

3606

3607 total = 35 characters

3608

3609 Figure 7: Derivation of the Limit on Tag Length
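The arithmetic in Figure 7 can be checked with the short sketch below. It assumes, as the figure does, that every component after the first is counted together with the "-" that precedes it: a four-letter script, a three-digit UN M.49 region code, and two variants of the maximum length of eight characters.

```python
# Every component after 'language' includes its preceding "-" separator.
components = {
    "language": 8,      # longest allowed registered primary language subtag
    "script": 1 + 4,    # "-" plus a four-letter script subtag
    "region": 1 + 3,    # "-" plus a three-digit UN M.49 code
    "variant1": 1 + 8,  # "-" plus a variant of maximum length
    "variant2": 1 + 8,  # "-" plus a second variant of maximum length
}

total = sum(components.values())
print(total)  # 35
```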

3610

36114.4.2. Truncation of Language Tags

3612

3613 Truncation of a language tag alters the meaning of the tag, and thus

3614 SHOULD be avoided. However, truncation of language tags is sometimes

3615 necessary due to limited buffer sizes. Such truncation MUST NOT

3616 permit a subtag to be chopped off in the middle or the formation of

3617 invalid tags (for example, one ending with the "-" character).

3618

3619 This means that applications or protocols that truncate tags MUST do

3620 so by progressively removing subtags along with their preceding "-"

3621 from the right side of the language tag until the tag is short enough

3622 for the given buffer. If the resulting tag ends with a single-

3623 character subtag, that subtag and its preceding "-" MUST also be

3624 removed. For example:

3625

3626 Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1

3627 1. zh-Latn-CN-variant1-a-extend1-x-wadegile

3628 2. zh-Latn-CN-variant1-a-extend1

3629 3. zh-Latn-CN-variant1

3630 4. zh-Latn-CN

3631 5. zh-Latn

3632 6. zh

3633

3634 Figure 8: Example of Tag Truncation
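The truncation procedure can be sketched as follows. This is a non-normative illustration: the function name `truncate_tag` and the choice to raise an error when nothing fits are assumptions made for this example. The sketch removes whole subtags from the right and then removes any single-character subtag left at the end, so it never cuts a subtag in the middle and never yields a tag ending in "-".

```python
def truncate_tag(tag: str, limit: int) -> str:
    """Shorten `tag` to at most `limit` characters by removing whole subtags.

    Hypothetical helper; raises ValueError if no usable prefix fits.
    """
    subtags = tag.split("-")
    while len("-".join(subtags)) > limit:
        subtags.pop()  # drop the rightmost subtag and its preceding "-"
        # A trailing single-character subtag (singleton) must also be removed.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
        if not subtags:
            raise ValueError(f"no prefix of {tag!r} fits in {limit} characters")
    return "-".join(subtags)


original = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
for limit in (40, 39, 28, 10, 7, 2):
    print(limit, truncate_tag(original, limit))
```

With the limits shown, the loop reproduces steps 1 through 6 of Figure 8 in order.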

3646

36474.5. Canonicalization of Language Tags

3648

3649 Since a particular language tag can be used by many processes,

3650 language tags SHOULD always be created or generated in canonical

3651 form.

3652

3653 A language tag is in 'canonical form' when the tag is well-formed

3654 according to the rules in Sections 2.1 and 2.2 and it has been

3655 canonicalized by applying each of the following steps in order, using

3656 data from the IANA registry (see Section 3.1):

3657

3658 1. Extension sequences are ordered into case-insensitive ASCII order

3659 by singleton subtag.

3660

3661 * For example, the subtag sequence '-a-babble' comes before

3662 '-b-warble'.

3663

3664 2. Redundant or grandfathered tags are replaced by their 'Preferred-

3665 Value', if there is one.

3666

3667 * The field-body of the 'Preferred-Value' for grandfathered and

3668 redundant tags is an "extended language range" [RFC4647] and

3669 might consist of more than one subtag.

3670

3671 * 'Preferred-Value' fields in the registry provide mappings from

3672 deprecated tags to modern equivalents. Many of these were

3673 created before the adoption of this document (such as the

3674 mapping of "no-nyn" to "nn" or "i-klingon" to "tlh"). Others

3675 are the result of later registrations or additions to the

3676 registry as permitted or required by this document (for

3677 example, "zh-hakka" was deprecated in favor of the ISO 639-3

3678 code 'hak' when this document was adopted).

3679

3680 3. Subtags are replaced by their 'Preferred-Value', if there is one.

3681 For extlangs, the original primary language subtag is also

3682 replaced if there is a primary language subtag in the 'Preferred-

3683 Value'.

3684

3685 * The field-body of the 'Preferred-Value' for extlangs is an

3686 "extended language range" and typically maps to a primary

3687 language subtag. For example, the subtag sequence "zh-hak"

3688 (Chinese, Hakka) is replaced with the subtag 'hak' (Hakka).

3689

3690 * Most of the non-extlang subtags are either Region subtags

3691 where the country name or designation has changed or clerical

3692 corrections to ISO 639-1.
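The three steps above can be sketched as follows. This is a simplified, non-normative illustration. The three lookup tables are a hypothetical excerpt limited to mappings mentioned in this section; a real implementation would load all 'Preferred-Value' data from the IANA registry. The helper names `split_tag` and `canonicalize` are assumptions, the input is assumed to be well-formed, and Step 3 is shown only for extlang and region subtags.

```python
# Hypothetical excerpt of registry data, for illustration only.
TAG_PREFERRED = {  # grandfathered or redundant tag -> Preferred-Value
    "i-klingon": "tlh",
    "no-nyn": "nn",
    "zh-hakka": "hak",
}
EXTLANG_PREFERRED = {"hak": ("zh", "hak")}  # extlang -> (Prefix, Preferred-Value)
REGION_PREFERRED = {"bu": "MM"}             # region -> Preferred-Value


def split_tag(tag):
    """Split a well-formed tag into (main subtags, extensions, private use)."""
    subtags = tag.split("-")
    main, extensions, private = [], [], []
    i = 0
    # Everything before the first singleton belongs to the main part.
    while i < len(subtags) and (i == 0 or len(subtags[i]) > 1):
        main.append(subtags[i])
        i += 1
    while i < len(subtags):
        if subtags[i].lower() == "x":  # private use runs to the end of the tag
            private = subtags[i:]
            break
        sequence = [subtags[i]]
        i += 1
        while i < len(subtags) and len(subtags[i]) > 1:
            sequence.append(subtags[i])
            i += 1
        extensions.append(sequence)
    return main, extensions, private


def canonicalize(tag):
    main, extensions, private = split_tag(tag)

    # Step 1: order extension sequences by singleton, ignoring case.
    extensions.sort(key=lambda sequence: sequence[0].lower())

    # Step 2: replace a whole grandfathered or redundant tag.
    if tag.lower() in TAG_PREFERRED:
        return TAG_PREFERRED[tag.lower()]

    # Step 3a: an extlang replaces itself and its primary language subtag.
    if len(main) > 1:
        mapping = EXTLANG_PREFERRED.get(main[1].lower())
        if mapping and mapping[0] == main[0].lower():
            main[:2] = [mapping[1]]

    # Step 3b: replace region subtags (two letters or three digits).
    for i in range(1, len(main)):
        subtag = main[i]
        if (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        ):
            main[i] = REGION_PREFERRED.get(subtag.lower(), subtag)

    flat_extensions = [s for sequence in extensions for s in sequence]
    return "-".join(main + flat_extensions + private)


print(canonicalize("zh-hak-CN"))
print(canonicalize("en-b-ccc-bbb-a-aaa-X-xyz"))
```

The sketch leaves the case of untouched subtags as it found it, since canonical form does not depend on case.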

3702

3703 The canonical form contains no 'extlang' subtags. There is an

3704 alternate 'extlang form' that maintains or reinstates extlang

3705 subtags. This form can be useful in environments where the presence

3706 of the 'Prefix' subtag is considered beneficial in matching or

3707 selection (see Section 4.1.2).

3708

3709 A language tag is in 'extlang form' when the tag is well-formed

3710 according to the rules in Sections 2.1 and 2.2 and it has been

3711 processed by applying each of the following two steps in order, using

3712 data from the IANA registry:

3713

3714 1. The language tag is first transformed into canonical form, as

3715 described above.

3716

3717 2. If the language tag starts with a primary language subtag that is

3718 also an extlang subtag, then the language tag is prepended with

3719 the extlang's 'Prefix'.

3720

3721 * For example, "hak-CN" (Hakka, China) has the primary language

3722 subtag 'hak', which in turn has an 'extlang' record with a

3723 'Prefix' 'zh' (Chinese). The extlang form is "zh-hak-CN"

3724 (Chinese, Hakka, China).

3725

3726 * Note that Step 2 (prepending a prefix) can restore a subtag

3727 that was removed by Step 1 (canonicalizing).
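Step 2 of the extlang form can be sketched as shown below. The sketch assumes that its input has already been put into canonical form, and it uses a hypothetical one-entry excerpt of the registry's extlang records; the names `EXTLANG_PREFIX` and `to_extlang_form` are assumptions made for this example.

```python
# Hypothetical excerpt: extlang subtag -> 'Prefix' from its registry record.
EXTLANG_PREFIX = {"hak": "zh"}


def to_extlang_form(canonical_tag: str) -> str:
    """Prepend the extlang 'Prefix' if the primary language is also an extlang.

    Assumes `canonical_tag` is already in canonical form.
    """
    primary = canonical_tag.split("-")[0].lower()
    prefix = EXTLANG_PREFIX.get(primary)
    if prefix is None:
        return canonical_tag
    return f"{prefix}-{canonical_tag}"


print(to_extlang_form("hak-CN"))  # zh-hak-CN
print(to_extlang_form("en-US"))   # en-US
```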

3728

3729 Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in canonical

3730 form, while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and potentially

3731 valid (extensions 'a' and 'b' are not defined as of the publication

3732 of this document) but not in canonical form (the extensions are not

3733 in alphabetical order).

3734

3735 Example: Although the tag "en-BU" (English as used in Burma)

3736 maintains its validity, the language tag "en-BU" is not in canonical

3737 form because the 'BU' subtag has a canonical mapping to 'MM'

3738 (Myanmar).

3739

3740 Canonicalization of language tags does not imply anything about the

3741 use of upper- or lowercase letters when processing or comparing

3742 subtags (as described in Section 2.1). All comparisons MUST be

3743 performed in a case-insensitive manner.

3744

3745 When performing canonicalization of language tags, processors MAY

3746 regularize the case of the subtags (that is, this process is

3747 OPTIONAL), following the case used in the registry (see

3748 Section 2.1.1).
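Both points can be illustrated with a short sketch. The function names are assumptions made for this example. The case regularization assumes the formatting convention of Section 2.1.1: every subtag is lowercase, except that two-letter subtags are uppercase and four-letter subtags are titlecase when they are neither at the start of the tag nor after a singleton.

```python
def tags_equal(a: str, b: str) -> bool:
    """Compare two language tags without regard to case."""
    return a.lower() == b.lower()


def regularize_case(tag: str) -> str:
    """Apply the conventional case to each subtag (an OPTIONAL step)."""
    result = []
    seen_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        special = index > 0 and not seen_singleton
        if special and len(subtag) == 2:
            result.append(subtag.upper())       # e.g., region "CA"
        elif special and len(subtag) == 4:
            result.append(subtag.capitalize())  # e.g., script "Latn"
        else:
            result.append(subtag.lower())
        if len(subtag) == 1:
            seen_singleton = True
    return "-".join(result)


print(tags_equal("zh-hant-tw", "ZH-Hant-TW"))  # True
print(regularize_case("EN-ca-X-CA"))           # en-CA-x-ca
```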

3758

3759 If more than one variant appears within a tag, processors MAY reorder

3760 the variants to obtain better matching behavior or more consistent

3761 presentation. Reordering of the variants SHOULD follow the

3762 recommendations for variant ordering in Section 4.1.

3763

3764 If the field 'Deprecated' appears in a registry record without an

3765 accompanying 'Preferred-Value' field, then that tag or subtag is

3766 deprecated without a replacement. These values are canonical when

3767 they appear in a language tag. However, tags that include these

3768 values SHOULD NOT be selected by users or generated by

3769 implementations.

3770

3771 An extension MUST define any relationships that exist between the

3772 various subtags in the extension and thus MAY define an alternate

3773 canonicalization scheme for the extension's subtags. Extensions MAY

3774 define how the order of the extension's subtags is interpreted. For

3775 example, an extension could define that its subtags are in canonical

3776 order when the subtags are placed into ASCII order: that is, "en-a-

3777 aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might

3778 define that the order of the subtags influences their semantic

3779 meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-

3780 aaa-bbb-ccc"). However, extension specifications SHOULD be designed

3781 so that they are tolerant of the typical processes described in

3782 Section 3.7.

3783

37844.6. Considerations for Private Use Subtags

3785

3786 Private use subtags, like all other subtags, MUST conform to the

3787 format and content constraints in the ABNF. Private use subtags have

3788 no meaning outside the private agreement between the parties that

3789 intend to use or exchange language tags that employ them. The same

3790 subtags MAY be used with a different meaning under a separate private

3791 agreement. They SHOULD NOT be used where alternatives exist and

3792 SHOULD NOT be used in content or protocols intended for general use.

3793

3794 Private use subtags are simply useless for information exchange

3795 without prior arrangement. The value and semantic meaning of private

3796 use tags and of the subtags used within such a language tag are not

3797 defined by this document.

3798

3799 Private use sequences introduced by the 'x' singleton are completely

3800 opaque to users or implementations outside of the private use

3801 agreement. So, in addition to private use subtag sequences

3802 introduced by the singleton subtag 'x', the Language Subtag Registry

3803 provides private use language, script, and region subtags derived

3804 from the private use codes assigned by the underlying standards.

3805 These subtags are valid for use in forming language tags; they are

3806 RECOMMENDED over the 'x' singleton private use subtag sequences

3815 because they convey more information via their linkage to the

3816 language tag's inherent structure.

3817

3818 For example, the region subtags 'AA', 'ZZ', and those in the ranges

3819 'QM'-'QZ' and 'XA'-'XZ' (derived from the ISO 3166-1 private use

3820 codes) can be used to form a language tag. A tag such as

3821 "zh-Hans-XQ" conveys a great deal of public, interchangeable

3822 information about the language material (that it is Chinese in the

3823 simplified Chinese script and is suitable for some geographic region

3824 'XQ'). While the precise geographic region is not known outside of

3825 private agreement, the tag conveys far more information than an

3826 opaque tag such as "x-somelang" or even "zh-Hans-x-xq" (where the

3827 'xq' subtag's meaning is entirely opaque).
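A processor that wants to recognize the private use region subtags listed above could use a check along the following lines. This is an illustrative sketch; the function name `is_private_use_region` is an assumption, and the ranges are taken directly from the preceding paragraph.

```python
def is_private_use_region(subtag: str) -> bool:
    """Return True for 'AA', 'ZZ', 'QM'-'QZ', and 'XA'-'XZ' (any case)."""
    s = subtag.upper()
    if len(s) != 2 or not s.isascii() or not s.isalpha():
        return False
    if s in ("AA", "ZZ"):
        return True
    if s[0] == "Q":
        return "M" <= s[1] <= "Z"
    return s[0] == "X"


print(is_private_use_region("XQ"))  # True
print(is_private_use_region("US"))  # False
```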

3828

3829 However, in some cases content tagged with private use subtags can

3830 interact with other systems in a different and possibly unsuitable

3831 manner compared to tags that use opaque, privately defined subtags,

3832 so the choice of the best approach sometimes depends on the

3833 particular domain in question.

3834

38355. IANA Considerations

3836

3837 This section deals with the processes and requirements necessary for

3838 IANA to maintain the subtag and extension registries as defined by

3839 this document and in accordance with the requirements of [RFC5226].

3840

3841 The impact on the IANA maintainers of the two registries defined by

3842 this document will be a small increase in the frequency of new

3843 entries or updates. IANA also is required to create a new mailing

3844 list (described below in Section 5.1) to announce registry changes

3845 and updates.

3846

38475.1. Language Subtag Registry

3848

3849 IANA updated the registry using instructions and content provided in

3850 a companion document [RFC5645]. The criteria and process for

3851 selecting the updated set of records are described in that document.

3852 The updated set of records has no impact on IANA, since the

3853 work to create it will be performed externally.

3854

3855 Future work on the Language Subtag Registry includes the following

3856 activities:

3857

3858 o Inserting or replacing whole records. These records are

3859 preformatted for IANA by the Language Subtag Reviewer, as

3860 described in Section 3.3.

3861

3862 o Archiving and making publicly available the registration forms.

3863

3871 o Announcing each updated version of the registry on the

3872 "ietf-languages-announcements@iana.org" mailing list.

3873

3874 Each registration form sent to IANA contains a single record for

3875 incorporation into the registry. The form will be sent to

3876 <iana@iana.org> by the Language Subtag Reviewer. It will have a

3877 subject line indicating whether the enclosed form represents an

3878 insertion of a new record (indicated by the word "INSERT" in the

3879 subject line) or a replacement of an existing record (indicated by

3880 the word "MODIFY" in the subject line). At no time can a record be

3881 deleted from the registry.

3882

3883 IANA will extract the record from the form and place the inserted or

3884 modified record into the appropriate section of the Language Subtag

3885 Registry, grouping the records by their 'Type' field. Inserted

3886 records can be placed anywhere within the appropriate section; there

3887 is no guarantee that the registry's records will be placed in any

3888 particular order except that they will always be grouped by 'Type'.

3889 Modified records overwrite the record they replace.

3890

3891 Whenever an entry is created or modified in the registry, the 'File-

3892 Date' record at the start of the registry is updated to reflect the

3893 most recent modification date. The date format SHALL be the "full-

3894 date" format of [RFC3339]. The date SHALL be the date on which that

3895 version of the registry was first published by IANA. There SHALL be

3896 at most one version of the registry published in a day. A 'File-

3897 Date' record is also included in each request to IANA to insert or

3898 modify records, indicating the acceptance date of the records in the

3899 request.
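Because the 'File-Date' value uses the "full-date" format (YYYY-MM-DD), it can be parsed and compared with standard date handling. The sketch below is illustrative only: the function names and the sample dates are assumptions, and the input is assumed to be the single line "File-Date: <date>" taken from the start of a registry file.

```python
from datetime import date


def parse_file_date(record_line: str) -> date:
    """Parse a line such as 'File-Date: 2009-09-01' into a date object."""
    name, _, value = record_line.partition(":")
    if name.strip() != "File-Date":
        raise ValueError(f"not a File-Date record: {record_line!r}")
    return date.fromisoformat(value.strip())


def registry_is_newer(local_line: str, published_line: str) -> bool:
    """Return True if the published registry is newer than the local copy."""
    return parse_file_date(published_line) > parse_file_date(local_line)


# Sample dates are made up for the example.
print(registry_is_newer("File-Date: 2009-07-29", "File-Date: 2009-09-01"))
```

Since at most one version of the registry is published in a day, comparing dates is enough to tell two versions apart.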

3900

3901 The updated registry file MUST use the UTF-8 character encoding, and

3902 IANA MUST check the registry file for proper encoding. Non-ASCII

3903 characters can be sent to IANA by attaching the registration form to

3904 the email message or by using various encodings in the mail message

3905 body (UTF-8 is recommended). IANA will verify any unclear or

3906 corrupted characters with the Language Subtag Reviewer prior to

3907 posting the updated registry.

3908

3909 IANA will also archive and make publicly available from

3910 http://www.iana.org each registration form. Note that multiple

3911 registrations can pertain to the same record in the registry.

3912

3913 Developers who are dependent upon the Language Subtag Registry

3914 sometimes would like to be informed of changes in the registry so

3915 that they can update their implementations. When any change is made

3916 to the Language Subtag Registry, IANA will send an announcement

3917 message to <ietf-languages-announcements@iana.org> (a self-

3918 subscribing list to which only IANA can post).

3919

39275.2. Extensions Registry

3928

3929 The Language Tag Extensions Registry can contain at most 35 records,

3930 and thus changes to this registry are expected to be very infrequent.
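The figure of 35 follows from the syntax of singletons: each extension is identified by a single letter or digit, compared without regard to case, and the letter 'x' is reserved for private use. The short sketch below simply counts the remaining possibilities.

```python
import string

# 26 letters plus 10 digits, minus 'x', which is reserved for private use.
singletons = [c for c in string.ascii_lowercase + string.digits if c != "x"]

print(len(singletons))  # 35
```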

3931

3932 Future work by IANA on the Language Tag Extensions Registry is

3933 limited to two cases. First, the IESG MAY request that new records

3934 be inserted into this registry from time to time. These requests

3935 MUST include the record to insert in the exact format described in

3936 Section 3.7. In addition, there MAY be occasional requests from the

3937 maintaining authority for a specific extension to update the contact

3938 information or URLs in the record. These requests MUST include the

3939 complete, updated record. IANA is not responsible for validating the

3940 information provided, only that it is properly formatted. IANA

3941 SHOULD take reasonable steps to ascertain that the request comes from

3942 the maintaining authority named in the record present in the

3943 registry.

3944

39456. Security Considerations

3946

3947 Language tags used in content negotiation, like any other information

3948 exchanged on the Internet, might be a source of concern because they

3949 might be used to infer the nationality of the sender, and thus

3950 identify potential targets for surveillance.

3951

3952 This is a special case of the general problem that anything sent is

3953 visible to the receiving party and possibly to third parties as well.

3954 It is useful to be aware that such concerns can exist in some cases.

3955

3956 The evaluation of the exact magnitude of the threat, and any possible

3957 countermeasures, is left to each application protocol (see BCP 72

3958 [RFC3552] for best current practice guidance on security threats and

3959 defenses).

3960

3961 The language tag associated with a particular information item is of

3962 no consequence whatsoever in determining whether that content might

3963 contain possible homographs. The fact that a text is tagged as being

3964 in one language or using a particular script subtag provides no

3965 assurance whatsoever that it does not contain characters from scripts

3966 other than the one(s) associated with or specified by that language

3967 tag.

3968

3969 Since there is no limit to the number of variant, private use, and

3970 extension subtags, and consequently no limit on the possible length

3971 of a tag, implementations need to guard against buffer overflow

3972 attacks. See Section 4.4 for details on language tag truncation,

3973 which can occur as a consequence of defenses against buffer overflow.

3982

3983 To prevent denial-of-service attacks, applications SHOULD NOT depend

3984 on either the Language Subtag Registry or the Language Tag Extensions

3985 Registry being always accessible. Additionally, although the

3986 specification of valid subtags for an extension (see Section 3.7)

3987 MUST be available over the Internet, implementations SHOULD NOT

3988 mechanically depend on those sources being always accessible.

3989

3990 The registries specified in this document are not suitable for

3991 frequent or real-time access to, or retrieval of, the full registry

3992 contents. Most applications do not need registry data at all. For

3993 others, being able to validate or canonicalize language tags as of a

3994 particular registry date will be sufficient, as the registry contents

3995 change only occasionally. Changes are announced to

3996 <ietf-languages-announcements@iana.org>. This mailing list is

3997 intended for interested organizations and individuals, not for bulk

3998 subscription to trigger automatic software updates. The size of the

3999 registry makes it unsuitable for automatic software updates.

4000 Implementers considering integrating the Language Subtag Registry in

4001 an automatic updating scheme are strongly advised to distribute only

4002 suitably encoded differences, and only via their own infrastructure

4003 -- not directly from IANA.

4004

4005 Changes, or the absence thereof, can also easily be detected by

4006 looking at the 'File-Date' record at the start of the registry, or by

4007 using features of the protocol used for downloading, without having

4008 to download the full registry. At the time of publication of this

4009 document, IANA is making the Language Subtag Registry available over

4010 HTTP/1.1. The proper way to update a local copy of the Language

4011 Subtag Registry using HTTP/1.1 is to use a conditional GET [RFC2616].
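A conditional GET might be issued as in the following sketch, which uses the Python standard library. It is an illustration under stated assumptions: the function names are hypothetical, the caller supplies the registry URL and the 'Last-Modified' value saved from the previous download, and the example URL is a placeholder rather than the registry's actual location.

```python
import urllib.error
import urllib.request


def build_conditional_request(url: str, last_modified: str) -> urllib.request.Request:
    """Build a GET that asks for the body only if it changed since `last_modified`."""
    return urllib.request.Request(url, headers={"If-Modified-Since": last_modified})


def fetch_if_changed(url: str, last_modified: str):
    """Return the new registry bytes, or None if the server answers 304."""
    request = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: the local copy is current
            return None
        raise


# Placeholder URL and date, for illustration only; no request is sent here.
request = build_conditional_request(
    "http://example.org/language-subtag-registry",
    "Tue, 01 Sep 2009 00:00:00 GMT",
)
print(request.get_method(), request.full_url)
```

In keeping with the guidance above, such a check belongs in an occasional, operator-controlled update process, not in code that contacts IANA on every use.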

4012

40137. Character Set Considerations

4014

4015 The syntax in this document requires that language tags use only the

4016 characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most

4017 character sets, so the composition of language tags shouldn't have

4018 any character set issues.
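The character repertoire described above can be checked with a single pattern, as in the sketch below. The function name is an assumption made for this example, and the check covers only the characters used; it does not test whether a tag is well-formed or valid.

```python
import re

# Only A-Z, a-z, 0-9, and HYPHEN-MINUS are permitted in a language tag.
_TAG_CHARACTERS = re.compile(r"[A-Za-z0-9-]+")


def uses_only_tag_characters(text: str) -> bool:
    """Return True if `text` is non-empty and uses only permitted characters."""
    return _TAG_CHARACTERS.fullmatch(text) is not None


print(uses_only_tag_characters("zh-Hant-TW"))  # True
print(uses_only_tag_characters("fr_CA"))       # False: "_" is not permitted
```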

4019

4020 The rendering of text based on the language tag is not addressed

4021 here. Historically, some processes have relied on the use of

4022 character set/encoding information (or other external information) in

4023 order to infer how a specific string of characters should be

4024 rendered. Notably, this applies to language- and culture-specific

4025 variations of Han ideographs as used in Japanese, Chinese, and

4026 Korean, where use of, for example, a Japanese character encoding such

4027 as EUC-JP implies that the text itself is in Japanese. When language

4028 tags are applied to spans of text, rendering engines might be able to

4029 use that information to better select fonts or make other rendering

4039 choices, particularly where languages with distinct writing

4040 traditions use the same characters.

4041

40428. Changes from RFC 4646

4043

4044 The main goal for this revision of RFC 4646 was to incorporate two

4045 new parts of ISO 639 (ISO 639-3 and ISO 639-5) and their attendant

4046 sets of language codes into the IANA Language Subtag Registry. This

4047 permits the identification of many more languages and language

4048 collections than previously supported.

4049

4050 The specific changes in this document to meet these goals are:

4051

4052 o Defined the incorporation of ISO 639-3 and ISO 639-5 codes for use

4053 as primary and extended language subtags. It also permanently

4054 reserves and disallows the use of additional 'extlang' subtags.

4055 The changes necessary to achieve this were:

4056

4057 * Modified the ABNF comments.

4058

4059 * Updated various registration and stability requirements

4060 sections to reference ISO 639-3 and ISO 639-5 in addition to

4061 ISO 639-1 and ISO 639-2.

4062

4063 * Edited the text to eliminate references to extended language

4064 subtags where they are no longer used.

4065

4066 * Explained the change in the section on extended language

4067 subtags.

4068

4069 o Changed the ABNF related to grandfathered tags. The irregular

4070 tags are now listed. Well-formed grandfathered tags are now

4071 described by the 'langtag' production, and the 'grandfathered'

4072 production was removed as a result. Also: added description of

4073 both types of grandfathered tags to Section 2.2.8.

4074

4075 o Added the paragraph on "collections" to Section 4.1.

4076

4077 o Changed the capitalization rules for 'Tag' fields in Section 3.1.

4078

4079 o Split Section 3.1 up into subsections.

4080

4081 o Modified Section 3.5 to allow 'Suppress-Script' fields to be

4082 added, modified, or removed via the registration process. This

4083 was an erratum from RFC 4646.

4084

4085 o Modified examples that used region code 'CS' (formerly Serbia and

4086 Montenegro) to use 'RS' (Serbia) instead.

4087

4095 o Modified the rules for creating and maintaining record

4096 'Description' fields to prevent duplicates, including inverted

4097 duplicates.

4098

4099 o Removed the lengthy description of why RFC 4646 was created from

4100 this section, which also caused the removal of the reference to

4101 XML Schema.

4102

4103 o Modified the text in Section 2.1 to place more emphasis on the

4104 fact that language tags are not case sensitive.

4105

4106 o Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"

4107 and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the

4108 'Suppress-Script' on 'Latn' with 'fr'.

4109

4110 o Changed the requirements for well-formedness to make singleton

4111 repetition checking optional (it is required for validity

4112 checking) in Section 2.2.9.

4113

4114 o Changed the text in Section 2.2.9 referring to grandfathered

4115 checking to note that the list is now included in the ABNF.

4116

4117 o Modified and added text to Section 3.2. The job description was

4118 placed first. A note was added making clear that the Language

4119 Subtag Reviewer may delegate various non-critical duties,

4120 including list moderation. Finally, additional text was added to

4121 make the appointment process clear and to clarify that decisions

4122 and performance of the reviewer are appealable.

4123

4124 o Added text to Section 3.5 clarifying that the

4125 ietf-languages@iana.org list is operated by whomever the IESG

4126 appoints.

4127

4128 o Added text to Section 3.1.5 clarifying that the first Description

4129 in a 'language' record matches the corresponding Reference Name

4130 for the language in ISO 639-3.

4131

4132 o Modified Section 2.2.9 to define classes of conformance related to

4133 specific tags (formerly 'well-formed' and 'valid' referred to

4134 implementations). Notes were added about the removal of 'extlang'

4135 from the ABNF provided in RFC 4646, allowing for well-formedness

4136 using this older definition. Reference to RFC 3066 well-

4137 formedness was also added.

4138

4139 o Added text to the end of Section 3.1.2 noting that future versions

4140 of this document might add new field types to the registry format

4141 and recommending that implementations ignore any unrecognized

4142 fields.

4143

   o  Added text to Section 3.1.9 about what the lack of a
      'Suppress-Script' field in a record means.

   o  Added text to Section 3.1.5 allowing the correction of
      misspellings and typographic errors.

   o  Added text to Section 3.1.8 disallowing 'Prefix' field conflicts
      (such as circular prefix references).

   o  Modified text in Section 3.5 to require the subtag reviewer to
      announce his/her decision (or extension) following the two-week
      period. Also clarified that any decision or failure to decide can
      be appealed.

   o  Modified text in Section 4.1 to include the (heretofore anecdotal)
      guiding principle of tag choice and to clarify the non-use of
      script subtags in non-written applications.

   o  Prohibited multiple use of the same variant in a tag (e.g.,
      "de-1901-1901"). Previously, this was only a recommendation
      ("SHOULD").

   o  Removed inappropriate [RFC2119] language from the illustration in
      Section 4.4.1.

   o  Replaced the example of deprecating "zh-guoyu" with
      "zh-hakka"->"hak" in Section 4.5, noting that it was this document
      that caused the change.

   o  Replaced the text in Section 4.1 dealing with "mul"/"und" to
      include the subtags 'zxx' and 'mis', as well as the tag
      "i-default". A normative reference to RFC 2277 was added.

   o  Added text to Section 3.5 clarifying that any modifications of a
      registration request must be sent to the <ietf-languages@iana.org>
      list before submission to IANA.

   o  Changed the ABNF for the record-jar format from using the LWSP
      production to use a folding whitespace production similar to
      obs-FWS in [RFC5234]. This effectively prevents unintentional
      blank lines inside a field.

   o  Revised text in Sections 3.3, 3.5, and 5.1 to clarify that the
      Language Subtag Reviewer sends the complete registration forms to
      IANA, that IANA extracts the record from the form, and that the
      forms must also be archived separately from the registry.

   o  Added text to Section 5 requiring IANA to send an announcement to
      an ietf-languages-announcements list whenever the registry is
      updated.

   o  Modified the registry to use UTF-8 as its character encoding.
      This also entails additional instructions to IANA and the
      Language Subtag Reviewer in the registration process.

   o  Modified the rules in Section 2.2.4 so that "exceptionally
      reserved" ISO 3166-1 codes other than 'UK' are included in the
      registry. In particular, this allows the code 'EU' (European
      Union) to be used to form language tags or (more commonly) for
      applications that use the registry for region codes to reference
      this subtag.

   o  Modified the IANA considerations section (Section 5) to remove
      unnecessary normative [RFC2119] language.
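
   The first change listed above stresses that language tags are not
   case sensitive. The following Python sketch is illustrative only and
   not part of this specification. It compares tags without regard to
   case and applies the case conventions of Section 2.1.1 by position
   and length alone. The function names are hypothetical, the input is
   assumed to be a well-formed ASCII tag, and no registry lookup is
   performed.

```python
def same_tag(a: str, b: str) -> bool:
    """Compare two language tags without regard to case."""
    return a.lower() == b.lower()


def conventional_case(tag: str) -> str:
    """Apply conventional casing to a tag assumed to be well-formed.

    Two-letter subtags become uppercase and four-character subtags
    become titlecase, unless they start the tag or follow a singleton.
    Everything else is lowercased.
    """
    out = []
    after_singleton = False
    for position, subtag in enumerate(tag.split("-")):
        if position == 0 or after_singleton:
            out.append(subtag.lower())
        elif len(subtag) == 2:
            out.append(subtag.upper())        # region style, e.g. "RS"
        elif len(subtag) == 4:
            out.append(subtag.capitalize())   # script style, e.g. "Latn"
        else:
            out.append(subtag.lower())
        if len(subtag) == 1:
            # Only extension and private use subtags can follow a singleton.
            after_singleton = True
    return "-".join(out)
```

   Under these assumptions, conventional_case("SR-LATN-RS") yields
   "sr-Latn-RS", and same_tag("mn-Cyrl-MN", "MN-cYRL-mn") is true.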

9. References

9.1. Normative References

   [ISO15924]      International Organization for Standardization, "ISO
                   15924:2004. Information and documentation -- Codes
                   for the representation of names of scripts",
                   January 2004.

   [ISO3166-1]     International Organization for Standardization, "ISO
                   3166-1:2006. Codes for the representation of names
                   of countries and their subdivisions -- Part 1:
                   Country codes", November 2006.

   [ISO639-1]      International Organization for Standardization, "ISO
                   639-1:2002. Codes for the representation of names
                   of languages -- Part 1: Alpha-2 code", July 2002.

   [ISO639-2]      International Organization for Standardization, "ISO
                   639-2:1998. Codes for the representation of names
                   of languages -- Part 2: Alpha-3 code", October 1998.

   [ISO639-3]      International Organization for Standardization, "ISO
                   639-3:2007. Codes for the representation of names
                   of languages -- Part 3: Alpha-3 code for
                   comprehensive coverage of languages", February 2007.

   [ISO639-5]      International Organization for Standardization, "ISO
                   639-5:2008. Codes for the representation of names of
                   languages -- Part 5: Alpha-3 code for language
                   families and groups", May 2008.

   [ISO646]        International Organization for Standardization,
                   "ISO/IEC 646:1991, Information technology -- ISO
                   7-bit coded character set for information
                   interchange", 1991.

   [RFC2026]       Bradner, S., "The Internet Standards Process --
                   Revision 3", BCP 9, RFC 2026, October 1996.

   [RFC2119]       Bradner, S., "Key words for use in RFCs to Indicate
                   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2277]       Alvestrand, H., "IETF Policy on Character Sets and
                   Languages", BCP 18, RFC 2277, January 1998.

   [RFC3339]       Klyne, G., Ed. and C. Newman, "Date and Time on the
                   Internet: Timestamps", RFC 3339, July 2002.

   [RFC4647]       Phillips, A. and M. Davis, "Matching of Language
                   Tags", BCP 47, RFC 4647, September 2006.

   [RFC5226]       Narten, T. and H. Alvestrand, "Guidelines for
                   Writing an IANA Considerations Section in RFCs",
                   BCP 26, RFC 5226, May 2008.

   [RFC5234]       Crocker, D. and P. Overell, "Augmented BNF for
                   Syntax Specifications: ABNF", STD 68, RFC 5234,
                   January 2008.

   [SpecialCasing] The Unicode Consortium, "Unicode Character Database,
                   Special Casing Properties", March 2008,
                   <http://unicode.org/Public/UNIDATA/SpecialCasing.txt>.

   [UAX14]         Freitag, A., "Unicode Standard Annex #14: Line
                   Breaking Properties", August 2006,
                   <http://www.unicode.org/reports/tr14/>.

   [UN_M.49]       Statistics Division, United Nations, "Standard
                   Country or Area Codes for Statistical Use", Revision
                   4 (United Nations publication, Sales No. 98.XVII.9),
                   June 1999.

   [Unicode]       Unicode Consortium, "The Unicode Consortium. The
                   Unicode Standard, Version 5.0, (Boston, MA,
                   Addison-Wesley, 2003. ISBN 0-321-49081-0)",
                   January 2007.

9.2. Informative References

   [CLDR]          "The Common Locale Data Repository Project",
                   <http://cldr.unicode.org>.

   [RFC1766]       Alvestrand, H., "Tags for the Identification of
                   Languages", RFC 1766, March 1995.

   [RFC2028]       Hovey, R. and S. Bradner, "The Organizations
                   Involved in the IETF Standards Process", BCP 11,
                   RFC 2028, October 1996.

   [RFC2046]       Freed, N. and N. Borenstein, "Multipurpose Internet
                   Mail Extensions (MIME) Part Two: Media Types",
                   RFC 2046, November 1996.

   [RFC2047]       Moore, K., "MIME (Multipurpose Internet Mail
                   Extensions) Part Three: Message Header Extensions
                   for Non-ASCII Text", RFC 2047, November 1996.

   [RFC2231]       Freed, N. and K. Moore, "MIME Parameter Value and
                   Encoded Word Extensions: Character Sets, Languages,
                   and Continuations", RFC 2231, November 1997.

   [RFC2616]       Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
                   Masinter, L., Leach, P., and T. Berners-Lee,
                   "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616,
                   June 1999.

   [RFC2781]       Hoffman, P. and F. Yergeau, "UTF-16, an encoding of
                   ISO 10646", RFC 2781, February 2000.

   [RFC3066]       Alvestrand, H., "Tags for the Identification of
                   Languages", RFC 3066, January 2001.

   [RFC3282]       Alvestrand, H., "Content Language Headers",
                   RFC 3282, May 2002.

   [RFC3552]       Rescorla, E. and B. Korver, "Guidelines for Writing
                   RFC Text on Security Considerations", BCP 72,
                   RFC 3552, July 2003.

   [RFC3629]       Yergeau, F., "UTF-8, a transformation format of ISO
                   10646", STD 63, RFC 3629, November 2003.

   [RFC4645]       Ewell, D., "Initial Language Subtag Registry",
                   RFC 4645, September 2006.

   [RFC4646]       Phillips, A. and M. Davis, "Tags for Identifying
                   Languages", BCP 47, RFC 4646, September 2006.

   [RFC5645]       Ewell, D., Ed., "Update to the Language Subtag
                   Registry", RFC 5645, September 2009.

   [UTS35]         Davis, M., "Unicode Technical Standard #35: Locale
                   Data Markup Language (LDML)", December 2007,
                   <http://www.unicode.org/reports/tr35/>.

   [iso639.prin]   ISO 639 Joint Advisory Committee, "ISO 639 Joint
                   Advisory Committee: Working principles for ISO 639
                   maintenance", March 2000, <http://www.loc.gov/
                   standards/iso639-2/iso639jac_n3r.html>.

   [record-jar]    Raymond, E., "The Art of Unix Programming", 2003,
                   <urn:isbn:0-13-142901-9>.

Appendix A. Examples of Language Tags (Informative)

   Simple language subtag:

      de (German)

      fr (French)

      ja (Japanese)

      i-enochian (example of a grandfathered tag)

   Language subtag plus Script subtag:

      zh-Hant (Chinese written using the Traditional Chinese script)

      zh-Hans (Chinese written using the Simplified Chinese script)

      sr-Cyrl (Serbian written using the Cyrillic script)

      sr-Latn (Serbian written using the Latin script)

   Extended language subtags and their primary language subtag
   counterparts:

      zh-cmn-Hans-CN (Chinese, Mandarin, Simplified script, as used in
      China)

      cmn-Hans-CN (Mandarin Chinese, Simplified script, as used in
      China)

      zh-yue-HK (Chinese, Cantonese, as used in Hong Kong SAR)

      yue-HK (Cantonese Chinese, as used in Hong Kong SAR)
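
   The pairs above differ only in whether the extended language subtag
   is preceded by its prefix. A minimal Python sketch of rewriting the
   longer form into its primary language counterpart follows. It is
   illustrative only: the table holds just the two pairs shown above,
   the names EXTLANG_PREFIX and drop_extlang_prefix are hypothetical,
   and a real implementation would take the pairs from the Language
   Subtag Registry.

```python
# Illustrative subset: extlang subtag -> its prefix, from the examples above.
# A real implementation would load these pairs from the registry.
EXTLANG_PREFIX = {"cmn": "zh", "yue": "zh"}


def drop_extlang_prefix(tag: str) -> str:
    """Rewrite e.g. 'zh-cmn-Hans-CN' as 'cmn-Hans-CN'.

    Tags whose first two subtags are not a known (prefix, extlang)
    pair are returned unchanged.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        prefix, extlang = subtags[0].lower(), subtags[1].lower()
        if EXTLANG_PREFIX.get(extlang) == prefix:
            return "-".join([extlang] + subtags[2:])
    return tag
```

   With this table, "zh-yue-HK" becomes "yue-HK", while "zh-Hans-CN"
   is left as it is because 'Hans' is not an extended language subtag.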

   Language-Script-Region:

      zh-Hans-CN (Chinese written using the Simplified script as used in
      mainland China)

      sr-Latn-RS (Serbian written using the Latin script as used in
      Serbia)

   Language-Variant:

      sl-rozaj (Resian dialect of Slovenian)

      sl-rozaj-biske (San Giorgio dialect of Resian dialect of
      Slovenian)

      sl-nedis (Nadiza dialect of Slovenian)

   Language-Region-Variant:

      de-CH-1901 (German as used in Switzerland using the 1901 variant
      [orthography])

      sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)

   Language-Script-Region-Variant:

      hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
      used in Italy)

   Language-Region:

      de-DE (German for Germany)

      en-US (English as used in the United States)

      es-419 (Spanish appropriate for the Latin America and Caribbean
      region using the UN region code)

   Private use subtags:

      de-CH-x-phonebk

      az-Arab-x-AZE-derbend

   Private use registry values:

      x-whatever (private use using the singleton 'x')

      qaa-Qaaa-QM-x-southern (all private tags)

      de-Qaaa (German, with a private script)

      sr-Latn-QM (Serbian, Latin script, private region)

      sr-Qaaa-RS (Serbian, private script, for Serbia)
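
   The private use registry values above come from the ranges reserved
   in Section 2.2: language subtags 'qaa' through 'qtz', script subtags
   'Qaaa' through 'Qabx', and region subtags 'AA', 'QM' through 'QZ',
   'XA' through 'XZ', and 'ZZ'. The Python sketch below tests a single
   subtag against those ranges. It is illustrative only, the function
   names are hypothetical, and the caller is assumed to know already
   which position (language, script, or region) the subtag occupies.

```python
def is_private_language(subtag: str) -> bool:
    """True for the language subtags 'qaa' through 'qtz'."""
    s = subtag.lower()
    return len(s) == 3 and s.isascii() and s.isalpha() and "qaa" <= s <= "qtz"


def is_private_script(subtag: str) -> bool:
    """True for the script subtags 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    return len(s) == 4 and s.isascii() and s.isalpha() and "qaaa" <= s <= "qabx"


def is_private_region(subtag: str) -> bool:
    """True for 'AA', 'QM'..'QZ', 'XA'..'XZ', and 'ZZ'."""
    s = subtag.upper()
    if s in ("AA", "ZZ"):
        return True
    if len(s) != 2 or not (s.isascii() and s.isalpha()):
        return False
    return "QM" <= s <= "QZ" or "XA" <= s <= "XZ"
```

   For the tag "qaa-Qaaa-QM-x-southern", all three predicates hold for
   the first three subtags. For "sr-Qaaa-RS", only the script subtag
   is in a private use range.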

   Tags that use extensions (examples ONLY -- extensions MUST be defined
   by revision or update to this document, or by RFC):

      en-US-u-islamcal

      zh-CN-a-myext-x-private

      en-a-myext-b-another

   Some Invalid Tags:

      de-419-DE (two region subtags)

      a-DE (use of a single-character subtag in primary position; note
      that there are a few grandfathered tags that start with "i-" that
      are valid)

      ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
      prefix)
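
   The three invalid tags above fail for different reasons: the first
   two do not match the 'langtag' syntax at all, while the third is
   syntactically well-formed but repeats a singleton. The Python sketch
   below separates those cases. It is illustrative only and rests on
   several assumptions: the regular expression is a hand transcription
   of the 'langtag' production, grandfathered tags and tags consisting
   only of private use subtags are not handled, and no subtag is looked
   up in the registry, so passing this check does not make a tag valid.
   The names LANGTAG and problems are hypothetical.

```python
import re

# Hand-written approximation of the 'langtag' production (lowercase input).
LANGTAG = re.compile(
    r"""^
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
    (?:-x(?:-[a-z0-9]{1,8})+)?
    $""",
    re.VERBOSE,
)


def problems(tag: str) -> list[str]:
    """Return a list of structural problems found in `tag` (empty if none)."""
    match = LANGTAG.match(tag.lower())
    if match is None:
        return ["does not match the langtag production"]
    found = []
    # "-1901-1901" splits into ["", "1901", "1901"]; drop the leading "".
    variants = match.group("variants").split("-")[1:]
    if len(variants) != len(set(variants)):
        found.append("repeated variant")
    # Extension subtags are 2 to 8 characters, so length 1 means singleton.
    singletons = [s for s in match.group("extensions").split("-") if len(s) == 1]
    if len(singletons) != len(set(singletons)):
        found.append("repeated singleton")
    return found
```

   Under these assumptions, "de-419-DE" and "a-DE" are rejected by the
   syntax check, "ar-a-aaa-b-bbb-a-ccc" is reported as repeating a
   singleton, and "de-1901-1901" is reported as repeating a variant.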

Appendix B. Examples of Registration Forms

   LANGUAGE SUBTAG REGISTRATION FORM

   1. Name of requester: Han Steenwijk
   2. E-mail address of requester: han.steenwijk @ unipd.it
   3. Record Requested:

   Type: variant
   Subtag: biske
   Description: The San Giorgio dialect of Resian
   Description: The Bila dialect of Resian
   Prefix: sl-rozaj
   Comments: The dialect of San Giorgio/Bila is one of the
     four major local dialects of Resian

   4. Intended meaning of the subtag:

   The local variety of Resian as spoken in San Giorgio/Bila

   5. Reference to published description of the language (book or
   article):

   -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich
   govorov, Varsava - Peterburg: Vende - Kozancikov, 1875.

   LANGUAGE SUBTAG REGISTRATION FORM

   1. Name of requester: Jaska Zedlik
   2. E-mail address of requester: jz53 @ zedlik.com
   3. Record Requested:

   Type: variant
   Subtag: tarask
   Description: Belarusian in Taraskievica orthography
   Prefix: be
   Comments: The subtag represents Branislau Taraskievic's Belarusian
     orthography as published in "Bielaruski klasycny pravapis" by
     Juras Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
     (Vilnia-Miensk 2005).

   4. Intended meaning of the subtag:

   The subtag is intended to represent the Belarusian orthography as
   published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk
   Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005).

   5. Reference to published description of the language (book or
   article):

   Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd.
   "Bielaruskaha kamitetu", 1929, 5th edition.

   Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier.
   Bielaruski klasycny pravapis. Vilnia-Miensk, 2005.

   6. Any other relevant information:

   Belarusian in Taraskievica orthography became widely used, especially
   in Belarusian-speaking Internet segment, but besides this some books
   and newspapers are also printed using this orthography of Belarusian.
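
   Item 3 of each form carries a record in the registry's record-jar
   format: one "name: value" field per line, a field name that may
   repeat (as 'Description' does in the first form), and long values
   folded onto continuation lines that begin with whitespace. The
   Python sketch below reads one such record. It is illustrative only
   and not the registry's reference parser: the names SAMPLE and
   parse_record are hypothetical, the sample text is the record from
   the first form above, and blank lines are skipped leniently even
   though the record-jar grammar does not allow them inside a field.

```python
SAMPLE = """\
Type: variant
Subtag: biske
Description: The San Giorgio dialect of Resian
Description: The Bila dialect of Resian
Prefix: sl-rozaj
Comments: The dialect of San Giorgio/Bila is one of the
  four major local dialects of Resian
"""


def parse_record(text: str) -> dict[str, list[str]]:
    """Parse one record into {field-name: [value, ...]}."""
    fields: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # lenient: ignore blank lines
        if line[0] in " \t" and fields:
            # Continuation line: unfold onto the previous field's value.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
            continue
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not a field line: {line!r}")
        fields.append((name.strip(), value.strip()))
    record: dict[str, list[str]] = {}
    for name, value in fields:
        # Repeated field names (e.g. Description) keep every value in order.
        record.setdefault(name, []).append(value)
    return record
```

   Because every value is kept in a list keyed by field name, a
   consumer can read the fields it knows and leave any others
   untouched.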

Appendix C. Acknowledgements

   Any list of contributors is bound to be incomplete; please regard the
   following as only a selection from the group of people who have
   contributed to make this document what it is today.

   The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
   precursors of this document, made enormous contributions directly or
   indirectly to this document and are generally responsible for the
   success of language tags.

   The following people contributed to this document:

   Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
   Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion
   Gunn, Alfred Hoenes, Kent Karlsson, Chris Newman, Randy Presuhn,
   Stephen Silver, Shawn Steele, and many, many others.

   Very special thanks must go to Harald Tveit Alvestrand, who
   originated RFCs 1766 and 3066, and without whom this document would
   not have been possible.

   Special thanks go to Michael Everson, who served as the Language Tag
   Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
   the Language Subtag Reviewer since the adoption of RFC 4646.

   Special thanks also go to Doug Ewell, for his production of the first
   complete subtag registry, his work to support and maintain new
   registrations, and his careful editorship of both RFC 4645 and
   [RFC5645].

Authors' Addresses

   Addison Phillips (editor)
   Lab126

   EMail: addison@inter-locale.com
   URI: http://www.inter-locale.com


   Mark Davis (editor)
   Google

   EMail: markdavis@google.com
