Internet Engineering Task Force (IETF)                           A. Yang
Request for Comments: 6532                                         TWNIC
Obsoletes: 5335                                                S. Steele
Updates: 2045                                                  Microsoft
Category: Standards Track                                       N. Freed
ISSN: 2070-1721                                                   Oracle
                                                           February 2012


                    Internationalized Email Headers

Abstract

   Internet mail was originally limited to 7-bit ASCII. MIME added
   support for the use of 8-bit character sets in body parts, and also
   defined an encoded-word construct so other character sets could be
   used in certain header field values. However, full
   internationalization of electronic mail requires additional
   enhancements to allow the use of Unicode, including characters
   outside the ASCII repertoire, in mail addresses as well as direct use
   of Unicode in header fields like "From:", "To:", and "Subject:",
   without requiring the use of complex encoded-word constructs. This
   document specifies an enhancement to the Internet Message Format and
   to MIME that allows use of Unicode in mail addresses and most header
   field content.

   This specification updates Section 6.4 of RFC 2045 to eliminate the
   restriction prohibiting the use of non-identity content-transfer-
   encodings on subtypes of "message/".

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6532.

Yang, et al.                 Standards Track                    [Page 1]

RFC 6532            Internationalized Email Headers        February 2012

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology Used in This Specification
   3. Changes to Message Header Fields
      3.1. UTF-8 Syntax and Normalization
      3.2. Syntax Extensions to RFC 5322
      3.3. Use of 8-bit UTF-8 in Message-IDs
      3.4. Effects on Line Length Limits
      3.5. Changes to MIME Message Type Encoding Restrictions
      3.6. Use of MIME Encoded-Words
      3.7. The message/global Media Type
   4. Security Considerations
   5. IANA Considerations
   6. Acknowledgements
   7. References
      7.1. Normative References
      7.2. Informative References

1. Introduction

   Internet mail distinguishes a message from its transport and further
   divides a message between a header and a body [RFC5322]. Internet
   mail header field values contain a variety of strings that are
   intended to be user-visible. The range of supported characters for
   these strings was originally limited to [ASCII] in 7-bit form. MIME
   [RFC2045] [RFC2046] [RFC2047] provides the ability to use additional
   character sets, but this support is limited to body part data and to
   special encoded-word constructs that were only allowed in a limited
   number of places in header field values.

   Globalization of the Internet requires support of the much larger set
   of characters provided by Unicode [RFC5198] in both mail addresses
   and most header field values. Additionally, complex encoding schemes
   like encoded-words introduce inefficiencies as well as significant
   opportunities for processing errors. Finally, native support for
   the UTF-8 charset is now available on most systems. Hence, it is
   strongly desirable for Internet mail to support UTF-8 [RFC3629]
   directly.

   This document specifies an enhancement to the Internet Message Format
   [RFC5322] and to MIME that permits the direct use of UTF-8, rather
   than only ASCII, in header field values, including mail addresses. A
   new media type, message/global, is defined for messages that use this
   extended format. This specification also lifts the MIME restriction
   on having non-identity content-transfer-encodings on any subtype of
   the message top-level type so that message/global parts can be safely
   transmitted across existing mail infrastructure.

   This specification is based on a model of native, end-to-end support
   for UTF-8, which depends on having an "8-bit-clean" environment
   assured by the transport system. Support for carriage across legacy,
   7-bit infrastructure and for processing by 7-bit receivers requires
   additional mechanisms that are not provided by these specifications.

   This specification is a revision of and replacement for [RFC5335].
   Section 6 of [RFC6530] describes the change in approach between this
   specification and the previous version.

2. Terminology Used in This Specification

   A plain ASCII string is fully compatible with [RFC5321] and
   [RFC5322]. In this document, non-ASCII strings are UTF-8 strings if
   they are in header field values that contain at least one
   <UTF8-non-ascii> (see Section 3.1).

   Unless otherwise noted, all terms used here are defined in [RFC5321],
   [RFC5322], [RFC6530], or [RFC6531].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The term "8-bit" means octets are present in the data with values
   above 0x7F.

3. Changes to Message Header Fields

   To permit non-ASCII Unicode characters in field values, the header
   definition in [RFC5322] is extended to support the new format. The
   following sections specify the necessary changes to RFC 5322's ABNF.

   The syntax rules not mentioned below remain defined as in [RFC5322].

   Note that this protocol does not change rules in RFC 5322 for
   defining header field names. The bodies of header fields are allowed
   to contain Unicode characters, but the header field names themselves
   must consist of ASCII characters only.

   Also note that messages in this format require the use of the
   SMTPUTF8 extension [RFC6531] to be transferred via SMTP.

3.1. UTF-8 Syntax and Normalization

   UTF-8 characters can be defined in terms of octets using the
   following ABNF [RFC5234], taken from [RFC3629]:

   UTF8-non-ascii  =   UTF8-2 / UTF8-3 / UTF8-4

   UTF8-2          =   <Defined in Section 4 of RFC3629>

   UTF8-3          =   <Defined in Section 4 of RFC3629>

   UTF8-4          =   <Defined in Section 4 of RFC3629>
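
   As a non-normative illustration, the check implied by the ABNF above
   can be sketched in Python. The function name contains_utf8_non_ascii
   is hypothetical and not part of this specification. The sketch
   assumes that Python's strict UTF-8 decoder accepts the same octet
   sequences as the UTF8-2, UTF8-3, and UTF8-4 rules of RFC 3629
   (no overlong forms, no surrogates, nothing above U+10FFFF).

```python
def contains_utf8_non_ascii(value: bytes) -> bool:
    """Return True if value is well-formed UTF-8 with at least one
    non-ASCII character (hypothetical helper, not defined by the RFC)."""
    try:
        # Strict decoding rejects malformed sequences such as overlong
        # encodings or stray continuation octets.
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        # 8-bit data that is not UTF-8 does not match <UTF8-non-ascii>.
        return False
    return not text.isascii()


if __name__ == "__main__":
    print(contains_utf8_non_ascii("Jos\u00e9".encode("utf-8")))
    print(contains_utf8_non_ascii(b"plain ASCII"))
    print(contains_utf8_non_ascii(b"\xc0\xaf"))  # overlong form
```

   A value that contains octets above 0x7F but is not well-formed UTF-8
   yields False in this sketch, since such a value does not match
   <UTF8-non-ascii>.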

   See [RFC5198] for a discussion of Unicode normalization;
   normalization form NFC [UNF] SHOULD be used. One of the most often
   cited goals of internationalization is to permit people to spell
   their names correctly. Since many mailbox local parts reflect
   personal names, that principle applies to mailboxes as well. The
   NFKC normalization form [UNF] SHOULD NOT be used because it may lose
   information that is needed to correctly spell some names in some
   unusual circumstances.
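
   The difference between the two normalization forms can be seen in a
   short, non-normative Python sketch. It relies on the standard
   unicodedata module; the helper name to_nfc and the sample strings are
   illustrative assumptions, not part of this specification.

```python
import unicodedata


def to_nfc(value: str) -> str:
    """Normalize a header field value to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", value)


# "e" followed by COMBINING ACUTE ACCENT composes to a single code point.
decomposed = "Jose\u0301"
composed = "Jos\u00e9"

# LATIN SMALL LIGATURE FI is a compatibility character: NFC keeps it,
# while NFKC replaces it with the two letters "f" and "i".
ligature = "\ufb01"

if __name__ == "__main__":
    print(to_nfc(decomposed) == composed)
    print(to_nfc(ligature) == ligature)
    print(unicodedata.normalize("NFKC", ligature) == "fi")
```

   NFC only unifies canonically equivalent sequences, whereas NFKC also
   folds compatibility characters, which is the kind of information loss
   the paragraph above warns about.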

3.2. Syntax Extensions to RFC 5322

   The following rules extend the ABNF syntax defined in [RFC5322] and
   [RFC5234] in order to allow UTF-8 content.

   VCHAR   =/  UTF8-non-ascii

   ctext   =/  UTF8-non-ascii

   atext   =/  UTF8-non-ascii

   qtext   =/  UTF8-non-ascii

   text    =/  UTF8-non-ascii
                 ; note that this upgrades the body to UTF-8

   dtext   =/  UTF8-non-ascii

   The preceding changes mean that the following constructs now allow
   UTF-8:

   1. Unstructured text, used in header fields like "Subject:" or
      "Content-description:".

   2. Any construct that uses atoms, including but not limited to the
      local parts of addresses and Message-IDs. This includes
      addresses in the "for" clauses of "Received:" header fields.

   3. Quoted strings.

   4. Domains.

   Note that header field names are not on this list; these are still
   restricted to ASCII.
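
   A minimal, non-normative Python sketch of the extended atext rule and
   of the unchanged restriction on field names follows. The names
   ASCII_ATEXT, is_atext, is_atom, and is_field_name are hypothetical.
   The sketch assumes its input is a Python str obtained by strictly
   decoding UTF-8, so every code point above 0x7F corresponds to a
   <UTF8-non-ascii> sequence. The ASCII character sets are those of the
   atext and ftext rules in RFC 5322.

```python
import string

# atext as defined in RFC 5322 (ASCII part only).
ASCII_ATEXT = frozenset(
    string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~"
)


def is_atext(ch: str) -> bool:
    """Extended atext: the RFC 5322 set plus any non-ASCII character."""
    return ch in ASCII_ATEXT or ord(ch) > 0x7F


def is_atom(token: str) -> bool:
    """An atom is one or more atext characters."""
    return len(token) > 0 and all(is_atext(ch) for ch in token)


def is_field_name(name: str) -> bool:
    """Field names stay ASCII-only: printable ASCII except the colon."""
    return len(name) > 0 and all(
        33 <= ord(ch) <= 126 and ch != ":" for ch in name
    )


if __name__ == "__main__":
    print(is_atom("jos\u00e9"))          # non-ASCII local part atom
    print(is_atom("jo se"))              # space is not atext
    print(is_field_name("Subject"))
    print(is_field_name("S\u00fajet"))   # non-ASCII field name
```

   The asymmetry is deliberate: the same non-ASCII character that is
   accepted inside an atom is rejected inside a field name.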

3.3. Use of 8-bit UTF-8 in Message-IDs

   Implementers of Message-ID generation algorithms MAY prefer to
   restrain their output to ASCII since that has some advantages, such
   as when constructing "In-reply-to:" and "References:" header fields
   in mailing-list threads where some senders use internationalized
   addresses and others do not.
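
   One possible way to keep generated Message-IDs within ASCII is
   sketched below in Python. This is an assumption about how an
   implementation might proceed, not a method defined by this
   specification. The helper name ascii_msgid is hypothetical;
   make_msgid comes from the standard email.utils module, and the
   domain is converted with Python's built-in "idna" codec, which
   implements the older IDNA 2003 rules.

```python
from email.utils import make_msgid


def ascii_msgid(domain: str) -> str:
    """Build a Message-ID whose right-hand side is an ASCII (A-label)
    form of the given domain (hypothetical helper)."""
    # The "idna" codec leaves ASCII domains unchanged and converts
    # non-ASCII labels to their "xn--" form.
    ascii_domain = domain.encode("idna").decode("ascii")
    # make_msgid builds the left-hand side from the time, the process
    # ID, and a random number, all of which are ASCII digits.
    return make_msgid(domain=ascii_domain)


if __name__ == "__main__":
    print(ascii_msgid("\u4f8b\u3048.example"))
    print(ascii_msgid("mail.example"))
```

   The resulting identifier can be copied into "In-reply-to:" and
   "References:" fields by agents that do not implement this extension.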

3.4. Effects on Line Length Limits

   Section 2.1.1 of [RFC5322] limits lines to 998 characters and
   recommends that the lines be restricted to only 78 characters. This
   specification changes the former limit to 998 octets. (Note that, in
   ASCII, octets and characters are effectively the same, but this is
   not true in UTF-8.) The 78-character limit remains defined in terms
   of characters, not octets, since it is intended to address display
   width issues, not line-length issues.
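
   The two limits are measured in different units, which the following
   non-normative Python sketch makes explicit. The constant and function
   names are hypothetical. The sketch counts Unicode code points as
   "characters", which is only an approximation of display width (for
   example, combining sequences and wide characters are not accounted
   for).

```python
from typing import Tuple

MAX_LINE_OCTETS = 998        # hard limit, excluding the CRLF
RECOMMENDED_LINE_CHARS = 78  # recommended limit, excluding the CRLF


def check_line(line: str) -> Tuple[bool, bool]:
    """Return (within the 998-octet limit, within the 78-character
    recommendation) for one line given without its trailing CRLF."""
    octets = len(line.encode("utf-8"))  # measured in octets
    chars = len(line)                   # measured in code points
    return octets <= MAX_LINE_OCTETS, chars <= RECOMMENDED_LINE_CHARS


if __name__ == "__main__":
    # 69 characters but 129 octets: within both limits.
    print(check_line("Subject: " + "\u00e9" * 60))
    # 500 characters but 1000 octets: exceeds the hard limit.
    print(check_line("\u00e9" * 500))
```

   A line of 500 two-octet characters would have satisfied a limit of
   998 characters but does not satisfy the limit of 998 octets.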

3.5. Changes to MIME Message Type Encoding Restrictions

   This specification updates Section 6.4 of [RFC2045]. [RFC2045]
   prohibits applying a content-transfer-encoding to any subtypes of
   "message/". This specification relaxes that rule -- it allows newly
   defined MIME types to permit content-transfer-encoding, and it allows
   content-transfer-encoding for message/global (see Section 3.7).

   Background: Normally, transfer of message/global will be done in
   8-bit-clean channels, and body parts will have "identity" encodings,
   that is, no decoding is necessary.

   But in the case where a message containing a message/global is
   downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding
   might have to be applied to the message. If the message travels
   multiple times between a 7-bit environment and an environment
   implementing these extensions, multiple levels of encoding may occur.
   This is expected to be rarely seen in practice, and the potential
   complexity of other ways of dealing with the issue is thought to be
   larger than the complexity of allowing nested encodings where
   necessary.

3.6. Use of MIME Encoded-Words

   The MIME encoded-words facility [RFC2047] provides the ability to
   place non-ASCII text, but only in a subset of the places allowed by
   this extension. Additionally, encoded-words are substantially more
   complex since they allow the use of arbitrary charsets. Accordingly,
   encoded-words SHOULD NOT be used when generating header fields for
   messages employing this extension. Agents MAY, when incorporating
   material from another message, convert encoded-word use to direct use
   of UTF-8.

   Note that care must be taken when decoding encoded-words because the
   results after replacing an encoded-word with its decoded equivalent
   in UTF-8 may be syntactically invalid. Processors that elect to
   decode encoded-words MUST NOT generate syntactically invalid fields.
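
   The hazard described above can be illustrated with a non-normative
   Python sketch for one specific case, a display name in an address
   field. The helper name decode_display_name and the SPECIALS set are
   hypothetical; decode_header and make_header are from the standard
   email.header module. The sketch assumes that re-quoting as a
   quoted-string is an acceptable repair when the decoded text contains
   RFC 5322 "specials"; other fields and constructs would need their
   own handling.

```python
from email.header import decode_header, make_header

# The RFC 5322 "specials", which may not appear unquoted in a phrase.
SPECIALS = frozenset('()<>[]:;@\\,."')


def decode_display_name(raw: str) -> str:
    """Replace encoded-words in a display name with direct UTF-8 and
    quote the result if it would otherwise be syntactically invalid."""
    decoded = str(make_header(decode_header(raw)))
    if any(ch in SPECIALS for ch in decoded):
        # Escape backslashes and quotes, then wrap in a quoted-string.
        escaped = decoded.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return decoded


if __name__ == "__main__":
    # The encoded-word hides a comma that would split the address list
    # if it were emitted unquoted.
    print(decode_display_name("=?utf-8?q?Doe=2C_Jos=C3=A9?="))
    print(decode_display_name("Jane Doe"))
```

   Without the quoting step, the decoded comma in the first example
   would be read as a separator between two mailboxes.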

3.7. The message/global Media Type

   Internationalized messages in this format MUST only be transmitted as
   authorized by [RFC6531] or within a non-SMTP environment that
   supports these messages. A message is a "message/global message" if:

   o  it contains 8-bit UTF-8 header values as specified in this
      document, or

   o  it contains 8-bit UTF-8 values in the header fields of body parts.

   The content of a message/global part is otherwise identical to that
   of a message/rfc822 part.

   If an object of this type is sent to a 7-bit-only system, it MUST
   have an appropriate content-transfer-encoding applied. (Note that a
   system compliant with MIME that doesn't recognize message/global is
   supposed to treat it as "application/octet-stream" as described in
   Section 5.2.4 of [RFC2046].)
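
   A simplified, non-normative Python sketch of these two decisions
   follows. The function names embedded_media_type and choose_cte are
   hypothetical. The sketch inspects only the top-level header section
   of a message, so it does not cover the second condition above
   (8-bit values in the header fields of body parts), and it does not
   verify that the 8-bit octets are well-formed UTF-8. The use of
   base64 for 7-bit-only destinations is an assumption; this
   specification only requires "an appropriate" encoding.

```python
def embedded_media_type(raw: bytes) -> str:
    """Pick the media type for embedding a message, looking only at
    its top-level header section (simplified, hypothetical helper)."""
    # The header section ends at the first empty line.
    header_section, _, _ = raw.partition(b"\r\n\r\n")
    if any(octet > 0x7F for octet in header_section):
        return "message/global"
    return "message/rfc822"


def choose_cte(media_type: str, seven_bit_only: bool) -> str:
    """Pick a content-transfer-encoding for a message/global part."""
    if media_type != "message/global":
        raise ValueError("this sketch only handles message/global")
    # 8bit is recommended where permitted; base64 is one encoding that
    # survives a 7-bit-only path (an assumption of this sketch).
    return "base64" if seven_bit_only else "8bit"


if __name__ == "__main__":
    utf8_header = "Subject: caf\u00e9\r\n\r\nbody\r\n".encode("utf-8")
    ascii_header = b"Subject: cafe\r\n\r\ncaf\xc3\xa9\r\n"
    print(embedded_media_type(utf8_header))
    print(embedded_media_type(ascii_header))
    print(choose_cte("message/global", seven_bit_only=True))
```

   In the second sample, the 8-bit octets occur only in the body, so the
   message remains an ordinary message/rfc822 object.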

   The registration is as follows:

   Type name: message

   Subtype name: global

   Required parameters: none

   Optional parameters: none

   Encoding considerations: Any content-transfer-encoding is permitted.
      The 8-bit or binary content-transfer-encodings are recommended
      where permitted.

   Security considerations: See Section 4.

   Interoperability considerations: This media type provides
      functionality similar to the message/rfc822 content type for email
      messages with internationalized email headers. When there is a
      need to embed or return such content in another message, there is
      generally an option to use this media type and leave the content
      unchanged or down-convert the content to message/rfc822. Each of
      these choices will interoperate with the installed base, but with
      different properties. Systems unaware of internationalized
      headers will typically treat a message/global body part as an
      unknown attachment, while they will understand the structure of a
      message/rfc822. However, systems that understand message/global
      will provide functionality superior to the result of a down-
      conversion to message/rfc822. The most interoperable choice
      depends on the deployed software.

   Published specification: RFC 6532

   Applications that use this media type: SMTP servers and email
      clients that support multipart/report generation or parsing.
      Email clients that forward messages with internationalized headers
      as attachments.

   Additional information:

      Magic number(s): none

      File extension(s): The extension ".u8msg" is suggested.

      Macintosh file type code(s): A uniform type identifier (UTI) of
      "public.utf8-email-message" is suggested. This conforms to
      "public.message" and "public.composite-content", but does not
      necessarily conform to "public.utf8-plain-text".

   Person & email address to contact for further information: See the
      Authors' Addresses section of this document.

   Intended usage: COMMON

   Restrictions on usage: This is a structured media type that embeds
      other MIME media types. An 8-bit or binary content-transfer-
      encoding SHOULD be used unless this media type is sent over a
      7-bit-only transport.

   Author: See the Authors' Addresses section of this document.

   Change controller: IETF Standards Process

4. Security Considerations

   Because UTF-8 often requires several octets to encode a single
   character, internationalization may cause header field values (in
   general) and mail addresses (in particular) to become longer. As
   specified in [RFC5322], each line of characters MUST be no more than
   998 octets, excluding the CRLF. On the other hand, MDA (Mail
   Delivery Agent) processes that parse, store, or handle email
   addresses or local parts must take extra care not to overflow
   buffers, truncate addresses, or exceed storage allotments. Also,
   they must take care, when comparing, to use the entire lengths of the
   addresses.

   There are many ways to use UTF-8 to represent something equivalent
   or similar to a particular displayed character or group of
   characters; see the security considerations in [RFC3629] for details
   on the problems this can cause. The normalization process described
   in Section 3.1 is recommended to minimize these issues.

   The security impact of UTF-8 headers on email signature systems such
   as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
   discussed in Section 14 of [RFC6530].

   If a user has a non-ASCII mailbox address and an ASCII mailbox
   address, a digital certificate that identifies that user might have
   both addresses in the identity. Having multiple email addresses as
   identities in a single certificate is already supported in PKIX
   (Public Key Infrastructure using X.509) [RFC5280] and OpenPGP
   [RFC3156], but there may be user-interface issues associated with the
   introduction of UTF-8 into addresses in this context.

5. IANA Considerations

   IANA has updated the registration of the message/global MIME type
   using the registration form contained in Section 3.7.

6. Acknowledgements

   This document incorporates many ideas first described in a draft
   document by Paul Hoffman, although many details have changed from
   that earlier work.

   The authors especially thank Jeff Yeh for his efforts and
   contributions on editing previous versions.

   Most of the content of this document was provided by John C Klensin
   and Dave Crocker. Significant comments and suggestions were received
   from Martin Duerst, Julien Elie, Arnt Gulbrandsen, Kristin Hubner,
   Kari Hurtta, Yangwoo Ko, Charles H. Lindsey, Alexey Melnikov, Chris
   Newman, Pete Resnick, Yoshiro Yoneya, and additional members of the
   Joint Engineering Team (JET) and were incorporated into the document.
   The authors wish to sincerely thank them all for their contributions.

7. References

7.1. Normative References

   [ASCII]    "Coded Character Set -- 7-bit American Standard Code for
              Information Interchange", ANSI X3.4, 1986.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
              October 2008.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [RFC6530]  Klensin, J. and Y. Ko, "Overview and Framework for
              Internationalized Email", RFC 6530, February 2012.

   [RFC6531]  Yao, J. and W. Mao, "SMTP Extension for Internationalized
              Email", RFC 6531, February 2012.

   [UNF]      Davis, M. and K. Whistler, "Unicode Standard Annex #15:
              Unicode Normalization Forms", September 2010,
              <http://www.unicode.org/reports/tr15/>.

7.2. Informative References

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII Text",
              RFC 2047, November 1996.

   [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T. Roessler,
              "MIME Security with OpenPGP", RFC 3156, August 2001.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, May 2008.

   [RFC5335]  Yang, A., "Internationalized Email Headers", RFC 5335,
              September 2008.

   [RFC6152]  Klensin, J., Freed, N., Rose, M., and D. Crocker, "SMTP
              Service Extension for 8-bit MIME Transport", STD 71,
              RFC 6152, March 2011.

Authors' Addresses

   Abel Yang
   TWNIC
   4F-2, No. 9, Sec 2, Roosevelt Rd.
   Taipei 100
   Taiwan

   Phone: +886 2 23411313 ext 505
   EMail: abelyang@twnic.net.tw


   Shawn Steele
   Microsoft

   EMail: Shawn.Steele@microsoft.com


   Ned Freed
   Oracle
   800 Royal Oaks
   Monrovia, CA 91016-6347
   USA

   EMail: ned+ietf@mrochek.com