Network Working Group                                           N. Freed
Request for Comments: 2231                                      Innosoft
Updates: 2045, 2047, 2183                                       K. Moore
Obsoletes: 2184                                  University of Tennessee
Category: Standards Track                                  November 1997


           MIME Parameter Value and Encoded Word Extensions:
              Character Sets, Languages, and Continuations


Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1997). All Rights Reserved.

1. Abstract

   This memo defines extensions to the RFC 2045 media type and RFC 2183
   disposition parameter value mechanisms to provide

    (1)   a means to specify parameter values in character sets
          other than US-ASCII,

    (2)   a means to specify the language to be used should the
          value be displayed, and

    (3)   a continuation mechanism for long parameter values to
          avoid problems with header line wrapping.

   This memo also defines an extension to the encoded words defined in
   RFC 2047 to allow the specification of the language to be used for
   display as well as the character set.

2. Introduction

   The Multipurpose Internet Mail Extensions, or MIME [RFC-2045,
   RFC-2046, RFC-2047, RFC-2048, RFC-2049], define a message format that
   allows for:

Freed & Moore               Standards Track                     [Page 1]

RFC 2231         MIME Value and Encoded Word Extensions    November 1997

    (1)   textual message bodies in character sets other than
          US-ASCII,

    (2)   non-textual message bodies,

    (3)   multi-part message bodies, and

    (4)   textual header information in character sets other than
          US-ASCII.

   MIME is now widely deployed and is used by a variety of Internet
   protocols, including, of course, Internet email. However, MIME's
   success has resulted in the need for additional mechanisms that were
   not provided in the original protocol specification.

   In particular, existing MIME mechanisms provide for named media type
   (content-type field) parameters as well as named disposition
   (content-disposition field) parameters. A MIME media type may specify
   any number of parameters associated with all of its subtypes, and any
   specific subtype may specify additional parameters for its own use. A
   MIME disposition value may specify any number of associated
   parameters, the most important of which is probably the attachment
   disposition's filename parameter.

   These parameter names and values end up appearing in the content-type
   and content-disposition header fields in Internet email. This
   inherently imposes three crucial limitations:

    (1)   Lines in Internet email header fields are folded
          according to RFC 822 folding rules. This makes long
          parameter values problematic.

    (2)   MIME headers, like the RFC 822 headers they often
          appear in, are limited to 7bit US-ASCII, and the
          encoded-word mechanisms of RFC 2047 are not available
          to parameter values. This makes it impossible to have
          parameter values in character sets other than US-ASCII
          without specifying some sort of private per-parameter
          encoding.

    (3)   It has recently become clear that character set
          information is not sufficient to properly display some
          sorts of information -- language information is also
          needed [RFC-2130]. For example, support for
          handicapped users may require reading text strings
          aloud. The language the text is written in is needed
          for this to be done correctly. Some parameter values
          may need to be displayed, hence there is a need to
          allow for the inclusion of language information.

   The last problem on this list is also an issue for the encoded words
   defined by RFC 2047, as encoded words are intended primarily for
   display purposes.

   This document defines extensions that address all of these
   limitations. All of these extensions are implemented in a fashion
   that is completely compatible at a syntactic level with existing MIME
   implementations. In addition, the extensions are designed to have as
   little impact as possible on existing uses of MIME.

   IMPORTANT NOTE: These mechanisms end up being somewhat gibbous when
   they actually are used. As such, these mechanisms should not be used
   lightly; they should be reserved for situations where a real need for
   them exists.

2.1. Requirements notation

   This document occasionally uses terms that appear in capital letters.
   When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
   appear capitalized, they are being used to indicate particular
   requirements of this specification. A discussion of the meanings of
   these terms appears in [RFC-2119].

3. Parameter Value Continuations

   Long MIME media type or disposition parameter values do not interact
   well with header line wrapping conventions. In particular, proper
   header line wrapping depends on there being places where linear
   whitespace (LWSP) is allowed, which may or may not be present in a
   parameter value, and even if present may not be recognizable as such
   since specific knowledge of parameter value syntax may not be
   available to the agent doing the line wrapping. The result is that
   long parameter values may end up getting truncated or otherwise
   damaged by incorrect line wrapping implementations.

   A mechanism is therefore needed to break up parameter values into
   smaller units that are amenable to line wrapping. Any such mechanism
   MUST be compatible with existing MIME processors. This means that

    (1)   the mechanism MUST NOT change the syntax of MIME media
          type and disposition lines, and

    (2)   the mechanism MUST NOT depend on parameter ordering
          since MIME states that parameters are not order
          sensitive. Note that while MIME does prohibit
          modification of MIME headers during transport, it is
          still possible that parameters will be reordered when
          user agent level processing is done.

   The obvious solution, then, is to use multiple parameters to contain
   a single parameter value and to use some kind of distinguished name
   to indicate when this is being done. And this obvious solution is
   exactly what is specified here: The asterisk character ("*") followed
   by a decimal count is employed to indicate that multiple parameters
   are being used to encapsulate a single parameter value. The count
   starts at 0 and increments by 1 for each subsequent section of the
   parameter value. Decimal values are used and neither leading zeroes
   nor gaps in the sequence are allowed.

   The original parameter value is recovered by concatenating the
   various sections of the parameter, in order. For example, the
   content-type field

     Content-Type: message/external-body; access-type=URL;
      URL*0="ftp://";
      URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"

   is semantically identical to

     Content-Type: message/external-body; access-type=URL;
      URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"

   Note that quotes around parameter values are part of the value
   syntax; they are NOT part of the value itself. Furthermore, it is
   explicitly permitted to have a mixture of quoted and unquoted
   continuation fields.

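   As an illustration only, the reassembly rule described in this
   section could be sketched in Python as follows. The function name
   join_continuations and its dictionary input are assumptions of this
   sketch and are not defined by this memo; the caller is assumed to
   have already split the header field into parameters and removed any
   quotes from the values.

```python
import re

# A section-numbered name: attribute, "*", then a decimal count with no
# leading zeroes (for example "URL*0" or "URL*12").
_SECTION = re.compile(r"^(?P<name>[^*]+)\*(?P<num>0|[1-9][0-9]*)$")


def join_continuations(params):
    """Merge NAME*0, NAME*1, ... entries into single NAME entries.

    `params` maps parameter names to unquoted values. The order of the
    entries does not matter, since parameters are not order sensitive.
    """
    merged = {}
    sections = {}
    for name, value in params.items():
        m = _SECTION.match(name)
        if m is None:
            merged[name.lower()] = value
        else:
            key = m.group("name").lower()
            sections.setdefault(key, {})[int(m.group("num"))] = value
    for name, parts in sections.items():
        # The counts must run 0, 1, 2, ... with no gaps.
        if sorted(parts) != list(range(len(parts))):
            raise ValueError(f"gap in continuation sequence for {name!r}")
        merged[name] = "".join(parts[i] for i in range(len(parts)))
    return merged
```

   Applied to the example above, with the two URL sections supplied in
   either order, this sketch yields a single "url" entry holding the
   complete FTP URL. Attribute names are lowercased because attribute
   matching is case-insensitive.
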
4. Parameter Value Character Set and Language Information

   Some parameter values may need to be qualified with character set or
   language information. It is clear that a distinguished parameter
   name is needed to identify when this information is present along
   with a specific syntax for the information in the value itself. In
   addition, a lightweight encoding mechanism is needed to accommodate
   8-bit information in parameter values.

   Asterisks ("*") are reused to provide the indicator that language and
   character set information is present and encoding is being used. A
   single quote ("'") is used to delimit the character set and language
   information at the beginning of the parameter value. Percent signs
   ("%") are used as the encoding flag, which agrees with RFC 2047.

   Specifically, an asterisk at the end of a parameter name acts as an
   indicator that character set and language information may appear at
   the beginning of the parameter value. A single quote is used to
   separate the character set, language, and actual value information in
   the parameter value string, and a percent sign is used to flag
   octets encoded in hexadecimal. For example:

     Content-Type: application/x-stuff;
      title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A

   Note that it is perfectly permissible to leave either the character
   set or language field blank. Note also that the single quote
   delimiters MUST be present even when one of the field values is
   omitted. This is done when either character set, language, or both
   are not relevant to the parameter value at hand. This MUST NOT be
   done in order to indicate a default character set or language --
   parameter field definitions MUST NOT assign a default character set
   or language.

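   The value syntax described in this section could be produced and
   consumed along the following lines. This is a minimal Python sketch,
   not a definitive implementation: the helper names
   encode_extended_value and decode_extended_value are hypothetical, and
   decoding the octets as Latin-1 when the character set field is blank
   is a choice made only for this illustration.

```python
import string
from urllib.parse import unquote_to_bytes

# Octets that may appear unencoded: letters, digits, and the remaining
# printable US-ASCII characters other than "*", "'", "%" and tspecials.
_ATTRIBUTE_CHARS = frozenset(
    (string.ascii_letters + string.digits + "!#$&+-.^_`{|}~").encode("ascii")
)


def encode_extended_value(text, charset, language=""):
    """Build charset'language'value, writing other octets as %XX."""
    octets = text.encode(charset)
    body = "".join(
        chr(octet) if octet in _ATTRIBUTE_CHARS else f"%{octet:02X}"
        for octet in octets
    )
    return f"{charset}'{language}'{body}"


def decode_extended_value(raw):
    """Return (charset, language, text); blank fields become None."""
    charset, quote1, rest = raw.partition("'")
    language, quote2, encoded = rest.partition("'")
    if not (quote1 and quote2):
        raise ValueError("both single quote delimiters must be present")
    octets = unquote_to_bytes(encoded)
    # Assumption of this sketch: a blank charset is decoded as Latin-1
    # so the octets stay readable. The memo itself defines no default.
    text = octets.decode(charset or "latin-1")
    return charset or None, language or None, text
```

   With these helpers, encoding the text "This is ***fun***" with
   character set us-ascii and language en-us reproduces the title*
   value shown in the example above, and decoding that value returns
   the original text together with both labels.
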
4.1. Combining Character Set, Language, and Parameter Continuations

   Character set and language information may be combined with the
   parameter continuation mechanism. For example:

     Content-Type: application/x-stuff;
      title*0*=us-ascii'en'This%20is%20even%20more%20;
      title*1*=%2A%2A%2Afun%2A%2A%2A%20;
      title*2="isn't it!"

   Note that:

    (1)   Language and character set information only appear at
          the beginning of a given parameter value.

    (2)   Continuations do not provide a facility for using more
          than one character set or language in the same
          parameter value.

    (3)   A value presented using multiple continuations may
          contain a mixture of encoded and unencoded segments.

    (4)   The first segment of a continuation MUST be encoded if
          language and character set information are given.

    (5)   If the first segment of a continued parameter value is
          encoded the language and character set field delimiters
          MUST be present even when the fields are left blank.

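   The rules listed above might be combined in a decoder roughly as
   follows. This Python sketch is illustrative only: the function name
   decode_parameter and its dictionary input are assumptions, values are
   assumed to have had any surrounding quotes removed, and falling back
   to US-ASCII when no character set is given is a choice of the sketch
   rather than a rule of this memo.

```python
import re
from urllib.parse import unquote_to_bytes

# attribute, optional "*" section count, optional trailing "*" flag.
_NAME = re.compile(
    r"^(?P<attr>[^*]+)(?:\*(?P<num>0|[1-9][0-9]*))?(?P<ext>\*)?$"
)


def decode_parameter(attr, params):
    """Recover (charset, language, text) for `attr` from raw parameters.

    `params` maps raw names such as "title*0*" to unquoted values.
    """
    segments = {}
    for name, value in params.items():
        m = _NAME.match(name)
        if m and m.group("attr").lower() == attr.lower():
            index = int(m.group("num") or 0)
            segments[index] = (value, m.group("ext") is not None)
    if not segments:
        raise KeyError(attr)
    if sorted(segments) != list(range(len(segments))):
        raise ValueError("missing or out-of-sequence sections")
    charset = language = None
    octets = b""
    for index in range(len(segments)):
        value, encoded = segments[index]
        if index == 0 and encoded:
            # Only the first segment carries charset'language'.
            charset, language, value = value.split("'", 2)
        # Encoded segments carry %XX escapes; plain ones are literal.
        octets += unquote_to_bytes(value) if encoded else value.encode("ascii")
    # Assumption of this sketch: US-ASCII when no charset is labeled.
    return charset or None, language or None, octets.decode(charset or "us-ascii")
```

   For the three title sections in the example above, this sketch
   returns the character set "us-ascii", the language "en", and the
   text "This is even more ***fun*** isn't it!", regardless of the
   order in which the sections are supplied.
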
5. Language specification in Encoded Words

   RFC 2047 provides support for non-US-ASCII character sets in RFC 822
   message header comments, phrases, and any unstructured text field.
   This is done by defining an encoded word construct which can appear
   in any of these places. Given that these are fields intended for
   display, it is sometimes necessary to associate language information
   with encoded words as well as just the character set. This
   specification extends the definition of an encoded word to allow the
   inclusion of such information. This is simply done by suffixing the
   character set specification with an asterisk followed by the language
   tag. For example:

     From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu>

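   A reader of encoded words could separate the language tag from the
   character set along these lines. The Python sketch below is only an
   illustration: the name split_encoded_word is hypothetical, only the
   "B" and "Q" encodings of RFC 2047 are recognized, and the encoded
   text is returned without being decoded.

```python
import re

# "=?" charset ["*" language] "?" encoding "?" encoded-text "?="
_ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<language>[^?]+))?"
    r"\?(?P<encoding>[BbQq])\?(?P<text>[^?]*)\?="
)


def split_encoded_word(word):
    """Return (charset, language, encoding, encoded_text).

    `language` is None when the encoded word carries no language tag.
    """
    m = _ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError("not an encoded word")
    return (
        m.group("charset"),
        m.group("language"),
        m.group("encoding").upper(),
        m.group("text"),
    )
```

   For the encoded word in the From field above, this sketch separates
   "US-ASCII" from the language tag "EN". An encoded word without the
   asterisk suffix is still accepted, with the language reported as
   absent.
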
6. IMAP4 Handling of Parameter Values

   IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations
   when generating the BODY and BODYSTRUCTURE fetch attributes.

7. Modifications to MIME ABNF

   The ABNF for MIME parameter values given in RFC 2045 is:

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

   This specification changes this ABNF to:

     parameter := regular-parameter / extended-parameter

     regular-parameter := regular-parameter-name "=" value

     regular-parameter-name := attribute [section]

     attribute := 1*attribute-char

     attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
                       "*", "'", "%", or tspecials>

     section := initial-section / other-sections

     initial-section := "*0"

     other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
                            "6" / "7" / "8" / "9") *DIGIT

     extended-parameter := (extended-initial-name "="
                            extended-initial-value) /
                           (extended-other-names "="
                            extended-other-values)

     extended-initial-name := attribute [initial-section] "*"

     extended-other-names := attribute other-sections "*"

     extended-initial-value := [charset] "'" [language] "'"
                               extended-other-values

     extended-other-values := *(ext-octet / attribute-char)

     ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")

     charset := <registered character set name>

     language := <registered language tag [RFC-1766]>

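   The parameter name grammar above can be approximated with a regular
   expression. The Python sketch below is an illustration under stated
   assumptions: the name classify_name is hypothetical, and the
   character class spells out attribute-char using the tspecials list
   of RFC 2045.

```python
import re

# attribute-char: printable US-ASCII except SPACE, "*", "'", "%" and
# the RFC 2045 tspecials ( ) < > @ , ; : \ " / [ ] ? =
_ATTRIBUTE = r"[!#$&+\-.0-9A-Z^_`a-z{|}~]+"

_PARAM_NAME = re.compile(
    rf"(?P<attribute>{_ATTRIBUTE})"
    r"(?:\*(?P<section>0|[1-9][0-9]*))?"  # "*0" or "*" plus 1-9 and digits
    r"(?P<extended>\*)?"                  # trailing "*" marks an extended name
)


def classify_name(name):
    """Return (attribute, section, is_extended) for a parameter name.

    `section` is None when the name carries no continuation count.
    """
    m = _PARAM_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a valid parameter name: {name!r}")
    section = m.group("section")
    return (
        m.group("attribute"),
        None if section is None else int(section),
        m.group("extended") is not None,
    )
```

   Under this sketch "title*0*" is an extended name for section 0 of
   "title", "URL*1" is a regular name for section 1 of "URL", and a
   name with a leading zero in its count, such as "URL*01", is
   rejected.
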
   The ABNF given in RFC 2047 for encoded-words is:

     encoded-word := "=?" charset "?" encoding "?" encoded-text "?="

   This specification changes this ABNF to:

     encoded-word := "=?" charset ["*" language] "?" encoding "?"
                     encoded-text "?="

8. Character sets which allow specification of language

   In the future it is likely that some character sets will provide
   facilities for inline language labeling. Such facilities are
   inherently more flexible than those defined here as they allow for
   language switching in the middle of a string.

   If and when such facilities are developed they SHOULD be used in
   preference to the language labeling facilities specified here. Note
   that all the mechanisms defined here allow for the omission of
   language labels so as to be able to accommodate this possible future
   usage.

9. Security Considerations

   This RFC does not discuss security issues and is not believed to
   raise any security issues not already endemic in electronic mail and
   present in fully conforming implementations of MIME.

10. References

   [RFC-822]
        Crocker, D., "Standard for the Format of ARPA Internet
        Text Messages", STD 11, RFC 822, August 1982.

   [RFC-1766]
        Alvestrand, H., "Tags for the Identification of
        Languages", RFC 1766, March 1995.

   [RFC-2045]
        Freed, N. and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part One: Format of Internet Message
        Bodies", RFC 2045, December 1996.

   [RFC-2046]
        Freed, N. and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part Two: Media Types", RFC 2046,
        December 1996.

   [RFC-2047]
        Moore, K., "Multipurpose Internet Mail Extensions (MIME)
        Part Three: Representation of Non-ASCII Text in Internet
        Message Headers", RFC 2047, December 1996.

   [RFC-2048]
        Freed, N., Klensin, J. and J. Postel, "Multipurpose
        Internet Mail Extensions (MIME) Part Four: MIME
        Registration Procedures", RFC 2048, December 1996.

   [RFC-2049]
        Freed, N. and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part Five: Conformance Criteria and
        Examples", RFC 2049, December 1996.

   [RFC-2060]
        Crispin, M., "Internet Message Access Protocol - Version
        4rev1", RFC 2060, December 1996.

   [RFC-2119]
        Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", RFC 2119, March 1997.

   [RFC-2130]
        Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
        Atkinson, R., Crispin, M., and P. Svanberg, "Report from the
        IAB Character Set Workshop", RFC 2130, April 1997.

   [RFC-2183]
        Troost, R., Dorner, S. and K. Moore, "Communicating
        Presentation Information in Internet Messages: The
        Content-Disposition Header", RFC 2183, August 1997.

11. Authors' Addresses

   Ned Freed
   Innosoft International, Inc.
   1050 Lakes Drive
   West Covina, CA 91790
   USA

   Phone: +1 626 919 3600
   Fax:   +1 626 919 3614
   EMail: ned.freed@innosoft.com


   Keith Moore
   Computer Science Dept.
   University of Tennessee
   107 Ayres Hall
   Knoxville, TN 37996-1301
   USA

   EMail: moore@cs.utk.edu

12. Full Copyright Statement

   Copyright (C) The Internet Society (1997). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
