Network Working Group                                           N. Freed
Request for Comments: 2046                                      Innosoft
Obsoletes: 1521, 1522, 1590                                N. Borenstein
Category: Standards Track                                  First Virtual
                                                           November 1996


                 Multipurpose Internet Mail Extensions
                            (MIME) Part Two:
                              Media Types
17
18Status of this Memo
19
20 This document specifies an Internet standards track protocol for the
21 Internet community, and requests discussion and suggestions for
22 improvements. Please refer to the current edition of the "Internet
23 Official Protocol Standards" (STD 1) for the standardization state
24 and status of this protocol. Distribution of this memo is unlimited.
25
26Abstract
27
28 STD 11, RFC 822 defines a message representation protocol specifying
29 considerable detail about US-ASCII message headers, but which leaves
30 the message content, or message body, as flat US-ASCII text. This
31 set of documents, collectively called the Multipurpose Internet Mail
32 Extensions, or MIME, redefines the format of messages to allow for
33
34 (1) textual message bodies in character sets other than
35 US-ASCII,
36
37 (2) an extensible set of different formats for non-textual
38 message bodies,
39
40 (3) multi-part message bodies, and
41
42 (4) textual header information in character sets other than
43 US-ASCII.
44
   These documents are based on earlier work documented in RFC 934, STD
   11, and RFC 1049, but extend and revise that work. Because RFC 822
   said so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.
49
50 The initial document in this set, RFC 2045, specifies the various
51 headers used to describe the structure of MIME messages. This second
52 document defines the general structure of the MIME media typing
53 system and defines an initial set of media types. The third document,
54 RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text
55
56
57
58Freed & Borenstein Standards Track [Page 1]
59
60RFC 2046 Media Types November 1996
61
62
63 data in Internet mail header fields. The fourth document, RFC 2048,
64 specifies various IANA registration procedures for MIME-related
65 facilities. The fifth and final document, RFC 2049, describes MIME
66 conformance criteria as well as providing some illustrative examples
67 of MIME message formats, acknowledgements, and the bibliography.
68
69 These documents are revisions of RFCs 1521 and 1522, which themselves
70 were revisions of RFCs 1341 and 1342. An appendix in RFC 2049
71 describes differences and changes from previous versions.
72
73Table of Contents
74
75 1. Introduction ......................................... 3
76 2. Definition of a Top-Level Media Type ................. 4
77 3. Overview Of The Initial Top-Level Media Types ........ 4
78 4. Discrete Media Type Values ........................... 6
79 4.1 Text Media Type ..................................... 6
80 4.1.1 Representation of Line Breaks ..................... 7
81 4.1.2 Charset Parameter ................................. 7
82 4.1.3 Plain Subtype ..................................... 11
83 4.1.4 Unrecognized Subtypes ............................. 11
84 4.2 Image Media Type .................................... 11
85 4.3 Audio Media Type .................................... 11
86 4.4 Video Media Type .................................... 12
87 4.5 Application Media Type .............................. 12
88 4.5.1 Octet-Stream Subtype .............................. 13
89 4.5.2 PostScript Subtype ................................ 14
90 4.5.3 Other Application Subtypes ........................ 17
91 5. Composite Media Type Values .......................... 17
92 5.1 Multipart Media Type ................................ 17
93 5.1.1 Common Syntax ..................................... 19
94 5.1.2 Handling Nested Messages and Multiparts ........... 24
95 5.1.3 Mixed Subtype ..................................... 24
96 5.1.4 Alternative Subtype ............................... 24
97 5.1.5 Digest Subtype .................................... 26
98 5.1.6 Parallel Subtype .................................. 27
99 5.1.7 Other Multipart Subtypes .......................... 28
100 5.2 Message Media Type .................................. 28
101 5.2.1 RFC822 Subtype .................................... 28
102 5.2.2 Partial Subtype ................................... 29
103 5.2.2.1 Message Fragmentation and Reassembly ............ 30
104 5.2.2.2 Fragmentation and Reassembly Example ............ 31
105 5.2.3 External-Body Subtype ............................. 33
106 5.2.4 Other Message Subtypes ............................ 40
107 6. Experimental Media Type Values ....................... 40
108 7. Summary .............................................. 41
109 8. Security Considerations .............................. 41
110 9. Authors' Addresses ................................... 42
119 A. Collected Grammar .................................... 43
120
1211. Introduction
122
123 The first document in this set, RFC 2045, defines a number of header
124 fields, including Content-Type. The Content-Type field is used to
125 specify the nature of the data in the body of a MIME entity, by
126 giving media type and subtype identifiers, and by providing auxiliary
127 information that may be required for certain media types. After the
128 type and subtype names, the remainder of the header field is simply a
129 set of parameters, specified in an attribute/value notation. The
130 ordering of parameters is not significant.
131
132 In general, the top-level media type is used to declare the general
133 type of data, while the subtype specifies a specific format for that
134 type of data. Thus, a media type of "image/xyz" is enough to tell a
135 user agent that the data is an image, even if the user agent has no
136 knowledge of the specific image format "xyz". Such information can
137 be used, for example, to decide whether or not to show a user the raw
138 data from an unrecognized subtype -- such an action might be
139 reasonable for unrecognized subtypes of "text", but not for
140 unrecognized subtypes of "image" or "audio". For this reason,
141 registered subtypes of "text", "image", "audio", and "video" should
142 not contain embedded information that is really of a different type.
143 Such compound formats should be represented using the "multipart" or
144 "application" types.
145
146 Parameters are modifiers of the media subtype, and as such do not
147 fundamentally affect the nature of the content. The set of
148 meaningful parameters depends on the media type and subtype. Most
149 parameters are associated with a single specific subtype. However, a
150 given top-level media type may define parameters which are applicable
151 to any subtype of that type. Parameters may be required by their
152 defining media type or subtype or they may be optional. MIME
153 implementations must also ignore any parameters whose names they do
154 not recognize.
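
   As a minimal sketch of the parameter rules above (it is not part of
   this specification), the following Python fragment splits a
   Content-Type value into its media type and parameters using the
   standard library's email package. The KNOWN_PARAMS table and the
   function name parse_content_type are hypothetical names introduced
   here for illustration. Parameters whose names are not in the table
   are ignored, and the result does not depend on parameter order.

```python
from email.message import Message

# Hypothetical table: the parameters this sketch recognizes per media type.
KNOWN_PARAMS = {"text/plain": {"charset"}}


def parse_content_type(value):
    """Return (media_type, params) for a Content-Type field value."""
    msg = Message()
    msg["Content-Type"] = value
    media_type = msg.get_content_type()  # lowercased "type/subtype"
    recognized = KNOWN_PARAMS.get(media_type, set())
    params = {}
    # get_params() yields (name, value) pairs; the first pair is the
    # media type itself, so it is skipped here.
    for name, val in msg.get_params()[1:]:
        name = name.lower()
        if name in recognized:  # unrecognized parameter names are ignored
            params[name] = val
    return media_type, params
```

   Because the parameters are collected into a dictionary, two field
   values that differ only in parameter order compare equal.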
155
   MIME's Content-Type header field and media type mechanism have been
   carefully designed to be extensible, and it is expected that the set
158 of media type/subtype pairs and their associated parameters will grow
159 significantly over time. Several other MIME facilities, such as
160 transfer encodings and "message/external-body" access types, are
161 likely to have new values defined over time. In order to ensure that
162 the set of such values is developed in an orderly, well-specified,
163 and public manner, MIME sets up a registration process which uses the
164 Internet Assigned Numbers Authority (IANA) as a central registry for
165 MIME's various areas of extensibility. The registration process for
166 these areas is described in a companion document, RFC 2048.
167
   The initial seven standard top-level media types are defined and
   described in the remainder of this document.
177
1782. Definition of a Top-Level Media Type
179
180 The definition of a top-level media type consists of:
181
182 (1) a name and a description of the type, including
183 criteria for whether a particular type would qualify
184 under that type,
185
186 (2) the names and definitions of parameters, if any, which
187 are defined for all subtypes of that type (including
188 whether such parameters are required or optional),
189
190 (3) how a user agent and/or gateway should handle unknown
191 subtypes of this type,
192
193 (4) general considerations on gatewaying entities of this
194 top-level type, if any, and
195
196 (5) any restrictions on content-transfer-encodings for
197 entities of this top-level type.
198
1993. Overview Of The Initial Top-Level Media Types
200
201 The five discrete top-level media types are:
202
203 (1) text -- textual information. The subtype "plain" in
204 particular indicates plain text containing no
205 formatting commands or directives of any sort. Plain
206 text is intended to be displayed "as-is". No special
207 software is required to get the full meaning of the
208 text, aside from support for the indicated character
209 set. Other subtypes are to be used for enriched text in
210 forms where application software may enhance the
211 appearance of the text, but such software must not be
212 required in order to get the general idea of the
213 content. Possible subtypes of "text" thus include any
214 word processor format that can be read without
215 resorting to software that understands the format. In
          particular, formats that employ embedded binary
217 formatting information are not considered directly
218 readable. A very simple and portable subtype,
219 "richtext", was defined in RFC 1341, with a further
220 revision in RFC 1896 under the name "enriched".
221
231 (2) image -- image data. "Image" requires a display device
232 (such as a graphical display, a graphics printer, or a
          FAX machine) to view the information. An initial
          subtype, "jpeg", is defined in this document for the
          widely-used JPEG image format.
237
238 (3) audio -- audio data. "Audio" requires an audio output
239 device (such as a speaker or a telephone) to "display"
240 the contents. An initial subtype "basic" is defined in
241 this document.
242
243 (4) video -- video data. "Video" requires the capability
244 to display moving images, typically including
245 specialized hardware and software. An initial subtype
246 "mpeg" is defined in this document.
247
248 (5) application -- some other kind of data, typically
249 either uninterpreted binary data or information to be
250 processed by an application. The subtype "octet-
251 stream" is to be used in the case of uninterpreted
252 binary data, in which case the simplest recommended
253 action is to offer to write the information into a file
254 for the user. The "PostScript" subtype is also defined
          for the transport of PostScript material. Other
          expected uses for "application" include spreadsheets,
          data for mail-based scheduling systems, languages
          for "active" (computational) messaging, and word
          processing formats that are not directly readable.
260 Note that security considerations may exist for some
261 types of application data, most notably
262 "application/PostScript" and any form of active
263 messaging. These issues are discussed later in this
264 document.
265
266 The two composite top-level media types are:
267
268 (1) multipart -- data consisting of multiple entities of
269 independent data types. Four subtypes are initially
270 defined, including the basic "mixed" subtype specifying
271 a generic mixed set of parts, "alternative" for
272 representing the same data in multiple formats,
273 "parallel" for parts intended to be viewed
274 simultaneously, and "digest" for multipart entities in
275 which each part has a default type of "message/rfc822".
276
287 (2) message -- an encapsulated message. A body of media
288 type "message" is itself all or a portion of some kind
289 of message object. Such objects may or may not in turn
290 contain other entities. The "rfc822" subtype is used
291 when the encapsulated content is itself an RFC 822
292 message. The "partial" subtype is defined for partial
293 RFC 822 messages, to permit the fragmented transmission
294 of bodies that are thought to be too large to be passed
295 through transport facilities in one piece. Another
296 subtype, "external-body", is defined for specifying
297 large bodies by reference to an external data source.
298
299 It should be noted that the list of media type values given here may
300 be augmented in time, via the mechanisms described above, and that
301 the set of subtypes is expected to grow substantially.
302
3034. Discrete Media Type Values
304
305 Five of the seven initial media type values refer to discrete bodies.
306 The content of these types must be handled by non-MIME mechanisms;
307 they are opaque to MIME processors.
308
3094.1. Text Media Type
310
311 The "text" media type is intended for sending material which is
312 principally textual in form. A "charset" parameter may be used to
313 indicate the character set of the body text for "text" subtypes,
314 notably including the subtype "text/plain", which is a generic
315 subtype for plain text. Plain text does not provide for or allow
316 formatting commands, font attribute specifications, processing
317 instructions, interpretation directives, or content markup. Plain
318 text is seen simply as a linear sequence of characters, possibly
319 interrupted by line breaks or page breaks. Plain text may allow the
320 stacking of several characters in the same position in the text.
321 Plain text in scripts like Arabic and Hebrew may also include
   facilities that allow the arbitrary mixing of text segments with
323 opposite writing directions.
324
325 Beyond plain text, there are many formats for representing what might
326 be known as "rich text". An interesting characteristic of many such
327 representations is that they are to some extent readable even without
328 the software that interprets them. It is useful, then, to
329 distinguish them, at the highest level, from such unreadable data as
330 images, audio, or text represented in an unreadable form. In the
331 absence of appropriate interpretation software, it is reasonable to
332 show subtypes of "text" to the user, while it is not reasonable to do
333 so with most nontextual data. Such formatted textual data should be
334 represented using subtypes of "text".
335
3434.1.1. Representation of Line Breaks
344
345 The canonical form of any MIME "text" subtype MUST always represent a
346 line break as a CRLF sequence. Similarly, any occurrence of CRLF in
347 MIME "text" MUST represent a line break. Use of CR and LF outside of
348 line break sequences is also forbidden.
349
350 This rule applies regardless of format or character set or sets
351 involved.
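
   The line break rule can be illustrated with a short Python sketch.
   It rests on one assumption that this document does not make: that
   the local form of the text marks line breaks with CRLF, a bare CR,
   or a bare LF. The function names are hypothetical.

```python
import re

# Any local line break: CRLF first, so it is not counted as two breaks.
_ANY_BREAK = re.compile(r"\r\n|\r|\n")
# A CR not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(r"\r(?!\n)|(?<!\r)\n")


def to_canonical_text(local_text):
    """Rewrite each local line break (CRLF, CR, or LF) as CRLF."""
    return _ANY_BREAK.sub("\r\n", local_text)


def is_canonical_text(text):
    """True if CR and LF occur only together as CRLF line breaks."""
    return _BARE_CR_OR_LF.search(text) is None
```

   A sender would apply to_canonical_text before any transfer encoding
   is applied, and a receiver could use is_canonical_text as a sanity
   check on decoded "text" bodies.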
352
353 NOTE: The proper interpretation of line breaks when a body is
354 displayed depends on the media type. In particular, while it is
355 appropriate to treat a line break as a transition to a new line when
356 displaying a "text/plain" body, this treatment is actually incorrect
357 for other subtypes of "text" like "text/enriched" [RFC-1896].
358 Similarly, whether or not line breaks should be added during display
359 operations is also a function of the media type. It should not be
360 necessary to add any line breaks to display "text/plain" correctly,
361 whereas proper display of "text/enriched" requires the appropriate
362 addition of line breaks.
363
   NOTE: Some protocols define a maximum line length. For example, SMTP
   [RFC-821] allows a maximum of 998 octets before the next CRLF
   sequence. To be transported by such protocols, data which includes
   segments that are too long without CRLF sequences must be encoded
   with a suitable content-transfer-encoding.
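
   A sender might test for this condition along the following lines.
   This is only a sketch: the function name is hypothetical, and the
   998-octet figure is simply the SMTP limit cited in the NOTE above.

```python
MAX_LINE_OCTETS = 998  # the SMTP limit cited in the NOTE above


def exceeds_line_limit(body, limit=MAX_LINE_OCTETS):
    """True if any run of octets between CRLF sequences is longer than limit."""
    return any(len(segment) > limit for segment in body.split(b"\r\n"))
```

   When this returns True for a body, the body needs a suitable
   content-transfer-encoding before it is handed to such a transport.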
369
3704.1.2. Charset Parameter
371
372 A critical parameter that may be specified in the Content-Type field
373 for "text/plain" data is the character set. This is specified with a
374 "charset" parameter, as in:
375
376 Content-type: text/plain; charset=iso-8859-1
377
378 Unlike some other parameter values, the values of the charset
379 parameter are NOT case sensitive. The default character set, which
380 must be assumed in the absence of a charset parameter, is US-ASCII.
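
   The defaulting and case rules can be sketched in Python with the
   standard library's email package. The function name
   effective_charset is hypothetical, and the sketch assumes a plain
   (unencoded) parameter value.

```python
from email.message import Message

DEFAULT_CHARSET = "us-ascii"  # assumed when no charset parameter is present


def effective_charset(content_type_value):
    """Return the charset of a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    charset = msg.get_param("charset", DEFAULT_CHARSET)
    # Charset values are not case sensitive, so normalize before comparing.
    return charset.lower()
```

   Lowercasing is one way to make comparisons case-insensitive; any
   consistent normalization would serve equally well.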
381
382 The specification for any future subtypes of "text" must specify
383 whether or not they will also utilize a "charset" parameter, and may
384 possibly restrict its values as well. For other subtypes of "text"
385 than "text/plain", the semantics of the "charset" parameter should be
386 defined to be identical to those specified here for "text/plain",
387 i.e., the body consists entirely of characters in the given charset.
388 In particular, definers of future "text" subtypes should pay close
389 attention to the implications of multioctet character sets for their
390 subtype definitions.
391
399 The charset parameter for subtypes of "text" gives a name of a
400 character set, as "character set" is defined in RFC 2045. The rules
401 regarding line breaks detailed in the previous section must also be
402 observed -- a character set whose definition does not conform to
403 these rules cannot be used in a MIME "text" subtype.
404
405 An initial list of predefined character set names can be found at the
406 end of this section. Additional character sets may be registered
407 with IANA.
408
409 Other media types than subtypes of "text" might choose to employ the
410 charset parameter as defined here, but with the CRLF/line break
411 restriction removed. Therefore, all character sets that conform to
412 the general definition of "character set" in RFC 2045 can be
413 registered for MIME use.
414
415 Note that if the specified character set includes 8-bit characters
416 and such characters are used in the body, a Content-Transfer-Encoding
417 header field and a corresponding encoding on the data are required in
418 order to transmit the body via some mail transfer protocols, such as
419 SMTP [RFC-821].
420
421 The default character set, US-ASCII, has been the subject of some
422 confusion and ambiguity in the past. Not only were there some
423 ambiguities in the definition, there have been wide variations in
424 practice. In order to eliminate such ambiguity and variations in the
425 future, it is strongly recommended that new user agents explicitly
426 specify a character set as a media type parameter in the Content-Type
427 header field. "US-ASCII" does not indicate an arbitrary 7-bit
428 character set, but specifies that all octets in the body must be
429 interpreted as characters according to the US-ASCII character set.
430 National and application-oriented versions of ISO 646 [ISO-646] are
431 usually NOT identical to US-ASCII, and in that case their use in
432 Internet mail is explicitly discouraged. The omission of the ISO 646
433 character set from this document is deliberate in this regard. The
434 character set name of "US-ASCII" explicitly refers to the character
   set defined in ANSI X3.4-1986 [US-ASCII]. The new international
436 reference version (IRV) of the 1991 edition of ISO 646 is identical
437 to US-ASCII. The character set name "ASCII" is reserved and must not
438 be used for any purpose.
439
440 NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
441 version of the American Standard. Insofar as one of the purposes of
442 specifying a media type and character set is to permit the receiver
443 to unambiguously determine how the sender intended the coded message
444 to be interpreted, assuming anything other than "strict ASCII" as the
445 default would risk unintentional and incompatible changes to the
446 semantics of messages now being transmitted. This also implies that
455 messages containing characters coded according to other versions of
456 ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
457 procedures (e.g., those of ISO 2022), as well as 8bit or multiple
458 octet character encodings MUST use an appropriate character set
459 specification to be consistent with MIME.
460
   The complete US-ASCII character set is listed in ANSI X3.4-1986.
   Note that the control characters including DEL (0-31, 127) have no
   defined meaning apart from the combination CRLF (US-ASCII values
   13 and 10) indicating a new line. Two of the characters have de
465 facto meanings in wide use: FF (12) often means "start subsequent
466 text on the beginning of a new page"; and TAB or HT (9) often (though
467 not always) means "move the cursor to the next available column after
468 the current position where the column number is a multiple of 8
469 (counting the first column as column 0)." Aside from these
470 conventions, any use of the control characters or DEL in a body must
471 either occur
472
473 (1) because a subtype of text other than "plain"
474 specifically assigns some additional meaning, or
475
476 (2) within the context of a private agreement between the
477 sender and recipient. Such private agreements are
478 discouraged and should be replaced by the other
479 capabilities of this document.
480
   NOTE: An enormous proliferation of character sets exists beyond
   US-ASCII. A large number of partially or totally overlapping character
   sets is NOT a good thing. A SINGLE character set that can be used
   universally for representing all of the world's languages in Internet
   mail would be preferable. Unfortunately, existing practice in
486 several communities seems to point to the continued use of multiple
487 character sets in the near future. A small number of standard
488 character sets are, therefore, defined for Internet use in this
489 document.
490
491 The defined charset values are:
492
493 (1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].
494
495 (2) ISO-8859-X -- where "X" is to be replaced, as
496 necessary, for the parts of ISO-8859 [ISO-8859]. Note
497 that the ISO 646 character sets have deliberately been
498 omitted in favor of their 8859 replacements, which are
499 the designated character sets for Internet mail. As of
500 the publication of this document, the legitimate values
501 for "X" are the digits 1 through 10.
502
   Characters in the range 128-159 have no assigned meaning in
   ISO-8859-X. Characters with values below 128 in ISO-8859-X have the
   same assigned meaning as they do in US-ASCII.
514
   Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
   alphabet) include both characters for which the normal writing
   direction is right to left and characters for which it is left to
   right, but do not define a canonical ordering method for representing
   bi-directional text. The charset values "ISO-8859-6" and
   "ISO-8859-8", however, specify that the visual method is used
   [RFC-1556].
521
522 All of these character sets are used as pure 7bit or 8bit sets
523 without any shift or escape functions. The meaning of shift and
524 escape sequences in these character sets is not defined.
525
526 The character sets specified above are the ones that were relatively
527 uncontroversial during the drafting of MIME. This document does not
528 endorse the use of any particular character set other than US-ASCII,
529 and recognizes that the future evolution of world character sets
530 remains unclear.
531
   Note that the character set used, if anything other than US-ASCII,
533 must always be explicitly specified in the Content-Type field.
534
535 No character set name other than those defined above may be used in
536 Internet mail without the publication of a formal specification and
537 its registration with IANA, or by private agreement, in which case
538 the character set name must begin with "X-".
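
   The naming rules in this section can be summarized by a small
   classifier. The sketch below is illustrative only: the function name
   and the four result labels are hypothetical, and the code cannot
   consult the IANA registry, so any other name is simply reported as
   "unregistered" (usable only once a formal specification has been
   published and registered).

```python
import re

# ISO-8859-X, where X is one of the digits 1 through 10.
_ISO_8859 = re.compile(r"iso-8859-([1-9]|10)\Z", re.IGNORECASE)


def classify_charset_name(name):
    """Return 'defined', 'reserved', 'private', or 'unregistered'."""
    lowered = name.lower()  # charset values are not case sensitive
    if lowered == "us-ascii" or _ISO_8859.match(name):
        return "defined"
    if lowered == "ascii":
        return "reserved"  # the name "ASCII" must not be used
    if lowered.startswith("x-"):
        return "private"  # by private agreement only
    return "unregistered"
```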
539
540 Implementors are discouraged from defining new character sets unless
541 absolutely necessary.
542
543 The "charset" parameter has been defined primarily for the purpose of
544 textual data, and is described in this section for that reason.
545 However, it is conceivable that non-textual data might also wish to
546 specify a charset value for some purpose, in which case the same
547 syntax and values should be used.
548
549 In general, composition software should always use the "lowest common
550 denominator" character set possible. For example, if a body contains
551 only US-ASCII characters, it SHOULD be marked as being in the US-
552 ASCII character set, not ISO-8859-1, which, like all the ISO-8859
553 family of character sets, is a superset of US-ASCII. More generally,
554 if a widely-used character set is a subset of another character set,
555 and a body contains only characters in the widely-used subset, it
556 should be labelled as being in that subset. This will increase the
557 chances that the recipient will be able to view the resulting entity
558 correctly.
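
   Composition software could apply the "lowest common denominator"
   rule roughly as follows. The sketch considers only two candidate
   character sets, US-ASCII and ISO-8859-1, and relies on Python's
   built-in codecs to decide whether a string fits; both choices are
   assumptions made for brevity, and the function name is hypothetical.

```python
# Candidate charsets, narrowest first, paired with Python codec names.
_CANDIDATES = (("us-ascii", "ascii"), ("iso-8859-1", "iso-8859-1"))


def narrowest_charset(text):
    """Return the first candidate charset able to represent text, or None."""
    for label, codec in _CANDIDATES:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # text contains characters outside this charset
        return label
    return None  # none of the candidates in this sketch fits
```

   A body containing only US-ASCII characters is therefore labelled
   "us-ascii" even though ISO-8859-1 could also represent it.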
559
5674.1.3. Plain Subtype
568
569 The simplest and most important subtype of "text" is "plain". This
570 indicates plain text that does not contain any formatting commands or
571 directives. Plain text is intended to be displayed "as-is", that is,
572 no interpretation of embedded formatting commands, font attribute
573 specifications, processing instructions, interpretation directives,
574 or content markup should be necessary for proper display. The
575 default media type of "text/plain; charset=us-ascii" for Internet
576 mail describes existing Internet practice. That is, it is the type
577 of body defined by RFC 822.
578
579 No other "text" subtype is defined by this document.
580
5814.1.4. Unrecognized Subtypes
582
583 Unrecognized subtypes of "text" should be treated as subtype "plain"
584 as long as the MIME implementation knows how to handle the charset.
   Unrecognized subtypes which also specify an unrecognized charset
   should be treated as "application/octet-stream".
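
   The fallback rule for unrecognized "text" subtypes might be sketched
   as shown below. Whether an implementation "knows how to handle" a
   charset is approximated here by asking Python's codec registry, which
   is an assumption of this sketch; the function names are hypothetical.

```python
import codecs


def charset_is_known(charset):
    """Stand-in test: does this Python installation have a codec for it?"""
    try:
        codecs.lookup(charset)
    except LookupError:
        return False
    return True


def treat_unrecognized_text_as(charset="us-ascii"):
    """Media type under which to handle an unrecognized text subtype."""
    if charset_is_known(charset):
        return "text/plain"
    return "application/octet-stream"
```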
587
5884.2. Image Media Type
589
590 A media type of "image" indicates that the body contains an image.
591 The subtype names the specific image format. These names are not
592 case sensitive. An initial subtype is "jpeg" for the JPEG format
593 using JFIF encoding [JPEG].
594
595 The list of "image" subtypes given here is neither exclusive nor
596 exhaustive, and is expected to grow as more types are registered with
597 IANA, as described in RFC 2048.
598
   Unrecognized subtypes of "image" should at a minimum be treated as
600 "application/octet-stream". Implementations may optionally elect to
601 pass subtypes of "image" that they do not specifically recognize to a
602 secure and robust general-purpose image viewing application, if such
603 an application is available.
604
   NOTE: Using a general-purpose image viewing application in this way
   inherits the security problems of the most dangerous type supported
   by the application.
608
6094.3. Audio Media Type
610
611 A media type of "audio" indicates that the body contains audio data.
612 Although there is not yet a consensus on an "ideal" audio format for
613 use with computers, there is a pressing need for a format capable of
614 providing interoperable behavior.
615
623 The initial subtype of "basic" is specified to meet this requirement
624 by providing an absolutely minimal lowest common denominator audio
625 format. It is expected that richer formats for higher quality and/or
626 lower bandwidth audio will be defined by a later document.
627
628 The content of the "audio/basic" subtype is single channel audio
629 encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz.
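
   Since "audio/basic" carries one 8bit sample per octet at 8000
   samples per second, the playing time and data rate of a body follow
   by simple arithmetic. The Python sketch below shows the calculation;
   the names are hypothetical and no decoding of the mu-law samples is
   attempted.

```python
SAMPLE_RATE_HZ = 8000   # samples per second
BYTES_PER_SAMPLE = 1    # 8bit mu-law, single channel


def audio_basic_duration(body):
    """Playing time in seconds of an audio/basic body (decoded octets)."""
    return len(body) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def audio_basic_bit_rate():
    """Data rate of audio/basic in bits per second."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 8
```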
630
   Unrecognized subtypes of "audio" should at a minimum be treated as
632 "application/octet-stream". Implementations may optionally elect to
633 pass subtypes of "audio" that they do not specifically recognize to a
634 robust general-purpose audio playing application, if such an
635 application is available.
636
6374.4. Video Media Type
638
639 A media type of "video" indicates that the body contains a time-
640 varying-picture image, possibly with color and coordinated sound.
641 The term 'video' is used in its most generic sense, rather than with
642 reference to any particular technology or format, and is not meant to
643 preclude subtypes such as animated drawings encoded compactly. The
644 subtype "mpeg" refers to video coded according to the MPEG standard
645 [MPEG].
646
647 Note that although in general this document strongly discourages the
648 mixing of multiple media in a single body, it is recognized that many
649 so-called video formats include a representation for synchronized
650 audio, and this is explicitly permitted for subtypes of "video".
651
   Unrecognized subtypes of "video" should at a minimum be treated as
653 "application/octet-stream". Implementations may optionally elect to
654 pass subtypes of "video" that they do not specifically recognize to a
655 robust general-purpose video display application, if such an
656 application is available.
657
6584.5. Application Media Type
659
660 The "application" media type is to be used for discrete data which do
661 not fit in any of the other categories, and particularly for data to
662 be processed by some type of application program. This is
663 information which must be processed by an application before it is
664 viewable or usable by a user. Expected uses for the "application"
665 media type include file transfer, spreadsheets, data for mail-based
666 scheduling systems, and languages for "active" (computational)
667 material. (The latter, in particular, can pose security problems
668 which must be understood by implementors, and are considered in
669 detail in the discussion of the "application/PostScript" media type.)
670
679 For example, a meeting scheduler might define a standard
680 representation for information about proposed meeting dates. An
681 intelligent user agent would use this information to conduct a dialog
682 with the user, and might then send additional material based on that
683 dialog. More generally, there have been several "active" messaging
684 languages developed in which programs in a suitably specialized
685 language are transported to a remote location and automatically run
686 in the recipient's environment.
687
688 Such applications may be defined as subtypes of the "application"
689 media type. This document defines two subtypes:
690
691 octet-stream, and PostScript.
692
693 The subtype of "application" will often be either the name or include
694 part of the name of the application for which the data are intended.
695 This does not mean, however, that any application program name may be
696 used freely as a subtype of "application".
697
6984.5.1. Octet-Stream Subtype
699
700 The "octet-stream" subtype is used to indicate that a body contains
701 arbitrary binary data. The set of currently defined parameters is:
702
703 (1) TYPE -- the general type or category of binary data.
704 This is intended as information for the human recipient
705 rather than for any automatic processing.
706
707 (2) PADDING -- the number of bits of padding that were
708 appended to the bit-stream comprising the actual
709 contents to produce the enclosed 8bit byte-oriented
710 data. This is useful for enclosing a bit-stream in a
711 body when the total number of bits is not a multiple of
712 8.
713
714 Both of these parameters are optional.
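
As a rough illustration of the PADDING parameter, the following Python sketch packs a bit-stream into octets and records how many padding bits were appended. The function names, the bit-string representation, and the use of zero-valued padding bits are assumptions made for this example; this document does not specify the value of the padding bits.

```python
def pack_bits(bits: str) -> tuple[bytes, int]:
    """Pack a string of '0'/'1' characters into octets.

    Returns the octets and the number of padding bits appended.
    Zero-valued padding is an assumption of this sketch.
    """
    padding = (-len(bits)) % 8          # bits needed to reach a multiple of 8
    padded = bits + "0" * padding
    octets = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return octets, padding


def unpack_bits(octets: bytes, padding: int) -> str:
    """Recover the original bit-stream by dropping the trailing padding bits."""
    bits = "".join(f"{octet:08b}" for octet in octets)
    return bits[:len(bits) - padding]


def content_type_for(padding: int) -> str:
    """Build a Content-Type value; PADDING is optional, so omit it when zero."""
    if padding == 0:
        return "application/octet-stream"
    return f"application/octet-stream; padding={padding}"
```

A receiver that knows the PADDING value can discard exactly that many trailing bits; without it, the original length of the bit-stream cannot be recovered from the octets alone.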
715
716 An additional parameter, "CONVERSIONS", was defined in RFC 1341 but
717 has since been removed. RFC 1341 also defined the use of a "NAME"
718 parameter which gave a suggested file name to be used if the data
719 were to be written to a file. This has been deprecated in
720 anticipation of a separate Content-Disposition header field, to be
721 defined in a subsequent RFC.
722
723 The recommended action for an implementation that receives an
724 "application/octet-stream" entity is to simply offer to put the data
725 in a file, with any Content-Transfer-Encoding undone, or perhaps to
726 use it as input to a user-specified process.
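
One possible shape for the recommended action is sketched below in Python: the Content-Transfer-Encoding is undone and the result is written to a file whose name is chosen by the receiver, not by the message. The function names and the ".bin" suffix are hypothetical; a real user agent would ask the user before writing anything.

```python
import base64
import quopri
import tempfile


def undo_transfer_encoding(body: bytes, encoding: str) -> bytes:
    """Reverse the Content-Transfer-Encoding applied to an entity body."""
    encoding = encoding.strip().lower()
    if encoding == "base64":
        return base64.b64decode(body)
    if encoding == "quoted-printable":
        return quopri.decodestring(body)
    if encoding in ("7bit", "8bit", "binary"):
        return body                      # identity encodings: nothing to undo
    raise ValueError(f"unsupported Content-Transfer-Encoding: {encoding}")


def save_octet_stream(body: bytes, encoding: str, directory: str) -> str:
    """Write the decoded data to a new file and return its path.

    The file name is generated locally; nothing from the message is
    used to pick a path or a program to run.
    """
    data = undo_transfer_encoding(body, encoding)
    with tempfile.NamedTemporaryFile(
        dir=directory, suffix=".bin", delete=False
    ) as handle:
        handle.write(data)
        return handle.name
```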
727
735 To reduce the danger of transmitting rogue programs, it is strongly
736 recommended that implementations NOT implement a path-search
737 mechanism whereby an arbitrary program named in the Content-Type
738 parameter (e.g., an "interpreter=" parameter) is found and executed
739 using the message body as input.
740
741 4.5.2. PostScript Subtype
742
743 A media type of "application/postscript" indicates a PostScript
744 program. Currently two variants of the PostScript language are
745 allowed; the original level 1 variant is described in [POSTSCRIPT]
746 and the more recent level 2 variant is described in [POSTSCRIPT2].
747
748 PostScript is a registered trademark of Adobe Systems, Inc. Use of
749 the MIME media type "application/postscript" implies recognition of
750 that trademark and all the rights it entails.
751
752 The PostScript language definition provides facilities for internal
753 labelling of the specific language features a given program uses.
754 This labelling, called the PostScript document structuring
755 conventions, or DSC, is very general and provides substantially more
756 information than just the language level. The use of document
757 structuring conventions, while not required, is strongly recommended
758 as an aid to interoperability. Documents which lack proper
759 structuring conventions cannot be tested to see whether or not they
760 will work in a given environment. As such, some systems may assume
761 the worst and refuse to process unstructured documents.
762
763 The execution of general-purpose PostScript interpreters entails
764 serious security risks, and implementors are discouraged from simply
765 sending PostScript bodies to "off-the-shelf" interpreters. While it
766 is usually safe to send PostScript to a printer, where the potential
767 for harm is greatly constrained by typical printer environments,
768 implementors should consider all of the following before they add
769 interactive display of PostScript bodies to their MIME readers.
770
771 The remainder of this section outlines some, though probably not all,
772 of the possible problems with the transport of PostScript entities.
773
774 (1) Dangerous operations in the PostScript language
775 include, but may not be limited to, the PostScript
776 operators "deletefile", "renamefile", "filenameforall",
777 and "file". "File" is only dangerous when applied to
778 something other than standard input or output.
779 Implementations may also define additional nonstandard
780 file operators; these may also pose a threat to
781 security. "Filenameforall", the wildcard file search
782 operator, may appear at first glance to be harmless.
791 Note, however, that this operator has the potential to
792 reveal information about what files the recipient has
793 access to, and this information may itself be
794 sensitive. Message senders should avoid the use of
795 potentially dangerous file operators, since these
796 operators are quite likely to be unavailable in secure
797 PostScript implementations. Message receiving and
798 displaying software should either completely disable
799 all potentially dangerous file operators or take
800 special care not to delegate any special authority to
801 their operation. These operators should be viewed as
802 being done by an outside agency when interpreting
803 PostScript documents. Such disabling and/or checking
804 should be done completely outside of the reach of the
805 PostScript language itself; care should be taken to
806 ensure that no method exists for re-enabling
807 full-function versions of these operators.
808
809 (2) The PostScript language provides facilities for exiting
810 the normal interpreter, or server, loop. Changes made
811 in this "outer" environment are customarily retained
812 across documents, and may in some cases be retained
813 semipermanently in nonvolatile memory. The operators
814 associated with exiting the interpreter loop have the
815 potential to interfere with subsequent document
816 processing. As such, their unrestrained use
817 constitutes a threat of service denial. PostScript
818 operators that exit the interpreter loop include, but
819 may not be limited to, the exitserver and startjob
820 operators. Message sending software should not
821 generate PostScript that depends on exiting the
822 interpreter loop to operate, since the ability to exit
823 will probably be unavailable in secure PostScript
824 implementations. Message receiving and displaying
825 software should completely disable the ability to make
826 retained changes to the PostScript environment by
827 eliminating or disabling the "startjob" and
828 "exitserver" operations. If these operations cannot be
829 eliminated or completely disabled, the password
830 associated with them should at least be set to a
831 hard-to-guess value.
832
833 (3) PostScript provides operators for setting system-wide
834 and device-specific parameters. These parameter
835 settings may be retained across jobs and may
836 potentially pose a threat to the correct operation of
837 the interpreter. The PostScript operators that set
838 system and device parameters include, but may not be
847 limited to, the "setsystemparams" and "setdevparams"
848 operators. Message sending software should not
849 generate PostScript that depends on the setting of
850 system or device parameters to operate correctly. The
851 ability to set these parameters will probably be
852 unavailable in secure PostScript implementations.
853 Message receiving and displaying software should
854 disable the ability to change system and device
855 parameters. If these operators cannot be completely
856 disabled, the password associated with them should at
857 least be set to a hard-to-guess value.
858
859 (4) Some PostScript implementations provide nonstandard
860 facilities for the direct loading and execution of
861 machine code. Such facilities are quite obviously open
862 to substantial abuse. Message sending software should
863 not make use of such features. Besides being totally
864 hardware-specific, they are also likely to be
865 unavailable in secure implementations of PostScript.
866 Message receiving and displaying software should not
867 allow such operators to be used if they exist.
868
869 (5) PostScript is an extensible language, and many, if not
870 most, implementations of it provide a number of their
871 own extensions. This document does not deal with such
872 extensions explicitly since they constitute an unknown
873 factor. Message sending software should not make use
874 of nonstandard extensions; they are likely to be
875 missing from some implementations. Message receiving
876 and displaying software should make sure that any
877 nonstandard PostScript operators are secure and don't
878 present any kind of threat.
879
880 (6) It is possible to write PostScript that consumes huge
881 amounts of various system resources. It is also
882 possible to write PostScript programs that loop
883 indefinitely. Both types of programs have the
884 potential to cause damage if sent to unsuspecting
885 recipients. Message-sending software should avoid the
886 construction and dissemination of such programs, which
887 is antisocial. Message receiving and displaying
888 software should provide appropriate mechanisms to abort
889 processing after a reasonable amount of time has
890 elapsed. In addition, PostScript interpreters should be
891 limited to the consumption of only a reasonable amount
892 of any given system resource.
893
903 (7) It is possible to include raw binary information inside
904 PostScript in various forms. This is not recommended
905 for use in Internet mail, both because it is not
906 supported by all PostScript interpreters and because it
907 significantly complicates the use of a MIME Content-
908 Transfer-Encoding. (Without such binary, PostScript
909 may typically be viewed as line-oriented data. The
910 treatment of CRLF sequences becomes extremely
911 problematic if binary and line-oriented data are mixed
912 in a single PostScript data stream.)
913
914 (8) Finally, bugs may exist in some PostScript interpreters
915 which could possibly be exploited to gain unauthorized
916 access to a recipient's system. Beyond noting this
917 possibility, there is no specific action to take to
918 prevent this, apart from the timely correction of such
919 bugs if any are found.
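
For message sending software, the advice in items (1) through (3) could be backed by a simple check of generated PostScript before it is sent. The Python sketch below is only a sender-side lint under stated assumptions: the function name and the naive tokenizer are hypothetical, names inside comments and strings are also reported, and nonstandard operators are not covered. It is not a substitute for disabling these operators in the receiving interpreter, which must be done outside the reach of the PostScript language itself.

```python
import re

# Operator names taken from items (1) through (3) above.
RISKY_OPERATORS = {
    "deletefile", "renamefile", "filenameforall", "file",
    "exitserver", "startjob",
    "setsystemparams", "setdevparams",
}

# Naive tokenizer: PostScript names are delimited by white space and by
# the special characters ( ) < > [ ] { } / %
TOKEN = re.compile(r"[^\s()<>\[\]{}/%]+")


def find_risky_operators(postscript: str) -> list[tuple[int, str]]:
    """Return (line number, operator name) pairs for flagged names.

    Comments and strings are not skipped, so false positives are
    possible; this errs on the side of reporting too much.
    """
    findings = []
    for line_number, line in enumerate(postscript.splitlines(), start=1):
        for token in TOKEN.findall(line):
            if token in RISKY_OPERATORS:
                findings.append((line_number, token))
    return findings
```

Note that "file" is flagged unconditionally here, although item (1) considers it dangerous only when applied to something other than standard input or output; a caller would need to review such findings by hand.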
920
921 4.5.3. Other Application Subtypes
922
923 It is expected that many other subtypes of "application" will be
924 defined in the future. MIME implementations must at a minimum treat
925 any unrecognized subtypes as being equivalent to "application/octet-
926 stream".
927
928 5. Composite Media Type Values
929
930 The remaining two of the seven initial Content-Type values refer to
931 composite entities. Composite entities are handled using MIME
932 mechanisms -- a MIME processor typically handles the body directly.
933
934 5.1. Multipart Media Type
935
936 In the case of multipart entities, in which one or more different
937 sets of data are combined in a single body, a "multipart" media type
938 field must appear in the entity's header. The body must then contain
939 one or more body parts, each preceded by a boundary delimiter line,
940 and the last one followed by a closing boundary delimiter line.
941 After its boundary delimiter line, each body part then consists of a
942 header area, a blank line, and a body area. Thus a body part is
943 similar to an RFC 822 message in syntax, but different in meaning.
944
945 A body part is an entity and hence is NOT to be interpreted as
946 actually being an RFC 822 message. To begin with, NO header fields
947 are actually required in body parts. A body part that starts with a
948 blank line, therefore, is allowed and is a body part for which all
949 default values are to be assumed. In such a case, the absence of a
950 Content-Type header usually indicates that the corresponding body has
959 a content-type of "text/plain; charset=US-ASCII".
960
961 The only header fields that have defined meaning for body parts are
962 those the names of which begin with "Content-". All other header
963 fields may be ignored in body parts. Although they should generally
964 be retained if at all possible, they may be discarded by gateways if
965 necessary. Such other fields are permitted to appear in body parts
966 but must not be depended on. "X-" fields may be created for
967 experimental or private purposes, with the recognition that the
968 information they contain may be lost at some gateways.
969
970 NOTE: The distinction between an RFC 822 message and a body part is
971 subtle, but important. A gateway between Internet and X.400 mail,
972 for example, must be able to tell the difference between a body part
973 that contains an image and a body part that contains an encapsulated
974 message, the body of which is a JPEG image. In order to represent
975 the latter, the body part must have "Content-Type: message/rfc822",
976 and its body (after the blank line) must be the encapsulated message,
977 with its own "Content-Type: image/jpeg" header field. The use of
978 similar syntax facilitates the conversion of messages to body parts,
979 and vice versa, but the distinction between the two must be
980 understood by implementors. (For the special case in which parts
981 actually are messages, a "digest" subtype is also defined.)
982
983 As stated previously, each body part is preceded by a boundary
984 delimiter line that contains the boundary delimiter. The boundary
985 delimiter MUST NOT appear inside any of the encapsulated parts, on a
986 line by itself or as the prefix of any line. This implies that it is
987 crucial that the composing agent be able to choose and specify a
988 unique boundary parameter value that does not contain the boundary
989 parameter value of an enclosing multipart as a prefix.
990
991 All present and future subtypes of the "multipart" type must use an
992 identical syntax. Subtypes may differ in their semantics, and may
993 impose additional restrictions on syntax, but must conform to the
994 required syntax for the "multipart" type. This requirement ensures
995 that all conformant user agents will at least be able to recognize
996 and separate the parts of any multipart entity, even those of an
997 unrecognized subtype.
998
999 As stated in the definition of the Content-Transfer-Encoding field
1000 [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is
1001 permitted for entities of type "multipart". The "multipart" boundary
1002 delimiters and header fields are always represented as 7bit US-ASCII
1003 in any case (though the header fields may encode non-US-ASCII header
1004 text as per RFC 2047) and data within the body parts can be encoded
1005 on a part-by-part basis, with Content-Transfer-Encoding fields for
1006 each appropriate body part.
1007
1015 5.1.1. Common Syntax
1016
1017 This section defines a common syntax for subtypes of "multipart".
1018 All subtypes of "multipart" must use this syntax. A simple example
1019 of a multipart message also appears in this section. An example of a
1020 more complex multipart message is given in RFC 2049.
1021
1022 The Content-Type field for multipart entities requires one parameter,
1023 "boundary". The boundary delimiter line is then defined as a line
1024 consisting entirely of two hyphen characters ("-", decimal value 45)
1025 followed by the boundary parameter value from the Content-Type header
1026 field, optional linear whitespace, and a terminating CRLF.
1027
1028 NOTE: The hyphens are for rough compatibility with the earlier RFC
1029 934 method of message encapsulation, and for ease of searching for
1030 the boundaries in some implementations. However, it should be noted
1031 that multipart messages are NOT completely compatible with RFC 934
1032 encapsulations; in particular, they do not obey RFC 934 quoting
1033 conventions for embedded lines that begin with hyphens. This
1034 mechanism was chosen over the RFC 934 mechanism because the latter
1035 causes lines to grow with each level of quoting. The combination of
1036 this growth with the fact that SMTP implementations sometimes wrap
1037 long lines made the RFC 934 mechanism unsuitable for use in the event
1038 that deeply-nested multipart structuring is ever desired.
1039
1040 WARNING TO IMPLEMENTORS: The grammar for parameters on the Content-
1041 type field is such that it is often necessary to enclose the boundary
1042 parameter values in quotes on the Content-type line. This is not
1043 always necessary, but never hurts. Implementors should be sure to
1044 study the grammar carefully in order to avoid producing invalid
1045 Content-type fields. Thus, a typical "multipart" Content-Type header
1046 field might look like this:
1047
1048 Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p
1049
1050 But the following is not valid:
1051
1052 Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p
1053
1054 (because of the colon) and must instead be represented as
1055
1056 Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p"
1057
1058 This Content-Type value indicates that the content consists of one or
1059 more parts, each with a structure that is syntactically identical to
1060 an RFC 822 message, except that the header area is allowed to be
1061 completely empty, and that the parts are each preceded by the line
1062
1071 --gc0pJq0M:08jU534c0p
1072
1073 The boundary delimiter MUST occur at the beginning of a line, i.e.,
1074 following a CRLF, and the initial CRLF is considered to be attached
1075 to the boundary delimiter line rather than part of the preceding
1076 part. The boundary may be followed by zero or more characters of
1077 linear whitespace. It is then terminated by either another CRLF and
1078 the header fields for the next part, or by two CRLFs, in which case
1079 there are no header fields for the next part. If no Content-Type
1080 field is present it is assumed to be "message/rfc822" in a
1081 "multipart/digest" and "text/plain" otherwise.
1082
1083 NOTE: The CRLF preceding the boundary delimiter line is conceptually
1084 attached to the boundary so that it is possible to have a part that
1085 does not end with a CRLF (line break). Body parts that must be
1086 considered to end with line breaks, therefore, must have two CRLFs
1087 preceding the boundary delimiter line, the first of which is part of
1088 the preceding body part, and the second of which is part of the
1089 encapsulation boundary.
1090
1091 Boundary delimiters must not appear within the encapsulated material,
1092 and must be no longer than 70 characters, not counting the two
1093 leading hyphens.
1094
1095 The boundary delimiter line following the last body part is a
1096 distinguished delimiter that indicates that no further body parts
1097 will follow. Such a delimiter line is identical to the previous
1098 delimiter lines, with the addition of two more hyphens after the
1099 boundary parameter value.
1100
1101 --gc0pJq0M:08jU534c0p--
1102
1103 NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the
1104 boundary value with the beginning of each candidate line. An exact
1105 match of the entire candidate line is not required; it is sufficient
1106 that the boundary appear in its entirety following the CRLF.
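
The comparison described in the NOTE TO IMPLEMENTORS above might be sketched in Python as follows. The function name and return values are hypothetical, and the line is assumed to have had its terminating CRLF removed already. Because only the beginning of the line is compared, any line that starts with the two hyphens and the boundary value is taken as a delimiter, which is why composing agents must never let the boundary appear as the prefix of a content line.

```python
def classify_line(line: bytes, boundary: str) -> str:
    """Classify one line of a multipart body (terminating CRLF removed).

    Returns "delimiter", "close-delimiter", or "data".
    """
    dash_boundary = b"--" + boundary.encode("ascii")
    if not line.startswith(dash_boundary):
        return "data"
    rest = line[len(dash_boundary):]
    if rest.startswith(b"--"):
        return "close-delimiter"     # two more hyphens after the boundary value
    return "delimiter"               # trailing linear whitespace is tolerated
```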
1107
1108 There appears to be room for additional information prior to the
1109 first boundary delimiter line and following the final boundary
1110 delimiter line. These areas should generally be left blank, and
1111 implementations must ignore anything that appears before the first
1112 boundary delimiter line or after the last one.
1113
1114 NOTE: These "preamble" and "epilogue" areas are generally not used
1115 because of the lack of proper typing of these parts and the lack of
1116 clear semantics for handling these areas at gateways, particularly
1117 X.400 gateways. However, rather than leaving the preamble area
1118 blank, many MIME implementations have found this to be a convenient
1127 place to insert an explanatory note for recipients who read the
1128 message with pre-MIME software, since such notes will be ignored by
1129 MIME-compliant software.
1130
1131 NOTE: Because boundary delimiters must not appear in the body parts
1132 being encapsulated, a user agent must exercise care to choose a
1133 unique boundary parameter value. The boundary parameter value in the
1134 example above could have been the result of an algorithm designed to
1135 produce boundary delimiters with a very low probability of already
1136 existing in the data to be encapsulated without having to prescan the
1137 data. Alternate algorithms might result in more "readable" boundary
1138 delimiters for a recipient with an old user agent, but would require
1139 more attention to the possibility that the boundary delimiter might
1140 appear at the beginning of some line in the encapsulated part. The
1141 simplest boundary delimiter line possible is something like "---",
1142 with a closing boundary delimiter line of "-----".
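
A boundary-selection algorithm of the kind this NOTE describes could look like the Python sketch below. It is one possible approach, not a prescribed one: the function names are hypothetical, and the scan of the parts is an extra safeguard that a purely probabilistic generator would omit.

```python
import secrets


def appears_at_line_start(boundary: str, parts: list[bytes]) -> bool:
    """True if "--" + boundary begins any line of any part."""
    marker = b"--" + boundary.encode("ascii")
    return any(
        line.startswith(marker)
        for part in parts
        for line in part.splitlines()
    )


def make_boundary(parts: list[bytes], max_tries: int = 5) -> str:
    """Pick a random boundary that does not collide with the given parts."""
    for _ in range(max_tries):
        # 32 hexadecimal digits: within the 70-character limit and made
        # only of characters that need no quoting in the parameter value.
        candidate = secrets.token_hex(16)
        if not appears_at_line_start(candidate, parts):
            return candidate
    raise RuntimeError("could not find an unused boundary")
```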
1143
1144 As a very simple example, the following multipart message has two
1145 parts, both of them plain text, one of them explicitly typed and one
1146 of them implicitly typed:
1147
1148 From: Nathaniel Borenstein <nsb@bellcore.com>
1149 To: Ned Freed <ned@innosoft.com>
1150 Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST)
1151 Subject: Sample message
1152 MIME-Version: 1.0
1153 Content-type: multipart/mixed; boundary="simple boundary"
1154
1155 This is the preamble. It is to be ignored, though it
1156 is a handy place for composition agents to include an
1157 explanatory note to non-MIME conformant readers.
1158
1159 --simple boundary
1160
1161 This is implicitly typed plain US-ASCII text.
1162 It does NOT end with a linebreak.
1163 --simple boundary
1164 Content-type: text/plain; charset=us-ascii
1165
1166 This is explicitly typed plain US-ASCII text.
1167 It DOES end with a linebreak.
1168
1169 --simple boundary--
1170
1171 This is the epilogue. It is also to be ignored.
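
To see how an existing parser treats a message of this shape, the sample can be fed to the "email" package of the Python standard library, as sketched below. This shows one implementation's behavior and is not part of this specification; the header block is abbreviated and LF line endings are used for brevity.

```python
from email import message_from_string

RAW = (
    "MIME-Version: 1.0\n"
    'Content-type: multipart/mixed; boundary="simple boundary"\n'
    "\n"
    "This is the preamble.  It is to be ignored.\n"
    "\n"
    "--simple boundary\n"
    "\n"
    "This is implicitly typed plain US-ASCII text.\n"
    "It does NOT end with a linebreak.\n"
    "--simple boundary\n"
    "Content-type: text/plain; charset=us-ascii\n"
    "\n"
    "This is explicitly typed plain US-ASCII text.\n"
    "It DOES end with a linebreak.\n"
    "\n"
    "--simple boundary--\n"
    "\n"
    "This is the epilogue.  It is also to be ignored.\n"
)

message = message_from_string(RAW)
first, second = message.get_payload()    # one Message object per body part

# The first part has no Content-Type header field, so the default applies.
implicit_type = first.get_content_type()
explicit_type = second.get_content_type()

# The line break before each boundary belongs to the boundary, not the part.
first_text = first.get_payload()
second_text = second.get_payload()

# Text outside the delimiters is kept apart from the body parts.
preamble, epilogue = message.preamble, message.epilogue
```

Both parts should be reported as "text/plain", with only the second part's text ending in a line break; the preamble and epilogue are exposed separately so that a user agent can ignore them.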
1172
1183 The use of a media type of "multipart" in a body part within another
1184 "multipart" entity is explicitly allowed. In such cases, for obvious
1185 reasons, care must be taken to ensure that each nested "multipart"
1186 entity uses a different boundary delimiter. See RFC 2049 for an
1187 example of nested "multipart" entities.
1188
1189 The use of the "multipart" media type with only a single body part
1190 may be useful in certain contexts, and is explicitly permitted.
1191
1192 NOTE: Experience has shown that a "multipart" media type with a
1193 single body part is useful for sending non-text media types. It has
1194 the advantage of providing the preamble as a place to include
1195 decoding instructions. In addition, a number of SMTP gateways move
1196 or remove the MIME headers, and a clever MIME decoder can take a good
1197 guess at multipart boundaries even in the absence of the Content-Type
1198 header and thereby successfully decode the message.
1199
1200 The only mandatory global parameter for the "multipart" media type is
1201 the boundary parameter, which consists of 1 to 70 characters from a
1202 set of characters known to be very robust through mail gateways, and
1203 NOT ending with white space. (If a boundary delimiter line appears to
1204 end with white space, the white space must be presumed to have been
1205 added by a gateway, and must be deleted.) It is formally specified
1206 by the following BNF:
1207
1208 boundary := 0*69<bchars> bcharsnospace
1209
1210 bchars := bcharsnospace / " "
1211
1212 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
1213 "+" / "_" / "," / "-" / "." /
1214 "/" / ":" / "=" / "?"
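
The boundary grammar above, together with the earlier warning about quoting, can be checked mechanically. The Python sketch below validates a boundary value against the BNF and formats the parameter, quoting it whenever it is not a valid RFC 2045 "token". The function and constant names are hypothetical.

```python
import re

# bcharsnospace, written as the inside of a regular expression character class.
BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"

# boundary := 0*69<bchars> bcharsnospace
BOUNDARY_RE = re.compile(rf"[{BCHARS_NOSPACE} ]{{0,69}}[{BCHARS_NOSPACE}]")

# The boundary characters that are also legal in an RFC 2045 token,
# that is, everything above except SPACE and the tspecials ( ) , / : = ?
TOKEN_RE = re.compile(r"[0-9A-Za-z'+_.\-]+")


def boundary_parameter(boundary: str) -> str:
    """Return 'boundary=...' for a Content-Type field, quoting if needed."""
    if not BOUNDARY_RE.fullmatch(boundary):
        raise ValueError(f"not a valid boundary: {boundary!r}")
    if TOKEN_RE.fullmatch(boundary):
        return f"boundary={boundary}"
    # No boundary character needs escaping inside a quoted-string.
    return f'boundary="{boundary}"'
```

Always emitting the quoted form would also be valid, in line with the observation that quoting is not always necessary but never hurts.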
1215
1216 Overall, the body of a "multipart" entity may be specified as
1217 follows:
1218
1219 dash-boundary := "--" boundary
1220 ; boundary taken from the value of
1221 ; boundary parameter of the
1222 ; Content-Type field.
1223
1224 multipart-body := [preamble CRLF]
1225 dash-boundary transport-padding CRLF
1226 body-part *encapsulation
1227 close-delimiter transport-padding
1228 [CRLF epilogue]
1229
1239 transport-padding := *LWSP-char
1240 ; Composers MUST NOT generate
1241 ; non-zero length transport
1242 ; padding, but receivers MUST
1243 ; be able to handle padding
1244 ; added by message transports.
1245
1246 encapsulation := delimiter transport-padding
1247 CRLF body-part
1248
1249 delimiter := CRLF dash-boundary
1250
1251 close-delimiter := delimiter "--"
1252
1253 preamble := discard-text
1254
1255 epilogue := discard-text
1256
1257 discard-text := *(*text CRLF) *text
1258 ; May be ignored or discarded.
1259
1260 body-part := MIME-part-headers [CRLF *OCTET]
1261 ; Lines in a body-part must not start
1262 ; with the specified dash-boundary and
1263 ; the delimiter must not appear anywhere
1264 ; in the body part. Note that the
1265 ; semantics of a body-part differ from
1266 ; the semantics of a message, as
1267 ; described in the text.
1268
1269 OCTET := <any 0-255 octet value>
1270
1271 IMPORTANT: The free insertion of linear-white-space and RFC 822
1272 comments between the elements shown in this BNF is NOT allowed since
1273 this BNF does not specify a structured header field.
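
The grammar above can be turned into a small splitter. The Python sketch below follows a strict reading of the BNF: a delimiter line must consist of the dash-boundary, an optional "--", and optional transport padding. A more lenient receiver might instead accept any line that merely begins with the dash-boundary, as the earlier NOTE TO IMPLEMENTORS allows. The function names are hypothetical, the body is assumed to use CRLF line endings, and a body with no close-delimiter is handled by letting the last part run to the end of the data.

```python
import re


def split_multipart(body: bytes, boundary: str):
    """Split a multipart body into (preamble, parts, epilogue), all bytes."""
    dash_boundary = b"--" + boundary.encode("ascii")
    # delimiter := CRLF dash-boundary.  Prefixing one CRLF lets the same
    # pattern match a dash-boundary that opens the body with no preamble.
    data = b"\r\n" + body
    delimiter = re.compile(
        b"\r\n" + re.escape(dash_boundary) + rb"(--)?[ \t]*(?=\r\n|\Z)"
    )
    matches = list(delimiter.finditer(data))
    if not matches:
        raise ValueError("no boundary delimiter line found")

    preamble = data[2:matches[0].start()]
    parts, epilogue = [], b""
    for index, match in enumerate(matches):
        if match.group(1):                      # close-delimiter
            epilogue = data[match.end():]
            if epilogue.startswith(b"\r\n"):    # [CRLF epilogue]
                epilogue = epilogue[2:]
            break
        start = match.end() + 2                 # skip CRLF ending the delimiter line
        is_last = index + 1 == len(matches)
        end = len(data) if is_last else matches[index + 1].start()
        parts.append(data[start:end])
    return preamble, parts, epilogue


def split_body_part(part: bytes):
    """Split a body-part into (header area, body area)."""
    if part.startswith(b"\r\n"):                # empty header area
        return b"", part[2:]
    headers, _, body = part.partition(b"\r\n\r\n")
    return headers, body
```

Because the CRLF that precedes each dash-boundary is consumed as part of the delimiter, a body part ends with a line break only if the sender placed two CRLFs before the next delimiter line.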
1274
1275 NOTE: In certain transport enclaves, RFC 822 restrictions such as
1276 the one that limits bodies to printable US-ASCII characters may not
1277 be in force. (That is, transport domains may exist that resemble
1278 standard Internet mail transport as specified in RFC 821 and assumed
1279 by RFC 822, but without certain restrictions.) The relaxation of
1280 these restrictions should be construed as locally extending the
1281 definition of bodies, for example to include octets outside of the
1282 US-ASCII range, as long as these extensions are supported by the
1283 transport and adequately documented in the Content-Transfer-Encoding
1284 header field. However, in no event are headers (either message
1285 headers or body part headers) allowed to contain anything other than
1286 US-ASCII characters.
1287
1295 NOTE: Conspicuously missing from the "multipart" type is a notion of
1296 structured, related body parts. It is recommended that those wishing
1297 to provide more structured or integrated multipart messaging
1298 facilities should define subtypes of multipart that are syntactically
1299 identical but define relationships between the various parts. For
1300 example, subtypes of multipart could be defined that include a
1301 distinguished part which in turn is used to specify the relationships
1302 between the other parts, probably referring to them by their
1303 Content-ID field. Old implementations will not recognize the new
1304 subtype if this approach is used, but will treat it as
1305 multipart/mixed and will thus be able to show the user the parts that
1306 are recognized.
1307
1308 5.1.2. Handling Nested Messages and Multiparts
1309
1310 The "message/rfc822" subtype defined in a subsequent section of this
1311 document has no terminating condition other than running out of data.
1312 Similarly, an improperly truncated "multipart" entity may not have
1313 any terminating boundary marker; such entities can turn up in practice
1314 because of mail system malfunctions.
1315
1316 It is essential that such entities be handled correctly when they are
1317 themselves embedded inside another "multipart" structure. MIME
1318 implementations are therefore required to recognize outer level
1319 boundary markers at ANY level of inner nesting. It is not sufficient
1320 to only check for the next expected marker or other terminating
1321 condition.
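
One way to meet this requirement is to test every line against the boundaries of all enclosing multipart entities, not only the innermost one. The Python sketch below keeps the open boundaries in a list, outermost first; the function names and the list representation are assumptions made for illustration.

```python
def innermost_match(line: bytes, open_boundaries: list[str]):
    """Return the index of the open multipart whose boundary starts this line.

    open_boundaries lists the boundary parameter values of the multipart
    entities currently being parsed, outermost first.  Returns None for
    ordinary content lines.
    """
    for level in range(len(open_boundaries) - 1, -1, -1):
        marker = b"--" + open_boundaries[level].encode("ascii")
        if line.startswith(marker):
            return level
    return None


def unwind(line: bytes, open_boundaries: list[str]) -> list[str]:
    """Drop any inner multiparts that an outer boundary has implicitly ended."""
    level = innermost_match(line, open_boundaries)
    if level is None:
        return open_boundaries
    return open_boundaries[:level + 1]
```

If a truncated inner multipart never supplies its closing delimiter, the next outer delimiter line still matches at a lower index, and the parser can close the inner entities and resume at that level.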
1322
1323 5.1.3. Mixed Subtype
1324
1325 The "mixed" subtype of "multipart" is intended for use when the body
1326 parts are independent and need to be bundled in a particular order.
1327 Any "multipart" subtypes that an implementation does not recognize
1328 must be treated as being of subtype "mixed".
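
The fallback rule can be expressed as a small lookup. The Python sketch below also applies the corresponding rule for unrecognized "application" subtypes from section 4.5.3; the function name and the idea of passing a set of recognized types are hypothetical.

```python
def effective_media_type(media_type: str, recognized: set[str]) -> str:
    """Map an unrecognized subtype to the required fallback type."""
    media_type = media_type.strip().lower()   # type names are case-insensitive
    if media_type in recognized:
        return media_type
    top_level = media_type.split("/", 1)[0]
    if top_level == "multipart":
        return "multipart/mixed"
    if top_level == "application":
        return "application/octet-stream"
    return media_type                          # left to the caller to handle
```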
1329
1330 5.1.4. Alternative Subtype
1331
1332 The "multipart/alternative" type is syntactically identical to
1333 "multipart/mixed", but the semantics are different. In particular,
1334 each of the body parts is an "alternative" version of the same
1335 information.
1336
1337 Systems should recognize that the contents of the various parts are
1338 interchangeable. Systems should choose the "best" type based on the
1339 local environment and preferences, in some cases even through user
1340 interaction. As with "multipart/mixed", the order of body parts is
1341 significant. In this case, the alternatives appear in an order of
1342 increasing faithfulness to the original content. In general, the
1351 best choice is the LAST part of a type supported by the recipient
1352 system's local environment.
1353
1354 "Multipart/alternative" may be used, for example, to send a message
1355 in a fancy text format in such a way that it can easily be displayed
1356 anywhere:
1357
1358 From: Nathaniel Borenstein <nsb@bellcore.com>
1359 To: Ned Freed <ned@innosoft.com>
1360 Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
1361 Subject: Formatted text mail
1362 MIME-Version: 1.0
1363 Content-Type: multipart/alternative; boundary=boundary42
1364
1365 --boundary42
1366 Content-Type: text/plain; charset=us-ascii
1367
1368 ... plain text version of message goes here ...
1369
1370 --boundary42
1371 Content-Type: text/enriched
1372
1373 ... RFC 1896 text/enriched version of same message
1374 goes here ...
1375
1376 --boundary42
1377 Content-Type: application/x-whatever
1378
1379 ... fanciest version of same message goes here ...
1380
1381 --boundary42--
1382
1383 In this example, users whose mail systems understood the
1384 "application/x-whatever" format would see only the fancy version,
1385 while other users would see only the enriched or plain text version,
1386 depending on the capabilities of their system.
1387
1388 In general, user agents that compose "multipart/alternative" entities
1389 must place the body parts in increasing order of preference, that is,
1390 with the preferred format last. For fancy text, the sending user
1391 agent should put the plainest format first and the richest format
1392 last. Receiving user agents should pick and display the last format
1393 they are capable of displaying. In the case where one of the
1394 alternatives is itself of type "multipart" and contains unrecognized
1395 sub-parts, the user agent may choose either to show that alternative,
1396 an earlier alternative, or both.
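
The selection rule for receiving user agents amounts to a scan from the last part toward the first. A minimal Python sketch follows, assuming the media types of the parts have already been extracted into a list; the function name is hypothetical, and the case of an alternative that is itself a "multipart" with unrecognized sub-parts is not handled.

```python
from typing import Optional


def choose_alternative(part_types: list[str], supported: set[str]) -> Optional[int]:
    """Return the index of the last part whose type can be displayed.

    Returns None when no alternative is supported.
    """
    for index in range(len(part_types) - 1, -1, -1):
        if part_types[index].lower() in supported:
            return index
    return None
```

With the parts of the example above, a system that supports only "text/plain" and "text/enriched" would select the second part, at index 1.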
1397
1407 NOTE: From an implementor's perspective, it might seem more sensible
1408 to reverse this ordering, and have the plainest alternative last.
1409 However, placing the plainest alternative first is the friendliest
1410 possible option when "multipart/alternative" entities are viewed
1411 using a non-MIME-conformant viewer. While this approach does impose
1412 some burden on conformant MIME viewers, interoperability with older
1413 mail readers was deemed to be more important in this case.
1414
1415 It may be the case that some user agents, if they can recognize more
1416 than one of the formats, will prefer to offer the user the choice of
1417 which format to view. This makes sense, for example, if a message
   includes both a nicely-formatted image version and an easily-edited
1419 text version. What is most critical, however, is that the user not
1420 automatically be shown multiple versions of the same data. Either
1421 the user should be shown the last recognized version or should be
1422 given the choice.
1423
1424 THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each part of a
1425 "multipart/alternative" entity represents the same data, but the
1426 mappings between the two are not necessarily without information
1427 loss. For example, information is lost when translating ODA to
1428 PostScript or plain text. It is recommended that each part should
1429 have a different Content-ID value in the case where the information
1430 content of the two parts is not identical. And when the information
1431 content is identical -- for example, where several parts of type
1432 "message/external-body" specify alternate ways to access the
1433 identical data -- the same Content-ID field value should be used, to
1434 optimize any caching mechanisms that might be present on the
1435 recipient's end. However, the Content-ID values used by the parts
1436 should NOT be the same Content-ID value that describes the
1437 "multipart/alternative" as a whole, if there is any such Content-ID
1438 field. That is, one Content-ID value will refer to the
1439 "multipart/alternative" entity, while one or more other Content-ID
1440 values will refer to the parts inside it.
1441
5.1.5.  Digest Subtype
1443
1444 This document defines a "digest" subtype of the "multipart" Content-
1445 Type. This type is syntactically identical to "multipart/mixed", but
1446 the semantics are different. In particular, in a digest, the default
1447 Content-Type value for a body part is changed from "text/plain" to
1448 "message/rfc822". This is done to allow a more readable digest
1449 format that is largely compatible (except for the quoting convention)
1450 with RFC 934.
1451
1452 Note: Though it is possible to specify a Content-Type value for a
1453 body part in a digest which is other than "message/rfc822", such as a
1454 "text/plain" part containing a description of the material in the
   digest, actually doing so is undesirable.  The "multipart/digest"
   Content-Type is intended to be used to send collections of messages.
   If a "text/plain" part is needed, it should be included as a separate
   part of a "multipart/mixed" message.
1467
1468 A digest in this format might, then, look something like this:
1469
1470 From: Moderator-Address
1471 To: Recipient-List
     Date: Tue, 22 Mar 1994 13:34:51 +0000
1473 Subject: Internet Digest, volume 42
1474 MIME-Version: 1.0
1475 Content-Type: multipart/mixed;
1476 boundary="---- main boundary ----"
1477
1478 ------ main boundary ----
1479
1480 ...Introductory text or table of contents...
1481
1482 ------ main boundary ----
1483 Content-Type: multipart/digest;
1484 boundary="---- next message ----"
1485
1486 ------ next message ----
1487
1488 From: someone-else
1489 Date: Fri, 26 Mar 1993 11:13:32 +0200
1490 Subject: my opinion
1491
1492 ...body goes here ...
1493
1494 ------ next message ----
1495
1496 From: someone-else-again
1497 Date: Fri, 26 Mar 1993 10:07:13 -0500
1498 Subject: my different opinion
1499
1500 ... another body goes here ...
1501
1502 ------ next message ------
1503
1504 ------ main boundary ------
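
   The changed default can be observed with a short Python sketch.  It
   assumes the standard "email" package, whose parser applies the
   "message/rfc822" default to body parts of a "multipart/digest" that
   carry no Content-Type field.  The helper name "digest_subjects" and
   the boundary "next" are hypothetical and used for illustration only.

```python
from email import message_from_string


def digest_subjects(raw):
    """Return the Subject of each message enclosed in a multipart/digest."""
    digest = message_from_string(raw)
    subjects = []
    for part in digest.get_payload():
        # The parts below carry no Content-Type field, so the digest
        # default of message/rfc822 applies to them.
        if part.get_content_type() == "message/rfc822":
            enclosed = part.get_payload(0)  # the encapsulated message
            subjects.append(enclosed["Subject"])
    return subjects


RAW_DIGEST = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/digest; boundary="next"\n'
    "\n"
    "--next\n"
    "\n"
    "From: someone-else\n"
    "Subject: my opinion\n"
    "\n"
    "...body goes here ...\n"
    "--next\n"
    "\n"
    "From: someone-else-again\n"
    "Subject: my different opinion\n"
    "\n"
    "... another body goes here ...\n"
    "--next--\n"
)

print(digest_subjects(RAW_DIGEST))
```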
1505
5.1.6.  Parallel Subtype
1507
1508 This document defines a "parallel" subtype of the "multipart"
1509 Content-Type. This type is syntactically identical to
1510 "multipart/mixed", but the semantics are different. In particular,
1519 in a parallel entity, the order of body parts is not significant.
1520
1521 A common presentation of this type is to display all of the parts
1522 simultaneously on hardware and software that are capable of doing so.
1523 However, composing agents should be aware that many mail readers will
1524 lack this capability and will show the parts serially in any event.
1525
5.1.7.  Other Multipart Subtypes
1527
1528 Other "multipart" subtypes are expected in the future. MIME
1529 implementations must in general treat unrecognized subtypes of
1530 "multipart" as being equivalent to "multipart/mixed".
1531
5.2.  Message Media Type
1533
1534 It is frequently desirable, in sending mail, to encapsulate another
1535 mail message. A special media type, "message", is defined to
1536 facilitate this. In particular, the "rfc822" subtype of "message" is
1537 used to encapsulate RFC 822 messages.
1538
1539 NOTE: It has been suggested that subtypes of "message" might be
1540 defined for forwarded or rejected messages. However, forwarded and
1541 rejected messages can be handled as multipart messages in which the
1542 first part contains any control or descriptive information, and a
1543 second part, of type "message/rfc822", is the forwarded or rejected
1544 message. Composing rejection and forwarding messages in this manner
1545 will preserve the type information on the original message and allow
1546 it to be correctly presented to the recipient, and hence is strongly
1547 encouraged.
1548
1549 Subtypes of "message" often impose restrictions on what encodings are
1550 allowed. These restrictions are described in conjunction with each
1551 specific subtype.
1552
1553 Mail gateways, relays, and other mail handling agents are commonly
1554 known to alter the top-level header of an RFC 822 message. In
1555 particular, they frequently add, remove, or reorder header fields.
1556 These operations are explicitly forbidden for the encapsulated
1557 headers embedded in the bodies of messages of type "message."
1558
5.2.1.  RFC822 Subtype
1560
1561 A media type of "message/rfc822" indicates that the body contains an
1562 encapsulated message, with the syntax of an RFC 822 message.
1563 However, unlike top-level RFC 822 messages, the restriction that each
1564 "message/rfc822" body must include a "From", "Date", and at least one
1565 destination header is removed and replaced with the requirement that
1566 at least one of "From", "Subject", or "Date" must be present.
1567
1575 It should be noted that, despite the use of the numbers "822", a
1576 "message/rfc822" entity isn't restricted to material in strict
   conformance to RFC 822, nor are the semantics of "message/rfc822"
   objects restricted to the semantics defined in RFC 822.  More
1579 specifically, a "message/rfc822" message could well be a News article
1580 or a MIME message.
1581
1582 No encoding other than "7bit", "8bit", or "binary" is permitted for
1583 the body of a "message/rfc822" entity. The message header fields are
1584 always US-ASCII in any case, and data within the body can still be
1585 encoded, in which case the Content-Transfer-Encoding header field in
1586 the encapsulated message will reflect this. Non-US-ASCII text in the
1587 headers of an encapsulated message can be specified using the
1588 mechanisms described in RFC 2047.
1589
5.2.2.  Partial Subtype
1591
1592 The "partial" subtype is defined to allow large entities to be
1593 delivered as several separate pieces of mail and automatically
1594 reassembled by a receiving user agent. (The concept is similar to IP
1595 fragmentation and reassembly in the basic Internet Protocols.) This
1596 mechanism can be used when intermediate transport agents limit the
1597 size of individual messages that can be sent. The media type
1598 "message/partial" thus indicates that the body contains a fragment of
1599 a larger entity.
1600
1601 Because data of type "message" may never be encoded in base64 or
1602 quoted-printable, a problem might arise if "message/partial" entities
1603 are constructed in an environment that supports binary or 8bit
1604 transport. The problem is that the binary data would be split into
1605 multiple "message/partial" messages, each of them requiring binary
1606 transport. If such messages were encountered at a gateway into a
1607 7bit transport environment, there would be no way to properly encode
1608 them for the 7bit world, aside from waiting for all of the fragments,
1609 reassembling the inner message, and then encoding the reassembled
1610 data in base64 or quoted-printable. Since it is possible that
1611 different fragments might go through different gateways, even this is
1612 not an acceptable solution. For this reason, it is specified that
1613 entities of type "message/partial" must always have a content-
1614 transfer-encoding of 7bit (the default). In particular, even in
1615 environments that support binary or 8bit transport, the use of a
   content-transfer-encoding of "8bit" or "binary" is explicitly
1617 prohibited for MIME entities of type "message/partial". This in turn
1618 implies that the inner message must not use "8bit" or "binary"
1619 encoding.
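
   The encoding restrictions stated so far for "message/rfc822" (section
   5.2.1) and "message/partial" (this section) can be summarized as a
   small lookup.  This is a sketch only; the table name
   "ALLOWED_ENCODINGS" and the function "encoding_permitted" are
   hypothetical, and the table covers just these two subtypes.

```python
# Content-Transfer-Encoding values permitted on the entity itself.
ALLOWED_ENCODINGS = {
    "message/rfc822": {"7bit", "8bit", "binary"},
    "message/partial": {"7bit"},
}


def encoding_permitted(media_type, encoding="7bit"):
    """Return True if 'encoding' may be used on an entity of 'media_type'.

    Both arguments are compared case-insensitively.  "7bit" is the
    default when no Content-Transfer-Encoding field is present.
    """
    allowed = ALLOWED_ENCODINGS.get(media_type.lower())
    if allowed is None:
        raise ValueError("no rule recorded for " + media_type)
    return encoding.lower() in allowed


print(encoding_permitted("message/rfc822", "8bit"))
print(encoding_permitted("message/partial", "8bit"))
print(encoding_permitted("message/rfc822", "base64"))
```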
1620
1631 Because some message transfer agents may choose to automatically
1632 fragment large messages, and because such agents may use very
1633 different fragmentation thresholds, it is possible that the pieces of
1634 a partial message, upon reassembly, may prove themselves to comprise
1635 a partial message. This is explicitly permitted.
1636
1637 Three parameters must be specified in the Content-Type field of type
1638 "message/partial": The first, "id", is a unique identifier, as close
1639 to a world-unique identifier as possible, to be used to match the
1640 fragments together. (In general, the identifier is essentially a
1641 message-id; if placed in double quotes, it can be ANY message-id, in
1642 accordance with the BNF for "parameter" given in RFC 2045.) The
1643 second, "number", an integer, is the fragment number, which indicates
1644 where this fragment fits into the sequence of fragments. The third,
1645 "total", another integer, is the total number of fragments. This
1646 third subfield is required on the final fragment, and is optional
1647 (though encouraged) on the earlier fragments. Note also that these
1648 parameters may be given in any order.
1649
1650 Thus, the second piece of a 3-piece message may have either of the
1651 following header fields:
1652
1653 Content-Type: Message/Partial; number=2; total=3;
1654 id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
1655
1656 Content-Type: Message/Partial;
1657 id="oc=jpbe0M2Yt4s@thumper.bellcore.com";
1658 number=2
1659
1660 But the third piece MUST specify the total number of fragments:
1661
1662 Content-Type: Message/Partial; number=3; total=3;
1663 id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
1664
1665 Note that fragment numbering begins with 1, not 0.
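
   A receiving agent might extract and check these parameters along the
   following lines.  This is a minimal Python sketch that uses the
   standard "email" package for parameter parsing; the function name
   "partial_params" is hypothetical, and the sketch cannot tell whether
   a fragment is the final one when "total" is absent.

```python
from email.message import Message


def partial_params(content_type_value):
    """Return (id, number, total) from a message/partial Content-Type value.

    'total' is None when the parameter is absent, which is permitted on
    every fragment except the final one.
    """
    holder = Message()
    holder["Content-Type"] = content_type_value
    if holder.get_content_type() != "message/partial":
        raise ValueError("not a message/partial Content-Type")

    frag_id = holder.get_param("id")
    number = holder.get_param("number")
    total = holder.get_param("total")
    if frag_id is None or number is None:
        raise ValueError('"id" and "number" must be present')

    number = int(number)
    total = int(total) if total is not None else None
    if number < 1:
        raise ValueError("fragment numbering begins with 1")
    if total is not None and number > total:
        raise ValueError("fragment number exceeds total")
    return frag_id, number, total


print(partial_params(
    'Message/Partial; number=2; total=3; '
    'id="oc=jpbe0M2Yt4s@thumper.bellcore.com"'
))
```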
1666
1667 When the fragments of an entity broken up in this manner are put
1668 together, the result is always a complete MIME entity, which may have
1669 its own Content-Type header field, and thus may contain any other
1670 data type.
1671
5.2.2.1.  Message Fragmentation and Reassembly
1673
1674 The semantics of a reassembled partial message must be those of the
1675 "inner" message, rather than of a message containing the inner
1676 message. This makes it possible, for example, to send a large audio
1677 message as several partial messages, and still have it appear to the
1678 recipient as a simple audio message rather than as an encapsulated
1687 message containing an audio message. That is, the encapsulation of
1688 the message is considered to be "transparent".
1689
1690 When generating and reassembling the pieces of a "message/partial"
1691 message, the headers of the encapsulated message must be merged with
1692 the headers of the enclosing entities. In this process the following
1693 rules must be observed:
1694
1695 (1) Fragmentation agents must split messages at line
1696 boundaries only. This restriction is imposed because
1697 splits at points other than the ends of lines in turn
          depend on message transports being able to preserve
1699 the semantics of messages that don't end with a CRLF
1700 sequence. Many transports are incapable of preserving
1701 such semantics.
1702
1703 (2) All of the header fields from the initial enclosing
1704 message, except those that start with "Content-" and
1705 the specific header fields "Subject", "Message-ID",
1706 "Encrypted", and "MIME-Version", must be copied, in
1707 order, to the new message.
1708
1709 (3) The header fields in the enclosed message which start
1710 with "Content-", plus the "Subject", "Message-ID",
1711 "Encrypted", and "MIME-Version" fields, must be
1712 appended, in order, to the header fields of the new
1713 message. Any header fields in the enclosed message
1714 which do not start with "Content-" (except for the
1715 "Subject", "Message-ID", "Encrypted", and "MIME-
1716 Version" fields) will be ignored and dropped.
1717
1718 (4) All of the header fields from the second and any
1719 subsequent enclosing messages are discarded by the
1720 reassembly process.
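
   The reassembly side of rules (1) through (4) might be sketched in
   Python as follows.  This is an illustration under stated assumptions,
   not a complete implementation: the names "reassemble" and
   "taken_from_inner" are hypothetical, every fragment is assumed to be
   available as a complete message string with valid "number"
   parameters, and the body of the result is left as an unparsed string.
   Header parsing uses "HeaderParser" from the standard "email" package.

```python
from email.message import Message
from email.parser import HeaderParser

# Fields that rules (2) and (3) take from the enclosed message,
# in addition to every field whose name starts with "Content-".
INNER_FIELDS = {"subject", "message-id", "encrypted", "mime-version"}


def taken_from_inner(name):
    lowered = name.lower()
    return lowered.startswith("content-") or lowered in INNER_FIELDS


def reassemble(raw_fragments):
    """Merge message/partial fragments (complete message strings)."""
    parser = HeaderParser()  # parses headers only; the body stays a string
    pieces = sorted(
        (parser.parsestr(raw) for raw in raw_fragments),
        key=lambda piece: int(piece.get_param("number")),
    )
    total = pieces[-1].get_param("total")
    if total is None:
        raise ValueError("the final fragment must carry a total")
    numbers = [int(piece.get_param("number")) for piece in pieces]
    if numbers != list(range(1, int(total) + 1)):
        raise ValueError("fragments are missing or duplicated")
    if len({piece.get_param("id") for piece in pieces}) != 1:
        raise ValueError("fragments carry different id values")

    # Rule (1): fragments were split at line boundaries, so plain
    # concatenation of the bodies restores the enclosed message.
    inner = parser.parsestr("".join(piece.get_payload() for piece in pieces))

    merged = Message()
    for name, value in pieces[0].items():  # rule (2)
        if not taken_from_inner(name):
            merged[name] = value
    for name, value in inner.items():  # rule (3)
        if taken_from_inner(name):
            merged[name] = value
    # Rule (4): headers of the second and later fragments are never copied.
    merged.set_payload(inner.get_payload())
    return merged
```

   The fragments may be passed in any order, since they are sorted on
   the "number" parameter before the bodies are joined.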
1721
5.2.2.2.  Fragmentation and Reassembly Example
1723
1724 If an audio message is broken into two pieces, the first piece might
1725 look something like this:
1726
1727 X-Weird-Header-1: Foo
1728 From: Bill@host.com
1729 To: joe@otherhost.com
1730 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
1731 Subject: Audio mail (part 1 of 2)
1732 Message-ID: <id1@host.com>
1733 MIME-Version: 1.0
1734 Content-type: message/partial; id="ABC@host.com";
1743 number=1; total=2
1744
1745 X-Weird-Header-1: Bar
1746 X-Weird-Header-2: Hello
1747 Message-ID: <anotherid@foo.com>
1748 Subject: Audio mail
1749 MIME-Version: 1.0
1750 Content-type: audio/basic
1751 Content-transfer-encoding: base64
1752
1753 ... first half of encoded audio data goes here ...
1754
1755 and the second half might look something like this:
1756
1757 From: Bill@host.com
1758 To: joe@otherhost.com
1759 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
1760 Subject: Audio mail (part 2 of 2)
1761 MIME-Version: 1.0
1762 Message-ID: <id2@host.com>
1763 Content-type: message/partial;
1764 id="ABC@host.com"; number=2; total=2
1765
1766 ... second half of encoded audio data goes here ...
1767
1768 Then, when the fragmented message is reassembled, the resulting
1769 message to be displayed to the user should look something like this:
1770
1771 X-Weird-Header-1: Foo
1772 From: Bill@host.com
1773 To: joe@otherhost.com
1774 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
1775 Subject: Audio mail
1776 Message-ID: <anotherid@foo.com>
1777 MIME-Version: 1.0
1778 Content-type: audio/basic
1779 Content-transfer-encoding: base64
1780
1781 ... first half of encoded audio data goes here ...
1782 ... second half of encoded audio data goes here ...
1783
1784 The inclusion of a "References" field in the headers of the second
1785 and subsequent pieces of a fragmented message that references the
1786 Message-Id on the previous piece may be of benefit to mail readers
1787 that understand and track references. However, the generation of
1788 such "References" fields is entirely optional.
1789
1799 Finally, it should be noted that the "Encrypted" header field has
   been made obsolete by Privacy Enhanced Mail (PEM) [RFC-1421,
1801 RFC-1422, RFC-1423, RFC-1424], but the rules above are nevertheless
1802 believed to describe the correct way to treat it if it is encountered
1803 in the context of conversion to and from "message/partial" fragments.
1804
5.2.3.  External-Body Subtype
1806
1807 The external-body subtype indicates that the actual body data are not
1808 included, but merely referenced. In this case, the parameters
1809 describe a mechanism for accessing the external data.
1810
1811 When a MIME entity is of type "message/external-body", it consists of
1812 a header, two consecutive CRLFs, and the message header for the
1813 encapsulated message. If another pair of consecutive CRLFs appears,
1814 this of course ends the message header for the encapsulated message.
1815 However, since the encapsulated message's body is itself external, it
1816 does NOT appear in the area that follows. For example, consider the
1817 following message:
1818
1819 Content-type: message/external-body;
1820 access-type=local-file;
1821 name="/u/nsb/Me.jpeg"
1822
1823 Content-type: image/jpeg
1824 Content-ID: <id42@guppylake.bellcore.com>
1825 Content-Transfer-Encoding: binary
1826
1827 THIS IS NOT REALLY THE BODY!
1828
1829 The area at the end, which might be called the "phantom body", is
1830 ignored for most external-body messages. However, it may be used to
1831 contain auxiliary information for some such messages, as indeed it is
   when the access-type is "mail-server".  The only access-type defined
1833 in this document that uses the phantom body is "mail-server", but
1834 other access-types may be defined in the future in other
1835 specifications that use this area.
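
   The three regions of such an entity (outer header, encapsulated
   message header, and phantom body) can be separated with a short
   Python sketch.  It assumes the standard "email" package, whose parser
   treats the body of any "message" type as an enclosed message; the
   function name "describe_external_body" and the keys of the returned
   dictionary are hypothetical.

```python
from email import message_from_string


def describe_external_body(raw):
    """Split a message/external-body entity into its meaningful pieces."""
    outer = message_from_string(raw)
    if outer.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body entity")
    access_type = outer.get_param("access-type")
    if access_type is None:
        raise ValueError("the access-type parameter is missing")

    inner = outer.get_payload(0)  # header of the encapsulated message
    return {
        "access_type": access_type.lower(),
        "name": outer.get_param("name"),
        "media_type": inner.get_content_type(),
        "content_id": inner["Content-ID"],
        "phantom_body": inner.get_payload(),  # not the real body data
    }


RAW_EXTERNAL = (
    "Content-type: message/external-body; access-type=local-file; "
    'name="/u/nsb/Me.jpeg"\n'
    "\n"
    "Content-type: image/jpeg\n"
    "Content-ID: <id42@guppylake.bellcore.com>\n"
    "Content-Transfer-Encoding: binary\n"
    "\n"
    "THIS IS NOT REALLY THE BODY!\n"
)

print(describe_external_body(RAW_EXTERNAL))
```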
1836
1837 The encapsulated headers in ALL "message/external-body" entities MUST
1838 include a Content-ID header field to give a unique identifier by
1839 which to reference the data. This identifier may be used for caching
1840 mechanisms, and for recognizing the receipt of the data when the
1841 access-type is "mail-server".
1842
1843 Note that, as specified here, the tokens that describe external-body
1844 data, such as file names and mail server commands, are required to be
1845 in the US-ASCII character set.
1846
1855 If this proves problematic in practice, a new mechanism may be
1856 required as a future extension to MIME, either as newly defined
1857 access-types for "message/external-body" or by some other mechanism.
1858
1859 As with "message/partial", MIME entities of type "message/external-
1860 body" MUST have a content-transfer-encoding of 7bit (the default).
1861 In particular, even in environments that support binary or 8bit
   transport, the use of a content-transfer-encoding of "8bit" or
1863 "binary" is explicitly prohibited for entities of type
1864 "message/external-body".
1865
5.2.3.1.  General External-Body Parameters
1867
   The parameters that may be used with any "message/external-body"
1869 are:
1870
1871 (1) ACCESS-TYPE -- A word indicating the supported access
1872 mechanism by which the file or data may be obtained.
1873 This word is not case sensitive. Values include, but
1874 are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL-
1875 FILE", and "MAIL-SERVER". Future values, except for
1876 experimental values beginning with "X-", must be
1877 registered with IANA, as described in RFC 2048.
1878 This parameter is unconditionally mandatory and MUST be
1879 present on EVERY "message/external-body".
1880
1881 (2) EXPIRATION -- The date (in the RFC 822 "date-time"
1882 syntax, as extended by RFC 1123 to permit 4 digits in
1883 the year field) after which the existence of the
1884 external data is not guaranteed. This parameter may be
1885 used with ANY access-type and is ALWAYS optional.
1886
1887 (3) SIZE -- The size (in octets) of the data. The intent
1888 of this parameter is to help the recipient decide
1889 whether or not to expend the necessary resources to
1890 retrieve the external data. Note that this describes
1891 the size of the data in its canonical form, that is,
1892 before any Content-Transfer-Encoding has been applied
1893 or after the data have been decoded. This parameter
1894 may be used with ANY access-type and is ALWAYS
1895 optional.
1896
1897 (4) PERMISSION -- A case-insensitive field that indicates
1898 whether or not it is expected that clients might also
1899 attempt to overwrite the data. By default, or if
1900 permission is "read", the assumption is that they are
1901 not, and that if the data is retrieved once, it is
1902 never needed again. If PERMISSION is "read-write",
1911 this assumption is invalid, and any local copy must be
1912 considered no more than a cache. "Read" and "Read-
1913 write" are the only defined values of permission. This
1914 parameter may be used with ANY access-type and is
1915 ALWAYS optional.
1916
1917 The precise semantics of the access-types defined here are described
1918 in the sections that follow.
1919
5.2.3.2.  The 'ftp' and 'tftp' Access-Types
1921
1922 An access-type of FTP or TFTP indicates that the message body is
   accessible as a file using the FTP [RFC-959] or TFTP [RFC-783]
1924 protocols, respectively. For these access-types, the following
1925 additional parameters are mandatory:
1926
1927 (1) NAME -- The name of the file that contains the actual
1928 body data.
1929
1930 (2) SITE -- A machine from which the file may be obtained,
1931 using the given protocol. This must be a fully
1932 qualified domain name, not a nickname.
1933
1934 (3) Before any data are retrieved, using FTP, the user will
1935 generally need to be asked to provide a login id and a
1936 password for the machine named by the site parameter.
1937 For security reasons, such an id and password are not
1938 specified as content-type parameters, but must be
1939 obtained from the user.
1940
1941 In addition, the following parameters are optional:
1942
1943 (1) DIRECTORY -- A directory from which the data named by
1944 NAME should be retrieved.
1945
1946 (2) MODE -- A case-insensitive string indicating the mode
1947 to be used when retrieving the information. The valid
1948 values for access-type "TFTP" are "NETASCII", "OCTET",
1949 and "MAIL", as specified by the TFTP protocol [RFC-
1950 783]. The valid values for access-type "FTP" are
1951 "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a
1952 decimal integer, typically 8. These correspond to the
1953 representation types "A" "E" "I" and "L n" as specified
1954 by the FTP protocol [RFC-959]. Note that "BINARY" and
1955 "TENEX" are not valid values for MODE and that "OCTET"
          or "IMAGE" or "LOCAL8" should be used instead.  If MODE
1957 is not specified, the default value is "NETASCII" for
1958 TFTP and "ASCII" otherwise.
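
   The MODE rules above, including the defaults, might be sketched in
   Python as follows.  The function name "transfer_mode" and its return
   convention are assumptions made for this sketch: for FTP it returns
   the representation type ("A", "E", "I", or "L n"), and for TFTP it
   returns the TFTP mode name.

```python
import re

FTP_TYPES = {"ASCII": "A", "EBCDIC": "E", "IMAGE": "I"}
TFTP_MODES = {"NETASCII", "OCTET", "MAIL"}


def transfer_mode(access_type, mode=None):
    """Map a MODE parameter value to the value used by the protocol.

    MODE is case-insensitive.  When it is absent, the default is
    "NETASCII" for TFTP and "ASCII" otherwise.
    """
    if access_type.upper() == "TFTP":
        value = (mode or "NETASCII").upper()
        if value not in TFTP_MODES:
            raise ValueError("invalid TFTP mode: " + value)
        return value

    value = (mode or "ASCII").upper()
    if value in FTP_TYPES:
        return FTP_TYPES[value]
    local = re.fullmatch(r"LOCAL([0-9]+)", value)
    if local:
        return "L " + local.group(1)
    # "BINARY" and "TENEX" are deliberately rejected here.
    raise ValueError("invalid FTP mode: " + value)


print(transfer_mode("ftp", "image"))
print(transfer_mode("anon-ftp", "local8"))
print(transfer_mode("tftp"))
```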
1959
5.2.3.3.  The 'anon-ftp' Access-Type
1968
1969 The "anon-ftp" access-type is identical to the "ftp" access type,
1970 except that the user need not be asked to provide a name and password
1971 for the specified site. Instead, the ftp protocol will be used with
1972 login "anonymous" and a password that corresponds to the user's mail
1973 address.
1974
5.2.3.4.  The 'local-file' Access-Type
1976
1977 An access-type of "local-file" indicates that the actual body is
1978 accessible as a file on the local machine. Two additional parameters
1979 are defined for this access type:
1980
1981 (1) NAME -- The name of the file that contains the actual
1982 body data. This parameter is mandatory for the
1983 "local-file" access-type.
1984
1985 (2) SITE -- A domain specifier for a machine or set of
1986 machines that are known to have access to the data
1987 file. This optional parameter is used to describe the
1988 locality of reference for the data, that is, the site
1989 or sites at which the file is expected to be visible.
1990 Asterisks may be used for wildcard matching to a part
1991 of a domain name, such as "*.bellcore.com", to indicate
1992 a set of machines on which the data should be directly
1993 visible, while a single asterisk may be used to
1994 indicate a file that is expected to be universally
1995 available, e.g., via a global file system.
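
   Matching a host name against a SITE value could look like the Python
   sketch below.  The function name "site_matches" is hypothetical, and
   the sketch assumes that the asterisk is the only wildcard character
   and that domain names are compared without regard to case.

```python
import re


def site_matches(site_pattern, hostname):
    """Return True if 'hostname' falls within the SITE value.

    Each asterisk matches any run of characters; everything else must
    match literally.
    """
    chunks = site_pattern.lower().split("*")
    regex = ".*".join(re.escape(chunk) for chunk in chunks)
    return re.fullmatch(regex, hostname.lower()) is not None


print(site_matches("*.bellcore.com", "thumper.bellcore.com"))
print(site_matches("*", "host.example.org"))
print(site_matches("*.bellcore.com", "host.example.org"))
```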
1996
5.2.3.5.  The 'mail-server' Access-Type
1998
1999 The "mail-server" access-type indicates that the actual body is
2000 available from a mail server. Two additional parameters are defined
2001 for this access-type:
2002
2003 (1) SERVER -- The addr-spec of the mail server from which
2004 the actual body data can be obtained. This parameter
2005 is mandatory for the "mail-server" access-type.
2006
2007 (2) SUBJECT -- The subject that is to be used in the mail
2008 that is sent to obtain the data. Note that keying mail
2009 servers on Subject lines is NOT recommended, but such
2010 mail servers are known to exist. This is an optional
2011 parameter.
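
   The mandatory parameters named in sections 5.2.3.1 through 5.2.3.5
   can be collected into one table and checked mechanically.  The Python
   sketch below is illustrative; "REQUIRED_PARAMS" and "missing_params"
   are hypothetical names, and access-types not defined in this document
   are assumed to require nothing beyond ACCESS-TYPE itself.

```python
# Mandatory parameters per access-type, beyond ACCESS-TYPE itself.
REQUIRED_PARAMS = {
    "ftp": {"name", "site"},
    "tftp": {"name", "site"},
    "anon-ftp": {"name", "site"},  # identical to "ftp" except for login
    "local-file": {"name"},
    "mail-server": {"server"},
}


def missing_params(params):
    """Return a sorted list of mandatory parameters absent from 'params'.

    'params' maps parameter names to values; names and the access-type
    value are compared case-insensitively.
    """
    lowered = {name.lower(): value for name, value in params.items()}
    access_type = lowered.get("access-type")
    if access_type is None:
        return ["access-type"]
    required = REQUIRED_PARAMS.get(access_type.lower(), set())
    return sorted(required - set(lowered))


print(missing_params({
    "access-type": "ANON-FTP",
    "name": "BodyFormats.ps",
    "site": "thumper.bellcore.com",
}))
print(missing_params({"access-type": "mail-server"}))
```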
2012
   Because mail servers accept a variety of syntaxes, some of which are
2024 multiline, the full command to be sent to a mail server is not
2025 included as a parameter in the content-type header field. Instead,
2026 it is provided as the "phantom body" when the media type is
2027 "message/external-body" and the access-type is mail-server.
2028
2029 Note that MIME does not define a mail server syntax. Rather, it
2030 allows the inclusion of arbitrary mail server commands in the phantom
2031 body. Implementations must include the phantom body in the body of
   the message they send to the mail server address to retrieve the
2033 relevant data.
2034
2035 Unlike other access-types, mail-server access is asynchronous and
2036 will happen at an unpredictable time in the future. For this reason,
2037 it is important that there be a mechanism by which the returned data
2038 can be matched up with the original "message/external-body" entity.
2039 MIME mail servers must use the same Content-ID field on the returned
2040 message that was used in the original "message/external-body"
2041 entities, to facilitate such matching.
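
   Constructing the request message might be sketched in Python as
   follows, using "EmailMessage" from the standard "email" package.  The
   function name "build_mail_server_request" and the wording of the
   Comments field are hypothetical; the Comments field reflects the
   suggestion in section 5.2.3.6 that such requests be marked as
   automatically generated.  The sketch does not set a From field and
   does not send anything, and the user's permission is assumed to have
   been obtained beforehand.

```python
from email.message import EmailMessage


def build_mail_server_request(server, phantom_body, content_id, subject=None):
    """Build (but do not send) the request for a mail-server reference.

    'server' and 'subject' come from the SERVER and SUBJECT parameters,
    'phantom_body' from the phantom body, and 'content_id' from the
    encapsulated Content-ID field.
    """
    request = EmailMessage()
    request["To"] = server
    if subject is not None:
        request["Subject"] = subject
    request["Comments"] = (
        "Generated automatically to resolve a MIME "
        "message/external-body reference " + content_id
    )
    request.set_content(phantom_body)  # the mail server command(s)
    return request


request = build_mail_server_request(
    "listserv@bogus.bitnet",
    "get RFC-MIME.DOC\n",
    "<id42@guppylake.bellcore.com>",
)
print(request["To"])
print(request.get_content())
```

   The Content-ID is kept by the caller so that the reply from the mail
   server can later be matched against the original entity.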
2042
5.2.3.6.  External-Body Security Issues
2044
2045 "Message/external-body" entities give rise to two important security
2046 issues:
2047
2048 (1) Accessing data via a "message/external-body" reference
2049 effectively results in the message recipient performing
2050 an operation that was specified by the message
2051 originator. It is therefore possible for the message
2052 originator to trick a recipient into doing something
2053 they would not have done otherwise. For example, an
          originator could specify an action that attempts
2055 retrieval of material that the recipient is not
2056 authorized to obtain, causing the recipient to
2057 unwittingly violate some security policy. For this
2058 reason, user agents capable of resolving external
2059 references must always take steps to describe the
2060 action they are to take to the recipient and ask for
          explicit permission prior to performing it.
2062
2063 The 'mail-server' access-type is particularly
2064 vulnerable, in that it causes the recipient to send a
2065 new message whose contents are specified by the
2066 original message's originator. Given the potential for
2067 abuse, any such request messages that are constructed
2068 should contain a clear indication that they were
2069 generated automatically (e.g. in a Comments: header
2070 field) in an attempt to resolve a MIME
2079 "message/external-body" reference.
2080
2081 (2) MIME will sometimes be used in environments that
2082 provide some guarantee of message integrity and
2083 authenticity. If present, such guarantees may apply
2084 only to the actual direct content of messages -- they
2085 may or may not apply to data accessed through MIME's
2086 "message/external-body" mechanism. In particular, it
2087 may be possible to subvert certain access mechanisms
2088 even when the messaging system itself is secure.
2089
2090 It should be noted that this problem exists either with
          or without the availability of MIME mechanisms.  A
2092 casual reference to an FTP site containing a document
2093 in the text of a secure message brings up similar
2094 issues -- the only difference is that MIME provides for
2095 automatic retrieval of such material, and users may
          place unwarranted trust in such automatic retrieval
2097 mechanisms.
2098
5.2.3.7.  Examples and Further Explanations
2100
2101 When the external-body mechanism is used in conjunction with the
2102 "multipart/alternative" media type it extends the functionality of
2103 "multipart/alternative" to include the case where the same entity is
   provided in the same format but via different access mechanisms.
2105 When this is done the originator of the message must order the parts
2106 first in terms of preferred formats and then by preferred access
2107 mechanisms. The recipient's viewer should then evaluate the list
2108 both in terms of format and access mechanisms.
2109
2110 With the emerging possibility of very wide-area file systems, it
2111 becomes very hard to know in advance the set of machines where a file
2112 will and will not be accessible directly from the file system.
2113 Therefore it may make sense to provide both a file name, to be tried
2114 directly, and the name of one or more sites from which the file is
2115 known to be accessible. An implementation can try to retrieve remote
2116 files using FTP or any other protocol, using anonymous file retrieval
2117 or prompting the user for the necessary name and password. If an
2118 external body is accessible via multiple mechanisms, the sender may
2119 include multiple entities of type "message/external-body" within the
2120 body parts of an enclosing "multipart/alternative" entity.
2121
2122 However, the external-body mechanism is not intended to be limited to
2123 file retrieval, as shown by the mail-server access-type. Beyond
2124 this, one can imagine, for example, using a video server for external
2125 references to video clips.
2126
2135 The embedded message header fields which appear in the body of the
2136 "message/external-body" data must be used to declare the media type
2137 of the external body if it is anything other than plain US-ASCII
2138 text, since the external body does not have a header section to
2139 declare its type. Similarly, any Content-transfer-encoding other
2140 than "7bit" must also be declared here. Thus a complete
2141 "message/external-body" message, referring to an object in PostScript
2142 format, might look like this:
2143
2144 From: Whomever
2145 To: Someone
2146 Date: Whenever
2147 Subject: whatever
2148 MIME-Version: 1.0
2149 Message-ID: <id1@host.com>
2150 Content-Type: multipart/alternative; boundary=42
2151 Content-ID: <id001@guppylake.bellcore.com>
2152
2153 --42
2154 Content-Type: message/external-body; name="BodyFormats.ps";
2155 site="thumper.bellcore.com"; mode="image";
2156 access-type=ANON-FTP; directory="pub";
2157 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
2158
2159 Content-type: application/postscript
2160 Content-ID: <id42@guppylake.bellcore.com>
2161
2162 --42
2163 Content-Type: message/external-body; access-type=local-file;
2164 name="/u/nsb/writing/rfcs/RFC-MIME.ps";
2165 site="thumper.bellcore.com";
2166 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
2167
2168 Content-type: application/postscript
2169 Content-ID: <id42@guppylake.bellcore.com>
2170
2171 --42
2172 Content-Type: message/external-body;
                   access-type=mail-server;
2174 server="listserv@bogus.bitnet";
2175 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
2176
2177 Content-type: application/postscript
2178 Content-ID: <id42@guppylake.bellcore.com>
2179
2180 get RFC-MIME.DOC
2181
2182 --42--
2183
   Note that in the above example, the default
   Content-Transfer-Encoding of "7bit" is assumed for the external
   PostScript data.
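
   As an illustration only, the following Python sketch reads a single
   "message/external-body" entity with the standard library "email"
   package and extracts the access parameters together with the media
   type declared by the embedded header fields.  The sample text is a
   shortened variant of the ANON-FTP part above, and all variable names
   are assumptions made for this sketch.  The behaviour shown is that
   of Python's parser, not a requirement of this document.

```python
from email import message_from_string

# Shortened, hypothetical variant of the ANON-FTP part shown above.
RAW = (
    'Content-Type: message/external-body; access-type=ANON-FTP;\n'
    ' name="BodyFormats.ps"; site="thumper.bellcore.com";\n'
    ' directory="pub"; mode="image"\n'
    '\n'
    'Content-type: application/postscript\n'
    'Content-ID: <id42@guppylake.bellcore.com>\n'
    '\n'
)

outer = message_from_string(RAW)

# Parameters on the outer Content-Type say how to fetch the body.
# get_params() lower-cases names and strips quotes; the first entry
# is the media type itself, so it is skipped.
access = dict(outer.get_params()[1:])

# The parser treats the body of any "message/*" entity as a nested
# message, so the embedded header fields are its first payload item.
embedded = outer.get_payload(0)

# Media type and encoding of the external body; "7bit" is the default
# when no Content-Transfer-Encoding field is present.
external_type = embedded.get_content_type()
external_cte = embedded.get("Content-Transfer-Encoding", "7bit")
```

   Here "access" holds the retrieval parameters (access-type, name,
   site, directory, mode), while "external_type" and "external_cte"
   describe the data that would be fetched.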
2193
2194 Like the "message/partial" type, the "message/external-body" media
2195 type is intended to be transparent, that is, to convey the data type
2196 in the external body rather than to convey a message with a body of
2197 that type. Thus the headers on the outer and inner parts must be
2198 merged using the same rules as for "message/partial". In particular,
2199 this means that the Content-type and Subject fields are overridden,
2200 but the From field is preserved.
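
   The merge can be sketched as follows, assuming the reassembly rules
   given for "message/partial" earlier in this document: outer header
   fields are kept unless they begin with "Content-" or are one of
   "Subject", "Message-ID", "Encrypted", or "MIME-Version", and exactly
   those fields are taken from the inner header instead.  The function
   names and the list-of-pairs representation are assumptions of this
   sketch, not part of the specification.

```python
# Fields taken from the inner header under the "message/partial" rules.
INNER_ONLY = {"subject", "message-id", "encrypted", "mime-version"}


def from_inner(name):
    """True if this field must come from the inner header."""
    lowered = name.lower()
    return lowered.startswith("content-") or lowered in INNER_ONLY


def merge_headers(outer, inner):
    """Merge two lists of (name, value) header fields, keeping order."""
    kept = [(n, v) for n, v in outer if not from_inner(n)]
    taken = [(n, v) for n, v in inner if from_inner(n)]
    return kept + taken


# Hypothetical header fields for illustration.
outer = [
    ("From", "Whomever"),
    ("Subject", "whatever"),
    ("Content-Type", "message/external-body; access-type=local-file"),
]
inner = [
    ("From", "Someone-Else"),
    ("Subject", "The real subject"),
    ("Content-type", "application/postscript"),
]
merged = merge_headers(outer, inner)
```

   In the merged result the outer From field survives, while Subject
   and Content-type come from the inner header.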
2201
   Note that since the external bodies are not transported along with
   the external body reference, they need not conform to transport
   limitations that apply to the reference itself.  In particular,
   Internet mail transports may impose 7bit and line length limits, but
   these do not automatically apply to external bodies, which may be
   binary.  Thus a Content-Transfer-Encoding is not generally
   necessary, though it is permitted.
2209
2210 Note that the body of a message of type "message/external-body" is
2211 governed by the basic syntax for an RFC 822 message. In particular,
2212 anything before the first consecutive pair of CRLFs is header
2213 information, while anything after it is body information, which is
2214 ignored for most access-types.
2215
5.2.4.  Other Message Subtypes
2217
2218 MIME implementations must in general treat unrecognized subtypes of
2219 "message" as being equivalent to "application/octet-stream".
2220
2221 Future subtypes of "message" intended for use with email should be
2222 restricted to "7bit" encoding. A type other than "message" should be
2223 used if restriction to "7bit" is not possible.
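
   A minimal sketch of the fallback rule for unrecognized "message"
   subtypes is given below.  The set of recognized subtypes and the
   function name are hypothetical; a real implementation would list
   whatever subtypes it actually supports.

```python
# Hypothetical set of "message" subtypes this reader understands.
KNOWN_MESSAGE_SUBTYPES = {"rfc822", "partial", "external-body"}


def effective_type(media_type):
    """Return the media type an entity should be processed as."""
    maintype, _, subtype = media_type.lower().partition("/")
    if maintype == "message" and subtype not in KNOWN_MESSAGE_SUBTYPES:
        # Unrecognized subtypes of "message" are handled like opaque data.
        return "application/octet-stream"
    return f"{maintype}/{subtype}"
```

   For example, effective_type("message/x-unknown") yields
   "application/octet-stream", while a recognized subtype is returned
   unchanged apart from case normalization.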
2224
6.  Experimental Media Type Values
2226
2227 A media type value beginning with the characters "X-" is a private
2228 value, to be used by consenting systems by mutual agreement. Any
2229 format without a rigorous and public definition must be named with an
2230 "X-" prefix, and publicly specified values shall never begin with
2231 "X-". (Older versions of the widely used Andrew system use the "X-
2232 BE2" name, so new systems should probably choose a different name.)
2233
2234 In general, the use of "X-" top-level types is strongly discouraged.
2235 Implementors should invent subtypes of the existing types whenever
2236 possible. In many cases, a subtype of "application" will be more
2237 appropriate than a new top-level type.
2238
7.  Summary
2248
   The five discrete media types provide a standardized mechanism for
   tagging entities as "audio", "image", or several other kinds of
   data.  The composite "multipart" and "message" media types allow
   mixing and hierarchical structuring of entities of different types
   in a single message.  A distinguished parameter syntax allows
   further specification of data format details, particularly the
   specification of alternate character sets.  Additional optional
   header fields provide mechanisms for certain extensions deemed
   desirable by many implementors.  Finally, a number of useful media
   types are defined for general use by consenting user agents, notably
   "message/partial" and "message/external-body".
2260
8.  Security Considerations
2262
2263 Security issues are discussed in the context of the
2264 "application/postscript" type, the "message/external-body" type, and
2265 in RFC 2048. Implementors should pay special attention to the
2266 security implications of any media types that can cause the remote
2267 execution of any actions in the recipient's environment. In such
2268 cases, the discussion of the "application/postscript" type may serve
2269 as a model for considering other media types with remote execution
2270 capabilities.
2271
9.  Authors' Addresses
2304
2305 For more information, the authors of this document are best contacted
2306 via Internet mail:
2307
2308 Ned Freed
2309 Innosoft International, Inc.
2310 1050 East Garvey Avenue South
2311 West Covina, CA 91790
2312 USA
2313
2314 Phone: +1 818 919 3600
2315 Fax: +1 818 919 3614
2316 EMail: ned@innosoft.com
2317
2318
2319 Nathaniel S. Borenstein
2320 First Virtual Holdings
2321 25 Washington Avenue
2322 Morristown, NJ 07960
2323 USA
2324
2325 Phone: +1 201 540 8967
2326 Fax: +1 201 993 3032
2327 EMail: nsb@nsb.fv.com
2328
2329
2330 MIME is a result of the work of the Internet Engineering Task Force
2331 Working Group on RFC 822 Extensions. The chairman of that group,
2332 Greg Vaudreuil, may be reached at:
2333
2334 Gregory M. Vaudreuil
2335 Octel Network Services
2336 17080 Dallas Parkway
2337 Dallas, TX 75248-1905
2338 USA
2339
2340 EMail: Greg.Vaudreuil@Octel.Com
2341
Appendix A -- Collected Grammar
2360
2361 This appendix contains the complete BNF grammar for all the syntax
2362 specified by this document.
2363
2364 By itself, however, this grammar is incomplete. It refers by name to
2365 several syntax rules that are defined by RFC 822. Rather than
2366 reproduce those definitions here, and risk unintentional differences
2367 between the two, this document simply refers the reader to RFC 822
2368 for the remaining definitions. Wherever a term is undefined, it
2369 refers to the RFC 822 definition.
2370
     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                      "+" / "_" / "," / "-" / "." /
                      "/" / ":" / "=" / "?"

     body-part := <"message" as defined in RFC 822, with all
                   header fields optional, not starting with the
                   specified dash-boundary, and with the
                   delimiter not occurring anywhere in the
                   body part.  Note that the semantics of a
                   part differ from the semantics of a message,
                   as described in the text.>

     close-delimiter := delimiter "--"

     dash-boundary := "--" boundary
                      ; boundary taken from the value of
                      ; boundary parameter of the
                      ; Content-Type field.

     delimiter := CRLF dash-boundary

     discard-text := *(*text CRLF)
                     ; May be ignored or discarded.

     encapsulation := delimiter transport-padding
                      CRLF body-part

     epilogue := discard-text

     multipart-body := [preamble CRLF]
                       dash-boundary transport-padding CRLF
                       body-part *encapsulation
                       close-delimiter transport-padding
                       [CRLF epilogue]

     preamble := discard-text

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.
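
   As a non-normative illustration, the "boundary" rule and the
   delimiter structure of "multipart-body" can be approximated with
   regular expressions.  The Python sketch below assumes the body is
   held as a string with CRLF line endings; the names
   is_valid_boundary and split_parts are hypothetical, and the
   splitter makes no attempt to parse the header fields of each
   body-part.

```python
import re

# bcharsnospace from the grammar, written as a regex character class.
_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
# boundary := 0*69<bchars> bcharsnospace (1 to 70 characters, no
# trailing space).
BOUNDARY_RE = re.compile(rf"[{_NOSPACE} ]{{0,69}}[{_NOSPACE}]")


def is_valid_boundary(value):
    """True if value matches the 'boundary' rule."""
    return BOUNDARY_RE.fullmatch(value) is not None


def split_parts(body, boundary):
    """Return the body-parts between delimiters, as raw strings."""
    dash = re.escape("--" + boundary)
    # A delimiter line starts the body or follows CRLF; a trailing "--"
    # marks the close-delimiter; trailing LWSP is transport-padding.
    line = re.compile(rf"(?:\A|\r\n){dash}(--)?[ \t]*(?=\r\n|\Z)")
    parts, start = [], None
    for m in line.finditer(body):
        if start is not None:
            parts.append(body[start:m.start()])
        if m.group(1):
            break  # close-delimiter: whatever follows is the epilogue
        start = m.end() + 2  # skip the CRLF ending the delimiter line
    return parts


# Hypothetical two-part body with preamble, padding, and epilogue.
sample = (
    "preamble\r\n"
    "--42\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "one\r\n"
    "--42 \r\n"
    "\r\n"
    "two\r\n"
    "--42--\r\n"
    "epilogue"
)
parts = split_parts(sample, "42")
```

   The preamble and epilogue are discarded, and the padded delimiter
   line before the second part is accepted, as the transport-padding
   rule requires of receivers.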