Network Working Group                                            E. Hall
Request for Comments: 4155                                September 2005
Category: Informational


                    The application/mbox Media Type

Status of This Memo

   This memo provides information for the Internet community. It does
   not specify an Internet standard of any kind. Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This memo requests that the application/mbox media type be authorized
   for allocation by the IESG, according to the terms specified in RFC
   2048. This memo also defines a default format for the mbox database,
   which must be supported by all conformant implementations.

1. Background and Overview

   UNIX-like operating systems have historically made widespread use of
   "mbox" database files for a variety of local email purposes. In the
   common case, mbox files store linear sequences of one or more
   electronic mail messages, with local email clients treating the
   database as a logical folder of email messages. mbox databases are
   also used by a variety of other messaging tools, such as mailing list
   management programs, archiving and filtering utilities, messaging
   servers, and other related applications. In recent years, mbox
   databases have also become common on a large number of non-UNIX
   computing platforms, for similar kinds of purposes.

   The increased pervasiveness of these files has led to an increased
   demand for a standardized, network-wide interchange of these files as
   discrete database objects. In turn, this dictates a need for a
   general media type definition for mbox files, which is the subject
   and purpose of this memo.

Hall                         Informational                      [Page 1]

RFC 4155            The application/mbox Media Type       September 2005

2. About the mbox Database

   The mbox database format is not documented in an authoritative
   specification, but instead exists as a well-known output format that
   is anecdotally documented, or which is only authoritatively
   documented for a specific platform or tool.

   mbox databases typically contain a linear sequence of electronic mail
   messages. Each message begins with a separator line that identifies
   the message sender, and also identifies the date and time at which
   the message was received by the final recipient (either the last-hop
   system in the transfer path, or the system which serves as the
   recipient's mailstore). Each message is typically terminated by an
   empty line. The end of the database is usually recognized by either
   the absence of any additional data, or by the presence of an explicit
   end-of-file marker.

   The structure of the separator lines varies across implementations,
   but they usually contain the exact character sequence of "From",
   followed by a single Space character (0x20), an email address of
   some kind, another Space character, a timestamp sequence of some
   kind, and an end-of-line marker. However, due to the lack of any
   authoritative specification, each of these attributes is known to
   vary widely across implementations. For example, the email address
   can reflect any addressing syntax that has ever been used on any
   messaging system in all of history (specifically including address
   forms that are not compatible with Internet messages, as defined by
   RFC 2822 [RFC2822]). Similarly, the timestamp sequences can also
   vary according to system output, while the end-of-line sequences
   will often reflect platform-specific requirements. Different data
   formats can even appear within a single database as a result of
   multiple mbox files being concatenated together, or because a single
   file was accessed by multiple messaging clients, each of which has
   used its own syntax for the separator line.

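   As a rough illustration of the typical layout described above, the
   following Python sketch splits a candidate separator line into its
   address and timestamp portions without validating either one. The
   function name split_separator is hypothetical and is not taken from
   any mbox specification; real-world lines may not fit even this loose
   pattern.

```python
def split_separator(line: bytes):
    """Loosely split a candidate separator line.

    Returns (address, timestamp) as raw bytes, or None when the line
    does not begin with the exact sequence b"From ". Neither field is
    validated, because both are known to vary across implementations.
    """
    if not line.startswith(b"From "):
        return None
    # Drop the "From " prefix and any LF or CRLF end-of-line marker.
    rest = line[5:].rstrip(b"\r\n")
    # Everything up to the next Space is taken to be the address.
    address, _, timestamp = rest.partition(b" ")
    return address, timestamp
```

   Because the split is made at the first Space character, an address
   that itself contains a space would be divided incorrectly, which is
   one reason that liberal parsing of this kind is fragile.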
   Message data within mbox databases often reflects site-specific
   peculiarities. For example, it is entirely possible for the message
   body or headers in an mbox database to contain untagged eight-bit
   character data that implicitly reflects a site-specific default
   language or locale, or that reflects local defaults for timestamps
   and email addresses; none of this data is widely portable beyond the
   local scope. Similarly, message data can also contain unencoded
   eight-bit binary data, or can use encoding formats that represent a
   specific platform (e.g., BINHEX or UUENCODE sequences).

   Many implementations are also known to escape message body lines
   that begin with the character sequence of "From ", so as to prevent
   confusion with overly-liberal parsers that do not search for full
   separator lines. In the common case, a leading Greater-Than symbol
   (0x3E) is used for this purpose (with "From " becoming ">From ").
   However, other implementations are known not to escape such lines
   unless they are immediately preceded by a blank line or also appear
   to contain an email address and a timestamp. Other implementations
   are also known to perform secondary escapes against these lines if
   they are already escaped or quoted, while others ignore these
   mechanisms altogether.

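   The difference between the simple escape and the secondary escape
   described above can be sketched in Python as follows. The function
   names are hypothetical, and the sketch covers only two of the many
   variations found in practice; it is not a description of any
   particular product's behavior.

```python
import re

# Matches "From " with zero or more leading Greater-Than symbols.
_QUOTED_FROM = re.compile(rb"^>*From ")


def escape_simple(line: bytes) -> bytes:
    """Prefix '>' only to lines that begin with exactly b"From "."""
    return b">" + line if line.startswith(b"From ") else line


def escape_nested(line: bytes) -> bytes:
    """Also escape lines that are already escaped (">From ", ...)."""
    return b">" + line if _QUOTED_FROM.match(line) else line


def unescape_nested(line: bytes) -> bytes:
    """Reverse escape_nested by removing one leading '>'."""
    if line.startswith(b">") and _QUOTED_FROM.match(line):
        return line[1:]
    return line
```

   Only the secondary form is reversible: after escape_simple, a
   reader cannot tell whether a ">From " line was escaped by the writer
   or was present in the original message body.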
   A comprehensive description of mbox database files on UNIX-like
   systems can be found at http://qmail.org./man/man5/mbox.html, which
   should be treated as mostly authoritative for those variations that
   are otherwise only documented in anecdotal form. However, readers
   are advised that many other platforms and tools make use of mbox
   databases, and that there are many more potential variations that can
   be encountered in the wild.

   In order to mitigate errors that may arise from such vagaries, this
   specification defines a "format" parameter to the application/mbox
   media type declaration, which can be used to identify the specific
   kind of mbox database that is being transferred. Furthermore, this
   specification defines a "default" database format which MUST be
   supported by implementations that claim to be compliant with this
   specification, and which is to be used as the implicit format for
   undeclared application/mbox data objects. Additional format types
   are to be defined in subsequent specifications. Messaging systems
   that receive an mbox database with an unknown format parameter value
   SHOULD treat the data as an opaque binary object, as if the data had
   been declared as application/octet-stream.

   Refer to Appendix A for a description of the default mbox format.

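   One way a receiver might apply the "format" parameter rules is
   sketched below in Python, using the standard library's email
   package to parse the Content-Type value. The names mbox_format and
   KNOWN_FORMATS are hypothetical, and the case-insensitive comparison
   of the parameter value is an assumption made for this sketch rather
   than a rule stated in this memo.

```python
from email.message import Message
from email.utils import collapse_rfc2231_value

# Assumption: this receiver implements only the "default" format.
KNOWN_FORMATS = {"default"}


def mbox_format(content_type: str):
    """Return the mbox format to parse with, or None.

    None means the object should not be parsed as an mbox database:
    either the media type is not application/mbox, or the "format"
    value is unknown and the data should be handled as if it were
    application/octet-stream.
    """
    header = Message()
    header["Content-Type"] = content_type
    if header.get_content_type() != "application/mbox":
        return None
    # A missing parameter implies the "default" format.
    raw = header.get_param("format", "default")
    # Assumption: values are compared without regard to case.
    fmt = collapse_rfc2231_value(raw).strip().lower()
    return fmt if fmt in KNOWN_FORMATS else None
```

   In this sketch, a value such as "application/mbox; format=x-example"
   yields None, so the caller would store or forward the data without
   attempting to interpret it.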
   Note that RFC 2046 [RFC2046] defines the multipart/digest media type
   for transferring platform-independent message files. Because that
   specification defines a set of neutral and strict formatting rules,
   the multipart/digest media type already facilitates highly-
   predictable transfer and conversion operations; as such, implementers
   are strongly encouraged to support and use that media type where
   possible.

3. Prerequisites and Terminology

   Readers of this document are expected to be familiar with the
   specification for MIME [RFC2045] and MIME-type registrations
   [RFC2048].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

4. The application/mbox Media Type Registration

   This section provides the media type registration application (as per
   [RFC2048]).

   MIME media type name: application

   MIME subtype name: mbox

   Required parameters: none

   Optional parameters: The "format" parameter identifies the format of
   the mbox database and the messages contained therein. The default
   value for the "format" parameter is "default", and refers to the
   formatting rules defined in Appendix A of this memo. mbox databases
   that do not have a "format" parameter SHOULD be interpreted as having
   the implicit "format" value of "default". mbox databases that have
   an unknown value for the "format" parameter SHOULD be treated as
   opaque data objects, as if the media type had been specified as
   application/octet-stream. Additional values for the format parameter
   are to be defined in subsequent specifications, and registered with
   IANA.

   Encoding considerations: If an email client receives an mbox database
   as a message attachment, and then stores that attachment within a
   local mbox database, the contents of the two database files may
   become irreversibly intermingled, such that both databases are
   rendered unrecognizable. In order to avoid these collisions,
   messaging systems that support this specification MUST encode an mbox
   database (or at a minimum, the separator lines) with non-transparent
   transfer encoding (such as BASE64 or Quoted-Printable) whenever an
   application/mbox object is transferred via messaging protocols.
   Other transfer services are generally encouraged to adopt similar
   encoding strategies in order to allow for any subsequent
   retransmission that might occur, but this is not a requirement.
   Implementers should also be prepared to encode mbox data locally if
   non-compliant data is received.

   Security considerations: mbox data is passive, and does not generally
   represent a unique or new security threat. However, there is risk in
   sharing any kind of data, because unintentional information may be
   exposed, and this risk certainly applies to mbox data as well.

   Interoperability considerations: Due to the lack of a single
   authoritative specification for mbox databases, there are a large
   number of variations between database formats (refer to the
   introduction text for common examples), and it is expected that non-
   conformant data will be erroneously tagged or exchanged. Although
   the "default" format specified in this memo does not allow for these
   kinds of vagaries, prior negotiation or agreement between humans may
   sometimes be needed.

   Published specification: see Appendix A.

   Applications that use this media type: hundreds of messaging products
   make use of the mbox database format, in one form or another.

   Magic number(s): mbox database files can be recognized by having a
   leading character sequence of "From", followed by a single Space
   character (0x20), followed by additional printable character data
   (refer to the description in Appendix A for details). However,
   implementers are cautioned that not all such files will be compliant
   with all of the formatting rules; implementers should therefore
   treat these files with an appropriate amount of circumspection.

   File extension(s): mbox database files sometimes have an ".mbox"
   extension, but this is neither required nor expected. As with magic
   numbers, implementers should avoid reflexive assumptions about the
   contents of such files.

   Macintosh File Type Code(s): None are known to be common.

   Person & email address to contact for further information: Eric A.
   Hall (ehall@ntrg.com)

   Intended usage: COMMON

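   Two items from the registration above lend themselves to a short
   illustration: the BASE64 requirement in the encoding considerations
   and the magic-number check. The Python sketch below relies on the
   standard library's email package; the function names, the default
   file name, and the choice to declare format="default" are all
   assumptions made for the example.

```python
from email.message import EmailMessage


def attach_mbox(msg: EmailMessage, mbox_bytes: bytes,
                filename: str = "archive.mbox") -> None:
    """Attach an mbox database to msg using BASE64 transfer encoding."""
    # For bytes input, add_attachment() applies base64 by default;
    # cte is spelled out here to make the intent explicit.
    msg.add_attachment(
        mbox_bytes,
        maintype="application",
        subtype="mbox",
        cte="base64",
        filename=filename,
        params={"format": "default"},
    )


def looks_like_mbox(head: bytes) -> bool:
    """Apply the magic-number test to the first bytes of a file.

    True means only that the data begins with "From", a Space, and
    printable ASCII up to the first end-of-line marker; it does not
    show that the file follows the rest of the formatting rules.
    """
    if not head.startswith(b"From "):
        return False
    first_line = head[5:].split(b"\n", 1)[0].rstrip(b"\r")
    return bool(first_line) and all(0x20 <= b <= 0x7E for b in first_line)
```

   With the attachment encoded in this way, no line of the serialized
   message body can begin with "From ", so a recipient that stores the
   message in its own mbox database will not confuse the attached
   separator lines with its own.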
5. Security Considerations

   See the discussion in section 4.

6. IANA Considerations

   The IANA has registered the application/mbox media type in the MIME
   registry, using the application provided in section 4 above.

   Furthermore, IANA has established and will maintain a registry of
   values for the "format" parameter as described in this memo. The
   first registration is the "default" value, using the description
   provided in Appendix A. Subsequent values for the "format" parameter
   MUST be accompanied by some form of recognizable, complete, and
   legitimate specification, such as an IESG-approved specification, or
   some kind of authoritative vendor documentation.

7. Normative References

   [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
             Extensions (MIME) Part One: Format of Internet Message
             Bodies", RFC 2045, November 1996.

   [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
             Extensions (MIME) Part Two: Media Types", RFC 2046,
             November 1996.

   [RFC2048] Freed, N., Klensin, J., and J. Postel, "Multipurpose
             Internet Mail Extensions (MIME) Part Four: Registration
             Procedures", BCP 13, RFC 2048, November 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2822] Resnick, P., "Internet Message Format", RFC 2822, April
             2001.

Appendix A. The "default" mbox Database Format

   In order to improve interoperability among messaging systems, this
   memo defines a "default" mbox database format, which MUST be
   supported by all implementations that claim to be compliant with this
   specification.

   The "default" mbox database format uses a linear sequence of Internet
   messages, with each message being immediately prefaced by a separator
   line, and being terminated by an empty line. More specifically:

   o  Each message within the database MUST follow the syntax and
      formatting rules defined in RFC 2822 [RFC2822] and its related
      specifications, with the exception that the canonical mbox
      database MUST use a single Line-Feed character (0x0A) as the
      end-of-line sequence, and MUST NOT use a Carriage-Return/Line-
      Feed pair (NB: this requirement only applies to the canonical
      mbox database as transferred, and does not override any other
      specifications). This usage represents the most common
      historical representation of the mbox database format, and
      allows for the least amount of conversion.

   o  Messages within the default mbox database MUST consist of
      seven-bit characters within an eight-bit stream. Eight-bit data
      within the stream MUST be converted to a seven-bit form (using
      appropriate, standardized encoding) and appropriately tagged
      (with the correct header fields) before the database is
      transferred.

   o  Message headers and data in the default mbox database MUST be
      fully-qualified, as per the relevant specification(s). For
      example, email addresses in the various header fields MUST have
      legitimate domain names (as per RFC 2822), while extended
      characters and encodings MUST be specified in the appropriate
      location (as per the appropriate MIME specifications), and so
      forth.

   o  Each message in the mbox database MUST be immediately preceded
      by a single separator line, which MUST conform to the following
      syntax:

         The exact character sequence of "From";

         a single Space character (0x20);

         the email address of the message sender (as obtained from the
         message envelope or other authoritative source), conformant
         with the "addr-spec" syntax from RFC 2822;

         a single Space character;

         a timestamp indicating the UTC date and time when the message
         was originally received, conformant with the syntax of the
         traditional UNIX 'ctime' output sans timezone (note that the
         use of UTC precludes the need for a timezone indicator);

         an end-of-line marker.

   o  Each message in the database MUST be terminated by an empty
      line, containing a single end-of-line marker.

   Note that the first message in an mbox database will only be prefaced
   by a separator line, while every other message will begin with two
   end-of-line sequences (one at the end of the message itself, and
   another to mark the end of the message within the mbox database file
   stream) and a separator line (marking the new message). The end of
   the database is implicitly reached when no more message data or
   separator lines are found.

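   The rules above can be sketched from the writer's side as follows.
   This Python example is illustrative only: the function names are
   hypothetical, the sender address is assumed to be a valid addr-spec
   already, and the message is assumed to be a complete RFC 2822
   message whose eight-bit content has already been encoded.

```python
import time
from datetime import datetime, timezone


def separator_line(sender: str, received: datetime) -> bytes:
    """Build "From <addr-spec> <ctime-style UTC timestamp>" plus LF."""
    if received.tzinfo is None:
        raise ValueError("received must be a timezone-aware datetime")
    utc = received.astimezone(timezone.utc)
    # time.asctime() yields the ctime layout, e.g.
    # "Thu Sep  1 12:00:00 2005", with no timezone indicator.
    stamp = time.asctime(utc.timetuple())
    return f"From {sender} {stamp}\n".encode("ascii")


def mbox_entry(sender: str, received: datetime, message: bytes) -> bytes:
    """Return one database entry: separator, message, empty line."""
    # The canonical database uses a bare Line-Feed as end-of-line.
    body = message.replace(b"\r\n", b"\n")
    if any(b > 0x7F for b in body):
        raise ValueError("eight-bit data must be encoded before transfer")
    if not body.endswith(b"\n"):
        body += b"\n"
    # The trailing LF forms the empty line that terminates the message.
    return separator_line(sender, received) + body + b"\n"
```

   Concatenating the entries for successive messages produces the
   layout described in the preceding note, where every message after
   the first is preceded by two end-of-line sequences and a separator
   line.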
   Also note that this specification does not prescribe any escape
   syntax for message body lines that begin with the character sequence
   of "From ". Recipient systems are expected to parse full separator
   lines as they are documented above.

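   A recipient that parses full separator lines, rather than looking
   only for a leading "From ", might proceed along the lines of the
   following Python sketch. The names are hypothetical, and the
   addr-spec check is deliberately simplified to a run of printable
   non-space ASCII characters; a complete implementation would need
   the full RFC 2822 grammar.

```python
import re
from datetime import datetime, timezone

# "From", Space, simplified addr-spec, Space, ctime-style timestamp
# without a timezone, and an optional Line-Feed.
_SEPARATOR = re.compile(
    rb"From ([\x21-\x7e]+) "
    rb"((?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    rb"[ \d]\d \d\d:\d\d:\d\d \d{4})\n?"
)


def parse_separator(line: bytes):
    """Return (address, UTC datetime) for a full separator line, else None."""
    match = _SEPARATOR.fullmatch(line)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group(2).decode("ascii"),
                                  "%a %b %d %H:%M:%S %Y")
    except ValueError:  # for example, an hour of 25
        return None
    return match.group(1).decode("ascii"), stamp.replace(tzinfo=timezone.utc)


def split_messages(data: bytes):
    """Split a default-format database into a list of message bodies."""
    messages = []
    for line in data.splitlines(keepends=True):
        if parse_separator(line) is not None:
            messages.append([])          # a new message starts here
        elif messages:
            messages[-1].append(line)
    result = []
    for lines in messages:
        body = b"".join(lines)
        # Drop the empty line that terminates each message.
        result.append(body[:-1] if body.endswith(b"\n\n") else body)
    return result
```

   Under this approach, a body line such as "From here on, nothing is
   escaped." is left alone because it lacks a timestamp, although a
   body line that happens to match the complete separator syntax would
   still be mistaken for the start of a new message.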
Author's Address

   Eric A. Hall

   EMail: ehall@ntrg.com

Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at ietf-
   ipr@ietf.org.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.