Internet Engineering Task Force (IETF)                   P. Resnick, Ed.
Request for Comments: 6855                         Qualcomm Incorporated
Obsoletes: 5738                                           C. Newman, Ed.
Category: Standards Track                                         Oracle
ISSN: 2070-1721                                             S. Shen, Ed.
 
                         IMAP Support for UTF-8
 
Abstract

   This specification extends the Internet Message Access Protocol
   (IMAP) to support UTF-8 encoded international characters in user
   names, mail addresses, and message headers.  This specification
   replaces RFC 5738.
 
Status of This Memo

   This is an Internet Standards Track document.
 
   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.
 
   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6855.
 
Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
 
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.
 
Resnick, et al.              Standards Track                    [Page 1]

RFC 6855                 IMAP Support for UTF-8               March 2013
 
Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
   3.  "UTF8=ACCEPT" IMAP Capability and UTF-8 in IMAP Quoted-Strings
   4.  IMAP UTF8 "APPEND" Data Extension
   5.  "LOGIN" Command and UTF-8
   6.  "UTF8=ONLY" Capability
   7.  Dealing with Legacy Clients
   8.  Issues with UTF-8 Header Mailstore
   9.  IANA Considerations
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Design Rationale
   Appendix B.  Acknowledgments
 
1.  Introduction

   This specification forms part of the Email Address
   Internationalization protocols described in the Email Address
   Internationalization Framework document [RFC6530].  It extends IMAP
   [RFC3501] to permit UTF-8 [RFC3629] in headers, as described in
   "Internationalized Email Headers" [RFC6532].  It also adds a
   mechanism to support mailbox names using the UTF-8 charset.  This
   specification creates two new IMAP capabilities to allow servers to
   advertise these new extensions.
 
   This specification assumes that the IMAP server will be operating in
   a fully internationalized environment, i.e., one in which all clients
   accessing the server will be able to accept non-ASCII message header
   fields and other information, as specified in Section 3.  At least
   during a transition period, that assumption will not be realistic for
   many environments; the issues involved are discussed in Section 7
   below.
 
   This specification replaces an earlier, experimental approach to the
   same problem [RFC5738].
 
2.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in "Key words for
   use in RFCs to Indicate Requirement Levels" [RFC2119].
 
   The formal syntax uses the Augmented Backus-Naur Form (ABNF)
   [RFC5234] notation.  In addition, rules from IMAP [RFC3501], UTF-8
   [RFC3629], Extensions to IMAP ABNF [RFC4466], and IMAP "LIST" command
   extensions [RFC5258] are also referenced.  This document assumes that
   the reader will have a reasonably good understanding of these RFCs.
 
3.  "UTF8=ACCEPT" IMAP Capability and UTF-8 in IMAP Quoted-Strings

   The "UTF8=ACCEPT" capability indicates that the server supports the
   ability to open mailboxes containing internationalized messages with
   the "SELECT" and "EXAMINE" commands, and the server can provide UTF-8
   responses to the "LIST" and "LSUB" commands.  This capability also
   affects other IMAP extensions that can return mailbox names or their
   prefixes, such as NAMESPACE [RFC2342] and ACL [RFC4314].
 
   The "UTF8=ONLY" capability, described in Section 6, implies the
   "UTF8=ACCEPT" capability.  A server is said to support "UTF8=ACCEPT"
   if it advertises either "UTF8=ACCEPT" or "UTF8=ONLY".
 
   A client MUST use the "ENABLE" command [RFC5161] with the
   "UTF8=ACCEPT" option (defined in this section) to indicate to the
   server that the client accepts UTF-8 in quoted-strings and supports
   the "UTF8=ACCEPT" extension.  The "ENABLE UTF8=ACCEPT" command is
   only valid in the authenticated state.
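
   As a non-normative illustration of the two paragraphs above, a
   client might check the advertised capabilities and then issue the
   "ENABLE" command roughly as follows.  This is a minimal sketch: the
   helper name enable_utf8_accept is hypothetical, and the connection
   object is assumed to behave like an already authenticated
   imaplib.IMAP4 instance from the Python standard library (a
   "capabilities" sequence and an "enable" method).

```python
def enable_utf8_accept(conn):
    """Send ENABLE UTF8=ACCEPT if the server supports it.

    `conn` is assumed to look like an authenticated imaplib.IMAP4
    object: a `capabilities` sequence of strings and an
    `enable(capability)` method returning a (status, data) pair.
    Returns True only if the server answered OK.
    """
    caps = {str(c).upper() for c in conn.capabilities}
    # Advertising either token counts as supporting UTF8=ACCEPT.
    if not caps & {"UTF8=ACCEPT", "UTF8=ONLY"}:
        return False
    # The client always enables UTF8=ACCEPT, never UTF8=ONLY.
    status, _data = conn.enable("UTF8=ACCEPT")
    return status == "OK"
```

   Because the command is only valid in the authenticated state, such
   a helper would be called after a successful login and before any
   mailbox is selected.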
 
   The IMAP base specification [RFC3501] forbids the use of 8-bit
   characters in atoms or quoted-strings.  Thus, a UTF-8 string can only
   be sent as a literal.  This can be inconvenient from a coding
   standpoint, and unless the server offers IMAP non-synchronizing
   literals [RFC2088], this requires an extra round trip for each UTF-8
   string sent by the client.  When the IMAP server supports
   "UTF8=ACCEPT", it supports UTF-8 in quoted-strings with the following
   syntax:

            quoted        =/ DQUOTE *uQUOTED-CHAR DQUOTE
                   ; QUOTED-CHAR is not modified, as it will affect
                   ; other RFC 3501 ABNF non-terminals.

            uQUOTED-CHAR  = QUOTED-CHAR / UTF8-2 / UTF8-3 / UTF8-4

            UTF8-2        =   <Defined in Section 4 of RFC 3629>

            UTF8-3        =   <Defined in Section 4 of RFC 3629>

            UTF8-4        =   <Defined in Section 4 of RFC 3629>
 
   When this extended quoting mechanism is used by the client, the
   server MUST reject, with a "BAD" response, any octet sequences with
   the high bit set that fail to comply with the formal syntax
   requirements of UTF-8 [RFC3629].  The IMAP server MUST NOT send UTF-8
   in quoted-strings to the client unless the client has indicated
   support for that syntax by using the "ENABLE UTF8=ACCEPT" command.
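
   The server-side check described above can be sketched as follows
   (non-normative).  The function name check_quoted_octets is
   hypothetical.  The sketch relies on Python's strict UTF-8 decoder
   and looks only at UTF-8 well-formedness; the other quoted-string
   rules of RFC 3501 (for example, escaping of the double quote) are
   assumed to be handled elsewhere.

```python
def check_quoted_octets(content: bytes) -> str:
    """Return "OK" or "BAD" for the octets inside a quoted-string."""
    try:
        # The strict decoder rejects stray high-bit octets, truncated
        # sequences, overlong forms, and encoded surrogates.
        content.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "BAD"
    return "OK"
```

   For instance, the octets of "café" encoded as UTF-8 pass, while
   the same word encoded as ISO-8859-1 (a lone 0xE9 octet) is
   rejected.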
 
   If the server supports "UTF8=ACCEPT", the client MAY use extended
   quoted syntax with any IMAP argument that permits a string (including
   astring and nstring).  However, if characters outside the US-ASCII
   repertoire are used in an inappropriate place, the results would be
   the same as if other syntactically valid but semantically invalid
   characters were used.  Specific cases where UTF-8 characters are
   permitted or not permitted are described in the following paragraphs.
 
   All IMAP servers that support "UTF8=ACCEPT" SHOULD accept UTF-8 in
   mailbox names, and those that also support the Mailbox International
   Naming Convention described in RFC 3501, Section 5.1.3, MUST accept
   UTF8-quoted mailbox names and convert them to the appropriate
   internal format.  Mailbox names MUST comply with the Net-Unicode
   Definition ([RFC5198], Section 2) with the specific exception that
   they MUST NOT contain control characters (U+0000-U+001F and
   U+0080-U+009F), a delete character (U+007F), a line separator
   (U+2028), or a paragraph separator (U+2029).
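
   The list of excluded code points above translates directly into a
   range check.  The following non-normative sketch uses a
   hypothetical function name and checks only the exclusions listed
   in this paragraph; any further Net-Unicode requirements from
   RFC 5198 are not covered.

```python
def has_forbidden_char(name: str) -> bool:
    """True if a mailbox name contains an excluded code point."""
    for ch in name:
        cp = ord(ch)
        if cp <= 0x1F:                 # C0 controls, U+0000-U+001F
            return True
        if 0x7F <= cp <= 0x9F:         # delete plus C1 controls
            return True
        if cp in (0x2028, 0x2029):     # line / paragraph separator
            return True
    return False
```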
 
   Once an IMAP client has enabled UTF-8 support with the "ENABLE
   UTF8=ACCEPT" command, it MUST NOT issue a "SEARCH" command that
   contains a charset specification.  If an IMAP server receives such a
   "SEARCH" command in that situation, it SHOULD reject the command with
   a "BAD" response (due to the conflicting charset labels).
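
   A non-normative sketch of the "SEARCH" rule follows.  Both
   function names are hypothetical, and the argument string is
   assumed to be the text after the "SEARCH" keyword, where RFC 3501
   places the optional charset specification first.

```python
def search_uses_charset(search_args: str) -> bool:
    """True if the SEARCH arguments begin with a CHARSET item."""
    tokens = search_args.split()
    return bool(tokens) and tokens[0].upper() == "CHARSET"


def search_status(search_args: str, utf8_enabled: bool):
    """Return "BAD" if the command should be rejected, else None."""
    # Only sessions that issued ENABLE UTF8=ACCEPT are affected.
    if utf8_enabled and search_uses_charset(search_args):
        return "BAD"
    return None
```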
 
4.  IMAP UTF8 "APPEND" Data Extension

   If the server supports "UTF8=ACCEPT", then the server accepts UTF-8
   headers in the "APPEND" command message argument.  A client that
   sends a message with UTF-8 headers to the server MUST send them using
   the "UTF8" data extension to the "APPEND" command.  If the server
   also advertises the "CATENATE" capability [RFC4469], the client can
   use the same data extension to include such a message in a catenated
   message part.  The ABNF for the "APPEND" data extension and
   "CATENATE" extension follows:
 
        utf8-literal   = "UTF8" SP "(" literal8 ")"

        literal8       = <Defined in RFC 4466>

        append-data    =/ utf8-literal

        cat-part       =/ utf8-literal
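
   As a non-normative illustration of the utf8-literal rule, the
   octets of such an argument could be assembled as shown below.  The
   function name is hypothetical, and the sketch assumes the literal8
   form "~{" number "}" CRLF followed by the octets, as defined in
   RFC 4466.  It only builds the bytes; with a synchronizing literal,
   a real client would still wait for the server's continuation
   request before sending the message octets.

```python
def utf8_append_literal(message: bytes) -> bytes:
    """Wrap a message as: "UTF8" SP "(" literal8 ")"."""
    # The literal8 prefix carries the exact octet count.
    prefix = b"UTF8 (~{%d}\r\n" % len(message)
    return prefix + message + b")"
```

   The octet count is taken from the encoded message, not from the
   number of characters, which matters once headers contain
   multi-octet UTF-8 sequences.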
 
   If an IMAP server supports "UTF8=ACCEPT" and the IMAP client has not
   issued the "ENABLE UTF8=ACCEPT" command, the server MUST reject, with
   a "NO" response, an "APPEND" command that includes any 8-bit
   character in message header fields.
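
   The rejection rule above can be sketched, non-normatively, as a
   scan of the header block for octets with the high bit set.  The
   function name is hypothetical, and the sketch assumes that the
   header block ends at the first empty line (CRLF CRLF).

```python
def append_status(message: bytes, utf8_enabled: bool):
    """Return "NO" if the APPEND must be rejected, else None."""
    # Everything before the first empty line is the header block.
    head, _sep, _body = message.partition(b"\r\n\r\n")
    has_8bit = any(octet > 0x7F for octet in head)
    if has_8bit and not utf8_enabled:
        return "NO"
    return None
```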
 
5.  "LOGIN" Command and UTF-8

   This specification does not extend the IMAP "LOGIN" command [RFC3501]
   to support UTF-8 usernames and passwords.  Whenever a client needs to
   use UTF-8 usernames or passwords, it MUST use the IMAP "AUTHENTICATE"
   command, which is already capable of passing UTF-8 usernames and
   passwords.
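
   A client could decide between the two commands with a check along
   the following lines.  This is a non-normative sketch with a
   hypothetical function name; it uses str.isascii, which requires
   Python 3.7 or later.

```python
def may_use_login(username: str, password: str) -> bool:
    """True if LOGIN is still an option for these credentials.

    When either value contains a non-ASCII character, the client
    has to fall back to AUTHENTICATE instead.
    """
    return username.isascii() and password.isascii()
```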
 
   Although using the IMAP "AUTHENTICATE" command in this way makes it
   syntactically legal to have a UTF-8 username or password, there is no
   guarantee that the user provisioning system utilized by the IMAP
   server will allow such identities.  This is an implementation
   decision and may depend on what identity system the IMAP server is
   configured to use.
 
6.  "UTF8=ONLY" Capability

   The "UTF8=ONLY" capability indicates that the server supports
   "UTF8=ACCEPT" (see Section 3) and that it requires support for UTF-8
   from clients.  In particular, this means that the server will send
   UTF-8 in quoted-strings, and it will not accept the older
   international mailbox name convention (modified UTF-7 [RFC3501]).
   Because these are incompatible changes to IMAP, explicit server
   announcement and client confirmation is necessary: clients MUST use
   the "ENABLE UTF8=ACCEPT" command before using this server.  A server
   that advertises "UTF8=ONLY" will reject, with a "NO [CANNOT]"
   response [RFC5530], any command that might require UTF-8 support and
   is not preceded by an "ENABLE UTF8=ACCEPT" command.
 
   IMAP clients that find support for a server that announces
   "UTF8=ONLY" problematic are encouraged to at least detect the
   announcement and provide an informative error message to the
   end-user.
 
   Because the "UTF8=ONLY" server capability includes support for
   "UTF8=ACCEPT", the capability string will include, at most, one of
   those and never both.  For the client, "ENABLE UTF8=ACCEPT" is always
   used -- never "ENABLE UTF8=ONLY".
 
7.  Dealing with Legacy Clients

   In most situations, it will be difficult or impossible for the
   implementer or operator of an IMAP (or POP) server to know whether
   all of the clients that might access it, or the associated mail store
   more generally, will be able to support the facilities defined in
   this document.  In almost all cases, servers that conform to this
   specification will have to be prepared to deal with clients that do
   not enable the relevant capabilities.  Unfortunately, there is no
   completely satisfactory way to do so other than for systems that wish
   to receive email that requires SMTPUTF8 capabilities to be sure that
   all components of those systems -- including IMAP and other clients
   selected by users -- are upgraded appropriately.
 
   When a message that requires SMTPUTF8 is encountered and the client
   does not enable UTF-8 capability, choices available to the server
   include hiding the problematic message(s), creating in-band or
   out-of-band notifications or error messages, or somehow trying to
   create a surrogate of the message with the intention of providing
   useful information to that client about what has occurred.  Such
   surrogate messages cannot be actual substitutes for the original
   message: they will almost always be impossible to reply to (either at
   all or without loss of information) and the new header fields or
   specialized constructs for server-client communications may go beyond
   the requirements of current email specifications (e.g., [RFC5322]).
   Consequently, such messages may confuse some legacy mail user agents
   (including IMAP clients) or not provide expected information to
   users.  There are also trade-offs in constructing surrogates of the
   original message between accepting complexity and additional
   computation costs in order to try to preserve as much information as
   possible (for example, in "Post-Delivery Message Downgrading for
   Internationalized Email Messages" [RFC6857]) and trying to minimize
   those costs while still providing useful information (for example, in
   "Simplified POP and IMAP Downgrading for Internationalized Email"
   [RFC6858]).
 
   Implementations that choose to perform downgrading SHOULD use one of
   the standardized algorithms provided in RFC 6857 or RFC 6858.
   Getting downgrade algorithms right, and minimizing the risk of
   operational problems and harm to the email system, is tricky and
   requires careful engineering.  These two algorithms are well
   understood and carefully designed.
 
   Because such messages are really surrogates of the original ones, not
   really "downgraded" ones (although that terminology is often used for
   convenience), they inevitably have relationships to the originals
   that the IMAP specification [RFC3501] did not anticipate.  This
   brings up two concerns in particular: First, digital signatures
   computed over and intended for the original message will often not be
   applicable to the surrogate message, and will often fail signature
   verification.  (It will be possible for some digital signatures to be
   verified, if they cover only parts of the original message that are
   not affected in the creation of the surrogate.)  Second, servers that
   may be accessed by the same user with different clients or methods
   (e.g., POP or webmail systems in addition to IMAP or IMAP clients
   with different capabilities) will need to exert extreme care to be
   sure that UIDVALIDITY [RFC3501] behaves as the user would expect.
   Those issues may be especially sensitive if the server caches the
   surrogate message or computes and stores it when the message arrives
   with the intent of making either form available depending on client
   capabilities.  Additionally, in order to cope with the case when a
   server compliant with this extension returns the same UIDVALIDITY to
   both legacy and "UTF8=ACCEPT"-aware clients, a client upgraded from
   being non-"UTF8=ACCEPT"-aware MUST discard its cache of messages
   downloaded from the server.
 
   The best (or "least bad") approach for any given environment will
   depend on local conditions, local assumptions about user behavior,
   the degree of control the server operator has over client usage and
   upgrading, the options that are actually available, and so on.  It is
   impossible, at least at the time of publication of this
   specification, to give good advice that will apply to all situations,
   or even particular profiles of situations, other than "upgrade legacy
   clients as soon as possible".
 
8.  Issues with UTF-8 Header Mailstore

   When an IMAP server uses a mailbox format that supports UTF-8 headers
   and it permits selection or examination of that mailbox without
   issuing "ENABLE UTF8=ACCEPT" first, it is the responsibility of the
   server to comply with the IMAP base specification [RFC3501] and the
   Internet Message Format [RFC5322] with respect to all header
   information transmitted over the wire.  The issue of handling
   messages containing non-ASCII characters in legacy environments is
   discussed in Section 7.
 
9.  IANA Considerations

   This document redefines two capabilities ("UTF8=ACCEPT" and
   "UTF8=ONLY") in the "IMAP 4 Capabilities" registry [RFC3501].  Three
   other capabilities that were described in the experimental
   predecessor to this document ("UTF8=ALL", "UTF8=APPEND", "UTF8=USER")
   are now OBSOLETE.  IANA has updated the registry as follows:
 
   OLD:

      +--------------+-----------------+
      | UTF8=ACCEPT  |  [RFC5738]      |
      | UTF8=ALL     |  [RFC5738]      |
      | UTF8=APPEND  |  [RFC5738]      |
      | UTF8=ONLY    |  [RFC5738]      |
      | UTF8=USER    |  [RFC5738]      |
      +--------------+-----------------+

   NEW:

      +------------------------+---------------------+
      | UTF8=ACCEPT            |  [RFC6855]          |
      | UTF8=ALL (OBSOLETE)    |  [RFC5738] [RFC6855]|
      | UTF8=APPEND (OBSOLETE) |  [RFC5738] [RFC6855]|
      | UTF8=ONLY              |  [RFC6855]          |
      | UTF8=USER (OBSOLETE)   |  [RFC5738] [RFC6855]|
      +------------------------+---------------------+
 
10.  Security Considerations

   The security considerations of UTF-8 [RFC3629] and SASLprep [RFC4013]
   apply to this specification, particularly with respect to use of
   UTF-8 in usernames and passwords.  Otherwise, this is not believed to
   alter the security considerations of IMAP.
 
   Special considerations, some of them with security implications,
   occur if a server that conforms to this specification is accessed by
   a client that does not, as well as in some more complex situations in
   which a given message is accessed by multiple clients that might use
   different protocols and/or support different capabilities.  Those
   issues are discussed in Section 7.
 
11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
              4rev1", RFC 3501, March 2003.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC4013]  Zeilenga, K., "SASLprep: Stringprep Profile for User Names
              and Passwords", RFC 4013, February 2005.

   [RFC4466]  Melnikov, A. and C. Daboo, "Collected Extensions to IMAP4
              ABNF", RFC 4466, April 2006.

   [RFC4469]  Resnick, P., "Internet Message Access Protocol (IMAP)
              CATENATE Extension", RFC 4469, April 2006.

   [RFC5161]  Gulbrandsen, A. and A. Melnikov, "The IMAP ENABLE
              Extension", RFC 5161, March 2008.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC5258]  Leiba, B. and A. Melnikov, "Internet Message Access
              Protocol version 4 - LIST Command Extensions", RFC 5258,
              June 2008.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [RFC6530]  Klensin, J. and Y. Ko, "Overview and Framework for
              Internationalized Email", RFC 6530, February 2012.

   [RFC6532]  Yang, A., Steele, S., and N. Freed, "Internationalized
              Email Headers", RFC 6532, February 2012.

   [RFC6857]  Fujiwara, K., "Post-Delivery Message Downgrading for
              Internationalized Email Messages", RFC 6857, March 2013.
 
   [RFC6858]  Gulbrandsen, A., "Simplified POP and IMAP Downgrading for
              Internationalized Email", RFC 6858, March 2013.
 
11.2.  Informative References

   [RFC2088]  Myers, J., "IMAP4 non-synchronizing literals", RFC 2088,
              January 1997.

   [RFC2342]  Gahrns, M. and C. Newman, "IMAP4 Namespace", RFC 2342,
              May 1998.

   [RFC4314]  Melnikov, A., "IMAP4 Access Control List (ACL) Extension",
              RFC 4314, December 2005.

   [RFC5530]  Gulbrandsen, A., "IMAP Response Codes", RFC 5530,
              May 2009.

   [RFC5738]  Resnick, P. and C. Newman, "IMAP Support for UTF-8",
              RFC 5738, March 2010.
 
Appendix A.  Design Rationale

   This non-normative section discusses the reasons behind some of the
   design choices in this specification.
 
   The "UTF8=ONLY" mechanism simplifies diagnosis of interoperability
   problems when legacy support goes away.  In the situation where
   backwards compatibility is not working anyway, the non-conforming
   "just-send-UTF-8 IMAP" has the advantage that it might work with some
   legacy clients.  However, the difficulty of diagnosing
   interoperability problems caused by a "just-send-UTF-8 IMAP"
   mechanism is the reason the "UTF8=ONLY" capability mechanism was
   created.
 
Appendix B.  Acknowledgments

   The authors wish to thank the participants of the EAI working group
   for their contributions to this document, with particular thanks to
   Harald Alvestrand, David Black, Randall Gellens, Arnt Gulbrandsen,
   Kari Hurtta, John Klensin, Xiaodong Lee, Charles Lindsey, Alexey
   Melnikov, Subramanian Moonesamy, Shawn Steele, Daniel Taharlev, and
   Joseph Yee for their specific contributions to the discussion.
 
Authors' Addresses

   Pete Resnick (editor)
   Qualcomm Incorporated
   San Diego, CA  92121-1714

   Phone: +1 858 651 4478
   EMail: presnick@qti.qualcomm.com

   Chris Newman (editor)

   EMail: chris.newman@oracle.com

   No.4 South 4th Zhongguancun Street

   Phone: +86 10-58813038
   EMail: shenshuo@cnnic.cn
 