Network Working Group                                       D. Goldsmith
Request for Comments: 2152                          Apple Computer, Inc.
Obsoletes: RFC 1642                                             M. Davis
Category: Informational                                   Taligent, Inc.
                                                                May 1997


                                 UTF-7

              A Mail-Safe Transformation Format of Unicode

Status of this Memo

   This memo provides information for the Internet community. This memo
   does not specify an Internet standard of any kind. Distribution of
   this memo is unlimited.

Abstract

   The Unicode Standard, version 2.0, and ISO/IEC 10646-1:1993(E) (as
   amended) jointly define a character set (hereafter referred to as
   Unicode) which encompasses most of the world's writing systems.
   However, Internet mail (STD 11, RFC 822) currently supports only 7-
   bit US ASCII as a character set. MIME (RFC 2045 through 2049) extends
   Internet mail to support different media types and character sets,
   and thus could support Unicode in mail messages. MIME neither defines
   Unicode as a permitted character set nor specifies how it would be
   encoded, although it does provide for the registration of additional
   character sets over time.

   This document describes a transformation format of Unicode that
   contains only 7-bit ASCII octets and is intended to be readable by
   humans in the limiting case that the document consists of characters
   from the US-ASCII repertoire. It also specifies how this
   transformation format is used in the context of MIME and RFC 1641,
   "Using Unicode with MIME".

Motivation

   Although other transformation formats of Unicode exist and could
   conceivably be used in this context (most notably UTF-8, also known
   as UTF-2 or UTF-FSS), they suffer the disadvantage that they use
   octets in the range decimal 128 through 255 to encode Unicode
   characters outside the US-ASCII range. Thus, in the context of mail,
   those octets must themselves be encoded. This requires putting text
   through two successive encoding processes, and leads to a significant
   expansion of characters outside the US-ASCII range, putting non-
   English speakers at a disadvantage. For example, using UTF-8 together



Goldsmith & Davis            Informational                      [Page 1]

RFC 2152                          UTF-7                         May 1997


   with the Quoted-Printable content transfer encoding of MIME
   represents US-ASCII characters in one octet, but other characters may
   require up to nine octets.

Overview

   UTF-7 encodes Unicode characters as US-ASCII octets, together with
   shift sequences to encode characters outside that range. For this
   purpose, one of the characters in the US-ASCII repertoire is reserved
   for use as a shift character.

   Many mail gateways and systems cannot handle the entire US-ASCII
   character set (those based on EBCDIC, for example), and so UTF-7
   contains provisions for encoding characters within US-ASCII in a way
   that all mail systems can accommodate.

   UTF-7 should normally be used only in the context of 7 bit
   transports, such as mail. In other contexts, straight Unicode or
   UTF-8 is preferred.

   See RFC 1641, "Using Unicode with MIME", for the overall specification
   on usage of Unicode transformation formats with MIME.

Definitions

   First, the definition of Unicode:

      The 16 bit character set Unicode is defined by "The Unicode
      Standard, Version 2.0". This character set is identical with the
      character repertoire and coding of the international standard
      ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
      Subset=300; Implementation Level=3, including the first 7
      amendments to 10646 plus editorial corrections.

      Note. Unicode 2.0 further specifies the use and interaction of
      these character codes beyond the ISO standard. However, any valid
      10646 sequence is a valid Unicode sequence, and vice versa;
      Unicode supplies interpretations of sequences on which the ISO
      standard is silent as to interpretation.

   Next, some handy definitions of US-ASCII character subsets:

      Set D (directly encoded characters) consists of the following
      characters (derived from RFC 1521, Appendix B, which no longer
      appears in RFC 2045): the upper and lower case letters A through Z
      and a through z, the 10 digits 0-9, and the following nine special
      characters (note that "+" and "=" are omitted):

         Character   ASCII & Unicode Value (decimal)
             '            39
             (            40
             )            41
             ,            44
             -            45
             .            46
             /            47
             :            58
             ?            63

      Set O (optional direct characters) consists of the following
      characters (note that "\" and "~" are omitted):

         Character   ASCII & Unicode Value (decimal)
             !            33
             "            34
             #            35
             $            36
             %            37
             &            38
             *            42
             ;            59
             <            60
             =            61
             >            62
             @            64
             [            91
             ]            93
             ^            94
             _            95
             `            96
             {           123
             |           124
             }           125

      Rationale. The characters "\" and "~" are omitted because they are
      often redefined in variants of ASCII.

      Set B (Modified Base 64) is the set of characters in the Base64
      alphabet defined in RFC 2045, excluding the pad character "="
      (decimal value 61).

   Rationale. The pad character "=" is excluded because UTF-7 is designed
   for use within header fields as set forth in RFC 2047. Since the only
   readable encoding in RFC 2047 is "Q" (based on RFC 2045's Quoted-
   Printable), the "=" character is not available for use (without a lot
   of escape sequences). This was very unfortunate but unavoidable. The
   "=" character could otherwise have been used as the UTF-7 escape
   character as well (rather than using "+").

   Note that all characters in US-ASCII have the same value in Unicode
   when zero-extended to 16 bits.
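
   The three subsets above can be written down directly as data. The
   Python sketch below is illustrative only. The names SET_D, SET_O,
   SET_B and classify() are hypothetical and are not defined by this
   memo.

```python
import string

# Set D: always written directly (letters, digits, nine specials).
SET_D = frozenset(string.ascii_letters + string.digits + "'(),-./:?")

# Set O: optionally written directly; "\" and "~" are deliberately absent.
SET_O = frozenset('!"#$%&*;<=>@[]^_`{|}')

# Set B: the RFC 2045 Base64 alphabet minus the pad character "=".
SET_B = frozenset(string.ascii_letters + string.digits + "+/")


def classify(ch):
    """Return "D", "O", or "other" for a single character."""
    if ch in SET_D:
        return "D"
    if ch in SET_O:
        return "O"
    return "other"  # needs a shifted sequence, "+-", or Rule 3


for ch in "a?!~+":
    print(repr(ch), classify(ch))
```

   Note that "+" belongs to set B but to neither D nor O. For that
   reason it needs the special "+-" form.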

UTF-7 Definition

   A UTF-7 stream represents 16-bit Unicode characters using 7-bit US-
   ASCII octets as follows:

      Rule 1: (direct encoding) Unicode characters in set D above may be
      encoded directly as their ASCII equivalents. Unicode characters in
      set O may optionally be encoded directly as their ASCII
      equivalents, bearing in mind that many of these characters are
      illegal in header fields, or may not pass correctly through some
      mail gateways.

      Rule 2: (Unicode shifted encoding) Any Unicode character sequence
      may be encoded using a sequence of characters in set B, when
      preceded by the shift character "+" (US-ASCII character value
      decimal 43). The "+" signals that subsequent octets are to be
      interpreted as elements of the Modified Base64 alphabet until a
      character not in that alphabet is encountered. Such characters
      include control characters such as carriage returns and line
      feeds; thus, a Unicode shifted sequence always terminates at the
      end of a line. As a special case, if the sequence terminates with
      the character "-" (US-ASCII decimal 45) then that character is
      absorbed; other terminating characters are not absorbed and are
      processed normally.

      Note that if the first character after the shifted sequence is
      "-" then an extra "-" must be present to terminate the shifted
      sequence so that the actual "-" is not itself absorbed.

      Rationale. A terminating character is necessary for cases where
      the next character after the Modified Base64 sequence is part of
      character set B or is itself the terminating character. It can
      also enhance readability by delimiting encoded sequences.
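
   The terminator rule can be expressed as a small decision function.
   This sketch uses our own naming (close_shift is hypothetical). It
   also assumes the encoder always writes the optional "-" at the end
   of the text, as the examples in this memo do.

```python
import string

# Set B: the Base64 alphabet without "=".
SET_B = frozenset(string.ascii_letters + string.digits + "+/")


def close_shift(next_char):
    """Return the text to emit when a shifted sequence ends.

    next_char is the character that follows the shifted sequence, or
    None at the end of the text.
    """
    if next_char is None:
        return "-"  # optional here; written for readability
    if next_char in SET_B or next_char == "-":
        return "-"  # required, or the decoder would misread next_char
    return ""       # any other character ends the sequence by itself


print(repr(close_shift("-")), repr(close_shift("1")), repr(close_shift(".")))
```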

      Also as a special case, the sequence "+-" may be used to encode
      the character "+". A "+" character followed immediately by any
      character other than members of set B or "-" is an ill-formed
      sequence.

      Unicode is encoded using Modified Base64 by first converting
      Unicode 16-bit quantities to an octet stream (with the most
      significant octet first). Surrogate pairs (UTF-16) are converted
      by treating each half of the pair as a separate 16 bit quantity
      (i.e., no special treatment). Text with an odd number of octets is
      ill-formed. ISO 10646 characters outside the range addressable via
      surrogate pairs cannot be encoded.
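
   In Python, this big-endian serialization matches the standard
   "utf-16-be" codec. That codec also writes each surrogate half as an
   ordinary 16-bit unit. The sketch below assumes that codec. The helper
   names to_octets and from_octets are hypothetical.

```python
def to_octets(text):
    """Serialize text as 16-bit units, most significant octet first.

    Characters above U+FFFF become a surrogate pair, and each half is
    written like any other 16-bit quantity.
    """
    return text.encode("utf-16-be")


def from_octets(octets):
    """Inverse of to_octets; an odd number of octets is ill-formed."""
    if len(octets) % 2:
        raise ValueError("odd number of octets")
    return octets.decode("utf-16-be")


print(to_octets("A\u2262").hex())      # 00412262
print(to_octets("\U0001D11E").hex())   # d834dd1e (a surrogate pair)
```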

      Rationale. ISO/IEC 10646-1:1993(E) specifies that when characters
      in the UCS-2 form are serialized as octets, the most significant
      octet appear first. This is also in keeping with common network
      practice of choosing a canonical format for transmission.

      Rationale. The policy for code point allocation within ISO 10646
      and Unicode is that the repertoires be kept synchronized. No code
      points will be allocated in ISO 10646 outside the range
      addressable by surrogate pairs.

      Next, the octet stream is encoded by applying the Base64 content
      transfer encoding algorithm as defined in RFC 2045, modified to
      omit the "=" pad character. Instead, when encoding, zero bits are
      added to pad to a Base64 character boundary. When decoding, any
      bits at the end of the Modified Base64 sequence that do not
      constitute a complete 16-bit Unicode character are discarded. If
      such discarded bits are non-zero the sequence is ill-formed.
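
   The Modified Base64 step can be sketched as follows. The encoder
   relies on Python's base64.b64encode, which already fills the final
   sextet with zero bits, so stripping the "=" characters leaves exactly
   the form described above. The decoder is written by hand so that it
   keeps only complete 16-bit units and rejects non-zero leftover bits.
   The function names are hypothetical.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            "0123456789+/")
VALUE = {c: i for i, c in enumerate(ALPHABET)}


def mb64_encode(octets):
    """Base64 without "=": the last sextet is already zero-padded."""
    return base64.b64encode(octets).decode("ascii").rstrip("=")


def mb64_decode(chars):
    """Decode Modified Base64, keeping only complete 16-bit units."""
    buf, nbits, out = 0, 0, bytearray()
    for c in chars:
        buf = (buf << 6) | VALUE[c]   # KeyError for characters outside set B
        nbits += 6
        if nbits >= 16:               # a whole 16-bit unit is available
            nbits -= 16
            out += ((buf >> nbits) & 0xFFFF).to_bytes(2, "big")
            buf &= (1 << nbits) - 1   # keep only the unconsumed bits
    if buf:                           # leftover (< 16) bits must be zero
        raise ValueError("non-zero discarded bits: ill-formed")
    return bytes(out)


print(mb64_encode(bytes.fromhex("22620391")))   # ImIDkQ
print(mb64_decode("ImIDkQ").hex())              # 22620391
```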

      Rationale. The pad character "=" is not used when encoding
      Modified Base64 because of the conflict with its use as an escape
      character for the Q content transfer encoding in RFC 2047 header
      fields, as mentioned above.

      Rule 3: The space (decimal 32), tab (decimal 9), carriage return
      (decimal 13), and line feed (decimal 10) characters may be
      directly represented by their ASCII equivalents. However, note
      that MIME content transfer encodings have rules concerning the use
      of such characters. Usage that does not conform to the
      restrictions of RFC 822, for example, would have to be encoded
      using MIME content transfer encodings other than 7bit or 8bit,
      such as quoted-printable, binary, or base64.

   Given this set of rules, Unicode characters which may be encoded via
   rules 1 or 3 take one octet per character, and other Unicode
   characters are encoded on average with 2 2/3 octets per character
   plus one octet to switch into Modified Base64 and an optional octet
   to switch out.
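
   Rules 1 through 3 can be combined into a short encoder. This is an
   illustrative sketch, not a complete codec. The name utf7_encode and
   the use_set_o flag are our own. Each shifted run ends just before the
   next directly encodable character or "+", and the encoder writes the
   optional "-" at the end of the text.

```python
import base64
import string

SET_D = frozenset(string.ascii_letters + string.digits + "'(),-./:?")
SET_O = frozenset('!"#$%&*;<=>@[]^_`{|}')
RULE_3 = frozenset(" \t\r\n")  # space, tab, CR, LF
SET_B = frozenset(string.ascii_letters + string.digits + "+/")


def utf7_encode(text, use_set_o=False):
    """Encode text as UTF-7 using Rules 1-3 (illustrative sketch)."""
    direct = SET_D | RULE_3 | (SET_O if use_set_o else frozenset())
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in direct:                 # Rules 1 and 3
            out.append(ch)
            i += 1
        elif ch == "+":                  # special case: "+" becomes "+-"
            out.append("+-")
            i += 1
        else:                            # Rule 2: shift into Modified Base64
            j = i
            while j < len(text) and text[j] not in direct and text[j] != "+":
                j += 1
            octets = text[i:j].encode("utf-16-be")  # big-endian 16-bit units
            out.append("+" + base64.b64encode(octets).decode("ascii").rstrip("="))
            nxt = text[j] if j < len(text) else None
            if nxt is None or nxt in SET_B or nxt == "-":
                out.append("-")          # required, or optional at the end
            i = j
    return "".join(out)


print(utf7_encode("Hi Mom -\u263a-!", use_set_o=True))   # Hi Mom -+Jjo--!
```

   When use_set_o is left at its default, "!" is also shifted and the
   same input becomes "Hi Mom -+Jjo--+ACE-". That output trades
   readability for safety through gateways.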

      Example. The Unicode sequence "A<NOT IDENTICAL TO><ALPHA>."
      (hexadecimal 0041,2262,0391,002E) may be encoded as follows:

            A+ImIDkQ.

      Example. The Unicode sequence "Hi Mom -<WHITE SMILING FACE>-!"
      (hexadecimal 0048, 0069, 0020, 004D, 006F, 006D, 0020, 002D, 263A,
      002D, 0021) may be encoded as follows:

            Hi Mom -+Jjo--!

      Example. The Unicode sequence representing the Han characters for
      the Japanese word "nihongo" (hexadecimal 65E5,672C,8A9E) may be
      encoded as follows:

            +ZeVnLIqe-
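
   Decoding reverses these steps. The sketch below (the name utf7_decode
   is hypothetical) raises ValueError for the ill-formed cases named in
   Rule 2 and for non-zero discarded bits. It recovers the three
   examples above.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE = {c: i for i, c in enumerate(ALPHABET)}


def utf7_decode(data):
    """Decode a UTF-7 string; raise ValueError if it is ill-formed."""
    out = []
    i = 0
    while i < len(data):
        if data[i] != "+":
            out.append(data[i])            # directly encoded character
            i += 1
            continue
        i += 1                             # consume the shift character
        if i < len(data) and data[i] == "-":
            out.append("+")                # "+-" encodes "+"
            i += 1
            continue
        if i == len(data) or data[i] not in VALUE:
            raise ValueError("'+' must be followed by set B or '-'")
        buf, nbits, octets = 0, 0, bytearray()
        while i < len(data) and data[i] in VALUE:
            buf = (buf << 6) | VALUE[data[i]]
            nbits += 6
            if nbits >= 16:
                nbits -= 16
                octets += ((buf >> nbits) & 0xFFFF).to_bytes(2, "big")
                buf &= (1 << nbits) - 1
            i += 1
        if buf:
            raise ValueError("non-zero discarded bits")
        # utf-16-be rejoins surrogate halves and rejects unpaired ones.
        out.append(octets.decode("utf-16-be"))
        if i < len(data) and data[i] == "-":
            i += 1                         # an explicit "-" is absorbed
    return "".join(out)


print(ascii(utf7_decode("Hi Mom -+Jjo--!")))
```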

Use of Character Set UTF-7 Within MIME

   Character set UTF-7 is safe for mail transmission and therefore may
   be used with any content transfer encoding in MIME (except where line
   length and line break restrictions are violated). Specifically, the 7
   bit encoding for bodies and the Q encoding for headers are both
   acceptable. The MIME character set tag is UTF-7. This signifies any
   version of Unicode equal to or greater than 2.0.

   Example. Here is a text portion of a MIME message containing the
   Unicode sequence "Hi Mom <WHITE SMILING FACE>!" (hexadecimal 0048,
   0069, 0020, 004D, 006F, 006D, 0020, 263A, 0021).

      Content-Type: text/plain; charset=UTF-7

      Hi Mom +Jjo-!

   Example. Here is a text portion of a MIME message containing the
   Unicode sequence representing the Han characters for the Japanese
   word "nihongo" (hexadecimal 65E5,672C,8A9E).

      Content-Type: text/plain; charset=UTF-7

      +ZeVnLIqe-

   Example. Here is a text portion of a MIME message containing the
   Unicode sequence "A<NOT IDENTICAL TO><ALPHA>." (hexadecimal
   0041,2262,0391,002E).

      Content-Type: text/plain; charset=utf-7

      A+ImIDkQ.

   Example. Here is a text portion of a MIME message containing the
   Unicode sequence "Item 3 is <POUND SIGN>1." (hexadecimal 0049,
   0074, 0065, 006D, 0020, 0033, 0020, 0069, 0073, 0020, 00A3, 0031,
   002E).

      Content-Type: text/plain; charset=UTF-7

      Item 3 is +AKM-1.

   Note that to achieve the best interoperability with systems that may
   not support Unicode or MIME, when preparing text for mail
   transmission, line breaks should follow Internet conventions. This
   means that lines should be short and terminated with the proper SMTP
   CRLF sequence. Unicode LINE SEPARATOR (hexadecimal 2028) and
   PARAGRAPH SEPARATOR (hexadecimal 2029) should be converted to SMTP
   line breaks. Ideally, this would be handled transparently by a
   Unicode-aware user agent.

   This preparation is not absolutely necessary, since UTF-7 and the
   appropriate MIME content transfer encoding can handle text that does
   not follow Internet conventions, but readability by systems without
   Unicode or MIME will be impaired. See RFC 2045 for a discussion of
   mail interoperability issues.

   Lines should never be broken in the middle of a UTF-7 shifted
   sequence, since such sequences may not cross line breaks. Therefore,
   UTF-7 encoding should take place after line breaking. If a line
   containing a shifted sequence is too long after encoding, a MIME
   content transfer encoding such as Quoted Printable can be used to
   encode the text. Another possibility is to perform line breaking and
   UTF-7 encoding at the same time, so that lines containing shifted
   sequences already conform to length restrictions.
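
   One way to follow this advice is to normalize line breaks first and
   then encode each line on its own. With this approach, no shifted
   sequence can straddle a line break. The sketch below uses the
   "utf-7" codec that ships with Python rather than the rules spelled
   out earlier. The function name prepare_for_mail and the 76-octet
   limit are assumptions made for illustration. Encoded lines that are
   still too long are reported, so that a content transfer encoding
   such as Quoted Printable can be applied to them.

```python
import re

# CRLF, bare CR, bare LF, LINE SEPARATOR and PARAGRAPH SEPARATOR all
# become a single SMTP line break.
LINE_BREAK = re.compile("\r\n|[\r\n\u2028\u2029]")


def prepare_for_mail(text, max_len=76):
    """Split text into lines, then UTF-7 encode each line separately.

    Returns the CRLF-joined body and the indexes of encoded lines that
    are still longer than max_len octets.
    """
    lines = LINE_BREAK.split(text)
    encoded = [line.encode("utf-7") for line in lines]
    too_long = [i for i, line in enumerate(encoded) if len(line) > max_len]
    return b"\r\n".join(encoded), too_long


body, too_long = prepare_for_mail("Hi Mom \u263a!\u2028\u65e5\u672c\u8a9e")
print(body.split(b"\r\n"), too_long)
```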

Discussion

   In this section we will motivate the introduction of UTF-7 as opposed
   to the alternative of using the existing transformation formats of
   Unicode (e.g., UTF-8) with MIME's content transfer encodings. Before
   discussing this, it will be useful to list some assumptions about
   character frequency within typical natural language text strings that
   we use to estimate typical storage requirements:

   1. Most Western European languages use roughly 7/8 of their letters
      from US-ASCII and 1/8 from Latin 1 (ISO-8859-1).

   2. Most non-Roman alphabet-based languages (e.g., Greek) use about
      1/6 of their letters from ASCII (since white space is in the 7-bit
      area) and the rest from their alphabets.

   3. East Asian ideographic-based languages (including Japanese) use
      essentially all of their characters from the Han or CJK syllabary
      area.

   4. Non-directly encoded punctuation characters do not occur
      frequently enough to affect the results.

   Notice that current 8 bit standards, such as ISO-8859-x, require use
   of a content transfer encoding. For comparison with the subsequent
   discussion, the costs break down as follows (note that many of these
   figures are approximate since they depend on the exact composition of
   the text):

      8859-x in Base64

         Text type              Average octets/character
         All                    1.33

      8859-x in Quoted Printable

         Text type              Average octets/character
         US-ASCII               1
         Western European       1.25
         Other                  2.67

   Note also that Unicode encoded in Base64 takes a constant 2.67 octets
   per character. For purposes of comparison, we will look at UTF-8 in
   Base64 and Quoted Printable, and UTF-7. Also note that fixed
   per-sequence overhead, such as the shift characters, contributes in
   proportion to 1/n, where n is the encoded string length in octets.

      UTF-8 in Base64

         Text type              Average octets/character
         US-ASCII               1.33
         Western European       1.5
         Some Alphabetics       2.44
         All others             4

      UTF-8 in Quoted Printable

         Text type              Average octets/character
         US-ASCII               1
         Western European       1.63
         Some Alphabetics       5.17
         All others             7-9

      UTF-7

         Text type              Average octets/character
         Most US-ASCII          1
         Western European       1.5
         All others             2.67+2/n
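
   Several figures in these tables follow from the character-frequency
   assumptions listed above. The sketch below recomputes them with exact
   fractions. It rests on a few assumptions, which the tables appear to
   share:

   - Latin-1 and Greek letters take 2 UTF-8 octets, and Han characters
     take 3.
   - Quoted Printable writes each non-ASCII octet as 3 octets ("=XX").
   - An isolated non-ASCII character in UTF-7 costs 5 octets: "+",
     three Base64 characters, and "-".

```python
from fractions import Fraction

# (ASCII share, non-ASCII share) from assumptions 1 and 2 above.
WESTERN = (Fraction(7, 8), Fraction(1, 8))
ALPHABETIC = (Fraction(1, 6), Fraction(5, 6))
BASE64 = Fraction(4, 3)  # Base64 turns every 3 octets into 4


def mix(shares, ascii_cost, other_cost):
    """Average octets per character for a given mix of character types."""
    return shares[0] * ascii_cost + shares[1] * other_cost


costs = {
    "UTF-8 in Base64, Western European": mix(WESTERN, 1, 2) * BASE64,
    "UTF-8 in Base64, some alphabetics": mix(ALPHABETIC, 1, 2) * BASE64,
    "UTF-8 in Base64, all others (CJK)": 3 * BASE64,
    "UTF-8 in QP, Western European": mix(WESTERN, 1, 6),
    "UTF-8 in QP, some alphabetics": mix(ALPHABETIC, 1, 6),
    "UTF-7, Western European": mix(WESTERN, 1, 5),
    "UTF-7, long non-ASCII runs": 2 * BASE64,
}
for name, cost in costs.items():
    print(f"{name:36} {float(cost):.3f}")
```

   Running it prints 1.500, 2.444, 4.000, 1.625, 5.167, 1.500 and 2.667.
   These values agree with the rounded table entries.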

   We feel that the UTF-8 in Quoted Printable option is not viable due
   to the very large expansion of all text except Western European. This
   would only be viable in texts consisting of large expanses of US-
   ASCII or Latin characters with occasional other characters
   interspersed. We would prefer to introduce one encoding that works
   reasonably well for all users.

   We also feel that UTF-8 in Base64 has high expansion for non-
   Western-European users, and is less desirable because it cannot be
   read directly, even when the content is largely US-ASCII. The base
   encoding of UTF-7 gives competitive results and is readable for ASCII
   text.

   UTF-7 gives results competitive with ISO-8859-x, with access to all
   of the Unicode character set. We believe this justifies the
   introduction of a new transformation format of Unicode.

   As an alternative to use of UTF-7, it might be possible to intermix
   Unicode characters with other character sets using an existing MIME
   mechanism, the multipart/mixed content type, ignoring for the moment
   the issues with line breaks (thanks to Nathaniel Borenstein for
   suggesting this). For instance (repeating an earlier example):

      Content-type: multipart/mixed; boundary=foo
      Content-Disposition: inline

      --foo
      Content-type: text/plain; charset=us-ascii

      Hi Mom
      --foo
      Content-type: text/plain; charset=UNICODE-2-0
      Content-transfer-encoding: base64

      Jjo=
      --foo
      Content-type: text/plain; charset=us-ascii

      !
      --foo--

   Theoretically, this removes the need for UTF-7 in message bodies
   (multipart may not be used in header fields). However, we feel that
   as use of the Unicode character set becomes more widespread,
   intermittent use of specialized Unicode characters (such as dingbats
   and mathematical symbols) will occur, and that text will also
   typically include small snippets from other scripts, such as
   Cyrillic, Greek, or East Asian languages (anything in the Roman
   script is already handled adequately by existing MIME character
   sets). Although the multipart technique works well for large chunks
   of text in alternating character sets, we feel it does not adequately
   support the kinds of uses just discussed, and so we still believe the
   introduction of UTF-7 is justified.

Summary

   The UTF-7 encoding allows Unicode characters to be encoded within the
   US-ASCII 7 bit character set. It is most effective for Unicode
   sequences which contain relatively long strings of US-ASCII
   characters interspersed with either single Unicode characters or
   strings of Unicode characters, as it allows the US-ASCII portions to
   be read on systems without direct Unicode support.

   UTF-7 should only be used with 7 bit transports such as mail. In
   other contexts, use of straight Unicode or UTF-8 is preferred.

Acknowledgements

   Many thanks to the following people for their contributions,
   comments, and suggestions. If we have omitted anyone it was through
   oversight and not intentionally.

      Glenn Adams
      Harald T. Alvestrand
      Nathaniel Borenstein
      Lee Collins
      Jim Conklin
      Dave Crocker
      Steve Dorner
      Dana S. Emery
      Ned Freed
      Kari E. Hurtta
      John H. Jenkins
      John C. Klensin
      Valdis Kletnieks
      Keith Moore
      Masataka Ohta
      Einar Stefferud
      Erik M. van der Poel

Appendix A -- Examples

   Here is a longer example, taken from a document originally in Big5
   code. It has been condensed for brevity. There are two versions: the
   first uses optional characters from set O (and so may not pass
   through some mail gateways), and the second does not.

   Content-type: text/plain; charset=utf-7

   Below is the full Chinese text of the Analects (+itaKng-).

   The sources for the text are:

   "The sayings of Confucius," James R. Ware, trans. +U/BTFw-:
   +ZYeB9FH6ckh5Pg-, 1980. (Chinese text with English translation)

   +Vttm+E6UfZM-, +W4tRQ066bOg-, +UxdOrA-: +Ti1XC2b4Xpc-, 1990.

   "The Chinese Classics with a Translation, Critical and Exegetical
   Notes, Prolegomena, and Copius Indexes," James Legge, trans., Taipei:
   Southern Materials Center Publishing, Inc., 1991. (Chinese text with
   English translation)

   Big Five and GB versions of the text are being made available
   separately.

   Neither the Big Five nor GB contain all the characters used in this
   text. Missing characters have been indicated using their Unicode/ISO
   10646 code points. "U+-" followed by four hexadecimal digits
   indicates a Unicode/10646 code (e.g., U+-9F08). There is no good
   solution to the problem of the small size of the Big Five/GB
   character sets; this represents the solution I find personally most
   satisfactory.

   (omitted...)

   I have tried to minimize this problem by using variant characters
   where they were available and the character actually in the text was
   not. Only variants listed as such in the +XrdxmVtXUXg- were used.

   (omitted...)

   John H. Jenkins +TpVPXGBG- jenkins@apple.com 5 January 1993
   (omitted...)

   Content-type: text/plain; charset=utf-7

   Below is the full Chinese text of the Analects (+itaKng-).

   The sources for the text are:

   +ACI-The sayings of Confucius,+ACI- James R. Ware, trans. +U/BTFw-:
   +ZYeB9FH6ckh5Pg-, 1980. (Chinese text with English translation)

   +Vttm+E6UfZM-, +W4tRQ066bOg-, +UxdOrA-: +Ti1XC2b4Xpc-, 1990.

   +ACI-The Chinese Classics with a Translation, Critical and Exegetical
   Notes, Prolegomena, and Copius Indexes,+ACI- James Legge, trans.,
   Taipei: Southern Materials Center Publishing, Inc., 1991. (Chinese
   text with English translation)

   Big Five and GB versions of the text are being made available
   separately.

   Neither the Big Five nor GB contain all the characters used in this
   text. Missing characters have been indicated using their Unicode/ISO
   10646 code points. +ACI-U+-+ACI- followed by four hexadecimal digits
   indicates a Unicode/10646 code (e.g., U+-9F08). There is no good
   solution to the problem of the small size of the Big Five/GB
   character sets+ADs- this represents the solution I find personally
   most satisfactory.

   (omitted...)

   I have tried to minimize this problem by using variant characters
   where they were available and the character actually in the text was
   not. Only variants listed as such in the +XrdxmVtXUXg- were used.
   (omitted...)

   John H. Jenkins +TpVPXGBG- jenkins+AEA-apple.com 5 January 1993
   (omitted...)
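
   Python's standard "utf-7" codec, which is not part of this memo,
   offers a quick way to check how the second version spells out the
   set O characters. Each fragment below is copied from that version
   and decodes to the character or Han text it stands for.

```python
# Fragments copied from the second version of the example above.
fragments = [b"+ACI-", b"+ADs-", b"+AEA-", b"+itaKng-", b"U+-9F08"]
for raw in fragments:
    print(raw.decode("ascii"), "->", ascii(raw.decode("utf-7")))
```

   The results are as follows:

   - "+ACI-", "+ADs-" and "+AEA-" decode to the quotation mark, the
     semicolon and the commercial at.
   - "+itaKng-" decodes to the two Han characters of the title.
   - "U+-9F08" decodes to the literal text "U+9F08".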

Security Considerations

   Security issues are not discussed in this memo.

References

[UNICODE 2.0] "The Unicode Standard, Version 2.0", The Unicode
              Consortium, Addison-Wesley, 1996. ISBN 0-201-48345-9.

[ISO 10646]   ISO/IEC 10646-1:1993(E) Information Technology--Universal
              Multiple-octet Coded Character Set (UCS). See also
              amendments 1 through 7, plus editorial corrections.

[RFC-1641]    Goldsmith, D., and M. Davis, "Using Unicode with MIME",
              RFC 1641, Taligent, Inc., July 1994.

[US-ASCII]    Coded Character Set--7-bit American Standard Code for
              Information Interchange, ANSI X3.4-1986.

[ISO-8859]    Information Processing -- 8-bit Single-Byte Coded Graphic
              Character Sets -- Part 1: Latin Alphabet No. 1, ISO
              8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2,
              1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988.
              Part 4: Latin alphabet No. 4, ISO 8859-4, 1988. Part 5:
              Latin/Cyrillic alphabet, ISO 8859-5, 1988. Part 6:
              Latin/Arabic alphabet, ISO 8859-6, 1987. Part 7:
              Latin/Greek alphabet, ISO 8859-7, 1987. Part 8:
              Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin
              alphabet No. 5, ISO 8859-9, 1990.

[RFC822]      Crocker, D., "Standard for the Format of ARPA Internet
              Text Messages", STD 11, RFC 822, UDEL, August 1982.

[MIME]        Borenstein, N., N. Freed, K. Moore, J. Klensin, and J.
              Postel, "MIME (Multipurpose Internet Mail Extensions)
              Parts One through Five", RFC 2045, 2046, 2047, 2048, and
              2049, November 1996.

Authors' Addresses

   David Goldsmith
   Apple Computer, Inc.
   2 Infinite Loop, MS: 302-2IS
   Cupertino, CA 95014

   Phone: 408-974-1957
   Fax: 408-862-4566
   EMail: goldsmith@apple.com

   Mark Davis
   Taligent, Inc.
   10201 N. DeAnza Blvd.
   Cupertino, CA 95014-2233

   Phone: 408-777-5116
   Fax: 408-777-5081
   EMail: mark_davis@taligent.com