Network Working Group                                         J. Degener
Request for Comments: 5173                                   P. Guenther
Updates: 5229                                             Sendmail, Inc.
Category: Standards Track                                     April 2008



                 Sieve Email Filtering: Body Extension

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   This document defines a new command for the "Sieve" email filtering
   language that tests for the occurrence of one or more strings in the
   body of an email message.

Degener & Guenther          Standards Track                     [Page 1]

RFC 5173           Sieve Email Filtering: Body Extension      April 2008

1. Introduction

   The "body" test checks for the occurrence of one or more strings in
   the body of an email message. Such a test was initially discussed
   for the [SIEVE] base document, but was subsequently removed because
   it was thought to be too costly to implement.

   Nevertheless, several server vendors have implemented some form of
   the "body" test.

   This document reintroduces the "body" test as an extension, and
   specifies its syntax and semantics.

2. Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

   Conventions for notations are as in [SIEVE] Section 1.1, including
   the use of the "Usage:" label for the definition of text and tagged
   argument syntax.

   The rules for interpreting the grammar are defined in [SIEVE] and
   inherited by this specification. In particular, readers of this
   document are reminded that according to [SIEVE] Sections 2.6.2 and
   2.6.3, optional arguments such as COMPARATOR and MATCH-TYPE can
   appear in any order.

3. Capability Identifier

   The capability string associated with the extension defined in this
   document is "body".

4. Test body

   Usage:   "body" [COMPARATOR] [MATCH-TYPE] [BODY-TRANSFORM]
                <key-list: string-list>

   The body test matches content in the body of an email message, that
   is, anything following the first empty line after the header. (The
   empty line itself, if present, is not considered to be part of the
   body.)

   The COMPARATOR and MATCH-TYPE keyword parameters are defined in
   [SIEVE]. As specified in Sections 2.7.1 and 2.7.3 of [SIEVE], the
   default COMPARATOR is "i;ascii-casemap" and the default MATCH-TYPE is
   ":is".
   The BODY-TRANSFORM is a keyword parameter that governs how the set of
   strings to be matched against is extracted from the body of the
   message. If a message consists of a header only, not followed by an
   empty line, then that set is empty and all "body" tests return false,
   including those that test for an empty string. (This is similar to
   how the "header" test always fails when the named header fields
   aren't present.) Otherwise, the transform must be followed as
   defined below in Section 5.
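
   The following Python sketch illustrates the two rules above: it
   locates the body and shows why a header-only message makes every
   "body" test false. It is an illustration, not part of this
   specification. The names extract_body and body_contains are
   hypothetical. Accepting a bare LF as a line ending is an assumption
   made for locally stored messages, and plain octet containment stands
   in for a real comparator.

```python
import re
from typing import List, Optional

# An empty line is a line ending directly followed by another one, or a
# line ending at the very start of the message (no header fields).
# Assumption: a bare LF is accepted as well as CRLF.
_EMPTY_LINE = re.compile(rb"(?:\A|\r?\n)\r?\n")


def extract_body(message: bytes) -> Optional[bytes]:
    """Return the body, or None when the message is a header only.

    The body is everything after the first empty line that follows the
    header; the empty line itself is excluded. None (no body at all) is
    distinct from b"" (an empty body).
    """
    match = _EMPTY_LINE.search(message)
    if match is None:
        return None
    return message[match.end():]


def body_contains(message: bytes, keys: List[bytes]) -> bool:
    """Octet-wise ':contains' over the whole body (comparator omitted)."""
    body = extract_body(message)
    if body is None:
        # Header only: the set of strings is empty, so every key fails,
        # including the empty string.
        return False
    return any(key in body for key in keys)
```

   Returning None rather than b"" for a header-only message keeps the
   "no body" case distinct from the "empty body" case. An empty key can
   only succeed in the second case.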

   Note that the transformations defined here do *not* match against
   each line of the message independently, so the strings will usually
   contain CRLFs. How these can be matched is governed by the
   comparator and match-type. For example, with the default comparator
   of "i;ascii-casemap", they can be included literally in the key
   strings, or be matched with the "*" or "?" wildcards of the :matches
   match-type, or be skipped with :contains.
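
   As a hedged illustration of the three options above, the Python
   sketch below applies an "i;ascii-casemap"-style comparison under each
   match-type to a string that contains CRLFs. It assumes the key and
   the extracted string are already decoded to text. The helper names
   are hypothetical. Only ASCII letters are case-folded, and the "*" and
   "?" wildcards are allowed to match CR and LF.

```python
import re

# "i;ascii-casemap" folds only the ASCII letters a-z; everything else,
# including CR and LF, is compared unchanged.
_ASCII_UPPER = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                             "ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def casemap(s: str) -> str:
    return s.translate(_ASCII_UPPER)


def wildcard_to_regex(pattern: str) -> str:
    """Translate a :matches pattern: "*" is any sequence, "?" is one
    character, and a backslash makes the next character literal."""
    out = []
    chars = iter(pattern)
    for c in chars:
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "\\":
            out.append(re.escape(next(chars, "")))
        else:
            out.append(re.escape(c))
    return "".join(out)


def compare(value: str, key: str, match_type: str = ":is") -> bool:
    value, key = casemap(value), casemap(key)
    if match_type == ":is":
        return value == key
    if match_type == ":contains":
        return key in value
    if match_type == ":matches":
        # DOTALL lets "*" and "?" match CR and LF, so a pattern can span
        # several lines of the body.
        regex = wildcard_to_regex(key)
        return re.fullmatch(regex, value, re.DOTALL) is not None
    raise ValueError("unsupported match type: " + match_type)
```

   Note that a single "?" cannot stand in for a CRLF, because CRLF is
   two characters. Two "?" wildcards, or one "*", are needed.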

5. Body Transform

   Prior to matching content in a message body, "transformations" can be
   applied that filter and decode certain parts of the body. These
   transformations are selected by a "BODY-TRANSFORM" keyword parameter.

   Usage:   ":raw"
          / ":content" <content-types: string-list>
          / ":text"

   The default transformation is :text.
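
   To make the defaults concrete, here is a hypothetical Python sketch
   that parses the arguments of a "body" test. It accepts the optional
   arguments in any order and fills in the defaults: "i;ascii-casemap",
   ":is" and ":text". The pre-tokenized input format is an assumption
   for illustration: tags are strings, string-lists are Python lists,
   and the key-list comes last. A real Sieve parser works from the
   grammar in [SIEVE].

```python
from dataclasses import dataclass, field
from typing import List

MATCH_TYPES = {":is", ":contains", ":matches"}


@dataclass
class BodyTest:
    comparator: str = "i;ascii-casemap"   # default, [SIEVE] Section 2.7.3
    match_type: str = ":is"               # default, [SIEVE] Section 2.7.1
    transform: str = ":text"              # default BODY-TRANSFORM
    content_types: List[str] = field(default_factory=list)
    keys: List[str] = field(default_factory=list)


def _string_list(value) -> List[str]:
    # A single string is accepted wherever a string-list is expected.
    return [value] if isinstance(value, str) else list(value)


def parse_body_args(args: list) -> BodyTest:
    """Parse the arguments that follow "body"; optional tags in any order."""
    if not args:
        raise ValueError("body: missing key-list")
    *optional, key_list = args
    test, seen, i = BodyTest(), set(), 0
    while i < len(optional):
        tag = optional[i]
        if not isinstance(tag, str) or not tag.startswith(":"):
            raise ValueError("body: unexpected argument %r" % (tag,))
        if tag in (":comparator", ":content"):
            if i + 1 >= len(optional):
                raise ValueError("body: %s needs a value" % tag)
            value = optional[i + 1]
            i += 2
        else:
            value = None
            i += 1
        if tag == ":comparator":
            kind, test.comparator = "COMPARATOR", value
        elif tag in MATCH_TYPES:
            kind, test.match_type = "MATCH-TYPE", tag
        elif tag in (":raw", ":text", ":content"):
            kind, test.transform = "BODY-TRANSFORM", tag
            if tag == ":content":
                test.content_types = _string_list(value)
        else:
            raise ValueError("body: unknown tag " + tag)
        # Each kind of optional argument may appear at most once.
        if kind in seen:
            raise ValueError("body: duplicate " + kind)
        seen.add(kind)
    test.keys = _string_list(key_list)
    return test
```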

5.1. Body Transform ":raw"

   The ":raw" transform matches against the entire undecoded body of a
   message as a single item.

   If the specified body-transform is ":raw", the [MIME] structure of
   the body is irrelevant. The implementation MUST NOT remove any
   transfer encoding from the message, MUST NOT refuse to filter
   messages with syntactic errors (unless the environment it is part of
   rejects them outright), and MUST treat multipart boundaries or the
   MIME headers of enclosed body parts as part of the content being
   matched against, instead of MIME structures to interpret.
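
   A minimal Python sketch of the ":raw" transform follows, assuming the
   body has already been separated from the outermost header. The names
   raw_transform, raw_contains and example_body are hypothetical.
   bytes.upper() is used as a rough stand-in for "i;ascii-casemap". The
   example shows that a base64-encoded phrase is not found under :raw,
   while boundaries and nested MIME headers are matched as ordinary
   text.

```python
import base64
from typing import List


def raw_transform(body: bytes) -> List[bytes]:
    # The whole undecoded body is one item: transfer encodings, multipart
    # boundaries and the MIME headers of nested parts are all left in place.
    return [body]


def raw_contains(body: bytes, key: bytes) -> bool:
    # bytes.upper() changes only ASCII a-z, which approximates the
    # default "i;ascii-casemap" comparator for this illustration.
    return any(key.upper() in item.upper() for item in raw_transform(body))


# Hypothetical body whose only text part is base64-encoded.
example_body = (
    b"--b\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(b"MAKE MONEY FAST") + b"\r\n"
    b"--b--\r\n"
)
```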
   Example:

      require "body";

      # This will match a message containing the literal text
      # "MAKE MONEY FAST" in body parts (ignoring any
      # content-transfer-encodings) or MIME headers other than
      # the outermost RFC 2822 header.

      if body :raw :contains "MAKE MONEY FAST" {
              discard;
      }

5.2. Body Transform ":content"

   If the body transform is ":content", the MIME parts that have the
   specified content types are matched against independently.

   If an individual content type begins or ends with a '/' (slash) or
   contains multiple slashes, then it matches no content types.
   Otherwise, if it contains a slash, then it specifies a full
   <type>/<subtype> pair, and matches only that specific content type.
   If it is the empty string, all MIME content types are matched.
   Otherwise, it specifies a <type> only, and any subtype of that type
   matches it.

   The search for MIME parts matching the :content specification is
   recursive and automatically descends into multipart and
   message/rfc822 MIME parts. All MIME parts with matching types are
   searched for the key strings. The test returns true if any
   combination of a searched MIME part and key-list argument matches.

   If the :content specification matches a multipart MIME part, only the
   prologue and epilogue sections of the part will be searched for the
   key strings, treating the entire prologue and the entire epilogue as
   separate strings; the contents of nested parts are only searched if
   their respective types match the :content specification.

   If the :content specification matches a message/rfc822 MIME part,
   only the header of the nested message will be searched for the key
   strings, treating the header as a single string; the contents of the
   nested message body parts are only searched if their content type
   matches the :content specification.

   For other MIME types, the entire part will be searched as a single
   string.
   (Matches against container types with an empty match string can be
   useful as tests for the existence of such parts.)

   Example:

      From: Whomever
      To: Someone
      Date: Whenever
      Subject: whatever
      Content-Type: multipart/mixed; boundary=outer

      & This is a multi-part message in MIME format.
      &
      --outer
      Content-Type: multipart/alternative; boundary=inner

      & This is a nested multi-part message in MIME format.
      &
      --inner
      Content-Type: text/plain; charset="us-ascii"

      $ Hello
      $
      --inner
      Content-Type: text/html; charset="us-ascii"

      % <html><body>Hello</body></html>
      %
      --inner--
      &
      & This is the end of the inner MIME multipart.
      &
      --outer
      Content-Type: message/rfc822

      ! From: Someone Else
      ! Subject: hello request

      $ Please say Hello
      $
      --outer--
      &
      & This is the end of the outer MIME multipart.
   In the above example, the '&', '$', '%', and '!' characters at the
   start of a line are used to illustrate what portions of the example
   message are used in tests:

   - the lines starting with '&' are the ones that are tested when a
     'body :content "multipart" :contains "MIME"' test is executed.

   - the lines starting with '$' are the ones that are tested when a
     'body :content "text/plain" :contains "Hello"' test is executed.

   - the lines starting with '%' are the ones that are tested when a
     'body :content "text/html" :contains "Hello"' test is executed.

   - the lines starting with '$' or '%' are the ones that are tested
     when a 'body :content "text" :contains "Hello"' test is executed.

   - the lines starting with '!' are the ones that are tested when a
     'body :content "message/rfc822" :contains "Hello"' test is
     executed.

   Comparisons are performed on octets. Implementations decode the
   content-transfer-encoding and convert text to [UTF-8] as input to the
   comparator. MIME parts that cannot be decoded and converted MAY be
   treated as plain US-ASCII, omitted, or processed according to local
   conventions. A NUL octet (character zero) SHOULD NOT cause early
   termination of the content being compared against. Implementations
   MUST support the "quoted-printable", "base64", "7bit", "8bit", and
   "binary" content transfer encodings. Implementations MUST be capable
   of converting to UTF-8 the US-ASCII, ISO-8859-1, and the US-ASCII
   subset of ISO-8859-* character sets.
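
   The decoding requirements above can be sketched in Python with the
   standard base64 and quopri modules and the built-in codecs. This is
   not a normative algorithm. The function names are hypothetical, and
   the fallback for undecodable parts (plain US-ASCII with U+FFFD
   replacement) is just one of the options the paragraph permits.

```python
import base64
import quopri


def decode_transfer(data: bytes, cte: str) -> bytes:
    """Undo one of the five content-transfer-encodings that MUST be supported."""
    cte = cte.strip().lower()
    if cte == "base64":
        # Without validate=True, CRLFs and other non-alphabet bytes are skipped.
        return base64.b64decode(data)
    if cte == "quoted-printable":
        return quopri.decodestring(data)
    if cte in ("7bit", "8bit", "binary"):
        return data  # identity encodings: nothing to undo
    raise ValueError("unsupported content-transfer-encoding: " + cte)


def to_utf8(data: bytes, charset: str = "us-ascii") -> bytes:
    """Convert decoded text to UTF-8 octets for the comparator."""
    try:
        return data.decode(charset).encode("utf-8")
    except (LookupError, UnicodeDecodeError):
        # Fallback chosen here: treat the part as plain US-ASCII and replace
        # anything else with U+FFFD (omitting the part would also be allowed).
        return data.decode("ascii", errors="replace").encode("utf-8")


def part_octets(data: bytes, cte: str, charset: str) -> bytes:
    # NUL octets survive both steps, so they never end the string early.
    return to_utf8(decode_transfer(data, cte), charset)
```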

   Each matched part is matched against independently: search
   expressions MUST NOT match across MIME part boundaries. MIME headers
   of the containing part MUST NOT be included in the data.
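
   The following short Python sketch shows the difference between
   matching each part independently and matching a concatenation of the
   parts. It is illustrative only. The names contains, body_test and
   example_parts are hypothetical, and the comparator is approximated by
   folding ASCII letters only.

```python
from typing import Callable, List

# Fold ASCII a-z only, as "i;ascii-casemap" does.
_UPPER = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                       "ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def contains(value: str, key: str) -> bool:
    return key.translate(_UPPER) in value.translate(_UPPER)


def body_test(parts: List[str], keys: List[str],
              match: Callable[[str, str], bool]) -> bool:
    """True if any (part, key) pair matches; parts are never concatenated."""
    return any(match(part, key) for part in parts for key in keys)


# Two MIME parts whose text happens to abut.
example_parts = ["Please MAKE MONEY", " FAST today"]
```

   Joining example_parts before matching would find "money fast". That
   is exactly the cross-boundary match the requirement forbids.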
   Example:

      require ["body", "fileinto"];

      # Save any message with any text MIME part that contains the
      # words "missile" or "coordinates" in the "secrets" folder.

      if body :content "text" :contains ["missile", "coordinates"] {
              fileinto "secrets";
      }

      # Save any message with an audio/mp3 MIME part in
      # the "jukebox" folder.

      if body :content "audio/mp3" :contains "" {
              fileinto "jukebox";
      }

5.3. Body Transform ":text"

   The ":text" body transform matches against the results of an
   implementation's best effort at extracting UTF-8 encoded text from a
   message.

   It is unspecified whether this transformation results in a single
   string or multiple strings being matched against. All the text
   extracted from a given non-container MIME part MUST be in the same
   string.

   In simple implementations, :text MAY be treated the same as :content
   "text".

   Sophisticated implementations MAY strip mark-up from the text prior
   to matching, and MAY convert media types other than text to text
   prior to matching.

   (For example, they may be able to convert proprietary text editor
   formats to text or apply optical character recognition algorithms to
   image data.)
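
   As one possible form of the optional mark-up stripping, the Python
   sketch below extracts the character data of a text/html part with the
   standard html.parser module. It returns that data as a single string,
   as required for a non-container part. It is a best-effort sketch, not
   a required algorithm. The names _TextExtractor and strip_markup are
   hypothetical, and skipping <script> and <style> contents is a design
   choice of this sketch.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; etc. become text
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def strip_markup(html: str) -> str:
    """Best-effort text of one text/html part, returned as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)
```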

   Example:

      require ["body", "fileinto"];

      # Save messages mentioning the project schedule in the
      # project/schedule folder.
      if body :text :contains "project schedule" {
              fileinto "project/schedule";
      }
6. Interaction with Other Sieve Extensions

   Any extension that extends the grammar for the COMPARATOR or
   MATCH-TYPE nonterminals will also affect the implementation of
   "body".

   Wildcard expressions used with "body" are exempt from the side
   effects described in [VARIABLES]. That is, they MUST NOT set match
   variables (${1}, ${2}...) to the input values corresponding to
   wildcard sequences in the matched pattern. However, if the
   "variables" extension is present, variable references in the key
   strings or content type strings are expanded as described in
   [VARIABLES], and the results are then used as described in this
   document.
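
   The following hedged Python sketch illustrates the two halves of this
   rule: variable references in key strings are expanded, but the test
   never writes match variables back. Several points are assumptions of
   the sketch rather than statements of [VARIABLES]: variable names are
   looked up in lower case, an unset variable expands to the empty
   string, and namespaced references are ignored. The function names are
   hypothetical, and str.lower() stands in for the comparator.

```python
import re
from typing import Dict, List

# identifier = (ALPHA / "_") *(ALPHA / DIGIT / "_"); num-variable = 1*DIGIT
_VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*|[0-9]+)\}")


def expand(text: str, variables: Dict[str, str]) -> str:
    """Replace ${name} references; names are looked up in lower case and an
    unset variable expands to the empty string (assumptions of this sketch)."""
    return _VAR_REF.sub(lambda m: variables.get(m.group(1).lower(), ""), text)


def body_contains(parts: List[str], keys: List[str],
                  variables: Dict[str, str]) -> bool:
    """':contains' body test whose key strings may reference variables.

    Nothing is written back to `variables`: unlike a "header :matches"
    test, "body" leaves ${1}, ${2}, ... with the values they already had.
    """
    expanded = [expand(key, variables).lower() for key in keys]
    return any(key in part.lower() for part in parts for key in expanded)
```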

7. IANA Considerations

   The following template specifies the IANA registration of the Sieve
   extension specified in this document:

   To: iana@iana.org
   Subject: Registration of new Sieve extension

   Capability name: body
   Description:     Provides a test for matching against the
                    body of the message being processed
   RFC number:      RFC 5173
   Contact Address: The Sieve discussion list
                    <ietf-mta-filters@imc.org>

8. Security Considerations

   The system MUST be sized and restricted in such a manner that even
   malicious use of body matching does not deny service to other users
   of the host system.

   Filters relying on string matches in the raw body of an email message
   may be more general than intended. Text matches are no replacement
   for a spam, virus, or other security-related filtering system.

9. Acknowledgments

   This document has been revised in part based on comments and
   discussions that took place on and off the SIEVE mailing list.
   Thanks to Cyrus Daboo, Ned Freed, Bob Johannessen, Simon Josefsson,
   Mark E. Mallett, Chris Markle, Alexey Melnikov, Ken Murchison, Greg
   Shapiro, Tim Showalter, Nigel Swinson, Dowson Tong, and Christian
   Vogt for reviews and suggestions.
10. References

10.1. Normative References

   [KEYWORDS]   Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MIME]       Freed, N. and N. Borenstein, "Multipurpose Internet Mail
                Extensions (MIME) Part One: Format of Internet Message
                Bodies", RFC 2045, November 1996.

   [SIEVE]      Guenther, P., Ed., and T. Showalter, Ed., "Sieve: An
                Email Filtering Language", RFC 5228, January 2008.

   [UTF-8]      Yergeau, F., "UTF-8, a transformation format of ISO
                10646", STD 63, RFC 3629, November 2003.

10.2. Informative References

   [VARIABLES]  Homme, K., "Sieve Email Filtering: Variables Extension",
                RFC 5229, January 2008.

Authors' Addresses

   Jutta Degener
   5245 College Ave, Suite #127
   Oakland, CA 94618

   EMail: jutta@pobox.com


   Philip Guenther
   Sendmail, Inc.
   6425 Christie Ave, 4th Floor
   Emeryville, CA 94608

   EMail: guenther@sendmail.com
Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.