Network Working Group Julian Onions INTERNET DRAFT Nexor Ltd March 27, 1995 How to be a Bad EMail Citizen 1. Status of this Memo This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts. Internet Drafts are valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. (The file 1id-abstracts.txt on nic.ddn.mil describes the current status of each Internet Draft.) It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "work in progress". This draft is known as draft-onions-822-mailproblems-01. 2. Abstract The internet consists of many hosts and many implementations of each protocol suite. There are no formal tests or approval mechanisms associated with membership of the internet, and therefore there are very varied levels of conformance to the various standards. This document intends to describe some of the common problems, mistakes and errors that are made in electronic mail. Most of them are easily avoidable, and some guidance on what to do in each case is given here. Some of these guidelines are pragmatic, some are mandated by other standards, and others are religious. 3. Introduction There are various documents around the internet that define the way mail should behave, what is mandatory, what is optional and what is forbidden. Adherence to these standards across implementations is at best patchy, and with no overseeing body the only enforcement to the standards are peer pressure and possible lack of service. This document started out as an attempt to summarise the various ways the fairly straight-forward mail protocol can be mis-implemented. This document tries to point out cases where common usage has extended the standard (unofficially) and where implementations are wrong. This is Onions Expires Sep 30, 1995 [Page 1] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 somewhat analagous to defining "correct" english. Is it what is defined, or is it like y'know what people kinda say? 4. Scope This document restricts itself to the standards defined in RFC-821 (SMTP), RFC-822, RFC-1123 (Host Requirements), RFC-1521 (MIME) and RFC-1651 (SMTP Extensions). Currently other documents are not considered. 5. Issues concerning SMTP 5.1. The RSET Command RFC-821 is not specific about exactly what the RSET verb resets. This has apparently not been a problem in the past because of the simplicity of the protocol. With the publication of extensions to the SMTP protocol with additional commands and state information, making a more precise definition desirable. The definition provided should not constrain any existing RFC-821 implementation since it is consistent with both the current practice and the only two plausible interpretations. RSET is to be interpreted by SMTP servers as clearing state information present in a session. In particular, it eliminates the effect of any prior FROM commands, any DATA, and any delivery addresses. It resets the server's state to "not a mail transaction". This implies it is in the state after the HELO and before the MAIL verb. RSET has been interpreted by some SMTP servers as requiring that a new HELO command be sent after RSET is acknowledged. Other servers assume that the previous HELO is not reset. Servers SHOULD accept a HELO command subsequent to RSET without special comment, overriding a previous one if necessary. Servers MUST NOT require a HELO command after a RSET. Worse still are servers that drop the connection on an RSET verb being issued. As some clients issue an RSET whenever they feel like it (at the end of a transfer for instance), such behaviour is very inefficient. The description above summarizes the current situation with SMTP implementations based on a series of experiments. No implementations have been identified that rejects a second HELO, but it would not be surprising to find one. Onions Expires Sep 30, 1995 [Page 2] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 5.2. Duplication of single state verbs. Whilst some of the SMTP state-inducing verbs may be repeated and arbitrary number of times (such as RCPT for multi-destination) other verbs (such as MAIL) may only be issued once per transaction. If a second occurrence of state-inducing verb is detected, a server MAY either accept it, overriding earlier information, or may reject it as an out-of-sequence command with a "503 bad sequence of commands" code. A client sending multiple of these commands within a mail transaction MUST be prepared to send a RSET and start over, or to send QUIT and abandon the session, if 503 is received in this case. Clients SHOULD, if possible, behave in a way that avoids this situation. The issues above do not arise in the normal case of multiple successful message transmissions in the same session, since each successful message completion (i.e., server receipt of DATA, the message, CR LF . CR LF, and then sending a positive completion reply) results in terminating a mail transaction. Clients SHOULD NOT send RSET after receipt of a 250 response after DATA and the message; servers MUST reset their states after sending that 250 response and MUST NOT require clients to send RSET before the next MAIL FROM command 5.3. Behavior with unrecognized verbs. While it is not quite explicit, RFC-821 appears to expect that, if a verb is not recognized by the receiver, it will reject the command with a "permanent error", 5yz, code, presumably 500 (Syntax error). Similarly, it appears to specify that, if the sender receives such a code, it must either abandon the mail message (sending QUIT or RSET, presumably) or do something else involving the same or a different verb; it may not simply ignore the 5yz error code and pretend it was a 2yz (or 354) code. This specification depends on that behavioral model. As defined in RFC-821, existing SMTP servers MUST reply with a return code of 500 (Syntax error) when any unfamiliar verb is received. The material above should probably have made it into RFC-1123, but some of the issues -- particularly the fact that anyone could ever have believed that anything else (such as simply ignoring 5xx codes) was permitted--have emerged only in the process of this investigation. Nonetheless, this clarification is believed to be consistent with existing usage and implementations of SMTP. Onions Expires Sep 30, 1995 [Page 3] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 5.4. Behaviour with eight-bit data RFC-821 together with RFC-822 is unambiguous in this respect. Unless an extension to RFC-821 is in force for the mail transaction, eight- bit data may not be sent. Period. This point just needs emphasising. It is present in the original documents, but not spelled out. If an implementation receives eight-bit data there are various things it may do. These come down to one of the following, and depend on how charitable you wish to be, or how much political pressure is placed on you. 1. Bounce the message. This is the correct thing to do. The message is a protocol violation and should therefore not be acceptable. Send it back! 2. Strip the 8th bit. This ends up making the message conformant, but may well loose information. It is an interesting approach as it lets the mail through - albeit distorted. The problems usually end up on the recipients desk - whereas it is the senders fault. 3. Convert the message. If you are very charitable, you may consider converting the message to 7bit data by making the message into a MIME message (if it is not already) and then applying a content-transfer-encoding. This has a down side that your machine is having to expend considerable effort fixing up other peoples mistakes. All but the first action are compromises. The fault is that of the originator and they should have to bear the pain. With all but case 1, it is either the recipient, or an intermediate machine that has to suffer. 5.5. Error reports with eight-bit data Some implementations will return the original message as part of a delivery report. Care needs to be taken in this case that the reason for failure was that eight-bit data was present. Otherwise it is possible to construct an illegal eight-bit message as an error report to an eight-bit message. As error reports and messages cannot be easily distinguished in RFC821, all messages (including error messages) appear as standard messages, and therefore need to be correct RFC822 messages. Onions Expires Sep 30, 1995 [Page 4] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 5.6. SMTP MAIL FROM response. The use of the error 550 (Address Unknown) is not documented in RFC- 821 as a response to the MAIL FROM verb. However this seems to be the most appropriate response to a client that tries to give an unrouteable mail origintor. Therefore this code should be taken as a permanent rejection. It implies that the mail will not be accepted as the MTA does not know how to route error reports. Some MTAs care about this, others do not. 5.7. Rejection of SMTP connections due to DNS failure. There are a number of SMTP implementations that either do, or can be configured, to reject SMTP connections if the calling host is not registered in the DNS. This is seen by some as a breaking of the spirit of RFC-1123, and by others as a useful get-out-of-jail card. Regardless of whether this is a good idea or a bad one, the fact remains this is practiced by some sites. Implementors are therefore encouraged to use back up MX routing in the case of a connection that succeeds but no data is received before the connection is dropped. This topic has been debated a number of times on the Internet with both sides sticking to their views. There is no sense in continuing to try and standardise this point. What a site will do with any internet connection from any host eventually comes down to what the administrator at that site decides. If they don't want to talk to a given set of hosts, that may be their loss. With the increasing emphasis on security though, the fact that a site advertises an MX or A record in the DNS does not imply it will talk to all callers. 5.8. EHLO commands There are one or two servers that respond badly to EHLO commands. That is they either set themselves into inconsistent states, or else drop the connection at once. The RFC is fairly clear that unknown commands should be rejected but otherwise ignored. A resilient server MAY detect that the EHLO caused the connection to drop and immediately retry the connection with a HELO verb in place. Alternatively it can be treated as a bad connection and subsequent MX records tried if available. However servers SHOULD NOT drop the connection in response to an unknown verb. 5.9. SMTP Multi-line responses. Both SMTP client and server implementations should be prepared to take multi-line responses at any part of the transaction. In particular a multi-line HELO/EHLO response should be acceptable. Onions Expires Sep 30, 1995 [Page 5] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 5.10. SMTP use of other services. When using SMTP in a restrictive environment, care should be taken that associated services are allowed. For instance, use of the DNS is often required for incoming SMTP servers, and IDENT may also be in use. Care must be taken when constructing firewalls that these protocols are allowed to penetrate if they are required. 5.11. Error reports If an error is detected at the MTA level, such as non-delivery, protocol failure or some such, an error message should be sent to the parameter passed in the MAIL FROM field. The Reply-To, From and Sender fields should never be used for such purposes. These fields are for the sole use of the end-user recipient and their user agent software. They should be no concern of the MTA. On a related note, if a list exapander is used, the MAIL FROM parameter should be set to something other than the mail list name. Otherwise error reports can loop with exponential explosive force. The standard technique is when expanding list to originate the message from -request (or sometimes -owner). This address should resolve to a single individual who can be charged with maintaining the list. 6. RFC-822 Issues 6.1. Illegal format RFC-822 messages Some implementations of RFC-821 check the message for adherence to RFC-822 minimum requirements as the message is received. These are that the message contains in the header a From field, a Date field and a recipient field of some type. However, some user agents use RFC-821 as a submission protocol and assume that messages will be made legal RFC-822 as part of the submission process (as some MTA's already do this). Implementations MAY therefore allow strictly illegal RFC-822 messages as data and make them legal by addition of new headers, or MAY reject the message as illegal data. Some User Agents, particularly those on PC's find it difficult to determine an accurate time to provide a Date field, and therefore leave it out. It is harmless enough to insert such a field when acting as a submission channel, but inserting a Date mid way through a multi-hop delivery path is mis-leading and should be discouraged. However, in practice it is difficult to determine the two modes RFC- 821 is used in, so usually a blanket decision concerning all transfers has to be made. What is really required is a submission protocol tailored for this sort of behaviour that can take a partial Onions Expires Sep 30, 1995 [Page 6] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 RFC-822 message and add the appropriate envelope bits. 6.2. RFC-822 addresses RFC-822 addresses are more complex than might appear at first sight. The simple use of @ is by far the most prevelant, but when writing a parser, care must be taken with the more unusual format of addresses. Many implementations deal very badly, or not at all with the more unusual forms of addresses. Take the following cases, which all seem to cause one or another implementation indigestion. @host,host2:user@host3 "J.Onions"@nexor.co.uk List: user@host.domain; (comment) JOnions (comment) @ (comment) nexor.co.uk (comment) \"f\"@b.foo Other useful examples are in Appendix A.3 in RFC-822. Worse than these are the psuedo-legal addresses that get produced. An implementation should be able to deal with such horrors as the following, where "deal with" is to do something other than crash and burn. user::something@foo.bar @host.domain:auser user!fred@domain.ref Another item to beware of is very large To/Cc lines. Some attempt list expansion by a macro expansion of the To: field. This can lead to 1000 or more addresses appearing in the To: line. Whilst this is a bad idea to do, implementations should be careful of using fixed buffer sizes - such header lines can be 250 lines long! 6.3. Received Lines The syntax of the Received: lines in RFC-822 messages is reasonably straight forward. It requires as a minimum a date stamp following a semi-colon. Unfortunately some implementations cannot seem to generate this. This can cause problems when gatewaying to other systems that also have trace fields. This is seen as a good way to cause general confusion when tracking messages. When gatewaying or examining these elements, the invalid elements should either be discarded or else the current time inserted to make them legal. The illegal Received: lines can be changed to be Orig- Received: to ensure the relayed message is now legal. Note that the received line should contain fully qualified host names if any are present, rather than just local references. This aids in tracking the message, makes for better gatewaying and stop confusion. Onions Expires Sep 30, 1995 [Page 7] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 Note also that the "id" construct takes a msg-id construct which is of the form "<" addr-spec ">". 6.4. Comments in Received lines A strict reading of RFC-822 shows that comments are not legal within Received lines. However so many implementations insert them, and the host requirements document encourages them to be added under certain conditions, that it is wise to consider the EBNF to be extended to allow comment fields at any point in the line. A comment should be taken as the EBNF construct from RFC-822. This is basically some text found between parenthesis, such as: Received: from foo.bar (actually xyz.foo.bar) by lancaster.nexor.co.uk id with SMTP (v1.1) ; Mon, 20 Feb 1995 12:28:53 +0100 6.5. Date fields. Date fields are usually fairly standard, although there are implementations that strike out with new and novel formats. However, when it comes to the area of time zones there is little limitation in the imagination of implementors. Normally time zones should be numeric as these are unambiguous. It should be down to the user agent to display the Date in a ``pretty'' format. Just say NO to pretty, arbitrary timezones! All UAs should generate numeric offsets for timezones. How they are displayed is up to the UA however. 6.6. Resent- fields RFC-822 allows the pseudo-forwarding of messages by amending the header of a message to contain new recipients. This is done by adding headers such as Resent-To: abc@domain.name Resent-Date: Sun, 1 Jan 1995 02:24 +0000 Resent-From: xyz@foo.bar It is not clear in RFC-822 if when resending a message a complete set of headers is required. The standard would seem to imply that they are but no grammar is present which mandates it. Therefore implementations vary on how to treat this type of message. Strict implementations will on detection of a Resent- field, conclude that this is a resent message, and therefore should be using the Resent- versions of the fields as opposed to the standard forms. In Onions Expires Sep 30, 1995 [Page 8] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 this case a message without a Resent-From, a Resent-Date and a Resent- recipient field is illegal. It is assumed that the message has been resent but with only a partially correct header. Other implementations take the view that a Resent- field is a higher weighted form of the original field. That is, a Resent-Date should be used in preference to a Date field, but as long as a Date, From and Recipient field is present with or without the resent- prefix the message is legal. The first view treats the resent- as a new overriding SET of headers, the second as individual replacements for fields. Either case could be argued, as the original text is unclear. For pragmatic reasons, and because it seems closer to the intent of RFC-822 in this case, the Resent- fields should be taken as a set. However implementations SHOULD allow the individual fields. In practice this sort of forwarding is not very common, but does arise. 6.7. Separator between RFC-822 Header and Content. RFC-822 defines a separator of a single blank line to delimit the header from the body of the message. It seems apparent that some implementations either have problems generating a blank line or problems reading the standard. The definition is a single blank line and hence consists of the characters CR LF CR LF. Any other sequence is difficult to parse reliably, so the string CR LF CR LF MUST always be used. Whilst one can think up possible heuristic parsing techniques to allow sort-of- blank lines - they are error prone. In particular a line consisting solely of white space is a valid continuation line, and may be used. Consider the following: Subject: Hello How are you This is a legal subject line, spread across three lines. In isolation it may look like the end of a header and the start of a message content. If an implementation applies heuristics to the delimiter it may well get confused by such cases. If the following line started To: it would be clear this was a header line continuation. If it started with some non white space text without a colon in it anywhere, this is probably a mistyped separator. If there is a colon in the next line, you can start guessing if it looks like a header field, or a piece of text. To try and circumvent the problem - use of header continuation lines consisting solely of white space should be strongly discouraged. You Onions Expires Sep 30, 1995 [Page 9] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 never know what a helpful mailer might do with it! 7. MIME issues. MIME since its inception has allowed implementations of MTAs and UAs to further the cause of havoc and generally increase entropy. The number of ways that it is possible to get this specification wrong is truely astounding! In general an MTA can treat badly formatted MIME as a text/plain format and punt the whole problem to the UA. The UA will take a number of views: a) It will crash and burn. b) It will complain the message is illegal and refuse to show it. c) It won't care and show you the message, warts and all. d) It will ignore the message, and you will never even know you have received the message. The best approach is to be able to flag an error and then revert to action c) above. This may upset some naive mail users (who seem to be predominantly upper management and therefore dangerous to upset!). 7.1. Badly formatted Content-Type: fields Implementations have been known to produce lines of the form MIME-version: 1.0 Content-Type: text That is, a MIME type, without the mandatory subtype. This is illegal as a MIME header and means the content may be subject to misinterpretation. In these cases the most pragmatic case is to treat the message as text/plain, regardless of what the Content-type might indicate. However, outright rejection of the message is also an option. (The author feels a system that rejects every other such message may have merits in forcing systems to be upgraded.) 7.2. Multiple Content-Type fields Messages may contain multiple Content-Type fields, sometimes containing contradictory information. Where this happens this may again cause contents to be misrepresented, or misprocessed. For instance: MIME-version: 1.0 Content-Type: multipart/mixed; boundary="---" Onions Expires Sep 30, 1995 [Page 10] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 MIME-version: 1.0 Content-Type: text/plain As for the badly formatted contents type. If two Content-Type fields are present, and contain the same information, that case MAY be treated as just one Content-Type field. 7.3. Badly structured multipart messages Message that contain fields such as Content-Type: multipart/mixed have some great potential for causing indigestion in mail systems. The missing boundary string means that although the message is split into multiple parts, there is no way a process can reconstruct the message in general. It is charitable to believed that these type of messages start out with good intentions, but loose their boundary markers somewhere in flight. Whilst an intelligent human can scan the body part and make an educated guess at what the separator is, this is not generally possible for a program. 7.4. Wrapped lines Another interesting little problem is where a UA, or MTA has helpfully wrapped the text of the field to improve readability. Some interesting examples are presented here. Content-Type: multipart/mixed; boundary="message -separator" Content-Type: multipart/mixed; boundary="abcdefghijklmno: boundary:fixed01" The first case is debateably correct input, although few MTA/UAs will be able to reconstruct the correct separator. The second case is illegal, ambiguous and awkward to treat well. Why do people do this! The road to hell is paved with good intentions. In both cases little should be done to try and reconstruct the message without human help. 7.5. MIME prologue and Epilogue text A number of systems and hand constructed messages put text into the prologue and epilogue of MIME multipart messages. Whilst this is a neat trick for allowing non-mime UAs to inform the user why the message appears as garbage, the prologue/epilogue does not really Onions Expires Sep 30, 1995 [Page 11] INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995 exist as part of a message. Therefore when gatewaying or simply processing such messages, these components may disappear. Alternatively they may appear as new body parts after transformation. Therefore whilst you can do it, don't be surprised if it fails to appear at the other end. 7.6. MIME and UUENCODE Whilst a type and/or content-transfer-encoding for uuencoded documents could be defined, there is currently not one. Therefore uuencoded documents are not on their own valid MIME messages. They use to be one of the better ways to transfer documents before the advent of MIME, but now there are better mechanisms. Uuencoded contents are known to suffer from problems at gateways. 8. Acknowledgements This document represents a collection of the experiences and hard-won battle scars from a large community of people. All implementors of SMTP mail systems will have had some influence on this document. In particular there are a number of points taken from the work done in the smtp extensions working group. This document is a summary of some of the discussions, and other experiences. Some of this text is taken from an earlier draft of the SMTP working group document. The following people made useful comments on an earlier version of this document: Patrik Faltstrom, Tomas Kullman, Peter Sylvester, Harald Alvestrand, Steve Dorner 9. Security Considerations Security considerations are not discussed in this memo. 10. Editor's Address Julian Onions Nexor Ltd. PO Box 132, Nottingham, England. Onions Expires Sep 30, 1995 [Page 12]