Date: 26 Nov 2002 13:41:36 -0800
From: swear@attbi.com (Gary W. Swearingen)
To: Roman Neuhauser <neuhauser@bellavista.cz>
Cc: freebsd-questions@FreeBSD.ORG, Giorgos Keramidas <keramida@ceid.upatras.gr>
Subject: Re: Find abandoned packages
Message-ID: <a44ra46mb3.ra4@localhost.localdomain>
In-Reply-To: <20021126065739.GL77198@freepuppy.bellavista.cz>
References: <000801c2915e$be8907c0$6400a8c0@windows> <9eel9eaber.l9e@localhost.localdomain> <20021125091339.GR77198@freepuppy.bellavista.cz> <tpfztp8m6a.ztp@localhost.localdomain> <20021126065739.GL77198@freepuppy.bellavista.cz>

Roman Neuhauser <neuhauser@bellavista.cz> writes:

> 4. You have already shown that you (falsely) think MIME email ==
>    HTML email.

I surely didn't think that, but HTML was all I mentioned because I did
(falsely) think that MIME email was almost always either HTML or
complex stuff like MSFT Word or multipart things with images, etc.
I didn't know it could be as simple as a non-MIME message with a
"MIME-Version: 1.0" header inserted, with or without a
"Content-type: text/plain; charset=something" header.

Per your suggestion, I've just read some of RFC-2045, but not all 31
pages or the several other RFCs which are essentially parts of it.
But I think I've learned a few things new to me and, apparently, you.

> 2. 3. and 4. lead me to the conclusion that it *was* a MIME message,
>    despite 1.

You are correct. I've been missing the fact that MIME is often used
for (sort of) plain text messages. Here are the offending message's
key headers with the false claim of charset, like you said:

    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

> AFAIK, this has changed with MIME. RFC 822 restricts email messages
> to 7 bits (ASCII), ...

Looks like we're both wrong, if non-SMTP MTAs are allowed. MIME hasn't
changed anything at the level I was thinking about (MTA) -- after MIME
encoding, if any.

First, both non-MIME and MIME messages MUST have only 7-bit data if
they want to get thru a SMTP system. RFC-2045 says:

    RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII data with
    lines no longer than 1000 characters including any trailing CRLF
    line separator.

Second, non-MIME messages MAY have 8-bit data if they are using "less
restrictive transports" (than SMTP). RFC-2045 says:

    In the absence of a MIME-Version field, a receiving mail user
    agent (whether conforming to MIME requirements or not) may
    optionally choose to interpret the body of the message according
    to local conventions. Many such conventions are currently in use
    and it should be noted that in practice non-MIME messages can
    contain just about anything.

> ... but MIME allows different character sets, like
> UTF8. What is still a possibility is that such a message will get
> mangled on its way by an MTA that assumes all data is ASCII, but I
> don't remember seeing anything like that happen.

But assuming SMTP MTAs, those different character sets are in the same
category as any other non-7-bit data; they require encoding to 7-bit
data so that those ASCII-assuming MTAs won't mangle them. MIME has
options for other (e.g., 8-bit) encodings or no encoding, but they are
likely to be mangled on the Internet.

> BTW, RFC 2045 specifies a way to pass non-ASCII messages through
> MTA's that assume all-ASCII world: the Content-Transfer-Encoding
> header.

Yes. The default MIME encoding is none ("7bit"); the message must be
7-bit clean for SMTP MTAs. The offending message used
"quoted-printable", which is almost like 7-bit, except that 8-bit
characters (and a few 7-bit'ers) are encoded as "=#", where "#" is the
8-bit value written as two ASCII hex digits. (The offending message
had that OK.) A more reliable (but unreadable) 7-bit encoding is
"base64". Other encodings allow for encodings to 8-bit data, no
encoding, etc.

There's a whole other aspect of this that deserves mention. Even if
MSFT software worked correctly, telling the truth about its weird
character set and properly encoding it, it's unlikely that my MUA
(Xemacs) would know how to decode it properly. If it did, it would
need to either decode to the original weird character set and support
the display of that, or translate the decoded character set to some
other character set, probably Unicode. (Which Xemacs might even
support, I don't know -- I HAVE noticed that it has started displaying
trademark symbols, for instance, where it probably used to show them
as octal numbers (\###).)

Thanks for the education-inducing dialog.

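For anyone who wants to see the two 7-bit encodings discussed above in
action, here is a minimal sketch using Python's standard quopri and
base64 modules. It is only an illustration and not part of the
original exchange: the sample string, the ISO-8859-1 charset choice,
and the helper name is_seven_bit are all assumptions made up for the
example.

```python
import base64
import quopri


def is_seven_bit(data: bytes) -> bool:
    """Hypothetical helper: True if every byte fits in 7 bits (US-ASCII range)."""
    return all(byte < 0x80 for byte in data)


# Assumed sample text: one 8-bit character (e-acute) and one "=" sign,
# which quoted-printable must also escape even though it is 7-bit.
text = "a=b café"
raw = text.encode("iso-8859-1")

# Quoted-printable: 8-bit bytes (and "=") become "=" plus two hex digits;
# everything else stays readable.
qp = quopri.encodestring(raw)

# Base64: every 3 input bytes become 4 ASCII characters; unreadable but
# uses only a small, safe alphabet.
b64 = base64.b64encode(raw)

print("raw 7-bit clean?", is_seven_bit(raw))
print("quoted-printable:", qp, "7-bit clean?", is_seven_bit(qp))
print("base64:          ", b64, "7-bit clean?", is_seven_bit(b64))

# Both encodings are reversible, so the receiving MUA can recover the
# original 8-bit bytes and then interpret them using the declared charset.
decoded_qp = quopri.decodestring(qp)
decoded_b64 = base64.b64decode(b64)
```

The raw bytes are not 7-bit clean, while both encoded forms are, which
is the whole point of Content-Transfer-Encoding on an SMTP path. Note
that decoding only gives back bytes; displaying them correctly still
depends on the charset claim in Content-Type being true.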
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-questions" in the body of the message