Date:      26 Nov 2002 13:41:36 -0800
From:      swear@attbi.com (Gary W. Swearingen)
To:        Roman Neuhauser <neuhauser@bellavista.cz>
Cc:        freebsd-questions@FreeBSD.ORG, Giorgos Keramidas <keramida@ceid.upatras.gr>
Subject:   Re: Find abandoned packages
Message-ID:  <a44ra46mb3.ra4@localhost.localdomain>
In-Reply-To: <20021126065739.GL77198@freepuppy.bellavista.cz>
References:  <000801c2915e$be8907c0$6400a8c0@windows> <9eel9eaber.l9e@localhost.localdomain> <20021125091339.GR77198@freepuppy.bellavista.cz> <tpfztp8m6a.ztp@localhost.localdomain> <20021126065739.GL77198@freepuppy.bellavista.cz>

Roman Neuhauser <neuhauser@bellavista.cz> writes:

>    4. You have already shown that you (falsely) think MIME email ==
>       HTML email.

I surely didn't think that, but HTML was all I mentioned because I did
(falsely) think that MIME email was almost always either HTML or
complex stuff like MSFT Word or multipart things with images, etc.  I
didn't know it could be as simple as a non-MIME message with a
"MIME-Version: 1.0" header inserted, with or without a
"Content-type: text/plain; charset=something" header.

Per your suggestion, I've just read some of RFC-2045, but not all 31
pages or the several other RFCs which are essentially parts of it.
But I think I've learned a few things new to me and, apparently, you.

>    2. 3. and 4. lead me to the conclusion that it *was* a MIME message,
>    despite 1.

You are correct.  I've been missing the fact that MIME is often used
for (sort of) plain text messages.  Here are the offending message's key
headers with the false claim of charset, like you said:

    Content-Type: text/plain;	charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable
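
As a rough illustration of how little separates such a message from a
non-MIME one, here is a minimal Python sketch using the standard
"email" package.  The two sample messages, their addresses and the
helper name describe() are made up for the example; only the two
Content-* headers come from the offending message.

```python
from email import message_from_string

# Hypothetical sample: an ordinary message plus three MIME headers.
MIME_MESSAGE = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: sort of plain text\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Just ordinary text.\n"
)

# The same message with the MIME headers left out.
PLAIN_MESSAGE = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: plain text\n"
    "\n"
    "Just ordinary text.\n"
)


def describe(raw):
    """Return the MIME-related facts a mail reader would look at."""
    msg = message_from_string(raw)
    return {
        "is_mime": msg["MIME-Version"] is not None,
        "multipart": msg.is_multipart(),
        "content_type": msg.get_content_type(),
        "charset": msg.get_content_charset(),
        "transfer_encoding": msg["Content-Transfer-Encoding"],
    }


mime_info = describe(MIME_MESSAGE)
plain_info = describe(PLAIN_MESSAGE)
```

Both messages come out as single-part text/plain; the MIME one merely
adds a declared charset and transfer encoding.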

>     AFAIK, this has changed with MIME. RFC 822 restricts email messages
>     to 7 bits (ASCII), ...

Looks like we're both wrong, if non-SMTP MTAs are allowed.  MIME hasn't
changed anything at the level I was thinking about (the MTA) -- that
is, after any MIME encoding has been applied.

First, both non-MIME and MIME messages MUST have only 7-bit data if they
want to get through an SMTP system. RFC-2045 says:

   RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII data with
   lines no longer than 1000 characters including any trailing CRLF line
   separator.
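
The restriction quoted above can be sketched as a small Python check.
This is only my reading of that one sentence (7-bit bytes, lines of at
most 1000 characters counting the CRLF); the function name smtp_safe()
is hypothetical, and real SMTP has further rules not covered here.

```python
MAX_LINE = 1000  # line length limit, including the trailing CRLF


def smtp_safe(raw: bytes) -> bool:
    """True if raw is 7-bit only and no line exceeds MAX_LINE bytes."""
    # Any byte with the high bit set is not 7-bit data.
    if any(byte > 127 for byte in raw):
        return False
    # Each line is counted together with its 2-byte CRLF separator.
    return all(len(line) + 2 <= MAX_LINE for line in raw.split(b"\r\n"))


ascii_ok = smtp_safe(b"Hello, world\r\n")
eight_bit = smtp_safe(b"caf\xe9\r\n")
too_long = smtp_safe(b"a" * 999 + b"\r\n")
```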

Second, non-MIME messages MAY have 8-bit data if they are using
"less restrictive transports" (than SMTP). RFC-2045 says:

   In the absence of a MIME-Version field, a receiving mail user agent
   (whether conforming to MIME requirements or not) may optionally
   choose to interpret the body of the message according to local
   conventions.  Many such conventions are currently in use and it
   should be noted that in practice non-MIME messages can contain just
   about anything.

>  ... but MIME allows different character sets, like
>     UTF8. What is still a possibility is that such a message will get
>     mangled on its way by an MTA that assumes all data is ASCII, but I
>     don't remember seeing anything like that happen.

But assuming SMTP MTAs, those different character sets are in the same
category as any other non-7-bit data; they require encoding to 7-bit
data so that those ASCII-assuming MTAs won't mangle them.  MIME has
options for other (eg, 8-bit) encodings or no encoding, but they are
likely to be mangled on the Internet.

>     BTW, RFC 2045 specifies a way to pass non-ASCII messages through
>     MTA's that assume all-ASCII world: the Content-Transfer-Encoding
>     header.

Yes.  The default MIME encoding is "7bit", which means no encoding at
all; the message must already be 7-bit clean for SMTP MTAs.  The
offending message used "quoted-printable", which is almost like 7-bit,
except that 8-bit characters (and a few 7-bit'ers, such as "=" itself)
are encoded as "=XX", where "XX" is the byte value written as two ASCII
hex digits.  (The offending message had that OK.)  A more reliable
(but unreadable) 7-bit encoding is "base64".  The remaining values,
"8bit" and "binary", declare that the data has not been encoded at all.
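
To make the two 7-bit encodings concrete, here is a short Python sketch
using the standard "quopri" and "base64" modules.  The sample text and
the helper is_seven_bit() are invented for the example; they are not
from the offending message.

```python
import base64
import quopri

# Hypothetical sample text with one 8-bit character (e-acute) and a
# literal "=", which quoted-printable must also escape.
text = "caf\u00e9 = coffee"
raw = text.encode("iso-8859-1")      # one byte per character, one > 127

qp = quopri.encodestring(raw)        # mostly readable, escapes as =XX
b64 = base64.b64encode(raw)          # unreadable, but plain ASCII


def is_seven_bit(data: bytes) -> bool:
    """True if no byte has its high bit set."""
    return all(byte < 128 for byte in data)


print(qp)
print(b64)
print(is_seven_bit(raw), is_seven_bit(qp), is_seven_bit(b64))
```

Both encoded forms decode back to the original bytes with
quopri.decodestring() and base64.b64decode() respectively.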


There's a whole other aspect of this that deserves mention.  Even if
MSFT software worked correctly, telling the truth about its weird
character set and properly encoding it, it's unlikely that my MUA
(XEmacs) would know how to decode it properly.  If it did, it would need
to either decode to the original weird character set and support the
display of that, or translate the decoded text to some other character
set, probably Unicode.  (Which XEmacs might even support, I don't
know -- I HAVE noticed that it has started displaying trademark symbols,
for instance, where it probably used to show them as octal numbers
(\###).)
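
Here is a sketch in Python of the decode-then-translate step just
described.  It ASSUMES, purely for illustration, that the real
character set was windows-1252 (where byte 0x99 is the trademark sign);
nothing in this thread establishes which set the message actually used,
and the sample body is made up.

```python
import quopri

# Hypothetical quoted-printable body; "=99" stands for the byte 0x99.
body = b"Widget=99 is a trademark"

# Step 1: undo the transfer encoding to get back the 8-bit bytes.
raw = quopri.decodestring(body)

# Step 2: translate the bytes to Unicode using some character set.
as_declared = raw.decode("iso-8859-1")    # what the header claimed
as_assumed = raw.decode("windows-1252")   # ASSUMED real character set

# 0x99 is an unprintable control code in iso-8859-1, which is the kind
# of character a mail reader may fall back to showing as \231 (octal).
print(repr(as_declared[6]))
print(repr(as_assumed[6]))
```

The same bytes give a control character under the declared charset and
a trademark sign under the assumed one, so a false charset claim defeats
even a mail reader that decodes everything correctly.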


Thanks for the education-inducing dialog.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message



