Date:      Sun, 28 Mar 2010 22:39:45 -0400
From:      Mark Shroyer <subscriber+freebsd@markshroyer.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: procmail regex help ... sometimes works, sometimes doesn't...
Message-ID:  <4BB012F1.6020202@markshroyer.com>
In-Reply-To: <471394.79697.qm@web111611.mail.gq1.yahoo.com>
References:  <471394.79697.qm@web111611.mail.gq1.yahoo.com>

On 3/28/2010 6:34 PM, George Sanders wrote:
> I have added a very standard, very common regex line to my
> .procmailrc to filter character sets I can't read:
> 
> UNREADABLE='[^?"]*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|ks_c_5601|3Deuc-kr|koi8'
> :0:
> * ^Content-Type:.*multipart
> * B ?? $ ^Content-Type:.*^?.*charset="?($UNREADABLE)
> unreadable_messages
> 
> I know that this works because my "unreadable_messages" mail file is
> now full of messages with headers like:
> 
> From: =?GB2312?B?xMLTq9Or?= <uigvrutit@heki.net>
> Subject: =?GB2312?B?MjAxMMTqyMvBptfK1LS4w9bYytPKssO0?=
> To: "me" <me@me.com>
> Content-Type: text/html;
>         charset="gb2312"
> 
> However, a lot of mail gets through to my inbox that matches:
> 
> From: "osdeiiftnvpp@gmail.com" <xjyfgzyjm@gmail.com>
> Reply-To: "osdeiiftnvpp@gmail.com" <xjyfgzyjm@gmail.com>
> Message-ID: <533pbxxy2oc>
> To: me <me@me.com>
> Subject: Fw: \xb8\xf2\xad\xe8\xa5X\xa8\xd3\xbd\xe6~\xb1o\xb4\xa9\xa9f\xaa\xb1\xb5L\xaeM\xa4\xba\xaeg\xb2n\xa7o
> X-Mailer: inhalation
> Organization: Microsoft Outlook Express 6.00.2462.0000
> Mime-Version: 1.0
> Content-Type: multipart/alternative;
>         boundary="1-104247307-2712732737=:8213"
> Status: RO
> X-Status:
> X-Keywords:
> X-UID: 63502
> 
> --1-104247307-2712732737=:8213
> Content-Type: text/plain; charset="big5"
> Content-Transfer-Encoding: quoted-printable
> 
> However, "big5" is very clearly listed in my regex above, and as far
> as I can tell, this mail should match perfectly...
> 
> I cannot see why these "big5" emails are not matching my procmail
> regex ... is it obvious to anyone ?

This is just a shot in the dark, but do the unreadable messages that
this rule successfully matches have the relevant Content-Type header in
the message's "main" header group, whereas the messages that should
match but fail to do so have it only inside a MIME body part, as in
your big5 example?

(Apologies for the imprecise terminology.)
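
One way to check this outside of procmail would be to run a few saved
messages through a small script that reports where each charset
declaration lives. The following is only a rough sketch using Python's
standard email package, not anything procmail itself does; the function
name charset_locations and the two location labels are made up for
illustration.

```python
import email


def charset_locations(raw_bytes):
    """Return (location, charset) pairs for every part declaring a charset.

    "main header" means the charset is on the message's top-level
    Content-Type header; "MIME part" means it is on the Content-Type
    header of a part inside the body. Both labels are illustrative.
    """
    msg = email.message_from_bytes(raw_bytes)
    found = []
    # walk() yields the top-level message first, then each nested part.
    for part in msg.walk():
        charset = part.get_content_charset()  # lowercased, or None
        if charset:
            location = "main header" if part is msg else "MIME part"
            found.append((location, charset))
    return found
```

Calling it on the raw bytes of one message from unreadable_messages and
one from the inbox (for example charset_locations(open(path,
"rb").read()), with path pointing at a single saved message) should show
whether the two groups really differ in where the charset is declared.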

-- 
Mark Shroyer
http://markshroyer.com/contact/


