From owner-freebsd-questions@FreeBSD.ORG  Sat Oct 14 12:39:36 2006
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
X-Original-To: freebsd-questions@freebsd.org
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4DA1116A403
	for <freebsd-questions@freebsd.org>;
	Sat, 14 Oct 2006 12:39:36 +0000 (UTC)
	(envelope-from norgaard@locolomo.org)
Received: from strange.daemonsecurity.com
	(59.Red-81-33-11.staticIP.rima-tde.net [81.33.11.59])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C57BD43D45
	for <freebsd-questions@freebsd.org>;
	Sat, 14 Oct 2006 12:39:35 +0000 (GMT)
	(envelope-from norgaard@locolomo.org)
Received: from [10.35.4.65] (65.4-35-10-static.chueca.wifi [10.35.4.65])
	by strange.daemonsecurity.com (Postfix) with ESMTP id 7CCA02E037;
	Sat, 14 Oct 2006 14:39:34 +0200 (CEST)
Message-ID: <4530DA30.7060004@locolomo.org>
Date: Sat, 14 Oct 2006 14:38:08 +0200
From: Erik Norgaard <norgaard@locolomo.org>
User-Agent: Thunderbird 1.5.0.7 (X11/20060916)
MIME-Version: 1.0
To: Beech Rintoul <freebsd@alaskaparadise.com>
References: <200610131712.46822.freebsd@alaskaparadise.com>
In-Reply-To: <200610131712.46822.freebsd@alaskaparadise.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-questions@freebsd.org
Subject: Re: Non English Spam
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 14 Oct 2006 12:39:36 -0000

Beech Rintoul wrote:
> I'm getting a ton of spam every day  that comes from China, Japan and Korea. 
> Spam Assassin completely ignores it because it has all non-english characters 
> and slows kmail to a crawl loading. Is there a way to filter on non-english 
> either using Spam Assassin or procmail? 

I get none since adding some simple filter rules for Postfix:

# Accepted mime headers: (ASCII, UTF-8 and ISO-8859-X)
/^Content-Type:.*?charset\s*=\s*"?(us-ascii|iso-8859-\d+|utf-8)"?/
     OK     HDR2000 Accepted charset: $1

Strictly speaking, you could reject every other character set, but I chose 
to make the rejections explicit:

# Reject specific character sets
# Chinese, Japanese and Korean
/^Content-Type:.*?charset\s*=\s*"?(Big5|gb2312|euc-cn)"?/
     REJECT HDR2100: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(euc-kr|iso-2022-kr)"?/
     REJECT HDR2110: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(iso-2022-\w+|euc-jp|shift_jis)"?/
     REJECT HDR2120: Unaccepted character set: "$1"
# Cyrillic character sets: Russian/Ukrainian
/^Content-Type:.*?charset\s*=\s*"?(koi8-(?:r|u))"?/
     REJECT HDR2200: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(windows-(?:1250|1251))"?/
     REJECT HDR2210: Unaccepted character set: "$1"

And then you may want a catch-all rule at the end for unknown character sets.

/^Content-Type:.*?charset\s*=\s*"?([\w-]+)"?/
     WARN   HDR2299: Unknown character set: "$1"

You may change WARN to REJECT.
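
To try the rule order outside the mail server, here is a minimal Python 
sketch that applies the same patterns in sequence, first match wins. It is 
only an approximation: the classify() helper, the "DUNNO" fallback label and 
the use of re.IGNORECASE are assumptions made for this illustration, and the 
catch-all pattern is written as [\w-]+ so that it captures the whole 
character set name.

```python
import re

PREFIX = r'^Content-Type:.*?charset\s*=\s*"?'

# Ordered (pattern, action) pairs mirroring the rules above; first match wins.
RULES = [
    (PREFIX + r'(us-ascii|iso-8859-\d+|utf-8)"?', "OK"),
    (PREFIX + r'(Big5|gb2312|euc-cn)"?', "REJECT"),
    (PREFIX + r'(euc-kr|iso-2022-kr)"?', "REJECT"),
    (PREFIX + r'(iso-2022-\w+|euc-jp|shift_jis)"?', "REJECT"),
    (PREFIX + r'(koi8-(?:r|u))"?', "REJECT"),
    (PREFIX + r'(windows-(?:1250|1251))"?', "REJECT"),
    # Catch-all: any other character set name (assumed form, see above).
    (PREFIX + r'([\w-]+)"?', "WARN"),
]

# Assumption: character set names are compared without regard to case.
COMPILED = [(re.compile(p, re.IGNORECASE), a) for p, a in RULES]


def classify(header):
    """Return (action, charset) for one header line, or ("DUNNO", None)."""
    for pattern, action in COMPILED:
        match = pattern.search(header)
        if match:
            return action, match.group(1)
    return "DUNNO", None


if __name__ == "__main__":
    for line in (
        "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
        'Content-Type: text/html; charset="gb2312"',
        "Content-Type: text/plain; charset=windows-1252",
    ):
        print(classify(line), line)
```

Because the accept rule comes first, a message in an accepted character set 
never reaches the reject rules, and only names matched by none of the 
explicit rules fall through to the catch-all.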

I have noticed, however, that some subscribers to this list write English 
encoded in one of the above character sets. I don't know enough about 
the character set definitions, but it seems that English characters are a 
subset of any character set?
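
One way to probe this is to encode a plain English sample with a few of the 
codecs in question and compare the bytes with the ASCII encoding. The sketch 
below is illustrative only: the sample text and the ascii_compatible() helper 
are made up for this example, the codec names are Python's aliases for some 
of the character sets above, and the ISO-2022 family is left out because 
those encodings are stateful and rely on escape sequences.

```python
# Plain English sample; deliberately avoids backslash and tilde, which some
# East Asian character sets map differently.
SAMPLE = "Hello, FreeBSD list"

# Python codec names for some of the character sets filtered above.
CODECS = ["big5", "gb2312", "euc_kr", "euc_jp",
          "koi8_r", "koi8_u", "cp1250", "cp1251"]


def ascii_compatible(codec, sample=SAMPLE):
    """True if the sample encodes to the same bytes as in US-ASCII."""
    return sample.encode(codec) == sample.encode("ascii")


results = {codec: ascii_compatible(codec) for codec in CODECS}

if __name__ == "__main__":
    for codec, same in results.items():
        print(f"{codec:8s} same bytes as ASCII: {same}")
```

For a sample like this, the declared character set says nothing about 
whether the body is actually English, which is why a filter on the 
Content-Type header alone can catch list mail written in English.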

What is the recommended policy here? Should subscribers be advised to 
change character set when posting to the list?

Cheers, Erik
-- 
Ph: +34.666334818                      web: http://www.locolomo.org
X.509 Certificate: http://www.locolomo.org/crt/8D03551FFCE04F0C.crt
Key ID: 69:79:B8:2C:E3:8F:E7:BE:5D:C3:C3:B1:74:62:B8:3F:9F:1F:69:B9