From: Erik Norgaard
Date: Sat, 14 Oct 2006 14:38:08 +0200
To: Beech Rintoul
Cc: freebsd-questions@freebsd.org
Subject: Re: Non English Spam

Beech Rintoul wrote:
> I'm getting a ton of spam every day that comes from China, Japan and Korea.
> Spam Assassin completely ignores it because it has all non-english characters
> and slows kmail to a crawl loading. Is there a way to filter on non-english
> either using Spam Assassin or procmail?
I get none of it since adding these simple header filter rules to Postfix:

# Accepted MIME headers (ASCII, UTF-8 and ISO-8859-X)
/^Content-Type:.*?charset\s*=\s*"?(us-ascii|iso-8859-\d+|utf-8)"?/ OK HDR2000 Accepted charset: $1

Strictly speaking you could reject every other character set, but I chose to make it explicit:

# Reject specific character sets
# Chinese, Japanese and Korean
/^Content-Type:.*?charset\s*=\s*"?(Big5|gb2312|euc-cn)"?/ REJECT HDR2100: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(euc-kr|iso-2022-kr)"?/ REJECT HDR2110: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(iso-2022-\w+|euc-jp|shift_jis)"?/ REJECT HDR2120: Unaccepted character set: "$1"
# Cyrillic (Russian/Ukrainian: koi8-r, koi8-u, windows-1251)
# and Central European (windows-1250) character sets
/^Content-Type:.*?charset\s*=\s*"?(koi8-(?:r|u))"?/ REJECT HDR2200: Unaccepted character set: "$1"
/^Content-Type:.*?charset\s*=\s*"?(windows-(?:1250|1251))"?/ REJECT HDR2210: Unaccepted character set: "$1"

Then you may want a catch-all rule for unknown character sets:

/^Content-Type:.*?charset\s*=\s*"?([-\w.:]+)"?/ WARN HDR2299: Unknown character set: "$1"

You may change WARN to REJECT.

I have noticed, however, that some subscribers to this list write English encoded in one of the above character sets. I don't know enough about the character set definitions, but it seems that the English (ASCII) characters are a subset of any character set? What is the recommended policy here? Should subscribers be advised to change character set when posting to the list?

Cheers, Erik

-- 
Ph: +34.666334818
web: http://www.locolomo.org
X.509 Certificate: http://www.locolomo.org/crt/8D03551FFCE04F0C.crt
Key ID: 69:79:B8:2C:E3:8F:E7:BE:5D:C3:C3:B1:74:62:B8:3F:9F:1F:69:B9
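To check how a Content-Type header would fare against the rules above before loading them into Postfix, here is a small Python sketch. It assumes the rules sit in a Postfix pcre: table (the .*?, \d and (?:...) syntax is PCRE, not POSIX regexp) and only mimics the matching: patterns are tried top to bottom, the first match wins, and matching is case-insensitive, as Postfix pcre tables are by default. The classify function and the action labels it returns are hypothetical illustration, not part of Postfix.

```python
import re

# Hypothetical emulation of the pcre: table above (illustration only).
# Each entry: (pattern, action label). Order matters: first match wins.
RULES = [
    (r'^Content-Type:.*?charset\s*=\s*"?(us-ascii|iso-8859-\d+|utf-8)"?', "OK"),
    (r'^Content-Type:.*?charset\s*=\s*"?(Big5|gb2312|euc-cn)"?', "REJECT"),
    (r'^Content-Type:.*?charset\s*=\s*"?(euc-kr|iso-2022-kr)"?', "REJECT"),
    (r'^Content-Type:.*?charset\s*=\s*"?(iso-2022-\w+|euc-jp|shift_jis)"?', "REJECT"),
    (r'^Content-Type:.*?charset\s*=\s*"?(koi8-(?:r|u))"?', "REJECT"),
    (r'^Content-Type:.*?charset\s*=\s*"?(windows-(?:1250|1251))"?', "REJECT"),
    # Catch-all: [-\w.:]+ captures the whole charset name for the log text.
    (r'^Content-Type:.*?charset\s*=\s*"?([-\w.:]+)"?', "WARN"),
]


def classify(header):
    """Return (action, charset) for the first matching rule, or (None, None)."""
    for pattern, action in RULES:
        # re.IGNORECASE mirrors the case-insensitive default of pcre: tables.
        m = re.search(pattern, header, re.IGNORECASE)
        if m:
            return action, m.group(1)
    return None, None


if __name__ == "__main__":
    for h in (
        "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
        'Content-Type: text/plain; charset="gb2312"',
        "Content-Type: text/plain; charset=windows-1252",
    ):
        print(h, "->", classify(h))
```

With the original catch-all group (\w?), the captured $1 for charset=windows-1252 would be just "w", because \w? matches at most one character; the rule still fires, but the logged charset name is useless.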