Date: Tue, 1 Nov 2011 08:11:46 GMT
From: Alexey Markov <redrat@mail.ru>
To: freebsd-gnats-submit@FreeBSD.org
Subject: ports/162218: SpamAssassin's sa-learn can't parse mbox of CommuniGate Pro
Message-ID: <201111010811.pA18BkeG084176@red.freebsd.org>
Resent-Message-ID: <201111010820.pA18K8gH080392@freefall.freebsd.org>
>Number: 162218
>Category: ports
>Synopsis: SpamAssassin's sa-learn can't parse mbox of CommuniGate Pro
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-ports-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Nov 01 08:20:07 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator: Alexey Markov
>Release: 8.2-RELEASE-p4
>Organization: JSC Complitex
>Environment:
FreeBSD meson.complitex.ru 8.2-RELEASE-p4 FreeBSD 8.2-RELEASE-p4 #0: Mon Oct 17 11:44:31 MSD 2011 redrat@meson.complitex.ru:/arc/obj/arc/src/sys/MESON amd64
>Description:
In recent versions of CommuniGate Pro, the date format in the From_ line has changed. The old format looked like

  From <>(________-000000000007) Wed Feb 20 20:28:23 2008

and the new one looks like

  From <>(S_____________-000000085573) 16-04-2010_08:55:34_

Because the From_ regex in Mail::SpamAssassin::ArchiveIterator matches only the old format, sa-learn can no longer parse CGP mbox files, and users get the message "Learned tokens from 0 message(s) (0 message(s) examined)".
>How-To-Repeat:
Install SpamAssassin and CommuniGate Pro, then try to run sa-learn on some spam from a CGP mbox.
>Fix:
The attached patch fixes this problem.

Patch attached with submission follows:

Index: lib/Mail/SpamAssassin/ArchiveIterator.pm
===================================================================
--- lib/Mail/SpamAssassin/ArchiveIterator.pm	(revision 1190346)
+++ lib/Mail/SpamAssassin/ArchiveIterator.pm	(working copy)
@@ -396,7 +396,8 @@
     }
     seek(INPUT,$offset,0) or die "cannot reposition file to $offset: $!";
     for ($!=0; <INPUT>; $!=0) {
-      last if (substr($_,0,5) eq "From " && @msg && /^From \S+ ?\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}/);
+      #Changed Regex to include boundaries for Communigate Pro versions (5.2.x and later). per Bug 6413
+      last if (substr($_,0,5) eq "From " && @msg && /^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/);
       push (@msg, $_);
 
       # skip too-big mails
@@ -908,8 +909,9 @@
             $header .= $_;
           }
         }
+        #Changed Regex to include boundaries for Communigate Pro versions (5.2.x and later). per Bug 6413
         if (substr($_,0,5) eq "From " &&
-            /^From \S+ ?\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}/) {
+            /^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/) {
           $in_header = 1;
           $first = $_;
           $start = $where;
>Release-Note:
>Audit-Trail:
>Unformatted:
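As a rough illustration (not part of the submitted patch), the following Perl sketch checks the old and patched From_ regexes against the two sample lines quoted in the description. `is_from_line` and `split_mbox` are hypothetical helpers written for this example; `split_mbox` is only a simplified imitation of the read loop in the first hunk, not SpamAssassin's actual ArchiveIterator code.

```perl
use strict;
use warnings;

# Separator regexes copied from the patch: $OLD_RE is the original pattern,
# $NEW_RE adds the CommuniGate Pro "16-04-2010_08:55:34_" date as an alternative.
my $OLD_RE = qr/^From \S+ ?\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}/;
my $NEW_RE = qr/^From \S+ ?(\S\S\S \S\S\S .\d .\d:\d\d:\d\d \d{4}|.\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;

# Hypothetical helper: does this line start a new message under the patched regex?
sub is_from_line {
    my ($line) = @_;
    return substr($line, 0, 5) eq "From " && $line =~ $NEW_RE;
}

# Hypothetical helper, a simplified version of the read loop in the first hunk:
# a From_ line closes the current message only if @msg already holds lines.
sub split_mbox {
    my @messages;
    my @msg;
    for my $line (@_) {
        if (is_from_line($line) && @msg) {
            push @messages, [@msg];
            @msg = ();
        }
        push @msg, $line;
    }
    push @messages, [@msg] if @msg;
    return @messages;
}

my $cgp_line = "From <>(S_____________-000000085573) 16-04-2010_08:55:34_\n";
my $old_line = "From <>(________-000000000007) Wed Feb 20 20:28:23 2008\n";

# The unpatched regex does not recognise the new CommuniGate Pro From_ line.
printf "old regex on CGP line: %s\n", ($cgp_line =~ $OLD_RE ? "match" : "no match");

my @mbox = (
    $cgp_line,
    "Subject: first\n",
    "\n",
    "From here on, this body line is not a separator.\n",
    $old_line,
    "Subject: second\n",
);
my @messages = split_mbox(@mbox);
printf "%d message(s)\n", scalar @messages;   # prints "2 message(s)"
```

The body line starting with "From " is not treated as a separator because neither date alternative matches it, which is why the patch extends the date part of the regex rather than relaxing the check to any line beginning with "From ".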