From owner-freebsd-questions@FreeBSD.ORG Wed Feb 23 09:42:47 2005
Return-Path:
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 2E06316A4CE
    for ; Wed, 23 Feb 2005 09:42:47 +0000 (GMT)
Received: from mail.gmx.net (mail.gmx.de [213.165.64.20])
    by mx1.FreeBSD.org (Postfix) with SMTP id 2F55B43D1D
    for ; Wed, 23 Feb 2005 09:42:46 +0000 (GMT)
    (envelope-from barner@gmx.de)
Received: (qmail invoked by alias); 23 Feb 2005 09:42:44 -0000
Received: from unknown (EHLO zi025.glhnet.mhn.de) (129.187.19.157)
    by mail.gmx.net (mp019) with SMTP; 23 Feb 2005 10:42:44 +0100
X-Authenticated: #147403
Received: by zi025.glhnet.mhn.de (Postfix, from userid 1000)
    id 06752C257; Wed, 23 Feb 2005 10:43:16 +0100 (CET)
Date: Wed, 23 Feb 2005 10:43:16 +0100
From: Simon Barner
To: Mike Hauber
Message-ID: <20050223094316.GA70078@zi025.glhnet.mhn.de>
References: <200502222316.32866.m.hauber@mchsi.com>
    <20050223055018.GA82969@keyslapper.net>
    <200502230218.37665.m.hauber@mchsi.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
    protocol="application/pgp-signature"; boundary="k1lZvvs/B4yU6o8G"
Content-Disposition: inline
In-Reply-To: <200502230218.37665.m.hauber@mchsi.com>
User-Agent: Mutt/1.5.8i
X-Y-GMX-Trusted: 0
cc: Louis LeBlanc
cc: freebsd-questions@freebsd.org
Subject: Re: filtering HTML tags from email
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 23 Feb 2005 09:42:47 -0000

--k1lZvvs/B4yU6o8G
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Mike Hauber wrote:
> > Mutt saves to a temp file then calls the following command:
> > lynx -localhost -dump %s
> > where '%s' is the temporary file you saved it to.
> >
> > You could also just pipe it to the following:
> > lynx -localhost -dump -stdin
> >
> > the -localhost argument prevents lynx from simply following
> > links external to your machine - helpful to avoid generating
> > hits for unscrupulous spammers that get paid for hits on a URL.
> >
> > Just make sure lynx is installed.
> >
> > Lou
>
> Okay, so to be sure, there is no filter (as of yet) to simply open
> an email file, strip the HTML tags, and resave it? I'm not
> complaining, as this may actually be something I'm capable of
> creating myself. (I'll make this my first python project. :) )
>
> I'm just making sure I'm not missing anything obvious before I
> start working on it. It's irritating to spend time on something
> only to find out that it's already been done.

You could probably also do this with procmail + lynx (or w3m) during
the delivery process.

Another possibility is to put the following entries in your ~/.mailcap
file, which convert html, doc and rtf to plain text:

text/html; w3m -dump -T text/html; copiousoutput;
application/msword; antiword %s; copiousoutput
application/rtf; rtfreader %s; copiousoutput

As for your python script: I don't think that just stripping everything
matching the following expressions is correct, because they might
appear in non-HTML emails, too:

<.*>
<\/.*>

(perl syntax).

At least, you'd need a list of valid html tags, i.e. a regular grammar
for html:

| | | | ...

(BNF notation).

While this is not too hard to implement (and possibly a good project
for learning a new programming language), it would be too much work for
something that can be achieved more easily with existing tools (that
is, for me, personally ;-)

Simon

--k1lZvvs/B4yU6o8G
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (FreeBSD)

iD8DBQFCHFA0Ckn+/eutqCoRAgNVAJ9Y/2R6ycf+xgexeEVLUH5XxcwrnwCgxfM8
lNOVsHQxYbxw3Y9Qa7cwJlI=
=y8Uh
-----END PGP SIGNATURE-----

--k1lZvvs/B4yU6o8G--
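
To illustrate the concern about stripping everything that matches <.*>,
here is a minimal Python sketch. It is not the script discussed in the
thread, and the names TextExtractor, strip_html and naive_strip are
hypothetical. It assumes the standard library's html.parser module is
good enough for the mail in question, and it contrasts the greedy
pattern with a parser that keeps only character data.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keep character data, drop the tags around it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are never stored.
        self.chunks.append(data)


def strip_html(markup):
    """Return the text content of an HTML fragment (sketch, not a full filter)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def naive_strip(text):
    """Delete everything matching the greedy pattern <.*> from the thread."""
    return re.sub(r"<.*>", "", text)


if __name__ == "__main__":
    plain = "if a < b and c > d, reply to <me@example.org>"
    # The greedy pattern removes everything from the first "<" to the last ">".
    print(repr(naive_strip(plain)))
    print(strip_html("<p>Hello <b>world</b></p>"))
```

Note that this sketch would also keep the contents of script and style
elements, since the parser reports them as data, so a real filter would
need to skip those. Existing tools such as lynx or w3m already handle
such cases.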