Date: Wed, 23 Feb 2005 10:43:16 +0100
From: Simon Barner <barner@gmx.de>
To: Mike Hauber <m.hauber@mchsi.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: filtering HTML tags from email
Message-ID: <20050223094316.GA70078@zi025.glhnet.mhn.de>
In-Reply-To: <200502230218.37665.m.hauber@mchsi.com>
References: <200502222316.32866.m.hauber@mchsi.com> <20050223055018.GA82969@keyslapper.net> <200502230218.37665.m.hauber@mchsi.com>

Mike Hauber wrote:
> > Mutt saves to a temp file then calls the following command:
> > lynx -localhost -dump %s
> > where '%s' is the temporary file you saved it to.
> >
> > You could also just pipe it to the following:
> > lynx -localhost -dump -stdin
> >
> > the -localhost argument prevents lynx from simply following
> > links external to your machine - helpful to avoid generating
> > hits for unscrupulous spammers that get paid for hits on a URL.
> >
> > Just make sure lynx is installed.
> >
> > Lou
>
> Okay, so to be sure, there is no filter (as of yet) to simply open
> an email file, strip the HTML tags, and resave it?  I'm not
> complaining, as this may actually be something I'm capable of
> creating myself.  (I'll make this my first python project. :) )
>
> I'm just making sure I'm not missing anything obvious before I
> start working on it.  It's irritating to spend time on something
> only to find out that it's already been done.

You could probably also do it with procmail + lynx (or w3m) during the
delivery process.

Another possibility is to have the following entries in your ~/.mailcap
file, which convert HTML, doc and rtf to plain text:

text/html; w3m -dump -T text/html; copiousoutput;
application/msword; antiword %s; copiousoutput
application/rtf; rtfreader %s; copiousoutput

As for your python script: I don't think that just stripping everything
matching the following expressions is correct, because they might appear
in non-HTML emails, too:

<.*>
<\/.*>
(perl syntax).

At least, you'd need a list of valid HTML tags, i.e. a regular grammar
for HTML:

<b> | </b> | <i> | </i> | ...
(BNF notation).
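
To make the tag-list idea concrete, here is a minimal Python sketch. It is only an illustration, not a complete filter: the names KNOWN_TAGS, TAG_RE and strip_known_tags are made up for this example, and the tag list is a small assumed subset of HTML rather than the full set. It does not decode MIME parts or HTML entities.

```python
import re

# Assumption: a small illustrative subset of HTML tag names, not a complete list.
KNOWN_TAGS = ("a", "b", "body", "br", "div", "font", "head", "html",
              "i", "p", "span", "table", "td", "tr")

# Matches <tag>, </tag>, <tag attr="...">, and <tag/> for the known names only.
TAG_RE = re.compile(
    r"</?(?:" + "|".join(KNOWN_TAGS) + r")(?:\s[^<>]*)?/?>",
    re.IGNORECASE,
)


def strip_known_tags(text):
    """Remove only tags whose names are in KNOWN_TAGS; leave other <...> text alone."""
    return TAG_RE.sub("", text)


if __name__ == "__main__":
    sample = "<b>Hello</b> Simon <barner@gmx.de>, 1 < 2 > 0"
    print(strip_known_tags(sample))
```

Unlike a greedy <.*>, which would swallow everything from the first "<" to the last ">" on a line, this pattern leaves angle-bracketed mail addresses and stray comparison signs untouched.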

While this is not too hard to implement (and possibly a good project to
learn a new programming language), this would be too much work for
something that can be achieved more easily with existing tools (that is,
for me, personally ;-)

Simon