FreeBSD Mail Archives

Date:      Tue, 15 May 2007 15:34:14 +1000 (EST)
From:      Ian Smith <smithi@nimnet.asn.au>
To:        Gary Kline <kline@tao.thought.org>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: what's the easiest way to de-html-ize files?
Message-ID:  <Pine.BSF.3.96.1070515152444.7949B-100000@gaia.nimnet.asn.au>
In-Reply-To: <20070514210933.1024A16A478@hub.freebsd.org>



On Sat, 12 May 2007 14:34:52 -0700 Gary Kline <kline@tao.thought.org> wrote:
 > On Mon, May 14, 2007 at 12:09:07PM -0700, Chuck Swiger wrote:
 > > On May 12, 2007, at 12:54 PM, Gary Kline wrote:
 > > >This is for those of us who appreciate ASCII or straight
 > > >	ISO_8859-15 rather than marked up files.  I have slapped together
 > > >	a crude C program that does scotch (or *cleanse*) text of
 > > >	<B></B> and so on.   Still... is there some standalone converter
 > > >	that gets rid of markup more elegantly?   Something where I
 > > >	can say
 > > >
 > > >	% cmd file_1.html ... file_N.html and output file_1.text ...
 > > >	file_N.text?
 > > 
 > > Perhaps:
 > > 
 > >   lynx -dump file1.html ... > file.text
 > > 
 > > ...?
 > 
 > 	Hm, maybe I need Bill Campbell's -force_html switch.
 > 
 > 	Yes, seems that way.  Using just -dump got most of them, but
 > 	using the -force_html caught all.  Need to script something to
 > 	reformat, but the worst of it's done!

Also, if using Mozilla (so, I would assume, Firefox) the 'Save Page As'
dialog offers a picklist for 'Files of Type' that includes 'Text Files'.

This does a pretty decent job of producing text from HTML files, and is
quicker than firing up lynx (or links) if you're already viewing a page.
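
For the batch case asked about above (file_1.html ... file_N.html in,
file_1.text ... file_N.text out), a small wrapper around lynx might be
enough. This is only a sketch: the function name dehtml is made up for
illustration, it assumes lynx is installed and on the PATH, and it assumes
the input names end in .html (anything else just gets .text appended).

```shell
#!/bin/sh
# Sketch: convert each HTML file named on the command line to plain text.
# "dehtml" is a hypothetical name; lynx is assumed to be on the PATH.
dehtml() {
    for f in "$@"; do
        # file_1.html -> file_1.text (names without .html get .text appended)
        out="${f%.html}.text"
        # -dump writes the rendered page to stdout; -force_html treats the
        # input as HTML even if lynx would not otherwise guess so.
        lynx -dump -force_html "$f" > "$out" || echo "failed: $f" >&2
    done
}

# Convert whatever files were passed to the script (none means no work).
dehtml "$@"
```

Saved as, say, dehtml.sh, it would be run as "sh dehtml.sh *.html".
Writing one output file per input avoids the single combined file that
redirecting one lynx run into file.text would produce.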

Cheers, Ian


