Date:      Sun, 20 Jul 2008 10:10:11 +0200
From:      Jonathan McKeown <jonathan+freebsd-questions@hst.org.za>
To:        freebsd-questions@freebsd.org
Subject:   Re: How to divide up?
Message-ID:  <200807201010.11568.jonathan%2Bfreebsd-questions@hst.org.za>
In-Reply-To: <20080720063746.GB21826@thought.org>
References:  <20080720002345.GA9173@thought.org> <87mykde2ho.fsf@kobe.laptop> <20080720063746.GB21826@thought.org>

On Sunday 20 July 2008 08:37, Gary Kline wrote:
> On Sun, Jul 20, 2008 at 05:03:15AM +0300, Giorgos Keramidas wrote:
> > On Sun, 20 Jul 2008 03:44:07 +0300, Giorgos Keramidas <keramida@ceid.upatras.gr> wrote:
> > > Now, if you want to merely "hack something quick and dirty", a short
> > > Perl script can probably do regexp substitution similar to
> > >
> > >         #
> > >         # WARNING: THIS HAS NOT BEEN TESTED :P
> > >         #
> > >         my $foo = <STDIN>;
> > >         $foo = s:(<[^>]+>[^<]*</[^>]+>):$1\n:ge;
> > >         print "$foo";
> > >
> > > but you shouldn't trust the output of such a quick hack too much.
> >
> > As I wrote in reply to the personal email, this was untested and a bit
> > wrong in places, but now I've tried something like:
> >
> >   $ echo '<hello>world</hello><hello>next world</hello>' | \
> >   perl -e '$foo = <STDIN>; $foo =~ s:(<[^>]+>[^<]*</[^>]+>):$1\n:g; print "$foo";'
> >
> > and it does seem to sort of work.  The output is:
> >
> >   <hello>world</hello>
> >   <hello>next world</hello>
> >
> > Maybe that's good enough?  They say `the perfect is the enemy of good
> > enough', so if this works for your data set, it's probably ok to use it
> > :-)
> >
> > Have fun,
> > Giorgos
>
> 	Fun?!  Well, but yes, anything that can save me from
> 	hand-editing ~70 files will be a riot ;)

I haven't tried it, but I suspect if the simple approach fails, HTML::Tidy may 
well have an option which would help. It can be installed from CPAN or ports, 
where it is textproc/p5-HTML-Tidy.

Jonathan


