From: Jonathan McKeown <jonathan+freebsd-questions@hst.org.za>
To: freebsd-questions@freebsd.org
Date: Sun, 20 Jul 2008 10:10:11 +0200
User-Agent: KMail/1.9.4
References: <20080720002345.GA9173@thought.org> <87mykde2ho.fsf@kobe.laptop>
	<20080720063746.GB21826@thought.org>
In-Reply-To: <20080720063746.GB21826@thought.org>
Organization: Health Systems Trust
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200807201010.11568.jonathan+freebsd-questions@hst.org.za>
Subject: Re: How to divide up?

On Sunday 20 July 2008 08:37, Gary Kline wrote:
> On Sun, Jul 20, 2008 at 05:03:15AM +0300, Giorgos Keramidas wrote:
> > On Sun, 20 Jul 2008 03:44:07 +0300, Giorgos Keramidas <keramida@ceid.upatras.gr> wrote:
> > > Now, if you want to merely "hack something quick and dirty", a short
> > > Perl script can probably do regexp substitution similar to
> > >
> > >         #
> > >         # WARNING: THIS HAS NOT BEEN TESTED :P
> > >         #
> > >         my $foo = <STDIN>;
> > >         $foo = s:(<[^>]+>[^<]*</[^>]+>):$1\n:ge;
> > >         print "$foo";
> > >
> > > but you shouldn't trust the output of such a quick hack too much.
> >
> > As I wrote in reply to the personal email, this was untested and a bit
> > wrong in places, but now I've tried something like:
> >
> >   $ echo '<hello>world</hello><hello>next world</hello>' | \
> >   perl -e '$foo = <STDIN>; $foo =~ s:(<[^>]+>[^<]*</[^>]+>):$1\n:g; print "$foo";'
> >
> > and it does seem to sort of work.  The output is:
> >
> >   <hello>world</hello>
> >   <hello>next world</hello>
> >
> > Maybe that's good enough?  They say `the perfect is the enemy of good
> > enough', so if this works for your data set, it's probably ok to use it
> > :-)
> >
> > Have fun,
> > Giorgos
>
> 	Fun?!  Well, but yes, anything that can save me from
> 	hand-editing ~70 files will be a riot ;)

I haven't tried it, but if the simple approach fails, I suspect HTML::Tidy may
well have an option that would help. It can be installed from CPAN or from
ports, where it is textproc/p5-HTML-Tidy.

Jonathan