From owner-freebsd-questions@FreeBSD.ORG  Mon Jan  5 19:28:56 2004
Date: Tue, 6 Jan 2004 14:27:24 +1100
From: Gautam Gopalakrishnan <ggop@madras.dyndns.org>
To: Malcolm Kay <malcolm.kay@internode.on.net>
Message-ID: <20040106032724.GA8616@madras.dyndns.org>
References: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
	<200401061230.42038.malcolm.kay@internode.on.net>
	<20040106022052.GA8122@madras.dyndns.org>
	<200401061345.04575.malcolm.kay@internode.on.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200401061345.04575.malcolm.kay@internode.on.net>
User-Agent: Mutt/1.4.1i
cc: Zhang Weiwu <weiwuzhang@hotmail.com>
cc: questions@freebsd.org
cc: zhangweiwu@realss.com
Subject: Re: help me with this sed expression
List-Id: User questions <freebsd-questions.freebsd.org>

On Tue, Jan 06, 2004 at 01:45:04PM +1030, Malcolm Kay wrote:
> On Tue, 6 Jan 2004 12:50, Gautam Gopalakrishnan wrote:
> > On Tue, Jan 06, 2004 at 12:30:42PM +1030, Malcolm Kay wrote:
> > > On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> > > > Hello. I've worked an hour to figure out a series of sed commands to
> > > > process some text (without any luck, you know I'm kinda newbie). I
> > > > really appreciate your help.
> > > >
> > > > The original text file is in this form -- for each line:
> > > > one Chinese word, then one or two English words separated by a space.
> > > >
> > > > I tried to do things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first
> > > > \(.*\) is too greedy and included the rest [a-z].
> > >
> > > Well the greedy part is easily fixed with:
> > >   s/\([^a-z]*\)\([a-z]*\)/\2 \1/
> > >
> > > But this will not work for those lines with 2 English words. The
> > > following should:
> > >   % sed -n -e 's/\([^a-z]*\)\([a-z]*\) .*/\2 \1/p' \
> > >       -e 's/\([^a-z]*\)[a-z]* \([a-z]*\)/\2 \1/p' original > target
> >
> > I think awk is easier:
> >
> > awk '{print $2 " " $3 " " $1}' original | tr -s ' ' > target
> 
> I'm not really very familiar with awk, but I must say this
> is a much simpler and rather magical solution.
> 
> How does awk know which part of the original line goes into $1, $2 and $3?
> (You will notice there is no space between the Chinese and English words).
> 

It does not: awk splits fields on whitespace only, and I did not read
the earlier mail properly. But there is an easier way than all those
regexes: prefix the first a-z char with a space and then use awk.

sed -e 's/\([a-z]\)/ \1/' | awk '{print $2" "$1} NF==3 {print $3" "$1}'

Gautam