Date: Tue, 6 Jan 2004 14:27:24 +1100
From: Gautam Gopalakrishnan <ggop@madras.dyndns.org>
To: Malcolm Kay <malcolm.kay@internode.on.net>
Cc: zhangweiwu@realss.com
Subject: Re: help me with this sed expression
Message-ID: <20040106032724.GA8616@madras.dyndns.org>
In-Reply-To: <200401061345.04575.malcolm.kay@internode.on.net>
References: <Law11-F31WerOc0Ne0P00016107@hotmail.com> <200401061230.42038.malcolm.kay@internode.on.net> <20040106022052.GA8122@madras.dyndns.org> <200401061345.04575.malcolm.kay@internode.on.net>

On Tue, Jan 06, 2004 at 01:45:04PM +1030, Malcolm Kay wrote:
> On Tue, 6 Jan 2004 12:50, Gautam Gopalakrishnan wrote:
> > On Tue, Jan 06, 2004 at 12:30:42PM +1030, Malcolm Kay wrote:
> > > On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> > > > Hello. I've worked for an hour trying to figure out a series of sed
> > > > commands to process some text (without any luck; you know I'm kind
> > > > of a newbie). I would really appreciate your help.
> > > >
> > > > The original text file is in this form -- for each line:
> > > > one Chinese word then one or two English words separated by a space.
> > > >
> > > > I tried to do things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first
> > > > \(.*\) is too greedy and also consumed the [a-z] part.
> > >
> > > Well the greedy part is easily fixed with:
> > >   s/\([^a-z]*\)\([a-z]*\)/\2 \1/
> > >
> > > But this will not work for those lines with 2 English words. The
> > > following should:
> > >   % sed -n -e 's/\([^a-z]*\)\([a-z]*\) .*/\2 \1/p' \
> > >         -e 's/\([^a-z]*\)[a-z]* \([a-z]*\)/\2 \1/p' original > target
> >
> > I think awk is easier:
> >
> > awk '{print $2 " " $3 " " $1}' original | tr -s ' ' > target
>
> I'm not really very familiar with awk, but I must say this
> is a much simpler and rather magical solution.
>
> How does awk know which part of the original line goes into $1, $2 and $3?
> (You will notice there is no space between the Chinese and English words.)
>

It does not. I did not read the earlier mail properly. But there is
an easier way than all those regexes: prefix the first a-z char with a
space and use awk.

sed -e 's/\([a-z]\)/ \1/' | awk '{print $2" "$1} NF==3 {print $3" "$1}'

Gautam