Date:      Tue, 6 Jan 2004 12:30:42 +1030
From:      Malcolm Kay <malcolm.kay@internode.on.net>
To:        zhangweiwu@realss.com, "Zhang Weiwu" <weiwuzhang@hotmail.com>, questions@freebsd.org
Subject:   Re: help me with this sed expression
Message-ID:  <200401061230.42038.malcolm.kay@internode.on.net>
In-Reply-To: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
References:  <Law11-F31WerOc0Ne0P00016107@hotmail.com>

On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> Hello. I've worked for an hour trying to figure out a series of sed commands
> to process some text (without any luck; you know I'm kind of a newbie). I
> would really appreciate your help.
>
> The original text file is in this form -- for each line:
> one Chinese word, then one or two English words separated by a space.
>
> I wish to change to:
> 1) target file: one English word, then a space, then the Chinese word
> corresponding to that English word.
> 2) if in the original file one Chinese word has more than one English word
> following it on the same line, repeat the Chinese word to satisfy 1).
>
> Define: Chinese word = one or more contiguous bytes of data where each byte
> is greater than 128 in value. (This is true in the GB2312 Chinese charset,
> which this email is written in.)
> Define: English word = one or more contiguous bytes of [a-z].
>
> Say, for the original file:
> ===========
> 一a av
> 可歌可泣aaav
> 无可奉告aacm
> ===========
> The target file should be:
> ===========
> a 一
> av 一
> aaav 可歌可泣
> aacm 无可奉告
> ===========
>
> I tried things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first \(.*\) is
> too greedy and swallows the trailing [a-z] letters as well.

Well, the greedy part is easily fixed with:
  s/\([^a-z]*\)\([a-z]*\)/\2 \1/

But this will not handle lines with two English words. The following should:
% sed -n -e 's/\([^a-z]*\)\([a-z]*\) .*/\2 \1/p' -e 's/\([^a-z]*\)[a-z]* \([a-z]*\)/\2 \1/p' original > target
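
One point worth checking with two -e expressions: each s command works on
whatever the previous one left in the pattern space, not on the original
line, and both expressions above require a space after the first English
word, so a line with a single English word may not be printed. A minimal
sketch of a variant that saves the line with h and restores it with g
between the two substitutions follows. The function name swap_words and the
use of LC_ALL=C (so that each byte is treated as one character) are
assumptions for illustration, not part of the original command.

```shell
# swap_words: read "CHINESE english [english]" lines on stdin and print
# one "english CHINESE" line per English word.
# Hypothetical helper; LC_ALL=C is an assumption for byte-wise matching.
swap_words() {
    # h saves the untouched line in the hold space.
    # The first s prints the first English word followed by the Chinese word.
    # g restores the original line before the second s runs.
    # The second s matches only when a second English word is present.
    LC_ALL=C sed -n \
        -e 'h' \
        -e 's/^\([^a-z ]*\) *\([a-z][a-z]*\).*/\2 \1/p' \
        -e 'g' \
        -e 's/^\([^a-z ]*\) *[a-z][a-z]*  *\([a-z][a-z]*\).*/\2 \1/p'
}

# Sample lines taken from the question.
printf '%s\n' '一a av' '可歌可泣aaav' '无可奉告aacm' | swap_words
```

On the sample lines from the question this should produce the four target
lines shown there. It can be applied to files with
swap_words < original > target.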

Malcolm Kay

