Date: Tue, 6 Jan 2004 12:30:42 +1030
From: Malcolm Kay <malcolm.kay@internode.on.net>
To: zhangweiwu@realss.com, "Zhang Weiwu" <weiwuzhang@hotmail.com>, questions@freebsd.org
Subject: Re: help me with this sed expression
Message-ID: <200401061230.42038.malcolm.kay@internode.on.net>
In-Reply-To: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
References: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote: > Hello. I've worked an hour to figure out a serial of sed command to pro= cess > some text (without any luck, you kown I'm kinda newbie). I really > appreciate your help. > > The original text file is in this form -- for each line: > one Chinese word then one or two English word seperated by space. > > I wish to change to: > 1) target file: one English word, then a space, then a Chinese word > coorisponding to that English word. > 2) if in the original file one Chinese word has more than one English w= ord > following in the same line, repeat the Chinese word to satisfy 1). > > Define: Chinese word =3D one or more continous bytes of data where each= byte > is greater then 128 in value. (it is true in GB2312 Chinese charset whi= ch > this email is written in.) > Define: English word =3D one or more continous bytes of [a-z]. > > Say, for the original file: > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > =D2=BBa av > =BF=C9=B8=E8=BF=C9=C6=FCaaav > =CE=DE=BF=C9=B7=EE=B8=E6aacm > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > The target file should be: > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > a =D2=BB > av =D2=BB > aaav =BF=C9=B8=E8=BF=C9=C6=FC > aacm =CE=DE=BF=C9=B7=EE=B8=E6 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > I tried to do things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first \(.*\= ) is > too greedy and included the rest [a-z]. Well the greedy part is easily fixed with: s/\([^a-z]*\)\([a-z]*\)/\2 \1/ But this will not work for those lines with 2 english words. The followin= g should: % sed -n -e 's/\([^a-z]*\)\([a-z]*\) .*/\2 \1/p' -e 's/\([^a-z]*\)[a-z]* = \([a-z]*\)/\2 \1/p' original > target Malcolm Kay
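
To try the two-substitution idea on the sample from the question, here is a minimal sketch wrapped in a shell function. The function name swap_words and the LC_ALL=C setting are assumptions added for illustration, not part of the original thread. Since sed runs each command against the current pattern space, the sketch saves the input line in the hold space (h) and restores it (g) before the second substitution, so both substitutions see the unmodified line.

```sh
#!/bin/sh
# swap_words: hypothetical helper name, not from the original thread.
# Reads lines of the form <Chinese word><english>[ <english>] on stdin and
# writes one "<english> <Chinese word>" pair per output line.
swap_words() {
    # LC_ALL=C is an assumption: it makes sed treat the input as plain bytes,
    # so [^a-z] matches every byte of a GB2312 word individually.
    LC_ALL=C sed -n \
        -e 'h' \
        -e 's/^\([^a-z]*\)\([a-z][a-z]*\).*/\2 \1/p' \
        -e 'g' \
        -e 's/^\([^a-z]*\)[a-z][a-z]* \([a-z][a-z]*\)$/\2 \1/p'
    # h  : copy the input line to the hold space
    # s 1: print "<first English word> <Chinese word>"
    # g  : restore the original line from the hold space
    # s 2: if a second English word exists, print "<second word> <Chinese word>"
}
```

It would be used as `swap_words < original > target`. Lines with a single English word produce one output line, because the second substitution requires a space between two English words and otherwise prints nothing.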