Date: Mon, 5 Jan 2004 15:28:41 +0000
From: Matthew Seaman <m.seaman@infracaninophile.co.uk>
To: zhangweiwu@realss.com
Cc: questions@freebsd.org
Subject: Re: help me with this sed expression
Message-ID: <20040105152841.GA2784@happy-idiot-talk.infracaninophile.co.uk>
In-Reply-To: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
References: <Law11-F31WerOc0Ne0P00016107@hotmail.com>

On Mon, Jan 05, 2004 at 07:49:43PM +0800, Zhang Weiwu wrote:
> Hello. I've worked for an hour trying to figure out a series of sed
> commands to process some text (without any luck; you know I'm kind of a
> newbie). I would really appreciate your help.
>
> The original text file is in this form -- for each line:
> one Chinese word, then one or two English words separated by a space.
>
> I wish to change it to:
> 1) target file: one English word, then a space, then the Chinese word
> corresponding to that English word.
> 2) if in the original file one Chinese word has more than one English word
> following on the same line, repeat the Chinese word to satisfy 1).
>
> Define: Chinese word = one or more contiguous bytes of data where each byte
> is greater than 128 in value. (This is true in the GB2312 Chinese charset,
> which this email is written in.)
> Define: English word = one or more contiguous bytes of [a-z].
>
> Say, for the original file:
> ===========
> ??a av
> ????????aaav
> ????????aacm
> ===========
> The target file should be:
> ===========
> a ??
> av ??
> aaav ????????
> aacm ????????
> ===========
>
> I tried to do things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first \(.*\) is
> too greedy and included the rest [a-z].

Dunno about sed(1) but you could do the job like this:

    perl -ne '($c, $e) = m/^([\x{81}-\x{ff}]+)([a-z ]+)$/; foreach $x (split " ", $e) { print "$x $c\n"; }' filename

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                 26 The Paddocks
                                                Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey   Marlow
Tel: +44 1628 476614                            Bucks., SL7 1TH UK
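
For the sed(1) route the question asks about, here is a rough sketch (not part of the original reply). The greedy \(.*\) can be avoided by matching the Chinese word with [^a-z ] rather than ".", so the first group cannot swallow the letters that follow it. This assumes the C locale (so each byte above 128 is treated as one character), at most two English words per line as stated in the question, and a sed that accepts backslash-newline in the replacement text. The function name swap_words is made up for illustration.

```shell
# Hypothetical helper: reads the word list from the named files (or stdin)
# and writes "English Chinese" pairs, one per line.
swap_words() {
    # LC_ALL=C is an assumption: it makes sed treat GB2312 data as plain bytes.
    LC_ALL=C sed '
# Two English words: emit two lines, repeating the Chinese word.
s/^\([^a-z ][^a-z ]*\) *\([a-z][a-z]*\) \([a-z][a-z]*\)$/\2 \1\
\3 \1/
# If that substitution fired, skip the one-word rule below.
t
# One English word: swap it in front of the Chinese word.
s/^\([^a-z ][^a-z ]*\) *\([a-z][a-z]*\)$/\2 \1/
' "$@"
}

# Sample input built from octal escapes (arbitrary bytes above 128).
printf '\326\320 a av\n\316\304\274\376 aaav\n' | swap_words
```

Lines that match neither pattern are passed through unchanged, so malformed input stays visible in the output instead of disappearing.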