From owner-freebsd-questions@FreeBSD.ORG  Mon Jan  5 07:28:52 2004
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DB3D916A4CE
	for <questions@freebsd.org>; Mon,  5 Jan 2004 07:28:52 -0800 (PST)
Received: from smtp.infracaninophile.co.uk (ns0.infracaninophile.co.uk
	[81.2.69.218])	by mx1.FreeBSD.org (Postfix) with ESMTP id 32DDD43D41
	for <questions@freebsd.org>; Mon,  5 Jan 2004 07:28:50 -0800 (PST)
	(envelope-from m.seaman@infracaninophile.co.uk)
Received: from happy-idiot-talk.infracaninophile.co.uk (localhost [127.0.0.1])
	i05FSixn003035
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 5 Jan 2004 15:28:44 GMT
	(envelope-from matthew@happy-idiot-talk.infracaninophile.co.uk)
Received: (from matthew@localhost)id i05FSfwo003030;
	Mon, 5 Jan 2004 15:28:41 GMT	(envelope-from matthew)
Date: Mon, 5 Jan 2004 15:28:41 +0000
From: Matthew Seaman <m.seaman@infracaninophile.co.uk>
To: zhangweiwu@realss.com
Message-ID: <20040105152841.GA2784@happy-idiot-talk.infracaninophile.co.uk>
Mail-Followup-To: Matthew Seaman <m.seaman@infracaninophile.co.uk>,
	zhangweiwu@realss.com, questions@freebsd.org
References: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
User-Agent: Mutt/1.5.5.1i
X-Spam-Status: No, hits=-4.9 required=5.0 tests=AWL,BAYES_00 autolearn=ham 
	version=2.61
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on 
	happy-idiot-talk.infracaninophile.co.uk
cc: questions@freebsd.org
Subject: Re: help me with this sed expression
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>,
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>,
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 05 Jan 2004 15:28:53 -0000

On Mon, Jan 05, 2004 at 07:49:43PM +0800, Zhang Weiwu wrote:
> Hello. I've spent an hour trying to work out a series of sed commands to
> process some text (without any luck; you know I'm kind of a newbie). I
> really appreciate your help.
>
> The original text file has this form: each line holds one Chinese word,
> then one or two English words separated by a space.
>
> I wish to change it to:
> 1) target file: one English word, then a space, then the Chinese word
> corresponding to that English word.
> 2) if in the original file one Chinese word is followed by more than one
> English word on the same line, repeat the Chinese word to satisfy 1).
>
> Define: Chinese word = one or more contiguous bytes of data where each byte
> is greater than 128 in value. (This is true in the GB2312 Chinese charset
> that this email is written in.)
> Define: English word = one or more contiguous bytes of [a-z].
>
> Say, for the original file:
> ===========
> ??a av
> ????????aaav
> ????????aacm
> ===========
> The target file should be:
> ===========
> a ??
> av ??
> aaav ????????
> aacm ????????
> ===========
>
> I tried things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first \(.*\) is
> too greedy and swallows the [a-z] part as well.
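The greediness is easy to reproduce: in s/\(.*\)\([a-z]*\)/\2 \1/ the .* takes the whole line and [a-z]* then matches the empty string, so the output is just the original line with a space in front. A minimal sketch of a fix, assuming a C locale so that sed treats each GB2312 byte as one character, is to replace .* with [^a-z ]*, which cannot run past the first English letter. The function name swap_first is hypothetical, and this version only handles the first English word on each line.

```shell
# Hypothetical helper: put the first English word in front of the Chinese word.
# LC_ALL=C makes sed treat every byte, including GB2312 bytes above 127,
# as a single character, and keeps [a-z] to plain ASCII letters.
swap_first() {
    # [^a-z ]* stops at the first lowercase letter or space, unlike .*;
    # the trailing .* discards any further English words on the line.
    LC_ALL=C sed 's/^\([^a-z ]*\)\([a-z]*\).*/\2 \1/'
}

# Usage: swap_first < filename
```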

Dunno about sed(1) but you could do the job like this:

    perl -lne '($c, $e) = m/^([\x{81}-\x{ff}]+)([a-z ]+)$/ or next; foreach $x (split " ", $e) { print "$x $c"; }'  filename
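Since the question asked about sed(1), here is a sketch of an equivalent sed script, assuming a POSIX sed and a C locale. Each substitution peels off one English word, writing "word CHN" on one line and keeping the Chinese word at the start of the next; P prints the finished line and D reruns the script on the remainder. When no English word is left, the leftover Chinese word is deleted. The function name split_words is hypothetical.

```shell
# Hypothetical helper: print "english chinese" once for every English word.
# Assumes a POSIX sed; LC_ALL=C so GB2312 bytes are matched one byte at a time.
split_words() {
    LC_ALL=C sed '
# Peel off one English word (CHN = Chinese word):
# "CHN word rest" -> "word CHN", newline, "CHN rest"
s/^\([^a-z ][^a-z ]*\) *\([a-z][a-z]*\)/\2 \1\
\1/
# If that worked, go print the finished line
t found
# No English word left on this line: drop what remains
d
:found
# P prints up to the first newline; D deletes it and reruns the script
# on the remainder without reading a new input line
P
D
'
}

# Usage: split_words < filename
```

The backslash followed by a real newline in the replacement is the portable way to insert a newline, so the script should behave the same in BSD and GNU sed.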

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       26 The Paddocks
                                                      Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey         Marlow
Tel: +44 1628 476614                                  Bucks., SL7 1TH UK