From owner-freebsd-questions Fri Feb 19 16:52:45 1999
To: Sue Blake
Cc: Mark Ovens, questions@FreeBSD.ORG
Subject: Re: cleaning a text file
From: Kai.Grossjohann@CS.Uni-Dortmund.DE
Date: 20 Feb 1999 01:38:49 +0100
In-Reply-To: Sue Blake's message of "Tue, 16 Feb 1999 11:49:59 +1100"
Message-ID: <864sohkixy.fsf@slowfox.frob.org>

Sue Blake writes:

> On Tue, Feb 16, 1999 at 12:27:03AM +0000, Mark Ovens wrote:
> >
> > First you need to identify the offending characters.
>
> Indeed. That is my sole problem.

Well, search forward for the following regex:

    [^a-z0-9A-Z_+= \t\r\n-]

If you find a character that's ok, add it to the list.  After all,
there are only 255 characters, and some of them will be bad.  So you
won't have to add characters often.

Or are you saying you're looking at Japanese or Chinese text with
multibyte characters?  Then, you're screwed.

kai
-- 
I like _b_o_t_h kinds of music.
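
The search described above can also be scripted. What follows is only a rough sketch in Python, not anything from the thread: the names BASE_CLASS, build_pattern and find_suspects are made up for illustration, and the character class is the one quoted in the message. It lists every character that falls outside the allowed set, with its line and column, and lets you grow the allowed set as you decide that more characters are ok.

```python
import re

# Allowed set taken from the regex in the message (without the brackets
# and without the trailing hyphen, which build_pattern adds back).
BASE_CLASS = r"a-z0-9A-Z_+= \t\r\n"


def build_pattern(extra_ok=""):
    """Compile a negated class: BASE_CLASS plus any characters in extra_ok.

    extra_ok is a hypothetical parameter for characters you have decided
    are fine. re.escape keeps characters such as ']' or '^' literal, and
    the hyphen goes last so it is a literal hyphen rather than a range.
    """
    return re.compile("[^" + BASE_CLASS + re.escape(extra_ok) + "-]")


def find_suspects(text, extra_ok=""):
    """Return (line_number, column, character) for each suspect, 1-based."""
    pattern = build_pattern(extra_ok)
    hits = []
    for line_no, line in enumerate(text.splitlines(keepends=True), start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.start() + 1, match.group()))
    return hits


if __name__ == "__main__":
    sample = "hello world\nfoo, bar\nabc\x07def\n"
    for line_no, col, ch in find_suspects(sample):
        print(f"line {line_no}, column {col}: {ch!r}")
```

To run this over a real file, one option (an assumption on my part, suitable only for single-byte text) is to read it with open(path, encoding="latin-1"), so that every byte maps to exactly one character. As the message says, multibyte text would need a different approach.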