From: Sue Blake
Date: Tue, 16 Feb 1999 12:15:00 +1100
To: Dan Nelson
Cc: Greg Lehey, rick hamell, freebsd-questions@FreeBSD.ORG
Subject: Re: cleaning a text file

On Mon, Feb 15, 1999 at 06:57:22PM -0600, Dan Nelson wrote:
> In the last episode (Feb 16), Sue Blake said:
> > The problem is that I don't know which funny characters exist in the
> > file, if any. I want to find out what they are, so I can search for
> > them and eyeball them before killing them.
>
> How about something like
>
>     grep "[^ -~]" file.txt
>
> That will print any lines that have characters outside the standard
> printable ASCII set. Then you can look at the oddball letters and
> figure out appropriate replacement characters.

Hey, yeah, that'd be a great first check, enough to give it a clean bill
of health or deal with a few characters that are easily spotted.
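For anyone trying the grep check, here is a minimal sketch of how it might be run so the odd characters are actually visible on screen. The caret has to sit inside the brackets to negate the set (space through tilde). The file name sample.txt, its contents, and the function name show_odd_lines are made up for illustration. Note that a tab also falls outside the space-to-tilde range, so lines containing tabs would be reported too.

```shell
#!/bin/sh
# Sketch only: sample.txt and show_odd_lines are hypothetical names.

# Build a small test file: one clean line, one line containing a BEL
# (octal 007), and one line ending in a byte above 127 (octal 351).
printf 'plain line\nbell\007here\ncaf\351\n' > sample.txt

show_odd_lines() {
    # LC_ALL=C makes the range " -~" mean bytes 32..126 whatever the locale.
    # -n prefixes each reported line with its line number.
    # cat -v renders control characters as ^X and high bytes as M-x,
    # so they can be eyeballed safely on a terminal.
    LC_ALL=C grep -n '[^ -~]' "$1" | cat -v
}

show_odd_lines sample.txt
```

With the sample file above, the clean line is skipped, the BEL shows up as ^G on line 2, and the high byte shows up as M-i on line 3. If the command prints nothing, the file contains only printable ASCII.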
Don Read sent this one too:

    fold -w1 yourfile.txt |sort |uniq | grep -v "[A-Za-z0-9]"

which seems to do the trick. It's very slow, but it works.

With either or both of these, it's just a matter of finding the character
among what's pulled out, determining its character number, checking its
context in the file and making a decision about substitution, then running
tr or doing a replace with a text editor. For the most common case, where
there is nothing wrong with the file, it's possible to confirm that the
file is OK as is.

Reidar Bratsberg mentioned a utility called pep which might be good, but so
far I haven't been able to randomly press the right buttons to make it
compile. Experiments continue.

-- 
Regards,
-*Sue*-
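The workflow described above (pull out the odd characters, find their character numbers, then substitute with tr) could be sketched roughly as follows. This is not the pipeline from the thread: it swaps fold/sort/uniq for tr and od, on the assumption that deleting the ordinary bytes first is quicker than splitting the whole file into one character per line. The file names odd.txt and clean.txt, the function name count_odd_bytes, and the choice of replacements are all hypothetical. Tabs and newlines are treated as ordinary here.

```shell
#!/bin/sh
# Sketch only: odd.txt, clean.txt and count_odd_bytes are hypothetical names.

# Test file with two BEL bytes (octal 007) and one high byte (octal 351).
printf 'ok\nbell\007one\nbell\007two\ncaf\351\n' > odd.txt

count_odd_bytes() {
    # 1. Delete tab, newline and printable ASCII (octal 040-176),
    #    leaving only the unusual bytes.
    # 2. od prints each remaining byte as a three-digit octal code.
    # 3. Put one code per line, drop the empty line, then count duplicates.
    LC_ALL=C tr -d '\11\12\40-\176' < "$1" |
        od -An -v -t o1 |
        tr -s ' ' '\n' |
        grep . |
        sort |
        uniq -c
}

count_odd_bytes odd.txt

# After checking each byte in context, substitute or delete with tr.
# Here the BEL is dropped and octal 351 is replaced by a plain "e";
# the right replacement depends on what the byte was meant to be.
LC_ALL=C tr -d '\007' < odd.txt | LC_ALL=C tr '\351' 'e' > clean.txt
```

Each output line of count_odd_bytes is a count followed by an octal code, which can be fed straight back to tr as \NNN. An empty result means there is nothing to clean.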