Date: Tue, 16 Feb 1999 12:15:00 +1100
From: Sue Blake <sue@welearn.com.au>
To: Dan Nelson <dnelson@emsphone.com>
Cc: Greg Lehey <grog@lemis.com>, rick hamell <hamellr@dsinw.com>, freebsd-questions@FreeBSD.ORG
Subject: Re: cleaning a text file
Message-ID: <19990216121500.33635@welearn.com.au>
In-Reply-To: <19990215185722.A21817@dan.emsphone.com>; from Dan Nelson on Mon, Feb 15, 1999 at 06:57:22PM -0600
References: <19990215201056.19929@welearn.com.au> <Pine.BSF.3.91.990215010943.20451F-100000@dsinw.com> <19990216095232.J2207@lemis.com> <19990216103740.60271@welearn.com.au> <19990215185722.A21817@dan.emsphone.com>
On Mon, Feb 15, 1999 at 06:57:22PM -0600, Dan Nelson wrote:
> In the last episode (Feb 16), Sue Blake said:
> > The problem is that I don't know which funny characters exist in the
> > file, if any. I want to find out what they are, so I can search for
> > them and eyeball them before killing them.
>
> How about something like
>
>     grep "[^ -~]" file.txt
>
> That will print any lines that have characters outside the standard
> printable ascii set. Then you can look at the oddball letters and
> figure out appropriate replacement characters.

Hey, yeah, that'd be a great first check, enough to give it a clean
bill of health or deal with a few characters that are easily spotted.

Don Read sent this one too:

    fold -w1 yourfile.txt |sort |uniq | grep -v "[A-Za-z0-9]"

which seems to do the trick. It's very slow, but it works.

With either or both of these, it's just a matter of finding the
character among what's pulled out, determining its character number,
checking its context in the file and making a decision about
substitution, then running tr or doing a replace with a text editor.
For the most common case, where there is nothing wrong with the file,
it's possible to confirm that the file is OK as is.

Reidar Bratsberg mentioned a utility called pep which might be good,
but so far I haven't been able to randomly press the right buttons to
make it compile. Experiments continue.

-- 
Regards,
-*Sue*-

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
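
The find, identify, check, substitute routine described in the message
could be sketched roughly as below. This is only an illustration under
stated assumptions: the file name sample.txt, its contents, the stray
byte (octal 351) and the replacement letter are all invented for the
example, and od is brought in to show the character number although the
thread does not mention it. LC_ALL=C is set so the tools work on single
bytes. Note that tabs also fall outside the space-to-tilde range, so
they would be reported as well.

```shell
#!/bin/sh
# Sketch only: file name, sample text and replacement are made up.
LC_ALL=C; export LC_ALL            # work on bytes, not locale characters

dir=$(mktemp -d)                   # scratch directory for the demo
f="$dir/sample.txt"                # hypothetical file name
printf 'plain line\ncaf\351 au lait\nlast line\n' > "$f"

# 1. Context: show numbered lines holding a byte outside space..tilde.
#    The caret sits inside the brackets, so the class is negated.
grep -n '[^ -~]' "$f"

# 2. Character numbers: drop newlines and printable ASCII, then list
#    each remaining byte once, in octal, with a count in front.
odd=$(tr -d '\n -~' < "$f" | od -An -b | tr -s ' ' '\n' | grep . | sort | uniq -c)
echo "$odd"

# 3. Substitution: once a replacement is decided, tr writes a clean copy.
tr '\351' 'e' < "$f" > "$f.clean"
```

If step 2 prints nothing, the file contains only printable ASCII and
newlines, which covers the common case of confirming that a file is OK
as is.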