From owner-freebsd-questions Mon Feb 15 16:50:24 1999
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id QAA03864
        for freebsd-questions-outgoing; Mon, 15 Feb 1999 16:50:24 -0800 (PST)
        (envelope-from owner-freebsd-questions@FreeBSD.ORG)
Received: from phoenix.welearn.com.au (phoenix.welearn.com.au [139.130.44.81] (may be forged))
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA03857
        for ; Mon, 15 Feb 1999 16:50:17 -0800 (PST)
        (envelope-from sue@phoenix.welearn.com.au)
Received: (from sue@localhost)
        by phoenix.welearn.com.au (8.9.1/8.9.0) id LAA19789;
        Tue, 16 Feb 1999 11:50:05 +1100 (EST)
Message-ID: <19990216114959.08931@welearn.com.au>
Date: Tue, 16 Feb 1999 11:49:59 +1100
From: Sue Blake
To: Mark Ovens
Cc: questions@FreeBSD.ORG
Subject: Re: cleaning a text file
References: <19990215201056.19929@welearn.com.au> <19990216095232.J2207@lemis.com>
        <19990216103740.60271@welearn.com.au> <19990216002703.A337@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.88e
In-Reply-To: <19990216002703.A337@localhost>; from Mark Ovens on Tue, Feb 16, 1999 at 12:27:03AM +0000
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, Feb 16, 1999 at 12:27:03AM +0000, Mark Ovens wrote:
> On Tue, Feb 16, 1999 at 10:37:40AM +1100, Sue Blake wrote:
> > On Tue, Feb 16, 1999 at 09:52:32AM +1030, Greg Lehey wrote:
> > > On Monday, 15 February 1999 at 1:10:36 -0800, rick hamell wrote:
> > > >
> > > >> Also, this file has some very long lines which would get truncated
> > > >> or unexpectedly wrapped when sent as email. And if there is something
> > > >> strange, I have to read it and guess what it should have been.
> > > >>
> > > >> Maybe someone will come up with something for this particular case.
> > > >> I can't believe there's not some little utility for this that's been
> > > >> hanging around unloved for years.
> > > >
> > > > Oy! Ok... how does Greg reformat all those emails?
> > >
> > > With Emacs. I have a collection of macros which I'm constantly
> > > changing to catch up with new tricks that mailers discover.
> > >
> > > To Sue's original question: it depends on what your text looks like.
> > > tr(1) will remove characters if you ask it to.
> >
> > If I knew which characters were there (so I could ask tr to remove
> > them) I would have already removed them with my text editor.
> >
> > > fmt(1) might be useful for wrapping lines.
> >
> > I don't see the long line lengths as a big problem at this stage, but
> > fmt might be useful later.
> >
> > The problem is that I don't know which funny characters exist in the
> > file, if any. I want to find out what they are, so I can search for
> > them and eyeball them before killing them.
> >
> >
> > Just knowing which characters they are would give me many solutions
> > immediately. There still doesn't seem to be a way to find this out :-(
> >
>
> First you need to identify the offending characters.

Indeed. That is my sole problem.

> Use od(1) or
> hexdump(1) to identify them and then work out a filter.

Well, you hit it on the head. I was being very lazy about this because I
really don't feel like reading and assessing 3 million hex numbers as
they flow across the screen today; it's too hot. Maybe tomorrow.

> Are they all extended ASCII (>127) chars, or are some of them
> control (<32) chars?

If any exist in either of these categories, I want to be informed.
That's all I really need.

> You could possibly use awk(1) as a filter,
> or write a simple C prog using isprint() and isspace().

Not in the short term I couldn't. But I'm surprised that, if it's so easy
to write a program to do this, nobody has done so in the ancient or
recent past. It seems like something that would be wanted frequently, yet
the responses I'm getting suggest that hardly anyone has thought much
about this problem before. I find that hard to believe, but I am slowly
coming round.
--
Regards,
-*Sue*-