Date: Tue, 16 Feb 1999 01:00:15 +0000
From: Mark Ovens
To: Sue Blake
Cc: questions@FreeBSD.ORG
Subject: Re: cleaning a text file
Message-ID: <19990216010015.A190@localhost>
In-Reply-To: <19990216002703.A337@localhost>
References: <19990215201056.19929@welearn.com.au> <19990216095232.J2207@lemis.com> <19990216103740.60271@welearn.com.au> <19990216002703.A337@localhost>

On Tue, Feb 16, 1999 at 12:27:03AM +0000, Mark Ovens wrote:
> On Tue, Feb 16, 1999 at 10:37:40AM +1100, Sue Blake wrote:
> >
> > The problem is that I don't know which funny characters exist in the
> > file, if any. I want to find out what they are, so I can search for
> > them and eyeball them before killing them.
> >
> > Just knowing which characters they are would give me many solutions
> > immediately.
> > There still doesn't seem to be a way to find this out :-(
>
> First you need to identify the offending characters. Use od(1) or
> hexdump(1) to identify them and then work out a filter.
>
> Are they all extended ASCII (>127) chars, or are some of them
> control (<32) chars? You could possibly use awk(1) as a filter,
> or write a simple C prog using isprint() and isspace().
>
> HTH

As soon as I'd sent my previous message I remembered something. If
you have (or can lay your hands on) a copy of The Unix Programming
Environment by Kernighan & Pike you will find, starting on p. 172,
this very problem addressed, complete with several (short) C code
listings which give you the option to print the offending characters
as octal codes or to strip them from the file.

> > Maybe there's a long way... somehow put a line-feed after each character
> > in the file (with sed?) and then sort it and look at the top and bottom
> > of the sorted file.
> >
> > Regards,
> > -*Sue*-

-- 
FreeBSD - The Power To Serve http://www.freebsd.org
My Webpage http://www.users.globalnet.co.uk/~markov
_______________________________________________________________
Mark Ovens, CNC Apps Engineer, Radan Computational Ltd. Bath UK
CAD/CAM solutions for Sheetmetal Working Industry
mailto:marko@uk.radan.com http://www.radan.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message