From: Mark Ovens
Date: Fri, 01 Jan 1999 19:51:43 +0000
To: Jerry Preeper
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: replace non-ascii characters
Message-Id: <368D274F.7A57D11A@uk.radan.com>
References: <3.0.5.32.19990101042759.008a1a70@crash.cts.com>

Jerry Preeper wrote:
>
> I know this isn't really a freebsd question, but I'm not sure where else to
> ask. I'm trying to write a small shell script that replaces non-ascii
> characters with the html equivalent in a file and just can't seem to figure
> how to identify the non-ascii characters.
>
> for example, I have written a small shell script that takes a file name as
> input to replace them using sed. Here is the script.
>
> #!/bin/sh
> for file in $*
> do
> sed -n "s/\\0x80/\&Ccedil\;/g" ${file}
> sed -n "s/\\0x81/\&uuml\;/g" ${file}
> ..... bunches more
> done
>
> The problem is the search part isn't finding the special character. I have
> tried cutting and pasting the special character directly into the script as
> well, but it doesn't seem to work either.
>
> Does anyone have any ideas on how to accomplish this?
>

I've run into this problem before. As a suggestion, try:

#!/bin/sh
for file in "$@"
do
    # Work from a copy so the original can be overwritten safely
    cp "${file}" "/tmp/${file}"
    awk '{
        gsub("\x80", "\\&Ccedil;")
        gsub("\x81", "\\&uuml;")
        # ...more of the same
        print
    }' < "/tmp/${file}" > "${file}"
    rm "/tmp/${file}"
done

``gsub()'' replaces all occurrences in a line. Note that ``&'' needs
escaping with ``\\'' in the replacement string, because an unescaped ``&''
stands for the matched text. I'm not sure if there is a limit to the length
of line that awk can process. Perl may well provide the best solution though.

HTH, Happy New Year

> Thanks in advance.
>
> Jerry
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message

-- 
Trust the computer industry to shorten Year 2000 to Y2K.
It was this thinking that caused the problem in the first place.

Mark Ovens, CNC Applications Engineer, Radan Computational Ltd
Sheet Metal CAD/CAM Solutions
mailto:marko@uk.radan.com http://www.radan.com
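For anyone adapting the awk approach from this thread, here is a minimal sketch; it is not the original poster's script. It assumes the input bytes are IBM code page 437 (where 0x80 is Ç and 0x81 is ü), that mktemp is available, and it uses a hypothetical helper named to_entities. It writes the bytes as octal escapes (\200, \201) rather than \x80 and \x81, since octal escapes are the form POSIX awk defines and hex escapes may not work in every awk. It also sets LC_ALL=C so that awk treats the input as raw bytes.

```shell
#!/bin/sh
# Sketch only: replace two CP437 bytes with HTML entities, in place.
# Assumptions (not from the thread): CP437 input, mktemp available,
# and to_entities is a hypothetical helper name.

to_entities() {
    # Reads stdin, writes stdout.
    # LC_ALL=C makes awk work on bytes instead of multibyte characters.
    LC_ALL=C awk '{
        gsub("\200", "\\&Ccedil;")   # byte 0x80 becomes &Ccedil;
        gsub("\201", "\\&uuml;")     # byte 0x81 becomes &uuml;
        print
    }'
}

for file in "$@"
do
    tmp=$(mktemp) || exit 1
    # Convert into a temporary file first, and overwrite the
    # original only if awk succeeded.
    if to_entities < "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
done
```

Saved under a name such as fixchars.sh (hypothetical), it would be run as `sh fixchars.sh page1.txt page2.txt`. Further gsub() lines can be added for the remaining byte values. Using mktemp instead of /tmp/${file} avoids trouble when a file name contains a directory component.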