From owner-freebsd-questions  Fri Jan  1 11:53:19 1999
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id LAA16086
          for freebsd-questions-outgoing; Fri, 1 Jan 1999 11:53:19 -0800 (PST)
          (envelope-from owner-freebsd-questions@FreeBSD.ORG)
Received: from post.mail.demon.net (post-12.mail.demon.net [194.217.242.41])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id LAA16079
          for <freebsd-questions@freebsd.org>; Fri, 1 Jan 1999 11:53:17 -0800 (PST)
          (envelope-from marko@uk.radan.com)
Received: from [158.152.75.22] (helo=uk.radan.com)
	by post.mail.demon.net with smtp (Exim 2.10 #2)
	id 0zwAcu-000317-00; Fri, 1 Jan 1999 19:52:52 +0000
Organisation: Radan Computational Ltd., Bath, UK.
Phone: +44-1225-320320   Fax: +44-1225-320311
Received: from beavis.uk.radan.com (beavis [193.114.228.122]) by uk.radan.com (8.6.10/8.6.10) with SMTP id TAA01817; Fri, 1 Jan 1999 19:52:25 GMT
Received: from uk.radan.com (rasnt-1) by beavis.uk.radan.com (4.1/SMI-4.1)
	id AA04349; Fri, 1 Jan 99 19:52:21 GMT
Message-Id: <368D274F.7A57D11A@uk.radan.com>
Date: Fri, 01 Jan 1999 19:51:43 +0000
From: Mark Ovens <marko@uk.radan.com>
X-Mailer: Mozilla 4.5 [en] (X11; I; FreeBSD 2.2.8-RELEASE i386)
X-Accept-Language: en
Mime-Version: 1.0
To: Jerry Preeper <preeper@cts.com>
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: replace non-ascii characters
References: <3.0.5.32.19990101042759.008a1a70@crash.cts.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Jerry Preeper wrote:
> 
> I know this isn't really a freebsd question, but I'm not sure where else to
> ask.  I'm trying to write a small shell script that replaces non-ascii
> characters with the html equivalent in a file and just can't seem to figure
> how to identify the non-ascii characters.
> 
> for example, I have written a small shell script that takes a file name as
> input to replace them using sed.  Here is the script.
> 
> #!/bin/sh
>   for file in $*
>   do
>     sed -n "s/\\0x80/\&Ccedil\;/g" ${file}
>     sed -n "s/\\0x81/\&uuml\;/g" ${file}
>     ..... bunches more
>   done
> 
> The problem is the search part isn't finding the special character.  I have
> tried cutting and pasting the special character directly into the script as
> well, but it doesn't seem to work either.
> 
> Does anyone have any ideas on how to accomplish.
> 

I've run into this problem before. As a suggestion, try:

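#!/bin/sh
  # Replace high-bit bytes with HTML entities, rewriting each file in place.
  for file in "$@"
  do
        # Work from a copy in /tmp; basename keeps this working for paths.
        tmp=/tmp/`basename "${file}"`.$$
        cp "${file}" "${tmp}" || exit 1
        awk '{gsub("\x80", "\\&Ccedil;"); \
                gsub("\x81", "\\&uuml;"); \

                ...more of the same

                print}' < "${tmp}" > "${file}"
        rm "${tmp}"
  done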

``gsub()'' replaces every occurrence in a line. In the replacement string
an unescaped ``&'' stands for the matched text, so a literal ``&'' has to
be written as ``\\&''. I'm not sure whether there is a limit on the length
of line that awk can process.
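
To make the ``&'' behaviour concrete, here is a minimal sketch rather than a
definitive version. The function names wrap_dash and to_entities are made up
for illustration and are not part of the script above. It assumes the two
byte values from the original question, written here as the octal escapes
\200 and \201 (0x80 and 0x81), since octal is the escape form POSIX
specifies for awk strings. LC_ALL=C is an added assumption so that awk
treats the input as single bytes.

```shell
#!/bin/sh
# Illustrative sketch; the function names are hypothetical.

# An unescaped "&" in the replacement stands for the matched text.
wrap_dash() {
    awk '{ gsub("-", "[&]"); print }'
}

# "\\&" in the awk string reaches gsub() as \& and yields a literal "&".
to_entities() {
    LC_ALL=C awk '{
        gsub("\200", "\\&Ccedil;")
        gsub("\201", "\\&uuml;")
        print
    }'
}

echo 'a-b' | wrap_dash                    # prints a[-]b
printf 'A\200B \201ber\n' | to_entities   # prints A&Ccedil;B &uuml;ber
```

Both functions read stdin and write stdout, so they can be tried on a single
line before being pointed at real files.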

Perl may well provide the best solution though.

HTH, Happy New Year

> Thanks in advance.
> 
> Jerry
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message

-- 
  Trust the computer industry to shorten Year 2000 to Y2K. It
  was this thinking that caused the problem in the first place.

Mark Ovens, CNC Applications Engineer, Radan Computational Ltd
Sheet Metal CAD/CAM Solutions
mailto:marko@uk.radan.com    http://www.radan.com

