From owner-freebsd-questions@FreeBSD.ORG Tue Feb 8 11:11:26 2005
Return-Path:
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 947C216A4CE
    for ; Tue, 8 Feb 2005 11:11:26 +0000 (GMT)
Received: from male.aldigital.co.uk (male.thebunker.net [213.129.64.13])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 2DF6343D3F
    for ; Tue, 8 Feb 2005 11:11:26 +0000 (GMT)
    (envelope-from matthew@thebunker.net)
Received: from gravitas.thebunker.net (gateway.ash.thebunker.net [213.129.64.4])
    (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits))
    (No client certificate requested)
    by male.aldigital.co.uk (Postfix) with ESMTP id CBF1297750;
    Tue, 8 Feb 2005 11:11:24 +0000 (GMT)
Received: from gravitas.thebunker.net (localhost [127.0.0.1])
    j18BBIR7025300;
    Tue, 8 Feb 2005 11:11:18 GMT
    (envelope-from matthew@gravitas.thebunker.net)
Received: (from matthew@localhost)
    by gravitas.thebunker.net (8.13.1/8.13.1/Submit) id j18BBF1p025299;
    Tue, 8 Feb 2005 11:11:15 GMT
    (envelope-from matthew)
Date: Tue, 8 Feb 2005 11:11:15 +0000
From: Matthew Seaman
To: Anthony Atkielski
Message-ID: <20050208111115.GA75417@gravitas.thebunker.net>
Mail-Followup-To: Matthew Seaman , Anthony Atkielski ,
    freebsd-questions@freebsd.org
References: <1667502496.20050208025619@wanadoo.fr>
    <757352437.20050208034447@wanadoo.fr>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
    protocol="application/pgp-signature"; boundary="cWoXeonUoKmBZSoM"
Content-Disposition: inline
In-Reply-To: <757352437.20050208034447@wanadoo.fr>
User-Agent: Mutt/1.5.7i
cc: freebsd-questions@freebsd.org
Subject: Re: Another grep question
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 08 Feb 2005 11:11:26 -0000

--cWoXeonUoKmBZSoM
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 08, 2005 at 03:44:47AM +0100, Anthony Atkielski wrote:
> Giorgos Keramidas writes:
>
> GK> It may not be related to what you are seeing, but grep(1)
> GK> is locale-aware.  What it considers a "text" character
> GK> depends on the current locale settings.
>
> I tried setting LC_ALL to en_US.UTF-8, en_US.ISO8859-15, and
> en_US.ISO8859-1, with no effect.  The character in question is an
> opening double quotation mark in the Windows character set.  I want to
> find it in my Web pages and replace it by an appropriate HTML escape
> sequence.  I know it's out there, but grep isn't finding it, or I'm not
> telling it how to find the character correctly.

Ah -- well, the beauty of Unix is that if the first tool you think of
doesn't do the job, then the next one probably will.  You can use perl
to match and replace arbitrary characters:

    % perl -pi.bak -e 's/\x93/&ldquo;/g' foo.html

Or you could go for the bulk method and run HTML tidy(1) over the
file, which is usually pretty good at converting any-old HTML into
something that will pass validation:

    (ports: www/tidy)         http://www.w3c.org/People/Raggett/tidy/
    (ports: www/tidy-devel)   http://tidy.sourceforge.net/

    Cheers,

    Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       8 Dane Court Manor
                                                      School Rd
PGP: http://www.infracaninophile.co.uk/pgpkey         Tilmanstone
Tel: +44 1304 617253                                  Kent, CT14 0JL UK

--cWoXeonUoKmBZSoM
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (FreeBSD)

iQCVAwUBQgieU5r7OpndfbmCAQK++AP/SMIzrkTJ1+iETTW7G5meSYVGiHoifn8C
AYboigg5D+1iWzD6mKAQiQ4AZF4sjdIBXrWI1997q5p+SnSb3Ulq3IVM8KQ9Iqts
l1e0qWKMozF4wmuWe40wOMNzFKJ63fveSRFxKpSb0bfuqN8Jqkjx0ApaI1MetG9t
cNeb6yMd+cw=
=NLio
-----END PGP SIGNATURE-----

--cWoXeonUoKmBZSoM--
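
For anyone without perl to hand, the same byte-level substitution in the
reply above can be sketched in Python. This is only an illustration and
not part of the original message: the function name replace_ldquo and
the choice of the named entity &ldquo; as the replacement are both
assumptions. The file is read as raw bytes, so locale settings (which
the thread suspects of confusing grep) do not come into it.

```python
# Byte 0x93 is the opening double quotation mark in the Windows
# character set (Windows-1252), as described in the message above.
CP1252_LDQUO = b"\x93"
# Assumed replacement: the named HTML entity for that character.
HTML_LDQUO = b"&ldquo;"


def replace_ldquo(path, backup_suffix=".bak"):
    """Replace every 0x93 byte in the file at `path`.

    Returns the number of bytes replaced. Hypothetical helper, not a
    drop-in for perl -pi.bak: a backup copy (path + backup_suffix) is
    written only when the file actually contains the byte.
    """
    with open(path, "rb") as fh:
        data = fh.read()

    count = data.count(CP1252_LDQUO)
    if count:
        # Keep the untouched original alongside, like perl's -i.bak.
        with open(path + backup_suffix, "wb") as fh:
            fh.write(data)
        with open(path, "wb") as fh:
            fh.write(data.replace(CP1252_LDQUO, HTML_LDQUO))
    return count
```

Calling replace_ldquo("foo.html") rewrites foo.html in place and leaves
the original in foo.html.bak. A return value of zero is a quick way to
confirm that a file does not contain the byte at all, which is the
"find" half of the original question.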