From: Warren Block
To: Oliver Fromme
Cc: freebsd-questions@freebsd.org
Date: Fri, 9 Oct 2009 12:06:29 -0600 (MDT)
Subject: Re: for perl wizards.

On Fri, 9 Oct 2009, Oliver Fromme wrote:

> Warren Block wrote:
> > Oliver Fromme wrote:
> > > Gary Kline wrote:
> > > >
> > > > Whenever I save a wordprocessor file [OOo, say] into a
> > > > text file, I get a slew of hex codes to indicate the char to be
> > > > used.
> > > > I'm looking for a perl one-liner or script to translate
> > > > hex back into ', ", -- [that's a dash], and so forth. Why does
> > > > this fail to translate the hex code to an apostrophe?
> > > >
> > > > perl -pi.bak -e 's/\xe2\x80\x99/'/g'
> > >
> > > You need to escape the inner quote character, of course.
> > > I think sed is better suited for this task than perl.
> >
> > That's twice now people have suggested sed instead of perl. Why? For
> > many uses, perl is a better sed than sed. The regex engine is far more
> > powerful and escapes are much simpler.
>
> Neither powerful regexes nor escapes will help in this case.

Certainly \x will not help in sed; sed doesn't have it.

> A simple basic regex is more than sufficient (in fact this
> isn't even a regex, it's a fixed string). And the escaping
> is a problem of the shell, not perl or sed. And by the way,
> I strongly disagree that perl's escapes are much simpler.
> In my opinion perl has the most complex escaping and quoting
> I have seen in any language so far.

I was thinking of the escapes sed requires that should not be needed. Some of those are shell problems, but many are due to the regex library. More basic things than \x are missing: \t, for instance, or a usable \s instead of picking out spaces or tabs by hand or trying to navigate with | in sed expressions.

> The basic UNIX philosophy is to use the smallest or simplest
> tool that does the job. In this case that's clearly sed.

Since sed doesn't have \x, it would appear that sed does not do the job. Maybe I just don't see it. And in most cases, the external simplicity of a tool matters more to the user than its internals. Put another way, if you have it, and it does a better/easier/faster job, why *not* use it?

> (Not to mention the fact that perl isn't even in FreeBSD's
> base system, so might not be available at all.)

But the OP is using it, so that's clearly not the case here. Nor is it the case in most FreeBSD installations.
It's possible "Mastering Regular Expressions" has influenced my thinking on this.

-Warren Block * Rapid City, South Dakota USA