From: Oliver Fromme
Message-Id: <200910091829.n99ITRFG031873@lurza.secnetix.de>
To: wblock@wonkity.com (Warren Block)
Date: Fri, 9 Oct 2009 20:29:27 +0200 (CEST)
Cc: freebsd-questions@freebsd.org
Subject: Re: for perl wizards.

Warren Block wrote:
> Oliver Fromme wrote:
> > Warren Block wrote:
> > > Oliver Fromme wrote:
> > > > Gary Kline wrote:
> > > > >
> > > > > Whenever I save a word processor file [OOo, say] into a
> > > > > text file, I get a slew of hex codes to indicate the char to be
> > > > > used.
> > > > > I'm looking for a perl one-liner or script to translate
> > > > > hex back into ', ", -- [that's a dash], and so forth.  Why does
> > > > > this fail to translate the hex code to an apostrophe?
> > > > >
> > > > >   perl -pi.bak -e 's/\xe2\x80\x99/'/g'
> > > >
> > > > You need to escape the inner quote character, of course.
> > > > I think sed is better suited for this task than perl.
> > >
> > > That's twice now people have suggested sed instead of perl.  Why?  For
> > > many uses, perl is a better sed than sed.  The regex engine is far more
> > > powerful and escapes are much simpler.
> >
> > Neither powerful regexes nor escapes will help in this case.
>
> Certainly \x will not help in sed; sed doesn't have it.

Right, that's an annoying flaw in sed (it doesn't even support
the \0 syntax for octal values, which is more standard than \x).
Normally I just type such characters literally, which sed
accepts without trouble (it is 8-bit clean).

However, in this particular case I really recommend using
the "recode" tool (ports/converters/recode) to convert from
UTF-8 to some other encoding.  Much easier, and more correct.

E2-80-99 (Unicode U+2019) isn't even a real apostrophe;
it's a right single quotation mark.  An apostrophe would be
ASCII 0x27.

Maybe the OP should configure his software not to save the
file with UTF-8 encoding in the first place.  I'm not an OOo
user, so I can't tell how to do that.  But obviously the OP
doesn't want the file to be stored as UTF-8.

> It's possible "Mastering Regular Expressions" has influenced my thinking
> on this.

This isn't about regular expressions at all.  This is about
replacing fixed strings.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch.
mbH, Handelsregister: Registergericht München, HRB 125758,
Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"One of the main causes of the fall of the Roman Empire was that,
lacking zero, they had no way to indicate successful termination
of their C programs."
        -- Robert Firth