From: Oliver Fromme
To: freebsd-questions@FreeBSD.ORG, kline@thought.org
Date: Fri, 9 Oct 2009 12:26:25 +0200 (CEST)
Subject: Re: for perl wizards.

Gary Kline wrote:
 >
 > Whenever I save a word processor file [OOo, say] into a
 > text file, I get a slew of hex codes to indicate the char to be
 > used.
 > I'm looking for a perl one-liner or script to translate
 > hex back into ', ", -- (that's a dash), and so forth. Why does
 > this fail to translate the hex code to an apostrophe?
 >
 > perl -pi.bak -e 's/\xe2\x80\x99/'/g'

The shell treats the inner ' as the end of the single-quoted
string, so perl never sees the complete expression. You need
to escape the inner quote character. That said, I think sed is
better suited for this task than perl.

 > If there are any other tools, I'm interested!

That "hex code" looks like UTF-8: e2 80 99 is the UTF-8
encoding of the right single quotation mark (U+2019). For
conversion between character encodings I recommend recode
from the ports collection (ports/converters/recode).

For example, to convert file.txt from UTF-8 to ISO8859-15
in place:

$ recode utf8..iso8859-15 file.txt

To preserve the previous file contents, do this:

$ recode utf8..iso8859-15 < file.txt > new.txt

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"Python tricks" is a tough one, cuz the language is so clean. E.g.,
C makes an art of confusing pointers with arrays and strings, which
leads to lotsa neat pointer tricks; APL mistakes everything for an
array, leading to neat one-liners; and Perl confuses everything
period, making each line a joyous adventure <wink>.
        -- Tim Peters
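
P.S. The quoting problem described above can be sketched roughly as
follows. This is only an illustration, not a tested recipe for any
particular file: the helper name fix_apostrophes is made up for this
example, and it handles just the one byte sequence from the question
(e2 80 99). It uses sed, with the perl variant shown as a comment.

```shell
#!/bin/sh
# Sketch: replace the UTF-8 right single quotation mark (bytes e2 80 99)
# with a plain ASCII apostrophe. fix_apostrophes is a hypothetical name.

# Build the three-byte sequence with printf (octal escapes), so the
# script itself contains no raw non-ASCII bytes.
rsquo=$(printf '\342\200\231')

fix_apostrophes() {
    # Double quotes let the shell expand $rsquo and allow a literal '
    # in the replacement without terminating the quoted string.
    # LC_ALL=C makes sed treat the input as plain bytes.
    LC_ALL=C sed "s/$rsquo/'/g"
}

# The perl one-liner from the question, with the apostrophe written as
# \x27 so it never collides with the shell's single quotes:
#   perl -pi.bak -e 's/\xe2\x80\x99/\x27/g' file.txt

# Usage: filter stdin to stdout.
printf 'it\342\200\231s\n' | fix_apostrophes    # prints: it's
```

Because the function is a plain filter, the original file stays
untouched if you redirect the output to a new file, in the same way
as the second recode example.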