Date: Fri, 23 May 2008 17:23:24 +0200 (CEST)
From: Oliver Fromme
To: freebsd-questions@FreeBSD.ORG, karel@inetis.com
Subject: Re: Sed, shell and hexadecimal character codes
Message-Id: <200805231523.m4NFNOwO024115@lurza.secnetix.de>
In-Reply-To: <4833CBAC.801@inetis.com>

Karel Miklav wrote:
 > There's a tip in the FreeBSD fortunes database that says:
 >
 > > Want to strip UTF-8 BOM (Byte Order Mark) from given files?
 > >
 > >   sed -e '1s/^\xef\xbb\xbf//' < bomfile > newfile

FreeBSD's sed(1) doesn't support hexadecimal or octal sequences.
GNU sed accepts \xHH as an extension, so you can try the tip with
that (/usr/ports/textproc/gsed). With the base system's sed, though,
that fortune entry is wrong, and I don't know why it exists.

 > I can't make it work, and I can't find any other method to
 > work with hex codes in scripts or on the command line, so
 > I'm kinda depressed :) I help myself with xxd now, but if
 > it is possible to avoid it, I'd like to hear about it.

There is no standard for handling octal and hexadecimal
sequences, unfortunately, so you have to consult the manual
page of each tool to find out what it accepts. For example,
tr(1) supports octal sequences only (no hexadecimal), while
awk(1) supports both. So the above line could be rewritten
with awk:

   awk '{if(NR==1)sub(/^\xef\xbb\xbf/, "");print}' < bomfile > newfile

Basically that's exactly the same instruction as the sed one
above, but awk is a little more verbose:

"1" in sed means that the following command should only affect
the first line. That's what "if(NR==1)" does in awk.

"s/OLD/NEW/" is the replacement command in sed. In awk it
looks like "sub(/OLD/, "NEW")".

Finally, sed prints all resulting lines by default, while awk
has to be told with an explicit "print" command. (awk prints
a line automatically only when a rule has a pattern but no
action at all.)

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Munich registration court, HRA 74606, Management:
secnetix Verwaltungsgesellsch. mbH, Commercial register: Munich
registration court, HRB 125758, Managing directors: Maik Bachmann,
Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

'Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.'
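
For a sed without hexadecimal escapes, one possible workaround is to let
printf(1) generate the bytes and have the shell paste them into the sed
expression. The following is only a minimal sketch under stated
assumptions: a POSIX shell, a printf(1) that understands octal escapes,
and LC_ALL=C so that the tools work on raw bytes. The variable name "bom"
and the output name "newfile.awk" are made up for illustration; "bomfile"
and "newfile" follow the example in the message.

```shell
#!/bin/sh
# Sketch: strip a leading UTF-8 BOM with a sed that lacks \x escapes.

# Work on raw bytes so the three BOM bytes are not decoded as one character.
LC_ALL=C
export LC_ALL

# Build a small test file that starts with the BOM bytes EF BB BF
# (octal 357 273 277), followed by two lines of text.
printf '\357\273\277hello\nworld\n' > bomfile

# printf(1) understands octal escapes, so let it produce the literal bytes.
# "bom" is a hypothetical variable name.
bom=$(printf '\357\273\277')

# The shell expands ${bom} inside the double quotes, so sed sees the raw
# bytes and needs no escape support of its own.
sed -e "1s/^${bom}//" < bomfile > newfile

# The same job with awk, using octal instead of hexadecimal escapes
# in the regular expression.
awk '{ if (NR == 1) sub(/^\357\273\277/, ""); print }' < bomfile > newfile.awk
```

Octal is used here instead of hexadecimal because POSIX specifies octal
escapes for both printf(1) and awk(1), whereas \x is an extension where
it exists at all.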