Date: Tue, 27 May 2008 08:29:03 +0200
From: Karel Miklav <karel@inetis.com>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: delphij@freebsd.org, chinsan <chinsan.tw@gmail.com>, freebsd-questions@FreeBSD.ORG
Subject: Re: Sed, shell and hexadecimal character codes
Message-ID: <483BAA2F.30009@inetis.com>
In-Reply-To: <200805231523.m4NFNOwO024115@lurza.secnetix.de>
References: <200805231523.m4NFNOwO024115@lurza.secnetix.de>
Oliver Fromme wrote:
> Karel Miklav wrote:
> > There's a tip in the FreeBSD fortunes database that says:
> >
> > > Want to strip UTF-8 BOM (Byte Order Mark) from given files?
> > >
> > > sed -e '1s/^\xef\xbb\xbf//' < bomfile > newfile
>
> FreeBSD's sed(1) doesn't support hexadecimal or octal
> sequences. I think even GNU sed doesn't support it, but
> you might try it yourself (/usr/ports/textproc/gsed).
>
> I don't know why that fortunes entry exists. It's wrong.
That's what I thought. Maybe we should replace the recipe with
the awk version Oliver proposed below?
> > I can't make it work, and I can't find any other method to
> > work with hex codes in scripts or on the command line so
> > I'm kinda depressed :) I help myself with xxd now, but if
> > it is possible to avoid it, I'd like to hear about it.
>
> There is no standard for handling octal and hexadecimal
> sequences, unfortunately, so you have to consult the
> manual page to find out. For example, tr(1) supports
> octal sequences only (no hexadecimal), while awk(1)
> supports both. So the above line could be rewritten
> with awk:
>
> awk '{if(NR==1)sub(/^\xef\xbb\xbf/, "");print}' < bomfile > newfile
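
For anyone who would still rather use sed, here is a minimal sketch of one
workaround: let the shell's printf(1) produce the three BOM bytes from octal
escapes and splice them into the sed expression. This is not from the thread
above; strip_bom is just a name made up for illustration, and I'm assuming
a POSIX sh, where printf understands \ooo octal escapes.

```shell
#!/bin/sh
# strip_bom (hypothetical helper name): copy stdin to stdout,
# removing a UTF-8 BOM from the start of the first line if present.
strip_bom() {
    # Octal 357 273 277 is the byte sequence EF BB BF.
    # printf expands the escapes, so sed never has to.
    bom=$(printf '\357\273\277')
    # LC_ALL=C asks sed to treat input and pattern as plain bytes.
    LC_ALL=C sed "1s/^${bom}//"
}

# Demo: feed in a line that starts with a BOM and dump the result,
# so the leading bytes can be inspected by eye.
printf '\357\273\277hello\n' | strip_bom | od -c
```

Used the same way as the recipes above: strip_bom < bomfile > newfile.
Input that has no BOM should pass through unchanged, since the substitution
only applies when line 1 begins with those three bytes.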
