Date:      Fri, 05 Sep 2008 17:58:22 +0300
From:      Giorgos Keramidas <keramida@ceid.upatras.gr>
To:        "Mark B." <mkbucc@gmail.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: How to delete non-ASCII chars in file
Message-ID:  <87vdxa4p2p.fsf@kobe.laptop>
In-Reply-To: <59f4cb420809050714i16ebe30bmd9f325592f05516e@mail.gmail.com> (Mark B.'s message of "Fri, 5 Sep 2008 10:14:08 -0400")
References:  <59f4cb420809050714i16ebe30bmd9f325592f05516e@mail.gmail.com>

On Fri, 5 Sep 2008 10:14:08 -0400, "Mark B." <mkbucc@gmail.com> wrote:
> I have a text file that includes some non-ASCII characters
> For example, opening the file in vi shows lines like this:
>
>  'easth_0.541716776378'      0   \xe2\x80\x98dire'       2
>
> Is there a command-line tool I can use to delete these
> characters?  I tried:
>
>     cat f | tr -cd [:print:]
>
> but this removes the newlines.

Hi Mark,

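It may be more useful to run the file through sed(1).  sed works on one
line at a time, so the newlines aren't deleted.  Note the `g' flag, which
makes the substitution apply to every match on the line and not only the
first:

$ echo '^Fhello^F' | sed -e 's/[^[:print:]]*//g' | hd
00000000  68 65 6c 6c 6f 0a                                 |hello.|
00000006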
$
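
The tr(1) attempt can probably be kept as well, if the newline is added to
the set of characters to preserve.  The following is only a sketch:
`keep_printable' is a made-up helper name, and LC_ALL=C is an assumption so
that every byte above 0x7f counts as non-printable.  The set is quoted so
the shell cannot expand it as a filename pattern.

```sh
# keep_printable: hypothetical helper, not part of any standard tool.
# Deletes (-d) every byte that is NOT (-c) printable or a newline.
# LC_ALL=C is assumed so that multi-byte sequences such as
# \xe2\x80\x98 are treated as three non-printable bytes.
keep_printable() {
    LC_ALL=C tr -cd '[:print:]\n'
}

# Example: the three bytes between `a' and `b' are dropped,
# the line breaks are kept.
printf 'a\342\200\230b\nc\n' | keep_printable
```

Tabs are not in `[:print:]' either, so add `\t' to the set if the file
uses them as column separators.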

> I also tried
>
>    cat f | sed "s/[^:print:]//g"
>
> but it didn't remove the characters.

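The matching pattern is wrong.  You need `[^[:print:]]'.  The character
class of printable characters is `[:print:]', and it is only recognized
inside a bracket expression.  Written as `[^:print:]', the pattern matches
any single character other than `:', `p', `r', `i', `n' and `t'.  You
negate a class with `[^xxxx]', where `xxxx' is the character class; hence
the extra pair of brackets in `[^[:print:]]'.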
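
Putting the corrected pattern together with the `g' flag from your own
attempt gives something like the sketch below.  `strip_nonprint' is a
hypothetical name, and LC_ALL=C is an assumption: in a UTF-8 locale the
bytes \xe2\x80\x98 may decode to a printable quotation mark and be left
alone.

```sh
# strip_nonprint: hypothetical helper, shown for illustration only.
# Removes every non-printable byte on each line; sed leaves the
# line-terminating newlines in place.  Reads stdin or the named files.
strip_nonprint() {
    LC_ALL=C sed -e 's/[^[:print:]]//g' "$@"
}

# Example with the byte sequence from the original question.
printf "0   \342\200\230dire'       2\n" | strip_nonprint
```

With a file it would be used as `strip_nonprint f > f.clean'.  As with
any `[:print:]' filter, tab characters are removed too.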
