Date: Fri, 05 Sep 2008 17:58:22 +0300
From: Giorgos Keramidas <keramida@ceid.upatras.gr>
To: "Mark B." <mkbucc@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: How to delete non-ASCII chars in file
Message-ID: <87vdxa4p2p.fsf@kobe.laptop>
In-Reply-To: <59f4cb420809050714i16ebe30bmd9f325592f05516e@mail.gmail.com> (Mark B.'s message of "Fri, 5 Sep 2008 10:14:08 -0400")
References: <59f4cb420809050714i16ebe30bmd9f325592f05516e@mail.gmail.com>

On Fri, 5 Sep 2008 10:14:08 -0400, "Mark B." <mkbucc@gmail.com> wrote:
> I have a text file that includes some non-ASCII characters
> For example, opening the file in vi shows lines like this:
>
>     'easth_0.541716776378' 0 \xe2\x80\x98dire' 2
>
> Is there a command-line tool I can use to delete these
> characters?  I tried:
>
>     cat f | tr -cd [:print:]
>
> but this removes the newlines.

Hi Mark,

It may be more useful to run the file through sed(1).  The newlines
aren't deleted by sed:

    $ echo '^Fhello^F' | sed -e 's/[^[:print:]]*//' | hd
    00000000  68 65 6c 6c 6f 06 0a                              |hello..|
    00000007
    $

Without the `g' flag only the first match on each line is removed,
which is why the trailing 06 byte is still in the dump above.  Use
`s/[^[:print:]]//g' to delete every occurrence.

> I also tried
>
>     cat f | sed "s/[^:print:]//g"
>
> but it didn't remove the characters.

The matching pattern is wrong.  You need `[^[:print:]]'.  The
character class of printable characters is `[:print:]', and it is only
recognized inside a bracket expression.  You can negate a bracket
expression with `[^xxxx]', where `xxxx' is the character class; hence
the extra pair of brackets in `[^[:print:]]'.  Written as `[^:print:]',
the pattern is an ordinary negated list that matches any character
other than `:', `p', `r', `i', `n' or `t'.
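
Putting the pieces above together, here is a minimal sketch of both
approaches as shell functions.  The function names are made up for
illustration, and forcing LC_ALL=C is an assumption on my part: it makes
`[:print:]' mean printable ASCII only, so each byte of a UTF-8 sequence
such as \xe2\x80\x98 counts as non-printable.  Note that tabs are not in
`[:print:]' either, so they are deleted too.

```shell
#!/bin/sh

# Hypothetical helper: delete every non-printable byte with sed.
# sed works line by line, so the newlines are left alone.
# LC_ALL=C (an assumption) restricts [:print:] to ASCII 0x20-0x7e.
strip_nonprint_sed() {
    LC_ALL=C sed -e 's/[^[:print:]]//g'
}

# Hypothetical helper: the same with tr.  tr sees the newline as just
# another byte, so it has to be added to the set of bytes to keep.
strip_nonprint_tr() {
    LC_ALL=C tr -cd '[:print:]\n'
}

# The octal escapes \342\200\230 are the bytes e2 80 98 from the
# original question.
printf 'easth \342\200\230dire 2\n' | strip_nonprint_sed
# prints: easth dire 2
```

Either function reads standard input, so a file can be cleaned with
something like `strip_nonprint_tr < f > f.clean'.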