From: Giorgos Keramidas
To: "Mark B."
Cc: freebsd-questions@freebsd.org
Date: Fri, 05 Sep 2008 17:58:22 +0300
Subject: Re: How to delete non-ASCII chars in file

On Fri, 5 Sep 2008 10:14:08 -0400, "Mark B." wrote:
> I have a text file that includes some non-ASCII characters
> For example, opening the file in vi shows lines like this:
>
>   'easth_0.541716776378' 0 \xe2\x80\x98dire' 2
>
> Is there a command-line tool I can use to delete these
> characters?  I tried:
>
>   cat f | tr -cd [:print:]
>
> but this removes the newlines.

Hi Mark,

It may be more useful to run the file through sed(1).  sed works on
one line at a time and does not match the trailing newline, so the
newlines aren't deleted:

    $ echo '^Fhello^F' | sed -e 's/[^[:print:]]//g' | hd
    00000000  68 65 6c 6c 6f 0a                                 |hello.|
    00000006
    $

The `g' flag matters here: without it sed removes only the first
match on each line, and the second ^F (0x06) would survive.

> I also tried
>
>   cat f | sed "s/[^:print:]//g"
>
> but it didn't remove the characters.

The matching pattern is wrong.  You need `[^[:print:]]'.

The character class of printable characters is `[:print:]', and it is
only recognized inside a bracket expression.  You can negate the
pattern with `[^xxxx]' where `xxxx' is the character class; hence the
extra pair of brackets in `[^[:print:]]'.  As written, `[^:print:]' is
an ordinary bracket expression that matches any character other than
`:', `p', `r', `i', `n' or `t'.
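
To apply the corrected pattern to a whole file, something like the
sketch below might do.  The function names are hypothetical, and
setting LC_ALL=C is my assumption: it makes sed and tr work on single
bytes, so a UTF-8 sequence such as \xe2\x80\x98 is treated as three
non-printable bytes instead of one printable character.  The second
function shows how the original tr(1) attempt could keep the newlines.

```shell
# strip_nonprint is a hypothetical helper name, not a standard tool.
# LC_ALL=C (an assumption) forces byte-wise matching, so bytes above
# 0x7f are not considered printable.
strip_nonprint() {
    LC_ALL=C sed -e 's/[^[:print:]]//g' "$@"
}

# Same idea with tr(1): delete everything that is NOT printable or a
# newline.  The set is quoted so the shell cannot expand the brackets.
strip_nonprint_tr() {
    LC_ALL=C tr -cd '[:print:]\n'
}
```

Usage would be `strip_nonprint f > f.clean' or
`strip_nonprint_tr < f > f.clean'.  Note that tabs are not in
`[:print:]', so both variants delete them too.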