From owner-freebsd-questions@FreeBSD.ORG Tue Sep 8 19:21:45 2009
Date: Tue, 8 Sep 2009 12:21:38 -0700
From: Gary Kline
To: Jim White
Cc: FreeBSD Mailing List, Mark Stapper, Jon Radel
Subject: Re: how to get rid of a character, 0x80?
Message-ID: <20090908192138.GA19288@thought.org>
In-Reply-To: <9ca8670b0909080602ud2e4c0bq819251db916b6d3b@mail.gmail.com>
References: <20090908045202.GA16556@thought.org> <4AA5F2FB.8080006@mapper.nl> <4AA65182.6020200@radel.com> <9ca8670b0909080602ud2e4c0bq819251db916b6d3b@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
X-Organization: Thought Unlimited. Public service Unix since 1986.

On Tue, Sep 08, 2009 at 08:02:06AM -0500, Jim White wrote:
> On Tue, Sep 8, 2009 at 7:43 AM, Jon Radel wrote:
>
> > Mark Stapper wrote:
> >
> >> besides.. 0x80!=0200
> >> it's 0200 octal which is 128 decimal...
> >> Might be why it doesn't work for you.
> >>
> >
> > Don't mess with his head. ;-)
> >
> > 0200 = 0x80 = 128
> >
> > 200 octal = 80 hex = 128 decimal
> > Yup :-)
> >
> > --Jon Radel
> > jon@radel.com
> >
>
> You may want to check if your char type is signed. If it is (and it just
> happens to be 8 bits wide), (char)128 is a negative value.

I used int ch, not char ch. Also, I just found some throwaway code that
[of course] never gets pitched, and found that for chars >= 128, it's a
wide character.

    while ((ch = getwc(stdin)) != WEOF)
    {
        if (ch == L'\xe2')
        {
            if ((ch1 = getwc(stdin)) == L'\x80')
            {
                printf("'");
                /* check for and swallow last of the trio */
                if ((ch2 = getwc(stdin)) == L'\x90')
                {
                    continue;
                }
            }
        }
        else
        {
            putchar(ch);
        }
    }

I wrote the above to get rid of openoffice TXT that was still cluttered
with the trio of wide characters that output one apostrophe.

Last night I used pdftotext to translate a PDF file; it was cluttered with
a slew of ^L's, which == '\014', and wound up with a greater slew of in
more [less] and vi. Nutshell, the file was fubar'd.

gary

-- 
 Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
    The 5.67a release of Jottings: http://jottings.thought.org/index.php
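
A more defensive take on the filter loop in the message above, offered as a minimal sketch and not as Gary's actual code. It works on raw bytes instead of getwc(), rewrites only the three-byte UTF-8 sequences E2 80 98 and E2 80 99 (the curly single quotes) as an ASCII apostrophe, and copies every other byte through, so nothing is lost when a 0xE2 is followed by something else. The function name clean_text, the buffer interface, and the choice to drop form feeds ('\014', the ^L's that pdftotext emits) are assumptions made for illustration. Using unsigned char sidesteps the signed-char problem raised in the quoted text, since 0x80 then compares as 128 and never as a negative value.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the original message).
 *
 * Copies n bytes from in to out, with two changes:
 *   - the UTF-8 sequences E2 80 98 and E2 80 99 (curly single quotes)
 *     become one ASCII apostrophe;
 *   - form feeds (0x0C, '\014') are dropped (an assumption about what
 *     "getting rid of" the ^L's should mean).
 * Every other byte, including other sequences starting with E2, is copied
 * unchanged. out must have room for at least n bytes.
 * Returns the number of bytes written to out.
 */
size_t clean_text(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0;   /* read position in in  */
    size_t o = 0;   /* write position in out */

    while (i < n) {
        if (i + 2 < n && in[i] == 0xE2 && in[i + 1] == 0x80 &&
            (in[i + 2] == 0x98 || in[i + 2] == 0x99)) {
            out[o++] = '\'';        /* whole trio becomes one apostrophe */
            i += 3;
        } else if (in[i] == '\f') {
            i++;                    /* skip the form feed */
        } else {
            out[o++] = in[i++];     /* pass everything else through */
        }
    }
    return o;
}
```

Given the six bytes 69 74 E2 80 99 73, the function writes the four bytes of "it's" and returns 4. A caller would typically fread() a chunk of stdin into a buffer, run it through clean_text, and fwrite() the result; a sequence split across two chunks is not handled by this sketch.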