Date:      Tue, 8 Sep 2009 12:21:38 -0700
From:      Gary Kline <kline@thought.org>
To:        Jim White <spamchannel@gmail.com>
Cc:        FreeBSD Mailing List <freebsd-questions@freebsd.org>, Mark Stapper <stark@mapper.nl>, Jon Radel <jon@radel.com>
Subject:   Re: how to get rid of a character, 0x80?
Message-ID:  <20090908192138.GA19288@thought.org>
In-Reply-To: <9ca8670b0909080602ud2e4c0bq819251db916b6d3b@mail.gmail.com>
References:  <20090908045202.GA16556@thought.org> <4AA5F2FB.8080006@mapper.nl> <4AA65182.6020200@radel.com> <9ca8670b0909080602ud2e4c0bq819251db916b6d3b@mail.gmail.com>

On Tue, Sep 08, 2009 at 08:02:06AM -0500, Jim White wrote:
> On Tue, Sep 8, 2009 at 7:43 AM, Jon Radel <jon@radel.com> wrote:
> 
> > Mark Stapper wrote:
> >
> >> besides.. 0x80!=0200
> >> it's 0200 octal which is 128 decimal...
> >> Might be why it doesn't work for you.
> >>
> >
> > Don't mess with his head.  ;-)
> >
> > 0200 = 0x80 = 128
> >
> > 200 octal = 80 hex = 128 decimal
> >

	Yup :-)
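
	A minimal sketch of the equivalence quoted above, in C. The function
	name format_bases is made up for illustration; it only relies on the
	standard printf conversions %o, %x and %u.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes one value in octal, hex and decimal into buf.
   format_bases is a hypothetical helper, not from the thread. */
void format_bases(unsigned v, char *buf, size_t n)
{
    snprintf(buf, n, "octal %o = hex %x = decimal %u", v, v, v);
}

/* Returns 1 when the three spellings denote the same integer.
   A leading 0 makes a C constant octal, a leading 0x makes it hex. */
int same_value(void)
{
    return 0200 == 0x80 && 0x80 == 128;
}
```

	Passing 0200, 0x80 or 128 to format_bases produces the same string,
	since all three are the same integer constant.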
> >
> > --Jon Radel
> > jon@radel.com
> >
> 
> You may want to check if your char type is signed.  If it is (and it just
> happens to be 8 bits wide), (char)128 is a negative value.


	I used 

	int ch, not char ch;
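
	A small sketch of why the signedness point matters, assuming 8-bit
	chars. The helper names are hypothetical. Whether plain char is
	signed is implementation-defined, so the first helper can return a
	negative number for byte 128 on some platforms, while int or
	unsigned char keeps the value in 0..255.

```c
#include <limits.h>

/* Reports whether plain char is signed on this platform. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Round trip through plain char, as `char ch = byte;` would do.
   For byte > CHAR_MAX the result is implementation-defined when char
   is signed; on common two's-complement systems 128 becomes -128. */
int through_plain_char(int byte)
{
    char c = (char)byte;
    return c;
}

/* Round trip through unsigned char: always stays in 0..255. */
int through_unsigned_char(int byte)
{
    unsigned char c = (unsigned char)byte;
    return c;
}
```

	With a signed char, a test such as ch == 128 can never succeed after
	the first round trip, which is one reason getc-style functions
	return int.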

	Also, just found some throwaway code that [of course] never gets pitched, and
	found that for chars >= 128, it's a wide character.



        while ((ch = getwc(stdin)) != WEOF)
        {
                if (ch == L'\xe2')
                {
                        if ((ch1 = getwc(stdin)) == L'\x80')
                        {
                                printf("'");

                                /* check for and swallow last of the trio */
                                if ((ch2 = getwc(stdin)) == L'\x90')
                                {
                                        continue;
                                }
                        }
                }
                else
                {
                        putchar(ch);
                }
        }


	I wrote the above to clean up OpenOffice TXT output that was still cluttered with
	the trio of wide characters that renders as a single apostrophe.
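
	A byte-oriented sketch of the same idea, under two assumptions that
	are not confirmed in the thread: that the file is UTF-8, and that
	the character being replaced is U+2019 (right single quotation
	mark), whose UTF-8 encoding is E2 80 99. The loop above tests 0x90
	as the third byte, which would be U+2010 (hyphen) instead. The
	function name is hypothetical. Unlike the loop above, this variant
	passes through any bytes that do not complete the trio rather than
	dropping them.

```c
#include <stddef.h>

/* Copies n bytes from in to out, replacing each UTF-8 sequence
   E2 80 99 (U+2019, assumed to be the "curly" apostrophe) with an
   ASCII apostrophe. out must have room for at least n bytes.
   Returns the number of bytes written. */
size_t replace_curly_apostrophe(const unsigned char *in, size_t n,
                                unsigned char *out)
{
    size_t i = 0, w = 0;

    while (i < n) {
        if (i + 2 < n && in[i] == 0xE2 && in[i + 1] == 0x80 &&
            in[i + 2] == 0x99) {
            out[w++] = '\'';    /* one ASCII byte replaces the trio */
            i += 3;
        } else {
            out[w++] = in[i++]; /* everything else is copied unchanged */
        }
    }
    return w;
}
```

	Working on unsigned char bytes sidesteps both the signed-char issue
	and any dependence on the current locale.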

	Last night I used pdftotext to convert a PDF file; the output was cluttered with a
	slew of ^L's, which == '\014', and I wound up with an even greater slew of <x80>
	showing in more [less] and vi.  In a nutshell, the file was fubar'd.
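
	A rough sketch of stripping both kinds of clutter from a buffer.
	The function name is hypothetical, and the approach assumes the
	0x80 bytes really are stray: if the file is UTF-8, 0x80 can be a
	legitimate continuation byte (as in E2 80 99), and removing it
	blindly would corrupt those characters.

```c
#include <stddef.h>

/* Removes form feeds (^L, '\f', octal 014) and bytes equal to 0x80
   from buf in place. Returns the new length. Assumes any 0x80 byte
   is junk, which is NOT true for well-formed UTF-8 text. */
size_t strip_ff_and_0x80(unsigned char *buf, size_t n)
{
    size_t w = 0;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\f' || buf[i] == 0x80)
            continue;           /* skip the unwanted byte */
        buf[w++] = buf[i];      /* keep everything else, compacting */
    }
    return w;
}
```

	Looking at a hex dump of the file first would show whether the 0x80
	bytes stand alone or sit inside multi-byte sequences.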

	gary





-- 
 Gary Kline  kline@thought.org  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
    The 5.67a release of Jottings: http://jottings.thought.org/index.php



