Date: Thu, 11 Apr 2002 13:26:49 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: dan@langille.org
Cc: chat@freebsd.org
Subject: Re: what are these characters please?
Message-ID: <3CB5F189.3DEA9304@mindspring.com>
References: <20020411113858.E48BB3F30@bast.unixathome.org>
Dan Langille wrote:
> > Most likely, you are going to have to live with it.
>
> I have to find a solution as non-ISO-8859-1 are causing grief when it
> comes to reading in the XML. See below.

[ ... ]

> I'm not at all worried about restoring the original text. I'm going for a
> "ignore what I can't use"-solution.
>
> > Basically, anything that isn't ISO-8859-1 is pretty much lost, since
> > that's what CVS stores.
>
> ISO-8859-1 is fine by me. FWIW, the XML headers include:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> The encoding problem actually occurs later when I try to process the XML
> with XML::Parser :
>
> not well-formed (invalid token) at line 14, column 34, byte 559 at
> /usr/local/lib/perl5/site_perl/5.005/i386-freebsd/XML/Parser.pm line 185
>
> And line 14 is:
>
> [Submitted by: Ville SkyttESC,AdESC(B <ville.skytta@iki.fi>]
>
> I think my goal here is remove all non-ISO-8859-1 characters from the
> incoming cvs-all message. I've been searching newsgroups (comp.lang.perl
> and comp.text.xml) trying to find a simple solution.

An "escape" character *is* a valid ISO-8859-1 character.

> > If you want to get complicated, the email address is actually
> > <ville.skytta@iki.fi>, and anything not inside the "<" ">" is
> > comments. Email addresses aren't allowed to have special
> > characters in them (US ASCII strikes again!).
>
> I agree, it's too complicated for the objective at hand.

The only other option would be to pre-parse for ANSI escape
sequences, and strip them. This basically means eating
everything between the <ESC> and the next character between
0x40 and 0x80 (for the most part; that should do it for what
you have seen so far, unless you hit something like sixels).

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message
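The pre-parse step described above could be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the name `strip_escape_sequences` is hypothetical, and it takes the rule literally (drop the ESC byte and everything up to and including the next byte in the range 0x40-0x7F). Sixels and other multi-part sequences are not handled.

```python
import re

# Assumption: an escape sequence is ESC (0x1B), then any bytes outside
# 0x40-0x7F, then one terminating byte inside 0x40-0x7F.
ESCAPE_SEQUENCE = re.compile(rb"\x1b[^\x40-\x7f]*[\x40-\x7f]")


def strip_escape_sequences(data: bytes) -> bytes:
    """Remove every ESC ... <final byte> run from the input (hypothetical helper)."""
    return ESCAPE_SEQUENCE.sub(b"", data)


# The line quoted in the message, with "ESC" written as the real 0x1B byte.
line = b"[Submitted by:\tVille Skytt\x1b,Ad\x1b(B <ville.skytta@iki.fi>]"
print(strip_escape_sequences(line))
# b'[Submitted by:\tVille Skyttd <ville.skytta@iki.fi>]'
```

Note that the byte sitting between the two sequences survives as a plain "d", so the name comes out as "Skyttd". That fits an "ignore what I can't use" approach but does not recover the original character.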