Date: Thu, 11 Apr 2002 08:53:34 -0400
From: "Dan Langille" <dan@langille.org>
To: freebsd-chat@freebsd.org
Cc: naddy@mips.inka.de (Christian Weisgerber)
Subject: Re: what are these characters please?
Message-ID: <20020411125429.C73703F30@bast.unixathome.org>
References: <a93ugk$155s$1@kemoauc.mips.inka.de>
naddy@mips.inka.de (Christian Weisgerber) wrote:

> Dan Langille <dan@langille.org> wrote:
>
> > Given that I'm trying to process the cvs-all messages into XML documents
> > (using the perl module XML::Writer which does not do any encoding beyond
> > characters such as >, <, etc), any suggestions as to how to deal with such
> > characters?  I've been looking through cpan but I suspect I'm using the
> > wrong search criteria ("encoding").  Any clues?
>
> Well what encoding do your XML documents use?

It was UTF-8.  I changed it to ISO-8859-1 some months ago, when I first
encountered this type of issue (back then it was Lyngbøl).

> I guess your basic situation is that you are getting unknown
> characters in an unknown encoding.  You then have to manually figure
> out what this is, e.g. you asked here and I'm telling you it's
> character U+00E4.  You can now store this in your encoding of choice.

Given that the incoming characters are supposed to be ISO-8859-1 (which
is what CVS stores; see Tony's message), I'm quite sure the best thing
to do is just ignore the non-standard characters (i.e. remove them).
What's your view on that approach?

P.S. I caught your message by reading the archives; I wasn't subscribed
to -chat at the time, but I am now.

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/ - practical examples
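
The removal approach proposed above could look roughly like the following. This is only a minimal sketch, written in Python rather than the Perl/XML::Writer setup mentioned in the message. It assumes that "non-standard" means the control ranges of ISO-8859-1 (bytes 0x00-0x1F except tab, LF and CR, plus 0x7F-0x9F), which is a guess at the intent and not something the thread states. The function names clean_latin1 and to_xml_text are hypothetical.

```python
from xml.sax.saxutils import escape


def clean_latin1(raw: bytes) -> str:
    """Decode bytes as ISO-8859-1 and drop control characters.

    Assumption: "non-standard" characters are the C0 controls
    (other than tab, LF, CR), DEL, and the C1 range 0x80-0x9F.
    """
    # Decoding as ISO-8859-1 cannot fail: every byte maps to one code point.
    text = raw.decode("iso-8859-1")
    kept = []
    for ch in text:
        cp = ord(ch)
        if ch in "\t\n\r":
            kept.append(ch)          # whitespace that XML 1.0 allows
        elif cp < 0x20 or 0x7F <= cp <= 0x9F:
            continue                 # control character: remove it
        else:
            kept.append(ch)          # printable ASCII or Latin-1 letter
    return "".join(kept)


def to_xml_text(raw: bytes) -> bytes:
    """Return XML-escaped character data encoded as ISO-8859-1."""
    # escape() replaces &, < and > with entity references.
    return escape(clean_latin1(raw)).encode("iso-8859-1")
```

Under this assumption, legitimate ISO-8859-1 letters such as U+00E4 and U+00F8 pass through unchanged and only control bytes are dropped. The output document would still need an XML declaration naming ISO-8859-1 as its encoding.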