From: "Dan Langille"
Organization: DVL Software Limited
To: freebsd-chat@freebsd.org
Cc: naddy@mips.inka.de (Christian Weisgerber)
Date: Thu, 11 Apr 2002 08:53:34 -0400
Subject: Re: what are these characters please?
Reply-To: dan@langille.org
Message-Id: <20020411125429.C73703F30@bast.unixathome.org>

naddy@mips.inka.de (Christian Weisgerber) wrote:

> Dan Langille wrote:
>
> > Given that I'm trying to process the cvs-all messages into XML documents
> > (using the perl module XML::Writer which does not do any encoding beyond
> > characters such as >, <, etc), any suggestions as to how to deal with such
> > characters? I've been looking through cpan but I suspect I'm using the
> > wrong search criteria ("encoding"). Any clues?
>
> Well, what encoding do your XML documents use?

It used to be UTF-8. It changed to ISO-8859-1 some months ago, when I first encountered this type of issue (back then it was Lyngbl).

> I guess your basic situation is that you are getting unknown
> characters in an unknown encoding. You then have to manually figure
> out what this is, e.g. you asked here and I'm telling you it's
> character U+00E4. You can now store this in your encoding of choice.
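Christian's advice (identify the character, then store it in your encoding of choice) can be sketched as follows. This is Python rather than the Perl/XML::Writer setup discussed in the thread, the function names are hypothetical, and the sample byte is made up for illustration; it assumes the source text really is ISO-8859-1. Decoding turns byte 0xE4 into U+00E4, which can then be written either as UTF-8 or, for a pure-ASCII XML file, as a numeric character reference.

```python
def to_utf8(raw: bytes) -> bytes:
    """Decode ISO-8859-1 bytes and re-encode the text as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


def to_xml_ascii(raw: bytes) -> str:
    """Decode ISO-8859-1 bytes and replace every non-ASCII character with an
    XML numeric character reference such as &#228;. Markup characters
    (<, >, &) are left alone for the XML writer to escape."""
    text = raw.decode("iso-8859-1")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = b"\xe4"  # hypothetical input: the single byte 0xE4
print(hex(ord(sample.decode("iso-8859-1"))))  # 0xe4, i.e. U+00E4
print(to_utf8(sample))                        # b'\xc3\xa4'
print(to_xml_ascii(sample))                   # &#228;
```

Either output keeps the character intact; which one fits depends on the encoding declared in the XML documents.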
Given that the incoming characters are supposed to be ISO-8859-1 (which is what CVS stores; see Tony's message), I'm quite sure the best thing to do is simply to ignore the non-standard characters (i.e. remove them). What's your view on that approach?

P.S. I caught your message by reading the archives; I wasn't subscribed to -chat at the time, but I am now.

-- 
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/ - practical examples
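For comparison, the removal approach proposed above might look like this, again sketched in Python with hypothetical names. The message does not say exactly which characters count as "non-standard", so the first function assumes it means characters with no printable form in ISO-8859-1 or not allowed in XML 1.0; the second shows the blunter reading of "anything outside ASCII".

```python
import re

# Assumption: "non-standard" means C0 control characters other than tab,
# LF and CR (forbidden in XML 1.0), plus DEL and the C1 controls
# (U+007F-U+009F), which have no printable glyph in ISO-8859-1.
NON_STANDARD = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def strip_non_standard(raw: bytes) -> str:
    """Decode as ISO-8859-1 (every byte maps to some character) and drop
    the characters matched by NON_STANDARD."""
    return NON_STANDARD.sub("", raw.decode("iso-8859-1"))


def strip_non_ascii(raw: bytes) -> str:
    """Keep only 7-bit ASCII, silently dropping every other byte,
    including valid Latin-1 letters such as 0xE4."""
    return raw.decode("ascii", errors="ignore")


print(strip_non_standard(b"a\x93b\xe4"))  # ab followed by U+00E4
print(strip_non_ascii(b"a\xe4b"))         # ab
```

The trade-off is that the ASCII-only variant discards the U+00E4 that Christian identified, so removal loses information that decoding and re-encoding would preserve.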