Date: Thu, 10 Jun 2004 12:17:52 +0200
From: Palle Girgensohn <girgen@pingpong.net>
To: Palle Girgensohn <girgen@pingpong.net>, Greg Lewis <glewis@eyesbeyond.com>
Cc: freebsd-java@freebsd.org
Subject: Re: problems with java.util.zip and diacritical characters in file names
Message-ID: <84B75B389C49D6FF3ED95F29@rambutan.pingpong.net>
In-Reply-To: <D17F6CD704077FEABC1AE296@palle.girgensohn.se>
References: <5C024439534B293EAFE34A55@rambutan.pingpong.net> <20040609175626.GB83936@misty.eyesbeyond.com> <D17F6CD704077FEABC1AE296@palle.girgensohn.se>
I've tried this on Linux, and it seems to act the same way. One problem is Java converting the entry names to Unicode (jazzlib does NOT do this; it seems to keep the name in a byte array instead of a String). Another problem is that WinZip uses the character set cp850 (! I thought that was dead ages ago...), so there really seems to be no hope unless I hack up jazzlib and convert the file names somehow?

/Palle

--On Thursday, June 10, 2004 02:25:28 +0200 Palle Girgensohn <girgen@pingpong.net> wrote:

> Hi,
>
> Well, the problem is about character sets. A zip file seems to have no
> attribute telling which charset it uses to represent file names. Not
> very surprising.
>
> Java seems to handle this by reading the file names correctly and
> converting them to Java Strings (in Unicode). But when fetching data, it
> uses the Unicode byte sequence to find and fetch the entry, and comes out
> empty-handed: getInputStream returns null. I know of no way to tell
> java.util.zip that it should use some other character set?
>
> Hexdumping the resulting zip file, it is obvious that it has used Unicode
> (UTF-8) in the zip file when saving the file name entries. I'm not sure
> how WinZip would react, but I assume it will show them as Latin-1, i.e.
> ä -> Ã¤. While this is really bad for me, since there is no standard I'm
> not quite sure this is wrong?
>
> BTW, there is a plug-in pure Java implementation on SourceForge,
> <http://jazzlib.sourceforge.net/>. It seems to produce the same file names
> on input and output.
>
> In (getName): z/
> Out (getName): z/
> In (getName): z/åäöÅÄÖ.txt
> Out (getName): z/åäöÅÄÖ.txt
> in is null
>
> With java.util.zip, in is null, the file is renamed to the same name but
> in Unicode, and it is zero bytes in the zip file.
>
> With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file
> is not empty.
>
>
> I'm running this in a shell with
> $ echo $LC_ALL
> sv_SE.ISO8859-1
>
> Regards,
> Palle
>
>
> --On Wednesday, June 9, 2004 11:56:26 -0600 Greg Lewis
> <glewis@eyesbeyond.com> wrote:
>
>> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>>> java.util.zip cannot inflate a zip archive that contains eight-bit
>>> characters in file names; it simply crashes. I haven't been able to try
>>> it on other platforms yet, but I'd like to hear from others who might
>>> have seen this problem. The odd thing is that there is no exception or
>>> anything; it just stops when the first such character comes up, and
>>> returns null.
>>>
>>> Anyone else seen this? Is it just FreeBSD?
>>
>> If you send a small test programme and zip I can quickly try it on
>> Linux to compare.
>>
>> --
>> Greg Lewis                          Email   : glewis@eyesbeyond.com
>> Eyes Beyond                         Web     : http://www.eyesbeyond.com
>> Information Technology              FreeBSD : glewis@FreeBSD.org
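The core of the problem discussed in this thread is that a zip entry name is stored only as raw bytes, so the same name becomes a different byte sequence depending on the charset of the tool that wrote it. The sketch below uses a hypothetical helper class (`ZipNameCharsets`, not part of java.util.zip or jazzlib). It prints the bytes of the test file name under UTF-8, ISO-8859-1 and cp850, and shows how UTF-8 bytes displayed as Latin-1 turn ä into Ã¤. It assumes the JDK ships the extended charset named "IBM850" (alias "cp850"); minimal runtimes may omit it, so the code checks first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of java.util.zip or jazzlib.
public class ZipNameCharsets {

    // The test file name from the thread, written with Unicode escapes so the
    // source compiles the same way regardless of the platform encoding.
    static final String NAME = "z/\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6.txt";

    // Format bytes as lowercase, space-separated hex, e.g. "c3 a4".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Simulate a reader that takes UTF-8 name bytes and decodes them as Latin-1.
    static String misreadUtf8AsLatin1(String name) {
        return new String(name.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo a Latin-1 decode: ISO-8859-1 maps each byte 0x00-0xff to exactly one
    // char, so getBytes(ISO_8859_1) returns the original bytes, which are then
    // decoded with the charset the writer actually used.
    static String redecode(String latin1Decoded, Charset actual) {
        return new String(latin1Decoded.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        // "IBM850" is the JDK name for code page 850 (alias "cp850"); it is an
        // extended charset, so check availability before using it.
        for (String cs : new String[] {"UTF-8", "ISO-8859-1", "IBM850"}) {
            if (Charset.isSupported(cs)) {
                System.out.println(cs + ": " + hex(NAME.getBytes(Charset.forName(cs))));
            } else {
                System.out.println(cs + ": not available in this runtime");
            }
        }
        String garbled = misreadUtf8AsLatin1(NAME);
        System.out.println("UTF-8 bytes read as Latin-1: " + garbled);
        System.out.println("Re-decoded as UTF-8:         " + redecode(garbled, StandardCharsets.UTF_8));
    }
}
```

The `redecode` step only works when the bytes were decoded as ISO-8859-1, which is reversible; a decoder that replaced malformed input has already lost information. Later JDKs (Java 7 onward) added constructors such as `ZipFile(File, Charset)` and `ZipInputStream(InputStream, Charset)` for choosing the entry-name charset. These did not exist when this thread was written.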
