Date: Thu, 10 Jun 2004 12:17:52 +0200
From: Palle Girgensohn <girgen@pingpong.net>
To: Palle Girgensohn <girgen@pingpong.net>, Greg Lewis <glewis@eyesbeyond.com>
Cc: freebsd-java@freebsd.org
Subject: Re: problems with java.util.zip and diacritical characters in file names
Message-ID: <84B75B389C49D6FF3ED95F29@rambutan.pingpong.net>
In-Reply-To: <D17F6CD704077FEABC1AE296@palle.girgensohn.se>
References: <5C024439534B293EAFE34A55@rambutan.pingpong.net> <20040609175626.GB83936@misty.eyesbeyond.com> <D17F6CD704077FEABC1AE296@palle.girgensohn.se>
I've tried this on Linux, and it seems to behave the same way. One problem is
that Java converts the entry names to Unicode (jazzlib does NOT do this; it
seems to keep the name in a byte array instead of a String). Another problem
is that WinZip uses the character set cp850 (! I thought this had been dead
for ages...), so there really seems to be no hope unless I hack up jazzlib
and convert the file names somehow?

/Palle

--On Thursday, June 10, 2004 02:25:28 +0200 Palle Girgensohn
<girgen@pingpong.net> wrote:

> Hi,
>
> Well, the problem is about character sets. A zip file seems to have no
> attribute telling which charset it uses for representing file names. Not
> very surprising.
>
> Java seems to handle this by reading filenames correctly and converting
> them to Java Strings (in Unicode). But when fetching data, it uses the
> Unicode byte sequence to find and fetch the entry, and comes out empty
> handed: getInputStream returns null. I know of no way to tell
> java.util.zip that it should use some other character set?
>
> Hexdumping the resulting zip file, it is obvious that it has used Unicode
> in the zip file when saving the file name entries. I'm not sure how
> WinZip would react, but I assume it will show them as latin1, i.e. ä ->
> Ã¤. While this is really bad for me, since there is no standard I'm not
> quite sure this is wrong?
>
> BTW, there is a plugin pure Java implementation on SourceForge,
> <http://jazzlib.sourceforge.net/>. It seems to result in the same
> filenames on input and output.
>
> In (getName): z/
> Out (getName): z/
> In (getName): z/åäöÅÄÖ.txt
> Out (getName): z/åäöÅÄÖ.txt
> in is null
>
> With java.util.zip, in is null and the file is renamed to the same thing
> but in Unicode, and is zero bytes in the zip file.
>
> With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file
> is not empty.
>
>
> I'm running this in a shell with
> $ echo $LC_ALL
> sv_SE.ISO8859-1
>
> Regards,
> Palle
>
>
> --On Wednesday, June 09, 2004 11:56:26 -0600 Greg Lewis
> <glewis@eyesbeyond.com> wrote:
>
>> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>>> java.util.zip cannot inflate a zip archive that contains eight-bit
>>> characters in file names; it simply crashes. I haven't been able to try
>>> it on other platforms yet, but I'd like to hear from others who might
>>> have seen this problem. The odd thing is that there is no exception or
>>> anything; it just stops when the first such character comes up, and
>>> returns null.
>>>
>>> Anyone else seen this? Is it just FreeBSD?
>>
>> If you send a small test programme and zip I can quickly try it on
>> Linux to compare.
>>
>> --
>> Greg Lewis                          Email   : glewis@eyesbeyond.com
>> Eyes Beyond                         Web     : http://www.eyesbeyond.com
>> Information Technology              FreeBSD : glewis@FreeBSD.org
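The charset mismatch discussed in this thread can be illustrated with a small sketch. It is not the test program from the thread, and it assumes a newer JDK than the one in use in 2004: the `ZipOutputStream` and `ZipInputStream` constructors that take a `Charset` were only added in Java 7 (the `UncheckedIOException` used here needs Java 8). The class name `ZipNameCharsetSketch` and its helper methods are hypothetical names chosen for illustration. The first helper reproduces the ä -> Ã¤ effect by encoding with one charset and decoding with another; the other two write and read an in-memory archive with an explicitly chosen charset for entry names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of java.util.zip or jazzlib.
public class ZipNameCharsetSketch {

    // Encode a name with one charset and decode the bytes with another,
    // which is what happens when the zip writer and reader disagree.
    static String misread(String name, Charset writtenAs, Charset readAs) {
        return new String(name.getBytes(writtenAs), readAs);
    }

    // Write a one-entry archive in memory, encoding the entry name with cs.
    static byte[] writeZip(String entryName, byte[] data, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf, cs)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read back the first entry name, decoding it with cs.
    static String firstEntryName(byte[] zip, Charset cs) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), cs)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "z/åäöÅÄÖ.txt" written with Unicode escapes so the source file's
        // own encoding does not matter.
        String name = "z/\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6.txt";

        // UTF-8 bytes of "ä" shown as latin1.
        System.out.println(misread("\u00e4", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // cp850 is registered as IBM850 in the JDK's extended charsets; fall
        // back to latin1 if this runtime does not ship it.
        Charset cs = Charset.isSupported("IBM850")
                ? Charset.forName("IBM850")
                : StandardCharsets.ISO_8859_1;
        byte[] zip = writeZip(name, "hello".getBytes(StandardCharsets.US_ASCII), cs);
        System.out.println(name.equals(firstEntryName(zip, cs)));
    }
}
```

The round trip only preserves the name because writer and reader are given the same charset. An archive produced by another tool still carries no reliable indication of its name encoding unless it sets the UTF-8 flag, so the charset passed when reading remains a guess about what that tool used.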