Date: Thu, 10 Jun 2004 02:25:28 +0200
From: Palle Girgensohn <girgen@pingpong.net>
To: Greg Lewis <glewis@eyesbeyond.com>
Cc: freebsd-java@freebsd.org
Subject: Re: problems with java.util.zip and diacritical characters in file names
Message-ID: <D17F6CD704077FEABC1AE296@palle.girgensohn.se>
In-Reply-To: <20040609175626.GB83936@misty.eyesbeyond.com>
References: <5C024439534B293EAFE34A55@rambutan.pingpong.net> <20040609175626.GB83936@misty.eyesbeyond.com>

--==========29584E48F3C762197735==========
Content-Type: text/plain; charset=iso-8859-15; format=flowed
Content-Disposition: inline

Hi,

Well, the problem is about character sets. A zip file seems to have no
attribute telling which charset it uses for file names. Not very
surprising.

Java seems to handle this by reading the file names correctly and
converting them to Java Strings (in Unicode). But when fetching data, it
uses the Unicode byte sequence to look up the entry and comes out
empty-handed: getInputStream returns null. I know of no way to tell
java.util.zip to use some other character set.

Hexdumping the resulting zip file, it is obvious that Java has written
the file names in Unicode (UTF-8). I'm not sure how WinZip would react,
but I assume it will show them as Latin-1, i.e. ä -> Ã¤. While this is
really bad for me, there is no standard, so I'm not quite sure this is
wrong?

BTW, there is a pure Java drop-in implementation on SourceForge,
<http://jazzlib.sourceforge.net/>. It seems to produce the same file
names on input and output.

In (getName): z/
Out (getName): z/
In (getName): z/åäöÅÄÖ.txt
Out (getName): z/åäöÅÄÖ.txt
in is null

With java.util.zip, in is null, and the file is renamed to the same
thing but in Unicode and is zero bytes in the zip file.

With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file
is not empty.

I'm running this in a shell with

$ echo $LC_ALL
sv_SE.ISO8859-1

Regards,
Palle

--On Wednesday, June 09, 2004 11.56.26 -0600 Greg Lewis
<glewis@eyesbeyond.com> wrote:

> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>> java.util.zip cannot inflate a zip archive that contains eight bit
>> characters in file names, it simply crashes.
>> I haven't been able to try it on other platforms yet, but I'd like
>> to hear from others who might have seen this problem. Odd thing is
>> there is no exception or anything, it just stops when the first
>> such character comes up, and returns null.
>>
>> Anyone else seen this? Is it just FreeBSD?
>
> If you send a small test programme and zip I can quickly try it on
> Linux to compare.
>
> --
> Greg Lewis                          Email   : glewis@eyesbeyond.com
> Eyes Beyond                         Web     : http://www.eyesbeyond.com
> Information Technology              FreeBSD : glewis@FreeBSD.org

--==========29584E48F3C762197735==========
Content-Type: text/plain; charset=iso-8859-1; name="ZipTest.java"
Content-Disposition: attachment; filename="ZipTest.java"; size=1411

import java.io.*;
import java.util.*;
import java.util.zip.*;
//import net.sf.jazzlib.*;

/**
 * Test a zip file.
 * Run as "java ZipTest infile.zip filetocreate.zip".
 */
public class ZipTest {
    public static void main(String[] args) {
        try {
            ZipFile zipIn = new ZipFile(args[0]);
            ZipOutputStream zipOut =
                new ZipOutputStream(new FileOutputStream(args[1]));
            // Copy every entry to the output archive under the same name.
            Enumeration inFiles = zipIn.entries();
            while (inFiles.hasMoreElements()) {
                ZipEntry inEntry = (ZipEntry) inFiles.nextElement();
                System.out.print("In (getName): ");
                System.out.println(inEntry.getName());
                ZipEntry outEntry = new ZipEntry(inEntry.getName());
                System.out.print("Out (getName): ");
                System.out.println(outEntry.getName());
                zipOut.putNextEntry(outEntry);
                if (inEntry.isDirectory()) {
                    // Directory entries carry no data.
                    continue;
                }
                // getInputStream returns null if the entry is not found.
                copy(zipIn.getInputStream(inEntry), zipOut);
                zipOut.closeEntry();
            }
            zipOut.close();
            zipIn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void copy(InputStream in, OutputStream out)
            throws IOException {
        if (in == null) {
            System.out.println("in is null");
            return;
        }
        synchronized (in) {
            synchronized (out) {
                byte[] buffer = new byte[2048];
                while (true) {
                    int bytesRead = in.read(buffer);
                    if (bytesRead == -1) break;
                    out.write(buffer, 0, bytesRead);
                }
            }
        }
    }
}

--==========29584E48F3C762197735==========--
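
The charset mismatch described in the message can be sketched in a few lines. This is a minimal illustration, not the code from the thread: the class ZipNameCharsets and its three methods are hypothetical names made up here. It assumes Java 7 or later, where ZipOutputStream and ZipInputStream gained constructors that take a java.nio.charset.Charset. Those constructors did not exist in the JDKs available when this message was written. The first method reproduces the ä -> Ã¤ effect; the other two write and read an in-memory archive with an explicitly chosen file name charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only (assumes Java 7+).
public class ZipNameCharsets {

    /** Encodes a name as UTF-8, then decodes those bytes as ISO-8859-1. */
    public static String misreadUtf8AsLatin1(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Writes a one-entry zip in memory, storing the name in nameCharset. */
    public static byte[] writeZip(String entryName, byte[] data,
                                  Charset nameCharset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bytes, nameCharset)) {
            zipOut.putNextEntry(new ZipEntry(entryName));
            zipOut.write(data);
            zipOut.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Returns the first entry name, decoded with nameCharset. */
    public static String firstEntryName(byte[] zip, Charset nameCharset)
            throws IOException {
        try (ZipInputStream zipIn =
                 new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry = zipIn.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        // "z/" followed by a-ring, a-umlaut, o-umlaut, written as escapes so
        // the source file encoding does not matter.
        String name = "z/\u00e5\u00e4\u00f6.txt";

        // What a Latin-1 reader would display for UTF-8 encoded names.
        System.out.println(misreadUtf8AsLatin1(name));

        // Writing and reading with the same explicit charset keeps the name.
        byte[] zip = writeZip(name, new byte[] {1, 2, 3},
                              StandardCharsets.ISO_8859_1);
        String readBack = firstEntryName(zip, StandardCharsets.ISO_8859_1);
        System.out.println(name.equals(readBack));
    }
}
```

The point of the sketch is only that the writer and the reader have to agree on the file name charset. Which charset a given archive actually uses still has to be known or guessed from outside the archive.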