Date: Thu, 10 Jun 2004 02:25:28 +0200
From: Palle Girgensohn <girgen@pingpong.net>
To: Greg Lewis <glewis@eyesbeyond.com>
Cc: freebsd-java@freebsd.org
Subject: Re: problems with java.util.zip and diacritical characters in file names
Message-ID: <D17F6CD704077FEABC1AE296@palle.girgensohn.se>
In-Reply-To: <20040609175626.GB83936@misty.eyesbeyond.com>
References: <5C024439534B293EAFE34A55@rambutan.pingpong.net> <20040609175626.GB83936@misty.eyesbeyond.com>
[-- Attachment #1 --]
Hi,

Well, the problem is about character sets. A zip file seems to have no
attribute telling which charset it uses for representing file names. Not
very surprising. Java seems to handle this by reading the file names
correctly and converting them to Java Strings (in Unicode). But when
fetching data, it uses the Unicode byte sequence to find and fetch the
entry, and comes out empty-handed: getInputStream returns null. I know of
no way to tell java.util.zip that it should use some other character set.

Hexdumping the resulting zip file, it is obvious that it has used Unicode
in the zip file when saving the file name entries. I'm not sure how WinZip
would react, but I assume it will show them as Latin-1, i.e. ä -> Ã¤.
While this is really bad for me, since there is no standard I'm not quite
sure this is wrong.

BTW, there is a plug-in pure Java implementation on SourceForge,
<http://jazzlib.sourceforge.net/>. It seems to result in the same file
names on input and output.

In (getName): z/
Out (getName): z/
In (getName): z/åäöÅÄÖ.txt
Out (getName): z/åäöÅÄÖ.txt
in is null

With java.util.zip, in is null, the file is renamed to the same thing but
in Unicode, and it is zero bytes in the zip file.

With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file
is not empty.

I'm running this in a shell with

$ echo $LC_ALL
sv_SE.ISO8859-1

Regards,
Palle

--On Wednesday, June 09, 2004 11.56.26 -0600 Greg Lewis <glewis@eyesbeyond.com> wrote:

> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>> java.util.zip cannot inflate a zip archive that contains eight bit
>> characters in file names, it simply crashes. I haven't been able to try
>> it on other platforms yet, but I'd like to hear from others who might
>> have seen this problem. Odd thing is there is no exception or anything
>> it just stops when the first character comes up, and returns null.
>>
>> Anyone else seen this? Is it just FreeBSD?
>
> If you send a small test programme and zip I can quickly try it on
> Linux to compare.
>
> --
> Greg Lewis                          Email   : glewis@eyesbeyond.com
> Eyes Beyond                         Web     : http://www.eyesbeyond.com
> Information Technology              FreeBSD : glewis@FreeBSD.org

[-- Attachment #2 --]
import java.io.*;
import java.util.*;
import java.util.zip.*;
//import net.sf.jazzlib.*;

/**
 * Test a zip file by copying every entry of one archive into a new one.
 * Run as "java ZipTest infile.zip filetocreate.zip".
 */
public class ZipTest {
    public static void main(String[] args) {
        try {
            ZipFile zipIn = new ZipFile(args[0]);
            ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(args[1]));
            Enumeration inFiles = zipIn.entries();
            while (inFiles.hasMoreElements()) {
                ZipEntry inEntry = (ZipEntry) inFiles.nextElement();
                System.out.print("In (getName): ");
                System.out.println(inEntry.getName());
                // Create the output entry from the name exactly as Java decoded it.
                ZipEntry outEntry = new ZipEntry(inEntry.getName());
                System.out.print("Out (getName): ");
                System.out.println(outEntry.getName());
                zipOut.putNextEntry(outEntry);
                // Directories carry no data; the next putNextEntry() closes them.
                if (inEntry.isDirectory()) {
                    continue;
                }
                // getInputStream() returns null when the entry cannot be found by name.
                copy(zipIn.getInputStream(inEntry), zipOut);
                zipOut.closeEntry();
            }
            zipOut.close();
            zipIn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /** Copy all bytes from in to out; report and skip the entry if in is null. */
    private static void copy(InputStream in, OutputStream out) throws IOException {
        if (in == null) {
            System.out.println("in is null");
            return;
        }
        synchronized (in) {
            synchronized (out) {
                byte[] buffer = new byte[2048];
                while (true) {
                    int bytesRead = in.read(buffer);
                    if (bytesRead == -1) break;
                    out.write(buffer, 0, bytesRead);
                }
            }
        }
    }
}
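
A note for readers on later JDKs: from Java 7 onward, ZipFile, ZipInputStream and ZipOutputStream have constructors that take a java.nio.charset.Charset for entry names. This did not exist when the message was written. The sketch below is not part of the original test program; the class name ZipCharsetSketch and its helper methods are made up for illustration. It first reproduces the effect the message guesses at (a UTF-8 encoded name viewed as Latin-1), then round-trips the same entry name through an in-memory zip with an explicit charset, assuming ISO-8859-1 to match the sv_SE.ISO8859-1 locale mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; requires Java 8 or later.
public class ZipCharsetSketch {

    /** Encode a name as UTF-8, then decode those bytes as ISO-8859-1,
     *  which is what a Latin-1 tool would display for a UTF-8 name. */
    public static String utf8SeenAsLatin1(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Write one entry to an in-memory zip with the given name charset,
     *  read the archive back with the same charset, and return the name found. */
    public static String roundTripName(String name, Charset cs) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(buf, cs)) {
                out.putNextEntry(new ZipEntry(name));
                out.write(new byte[] {'o', 'k'});
                out.closeEntry();
            }
            try (ZipInputStream in = new ZipInputStream(
                    new ByteArrayInputStream(buf.toByteArray()), cs)) {
                ZipEntry entry = in.getNextEntry();
                return entry == null ? null : entry.getName();
            }
        } catch (IOException e) {
            // In-memory streams should not fail; wrap so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String name = "z/\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6.txt";

        // What the UTF-8 bytes of the name look like when read as Latin-1.
        System.out.println(utf8SeenAsLatin1(name));

        // The name survives when writer and reader agree on the charset.
        System.out.println(name.equals(roundTripName(name, StandardCharsets.ISO_8859_1)));
        System.out.println(name.equals(roundTripName(name, StandardCharsets.UTF_8)));
    }
}
```

The point of the sketch is only that the name survives when both sides agree on the encoding. It says nothing about how the 2004 FreeBSD JDK behaved internally.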
