From: Tim Kientzle
Date: Tue, 27 Sep 2005 22:43:10 -0700
To: Tim Kientzle
Cc: Garrett Wollman, freebsd-current@freebsd.org, Ed Maste
Subject: Re: Bsdtar and archive torture tests
Message-ID: <433A2D6E.7020205@freebsd.org>
In-Reply-To: <433A2882.4030003@freebsd.org>

Ed,

Try the attached patch (for /usr/src/lib/libarchive) and
let me know if that fixes it for you.
libarchive was actually skipping the UTF-8 conversion when
storing the long linkname but then (correctly) converting from
UTF-8 on extraction. The patch fixes the pax archive writer so
that it correctly converts to UTF-8.

Tim

Tim Kientzle wrote:
> Hmmm.... Looking at the internals of the generated archive
> shows that the extended attribute is definitely getting
> stored incorrectly. I'll look into this.
>
> If you see any other problems, please let me know!
>
> Tim
>
>
> Ed Maste wrote:
>
>> On Mon, Sep 26, 2005 at 08:16:50PM -0400, Ed Maste wrote:
>>
>>> Hmm, good point. I haven't set it to anything; locale(1) shows
>>> that the LC_ variables are set to "C". So then I can see how this
>>> happens, but it's still surprising (to me) behaviour.
>>
>> Ok, now I've definitely encountered some non-obvious behaviour.
>> A symlink target of 100 bytes or less keeps the same name, while
>> a target of more than 100 bytes gets munged by the conversion
>> to UTF-8 and back.
>>
>> For example, the symlink created by the following script doesn't
>> change the link target:
>>
>> #!/bin/sh
>> fname=$(printf $(jot -b \\303\\240 -s '' 50))
>> ln -fs $fname test
>> tar -cf - test | tar -tvf -
>>
>> but if the 50 in the jot command is changed to 51, the target
>> changes. So I guess that the link target doesn't fit in the
>> standard header anymore, and needs an extended tag. Having
>> different behaviour for the two cases does seem odd.
>>
>> --
>> Ed Maste, Sandvine Incorporated

Attachment: longsymlinkwcs.patch

Index: archive_entry.c
===================================================================
RCS file: /home/ncvs/src/lib/libarchive/archive_entry.c,v
retrieving revision 1.31
diff -u -r1.31 archive_entry.c
--- archive_entry.c	21 Sep 2005 04:25:05 -0000	1.31
+++ archive_entry.c	28 Sep 2005 05:36:04 -0000
@@ -203,6 +203,8 @@
 static const char *
 aes_get_mbs(struct aes *aes)
 {
+	if (aes->aes_mbs == NULL && aes->aes_wcs == NULL)
+		return NULL;
 	if (aes->aes_mbs == NULL && aes->aes_wcs != NULL) {
 		/*
 		 * XXX Need to estimate the number of byte in the
@@ -224,6 +226,8 @@
 static const wchar_t *
 aes_get_wcs(struct aes *aes)
 {
+	if (aes->aes_wcs == NULL && aes->aes_mbs == NULL)
+		return NULL;
 	if (aes->aes_wcs == NULL && aes->aes_mbs != NULL) {
 		/*
 		 * No single byte will be more than one wide character,
@@ -463,6 +467,12 @@
 	return (aes_get_mbs(&entry->ae_hardlink));
 }
 
+const wchar_t *
+archive_entry_hardlink_w(struct archive_entry *entry)
+{
+	return (aes_get_wcs(&entry->ae_hardlink));
+}
+
 ino_t
 archive_entry_ino(struct archive_entry *entry)
 {
@@ -536,6 +546,12 @@
 	return (aes_get_mbs(&entry->ae_symlink));
 }
 
+const wchar_t *
+archive_entry_symlink_w(struct archive_entry *entry)
+{
+	return (aes_get_wcs(&entry->ae_symlink));
+}
+
 uid_t
 archive_entry_uid(struct archive_entry *entry)
 {
Index: archive_entry.h
===================================================================
RCS file: /home/ncvs/src/lib/libarchive/archive_entry.h,v
retrieving revision 1.17
diff -u -r1.17 archive_entry.h
--- archive_entry.h	10 Sep 2005 22:58:06 -0000	1.17
+++ archive_entry.h	28 Sep 2005 05:36:05 -0000
@@ -80,6 +80,7 @@
 gid_t archive_entry_gid(struct archive_entry *);
 const char *archive_entry_gname(struct archive_entry *);
 const char *archive_entry_hardlink(struct archive_entry *);
+const wchar_t *archive_entry_hardlink_w(struct archive_entry *);
 ino_t archive_entry_ino(struct archive_entry *);
 mode_t archive_entry_mode(struct archive_entry *);
 time_t archive_entry_mtime(struct archive_entry *);
@@ -92,6 +93,7 @@
 int64_t archive_entry_size(struct archive_entry *);
 const struct stat *archive_entry_stat(struct archive_entry *);
 const char *archive_entry_symlink(struct archive_entry *);
+const wchar_t *archive_entry_symlink_w(struct archive_entry *);
 uid_t archive_entry_uid(struct archive_entry *);
 const char *archive_entry_uname(struct archive_entry *);
 
Index: archive_write_set_format_pax.c
===================================================================
RCS file: /home/ncvs/src/lib/libarchive/archive_write_set_format_pax.c,v
retrieving revision 1.30
diff -u -r1.30 archive_write_set_format_pax.c
--- archive_write_set_format_pax.c	21 Sep 2005 04:25:05 -0000	1.30
+++ archive_write_set_format_pax.c	28 Sep 2005 05:36:06 -0000
@@ -393,11 +393,14 @@
 
 	/* If link name is too long, add 'linkpath' to pax extended attrs. */
 	linkname = hardlink;
-	if (linkname == NULL)
+	if (linkname == NULL) {
 		linkname = archive_entry_symlink(entry_main);
+		wp = archive_entry_symlink_w(entry_main);
+	} else
+		wp = archive_entry_hardlink_w(entry_main);
 
 	if (linkname != NULL && strlen(linkname) > 100) {
-		add_pax_attr(&(pax->pax_header), "linkpath", linkname);
+		add_pax_attr_w(&(pax->pax_header), "linkpath", wp);
 		if (hardlink != NULL)
 			archive_entry_set_hardlink(entry_main,
 			    "././@LongHardLink");
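
For anyone following the thread, here is a rough, self-contained C sketch of why the 50 and 51 cases in Ed's script behave differently. It is not libarchive code. The helper names (needs_linkpath, utf8_encode, utf8_decode, bytes_to_wide, roundtrip) are hypothetical, and it assumes that in the "C" locale each byte maps to one wide character of the same value. It models the old writer as storing the raw bytes in the 'linkpath' attribute and the patched writer as converting to UTF-8 first; in both cases the reader decodes UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* The standard tar header keeps link targets in a 100-byte field. */
#define USTAR_LINKNAME_MAX 100

/* Nonzero when the target needs a pax 'linkpath' extended attribute. */
static int
needs_linkpath(const char *target)
{
	return (strlen(target) > USTAR_LINKNAME_MAX);
}

/* Encode code points (all assumed < 0x800) as UTF-8; returns bytes written. */
static size_t
utf8_encode(const unsigned int *cp, size_t n, unsigned char *out)
{
	size_t i, len = 0;

	for (i = 0; i < n; i++) {
		if (cp[i] < 0x80) {
			out[len++] = (unsigned char)cp[i];
		} else {
			out[len++] = (unsigned char)(0xC0 | (cp[i] >> 6));
			out[len++] = (unsigned char)(0x80 | (cp[i] & 0x3F));
		}
	}
	return (len);
}

/* Decode well-formed 1- and 2-byte UTF-8 sequences; returns code points read. */
static size_t
utf8_decode(const unsigned char *in, size_t len, unsigned int *cp)
{
	size_t i = 0, n = 0;

	while (i < len) {
		if (in[i] < 0x80) {
			cp[n++] = in[i++];
		} else {
			cp[n++] = ((in[i] & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
			i += 2;
		}
	}
	return (n);
}

/* ASSUMPTION: "C" locale conversion, one byte to one wide character. */
static size_t
bytes_to_wide(const unsigned char *in, size_t len, unsigned int *cp)
{
	size_t i;

	for (i = 0; i < len; i++)
		cp[i] = in[i];
	return (len);
}

/*
 * Simulate writing and then reading a long link target (len <= 256).
 * convert_on_write == 0 models the old writer (raw bytes stored);
 * convert_on_write == 1 models the patched writer (bytes -> wide -> UTF-8).
 * Returns the number of wide characters the reader recovers in result.
 */
static size_t
roundtrip(const unsigned char *name, size_t len, int convert_on_write,
    unsigned int *result)
{
	unsigned int wide[256];
	unsigned char stored[512];
	size_t n, stored_len;

	if (convert_on_write) {
		n = bytes_to_wide(name, len, wide);
		stored_len = utf8_encode(wide, n, stored);
	} else {
		memcpy(stored, name, len);
		stored_len = len;
	}
	return (utf8_decode(stored, stored_len, result));
}
```

With the two bytes 0xC3 0xA0 from Ed's script, the unconverted path hands the reader a single wide character 0xE0, which no longer corresponds to the original two bytes, while the converted path gives back 0xC3 and 0xA0 unchanged. Fifty repetitions are exactly 100 bytes, so needs_linkpath() is false and the target goes into the plain header untouched; fifty-one repetitions (102 bytes) take the extended attribute path.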