From: Andriy Gapon
Date: Wed, 10 Oct 2007 13:57:08 +0300
To: freebsd-stable@freebsd.org, freebsd-standards@freebsd.org
Subject: Re: pax misbehavior
Message-ID: <470CB004.2050603@icyb.net.ua>
In-Reply-To: <46F29B3B.3010304@icyb.net.ua>

Sorry for top-posting, but I am replying to myself and the context is
rather lengthy.

It seems the issue is that our pax has an internal heuristic to apply
-s transformations not only to file names, but also to hard link and
symlink targets. On one hand this seems beneficial; on the other hand
it can lead to confusion, because symlink targets can be relative and
their pathnames can match quite unexpected patterns compared to normal
file pathnames.

What makes this behavior even less obvious is that if a link target is
transformed into an empty string, the link is omitted altogether. This,
of course, makes a certain sense: there cannot be a link without any
target at all. On the other hand, POSIX explicitly gives one and only
one reason to omit a file: when its _name_ is transformed to an empty
string. So this looks like a POSIX violation and unexpected behavior.

I have several proposals for fixing this situation:

1. Since link target modification is something that POSIX is silent
   about, it seems to be an extension, and it would be nice to provide
   extended options to turn this behavior on/off (and maybe control
   some aspects of it). AIX pax, for instance, doesn't do that. Solaris
   and Linux seem to have the same behavior.
2. I think that regardless of whether #1 is implemented, the pax man
   page should describe this behavior and even warn about it.
3. The symlink target modification heuristic may be updated to exclude
   the most trivial and probably most widespread case of symlinks into
   the same directory, i.e. those whose target doesn't contain any '/'.
4. The symlink target modification heuristic may be updated to leave
   the link target alone if its substitution results in an empty string
   (rather than throwing the symlink out, as is done now).

There is, of course, a workaround for my particular case, which is to
never use the kill-all substitution -s '#.*##', but instead to
explicitly list all archive hierarchy roots, like
-s '#^root1/.*##' -s '#^root2/.*##' ...
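
To make the interaction easier to follow, here is a minimal Python
sketch of the behavior as I understand it from the description above.
It is a model only, not the pax source: the names apply_subst,
archive_members and rewrite_link_targets are hypothetical, Python's re
syntax stands in for the basic regular expressions that -s uses, and
the rewrite_link_targets switch merely illustrates the kind of on/off
option suggested in proposal 1.

```python
import re


def apply_subst(path, rules):
    """Apply the first rule whose pattern matches, then stop.

    Mirrors the documented "terminating with the first successful
    substitution" rule for multiple -s expressions.
    """
    for pattern, replacement in rules:
        new_path, count = re.subn(pattern, replacement, path, count=1)
        if count:
            return new_path
    return path


def archive_members(entries, rules, rewrite_link_targets=True):
    """Return the (name, link_target) pairs that would be archived.

    entries: list of (name, link_target); link_target is None for
    anything that is not a link.
    rewrite_link_targets: hypothetical switch, not a real pax option.
    """
    kept = []
    for name, target in entries:
        new_name = apply_subst(name, rules)
        if new_name == "":
            continue  # the one omission case POSIX describes
        if target is not None and rewrite_link_targets:
            new_target = apply_subst(target, rules)
            if new_target == "":
                continue  # the surprising case: empty target drops the link
            target = new_target
        kept.append((new_name, target))
    return kept


# The hierarchy from the quoted report.
ENTRIES = [
    ("xxxxx", None),
    ("xxxxx/yyyyy", None),
    ("xxxxx/yyyyy.0", "yyyyy"),
    ("xxxxx/yyyyy.0.0", "yyyyy.0"),
]

RENAME_ONLY = [("xxxxx", "zzzzz")]
RENAME_THEN_KILL_ALL = [("xxxxx", "zzzzz"), (".*", "")]
RENAME_THEN_ANCHORED = [("xxxxx", "zzzzz"), ("^root1/.*", "")]

for label, rules in [
    ("rename only", RENAME_ONLY),
    ("rename + kill-all", RENAME_THEN_KILL_ALL),
    ("rename + anchored", RENAME_THEN_ANCHORED),
]:
    print(label, len(archive_members(ENTRIES, rules)))
```

In this model every entry name is handled by the first rule, but the
relative symlink targets yyyyy and yyyyy.0 do not contain xxxxx, so they
fall through to the kill-all rule and both symlinks are dropped. That
leaves the same two entries as in the listing from my original report.
With an anchored second rule, or with link target rewriting switched
off, all four entries survive.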

Even with that workaround, there might still be some unpleasant and
hard-to-debug surprises, with other patterns being misapplied where no
one expects them to be applied.

on 20/09/2007 19:09 Andriy Gapon said the following:
> Preparation first:
> $ mkdir xxxxx
> $ cd xxxxx/
> $ touch yyyyy
> $ ln -s yyyyy yyyyy.0
> $ ln -s yyyyy.0 yyyyy.0.0
> $ cd ..
>
> Demonstration of expected behavior:
> $ pax -w -f xxxxx.tar -s "#xxxxx#zzzzz#" xxxxx
> $ pax -vf xxxxx.tar
> drwxr-xr-x 2 ... 0 20 Sep 18:51 zzzzz
> -rw-r--r-- 1 ... 0 20 Sep 18:51 zzzzz/yyyyy
> lrwxr-xr-x 1 ... 0 20 Sep 18:51 zzzzz/yyyyy.0 => yyyyy
> lrwxr-xr-x 1 ... 0 20 Sep 18:51 zzzzz/yyyyy.0.0 => yyyyy.0
> pax: ustar vol 1, 4 files, 10240 bytes read, 0 bytes written.
>
> Demonstration of misbehavior:
> $ pax -w -f xxxxx.tar -s "#xxxxx#zzzzz#" -s "#.*##" xxxxx
> $ pax -vf xxxxx.tar
> drwxr-xr-x 2 ... 0 20 Sep 18:51 zzzzz
> -rw-r--r-- 1 ... 0 20 Sep 18:51 zzzzz/yyyyy
> pax: ustar vol 1, 2 files, 10240 bytes read, 0 bytes written.
>
>
> The only thing added in the second test is -s "#.*##" option _after_ the
> first -s option. Mysteriously it caused all symlinks to not be included
> into an archive. But this should not happen if the behavior in the first
> test is correct and pax follows POSIX specification: if an entry is
> handled by the first -s (which it was in the first test), then further
> -s options should not be applied to it. Our man page also says it:
>
>     Multiple -s expressions can be specified. The expressions are
>     applied in the order they are specified on the command line,
>     terminating with the first successful substitution.
>
> Of course, this synthetic test is a simplification of something done for
> a real task with a real purpose. -s "#.*##" is meant to exclude from an
> archive all "other" files and the side-effect of excluding symlinks as
> well is very unfortunate.
>
> Should I file a PR ?
>

-- 
Andriy Gapon