From owner-p4-projects@FreeBSD.ORG Wed Apr 14 07:56:26 2010 Return-Path: Delivered-To: p4-projects@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 32767) id 792751065676; Wed, 14 Apr 2010 07:56:26 +0000 (UTC) Delivered-To: perforce@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3BF281065670; Wed, 14 Apr 2010 07:56:26 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id A9B2D8FC0A; Wed, 14 Apr 2010 07:56:25 +0000 (UTC) Received: by gyh20 with SMTP id 20so4308425gyh.13 for ; Wed, 14 Apr 2010 00:56:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:received:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=QaEzGGRrJkhDuRoYAkoNBQzw7t3rzG0b81oSAP1RiRA=; b=RODBhp+X70zboqLTGZUVHBNNW5PmkMDJv+7P7sGTdL3nuoDeC8ZnZLNX6V4sbCR/oJ q+Uu0KiDDJVQWPOle55vquiigg7/4Ux10vdHrIrtqeNwop2vfh1G1E3aT8fLu7aj69VB R38RrWmi75Rh2vX9sP4ItlaNfhS+l6yhdSQOA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=KeOLlxnq38pJgADhK3P+X3wpGrJM5VPxy2Hwz+jtJBryLUN+7aU02rwtGvOklETCiJ L98pVOGDB7jKXSIvP+KVr2ZdIaVjFYnn983e27Oa1P5lupQm2EZGTXDjlxJkuJFXpAEN /iQYXPZr2NoOeA8PK9j8QvjpIndJNOyRHZh8A= MIME-Version: 1.0 Sender: yanegomi@gmail.com Received: by 10.231.183.17 with HTTP; Wed, 14 Apr 2010 00:56:22 -0700 (PDT) In-Reply-To: References: <201004121230.o3CCUsIX029146@repoman.freebsd.org> <4BC3EB5B.5070801@freebsd.org> <4BC53714.80805@freebsd.org> Date: Wed, 14 Apr 2010 00:56:22 -0700 X-Google-Sender-Auth: 9aca1ff85600f2b9 Received: by 10.101.183.23 with SMTP id k23mr12678219anp.160.1271231782739; Wed, 14 Apr 2010 00:56:22 -0700 (PDT) Message-ID: From: Garrett Cooper To: Garrett Cooper Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: David Forsythe , Florent Thoumie , Tim Kientzle , Perforce Change Reviews Subject: Re: PERFORCE change 176831 for review X-BeenThere: p4-projects@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: p4 projects tree changes List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Apr 2010 07:56:26 -0000 On Wed, Apr 14, 2010 at 12:53 AM, Garrett Cooper wrot= e: > On Tue, Apr 13, 2010 at 8:31 PM, Tim Kientzle wrot= e: >>>> ... I would also put in a simple verification >>>> that the entry you found is a regular entry: >>>> =A0archive_entry_filetype(entry) =3D=3D AE_IFREG >>> >>> Ok. What does a regular entry correspond to (I assume not a regular fil= e)? >> >> Sorry, "regular entry" =3D=3D "regular file" (or at least, what >> will become a regular file once you write it to disk, if you >> want to be really pedantic about it ;-). =A0Coincidentally, >> AE_IFREG =3D=3D S_IFREG. =A0The separate AE_IFxxx constants >> are there mostly portability shims (e.g., Windows doesn't >> have block devices, so doesn't define S_IFBLK, >> but AE_IFBLK is always available). > > Ok. I think I got everything you were trying to say before, but just > to be clear: > > To prevent race conditions [with regular files only], I should use > open(2). For all other files I would use the standard extract method > (so I don't get too fancy?). It seems like the file creation times > should be sufficiently fast with other file types that this kind of > behavior of open(2), blah wouldn't be required. > >>>>> + * NOTE: the exit code is 0 / 1 so that this can be fed directly int= o >>>>> exit >>>>> + * when doing piped tar commands for copying hierarchies *hint*, >>>>> *hint*. >>>> >>>> Why do you want to copy hierarchies? >>>> Seems a waste of disk bandwidth. =A0*hint* *hint* ;-) >>> >>> There's a fair amount of tar cpf - -C . | tar xpf - -C >>> in libinstall . This is being done to preserve file attributes, >>> fflags, modes, ownership, permissions, etc. ... Installing directly to >>> hierarchies works in theory as well, but it's >>> considerably more dangerous unless you do an information transfer >>> between the pkg contents and the install base, and unfortunately that >>> is tough to achieve with 100% clarity from what I've seen. >> >> That's exactly what I was getting to: =A0Someday, pkg_add >> should write the files directly to their final location >> and avoid the copy. =A0(You're exactly right that you >> don't want to build a "copy tree" facility.) >> >> Clearly, direct install is not something you want to >> tackle in this stage of the rewrite. =A0There's a lot of >> gains from exactly what you're doing. > > Yeah, I've been thinking about what you said... and there's ultimately > nothing to be gained through a staged install (in fact there's a lot > lost). So unless two files overwrite one another (which suggest broken > or conflicting packages, or not so clever users), people shouldn't be > installing conflicting packages or partially installing multiple > versions of the same package unless they're using -f, and if they're > doing that, they're basically damning the consequences of doing so. > > Until we have functional tests in place to actually test this though, > I need to stick with a conservative, functional analog to piped > tar(1)'ing though. > >> But I really do believe that single-pass direct >> install is feasible and is eventually where we want >> to be. =A0The key insight for me was when Florent recently >> pointed out that you could just read the +CONTENTS, >> then do a verify pass, then extract everything. >> (Any conflict during the extraction pass >> would be a fatal error: =A0delete everything >> extracted so far and scream loudly.) > > Agreed on the final point. I'm kind of interested though about the > first point -- the verify pass -- are you verifying that the contents > are sane on the disk or in the tarball? If so, why not just verify the > contents after the fact [with mtree -- it's has checksumming, mode and > permissions checking, you name it... everything minus some special > facilities like ACLs, fflags, etc -- I guess I'll need to add those], > then roll everything back if it isn't? > > One the entire operation is atomic and exclusive -- only one package > operation is allowed to run at any given time on a particular package, > running into this problem with the packaging software will be a > non-issue, but instead a problem with an external source (user, etc), > or a bad package. > >> Dealing with dependent packages is still >> a little tricky but I think it's all doable. > > Yeah, there's a crapload of overlap between multiple packages, like > cad-salome, and with some packages where you're need to build up a > hierarchy that's common between several packages, it becomes a royal > pain in the ass, but it's required otherwise stuff won't get setup and > torn down properly. > > So I'm guessing for right now we should just avoid squealing loudly > except with regular files; maybe whining a bit with non-regular files > would be a good idea though because it would force people to clean up > packages, and then things could be completely fixed later on so that > they're hard errors. > >>> I saw that in the original comments and while I think that you have a >>> point, I still am leery about someone specifying a package with >>> contents that start with + because it immediately breaks the >>> assumption. ... >> >> Unless I'm mistaken, pkg_add has used "+*" to identify >> metadata files for a long time, so I think your own >> argument from compatibility might be against you here. >> (Though I suspect that you're right in the long term that >> we should be obeying the +CONTENTS declarations instead of >> using filename conventions.) > > I still want to avoid this potential corner case. Users have an > incapability to surprise developers, and I don't want to have to deal s/in/unerring / > with this item in the future. A few less CPU cycles processing strings > is worth sacrificing over trying to be clever and producing a broken > program. > >>> unpack_file, blah with the appropriate metadata filenames considering >>> that it's a lot quicker than having to go through the entire tarball >>> end to end (especially if it's a large tarfile like openoffice), >>> unless someone unfortunately put the files at the end of the tarball >>> (in that case they do need to modify how the package is created). >> >> As for the speed argument: =A0Do you really need to >> extract the metadata files before you extract >> everything else? =A0Those files are going to a >> different place, but that's not an issue with a >> libarchive-driven extraction. =A0At one point, I had >> convinced myself that you could do it all in one >> pass, just writing the files to different places >> depending on whether they were metadata files >> (however identified) or not. =A0I recall, for example, >> that the +MTREE file needs to be used only at the >> end of the extraction (to fix-up permissions). > > Yes. pkg_info's primary uses (not -W) read the metadata if the > metadata isn't on the disk already in $PKGDBDIR; it's an optimization, > and I don't want to reduce performance in pkg_info. > > Other than that, pkg_install really doesn't care one iota about those > files (minus +CONTENTS). > >>> Thanks again for the comments :)... it would be helpful BTW if the >>> manpages linked together because archive_read(3) doesn't reference the >>> archive_read_disk(3) APIs, etc, so I kind of have to fish for the >>> right APIs to use, and I feel like I'm catching more boots than fish >>> sometimes... but it's a learning experience and I don't expect to get >>> it right the first time. >> >> Please let me know any doc shortcomings; the documentation >> could use a lot of work, I know. =A0I also have a growing >> Examples page on the libarchive Wiki and conversations >> like this help me a lot to better understand what needs >> to go there. > > Will definitely help whenever I can :). > > Also, getting back to a topic you mentioned earlier, the issue with > creating the decompressor multiple times can be alleviated like so: > > Make the basic extract API follow a format like so: > > struct package_archive; > > struct package_archive { > =A0 =A0/* > =A0 =A0 * File descriptor for the archive. Set to NULL if you don't want > to reuse the > =A0 =A0 * descriptor, i.e. want to only use the initializer once. > =A0 =A0 */ > =A0 =A0int *pkg_fd; > =A0 =A0/* > =A0 =A0 * Compare a package filename vs an expression. > =A0 =A0 * > =A0 =A0 * Could be a small wrapper above fnmatch (for simple glob express= ions), > =A0 =A0 * strncmp (for exact matches -- would be set to PATH_MAX), > glob(3), or it could > =A0 =A0 * always return 0 (for match all). Would return 0 if the files > matched, non-zero > =A0 =A0 * if they didn't match. > =A0 =A0 * > =A0 =A0 * This would help us avoid hardcoding and at the small cost of a = stack > =A0 =A0 * push, pop, the additional overhead would be fairly negligible I= would > =A0 =A0 * think, and could potentially be optimized out using a smart com= piler. > =A0 =A0 */ > =A0 =A0int (*pkg_file_comparator) (const struct *package_archive, const c= har *); > }; > > =A0 =A0I'd need to develop new and free `methods', but that's the general > idea I had in mind. > Thanks! > -Garrett >