From owner-p4-projects@FreeBSD.ORG Wed Apr 14 08:00:20 2010 Return-Path: Delivered-To: p4-projects@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 32767) id 2386D1065678; Wed, 14 Apr 2010 08:00:20 +0000 (UTC) Delivered-To: perforce@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C2CD91065673; Wed, 14 Apr 2010 08:00:19 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-iw0-f171.google.com (mail-iw0-f171.google.com [209.85.223.171]) by mx1.freebsd.org (Postfix) with ESMTP id 5597C8FC1F; Wed, 14 Apr 2010 08:00:19 +0000 (UTC) Received: by iwn1 with SMTP id 1so3168131iwn.27 for ; Wed, 14 Apr 2010 01:00:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:received:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=u+BErZw6CTxCYLU3v4bff5Q4EwNLg0pqtNLpCOuibTM=; b=bTtvmkKC5Z7DX3FUcufv035o9WzQnHlfDX3am+jhhjv2WpIxbTtxkBrZtturOHNa+f EAwSLhP4VRAFiFlooooRJP8N1HZ7FiBq7Qpta2tht4NbDhbgKEKCni4hmiS276oQ2dmm UKHK+C15wAZRPWjj34hNUkxcjy7qZ4F2Dk1LE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=EEHfZVrFH6eYomiw5STTdeJZjofEhgSzTr+Sdwob7XUmKVkv1sbnDFwo3jEcr49o1p t7d28bKVqWxh8HJpDKNiIHkc9jOLo39QQIiIeMIZTEiLxs9W+cs33q2MeRHy5W9cbz7U LPrNX0Ugz3b8nz23VeSgdJmY+2XGBqyN1+uhQ= MIME-Version: 1.0 Sender: yanegomi@gmail.com Received: by 10.231.183.17 with HTTP; Wed, 14 Apr 2010 01:00:18 -0700 (PDT) In-Reply-To: References: <201004121230.o3CCUsIX029146@repoman.freebsd.org> <4BC3EB5B.5070801@freebsd.org> <4BC53714.80805@freebsd.org> Date: Wed, 14 Apr 2010 01:00:18 -0700 X-Google-Sender-Auth: 096e76fd92cc7a42 Received: by 10.231.151.129 with SMTP id c1mr3173584ibw.59.1271232018444; Wed, 14 Apr 2010 01:00:18 -0700 (PDT) Message-ID: From: Garrett Cooper To: Garrett Cooper Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: David Forsythe , Florent Thoumie , Tim Kientzle , Perforce Change Reviews Subject: Re: PERFORCE change 176831 for review X-BeenThere: p4-projects@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: p4 projects tree changes List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Apr 2010 08:00:20 -0000 And yet one more clerical error after I reread this... On Wed, Apr 14, 2010 at 12:56 AM, Garrett Cooper wrot= e: > On Wed, Apr 14, 2010 at 12:53 AM, Garrett Cooper wr= ote: >> On Tue, Apr 13, 2010 at 8:31 PM, Tim Kientzle wro= te: >>>>> ... I would also put in a simple verification >>>>> that the entry you found is a regular entry: >>>>> =A0archive_entry_filetype(entry) =3D=3D AE_IFREG >>>> >>>> Ok. What does a regular entry correspond to (I assume not a regular fi= le)? >>> >>> Sorry, "regular entry" =3D=3D "regular file" (or at least, what >>> will become a regular file once you write it to disk, if you >>> want to be really pedantic about it ;-). =A0Coincidentally, >>> AE_IFREG =3D=3D S_IFREG. =A0The separate AE_IFxxx constants >>> are there mostly portability shims (e.g., Windows doesn't >>> have block devices, so doesn't define S_IFBLK, >>> but AE_IFBLK is always available). >> >> Ok. I think I got everything you were trying to say before, but just >> to be clear: >> >> To prevent race conditions [with regular files only], I should use >> open(2). For all other files I would use the standard extract method >> (so I don't get too fancy?). It seems like the file creation times >> should be sufficiently fast with other file types that this kind of >> behavior of open(2), blah wouldn't be required. >> >>>>>> + * NOTE: the exit code is 0 / 1 so that this can be fed directly in= to >>>>>> exit >>>>>> + * when doing piped tar commands for copying hierarchies *hint*, >>>>>> *hint*. >>>>> >>>>> Why do you want to copy hierarchies? >>>>> Seems a waste of disk bandwidth. =A0*hint* *hint* ;-) >>>> >>>> There's a fair amount of tar cpf - -C . | tar xpf - -C >>>> in libinstall . This is being done to preserve file attributes, >>>> fflags, modes, ownership, permissions, etc. ... Installing directly to >>>> hierarchies works in theory as well, but it's >>>> considerably more dangerous unless you do an information transfer >>>> between the pkg contents and the install base, and unfortunately that >>>> is tough to achieve with 100% clarity from what I've seen. >>> >>> That's exactly what I was getting to: =A0Someday, pkg_add >>> should write the files directly to their final location >>> and avoid the copy. =A0(You're exactly right that you >>> don't want to build a "copy tree" facility.) >>> >>> Clearly, direct install is not something you want to >>> tackle in this stage of the rewrite. =A0There's a lot of >>> gains from exactly what you're doing. >> >> Yeah, I've been thinking about what you said... and there's ultimately >> nothing to be gained through a staged install (in fact there's a lot >> lost). So unless two files overwrite one another (which suggest broken >> or conflicting packages, or not so clever users), people shouldn't be >> installing conflicting packages or partially installing multiple >> versions of the same package unless they're using -f, and if they're >> doing that, they're basically damning the consequences of doing so. >> >> Until we have functional tests in place to actually test this though, >> I need to stick with a conservative, functional analog to piped >> tar(1)'ing though. >> >>> But I really do believe that single-pass direct >>> install is feasible and is eventually where we want >>> to be. =A0The key insight for me was when Florent recently >>> pointed out that you could just read the +CONTENTS, >>> then do a verify pass, then extract everything. >>> (Any conflict during the extraction pass >>> would be a fatal error: =A0delete everything >>> extracted so far and scream loudly.) >> >> Agreed on the final point. I'm kind of interested though about the >> first point -- the verify pass -- are you verifying that the contents >> are sane on the disk or in the tarball? If so, why not just verify the >> contents after the fact [with mtree -- it's has checksumming, mode and >> permissions checking, you name it... everything minus some special >> facilities like ACLs, fflags, etc -- I guess I'll need to add those], >> then roll everything back if it isn't? >> >> One the entire operation is atomic and exclusive -- only one package >> operation is allowed to run at any given time on a particular package, >> running into this problem with the packaging software will be a >> non-issue, but instead a problem with an external source (user, etc), >> or a bad package. >> >>> Dealing with dependent packages is still >>> a little tricky but I think it's all doable. >> >> Yeah, there's a crapload of overlap between multiple packages, like >> cad-salome, and with some packages where you're need to build up a >> hierarchy that's common between several packages, it becomes a royal >> pain in the ass, but it's required otherwise stuff won't get setup and >> torn down properly. >> >> So I'm guessing for right now we should just avoid squealing loudly >> except with regular files; maybe whining a bit with non-regular files >> would be a good idea though because it would force people to clean up >> packages, and then things could be completely fixed later on so that >> they're hard errors. >> >>>> I saw that in the original comments and while I think that you have a >>>> point, I still am leery about someone specifying a package with >>>> contents that start with + because it immediately breaks the >>>> assumption. ... >>> >>> Unless I'm mistaken, pkg_add has used "+*" to identify >>> metadata files for a long time, so I think your own >>> argument from compatibility might be against you here. >>> (Though I suspect that you're right in the long term that >>> we should be obeying the +CONTENTS declarations instead of >>> using filename conventions.) >> >> I still want to avoid this potential corner case. Users have an >> incapability to surprise developers, and I don't want to have to deal > > s/in/unerring / > >> with this item in the future. A few less CPU cycles processing strings >> is worth sacrificing over trying to be clever and producing a broken >> program. >> >>>> unpack_file, blah with the appropriate metadata filenames considering >>>> that it's a lot quicker than having to go through the entire tarball >>>> end to end (especially if it's a large tarfile like openoffice), >>>> unless someone unfortunately put the files at the end of the tarball >>>> (in that case they do need to modify how the package is created). >>> >>> As for the speed argument: =A0Do you really need to >>> extract the metadata files before you extract >>> everything else? =A0Those files are going to a >>> different place, but that's not an issue with a >>> libarchive-driven extraction. =A0At one point, I had >>> convinced myself that you could do it all in one >>> pass, just writing the files to different places >>> depending on whether they were metadata files >>> (however identified) or not. =A0I recall, for example, >>> that the +MTREE file needs to be used only at the >>> end of the extraction (to fix-up permissions). >> >> Yes. pkg_info's primary uses (not -W) read the metadata if the >> metadata isn't on the disk already in $PKGDBDIR; it's an optimization, >> and I don't want to reduce performance in pkg_info. >> >> Other than that, pkg_install really doesn't care one iota about those >> files (minus +CONTENTS). +CONTENTS, +REQUIRE, and +REQUIRED_BY; the other files are used at various times during the entire lifetime of installing a package (+INSTALL, +MTREE_DIRS, +POST-DEINSTALL, etc) or inspecting a package (+COMMENTS, +DESC, etc), but they're all fairly late in the process and not nearly as critical as +CONTENTS. If +CONTENTS is gone, game over... you're just dealing with a tarball. >>>> Thanks again for the comments :)... it would be helpful BTW if the >>>> manpages linked together because archive_read(3) doesn't reference the >>>> archive_read_disk(3) APIs, etc, so I kind of have to fish for the >>>> right APIs to use, and I feel like I'm catching more boots than fish >>>> sometimes... but it's a learning experience and I don't expect to get >>>> it right the first time. >>> >>> Please let me know any doc shortcomings; the documentation >>> could use a lot of work, I know. =A0I also have a growing >>> Examples page on the libarchive Wiki and conversations >>> like this help me a lot to better understand what needs >>> to go there. >> >> Will definitely help whenever I can :). >> >> Also, getting back to a topic you mentioned earlier, the issue with >> creating the decompressor multiple times can be alleviated like so: >> >> Make the basic extract API follow a format like so: >> >> struct package_archive; >> >> struct package_archive { >> =A0 =A0/* >> =A0 =A0 * File descriptor for the archive. Set to NULL if you don't want >> to reuse the >> =A0 =A0 * descriptor, i.e. want to only use the initializer once. >> =A0 =A0 */ >> =A0 =A0int *pkg_fd; >> =A0 =A0/* >> =A0 =A0 * Compare a package filename vs an expression. >> =A0 =A0 * >> =A0 =A0 * Could be a small wrapper above fnmatch (for simple glob expres= sions), >> =A0 =A0 * strncmp (for exact matches -- would be set to PATH_MAX), >> glob(3), or it could >> =A0 =A0 * always return 0 (for match all). Would return 0 if the files >> matched, non-zero >> =A0 =A0 * if they didn't match. >> =A0 =A0 * >> =A0 =A0 * This would help us avoid hardcoding and at the small cost of a= stack >> =A0 =A0 * push, pop, the additional overhead would be fairly negligible = I would >> =A0 =A0 * think, and could potentially be optimized out using a smart co= mpiler. >> =A0 =A0 */ >> =A0 =A0int (*pkg_file_comparator) (const struct *package_archive, const = char *); >> }; >> >> =A0 =A0I'd need to develop new and free `methods', but that's the genera= l >> idea I had in mind. >> Thanks! >> -Garrett >> >