From owner-p4-projects@FreeBSD.ORG  Wed Apr 14 07:56:26 2010
Return-Path: <owner-p4-projects@FreeBSD.ORG>
Delivered-To: p4-projects@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 792751065676; Wed, 14 Apr 2010 07:56:26 +0000 (UTC)
Delivered-To: perforce@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3BF281065670;
	Wed, 14 Apr 2010 07:56:26 +0000 (UTC)
	(envelope-from yanegomi@gmail.com)
Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com
	[209.85.160.182])
	by mx1.freebsd.org (Postfix) with ESMTP id A9B2D8FC0A;
	Wed, 14 Apr 2010 07:56:25 +0000 (UTC)
Received: by gyh20 with SMTP id 20so4308425gyh.13
	for <multiple recipients>; Wed, 14 Apr 2010 00:56:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:sender:received:in-reply-to
	:references:date:x-google-sender-auth:received:message-id:subject
	:from:to:cc:content-type:content-transfer-encoding;
	bh=QaEzGGRrJkhDuRoYAkoNBQzw7t3rzG0b81oSAP1RiRA=;
	b=RODBhp+X70zboqLTGZUVHBNNW5PmkMDJv+7P7sGTdL3nuoDeC8ZnZLNX6V4sbCR/oJ
	q+Uu0KiDDJVQWPOle55vquiigg7/4Ux10vdHrIrtqeNwop2vfh1G1E3aT8fLu7aj69VB
	R38RrWmi75Rh2vX9sP4ItlaNfhS+l6yhdSQOA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	b=KeOLlxnq38pJgADhK3P+X3wpGrJM5VPxy2Hwz+jtJBryLUN+7aU02rwtGvOklETCiJ
	L98pVOGDB7jKXSIvP+KVr2ZdIaVjFYnn983e27Oa1P5lupQm2EZGTXDjlxJkuJFXpAEN
	/iQYXPZr2NoOeA8PK9j8QvjpIndJNOyRHZh8A=
MIME-Version: 1.0
Sender: yanegomi@gmail.com
Received: by 10.231.183.17 with HTTP; Wed, 14 Apr 2010 00:56:22 -0700 (PDT)
In-Reply-To: <r2x364299f41004140053y7b43d454qb7804e111d1920a6@mail.gmail.com>
References: <201004121230.o3CCUsIX029146@repoman.freebsd.org>
	<4BC3EB5B.5070801@freebsd.org>
	<l2y364299f41004122249q2f5734a7j89be807581e42dac@mail.gmail.com>
	<4BC53714.80805@freebsd.org>
	<r2x364299f41004140053y7b43d454qb7804e111d1920a6@mail.gmail.com>
Date: Wed, 14 Apr 2010 00:56:22 -0700
X-Google-Sender-Auth: 9aca1ff85600f2b9
Received: by 10.101.183.23 with SMTP id k23mr12678219anp.160.1271231782739; 
	Wed, 14 Apr 2010 00:56:22 -0700 (PDT)
Message-ID: <g2v364299f41004140056pee1cde28laee26c8d5d1c9d19@mail.gmail.com>
From: Garrett Cooper <gcooper@FreeBSD.org>
To: Garrett Cooper <gcooper@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: David Forsythe <dforsyth@freebsd.org>, Florent Thoumie <flz@esat.net>,
	Tim Kientzle <kientzle@freebsd.org>,
	Perforce Change Reviews <perforce@freebsd.org>
Subject: Re: PERFORCE change 176831 for review
X-BeenThere: p4-projects@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: p4 projects tree changes <p4-projects.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/p4-projects>,
	<mailto:p4-projects-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/p4-projects>
List-Post: <mailto:p4-projects@freebsd.org>
List-Help: <mailto:p4-projects-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/p4-projects>,
	<mailto:p4-projects-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Apr 2010 07:56:26 -0000

On Wed, Apr 14, 2010 at 12:53 AM, Garrett Cooper <gcooper@freebsd.org> wrot=
e:
> On Tue, Apr 13, 2010 at 8:31 PM, Tim Kientzle <kientzle@freebsd.org> wrot=
e:
>>>> ... I would also put in a simple verification
>>>> that the entry you found is a regular entry:
>>>> =A0archive_entry_filetype(entry) =3D=3D AE_IFREG
>>>
>>> Ok. What does a regular entry correspond to (I assume not a regular fil=
e)?
>>
>> Sorry, "regular entry" =3D=3D "regular file" (or at least, what
>> will become a regular file once you write it to disk, if you
>> want to be really pedantic about it ;-). =A0Coincidentally,
>> AE_IFREG =3D=3D S_IFREG. =A0The separate AE_IFxxx constants
>> are there mostly portability shims (e.g., Windows doesn't
>> have block devices, so doesn't define S_IFBLK,
>> but AE_IFBLK is always available).
>
> Ok. I think I got everything you were trying to say before, but just
> to be clear:
>
> To prevent race conditions [with regular files only], I should use
> open(2). For all other files I would use the standard extract method
> (so I don't get too fancy?). It seems like the file creation times
> should be sufficiently fast with other file types that this kind of
> behavior of open(2), blah wouldn't be required.
>
>>>>> + * NOTE: the exit code is 0 / 1 so that this can be fed directly int=
o
>>>>> exit
>>>>> + * when doing piped tar commands for copying hierarchies *hint*,
>>>>> *hint*.
>>>>
>>>> Why do you want to copy hierarchies?
>>>> Seems a waste of disk bandwidth. =A0*hint* *hint* ;-)
>>>
>>> There's a fair amount of tar cpf - -C <dir_a> . | tar xpf - -C <dir_b>
>>> in libinstall . This is being done to preserve file attributes,
>>> fflags, modes, ownership, permissions, etc. ... Installing directly to
>>> hierarchies works in theory as well, but it's
>>> considerably more dangerous unless you do an information transfer
>>> between the pkg contents and the install base, and unfortunately that
>>> is tough to achieve with 100% clarity from what I've seen.
>>
>> That's exactly what I was getting to: =A0Someday, pkg_add
>> should write the files directly to their final location
>> and avoid the copy. =A0(You're exactly right that you
>> don't want to build a "copy tree" facility.)
>>
>> Clearly, direct install is not something you want to
>> tackle in this stage of the rewrite. =A0There's a lot of
>> gains from exactly what you're doing.
>
> Yeah, I've been thinking about what you said... and there's ultimately
> nothing to be gained through a staged install (in fact there's a lot
> lost). So unless two files overwrite one another (which suggest broken
> or conflicting packages, or not so clever users), people shouldn't be
> installing conflicting packages or partially installing multiple
> versions of the same package unless they're using -f, and if they're
> doing that, they're basically damning the consequences of doing so.
>
> Until we have functional tests in place to actually test this though,
> I need to stick with a conservative, functional analog to piped
> tar(1)'ing though.
>
>> But I really do believe that single-pass direct
>> install is feasible and is eventually where we want
>> to be. =A0The key insight for me was when Florent recently
>> pointed out that you could just read the +CONTENTS,
>> then do a verify pass, then extract everything.
>> (Any conflict during the extraction pass
>> would be a fatal error: =A0delete everything
>> extracted so far and scream loudly.)
>
> Agreed on the final point. I'm kind of interested though about the
> first point -- the verify pass -- are you verifying that the contents
> are sane on the disk or in the tarball? If so, why not just verify the
> contents after the fact [with mtree -- it's has checksumming, mode and
> permissions checking, you name it... everything minus some special
> facilities like ACLs, fflags, etc -- I guess I'll need to add those],
> then roll everything back if it isn't?
>
> One the entire operation is atomic and exclusive -- only one package
> operation is allowed to run at any given time on a particular package,
> running into this problem with the packaging software will be a
> non-issue, but instead a problem with an external source (user, etc),
> or a bad package.
>
>> Dealing with dependent packages is still
>> a little tricky but I think it's all doable.
>
> Yeah, there's a crapload of overlap between multiple packages, like
> cad-salome, and with some packages where you're need to build up a
> hierarchy that's common between several packages, it becomes a royal
> pain in the ass, but it's required otherwise stuff won't get setup and
> torn down properly.
>
> So I'm guessing for right now we should just avoid squealing loudly
> except with regular files; maybe whining a bit with non-regular files
> would be a good idea though because it would force people to clean up
> packages, and then things could be completely fixed later on so that
> they're hard errors.
>
>>> I saw that in the original comments and while I think that you have a
>>> point, I still am leery about someone specifying a package with
>>> contents that start with + because it immediately breaks the
>>> assumption. ...
>>
>> Unless I'm mistaken, pkg_add has used "+*" to identify
>> metadata files for a long time, so I think your own
>> argument from compatibility might be against you here.
>> (Though I suspect that you're right in the long term that
>> we should be obeying the +CONTENTS declarations instead of
>> using filename conventions.)
>
> I still want to avoid this potential corner case. Users have an
> incapability to surprise developers, and I don't want to have to deal

s/in/unerring /

> with this item in the future. A few less CPU cycles processing strings
> is worth sacrificing over trying to be clever and producing a broken
> program.
>
>>> unpack_file, blah with the appropriate metadata filenames considering
>>> that it's a lot quicker than having to go through the entire tarball
>>> end to end (especially if it's a large tarfile like openoffice),
>>> unless someone unfortunately put the files at the end of the tarball
>>> (in that case they do need to modify how the package is created).
>>
>> As for the speed argument: =A0Do you really need to
>> extract the metadata files before you extract
>> everything else? =A0Those files are going to a
>> different place, but that's not an issue with a
>> libarchive-driven extraction. =A0At one point, I had
>> convinced myself that you could do it all in one
>> pass, just writing the files to different places
>> depending on whether they were metadata files
>> (however identified) or not. =A0I recall, for example,
>> that the +MTREE file needs to be used only at the
>> end of the extraction (to fix-up permissions).
>
> Yes. pkg_info's primary uses (not -W) read the metadata if the
> metadata isn't on the disk already in $PKGDBDIR; it's an optimization,
> and I don't want to reduce performance in pkg_info.
>
> Other than that, pkg_install really doesn't care one iota about those
> files (minus +CONTENTS).
>
>>> Thanks again for the comments :)... it would be helpful BTW if the
>>> manpages linked together because archive_read(3) doesn't reference the
>>> archive_read_disk(3) APIs, etc, so I kind of have to fish for the
>>> right APIs to use, and I feel like I'm catching more boots than fish
>>> sometimes... but it's a learning experience and I don't expect to get
>>> it right the first time.
>>
>> Please let me know any doc shortcomings; the documentation
>> could use a lot of work, I know. =A0I also have a growing
>> Examples page on the libarchive Wiki and conversations
>> like this help me a lot to better understand what needs
>> to go there.
>
> Will definitely help whenever I can :).
>
> Also, getting back to a topic you mentioned earlier, the issue with
> creating the decompressor multiple times can be alleviated like so:
>
> Make the basic extract API follow a format like so:
>
> struct package_archive;
>
> struct package_archive {
> =A0 =A0/*
> =A0 =A0 * File descriptor for the archive. Set to NULL if you don't want
> to reuse the
> =A0 =A0 * descriptor, i.e. want to only use the initializer once.
> =A0 =A0 */
> =A0 =A0int *pkg_fd;
> =A0 =A0/*
> =A0 =A0 * Compare a package filename vs an expression.
> =A0 =A0 *
> =A0 =A0 * Could be a small wrapper above fnmatch (for simple glob express=
ions),
> =A0 =A0 * strncmp (for exact matches -- would be set to PATH_MAX),
> glob(3), or it could
> =A0 =A0 * always return 0 (for match all). Would return 0 if the files
> matched, non-zero
> =A0 =A0 * if they didn't match.
> =A0 =A0 *
> =A0 =A0 * This would help us avoid hardcoding and at the small cost of a =
stack
> =A0 =A0 * push, pop, the additional overhead would be fairly negligible I=
 would
> =A0 =A0 * think, and could potentially be optimized out using a smart com=
piler.
> =A0 =A0 */
> =A0 =A0int (*pkg_file_comparator) (const struct *package_archive, const c=
har *);
> };
>
> =A0 =A0I'd need to develop new and free `methods', but that's the general
> idea I had in mind.
> Thanks!
> -Garrett
>