From owner-p4-projects@FreeBSD.ORG  Wed Apr 14 08:00:20 2010
Return-Path: <owner-p4-projects@FreeBSD.ORG>
Delivered-To: p4-projects@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 2386D1065678; Wed, 14 Apr 2010 08:00:20 +0000 (UTC)
Delivered-To: perforce@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C2CD91065673;
	Wed, 14 Apr 2010 08:00:19 +0000 (UTC)
	(envelope-from yanegomi@gmail.com)
Received: from mail-iw0-f171.google.com (mail-iw0-f171.google.com
	[209.85.223.171])
	by mx1.freebsd.org (Postfix) with ESMTP id 5597C8FC1F;
	Wed, 14 Apr 2010 08:00:19 +0000 (UTC)
Received: by iwn1 with SMTP id 1so3168131iwn.27
	for <multiple recipients>; Wed, 14 Apr 2010 01:00:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:sender:received:in-reply-to
	:references:date:x-google-sender-auth:received:message-id:subject
	:from:to:cc:content-type:content-transfer-encoding;
	bh=u+BErZw6CTxCYLU3v4bff5Q4EwNLg0pqtNLpCOuibTM=;
	b=bTtvmkKC5Z7DX3FUcufv035o9WzQnHlfDX3am+jhhjv2WpIxbTtxkBrZtturOHNa+f
	EAwSLhP4VRAFiFlooooRJP8N1HZ7FiBq7Qpta2tht4NbDhbgKEKCni4hmiS276oQ2dmm
	UKHK+C15wAZRPWjj34hNUkxcjy7qZ4F2Dk1LE=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	b=EEHfZVrFH6eYomiw5STTdeJZjofEhgSzTr+Sdwob7XUmKVkv1sbnDFwo3jEcr49o1p
	t7d28bKVqWxh8HJpDKNiIHkc9jOLo39QQIiIeMIZTEiLxs9W+cs33q2MeRHy5W9cbz7U
	LPrNX0Ugz3b8nz23VeSgdJmY+2XGBqyN1+uhQ=
MIME-Version: 1.0
Sender: yanegomi@gmail.com
Received: by 10.231.183.17 with HTTP; Wed, 14 Apr 2010 01:00:18 -0700 (PDT)
In-Reply-To: <g2v364299f41004140056pee1cde28laee26c8d5d1c9d19@mail.gmail.com>
References: <201004121230.o3CCUsIX029146@repoman.freebsd.org>
	<4BC3EB5B.5070801@freebsd.org>
	<l2y364299f41004122249q2f5734a7j89be807581e42dac@mail.gmail.com>
	<4BC53714.80805@freebsd.org>
	<r2x364299f41004140053y7b43d454qb7804e111d1920a6@mail.gmail.com>
	<g2v364299f41004140056pee1cde28laee26c8d5d1c9d19@mail.gmail.com>
Date: Wed, 14 Apr 2010 01:00:18 -0700
X-Google-Sender-Auth: 096e76fd92cc7a42
Received: by 10.231.151.129 with SMTP id c1mr3173584ibw.59.1271232018444; Wed, 
	14 Apr 2010 01:00:18 -0700 (PDT)
Message-ID: <v2j364299f41004140100gdd528438j17ef6ae6d2962cae@mail.gmail.com>
From: Garrett Cooper <gcooper@FreeBSD.org>
To: Garrett Cooper <gcooper@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: David Forsythe <dforsyth@freebsd.org>, Florent Thoumie <flz@esat.net>,
	Tim Kientzle <kientzle@freebsd.org>,
	Perforce Change Reviews <perforce@freebsd.org>
Subject: Re: PERFORCE change 176831 for review
X-BeenThere: p4-projects@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: p4 projects tree changes <p4-projects.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/p4-projects>,
	<mailto:p4-projects-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/p4-projects>
List-Post: <mailto:p4-projects@freebsd.org>
List-Help: <mailto:p4-projects-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/p4-projects>,
	<mailto:p4-projects-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Apr 2010 08:00:20 -0000

And yet one more clerical error after I reread this...

On Wed, Apr 14, 2010 at 12:56 AM, Garrett Cooper <gcooper@freebsd.org> wrot=
e:
> On Wed, Apr 14, 2010 at 12:53 AM, Garrett Cooper <gcooper@freebsd.org> wr=
ote:
>> On Tue, Apr 13, 2010 at 8:31 PM, Tim Kientzle <kientzle@freebsd.org> wro=
te:
>>>>> ... I would also put in a simple verification
>>>>> that the entry you found is a regular entry:
>>>>> =A0archive_entry_filetype(entry) =3D=3D AE_IFREG
>>>>
>>>> Ok. What does a regular entry correspond to (I assume not a regular fi=
le)?
>>>
>>> Sorry, "regular entry" =3D=3D "regular file" (or at least, what
>>> will become a regular file once you write it to disk, if you
>>> want to be really pedantic about it ;-). =A0Coincidentally,
>>> AE_IFREG =3D=3D S_IFREG. =A0The separate AE_IFxxx constants
>>> are there mostly portability shims (e.g., Windows doesn't
>>> have block devices, so doesn't define S_IFBLK,
>>> but AE_IFBLK is always available).
>>
>> Ok. I think I got everything you were trying to say before, but just
>> to be clear:
>>
>> To prevent race conditions [with regular files only], I should use
>> open(2). For all other files I would use the standard extract method
>> (so I don't get too fancy?). It seems like the file creation times
>> should be sufficiently fast with other file types that this kind of
>> behavior of open(2), blah wouldn't be required.
>>
>>>>>> + * NOTE: the exit code is 0 / 1 so that this can be fed directly in=
to
>>>>>> exit
>>>>>> + * when doing piped tar commands for copying hierarchies *hint*,
>>>>>> *hint*.
>>>>>
>>>>> Why do you want to copy hierarchies?
>>>>> Seems a waste of disk bandwidth. =A0*hint* *hint* ;-)
>>>>
>>>> There's a fair amount of tar cpf - -C <dir_a> . | tar xpf - -C <dir_b>
>>>> in libinstall . This is being done to preserve file attributes,
>>>> fflags, modes, ownership, permissions, etc. ... Installing directly to
>>>> hierarchies works in theory as well, but it's
>>>> considerably more dangerous unless you do an information transfer
>>>> between the pkg contents and the install base, and unfortunately that
>>>> is tough to achieve with 100% clarity from what I've seen.
>>>
>>> That's exactly what I was getting to: =A0Someday, pkg_add
>>> should write the files directly to their final location
>>> and avoid the copy. =A0(You're exactly right that you
>>> don't want to build a "copy tree" facility.)
>>>
>>> Clearly, direct install is not something you want to
>>> tackle in this stage of the rewrite. =A0There's a lot of
>>> gains from exactly what you're doing.
>>
>> Yeah, I've been thinking about what you said... and there's ultimately
>> nothing to be gained through a staged install (in fact there's a lot
>> lost). So unless two files overwrite one another (which suggest broken
>> or conflicting packages, or not so clever users), people shouldn't be
>> installing conflicting packages or partially installing multiple
>> versions of the same package unless they're using -f, and if they're
>> doing that, they're basically damning the consequences of doing so.
>>
>> Until we have functional tests in place to actually test this though,
>> I need to stick with a conservative, functional analog to piped
>> tar(1)'ing though.
>>
>>> But I really do believe that single-pass direct
>>> install is feasible and is eventually where we want
>>> to be. =A0The key insight for me was when Florent recently
>>> pointed out that you could just read the +CONTENTS,
>>> then do a verify pass, then extract everything.
>>> (Any conflict during the extraction pass
>>> would be a fatal error: =A0delete everything
>>> extracted so far and scream loudly.)
>>
>> Agreed on the final point. I'm kind of interested though about the
>> first point -- the verify pass -- are you verifying that the contents
>> are sane on the disk or in the tarball? If so, why not just verify the
>> contents after the fact [with mtree -- it's has checksumming, mode and
>> permissions checking, you name it... everything minus some special
>> facilities like ACLs, fflags, etc -- I guess I'll need to add those],
>> then roll everything back if it isn't?
>>
>> One the entire operation is atomic and exclusive -- only one package
>> operation is allowed to run at any given time on a particular package,
>> running into this problem with the packaging software will be a
>> non-issue, but instead a problem with an external source (user, etc),
>> or a bad package.
>>
>>> Dealing with dependent packages is still
>>> a little tricky but I think it's all doable.
>>
>> Yeah, there's a crapload of overlap between multiple packages, like
>> cad-salome, and with some packages where you're need to build up a
>> hierarchy that's common between several packages, it becomes a royal
>> pain in the ass, but it's required otherwise stuff won't get setup and
>> torn down properly.
>>
>> So I'm guessing for right now we should just avoid squealing loudly
>> except with regular files; maybe whining a bit with non-regular files
>> would be a good idea though because it would force people to clean up
>> packages, and then things could be completely fixed later on so that
>> they're hard errors.
>>
>>>> I saw that in the original comments and while I think that you have a
>>>> point, I still am leery about someone specifying a package with
>>>> contents that start with + because it immediately breaks the
>>>> assumption. ...
>>>
>>> Unless I'm mistaken, pkg_add has used "+*" to identify
>>> metadata files for a long time, so I think your own
>>> argument from compatibility might be against you here.
>>> (Though I suspect that you're right in the long term that
>>> we should be obeying the +CONTENTS declarations instead of
>>> using filename conventions.)
>>
>> I still want to avoid this potential corner case. Users have an
>> incapability to surprise developers, and I don't want to have to deal
>
> s/in/unerring /
>
>> with this item in the future. A few less CPU cycles processing strings
>> is worth sacrificing over trying to be clever and producing a broken
>> program.
>>
>>>> unpack_file, blah with the appropriate metadata filenames considering
>>>> that it's a lot quicker than having to go through the entire tarball
>>>> end to end (especially if it's a large tarfile like openoffice),
>>>> unless someone unfortunately put the files at the end of the tarball
>>>> (in that case they do need to modify how the package is created).
>>>
>>> As for the speed argument: =A0Do you really need to
>>> extract the metadata files before you extract
>>> everything else? =A0Those files are going to a
>>> different place, but that's not an issue with a
>>> libarchive-driven extraction. =A0At one point, I had
>>> convinced myself that you could do it all in one
>>> pass, just writing the files to different places
>>> depending on whether they were metadata files
>>> (however identified) or not. =A0I recall, for example,
>>> that the +MTREE file needs to be used only at the
>>> end of the extraction (to fix-up permissions).
>>
>> Yes. pkg_info's primary uses (not -W) read the metadata if the
>> metadata isn't on the disk already in $PKGDBDIR; it's an optimization,
>> and I don't want to reduce performance in pkg_info.
>>
>> Other than that, pkg_install really doesn't care one iota about those
>> files (minus +CONTENTS).

+CONTENTS, +REQUIRE, and +REQUIRED_BY; the other files are used at
various times during the entire lifetime of installing a package
(+INSTALL, +MTREE_DIRS, +POST-DEINSTALL, etc) or inspecting a package
(+COMMENTS, +DESC, etc), but they're all fairly late in the process
and not nearly as critical as +CONTENTS. If +CONTENTS is gone, game
over... you're just dealing with a tarball.

>>>> Thanks again for the comments :)... it would be helpful BTW if the
>>>> manpages linked together because archive_read(3) doesn't reference the
>>>> archive_read_disk(3) APIs, etc, so I kind of have to fish for the
>>>> right APIs to use, and I feel like I'm catching more boots than fish
>>>> sometimes... but it's a learning experience and I don't expect to get
>>>> it right the first time.
>>>
>>> Please let me know any doc shortcomings; the documentation
>>> could use a lot of work, I know. =A0I also have a growing
>>> Examples page on the libarchive Wiki and conversations
>>> like this help me a lot to better understand what needs
>>> to go there.
>>
>> Will definitely help whenever I can :).
>>
>> Also, getting back to a topic you mentioned earlier, the issue with
>> creating the decompressor multiple times can be alleviated like so:
>>
>> Make the basic extract API follow a format like so:
>>
>> struct package_archive;
>>
>> struct package_archive {
>> =A0 =A0/*
>> =A0 =A0 * File descriptor for the archive. Set to NULL if you don't want
>> to reuse the
>> =A0 =A0 * descriptor, i.e. want to only use the initializer once.
>> =A0 =A0 */
>> =A0 =A0int *pkg_fd;
>> =A0 =A0/*
>> =A0 =A0 * Compare a package filename vs an expression.
>> =A0 =A0 *
>> =A0 =A0 * Could be a small wrapper above fnmatch (for simple glob expres=
sions),
>> =A0 =A0 * strncmp (for exact matches -- would be set to PATH_MAX),
>> glob(3), or it could
>> =A0 =A0 * always return 0 (for match all). Would return 0 if the files
>> matched, non-zero
>> =A0 =A0 * if they didn't match.
>> =A0 =A0 *
>> =A0 =A0 * This would help us avoid hardcoding and at the small cost of a=
 stack
>> =A0 =A0 * push, pop, the additional overhead would be fairly negligible =
I would
>> =A0 =A0 * think, and could potentially be optimized out using a smart co=
mpiler.
>> =A0 =A0 */
>> =A0 =A0int (*pkg_file_comparator) (const struct *package_archive, const =
char *);
>> };
>>
>> =A0 =A0I'd need to develop new and free `methods', but that's the genera=
l
>> idea I had in mind.
>> Thanks!
>> -Garrett
>>
>