From: Tim Kientzle
Date: Tue, 10 Jul 2012 21:47:13 -0700
To: Ryan Stone
Cc: freebsd-arch@freebsd.org
Subject: Re: Generating a tarball directly from make installworld
Message-Id: <25149679-6B99-4FF0-AB8C-90D5A7880F00@kientzle.com>

On Jul 10, 2012, at 8:17 PM, Ryan Stone wrote:

> The other problem that I have is performance.  When bsdtar appends to
> a tar file, it iterates over every entry in the tar to figure out
> where the end of it is.  I gather that this is to get rid of padding
> but I'm not entirely sure.

It is unfortunately necessary if you want to append to an existing file.
In short: The tar format wasn't designed for appending, and the
append option of the standard tar command is a 30-year-old hack.

But I wrote a better approach into libarchive and bsdtar a few
years ago.  See below.

> Even if this isn't necessary I still have
> to iterate over the entire file in most cases.  The problem is in the
> sloppy semantics of ln and install: install foo bar means "install foo
> to path bar/foo" if bar is a directory, but "install foo to path bar"
> if bar is a regular file or it doesn't exist (symlinks add an extra
> layer of complexity).  In order to implement this correctly, I have to
> iterate over the tar to figure out what type of file bar is, every
> time that install or ln is invoked.

The sloppy semantics are indeed a problem, and I hadn't considered
this before.  I fear the only answer might be to fix the Makefiles
so they don't rely on this.  (Fortunately, most of the install and ln
invocations are built from just a few places, so it might not be
necessary to change very many places to fix it.)

> I know that a lot of people have suggested generating an mtree file
> and then converting the mtree file into a tarball, but I admit that
> it's not at all clear to me how to generate the mtree file.

I can definitely help with this, since I had this exact use in mind
when I originally built that part of libarchive.

First, there are actually two different variants of mtree format.
The one supported by FreeBSD's mtree is the older one.  It's very
pretty with all that indentation but not particularly amenable to
this kind of task.

The interesting one is a newer variant supported by NetBSD's mtree
and also supported by libarchive.  In the newer mtree variant, each
line is completely self-contained, e.g.,

    /bin/ls group=wheel user=root mode=0755

Such files can be easily combined (just append them together),
can be appended to via "echo spec >>file", etc.

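As a rough sketch of the "append and combine" property just described, something like the following would do. The helper name mtree_line, its argument order, and the temporary directory are made up for illustration; only the line format itself comes from the example above. It also assumes paths without spaces or other characters that mtree would need to escape.

```sh
#!/bin/sh
# Hypothetical helper (not part of install, ln, or bsdtar): print one
# self-contained mtree line in the "path keyword=value ..." form.
# Assumption: the path contains no characters that need mtree escaping.
mtree_line() {
    # $1 = path, $2 = user, $3 = group, $4 = mode
    printf '%s user=%s group=%s mode=%s\n' "$1" "$2" "$3" "$4"
}

# Scratch directory standing in for the object tree (illustrative only).
workdir=$(mktemp -d)

# Each build directory appends to its own spec file ...
mtree_line /bin/ls  root wheel 0755 >> "$workdir/ls.dist-mtree"
mtree_line /bin/cat root wheel 0755 >> "$workdir/cat.dist-mtree"

# ... and because every line stands alone, combining them is just cat.
cat "$workdir"/*.dist-mtree > "$workdir/combined.mtree"
cat "$workdir/combined.mtree"
```

Nothing here depends on the order in which the spec files are written or concatenated, which is the point: no per-invocation scan of an existing archive is needed.
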
Libarchive extends this further by adding a "contents" keyword, e.g.,

    /bin/ls user=root group=wheel mode=0755 contents=/usr/obj/usr/src/bin/ls/ls

When libarchive reads this line, it returns a file description that has:
 * The specified name
 * The specified properties
 * The specified contents
 * (other properties --- including file size --- are taken from the
   contents file)

So, my idea was that 'install' or 'ln' could write a line like the
above to

    /usr/obj/usr/src/bin/ls/ls.dist-mtree

and at the end you could pull all those together and build a tarball
in a single fast pass:

    find . -name '*.dist-mtree' | xargs cat | bsdtar czf distfile.tgz @-

I wrote "man 5 mtree" to attempt to develop a single consistent
description of both mtree variants.  Ignore the mention of a
signature; that was misguided wishful thinking on my part as I
wrestled with how to teach libarchive to automatically recognize
mtree files.  Michihiro fortunately figured out a better way to do
that.

The '@-' here is a bsdtar extension that reads an archive from stdin
and appends the entries from that archive to the archive being
created.  For even more fun, you can install directly from the mtree
descriptions:

    find . -name '*.dist-mtree' | xargs cat | bsdtar xf -

Another nice trick with this extended mtree format: it's relatively
easy to use tools like grep and sed to filter the mtree description,
so you could play with having makewhatis or kldxref read from an
archive and then let libarchive unpack directly from mtree for you,
e.g.,

    find . -name '*.dist-mtree' | xargs cat | grep '/man/' | makewhatis --read-from-stdin-archive

Let me know if I can help.  As I said, I had this exact application
in mind when I built this support into libarchive and bsdtar.  If
there are additional tweaks that would help, I'll see what I can do.

Tim