From owner-freebsd-arch@FreeBSD.ORG  Wed Jan 14 17:23:46 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4F4F116A4CF; Wed, 14 Jan 2004 17:23:46 -0800 (PST)
Received: from kientzle.com (h-66-166-149-50.SNVACAID.covad.net
	[66.166.149.50])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 0383943D70; Wed, 14 Jan 2004 17:23:43 -0800 (PST)
	(envelope-from kientzle@acm.org)
Received: from acm.org ([66.166.149.54])
	by kientzle.com (8.12.9/8.12.9) with ESMTP id i0F1NgkX076018;
	Wed, 14 Jan 2004 17:23:42 -0800 (PST)
	(envelope-from kientzle@acm.org)
Message-ID: <4005EB9D.50506@acm.org>
Date: Wed, 14 Jan 2004 17:23:41 -0800
From: Tim Kientzle <kientzle@acm.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4) Gecko/20031006
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Tim Robbins <tjr@freebsd.org>
References: <4004D445.7020205@acm.org>
	<20040114234829.GA19067@cat.robbins.dropbear.id.au>
In-Reply-To: <20040114234829.GA19067@cat.robbins.dropbear.id.au>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-arch@freebsd.org
Subject: Re: Request for Comments: libarchive, bsdtar
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: kientzle@acm.org
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 15 Jan 2004 01:23:46 -0000

Tim Robbins wrote:
> On Tue, Jan 13, 2004 at 09:31:49PM -0800, Tim Kientzle wrote:
> 
>>Request for Comments:  libarchive, bsdtar
>>
>>Add "libarchive" to the tree, prepare to change the system
>>tar command to "bsdtar" once it is sufficiently stable.
> 
> [...]
> 
> Let me start by thanking you for working on replacing GNU utilities with
> higher quality and less restrictively licensed alternatives. I haven't
> had time to read over the code very thoroughly, but I have a few initial
> comments:

Thanks for the feedback.  A lot of people rely on 'tar',
so I want to make sure it's well-tested and does what
people really need before it becomes the default.  When
you do have time to look over the code, please let me
know what you think.

> - Padding gzip'd tar archives (with bsdtar czf) causes gzip to report
>   "trailing garbage" and fail, and in turn this causes GNU tar to fail.

Oddly, GNU tar does successfully and correctly extract the archive,
and then exits with an error code.  There's an easy one-line
patch that fixes this bug in GNU tar, by the way.  ;-)

>   BSD pax (-wzf) and GNU tar (czf) do not pad compressed archives.

The issue here is correct blocking for devices that require it
(e.g., tape drives and floppies).  libarchive correctly blocks all
output, regardless of whether or not it is compressed.  Neither
GNU tar nor BSD pax guarantees this.
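A minimal sketch of what "correctly blocks all output" means, in C. It assumes a hypothetical reblocking stage (blocker_write(), blocker_close(), and a BLOCK_SIZE of 16 bytes chosen only for readability) sitting between the compressor and the device; these names are illustrative, not libarchive's internals. Every write that reaches the sink is exactly one block, and the final partial block is zero-padded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size.  tar output traditionally uses 10240-byte
 * blocks (20 records of 512 bytes); 16 keeps the example small. */
#define BLOCK_SIZE 16

/* Receives whole blocks only; stands in for write(2) on a tape device. */
typedef void emit_fn(void *ctx, const char *block, size_t len);

struct blocker {
    char buf[BLOCK_SIZE];
    size_t used;            /* bytes currently buffered */
    emit_fn *emit;
    void *ctx;
};

/* Accept arbitrary-sized writes (e.g. compressor output) and pass
 * them on in exact BLOCK_SIZE units. */
static void blocker_write(struct blocker *b, const char *data, size_t len)
{
    while (len > 0) {
        assert(b->used < BLOCK_SIZE);
        size_t n = BLOCK_SIZE - b->used;
        if (n > len)
            n = len;
        memcpy(b->buf + b->used, data, n);
        b->used += n;
        data += n;
        len -= n;
        if (b->used == BLOCK_SIZE) {
            b->emit(b->ctx, b->buf, BLOCK_SIZE);
            b->used = 0;
        }
    }
}

/* Zero-pad the final partial block.  When this stage sits after the
 * compressor, the padding lands after the end of the gzip stream. */
static void blocker_close(struct blocker *b)
{
    if (b->used > 0) {
        memset(b->buf + b->used, 0, BLOCK_SIZE - b->used);
        b->emit(b->ctx, b->buf, BLOCK_SIZE);
        b->used = 0;
    }
}

/* Example sink that records what it was handed. */
struct block_stats { size_t writes, bytes, odd_sized; };

static void count_blocks(void *ctx, const char *block, size_t len)
{
    struct block_stats *s = ctx;
    (void)block;
    s->writes++;
    s->bytes += len;
    if (len != BLOCK_SIZE)
        s->odd_sized++;
}
```

Writing 21 bytes in two uneven pieces yields two full 16-byte blocks: one filled by data, one holding 5 data bytes plus 11 zero bytes of padding. That trailing padding, placed after the compressed stream, is what gzip complains about.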

It goes a bit deeper in the case of libarchive.  By design,
libarchive knows nothing about the archive storage.  This means
there is no simple way for it to vary its operation depending
on whether it's writing to a file or a character device, unlike
monolithic programs such as GNU tar or BSD pax.

I have some ideas about how to change this by generalizing the
blocking calculations within libarchive and providing some
client hooks for finer control over the blocking, but I haven't
decided whether or not it's worth the effort.

Somehow, though, I doubt you'll be the last person to complain about
this ;-), so I'll start looking for a good way to change this
behavior.

> - I would prefer it if compression was done by opening a pipe to gzip/bzip2
>   instead of using libz/libbz2. This would make things simpler, and make it
>   easier to support compress(1).

Not really simpler for the library, and definitely not simpler
for clients of the library.

This is related to the blocking issue I mentioned just above.
In order to correctly block the output, you need to collect the
output of the compression program and reblock it.  An early version
of libarchive did exactly this, forking a three-stage pipeline with
the compression/decompression program in the middle.  Unfortunately,
this created some odd problems, as the archive I/O then occurred
in a separate process from the rest of the program.  For example,
this made it difficult for clients to monitor the I/O status
from their mainline code, and hampered proper error reporting.

It also seemed inappropriate for a library to be invoking
client-provided callbacks in a different process.
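A rough sketch of that three-stage arrangement, assuming POSIX pipe(), fork(), and execlp(). filter_through() and its helpers are hypothetical and are not the old libarchive code. Notice that the writer child's I/O errors reach the caller only as an exit status collected by waitpid(), which illustrates the error-reporting problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void close_pair(int p[2]) { close(p[0]); close(p[1]); }

/* Wait for a child; true only if it exited normally with status 0. */
static int child_ok(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* writer child -> prog (e.g. "gzip") -> this process.
 * Returns bytes read into out, or -1 on any failure (including
 * output larger than outcap, which kills prog with SIGPIPE). */
static ssize_t filter_through(const char *prog, const char *in, size_t inlen,
                              char *out, size_t outcap)
{
    int to_filter[2], from_filter[2];
    if (pipe(to_filter) == -1)
        return -1;
    if (pipe(from_filter) == -1) {
        close_pair(to_filter);
        return -1;
    }

    pid_t filter = fork();              /* middle stage */
    if (filter == -1) {
        close_pair(to_filter);
        close_pair(from_filter);
        return -1;
    }
    if (filter == 0) {
        dup2(to_filter[0], STDIN_FILENO);
        dup2(from_filter[1], STDOUT_FILENO);
        close_pair(to_filter);
        close_pair(from_filter);
        execlp(prog, prog, (char *)NULL);
        _exit(127);                     /* exec failed */
    }

    pid_t writer = fork();              /* first stage */
    if (writer == -1) {
        close_pair(to_filter);
        close_pair(from_filter);
        child_ok(filter);               /* filter sees EOF and exits */
        return -1;
    }
    if (writer == 0) {
        close(to_filter[0]);
        close_pair(from_filter);
        size_t off = 0;
        while (off < inlen) {
            ssize_t w = write(to_filter[1], in + off, inlen - off);
            if (w <= 0)
                _exit(1);               /* parent only sees the status */
            off += (size_t)w;
        }
        _exit(0);                       /* closing the pipe gives prog EOF */
    }

    /* Last stage: this process only reads the filter's output. */
    close_pair(to_filter);
    close(from_filter[1]);
    size_t got = 0;
    ssize_t n;
    while (got < outcap &&
           (n = read(from_filter[0], out + got, outcap - got)) > 0)
        got += (size_t)n;
    close(from_filter[0]);

    int ok_writer = child_ok(writer);
    int ok_filter = child_ok(filter);
    return (ok_writer && ok_filter) ? (ssize_t)got : -1;
}
```

For instance, filter_through("gzip", data, len, out, cap) would return the compressed bytes, which would still need a separate reblocking pass before reaching a tape drive.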

However, each compression type is handled in a cleanly factored
code module, and I do still have the code in my personal CVS repo to
fork out the pipeline.  I could resurrect this to fork compress(1)
if there's real demand.

> - I don't think the URL/libfetch support belongs in a library that deals
>   with archives. Perhaps the interface could be changed so that the
>   caller could pass a FILE * or a file descriptor instead of a filename.

The libfetch tie-in (archive_read_open_url) is provided purely
for the convenience of simple clients.  If you don't like it,
don't use it.  It is completely optional.  Generally, I've gone
to a great deal of effort to minimize link pollution.  For
example, if you don't call the functions that handle gzip/bzip2
compression, they won't be linked in and neither will libz/libbz2.
Similar comments apply to the various format support functions.
I've even carefully separated archive reading and writing
in case you only want to use one of them.
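One way to get that "only pay for what you call" behavior is explicit registration. The C sketch below uses hypothetical names (support_gzip(), reader_register(), reader_detect()), not libarchive's real functions. The core only walks a table of registered handlers and never names a specific handler, so in a static library whose handlers each live in their own object file, the linker pulls in a handler (and libz/libbz2) only when the program calls the matching support function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8

/* A hypothetical decompression handler: a name plus a "bid" function
 * that reports whether it recognizes the start of the input. */
typedef int bid_fn(const unsigned char *buf, size_t len);

struct handler {
    const char *name;
    bid_fn *bid;
};

struct reader {
    struct handler handlers[MAX_HANDLERS];
    int count;
};

/* Core: stores and walks whatever was registered; it references no
 * specific handler, so it creates no link-time dependency on any. */
static int reader_register(struct reader *r, const char *name, bid_fn *bid)
{
    if (r->count >= MAX_HANDLERS)
        return -1;
    r->handlers[r->count].name = name;
    r->handlers[r->count].bid = bid;
    r->count++;
    return 0;
}

static const char *reader_detect(const struct reader *r,
                                 const unsigned char *buf, size_t len)
{
    for (int i = 0; i < r->count; i++)
        if (r->handlers[i].bid(buf, len))
            return r->handlers[i].name;
    return NULL;
}

/* Optional handlers.  In a static library each pair would sit in its
 * own object file, alongside any calls into libz or libbz2. */
static int gzip_bid(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;  /* gzip magic */
}

static int support_gzip(struct reader *r)
{
    return reader_register(r, "gzip", gzip_bid);
}

static int bzip2_bid(const unsigned char *buf, size_t len)
{
    return len >= 3 && memcmp(buf, "BZh", 3) == 0;        /* bzip2 magic */
}

static int support_bzip2(struct reader *r)
{
    return reader_register(r, "bzip2", bzip2_bid);
}
```

A bzip2 stream goes unrecognized until support_bzip2() is called, which mirrors how unused format support never enters the program.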

As for I/O interfaces, the core archive_read_open and
archive_write_open functions accept a collection of function
pointers that the library will invoke for open/read/write/close
operations on the archive.  This is considerably more
flexible than FILE * or file descriptors.
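To illustrate the callback style, here is a simplified C sketch. The type and function names (open_cb, read_cb, arch_read_open(), mem_read(), and so on) are hypothetical and are not libarchive's actual declarations. The library never touches a descriptor or FILE *; it just asks the client for the next buffer, so the same code can read from memory, a tape, or a network stream.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback types. */
typedef int     open_cb(void *client_data);
typedef ssize_t read_cb(void *client_data, const void **buffer);
typedef int     close_cb(void *client_data);

struct arch_reader {
    void *client_data;
    open_cb *opener;
    read_cb *reader;
    close_cb *closer;
};

/* Library side: remember the client's callbacks, then call its opener. */
static int arch_read_open(struct arch_reader *a, void *client_data,
                          open_cb *opener, read_cb *reader, close_cb *closer)
{
    a->client_data = client_data;
    a->opener = opener;
    a->reader = reader;
    a->closer = closer;
    return opener != NULL ? opener(client_data) : 0;
}

/* Library side: pull buffers until the client reports EOF (0) or an
 * error (<0).  Counting bytes stands in for real archive parsing. */
static ssize_t arch_read_all(struct arch_reader *a)
{
    ssize_t total = 0, n;
    const void *buf;
    while ((n = a->reader(a->client_data, &buf)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}

static int arch_read_close(struct arch_reader *a)
{
    return a->closer != NULL ? a->closer(a->client_data) : 0;
}

/* Client side: an in-memory "archive" handed out in fixed-size chunks. */
struct mem_source {
    const char *data;
    size_t len, pos, chunk;
    int opened, closed;
};

static int mem_open(void *cd)
{
    struct mem_source *m = cd;
    m->pos = 0;
    m->opened = 1;
    return 0;
}

static ssize_t mem_read(void *cd, const void **buffer)
{
    struct mem_source *m = cd;
    size_t n = m->len - m->pos;
    if (n > m->chunk)
        n = m->chunk;
    *buffer = m->data + m->pos;
    m->pos += n;
    return (ssize_t)n;
}

static int mem_close(void *cd)
{
    ((struct mem_source *)cd)->closed = 1;
    return 0;
}
```

Because the client decides how much each read callback returns, it also controls the I/O size, which is exactly what stdio does not expose.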

Not to mention that passing file descriptors has
some tricky implications if the library forks to run
archive I/O in a separate process.  FILE * is simply
a bad idea because the stdio interface doesn't provide client
control over blocking.  (Yes, the libfetch convenience
hooks do use FILE *, but blocking is unimportant
for sockets, so that's okay.)

> - Filenames are too long :-)

Take a typing class. ;-)