Date: Wed, 14 Jan 2004 17:23:41 -0800
From: Tim Kientzle <kientzle@acm.org>
To: Tim Robbins <tjr@freebsd.org>
Cc: freebsd-arch@freebsd.org
Subject: Re: Request for Comments: libarchive, bsdtar
Message-ID: <4005EB9D.50506@acm.org>
In-Reply-To: <20040114234829.GA19067@cat.robbins.dropbear.id.au>
References: <4004D445.7020205@acm.org> <20040114234829.GA19067@cat.robbins.dropbear.id.au>
Tim Robbins wrote:
> On Tue, Jan 13, 2004 at 09:31:49PM -0800, Tim Kientzle wrote:
>
>>Request for Comments: libarchive, bsdtar
>>
>>Add "libarchive" to the tree, prepare to change the system
>>tar command to "bsdtar" once it is sufficiently stable.
>
> [...]
>
> Let me start by thanking you for working on replacing GNU utilities with
> higher quality and less restrictively licensed alternatives. I haven't
> had time to read over the code very thoroughly, but I have a few initial
> comments:

Thanks for the feedback.  A lot of people rely on 'tar', so I want
to make sure it's well-tested and does what people really need
before it becomes the default.  When you do have time to look over
the code, please let me know what you think.

> - Padding gzip'd tar archives (with bsdtar czf) causes gzip to report
>   "trailing garbage" and fail, and in turn this causes GNU tar to fail.

Oddly, GNU tar does successfully and correctly extract the archive,
and then exits with an error code.  There's an easy one-line patch
that fixes this bug in GNU tar, by the way. ;-)

> - BSD pax (-wzf) and GNU tar (czf) do not pad compressed archives.

The issue here is correct blocking for devices that require it.
(E.g., tape drives, floppies.)  libarchive correctly blocks all
output, regardless of whether or not it is compressed.  Neither GNU
tar nor BSD pax guarantees this.

It goes a bit deeper in the case of libarchive.  By design,
libarchive knows nothing about the archive storage.  This means
there is no simple way for it to vary its operation depending on
whether it's writing to a file or a character device, unlike
monolithic programs such as GNU tar or BSD pax.  I have some ideas
about how to change this by generalizing the blocking calculations
within libarchive and providing some client hooks for finer control
over the blocking, but I haven't decided whether or not it's worth
the effort.
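To make the blocking point concrete, here is a minimal C sketch of
why padded output produces gzip's "trailing garbage" complaint: a
blocking writer must round the final short write up to a full block
with zero bytes, and those zeros land after the gzip trailer.  The
names BLOCKSIZE and pad_to_block are mine for illustration, not
libarchive's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Classic tar blocking: 20 records of 512 bytes each. */
#define BLOCKSIZE 10240

/*
 * Append zero bytes to buf so that len becomes a multiple of
 * BLOCKSIZE, as required by tape drives and other devices that only
 * accept fixed-size writes.  Returns the number of bytes appended.
 * buf must have room for up to BLOCKSIZE - 1 extra bytes.
 */
static size_t
pad_to_block(char *buf, size_t len)
{
	size_t remainder = len % BLOCKSIZE;
	size_t pad = (remainder == 0) ? 0 : BLOCKSIZE - remainder;

	memset(buf + len, 0, pad);	/* zeros follow the gzip trailer */
	return (pad);
}
```

A decompressor that keeps reading past the gzip trailer sees those
trailing zeros as "garbage" even though the compressed stream itself
is intact, which matches the GNU tar behavior described above.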
Somehow, though, I doubt you'll be the last person to complain about
this ;-), so I'll start looking for a good way to change this
behavior.

> - I would prefer it if compression was done by opening a pipe to gzip/bzip2
>   instead of using libz/libbz2. This would make things simpler, and make it
>   easier to support compress(1).

Not really simpler for the library, and definitely not simpler for
clients of the library.  This is related to the blocking issue I
mentioned just above.  In order to correctly block the output, you
need to collect the output of the compression program and reblock it.

An early version of libarchive did exactly this, forking a
three-stage pipeline with the compression/decompression program in
the middle.  Unfortunately, this created some odd problems, as the
archive I/O then occurred in a separate process from the rest of the
program.  For example, this made it difficult for clients to monitor
the I/O status from their mainline code, and hampered proper error
reporting.  It also seemed inappropriate for a library to be
invoking client-provided callbacks in a different process.

However, each compression type is handled in a cleanly-factored code
module, and I do still have the code in my personal CVS repo to fork
out the pipeline.  I could resurrect this to fork compress(1) if
there's real demand.

> - I don't think the URL/libfetch support belongs in a library that deals
>   with archives. Perhaps the interface could be changed so that the
>   caller could pass a FILE * or a file descriptor instead of a filename.

The libfetch tie-in (archive_read_open_url) is provided purely for
the convenience of simple clients.  If you don't like it, don't use
it.  It is completely optional.

Generally, I've gone to a great deal of effort to minimize link
pollution.  For example, if you don't call the functions that handle
gzip/bzip2 compression, they won't be linked in and neither will
libz/libbz2.  Similar comments apply to the various format support
functions.
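The pipeline approach discussed above can be sketched in portable
POSIX C: fork gzip(1) between the writer and its output, then
collect the compressed bytes back so they can be reblocked.  This
is my own illustrative sketch, not libarchive's retired code; the
function name gzip_filter is hypothetical, and it assumes gzip is
on the PATH.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Feed len bytes of `in` through "gzip -c" and collect up to outcap
 * compressed bytes into `out`.  Returns the compressed length, or
 * -1 on error.  Note: writing all input before reading any output
 * only works for small inputs; a real implementation must interleave
 * the two (or use a third process) to avoid pipe deadlock.
 */
ssize_t
gzip_filter(const char *in, size_t len, char *out, size_t outcap)
{
	int to_gzip[2], from_gzip[2];
	pid_t pid;
	ssize_t n, total = 0;

	if (pipe(to_gzip) < 0 || pipe(from_gzip) < 0)
		return (-1);
	if ((pid = fork()) < 0)
		return (-1);
	if (pid == 0) {
		/* Child: wire the pipes to stdin/stdout, exec gzip. */
		dup2(to_gzip[0], STDIN_FILENO);
		dup2(from_gzip[1], STDOUT_FILENO);
		close(to_gzip[0]); close(to_gzip[1]);
		close(from_gzip[0]); close(from_gzip[1]);
		execlp("gzip", "gzip", "-c", (char *)NULL);
		_exit(127);
	}
	close(to_gzip[0]);
	close(from_gzip[1]);
	if (write(to_gzip[1], in, len) < 0)
		return (-1);
	close(to_gzip[1]);	/* EOF tells gzip to flush and exit */
	while ((n = read(from_gzip[0], out + total,
	    outcap - (size_t)total)) > 0)
		total += n;
	close(from_gzip[0]);
	waitpid(pid, NULL, 0);
	return (total);
}
```

Even this toy version shows the structural problem: the bytes come
back in whatever chunk sizes the pipe delivers, so the caller still
has to reblock them, and any gzip failure surfaces only as an exit
status in a different process.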
I've even carefully separated archive reading and writing in case
you only want to use one of them.

As for I/O interfaces, the core archive_read_open and
archive_write_open functions accept a collection of function
pointers that the library will invoke for open/read/write/close
operations on the archive.  This is considerably more flexible than
FILE * or file descriptors.  Not to mention that passing file
descriptors has some tricky implications if the library forks to
run archive I/O in a separate process.  FILE * is simply a bad idea
because the stdio interface doesn't provide client control over
blocking.  (Yes, the libfetch convenience hooks do use FILE *, but
blocking is unimportant for sockets, so that's okay.)

> - Filenames are too long :-)

Take a typing class. ;-)
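The callback style described above can be sketched as follows.  To
keep this self-contained, the typedefs, the mem_source client, and
the drive_read driver are all hypothetical stand-ins modeled on the
description in this mail, not copied from libarchive's headers; the
real archive_read_open signature should be checked against
archive.h.

```c
#include <stddef.h>
#include <sys/types.h>

struct archive { int unused; };	/* opaque to clients in the real library */

/* Client-supplied hooks, in the spirit of archive_read_open. */
typedef int	open_cb(struct archive *, void *client_data);
typedef ssize_t	read_cb(struct archive *, void *client_data,
		    const void **buf);
typedef int	close_cb(struct archive *, void *client_data);

/* A trivial client: serve one in-memory buffer as a single block. */
struct mem_source {
	const char	*data;
	size_t		 len;
	int		 done;
};

static int
mem_open(struct archive *a, void *cd)
{
	(void)a; (void)cd;
	return (0);
}

static ssize_t
mem_read(struct archive *a, void *cd, const void **buf)
{
	struct mem_source *m = cd;

	(void)a;
	if (m->done)
		return (0);	/* zero signals end-of-archive */
	*buf = m->data;
	m->done = 1;
	return ((ssize_t)m->len);
}

static int
mem_close(struct archive *a, void *cd)
{
	(void)a; (void)cd;
	return (0);
}

/*
 * Stand-in for the library side: open the source, pull blocks until
 * EOF, close, and report the total bytes seen.  The library never
 * learns whether the bytes came from a file, a socket, or memory.
 */
static ssize_t
drive_read(struct archive *a, void *cd, open_cb *o, read_cb *r,
    close_cb *c)
{
	const void *block;
	ssize_t n, total = 0;

	if (o(a, cd) != 0)
		return (-1);
	while ((n = r(a, cd, &block)) > 0)
		total += n;
	c(a, cd);
	return (total);
}
```

The point of the design is visible here: because the client owns the
read callback, it also owns the block size and the transport,
neither of which FILE * or a raw descriptor lets the library
negotiate.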