Date: Sat, 24 Apr 2004 14:52:08 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: tim@kientzle.com
Cc: current@FreeBSD.org
Subject: Re: Amanda, bsdtar, and libarchive
Message-ID: <200404242152.i3OLq87E057024@gw.catspoiler.org>
In-Reply-To: <408AC83D.6010404@kientzle.com>
On 24 Apr, Tim Kientzle wrote:
> A few people have commented about mixing Amanda and bsdtar.
> Here's my take on some of the issues:
>
> Don Lewis wrote:
>> On 23 Apr, Tim Kientzle wrote:
>>
>>> Hmmm...  How accurate does [--totals] need to be?
>>
>> ... not terribly accurate. ... not so much [uncertainty] as to cause
>> Amanda to underestimate the amount of tape needed, ...
>
> In particular, it sounds like a simple sum of file sizes would
> actually be a useful proxy.  It's very, very easy to scan a directory
> hierarchy to collect such estimates.  I have some (very simple) code
> sitting around that does this if anyone wants to investigate
> incorporating it into Amanda.
>
>> On the other hand, it does look bad if you archive to a file and the
>> output of --totals doesn't match the archive file size.
>
> This is hard.  Remember that libarchive supports a lot more than just
> tar format, so any mechanism would have to work correctly for formats
> as different as tar, cpio, shar, and even zip or ar.  With shar
> format, for example, you cannot get an accurate archive size without
> reading the files, because the final size varies depending on the
> actual file content.
>
> Putting all of this knowledge into bsdtar is simply out of the
> question.  The whole point of the bsdtar/libarchive distinction is
> that bsdtar doesn't know anything about archive formats.
>
> If you want reasonably accurate estimates without reading the file
> data, you could either use proxy data (feed the right amount of
> random data to libarchive in place of the actual file data), or build
> some sort of "size estimate" capability into libarchive that would
> build and measure headers and allow the format-specific logic to
> estimate what happens to the file data.  (Which is, admittedly,
> pretty simple for tar and cpio.)

Those two formats are probably the most important for getting quick
estimates.
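The simple file-size scan Tim mentions could be sketched roughly as
follows.  This is a hypothetical illustration, not Tim's actual code;
the function name and the fixed path buffer are my own choices:

```c
/* Hypothetical sketch: recursively sum the sizes of regular files
 * under a directory, as a cheap proxy for the eventual archive size. */
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static uint64_t
sum_tree(const char *path)
{
	uint64_t total = 0;
	DIR *d = opendir(path);
	struct dirent *e;

	if (d == NULL)
		return (0);
	while ((e = readdir(d)) != NULL) {
		char child[4096];
		struct stat st;

		if (strcmp(e->d_name, ".") == 0 ||
		    strcmp(e->d_name, "..") == 0)
			continue;
		snprintf(child, sizeof(child), "%s/%s", path, e->d_name);
		if (lstat(child, &st) != 0)
			continue;
		if (S_ISDIR(st.st_mode))
			total += sum_tree(child);	/* recurse */
		else if (S_ISREG(st.st_mode))
			total += (uint64_t)st.st_size;	/* regular files only */
	}
	closedir(d);
	return (total);
}
```

Symlinks are deliberately skipped via lstat(), matching what a
dump-style backup would usually want (archive the link itself, whose
data contribution is negligible).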
There are three variations on the archive size estimate:

	Fast and exact
	Fast and approximate
	Slow and exact

Only some formats (uncompressed tar and cpio) would support fast and
exact estimates.  For those formats, the exact and approximate cases
will give the same results.  I'd allow the user to specify whether he
wants a slow or fast estimate, rather than deciding based on whether
or not the output is going to /dev/null.

For the fast estimates, I'd put a format-specific file size estimator
into libarchive.  For each file in the archive, call the estimator
function with the file name, file size, and (user-specified?)
estimated compression ratio, then add the returned values to any
format-specific overall archive header and trailer sizes.  For slow
estimates, go through the motions of creating the archive, but toss
the output.
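As an illustration of how simple the fast-and-exact case is for
uncompressed tar, here is a sketch of a per-file estimator and the
summation step.  This is my own hypothetical code, not part of
libarchive; it assumes the standard ustar layout (512-byte header per
entry, data padded to 512-byte blocks, two zero blocks at end, output
padded to the 10240-byte default record size):

```c
/* Hypothetical sketch of a fast, exact size estimator for
 * uncompressed ustar archives. */
#include <stddef.h>
#include <stdint.h>

/* One entry: 512-byte header plus data rounded up to 512-byte blocks. */
static uint64_t
tar_entry_estimate(uint64_t file_size)
{
	return (512 + ((file_size + 511) / 512) * 512);
}

/* Whole archive: sum of entries, end-of-archive marker, record padding. */
static uint64_t
tar_archive_estimate(const uint64_t *sizes, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += tar_entry_estimate(sizes[i]);
	total += 1024;	/* end-of-archive: two 512-byte zero blocks */
	/* pad to the default 10240-byte record size */
	return (((total + 10239) / 10240) * 10240);
}
```

A compressed or shar-style format would instead have to fold in the
estimated compression ratio (or give up on exactness), which is why
the estimator belongs behind a format-specific hook.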