From: Tim Kientzle
Date: Sat, 24 Apr 2004 13:04:13 -0700
To: Don Lewis, richardcoleman@mindspring.com, current@FreeBSD.org
Subject: Re: Amanda, bsdtar, and libarchive
Message-ID: <408AC83D.6010404@kientzle.com>
In-Reply-To: <200404240743.i3O7hl7E053216@gw.catspoiler.org>

A few people have commented about mixing Amanda and bsdtar.
Here's my take on some of the issues:

Don Lewis wrote:
> On 23 Apr, Tim Kientzle wrote:
>
>>Hmmm... How accurate does [--totals] need to be?
>
> ... not terribly accurate. ... not so much [uncertainty] as to cause Amanda to
> underestimate the amount of tape needed, ...

In particular, it sounds like a simple sum of file sizes would actually be a useful proxy. It's very, very easy to scan a directory hierarchy to collect such estimates.
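As a rough illustration of the kind of scan meant here, the following is a minimal sketch in Python. It is not taken from Amanda, bsdtar, or libarchive; the function name `estimate_tree_size` is hypothetical, and counting hard-linked files only once is an assumption about what a tar-style archiver would store.

```python
import os
import stat


def estimate_tree_size(root):
    """Return (total_bytes, file_count) for regular files under root.

    Only lstat() metadata is used, so file contents are never read and
    symlinks are not followed.
    """
    total = 0
    count = 0
    seen = set()  # (device, inode) pairs of hard-linked files already counted
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, devices, sockets, and so on
            if st.st_nlink > 1:
                # Assumption: a hard-linked file is stored once in the archive.
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue
                seen.add(key)
            total += st.st_size
            count += 1
    return total, count
```

Calling `estimate_tree_size("/some/directory")` gives a lower bound on the archive size, since it ignores per-entry headers and padding, so a caller estimating tape usage would likely want to add some margin on top.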
I have some (very simple) code sitting around that does this if anyone wants to investigate incorporating it into Amanda.

> On the other hand, it does look bad if you archive to a file and the
> output of --totals doesn't match the archive file size.

This is hard. Remember that libarchive supports a lot more than just tar format, so any mechanism would have to work correctly for formats as different as tar, cpio, shar, and even zip or ar. With shar format, for example, you cannot get an accurate archive size without reading the files, because the final size varies depending on the actual file content.

Putting all of this knowledge into bsdtar is simply out of the question. The whole point of the bsdtar/libarchive distinction is that bsdtar doesn't know anything about archive formats.

If you want reasonably accurate estimates without reading the file data, you could either use proxy data (feed the right amount of random data to libarchive in place of the actual file data), or build some sort of "size estimate" capability into libarchive that would build and measure headers and allow the format-specific logic to estimate what happens to the file data. (Which is, admittedly, pretty simple for tar and cpio.)

Richard Coleman suggested:
> ... a version of Amanda that natively uses libarchive ...

Now *that's* a worthwhile idea. (And it is, after all, the whole point of libarchive, that programs can just use it to build/extract archives directly without going through a separate program.)

Richard Coleman also observed:
> Until libarchive gets support for sparse files, it's probably better to stick with gtar or rdump with Amanda.

A very good point, although I've been studying sparse file issues and even gtar doesn't entirely do the "right thing," partly because FreeBSD doesn't provide any way to query the layout of a sparse file.
At best, gtar can guess, but that requires scanning the entire file twice to identify large blocks of zeros, which is a performance problem for large files. Also, gtar's sparse file storage doesn't really scale well to very large numbers of holes. Joerg Schilling (author of "star") and I have traded some ideas about approaches that might scale to petabyte files with millions of holes, but nothing concrete enough to actually implement yet.

If you actually have large sparse database files, I strongly suggest that you:

1) Flag the database files themselves as "nodump" and use a backup program that will honor that flag, as bsdtar, star, and rdump all do.

2) Use a database-specific tool to dump the database to one or more non-sparse files that will get picked up by the backup program.

This approach also allows you to run backups while the database is running, as the database dumps themselves aren't changing during the backup. Backing up database storage while the database is running is a very good way to create completely useless backups.

Tim
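
To make the "guessing" described above concrete, here is a minimal Python sketch of scanning a file for runs of all-zero blocks. It is an illustration only and not gtar's actual algorithm; the function name `find_zero_runs` and the 4096-byte default block size are assumptions. A scan like this cannot tell a real hole from zeros that were actually written, and it has to read every byte of the file, which is the performance cost mentioned above.

```python
def find_zero_runs(path, block_size=4096):
    """Return a list of (offset, length) runs of all-zero blocks in path.

    The file is read once, block by block. A block counts as a candidate
    hole only if every byte in it is zero.
    """
    runs = []
    run_start = None  # offset where the current zero run began, if any
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break  # end of file
            if not any(block):
                # All-zero block: start a run if one is not already open.
                if run_start is None:
                    run_start = offset
            elif run_start is not None:
                # Data block closes the current zero run.
                runs.append((run_start, offset - run_start))
                run_start = None
            offset += len(block)
    if run_start is not None:
        # The file ended inside a zero run.
        runs.append((run_start, offset - run_start))
    return runs
```

An archiver would still need a second pass to write out the data between the runs, which is where the "scanning the entire file twice" cost comes from.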