From owner-freebsd-bugs@FreeBSD.ORG Sun May 22 13:20:10 2011
Date: Sun, 22 May 2011 13:20:09 GMT
Message-Id: <201105221320.p4MDK9lO020125@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Gene Stark
Subject: Re: bin/157244: dump/restore: unknown tape header type -230747966

The following reply was made to PR bin/157244; it has been noted by GNATS.

From: Gene Stark
To: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org
Cc:
Subject: Re: bin/157244: dump/restore: unknown tape header type -230747966
Date: Sun, 22 May 2011 08:33:23 -0400

 I wrote a program to compare the blocks in another copy of one of the
 large files in the dump with the version extracted from restore after
 applying my header reordering program.
 The program read each of the files in blocks of TP_BSIZE bytes, computed
 the SHA1 hash of each block, and stored the resulting pairs in a hash map
 for each file. It then unioned the key sets of the two hash maps to obtain
 a single master list of block hashes, traversed the master key set to
 construct a map that gave the correspondence between the blocks in the two
 files, and printed out the contents of that map in increasing order of
 offset, showing the differences between the two files.

 Here is the initial part of the result:

 Lectures.zip.bad: 52469795 bytes
 Lectures.zip.good: 52469795 bytes
 11612 11622 10
 11613 11623 10
 11614 11624 10
 11615 11625 10
 11616 11626 10
 11617 11627 10
 11618 11628 10
 11619 11629 10
 11620 11630 10
 11621 11631 10
 11622 11632 10
 11623 11633 10
 11624 11634 10
 11625 11635 10
 11626 11636 10
 11627 11637 10
 11628 11638 10
 11629 11639 10
 11630 11640 10
 11631 11641 10
 11632 11612 -20
 11633 11613 -20
 11634 11614 -20
 11635 11615 -20
 11636 11616 -20
 11637 11617 -20
 11638 11618 -20
 11639 11619 -20
 11640 11620 -20
 11641 11621 -20
 11642 11652 10
 11643 11653 10
 11644 11654 10
 11645 11655 10
 11646 11656 10
 11647 11657 10
 11648 11658 10
 11649 11659 10
 11650 11660 10
 11651 11661 10
 11652 11662 10
 11653 11663 10
 11654 11664 10
 11655 11665 10
 11656 11666 10
 11657 11667 10
 11658 11668 10
 11659 11669 10
 11660 11670 10
 11661 11671 10
 11662 11642 -20
 11663 11643 -20
 11664 11644 -20
 11665 11645 -20
 11666 11646 -20
 11667 11647 -20
 11668 11648 -20
 11669 11649 -20
 11670 11650 -20
 11671 11651 -20
 11672 11682 10
 11673 11683 10

 The pattern repeats this way for *almost* the entire file. There are sets
 of 20 blocks that occur 10 blocks ahead of the corresponding blocks in the
 other file, and then a set of 10 blocks that occurs 20 blocks behind the
 corresponding blocks in the other file.
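
 The pattern above can be read as a rotation inside each 30-block window.
 Taking the first column as the block offset in Lectures.zip.bad and the
 second as the offset in Lectures.zip.good (an assumption; the output does
 not label its columns), each window of three 10-block groups appears with
 its groups shifted by one position. The Python sketch below models only
 that reading and reproduces the offsets listed above. The names
 good_offset, GROUP, GROUPS_PER_WINDOW and BASE are hypothetical, and the
 sketch does not account for the occasional differences of 9 and 19.

```python
GROUP = 10              # blocks per group, as seen in the listing
GROUPS_PER_WINDOW = 3   # the pattern repeats every 3 groups (30 blocks)
BASE = 11612            # first offset shown in the listing (assumed pattern start)


def good_offset(bad_offset, base=BASE):
    """Predict where the block at bad_offset (first column) sits in the
    other file (second column), assuming a one-group rotation per window.
    Only meaningful for bad_offset >= base."""
    window = GROUP * GROUPS_PER_WINDOW
    rel = bad_offset - base
    window_start = rel - rel % window      # start of the 30-block window
    rotated = (rel % window + GROUP) % window
    return base + window_start + rotated


# Reproduce a few rows of the listing: offset, predicted offset, difference.
for off in (11612, 11631, 11632, 11641, 11642, 11662, 11672):
    print(off, good_offset(off), good_offset(off) - off)
```

 If the group size of 10 turns out to match dump's record size in blocks,
 a window of three groups would line up with SLAVES being 3, but that link
 is a guess and is not verified here.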
 There are occasional values of 9 and 19 for the differences, which I
 don't have a ready explanation for, except that my header reordering
 relied on the magic number to identify the header blocks and it is
 possible that a few data blocks were misidentified as headers.

 At the end of the files there are a few blocks that do not correspond;
 these are probably due to alignment at the end, which caused some of the
 last data blocks to be used as the first blocks for the next file in the
 dump.

 To test my suspicion that it is a concurrency issue in dump, I recompiled
 dump after setting

	#define SLAVES 1

 in tape.c (rather than the value 3 it had before). I then was able to
 complete two rounds of "dump 0f - /mail | restore rfN -" without any
 errors, whereas if I use /sbin/dump it fails very quickly, as indicated
 in the original PR.

 I am not familiar with the locking features, etc. being used in dump, so
 I don't know if I will be able to go farther than this with a reasonable
 expenditure of time. However, I strongly suggest that the "concurrency
 modifications" in dump be turned off (perhaps by setting SLAVES to 1 as
 I did) until somebody can get to the bottom of this.

 If this is happening to me, then I suspect there are *massive* numbers of
 bad dumps out there that people think are actually good. It will really
 be a rude awakening when people try to read them back. Since the data
 blocks don't contain any tape address information in them, it is not
 possible to recover.
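
 For anyone who wants to repeat the comparison described earlier in this
 message, here is a minimal Python sketch of that kind of block-by-block
 comparison. It is not the program that produced the output above. It
 assumes TP_BSIZE is 1024 (the value in <protocols/dumprestore.h>), pairs
 only blocks whose hash occurs exactly once in each file, and omits blocks
 that sit at the same offset in both files. The names block_hashes and
 correspondence are hypothetical.

```python
import hashlib
from collections import defaultdict

TP_BSIZE = 1024  # assumed: dump's tape block size from <protocols/dumprestore.h>


def block_hashes(path, bsize=TP_BSIZE):
    """Map the SHA-1 digest of each block to the list of block indices
    at which that block occurs in the file."""
    table = defaultdict(list)
    with open(path, "rb") as f:
        index = 0
        while True:
            block = f.read(bsize)
            if not block:
                break
            table[hashlib.sha1(block).digest()].append(index)
            index += 1
    return table


def correspondence(bad_path, good_path, bsize=TP_BSIZE):
    """Return sorted (bad_index, good_index, good_index - bad_index) rows
    for blocks that appear at different offsets in the two files."""
    bad = block_hashes(bad_path, bsize)
    good = block_hashes(good_path, bsize)
    rows = []
    # Union of the two key sets gives the master list of block hashes.
    for digest in bad.keys() | good.keys():
        b = bad.get(digest, [])
        g = good.get(digest, [])
        # Assumption: skip ambiguous blocks (duplicates or missing in one file).
        if len(b) == 1 and len(g) == 1 and b[0] != g[0]:
            rows.append((b[0], g[0], g[0] - b[0]))
    rows.sort()
    return rows


# Example use (file names taken from the output earlier in this message):
#   for bad, good, diff in correspondence("Lectures.zip.bad", "Lectures.zip.good"):
#       print(bad, good, diff)
```

 Skipping blocks whose hash is not unique keeps the pairing unambiguous,
 at the cost of leaving repeated blocks (for example runs of zeros) out of
 the listing.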