From owner-freebsd-bugs@FreeBSD.ORG Sun May 22 03:30:11 2011
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 825EE106566B for ; Sun, 22 May 2011 03:30:11 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5618F8FC15 for ; Sun, 22 May 2011 03:30:11 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4M3UBKf044669 for ; Sun, 22 May 2011 03:30:11 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4M3UBaf044667; Sun, 22 May 2011 03:30:11 GMT (envelope-from gnats)
Resent-Date: Sun, 22 May 2011 03:30:11 GMT
Resent-Message-Id: <201105220330.p4M3UBaf044667@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Gene Stark
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CFC08106566B for ; Sun, 22 May 2011 03:25:57 +0000 (UTC) (envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22]) by mx1.freebsd.org (Postfix) with ESMTP id B6E548FC0A for ; Sun, 22 May 2011 03:25:57 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1]) by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p4M3PvGL074101 for ; Sun, 22 May 2011 03:25:57 GMT (envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost) by red.freebsd.org (8.14.4/8.14.4/Submit) id p4M3PvYD074100; Sun, 22 May 2011 03:25:57 GMT (envelope-from nobody)
Message-Id: <201105220325.p4M3PvYD074100@red.freebsd.org>
Date: Sun, 22 May 2011 03:25:57 GMT
From: Gene Stark
To:
freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: bin/157244: dump/restore: unknown tape header type -230747966
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 22 May 2011 03:30:11 -0000

>Number:         157244
>Category:       bin
>Synopsis:       dump/restore: unknown tape header type -230747966
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 22 03:30:10 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Gene Stark
>Release:        8.0-RELEASE
>Organization:
>Environment:
FreeBSD home.starkeffect.com 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #10: Fri Jul 16 12:32:08 EDT 2010 root@home.starkeffect.com:/huge/src/sys/i386/compile/STARKHOME-SMP_8_0 i386
>Description:
I made an 18 GB dump of a (gvinum) filesystem using the command "dump 0f - /dev/gvinum/A > A.dump". No problems were reported during the dump, and the volume fsck'ed clean beforehand. I newfs'ed the volume and attempted to restore via "restore rf - < /A.dump", but it failed with the error message: unknown tape header type -230747966.

This was quite irritating: I have grown to trust dump/restore over many years, and because of the size involved I had already destroyed the original volume without first reading through the dump file with restore.

I spent substantial time analyzing the dump to determine the failure mode. It turns out that header blocks actually occur out of order in the dump file, as can be seen by comparing the actual offset of each block in the dump file with the spcl.c_tapea field of its header.
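The actual-versus-claimed comparison described above can be sketched as follows. This is not the submitter's tool; it is a minimal illustration that assumes the classic record layout from FreeBSD's <protocols/dumprestore.h> (TP_BSIZE = 1024, c_type at offset 0, the 32-bit c_old_tapea at offset 16, c_magic at offset 24, NFS_MAGIC = 60012) and little-endian byte order; newer dumps may use FS_UFS2_MAGIC and the 64-bit c_tapea field instead, so treat the offsets as assumptions:

```python
import struct

TP_BSIZE = 1024              # dump record size (dumprestore.h)
NFS_MAGIC = 60012            # classic header magic
FS_UFS2_MAGIC = 0x19540119   # magic used by newer UFS2 dumps (assumption)

def scan_headers(path):
    """Walk a dump image record by record; for each record whose magic
    field matches, report (actual offset, claimed offset, difference),
    all in units of TP_BSIZE.  A nonzero difference means the header
    block is out of position, as observed in this report."""
    results = []
    with open(path, 'rb') as f:
        actual = 0
        while True:
            rec = f.read(TP_BSIZE)
            if len(rec) < TP_BSIZE:
                break
            c_magic, = struct.unpack_from('<i', rec, 24)
            if c_magic in (NFS_MAGIC, FS_UFS2_MAGIC):
                claimed, = struct.unpack_from('<i', rec, 16)
                results.append((actual, claimed, actual - claimed))
            actual += 1
    return results
```

On a healthy dump every difference should be 0; in the broken dump described here the differences settle on a few fixed displacements such as -10, 0, and 20.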
Once the problem started (during a large file near the beginning of the dump), the difference between the actual offset (in units of TP_BSIZE) and the offset claimed in the spcl.c_tapea field was always -10, 0, or 20. That is, sometimes the header blocks came earlier than expected, sometimes on time, and sometimes later, with only a few distinct displacements.

I wrote a program to read the dump records and reorder them so that the headers are emitted at their claimed offsets. It does this by queueing the headers and data blocks separately, emitting headers when they are due and data blocks otherwise. The program could then verify that the correct number of data blocks was present to match the information in the headers. However, when I pipe the reordered block stream into restore, there are still problems. For one thing, there is no way to verify the order of the data blocks themselves, and it appears that they may have been reordered as well. I have other copies of some of the large files that were in the dump, and I will attempt to determine how the data blocks were reordered, but I have not done that yet.

I was at a loss to explain how this kind of reordering could occur until I read some of the source to dump and saw that it uses multiple processes to write the dump file. I am running on a 2-core system (4 logical CPUs with hyperthreading). I strongly suspect a concurrency bug in the way the dump output is written; otherwise I do not see how the header blocks could have been reordered in the way I observed.
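The queue-and-replay scheme described above might look like this in outline. This is a sketch, not the submitter's actual program: records are abstracted to whatever objects the caller supplies, `is_header` and `claimed_offset` are hypothetical callbacks standing in for real header detection, the whole stream is buffered in memory for simplicity, and headers are assumed to arrive in increasing claimed-offset order (plausible given the bounded -10/0/+20 displacements observed):

```python
from collections import deque

def reorder(records, is_header, claimed_offset):
    """Replay dump records so every header lands at the offset it
    claims.  Headers and data blocks are queued separately; a header
    is emitted when the output position reaches its claimed offset,
    and data blocks fill the remaining slots."""
    headers, data = deque(), deque()
    for rec in records:
        (headers if is_header(rec) else data).append(rec)
    out, pos = [], 0
    while headers or data:
        if headers and claimed_offset(headers[0]) <= pos:
            out.append(headers.popleft())       # header is due here
        elif data:
            out.append(data.popleft())          # fill with data
        else:
            out.append(headers.popleft())       # no data left; emit
                                                # the header early
        pos += 1
    return out
```

A real tool would bound the buffering (the observed displacements were at most a few tens of records) rather than hold an 18 GB stream in memory, and, as noted above, this puts only the headers right; it cannot detect or repair reordering among the data blocks themselves.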
>How-To-Repeat:
Although I have (unfortunately) already destroyed the original filesystem, I was able to repeat the behavior on another filesystem using the following command:

home# dump 0f - /mail | restore rfN -
  DUMP: Date of this level 0 dump: Sat May 21 22:57:40 2011
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/gvinum/mail_new (/mail) to standard output
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 11291623 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
unknown tape header type 1781888358
abort? [yn] y
dump core? [yn] n
  DUMP: Broken pipe
  DUMP: The ENTIRE dump is aborted.

This problem really needs to be looked into, because it is a disaster to create an apparently successful dump with the idea of doing a simple filesystem volume rebuild and then to find that it fails on restore. Reordering the dump stream to put the header blocks back in their proper positions helps quite a bit, but I have not yet been able to recover my data, because the data blocks are apparently reordered as well. If there is a systematic mechanism behind the reordering, I may still be able to recover; if it is a concurrency/synchronization issue, it may well be hopeless.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: