Date: Thu, 7 May 1998 02:32:23 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: tom@sdf.com (Tom)
Cc: tlambert@primenet.com, beng@lcs.mit.edu, dec@phoenix.its.rpi.edu, freebsd-hackers@FreeBSD.ORG
Subject: Re: Network problem with 2.2.6-STABLE
Message-ID: <199805070232.TAA19518@usr01.primenet.com>
In-Reply-To: <Pine.BSF.3.95q.980505223633.24411A-100000@misery.sdf.com> from "Tom" at May 5, 98 10:52:51 pm

> What?  Was something about my message unclear?  restore dies with a
> "hole in map".  You can also search the PR database for that phrase to
> find an identical report to the one I filed.

You are assuming that the "hole in map" is there because dump put
it there (which is unlikely), or because there was actually a hole
in the map, faithfully copied (which is unlikely, but possible, if
your partition didn't pass fsck prior to dump).

Vs. because your tape is/went bad and/or has a bad electrical
connection.

So far we have the following possibilities, which I would like to
eliminate one-by-one:

o	It may be the IDE disk (D)
o	It may be the IDE controller (D)
o	It may be the IDE controller driver (D)
o	It may be the raw disk device driver (D)
o	It may be the SCSI controller (T)
o	It may be the SCSI controller driver (T)
o	It may be the raw tape device driver (T)
o	It may be the admixture of an IDE controller that
	fails when used in combination with a SCSI controller (D)
o	It may be the tape drive's default block size (H)
o	It may be the tape drive's firmware (H)
o	It may be the media you are using (M)
o	It may be dump (S)
o	It may be restore (S)

Below I detail some steps to tell whether or not it is in the path
of (T) or whether or not it is in the path of (S).  You need to take
these steps before you can point at 2 of the thirteen possible
failure spots and say with confidence "it's dump/restore".

>   The tape and drive are ok.  I tar'ed the entire filesystem up, newfs the
> filesystem, and untar the tape, and it works great (which I have done as a
> test).

Tar does not complain about bad tape blocks, because it can't
consistency check them, having no check fields.  It will happily
write zeroed blocks into your files.

Restore is more sensitive to the problem, because restore requires
that the referential integrity of the files written to disk be
intact.

Did you do MD5 checksums before and after, and compare the results?

> > What is the controller for the tape drive?
>
> 2940UW
>
> > Which driver is responsible for that controller?
>
> ahc

With or without the CAM patches?

> > What exact model of tape drive are you using?
>
> Quantum DLT 4000
>
> > What exact brand of tapes are you using?
>
> Quantum DLT IV

You are positive you are using the st/mt command to select a block
size for this before starting the dump, right?  DAT drives are
notoriously finicky about default block size selection.

> > What is the controller for the disk showing the problem?
>
> EIDE

This doesn't tell me if it is a CMD640B chip, or an Intel chip,
either of which can lose their minds if you take SCSI interrupts
while doing a data transfer.  I can't rule out a controller failure
without this information.

You should fsck your disk a number of times in rapid succession and
see if the cylinder group bitmaps are "corrupted".  This can happen
with IDE cables that are slightly out of spec. (generally: too long).

> > What exact model of disk drive are you using?
>
> Maxtor DiamondMax 8.4GB
>
> > Are you overclocking your processor?
>
> No.
>
>   You know what a much better test would be?  I can do a dump, read the
> first hundred megs or so with dd into a file, and send it to you.  Since
> "restore -t" reports the "hole in map" within seconds, it obviously hasn't
> read very far into the tape yet, so doing a restore from a disk file
> should have the same result.

Or better, you could dump to a disk instead of to a tape, then also
dump to a tape, and then do an MD5 checksum of the images and see if
they match, in order to isolate it to "tape or software" vs. "disk
or software".

Also, a partial dump should exhibit the same problems, since "it
obviously hasn't read very far into the tape yet".  Which means you
don't need to write very far into the tape to trigger the problem.
Which means you can do the experiment with a disk image without the
disk containing the image needing to be larger than the disk being
dumped.
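
The disk-image versus tape-image comparison above can be sketched in
a few lines of Python.  This is only an illustration, not part of
dump or restore: the function names and any paths passed in are
hypothetical, and it assumes the tape copy has already been read
back into an ordinary file (for example with dd).  The optional
limit hashes only a prefix of each image, in line with the point
that a partial dump should be enough.

```python
import hashlib


def md5_of_stream(stream, limit=None, chunk_size=64 * 1024):
    """Return the hex MD5 of up to `limit` bytes read from a binary stream.

    With limit=None the whole stream is hashed.
    """
    digest = hashlib.md5()
    remaining = limit
    while remaining is None or remaining > 0:
        want = chunk_size if remaining is None else min(chunk_size, remaining)
        block = stream.read(want)
        if not block:
            break  # end of the image
        digest.update(block)
        if remaining is not None:
            remaining -= len(block)
    return digest.hexdigest()


def images_match(path_a, path_b, limit=None):
    """Compare two dump images (hypothetical file paths) by MD5."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return md5_of_stream(a, limit) == md5_of_stream(b, limit)
```

If images_match() is false for the image dumped to disk and the one
read back from tape, the two copies differ somewhere in the first
`limit` bytes; the checksum alone does not say which copy is the bad
one.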

You should also simply dump through MD5 to see if the MD5 checksum
changes between dump attempts.  If it does, the problem is in dump
and/or the raw disk device driver.  If it doesn't, the problem is
in the tape or the restore.  If the image restores without the panic,
then the problem is in the tape driver, controller, drive, or media.

I'm not being a hardass here.  Software doesn't mutate, so the problem
should be capable of being isolated.  I'm just doing fault isolation
via email, and it's not very efficient.

It would help if I could repeat the problem locally, but it doesn't
repeat locally for me on my 9G IBM drive (though I have to change
volume sets 9 times to repeat on a > 4G file system, since I don't
use DAT; this should not impact it, since I don't get buffer flushes
or other code that should change the outcome).

One possible discrepancy is that my IBM 9G drive is fast SCSI II,
not EIDE.  It may be an IDE driver problem.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message