From owner-freebsd-hackers  Wed May  6 19:44:54 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id TAA19053
          for freebsd-hackers-outgoing; Wed, 6 May 1998 19:44:54 -0700 (PDT)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from smtp01.primenet.com (daemon@smtp01.primenet.com [206.165.6.131])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id TAA19021
          for <freebsd-hackers@FreeBSD.ORG>; Wed, 6 May 1998 19:44:42 -0700 (PDT)
          (envelope-from tlambert@usr01.primenet.com)
Received: (from daemon@localhost)
	by smtp01.primenet.com (8.8.8/8.8.8) id TAA25596;
	Wed, 6 May 1998 19:44:42 -0700 (MST)
Received: from usr01.primenet.com(206.165.6.201)
 via SMTP by smtp01.primenet.com, id smtpd025547; Wed May  6 19:44:32 1998
Received: (from tlambert@localhost)
	by usr01.primenet.com (8.8.5/8.8.5) id TAA20312;
	Wed, 6 May 1998 19:44:32 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199805070244.TAA20312@usr01.primenet.com>
Subject: Re: Network problem with 2.2.6-STABLE
To: tom@sdf.com (Tom)
Date: Thu, 7 May 1998 02:44:31 +0000 (GMT)
Cc: tlambert@primenet.com, beng@lcs.mit.edu, freebsd-hackers@FreeBSD.ORG
In-Reply-To: <Pine.BSF.3.95q.980505225338.24411B-100000@misery.sdf.com> from "Tom" at May 5, 98 11:00:25 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > I think you need to read the man pages.
> 
>   In what man page is "hole in map" mentioned?

By class?  Search for the word "abort" in the restore man page.

By explicit reference?  In the 4.4BSD System Administrators Guide
under the detailed discussion of dump/restore.


> > A hole in a map won't happen unless you have bad media.  If you
> > traceback the panic in the source code, you'll see that the problem
> > is a zero-valued block map.
> 
>   Question:  why does it segfault several seconds after producing the
> "abort?" prompt?

Because it is busy dumping core because of the explicit call to "abort"
because you did not set "yflag" using the "-y" command line option.

> It had already detected the "hole in map" problem?

Yes.  It won't call panic (a function in utilities.c) if it hasn't
panic'ed.

The reason it takes so long is that it has a very large data area.
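
To make the control flow concrete, here is a rough model of the
decision described above.  It is a Python sketch, not the C source in
utilities.c: the function names, the "ask" callback, and the prompt
text are my own stand-ins, and only the yflag/abort behaviour is taken
from the description.

```python
import os


def panic(message, yflag, ask):
    """Model of the panic decision; returns True if we should abort.

    `ask` is a hypothetical callable that shows a prompt and returns
    True for a "yes" answer.  It is never called when yflag is set.
    """
    print(message)
    if yflag:
        # -y given: never prompt, skip the bad block(s) and continue.
        return False
    return ask("abort? ")


def run(yflag, ask):
    """Hypothetical caller showing where the core dump would happen."""
    if panic("hole in map", yflag, ask):
        # abort() raises SIGABRT; the kernel then writes the core file.
        # With a very large data area that write takes a while, which
        # is the delay seen after answering the prompt.
        os.abort()
    return "continued"
```

With yflag set, run() returns without ever prompting; without it, a
"yes" answer goes straight to the abort and the core dump.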


One possible reason for this problem *could* be your limits on the
account doing the restore (see login.conf).

If restore went over your datasize or stacksize and then didn't check
the validity of the operation (which it doesn't; restore predates
memory overcommit, and it predates login.conf placing arbitrarily
small limits on working set), then the symptoms would be similar.
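
A quick way to see which limits a process actually inherits is
something like the following.  This is only a sketch: it assumes a
Unix system where Python's standard "resource" module is available,
the helper names are mine, and it reports the limits of the process
that runs it, so run it from the same account and shell that would run
the restore.

```python
import resource


def describe(limit):
    """Render a limit value, treating RLIM_INFINITY as 'unlimited'."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


def show_limits():
    """Return the soft and hard datasize and stacksize limits."""
    report = {}
    for name in ("RLIMIT_DATA", "RLIMIT_STACK"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        report[name] = (describe(soft), describe(hard))
    return report


if __name__ == "__main__":
    for name, (soft, hard) in show_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

If the soft datasize limit is small next to what restore needs for its
maps, that is worth ruling out before blaming anything else.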


> Regardless this is still a bug, even assuming that the media is bad (it
> isn't, as I did a full backup and restore to the media with tar).

Tar proves nothing.  See other post.

> > You can ignore your damaged media using the "-y" option to restore:
> 
>   Doesn't do anything in this case.  A "restore -t -v -y" tells me that
> every file (at least those I bother to let it read) had CRC errors.  That
> isn't right, as I can use the same tape to do a full tar and untar with no
> problems.

Use tar's --compare option.  You may not be getting back what you
think you are getting.

If this works, you still haven't fairly eliminated everything besides
dump/restore that could be the problem.  First, tar is very stupid.
It doesn't do MD5 hashes or use any other really strong method of
determining that what it read back is identical to what it wrote.

Second, the problem can be in the raw disk driver.  No one but dump
tends to use the raw disk driver, so you haven't proven a lot.

Third, the tar command has different access characteristics, so it
could still be a conflict between the EIDE and SCSI controllers.
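
On the first point, a stronger check than a clean tar and untar is to
hash every file on both sides and compare the results.  The following
is a minimal sketch of that idea in Python; the function names and the
two directory arguments are hypothetical, and it only compares the
contents of regular files, not ownership, modes, or timestamps.

```python
import hashlib
import os


def md5_of(path, chunk=1 << 20):
    """Return the MD5 hex digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its MD5."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = md5_of(full)
    return digests


def differing(original_root, restored_root):
    """List paths that are missing on one side or differ in content."""
    a = tree_digests(original_root)
    b = tree_digests(restored_root)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from differing() says the file contents match; it still
says nothing about the raw disk driver or the controllers.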


> > 	-y      Do not ask the user whether to abort the restore in
> > 		the event of an error.  Always try to skip over the
> > 		bad block(s) and continue.
> > 
> > It is recommended that you, instead, fix the underlying problem.
> 
>   What is that?  The only thing you've said, is bad tape.  But it isn't.
> Next.

That isn't the only thing.  Read my other post.  I identified by line
item at least 13 things that could give the same symptoms, and a
14th in the text.  Only two of these things are "dump" or "restore".


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
