From owner-freebsd-current  Mon Sep 16 20:17:36 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7225C37B400; Mon, 16 Sep 2002 20:17:34 -0700 (PDT)
Received: from gull.mail.pas.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 13A1643E72; Mon, 16 Sep 2002 20:17:34 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0049.cvx22-bradley.dialup.earthlink.net ([209.179.198.49] helo=mindspring.com)
	by gull.mail.pas.earthlink.net with esmtp (Exim 3.33 #1)
	id 17r8re-0004dr-00; Mon, 16 Sep 2002 20:17:26 -0700
Message-ID: <3D869B5F.C48D7F10@mindspring.com>
Date: Mon, 16 Sep 2002 20:02:55 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Martin Blapp <mb@imp.ch>
Cc: current@freebsd.org, Michael Reifenberger <root@nihil.plaut.de>,
	Peter Wemm <peter@wemm.org>, julian@freebsd.org
Subject: Re: filesystem corruption ?
References: <20020917021615.D3162-100000@levais.imp.ch>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>

Martin Blapp wrote:
> It looks more and more to me that pmap does something wrong.
> I get pmap-related VM crashes or corruption, resulting in
> filesystem corruption.
> 
> I had about 3-4 different panics. Mozilla build tends to prefer
> "panic: bad link count", openoffice prefers page faults in ffs code ;)
> 
> But these options here are enabled:
> 
> options        DISABLE_PSE
> options        DISABLE_PG_G
> 
> Are there any options I could also add and try ? Page coloring etc ?
> 
> You remember, I had SIG4 and SIG11 over and over until
> I used these options. The builds run fine by then,
> I had no panics at all anymore. I was happy.

[ ... ]

There have been a number of panics reported recently on -current,
which seem to be repeatable on ATA drives, but not on SCSI drives.

A casual perusal of the code seems to indicate that a soft failure
in the paging path will be treated as a hard failure in the ATA
case, instead of being retried.  I haven't confirmed this by writing
the code to tape and running the tape between my fillings (i.e. I
have not intensely scrutinized the code; it just looks like it is
that way).
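
To make the soft-vs-hard distinction concrete, here is a minimal
sketch of the behaviour I would expect from a retry path.  Every
name in it (io_status, submit_with_retry, flaky_dev, the retry
budget of 3) is hypothetical and made up for illustration; this is
not the FreeBSD ATA driver's code or API, just the general shape
of "retry a soft error, fail only on a hard one".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the FreeBSD ATA driver API. */
enum io_status { IO_OK, IO_SOFT_ERROR, IO_HARD_ERROR };

#define MAX_SOFT_RETRIES 3	/* assumed retry budget */

/* Callback that performs one attempt at the transfer. */
typedef enum io_status (*io_attempt_fn)(void *ctx);

/*
 * Run one paging I/O request.  A soft (transient) error is retried
 * up to MAX_SOFT_RETRIES times; only a hard error, or a soft error
 * that outlasts the retry budget, is reported as a failure.
 */
static enum io_status
submit_with_retry(io_attempt_fn attempt, void *ctx, int *attempts_out)
{
	enum io_status st = IO_HARD_ERROR;
	int attempts = 0;

	assert(attempt != NULL);
	/* One initial attempt plus at most MAX_SOFT_RETRIES retries. */
	while (attempts <= MAX_SOFT_RETRIES) {
		st = attempt(ctx);
		attempts++;
		if (st != IO_SOFT_ERROR)
			break;	/* success or hard failure: stop */
	}
	if (attempts_out != NULL)
		*attempts_out = attempts;
	/* Retry budget exhausted on a soft error: escalate to hard. */
	return (st == IO_SOFT_ERROR) ? IO_HARD_ERROR : st;
}

/* Simulated device: fails softly soft_failures times, then succeeds. */
struct flaky_dev {
	int soft_failures;
};

static enum io_status
flaky_attempt(void *ctx)
{
	struct flaky_dev *dev = ctx;

	if (dev->soft_failures > 0) {
		dev->soft_failures--;
		return IO_SOFT_ERROR;
	}
	return IO_OK;
}
```

Calling submit_with_retry(flaky_attempt, &dev, &n) on a simulated
device with two pending soft failures succeeds on the third attempt.
The bug I suspect amounts to returning the failure after the first
attempt, without ever entering the retry loop.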

UnixWare had a similar bug in the ATA retry code, only its FS,
when a soft error was treated as hard, would start marking the
sectors gone, so maybe you should consider yourself lucky... ;^).

Last I heard, the ATA maintainers were looking into reproducing
the problem, with no luck yet.  Check the -current archives, and
volunteer to be a guinea pig (that's my best suggestion at present,
unless you can retry with a SCSI drive instead, and see if the
problems disappear).

You might also try playing with the ATA DMA and tag options (see
NOTES/LINT for what you can turn off that way), and the ATA sysctls
(most must be set at boot time, but a "sysctl -a | grep ata"
will list them out for you to try manually in the boot loader).
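
As a sketch of what that might look like once you have picked the
knobs to try: the fragment below assumes the tunables are named
hw.ata.ata_dma and hw.ata.tags, which is how I recall them from
ata(4); check the names against the sysctl output on your own
system before relying on them.

```
# /boot/loader.conf (sketch; tunable names are an assumption, see ata(4))
hw.ata.ata_dma="0"	# fall back from DMA to PIO for ATA disks
hw.ata.tags="0"		# disable tagged queueing
```

The same assignments can be tried for a single boot by entering
"set hw.ata.ata_dma=0" at the loader prompt before "boot".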

-- Terry
