From owner-freebsd-stable@FreeBSD.ORG  Tue Sep  6 15:04:44 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C76B51065674
	for <freebsd-stable@freebsd.org>; Tue,  6 Sep 2011 15:04:44 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 9F5F98FC1C
	for <freebsd-stable@freebsd.org>; Tue,  6 Sep 2011 15:04:44 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 5661146B37;
	Tue,  6 Sep 2011 11:04:44 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id DB17C8A02E;
	Tue,  6 Sep 2011 11:04:43 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Date: Tue, 6 Sep 2011 11:04:43 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; )
References: <4E64933E.8030908@incore.de>
In-Reply-To: <4E64933E.8030908@incore.de>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201109061104.43409.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Tue, 06 Sep 2011 11:04:44 -0400 (EDT)
Cc: Andreas Longwitz <longwitz@incore.de>
Subject: Re: UFS_DIRHASH panics on a dozen server within 30 hours
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Sep 2011 15:04:44 -0000

On Monday, September 05, 2011 5:15:42 am Andreas Longwitz wrote:
> Hi,
> 
> a week ago a dozen of my FreeBSD servers crashed within a time span of
> 30 hours. The servers run very different applications, and some of them
> were only on standby. All servers have the same kernel with FreeBSD 6 STABLE
> and there were no problems for years until the "black monday".
> 
> Yes I know that FreeBSD 6 is out of date now, but I don't like to
> change a very good running system. Another reason is that my hardware
> needs the amr driver and because of the outstanding solution of the
> amr_ioctl problem described in kern/155658 it is not possible for me
> to upgrade my production systems without changing hardware.

Hmm, the patch in that PR should still apply to newer versions.  Also, for a
smaller diff, you could just change the malloc() call to always allocate the
maximum size (instead of using a static buffer).  It does seem, though, that a
specific command is overrunning its buffer.
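
For illustration only, the "always allocate the maximum" idea might look
roughly like the sketch below.  It is userland C using calloc() rather than
the kernel malloc(9), and CMD_BUF_MAX and cmd_buf_alloc() are made-up names,
not anything taken from amr(4) or from the patch in the PR.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on any command's data buffer (an assumption). */
#define CMD_BUF_MAX 4096

/*
 * Allocate a zeroed command buffer.  The caller's requested length is only
 * used as a sanity check; the allocation itself is always CMD_BUF_MAX bytes,
 * so a command that writes more than it asked for (but no more than the
 * maximum) stays inside the buffer.  Returns NULL if the request exceeds
 * the maximum or if the allocation fails.
 */
void *
cmd_buf_alloc(size_t requested_len)
{
	if (requested_len > CMD_BUF_MAX)
		return (NULL);
	return (calloc(1, CMD_BUF_MAX));
}
```

A caller would do something like buf = cmd_buf_alloc(len), check for NULL,
and free(buf) when the command completes.  This only hides an overrun up to
the maximum size; it does not explain which command is writing too much.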

> Now I have a dozen core dumps and try to understand what happened.
> All dumps look very similar and the panic is always "page fault"
> in _mtx_lock_sleep called from ufsdirhash_recycle or ufsdirhash_free
> because the used mtx_object is overwritten with zeros by someone
> before _mtx_lock_sleep is called.

I don't know of anything in particular that would explain this, esp. as to
why you would see them all occur at the same time.  Maybe look to see if the
machines were doing something unusual at that time (a cron job, etc.)?

-- 
John Baldwin