From: Andreas Longwitz
Date: Mon, 12 Sep 2011 00:23:58 +0200
To: freebsd-stable@freebsd.org
Cc: John Baldwin
Subject: Re: UFS_DIRHASH panics on a dozen server within 30 hours
Message-ID: <4E6D34FE.70703@incore.de>
In-Reply-To: <201109061104.43409.jhb@freebsd.org>
References: <4E64933E.8030908@incore.de> <201109061104.43409.jhb@freebsd.org>

Hi,

thank you very much for your answer; I think you pointed me in the right
direction.

> Hmm, the patch in that PR should still apply to newer versions. Also, you
> could just change the malloc() call to always allocate the maximum size
> (instead of using a static buffer) for a smaller diff. It seems though that a
> specific command is overrunning its buffer.

Yes. I found that megarc often requests a buffer of 12868 bytes, but the
controller always sends back 25412 bytes. Because this appears to be a bug
in megarc, I have submitted a patch to the existing PR ports/137938.
In addition, I saw the controller sporadically answer megarc ioctls with
far more data than the buffer size megarc had specified. Therefore my
updated patch in kern/155658 still allocates the maximum size.

>> Now I have a dozen core dumps and try to understand what happened.
>> All dumps looks very similar and the panic is always "page fault"
>> in _mtx_lock_sleep called from ufsdirhash_recycle or ufsdirhash_free
>> because the used mtx_object is overwritten with zeros by someone
>> before _mtx_lock_sleep is called.
>
> I don't know of anything in particular that would explain this, esp. as to
> why you would see them all occur at the same time.

In the meantime I have had three more crashes on FreeBSD 6. I assume it is
the same problem as on FreeBSD 8, because the memory corruption caused by
megarc and the controller does not depend on the FreeBSD version. I have
verified that the overruns occur on FreeBSD 6 too, but I cannot explain
why FreeBSD ran for years without crashing, even though I used megarc
every day during that time.

-- 
Dr. Andreas Longwitz

Data Service GmbH
Beethovenstr. 2A
23617 Stockelsdorf
Amtsgericht Lübeck, HRB 318 BS
Geschäftsführer: Wilfried Paepcke, Dr. Andreas Longwitz, Josef Flatau