From owner-freebsd-stable Wed Jan 3 16:52:48 2001
From owner-freebsd-stable@FreeBSD.ORG Wed Jan 3 16:52:45 2001
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from jamus.xpert.com (jamus.xpert.com [199.203.132.17])
    by hub.freebsd.org (Postfix) with ESMTP id C65B737B400
    for ; Wed, 3 Jan 2001 16:52:42 -0800 (PST)
Received: from roman (helo=localhost)
    by jamus.xpert.com with local-esmtp (Exim 3.12 #5)
    id 14DyR6-0005qN-00; Thu, 04 Jan 2001 02:39:20 +0200
Date: Thu, 4 Jan 2001 02:39:20 +0200 (IST)
From: Roman Shterenzon
To: Greg Lehey
Cc: Daniel Lang, Andy Newman,
Subject: Re: kern/21148: multiple crashes while using vinum
In-Reply-To: <20010104105428.D4336@wantadilla.lemis.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 4 Jan 2001, Greg Lehey wrote:

...snip...

> > The reason is, that _some code_ writes into unallocated memory, in
> > my case overwriting a data-structure of an ata-request with a few
> > zero bytes, causing the panic. The stack trace allows me to trace
> > the problem back to this point, but not further. I later experienced
> > a similar problem on a scsi-only system.
>
> Yes, this looks very much like the other issues. But you must
> understand that there's nothing I can do without further information.
>
> > The reason, why I filed this pr under 'vinum' is, that it only
> > occurred on boxes using vinum, and perfectly reproducible via simple
> > operations like a 'find /vinum/file/system -print' on a larger and
> > moderately filled vinum-filesystem. Perfectly reproducible means:
> > each night, periodic daily caused the panic (traceable to the find
> > call in /etc/security, finding files with setuid bits).
> >
> > As far as I know, the only way to trace this writing into
> > unallocated/otherallocated memory resp.
> > buffer overrun would be to set a watchpoint to the overwritten
> > data-structure within the kernel-debugger.
>
> The trouble with that is that this only happens when the system is
> very active, and there are thousands of potential buffer headers which
> could be trashed. I do have a trace facility within Vinum, but even
> with that it's difficult to figure out what's going on.

I don't agree about "very active". The system in question was calm
during the test; I just ran find /raid -print and it crashed.

> > My stack-traces showed that this memory region stays the same on the
> > same machine with the same kernel (although I can't tell how
> > reliable this is).
>
> If you mean that the same part of the buffer header gets smashed every
> time, yes, this is reliably reproducible (well, in other words, when
> it happens (at random), it happens in the same place every time). It

That is correct. Both Daniel and I had the crash occurring at exactly
the same place.

> may mean that Vinum is doing it, but as far as I can tell it's always
> 6 words being zeroed out, and I don't do that anywhere in Vinum. The
> other possibility, which I consider most likely, is that the data
> structures accidentally get freed and used by some other driver (or,
> possibly, that some other driver freed them first and then continued
> using them). This would explain the observed correlation with the fxp
> driver.

Do you think that the latter is more probable? I had an fxp card there.

> > b) I still believe, that there is a problem somewhere in the
> > vinum code (probably within raid5 routines, since a mirror
> > setup worked fine).
>
> Correct. I have no doubt about it. But some bugs are difficult to
> find, and I need help.

Hmm... isn't the part of the code in question shared by both raid1 and
raid5?

--Roman Shterenzon, UNIX System Administrator and Consultant
[ Xpert UNIX Systems Ltd., Herzlia, Israel.
Tel: +972-9-9522361 ]
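
The hypothesis quoted in this message, that a structure is freed and then
written through a stale pointer by another owner, is the kind of fault that
poisoning memory on free is meant to expose. What follows is only a minimal
userland sketch of that general idea, not Vinum or kernel code. The pool,
its sizes, the poison value, and every function name are hypothetical and
chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS 4            /* hypothetical pool size */
#define BLOCK_WORDS 16           /* hypothetical block size in 32-bit words */
#define POISON      0xdeadc0deu  /* pattern stored in every free word */

/* Hypothetical fixed-size pool standing in for a buffer-header allocator. */
struct pool {
    uint32_t words[POOL_BLOCKS][BLOCK_WORDS];
    int in_use[POOL_BLOCKS];
};

/* Mark every block free and fill it with the poison pattern. */
static void pool_init(struct pool *p)
{
    for (int b = 0; b < POOL_BLOCKS; b++) {
        p->in_use[b] = 0;
        for (int w = 0; w < BLOCK_WORDS; w++)
            p->words[b][w] = POISON;
    }
}

/* Return the index of the first word of block b that no longer holds
 * the poison pattern, or -1 if the block is intact. */
static int pool_check_block(const struct pool *p, int b)
{
    for (int w = 0; w < BLOCK_WORDS; w++)
        if (p->words[b][w] != POISON)
            return w;
    return -1;
}

/* Hand out the first free block. *damaged_word receives the offset of
 * the first word modified while the block was free, or -1 if none. */
static uint32_t *pool_alloc(struct pool *p, int *damaged_word)
{
    for (int b = 0; b < POOL_BLOCKS; b++) {
        if (!p->in_use[b]) {
            *damaged_word = pool_check_block(p, b);
            p->in_use[b] = 1;
            return p->words[b];
        }
    }
    *damaged_word = -1;
    return NULL;                 /* pool exhausted */
}

/* Re-poison the block and return it to the pool. */
static void pool_free(struct pool *p, uint32_t *block)
{
    for (int b = 0; b < POOL_BLOCKS; b++) {
        if (p->words[b] == block) {
            for (int w = 0; w < BLOCK_WORDS; w++)
                block[w] = POISON;
            p->in_use[b] = 0;
            return;
        }
    }
}
```

If a caller keeps a pointer after pool_free() and zeroes a few words through
it, the next pool_alloc() that reuses the block reports the offset of the
first damaged word. A check like this only shows that a write happened while
the block was free and where it landed; it does not identify the writer,
which is why the thread also discusses watchpoints and tracing.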