Date: Thu, 4 Jan 2001 11:35:21 +1030
From: Greg Lehey <grog@lemis.com>
To: Roman Shterenzon <roman@xpert.com>
Cc: Daniel Lang <dl@leo.org>, Andy Newman <andy@silverbrook.com.au>, freebsd-stable@freebsd.org
Subject: Re: kern/21148: multiple crashes while using vinum
Message-ID: <20010104113521.G4336@wantadilla.lemis.com>
In-Reply-To: <Pine.LNX.4.30.0101040234380.21369-100000@jamus.xpert.com>; from roman@xpert.com on Thu, Jan 04, 2001 at 02:39:20AM +0200
References: <20010104105428.D4336@wantadilla.lemis.com> <Pine.LNX.4.30.0101040234380.21369-100000@jamus.xpert.com>
On Thursday, 4 January 2001 at 2:39:20 +0200, Roman Shterenzon wrote:
> On Thu, 4 Jan 2001, Greg Lehey wrote:
>> The trouble with that is that this only happens when the system is
>> very active, and there are thousands of potential buffer headers which
>> could be trashed.  I do have a trace facility within Vinum, but even
>> with that it's difficult to figure out what's going on.
>
> I don't agree about "very active".  My system in question was calm
> during the test; I just ran "find /raid -print" and it crashed.

Your problem seems to be a little different, as I mentioned in my
reply to your stack trace.  It might be related, and it might give us
the information we need to find the problem.

>>> My stack traces showed that this memory region stays the same on the
>>> same machine with the same kernel (although I can't tell how
>>> reliable this is).
>>
>> If you mean that the same part of the buffer header gets smashed every
>> time, yes, this is reliably reproducible (well, in other words, when
>> it happens (at random), it happens in the same place every time).  It
>
> That is correct.  Both Daniel and I had the crash occurring exactly
> at the same place.

Hmm.  Well, almost: this time the line number is different, and it
looks more plausible.  Can you get me the local variables from your
stack trace of this frame?

  (gdb) i loc

>> may mean that Vinum is doing it, but as far as I can tell it's always
>> 6 words being zeroed out, and I don't do that anywhere in Vinum.  The
>> other possibility, which I consider most likely, is that the data
>> structures accidentally get freed and used by some other driver (or,
>> possibly, that some other driver freed them first and then continued
>> using them).  This would explain the observed correlation with the fxp
>> driver.
>
> Do you think that the latter is more probable?  I had an fxp card there.

That's what I said.
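Since the same 6 words of the buffer header are zeroed every time, one standard way to catch the culprit closer to the act is to bracket the suspect fields with magic cookies and verify them wherever the header changes hands. The following is a minimal sketch of that technique; `struct vbuf`, `BUF_MAGIC`, and the field names are hypothetical, not actual Vinum code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch (not actual Vinum code): bracket the region of a
 * privately allocated buffer header that keeps getting zeroed with magic
 * cookies, and check them wherever the header changes hands.  If another
 * subsystem scribbles on the header, the check fires near the corruption
 * instead of at the eventual crash.
 */
#define BUF_MAGIC 0xdeadc0deUL

struct vbuf {
    unsigned long magic_front;      /* guard word before the payload */
    void *b_data;                   /* fields observed to be zeroed */
    size_t b_bcount;
    unsigned long magic_back;       /* guard word after the payload */
};

static void vbuf_init(struct vbuf *bp, void *data, size_t count)
{
    bp->magic_front = BUF_MAGIC;
    bp->b_data = data;
    bp->b_bcount = count;
    bp->magic_back = BUF_MAGIC;
}

/* Return 0 if the guards are intact, -1 if someone trashed the header. */
static int vbuf_check(const struct vbuf *bp)
{
    if (bp->magic_front != BUF_MAGIC || bp->magic_back != BUF_MAGIC)
        return -1;
    return 0;
}
```

In a kernel, the failing check would panic (or log through the existing trace facility), so the stack trace points at whoever touched the header last rather than at the code that merely stumbled over the damage later.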
I don't necessarily think it's a bug in the fxp driver, but Vinum is
unique in the way it allocates buffer headers, so it's possible.

>>> b) I still believe that there is a problem somewhere in the
>>> vinum code (probably within the raid5 routines, since a mirror
>>> setup worked fine).
>>
>> Correct.  I have no doubt about it.  But some bugs are difficult to
>> find, and I need help.
>
> Hmm.. that part of the code in question, isn't it shared for both
> raid1 and raid5?

Yes, but if you look a few lines higher, you'll see that we've called
complete_raid5_write for RAID-[45], and it's possible that something
has happened there.  Note that in the other case (buffer header
corruption), we're executing non-Vinum code when the crash comes, so
there's no requirement to stay in RAID-5 code.

Greg

--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
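The free-and-reuse hypothesis above can be illustrated deterministically. In the sketch below, a one-slot pool stands in for the kernel allocator (all names are invented; this is not FreeBSD or Vinum code): "driver A" frees its block but keeps a stale pointer, the pool hands the same memory to "driver B", and A's late write through the stale pointer silently zeroes B's live data, which is exactly the symptom of header words turning to zero with no store to them anywhere in the owning driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustration only (assumed names, not actual kernel code).  A tiny
 * one-slot pool stands in for the kernel allocator, so the aliasing is
 * deterministic and well-defined: "freed" memory is reused immediately,
 * and a stale pointer into it still refers to valid storage.
 */
enum { SLOT_WORDS = 6 };            /* matches the 6 zeroed words */

static unsigned long pool_slot[SLOT_WORDS];
static int pool_in_use;

static unsigned long *pool_alloc(void)
{
    if (pool_in_use)
        return NULL;
    pool_in_use = 1;
    return pool_slot;
}

static void pool_free(unsigned long *p)
{
    (void)p;
    pool_in_use = 0;                /* memory is reusable immediately */
}
```

The second owner sees its data vanish even though nothing in its own code path ever stores zeroes there, which is why this class of bug implicates whichever driver freed the memory, not the one that crashes.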