Date: Thu, 4 Jan 2001 11:35:21 +1030
From: Greg Lehey <grog@lemis.com>
To: Roman Shterenzon <roman@xpert.com>
Cc: Daniel Lang <dl@leo.org>, Andy Newman <andy@silverbrook.com.au>, freebsd-stable@freebsd.org
Subject: Re: kern/21148: multiple crashes while using vinum
Message-ID: <20010104113521.G4336@wantadilla.lemis.com>
In-Reply-To: <Pine.LNX.4.30.0101040234380.21369-100000@jamus.xpert.com>; from roman@xpert.com on Thu, Jan 04, 2001 at 02:39:20AM +0200
References: <20010104105428.D4336@wantadilla.lemis.com> <Pine.LNX.4.30.0101040234380.21369-100000@jamus.xpert.com>
On Thursday, 4 January 2001 at 2:39:20 +0200, Roman Shterenzon wrote:
> On Thu, 4 Jan 2001, Greg Lehey wrote:
>> The trouble with that is that this only happens when the system is
>> very active, and there are thousands of potential buffer headers which
>> could be trashed.  I do have a trace facility within Vinum, but even
>> with that it's difficult to figure out what's going on.
>
> I don't agree about "very active".  My system in question was calm
> during the test; I just ran "find /raid -print" and it crashed.

Your problem seems to be a little different, as I mentioned in my
reply to your stack trace.  It might be related, and it might give us
the information we need to find the problem.

>>> My stack traces showed that this memory region stays the same on the
>>> same machine with the same kernel (although I can't tell how
>>> reliable this is).
>>
>> If you mean that the same part of the buffer header gets smashed every
>> time, yes, this is reliably reproducible (well, in other words, when
>> it happens (at random), it happens in the same place every time).  It
>
> That is correct.  Both Daniel and I had the crash occurring exactly
> at the same place.

Hmm.  Well, almost: this time the line number is different, and it
looks more plausible.  Can you get me the local variables from your
stack trace of this frame?

  (gdb) i loc

>> may mean that Vinum is doing it, but as far as I can tell it's always
>> 6 words being zeroed out, and I don't do that anywhere in Vinum.  The
>> other possibility, which I consider most likely, is that the data
>> structures accidentally get freed and used by some other driver (or,
>> possibly, that some other driver freed them first and then continued
>> using them).  This would explain the observed correlation with the fxp
>> driver.
>
> Do you think that the latter is more probable?  I had an fxp card there.

That's what I said.
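Since the same 6 words of the buffer header are zeroed every time, one standard way to catch the culprit closer to the act is to bracket the suspect fields with magic cookies and verify them wherever the header changes hands. The following is a minimal sketch of that technique; `struct vbuf`, `BUF_MAGIC`, and the field names are hypothetical, not actual Vinum code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch (not actual Vinum code): bracket the region of a
 * privately allocated buffer header that keeps getting zeroed with magic
 * cookies, and check them wherever the header changes hands.  If another
 * subsystem scribbles on the header, the check fires near the corruption
 * instead of at the eventual crash.
 */
#define BUF_MAGIC 0xdeadc0deUL

struct vbuf {
    unsigned long magic_front;      /* guard word before the payload */
    void *b_data;                   /* fields observed to be zeroed */
    size_t b_bcount;
    unsigned long magic_back;       /* guard word after the payload */
};

static void vbuf_init(struct vbuf *bp, void *data, size_t count)
{
    bp->magic_front = BUF_MAGIC;
    bp->b_data = data;
    bp->b_bcount = count;
    bp->magic_back = BUF_MAGIC;
}

/* Return 0 if the guards are intact, -1 if someone trashed the header. */
static int vbuf_check(const struct vbuf *bp)
{
    if (bp->magic_front != BUF_MAGIC || bp->magic_back != BUF_MAGIC)
        return -1;
    return 0;
}
```

In a kernel, the failing check would panic (or log through the existing trace facility), so the stack trace points at whoever touched the header last rather than at the code that merely stumbled over the damage later.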
I don't necessarily think it's a bug in the fxp driver, but Vinum is
unique in the way it allocates buffer headers, so it's possible.

>>> b) I still believe that there is a problem somewhere in the
>>> vinum code (probably within the raid5 routines, since a mirror
>>> setup worked fine).
>>
>> Correct.  I have no doubt about it.  But some bugs are difficult to
>> find, and I need help.
>
> Hmm.. that part of the code in question, isn't it shared for both
> raid1 and raid5?

Yes, but if you look a few lines higher, you'll see that we've called
complete_raid5_write for RAID-[45], and it's possible that something
has happened there.  Note that in the other case (buffer header
corruption), we're executing non-Vinum code when the crash comes, so
there's no requirement to stay in RAID-5 code.

Greg

--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
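The free-and-reuse hypothesis above can be illustrated deterministically. In the sketch below, a one-slot pool stands in for the kernel allocator (all names are invented; this is not FreeBSD or Vinum code): "driver A" frees its block but keeps a stale pointer, the pool hands the same memory to "driver B", and A's late write through the stale pointer silently zeroes B's live data, which is exactly the symptom of header words turning to zero with no store to them anywhere in the owning driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustration only (assumed names, not actual kernel code).  A tiny
 * one-slot pool stands in for the kernel allocator, so the aliasing is
 * deterministic and well-defined: "freed" memory is reused immediately,
 * and a stale pointer into it still refers to valid storage.
 */
enum { SLOT_WORDS = 6 };            /* matches the 6 zeroed words */

static unsigned long pool_slot[SLOT_WORDS];
static int pool_in_use;

static unsigned long *pool_alloc(void)
{
    if (pool_in_use)
        return NULL;
    pool_in_use = 1;
    return pool_slot;
}

static void pool_free(unsigned long *p)
{
    (void)p;
    pool_in_use = 0;                /* memory is reusable immediately */
}
```

The second owner sees its data vanish even though nothing in its own code path ever stores zeroes there, which is why this class of bug implicates whichever driver freed the memory, not the one that crashes.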