Date:      Thu, 4 Jan 2001 02:39:20 +0200 (IST)
From:      Roman Shterenzon <roman@xpert.com>
To:        Greg Lehey <grog@lemis.com>
Cc:        Daniel Lang <dl@leo.org>, Andy Newman <andy@silverbrook.com.au>, <freebsd-stable@freebsd.org>
Subject:   Re: kern/21148: multiple crashes while using vinum
Message-ID:  <Pine.LNX.4.30.0101040234380.21369-100000@jamus.xpert.com>
In-Reply-To: <20010104105428.D4336@wantadilla.lemis.com>

On Thu, 4 Jan 2001, Greg Lehey wrote:

...snip...
> > The reason is, that _some code_ writes into unallocated memory, in
> > my case overwriting a data-structure of an ata-request with a few
> > zero bytes, causing the panic. The stack trace allows me to trace
> > the problem back to this point, but not further. I later experienced
> > a similar problem on a scsi-only system.
>
> Yes, this looks very much like the other issues.  But you must
> understand that there's nothing I can do without further information.
>
> > The reason why I filed this PR under 'vinum' is that it only
> > occurred on boxes using vinum, and is perfectly reproducible via simple
> > operations like a 'find /vinum/file/system -print' on a larger and
> > moderately filled vinum filesystem.  Perfectly reproducible means:
> > each night, periodic daily caused the panic (traceable to the find
> > call in /etc/security, finding files with setuid bits).
> >
> > As far as I know, the only way to trace this write into
> > unallocated or otherwise-allocated memory (or buffer overrun)
> > would be to set a watchpoint on the overwritten data structure
> > within the kernel debugger.
>
> The trouble with that is that this only happens when the system is
> very active, and there are thousands of potential buffer headers which
> could be trashed.  I do have a trace facility within Vinum, but even
> with that it's difficult to figure out what's going on.

I don't agree about "very active". The system in question was quiet during
the test; I just ran find /raid -print and it crashed.
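
To make the watchpoint idea quoted above concrete, here is a minimal
user-space sketch in Python. It is only a model, not kernel or debugger
code: the class WatchedMemory, the writer names and all offsets are
hypothetical. It records every write that overlaps a watched range, which
is the information a debugger watchpoint would give about whoever
overwrites the request structure.

```python
class WatchedMemory:
    """Toy model of memory with one write watchpoint (hypothetical, not a kernel API)."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.watch = None   # (start, length) of the watched range, or None
        self.hits = []      # (writer, offset, length) for each write that touched it

    def set_watch(self, start, length):
        self.watch = (start, length)

    def write(self, offset, data, writer):
        if offset < 0 or offset + len(data) > len(self.mem):
            raise IndexError("write outside modelled memory")
        if self.watch is not None:
            start, length = self.watch
            # Two ranges overlap when each one starts before the other ends.
            if offset < start + length and start < offset + len(data):
                self.hits.append((writer, offset, len(data)))
        self.mem[offset:offset + len(data)] = data


def demo_watchpoint():
    mem = WatchedMemory(256)
    mem.set_watch(64, 32)                    # pretend a request structure lives here
    mem.write(0, b"\xff" * 16, "driver_a")   # unrelated write, no hit
    mem.write(72, bytes(24), "driver_b")     # zeroes part of the watched structure
    return mem.hits
```

Calling demo_watchpoint() returns only the write from "driver_b". The
model also shows the limitation raised in the quoted text: a single
watched range helps only if one knows in advance which structure will be
hit.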

> > My stack-traces showed that this memory region stays the same on the
> > same machine with the same kernel (although I can't tell how
> > reliable this is).
>
> If you mean that the same part of the buffer header gets smashed every
> time, yes, this is reliably reproducible (well, in other words, when
> it happens (at random), it happens in the same place every time).  It
That is correct. Both Daniel and I had the crash occurring at exactly the
same place.

> may mean that Vinum is doing it, but as far as I can tell it's always
> 6 words being zeroed out, and I don't do that anywhere in Vinum.  The
> other possibility, which I consider most likely, is that the data
> structures accidentally get freed and used by some other driver (or,
> possibly, that some other driver freed them first and then continued
> using them).  This would explain the observed correlation with the fxp
> driver.
Do you think that the latter is more probable?
I had an fxp card there.
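
The "freed and reused by another driver" scenario from the quoted text can
be sketched in user space as follows. This is an illustration in Python
under stated assumptions, not the actual kernel allocator or any driver's
code: SlotPool, the 4-byte word size, the 64-byte slot and the offset of
the cleared words are all made up. Only the count of 6 zeroed words comes
from the thread.

```python
WORD = 4  # bytes per word in this model (assumption: 32-bit words)


class SlotPool:
    """Toy fixed-size allocator: a freed slot is handed out again first."""

    def __init__(self, slots, slot_size):
        self.slot_size = slot_size
        self.mem = bytearray(slots * slot_size)
        self.free_list = list(range(slots - 1, -1, -1))  # pop() yields slot 0 first

    def alloc(self):
        return self.free_list.pop() * self.slot_size     # byte offset of the slot

    def free(self, offset):
        self.free_list.append(offset // self.slot_size)


def zeroed_words(before, after):
    """Indices of words that were non-zero in `before` and all-zero in `after`."""
    out = []
    for i in range(0, len(before), WORD):
        if any(before[i:i + WORD]) and not any(after[i:i + WORD]):
            out.append(i // WORD)
    return out


def demo_stale_free():
    pool = SlotPool(slots=4, slot_size=64)
    hdr = pool.alloc()                         # owner A's structure
    pool.mem[hdr:hdr + 64] = b"\xaa" * 64      # A fills it in
    pool.free(hdr)                             # freed by accident; A keeps using hdr
    before = bytes(pool.mem[hdr:hdr + 64])
    other = pool.alloc()                       # owner B is handed the same slot
    # B legitimately clears 6 words of what it believes is its own object.
    pool.mem[other + 8:other + 8 + 6 * WORD] = bytes(6 * WORD)
    after = bytes(pool.mem[hdr:hdr + 64])
    return other == hdr, zeroed_words(before, after)
```

In this model demo_stale_free() reports that both owners hold the same
slot and that words 2 through 7 of A's structure were zeroed, although
neither owner contains an obviously wrong write. It would also produce the
same damaged offsets on every run, consistent with the crash always
appearing in the same place.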

> >  b) I still believe, that there is a problem somewhere in the
> >     vinum code (probably within raid5 routines, since a mirror
> >     setup worked fine).
>
> Correct.  I have no doubt about it.  But some bugs are difficult to
> find, and I need help.
Hmm, isn't the part of the code in question shared by both raid1
and raid5?

--Roman Shterenzon, UNIX System Administrator and Consultant
[ Xpert UNIX Systems Ltd., Herzlia, Israel. Tel: +972-9-9522361 ]






