From owner-freebsd-stable  Wed Jan  3 16:52:48 2001
From owner-freebsd-stable@FreeBSD.ORG  Wed Jan  3 16:52:45 2001
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from jamus.xpert.com (jamus.xpert.com [199.203.132.17])
	by hub.freebsd.org (Postfix) with ESMTP id C65B737B400
	for <freebsd-stable@freebsd.org>; Wed,  3 Jan 2001 16:52:42 -0800 (PST)
Received: from roman (helo=localhost)
	by jamus.xpert.com with local-esmtp (Exim 3.12 #5)
	id 14DyR6-0005qN-00; Thu, 04 Jan 2001 02:39:20 +0200
Date: Thu, 4 Jan 2001 02:39:20 +0200 (IST)
From: Roman Shterenzon <roman@xpert.com>
To: Greg Lehey <grog@lemis.com>
Cc: Daniel Lang <dl@leo.org>, Andy Newman <andy@silverbrook.com.au>,
	<freebsd-stable@freebsd.org>
Subject: Re: kern/21148: multiple crashes while using vinum
In-Reply-To: <20010104105428.D4336@wantadilla.lemis.com>
Message-ID: <Pine.LNX.4.30.0101040234380.21369-100000@jamus.xpert.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 4 Jan 2001, Greg Lehey wrote:

...snip...
> > The reason is, that _some code_ writes into unallocated memory, in
> > my case overwriting a data-structure of an ata-request with a few
> > zero bytes, causing the panic. The stack trace allows me to trace
> > the problem back to this point, but not further. I later experienced
> > a similar problem on a scsi-only system.
>
> Yes, this looks very much like the other issues.  But you must
> understand that there's nothing I can do without further information.
>
> > The reason why I filed this PR under 'vinum' is that it only
> > occurred on boxes using vinum, and is perfectly reproducible via simple
> > operations like a 'find /vinum/file/system -print' on a larger and
> > moderately filled vinum-filesystem.  Perfectly reproducible means:
> > each night, periodic daily caused the panic (traceable to the find
> > call in /etc/security, finding files with setuid bits).
> >
> > As far as I know, the only way to trace this writing into
> > unallocated/otherallocated memory resp. buffer overrun
> > would be to set a watchpoint to the overwritten data-structure
> > within the kernel-debugger.
>
> The trouble with that is that this only happens when the system is
> very active, and there are thousands of potential buffer headers which
> could be trashed.  I do have a trace facility within Vinum, but even
> with that it's difficult to figure out what's going on.

I don't agree about "very active". The system in question was idle during
the test; I just ran find /raid -print and it crashed.

> > My stack-traces showed that this memory region stays the same on the
> > same machine with the same kernel (although I can't tell how
> > reliable this is).
>
> If you mean that the same part of the buffer header gets smashed every
> time, yes, this is reliably reproducible (well, in other words, when
> it happens (at random), it happens in the same place every time).  It
That is correct. Both Daniel and I had the crash occurring at exactly the
same place.

> may mean that Vinum is doing it, but as far as I can tell it's always
> 6 words being zeroed out, and I don't do that anywhere in Vinum.  The
> other possibility, which I consider most likely, is that the data
> structures accidentally get freed and used by some other driver (or,
> possibly, that some other driver freed them first and then continued
> using them).  This would explain the observed correlation with the fxp
> driver.
Do you think that the latter is more probable?
I had an fxp card in that machine.
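
To make sure I understand the second scenario (a driver frees a structure
and then keeps using it), here is a rough userland sketch. It is not Vinum
or fxp code: the one-slot allocator, the names (slot_alloc, slot_free,
demo_stale_write), the 16-word size and the offset of the cleared words are
all made up for illustration. Only the "6 words zeroed" part comes from the
crashes discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_WORDS 16   /* hypothetical size of the structure, in words */

/* A one-slot "allocator": whoever asks next gets the same memory back. */
static unsigned long slot[SLOT_WORDS];
static int slot_in_use;

static unsigned long *slot_alloc(void)
{
    if (slot_in_use)
        return NULL;
    slot_in_use = 1;
    return slot;
}

static void slot_free(unsigned long *p)
{
    assert(p == slot);
    slot_in_use = 0;
}

/*
 * Returns how many words of the second owner's structure were trashed
 * by the first owner writing through its stale pointer.
 */
int demo_stale_write(void)
{
    unsigned long *stale, *header;
    int i, trashed = 0;

    stale = slot_alloc();           /* "driver A" allocates ...          */
    slot_free(stale);               /* ... frees, but keeps the pointer  */

    header = slot_alloc();          /* "driver B" gets the same memory   */
    assert(header == stale);
    for (i = 0; i < SLOT_WORDS; i++)
        header[i] = 0xdeadbeefUL;   /* B fills in its structure          */

    /* A clears 6 words of what it still believes is its own structure. */
    memset(stale + 4, 0, 6 * sizeof(*stale));

    for (i = 0; i < SLOT_WORDS; i++)
        if (header[i] != 0xdeadbeefUL)
            trashed++;

    slot_free(header);
    return trashed;
}
```

Calling demo_stale_write() from a small main() shows the point: B never
touches those words after filling them in, yet it finds them zeroed, always
at the same offset. If that is roughly what happens in the kernel, it would
fit the damage always being in the same place.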

> >  b) I still believe, that there is a problem somewhere in the
> >     vinum code (probably within raid5 routines, since a mirror
> >     setup worked fine).
>
> Correct.  I have no doubt about it.  But some bugs are difficult to
> find, and I need help.
Hmm... isn't the code in question shared between raid1 and raid5?

--Roman Shterenzon, UNIX System Administrator and Consultant
[ Xpert UNIX Systems Ltd., Herzlia, Israel. Tel: +972-9-9522361 ]

