Date: Thu, 04 Jun 1998 12:00:46 -0400 (EDT)
From: Simon Shapiro <shimon@simon-shapiro.org>
To: Greg Lehey <grog@lemis.com>
Cc: Michael Hancock <michaelh@cet.co.jp>, "freebsd-current@freebsd.org" <freebsd-current@FreeBSD.ORG>, tcobb <tcobb@staff.circle.net>, Karl Pielorz <kpielorz@tdx.co.uk>, Mike Smith <mike@smith.net.au>
Subject: Re: DPT driver fails and panics with Degraded Array
Message-ID: <XFMail.980604120046.shimon@simon-shapiro.org>
In-Reply-To: <19980603125443.K22406@freebie.lemis.com>

On 03-Jun-98 Greg Lehey wrote:
> Why would a driver call biodone on a buffer that doesn't belong to it?

The block belongs to it.  Only it gets marked as done somehow.

>>> These situations are worth analysing, and I hope to see you and Troy
>>> resolving this one, even if it means that you point the finger
>>> elsewhere.
>
> Definitely.  I'm surprised nobody has done it yet.

I posted some notes on this issue several months ago, with no response.

>> I got these particularly with tape devices.  Especially if there are two
>> tape drives on the system and you try to (for example) cpio to both
>> independently.  I put a ton of debugging code in the DPT driver to try
>> and catch the DPT sending biodone twice on the same request and am pretty
>> comfortable the driver is not it.
>
> OK, where is the failing biodone called from?

From the DPT driver.  Let me clarify the statement above: there was a
printf in the driver, just above the biodone call.  The driver also
contains state info as to whether biodone was called or not (actually,
biodone state is implicit from other states).  In every case where the
biodone failure occurred, there was no prior call to biodone, i.e. the
offending call was the first call.  I even went as far as putting counters
around these calls.  The count always stayed at zero.

Since the greatest sensitivity was in st.c, and st.c is new in CAM, I
basically dropped the ball.  Especially since I did not have this problem
in 3.0, from very early on.

> I find this difficult to follow.  On the one hand, lots of people
> (myself included) regularly use the st driver, and I've never seen
> this behaviour.  About the only thing that these panics have in common
> is the DPT driver.  It's easy enough to determine which driver is
> involved: all you need to do is follow the stack trace to find what
> device is involved with the buffer (or just look at bp->b_dev).

Are you using two tape drives and writing to both concurrently, using 64k
blocks?
Are you running disk I/O at 1500-1900 operations per second?  Is the SCSI
controller you use capable of causing biodone to be called less than 1us
after the driver is called?

The fact that the problem shows up with the DPT driver does not
automatically implicate the DPT driver code.  I would LOVE for it to be so,
because this is the part of the FreeBSD kernel I understand the best.

Stack traces were analyzed, but did not reveal anything interesting.  It is
entirely possible that the fast response from the DPT causes a race
condition elsewhere.  Without cooperation from others who understand the
other parts of the kernel better than I do, it is difficult for me to
analyze it much further than ``I am pretty confident it is not a coding
error in the driver or the immediate code that calls it''.

Simon

---

Sincerely Yours,
Simon Shapiro
Shimon@Simon-Shapiro.ORG
770.265.7340
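
[Editorial illustration, not part of the original message.]  The
instrumentation described above (a printf just before the completion call,
plus a counter and a per-request "done" state) might look roughly like the
userland sketch below.  Everything in it is an assumption made for
illustration: struct mock_buf and traced_biodone() are hypothetical names,
not the DPT driver's code and not the kernel's struct buf or biodone().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel buffer; only the fields needed
 * for this illustration are modelled. */
struct mock_buf {
    int b_dev;       /* device the buffer is bound to (cf. bp->b_dev) */
    int done;        /* nonzero once completion has been signalled */
    int done_calls;  /* how many times completion was attempted */
};

/* Hypothetical wrapper around the completion call.
 * Returns 0 on a first completion, -1 if the buffer was already done. */
int
traced_biodone(struct mock_buf *bp)
{
    bp->done_calls++;
    if (bp->done) {
        /* Log just before the point where the real call would fail. */
        fprintf(stderr, "dev %d: buffer already done (call #%d)\n",
            bp->b_dev, bp->done_calls);
        return -1;
    }
    bp->done = 1;    /* a real driver would call biodone(bp) here */
    return 0;
}
```

A wrapper like this only catches repeated completions issued through the
wrapper itself.  If the failure appears while the counter shows no earlier
call, as reported above, the sketch cannot say who marked the buffer done;
it only rules out this one path.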