From owner-freebsd-current Fri Jun 5 22:21:21 1998
Date: Fri, 05 Jun 1998 17:23:12 -0400 (EDT)
From: Simon Shapiro
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
To: Greg Lehey
Cc: Mike Smith, Karl Pielorz, tcobb, "freebsd-current@freebsd.org", Michael Hancock
Subject: Re: DPT driver fails and panics with Degraded Array
In-Reply-To: <19980605093046.J768@freebie.lemis.com>
Sender: owner-freebsd-current@FreeBSD.ORG

On 05-Jun-98 Greg Lehey wrote:
> On Thu, 4 June 1998 at 12:00:46 -0400, Simon Shapiro wrote:
>>
>> On 03-Jun-98 Greg Lehey wrote:
>>> Why would a driver call biodone on a buffer that doesn't belong to it?
>>
>> The block belongs to it. Only it gets marked as done somehow.
>
> That in itself is normal enough. How come it's not busy?

I dunno. From the driver, when biodone needs to be called, I enter a
critical section, move the block to the proper queue, call biodone, clear
some bits, and release the critical section. Maybe something times out
some blocks? I tried for a while to trace it down, but found nothing
interesting.

...

> I don't know the driver, but I'm surprised you need to maintain
> separate information. I'd use the state in the bp->b_flags.

I do not replicate b_flags.
I do maintain some other state bits with regard to the DPT state machine.

>> Since the greatest sensitivity was in the st.c, and st.c is new in CAM,
>> I basically dropped the ball. Especially when I did not have this
>> problem in 3.0, from very early on.
>
> I haven't seen a driver called st.c in CAM. They've changed the
> names, and the tape driver is now called scsi_sa.c. st.c is the old
> tape driver. How do you determine "greatest sensitivity"?

If I run (including in 3.0, and SMP) two cpio sessions to two tape drives,
the system panics. I can access multiple disks, or multiple CD-ROMs,
without error, but it is easiest to induce an error with two tape drives.

> In any case, I can't see how a different driver can influence things.
> Heavy tape I/O may help the problem to show itself, but I can't think
> it's in any way to blame.

Next time I am running multiple tape drives, I will write down the failure
mode. Things happen such as: while one tape is rewinding, the other one
stops writing because it suddenly ``is'' at EOT. Stuff like that. Please
do not go chasing code, as this is a horrible way to describe a problem.
I'll post more specifics at a later date.

...

>> Are you using two tape drives, and write to both concurrently, using 64k
>> blocks?
>
> Occasionally.

Without failure? That's good.

>> Are you running disk I/O at 1500-1900 operations per second? Is the
>> SCSI controller you use capable of causing biodone to be called
>> within less than 1us from the driver being called?
>
> Well, I suppose each of the controllers could generate a number of
> interrupts per second, so sooner or later that scenario would arise.
> But as I said above, there's nothing to point to the st driver except
> it's the new kid on the block. What you have said points fairly and
> squarely to the DPT driver as the culprit.

I fail to see how. Read my comments carefully. I am not of the opinion
that the tape driver is at fault.
I simply say that I observe the failure most dramatically when using DAT
drives as the destination. For example, last time I tried, I could not
write tapes with any blocking factor other than 512 bytes and still be
able to read the tape correctly. When writing to disks, this restriction
does not apply. Since the code in the DPT driver is the same, regardless
of the nature of the target (or its address), I naively assumed the DPT
driver is not the culprit.

> OK. What happens if you analyse the buffer header before calling
> biodone and just ignore it if it's not busy?

I dunno. Excellent suggestion. I'll try that. Anyone willing to test
that?

Simon

---

Sincerely Yours,

Simon Shapiro
Shimon@Simon-Shapiro.ORG    770.265.7340
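
The check suggested in the last quoted paragraph (look at the buffer header before calling biodone, and skip the buffer if it is not busy) might look roughly like the sketch below. This is a userspace model only, not DPT driver or kernel code: `struct model_buf`, `MODEL_B_BUSY`, `MODEL_B_DONE`, `model_biodone` and `complete_if_busy` are hypothetical names invented for illustration, and the flag values are arbitrary. The `assert` in `model_biodone` merely stands in for whatever the real kernel does when handed a buffer that is not busy.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits for this model; not the kernel's values. */
#define MODEL_B_BUSY 0x01u
#define MODEL_B_DONE 0x02u

/* Minimal stand-in for a buffer header: only the flags word is modelled. */
struct model_buf {
    unsigned b_flags;
};

/* Number of buffers actually completed through model_biodone(). */
static int completions;

/* Stand-in for biodone: refuses (via assert) a buffer that is not busy. */
static void model_biodone(struct model_buf *bp)
{
    assert(bp->b_flags & MODEL_B_BUSY);
    bp->b_flags |= MODEL_B_DONE;
    completions++;
}

/*
 * The suggested guard: inspect the header first and ignore the buffer
 * if it is not busy. Returns 1 if the buffer was completed, 0 if it
 * was skipped.
 */
static int complete_if_busy(struct model_buf *bp)
{
    if ((bp->b_flags & MODEL_B_BUSY) == 0) {
        /* Log the skipped buffer so the odd case can be counted later. */
        fprintf(stderr, "skipping non-busy buffer %p, flags=0x%x\n",
                (void *)bp, bp->b_flags);
        return 0;
    }
    model_biodone(bp);
    return 1;
}
```

In a real driver the guard would sit inside the same critical section as the queue manipulation, so that the flags cannot change between the test and the call. Logging each skipped buffer, rather than dropping it silently, would also show how often the non-busy case occurs and under which load.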