From owner-freebsd-questions  Tue Apr  8 19:14:39 1997
Return-Path: <owner-questions>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id TAA19379
          for questions-outgoing; Tue, 8 Apr 1997 19:14:39 -0700 (PDT)
Received: from vicor-nb.com (ftp.vicor-nb.com [208.206.78.223])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA19373
          for <questions@freebsd.org>; Tue, 8 Apr 1997 19:14:36 -0700 (PDT)
Received: from dns.vicor-nb.com (madden.vicor-nb.com [208.206.78.38]) by vicor-nb.com (8.7.5/8.7.3) with SMTP id SAA19932; Tue, 8 Apr 1997 18:04:57 GMT
Message-ID: <334AFA8A.3010@vicor-nb.com>
Date: Tue, 08 Apr 1997 19:10:18 -0700
From: Cayford Burrell <cayford@vicor-nb.com>
Reply-To: cayford@vicor-nb.com
Organization: Vicor, Inc
X-Mailer: Mozilla 3.01 (Win95; I)
MIME-Version: 1.0
To: questions@freebsd.org
CC: julian@whistle.com, phk@tfs.com
Subject: SCSI problems
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-questions@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi there.  We're having trouble with SCSI and need
some help.  The console shows "timeout while idle",
followed by a bunch of values like SCSI_SIGI, LASTPHASE,
and so on.  Very often the CPU freezes; sometimes
the system recovers.  I can easily reproduce this.

I heard from Poul-Henning that 2.2 had problems like
this and that they were solved with 2.2.1.  Well,
I still have problems.  2.2.1 is better, but under
load (6 processes reading a series of 200k files) it
will crash, sometimes after 5-10 minutes, sometimes after
2-3 hours.

Specifics: P5/133, Adaptec 2940UW, Seek RAID disk
configured as one 39 GB SCSI target, kernel built with
AHC_TAGENABLE and AHC_MEMIO.  NOT AHC_PAGE_SCB (or whatever
turns on SCB paging).  FreeBSD 2.2.1.

This disk can routinely move data at 15MB/sec; at least,
it has in the past, with 2.1.5.

It seems that either the RAID disk is dropping an SCB or
the kernel is confused.

Reading the SCSI driver code, I see that there should
only be 4 SCBs for this disk, even though the 2940
supports 16.

The disk's serial port shows that frequently, on reads,
up to 16 tagged requests are pending.

The SCSI analyzer shows that the disk is
rejecting requests with the SCSI status "queue full",
which is supposed to trigger the driver to reduce
the number of openings.
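
To make clear what I mean by "reduce the number of openings",
here is a minimal sketch of the behavior I would expect.  This
is NOT the actual FreeBSD driver code; the struct and function
names are made up for illustration.  The only real values are
the SCSI-2 status codes (GOOD = 0x00, QUEUE FULL = 0x28).

```c
#include <assert.h>
#include <stdbool.h>

/* SCSI-2 status byte values. */
#define SCSI_STATUS_GOOD       0x00
#define SCSI_STATUS_QUEUE_FULL 0x28

/* Hypothetical per-target queue state (not a real kernel structure). */
struct tag_queue {
    int openings;   /* max tagged commands we allow outstanding */
    int active;     /* tagged commands currently outstanding */
};

static void tag_queue_init(struct tag_queue *q, int openings)
{
    assert(openings >= 1);
    q->openings = openings;
    q->active = 0;
}

/* Returns true if one more command may be sent to the target now. */
static bool tag_queue_try_start(struct tag_queue *q)
{
    if (q->active >= q->openings)
        return false;           /* at the limit: hold the request */
    q->active++;
    return true;
}

/*
 * Called when a command comes back with a status byte.
 * Returns true if the command was rejected and must be requeued.
 */
static bool tag_queue_complete(struct tag_queue *q, int status)
{
    assert(q->active > 0);
    q->active--;

    if (status == SCSI_STATUS_QUEUE_FULL) {
        /*
         * The target took the commands still outstanding but not
         * this one, so clamp openings to that count (never below 1).
         */
        int accepted = q->active < 1 ? 1 : q->active;
        if (accepted < q->openings)
            q->openings = accepted;
        return true;
    }
    return false;
}
```

With a limit of 4 and three commands outstanding, a "queue full"
on the third should drop openings to 2 and hold further requests
until something completes, rather than resending right away.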

The disk supports 16 SCBs.

It seems to me that the kernel is sending more SCBs than it
should and is not honoring the limit of 4 SCBs.  Worse,
when the disk reports back that it can't accept more requests,
the CPU just keeps hammering away.

What should I do next?

-- Cayford Burrell