From owner-freebsd-scsi  Sun Dec 12  5:15:44 1999
Delivered-To: freebsd-scsi@freebsd.org
Received: from mail.omnilink.net (mail.omnilink.net [194.64.25.6])
	by hub.freebsd.org (Postfix) with ESMTP id 6A70D14E49
	for <freebsd-scsi@FreeBSD.ORG>; Sun, 12 Dec 1999 05:15:41 -0800 (PST)
	(envelope-from ob@omnilink.net)
Received: from ntsrv2 (my.jav.net [212.255.14.194])
	by mail.omnilink.net (8.9.3/8.9.3) with SMTP id OAA01806
	for <freebsd-scsi@FreeBSD.ORG>; Sun, 12 Dec 1999 14:17:05 +0100 (CET)
	(envelope-from ob@omnilink.net)
Message-ID: <00e001bf44a2$ecea6c00$c20effd4@jav.net>
From: "Oliver Blasnik" <ob@omnilink.net>
To: <freebsd-scsi@FreeBSD.ORG>
Subject: Again: CRD-Raid-Controller and FreeBSD 3.x
Date: Sun, 12 Dec 1999 14:15:05 +0100
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2014.211
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2014.211
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hi there,

I have several FreeBSD systems (3.1, 3.3) connected to external
CRD-5440 RAID controllers and ran into massive problems on one (new) machine
this weekend :(

All of them use on-board Adaptec 7890 controllers, set up to UW 40 MHz,
which is within the specification of the 5440. The connection is made with
Granite cables and terminators, and the RAID is the only device on the bus. I do
NOT have any termination or cabling problems on any of the 3 machines. The
firmware of the 5440s is 1.9.

But here is what happened...
The server stopped working after multiple "nice" messages in
/var/log/messages, telling me:

Dec 11 04:01:57 bigfoot /kernel: (da1:ahc0:0:0:1): SCB 0x47 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x9
Dec 11 04:01:57 bigfoot /kernel: (da1:ahc0:0:0:1): Queuing a BDR SCB
Dec 11 04:01:57 bigfoot /kernel: (da1:ahc0:0:0:1): Bus Device Reset Message Sent
Dec 11 04:01:57 bigfoot /kernel: (da1:ahc0:0:0:1): no longer in timeout, status = 34b
Dec 11 04:01:57 bigfoot /kernel: ahc0: Bus Device Reset on A:0. 2 SCBs aborted

This server is our backup machine (amanda) and was under high SCSI load at the
time. I think that many SCBs were waiting to finish and that the driver didn't
wait long enough. After that crash I checked the other RAID systems and found
identical messages on all of them (they were still running)...
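
For anyone who wants to check their own machines for the same symptom, here is
a minimal sketch that counts the "timed out" messages per device in a syslog
file. It assumes the message format shown above; the function names
count_timeouts and count_timeouts_in_file are made up for this illustration and
are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches the CAM device path that prefixes each message,
# e.g. "(da1:ahc0:0:0:1):" -> device "da1", controller "ahc0".
DEVICE_RE = re.compile(r"\((\w+):(ahc\d+):\d+:\d+:\d+\):")


def count_timeouts(lines):
    """Count 'timed out' kernel messages per (device, controller) pair."""
    counts = Counter()
    for line in lines:
        if "timed out" not in line:
            continue
        match = DEVICE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def count_timeouts_in_file(path="/var/log/messages"):
    """Convenience wrapper: scan a log file (default path is an assumption)."""
    with open(path, errors="replace") as log:
        return count_timeouts(log)
```

Calling count_timeouts_in_file() on each host and printing the result gives a
quick per-device overview; it only matches devices behind an ahc controller.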

The only workaround was to disable tagged queuing on all systems (I added the
devices to cam_xpt.c and built a new kernel). None of these messages have
occurred since.

Has anyone experienced something like this? How can I extend the timeout for
tagged queuing? Is there any workaround that lets me keep using tagged queuing
(SCSI is now extremely slow in comparison to yesterday...)?

Regards,

Oliver


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message