From owner-freebsd-scsi  Sun Jul  6 00:51:40 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id AAA13090
          for freebsd-scsi-outgoing; Sun, 6 Jul 1997 00:51:40 -0700 (PDT)
Received: from sax.sax.de (sax.sax.de [193.175.26.33])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id AAA13085
          for <scsi@FreeBSD.ORG>; Sun, 6 Jul 1997 00:51:37 -0700 (PDT)
Received: (from uucp@localhost) by sax.sax.de (8.6.12/8.6.12-s1) with UUCP id JAA20078; Sun, 6 Jul 1997 09:51:29 +0200
Received: (from j@localhost)
	by uriah.heep.sax.de (8.8.5/8.8.5) id JAA15819;
	Sun, 6 Jul 1997 09:30:53 +0200 (MET DST)
Message-ID: <19970706093053.ZG59677@uriah.heep.sax.de>
Date: Sun, 6 Jul 1997 09:30:53 +0200
From: j@uriah.heep.sax.de (J Wunsch)
To: scsi@FreeBSD.ORG
Cc: kmitch@weenix.guru.org (Keith Mitchell)
Subject: Re: Archive Viper and 3940UW (bad Drive?)
References: <199707052152.PAA26449@pluto.plutotech.com> <199707060106.VAA12128@weenix.guru.org>
X-Mailer: Mutt 0.60_p2-3,5,8-9
Mime-Version: 1.0
X-Phone: +49-351-2012 669
X-PGP-Fingerprint: DC 47 E6 E4 FF A6 E9 8F  93 21 E0 7D F9 12 D6 4E
Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch)
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

As Keith Mitchell wrote:

> OK, that is what I thought originally, but then given that erasing them
> "seemingly" solved the timeout problem I don't know what to think.  What
> does erasing a tape actually do?  It doesn't take but a few seconds.

This sounds wrong.  For me, it makes an entire pass over the medium.
This is also what I'm expecting.  (Tandberg TDC4222, arbitrary QIC-150
cartridge.)  I know erasing a tape takes forever on a DAT medium.
QICs are faster here, since the erase head is really a quarter-inch
head, erasing all the parallel tracks at once.

QIC tapes are normally erased before writing track 1 (i.e.,
while writing from the very beginning).

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

From owner-freebsd-scsi  Sun Jul  6 00:51:58 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id AAA13136
          for freebsd-scsi-outgoing; Sun, 6 Jul 1997 00:51:58 -0700 (PDT)
Received: from sax.sax.de (sax.sax.de [193.175.26.33])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id AAA13126
          for <freebsd-scsi@FreeBSD.ORG>; Sun, 6 Jul 1997 00:51:53 -0700 (PDT)
Received: (from uucp@localhost) by sax.sax.de (8.6.12/8.6.12-s1) with UUCP id JAA20080; Sun, 6 Jul 1997 09:51:46 +0200
Received: (from j@localhost)
	by uriah.heep.sax.de (8.8.5/8.8.5) id JAA15872;
	Sun, 6 Jul 1997 09:45:31 +0200 (MET DST)
Message-ID: <19970706094530.PU64236@uriah.heep.sax.de>
Date: Sun, 6 Jul 1997 09:45:30 +0200
From: j@uriah.heep.sax.de (J Wunsch)
To: freebsd-scsi@FreeBSD.ORG
Cc: Janick.Taillandier@ratp.fr (Janick Taillandier)
Subject: Re: Problem with worm in current
References: <19970706081144.27908@fugue.noisy.ratp>
X-Mailer: Mutt 0.60_p2-3,5,8-9
Mime-Version: 1.0
X-Phone: +49-351-2012 669
X-PGP-Fingerprint: DC 47 E6 E4 FF A6 E9 8F  93 21 E0 7D F9 12 D6 4E
Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch)
In-Reply-To: <19970706081144.27908@fugue.noisy.ratp>; from Janick Taillandier on Jul 6, 1997 08:11:44 +0200
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

(Moved to freebsd-scsi.)

As Janick Taillandier wrote:

> But when I am trying to burn a CD I get these messages :
> 
> |Jul  5 23:02:25 chaconne /kernel: worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC

Hmm, I don't have the Philips/HP manual handy.  Somebody with the
manual might decode it.  When does it happen?

> What is the status of this problem ? Do I need to return to 
> 2.2.2 ?

This very likely won't help you at all.

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

From owner-freebsd-scsi  Sun Jul  6 01:26:19 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id BAA14528
          for freebsd-scsi-outgoing; Sun, 6 Jul 1997 01:26:19 -0700 (PDT)
Received: from soleil.uvsq.fr (soleil.uvsq.fr [193.51.24.1])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id BAA14523
          for <freebsd-scsi@freebsd.org>; Sun, 6 Jul 1997 01:26:16 -0700 (PDT)
Received: from arts.ratp.fr (arts.ratp.fr [193.106.40.1])
          by soleil.uvsq.fr (8.8.6/jtpda-5.2) with ESMTP id KAA09435
          for <freebsd-scsi@freebsd.org>; Sun, 6 Jul 1997 10:26:14 +0200 (METDST)
Received: by arts.ratp.fr id KAA02899
	  for <freebsd-scsi@freebsd.org>; Sun, 6 Jul 1997 10:26:11 +0200 (DST)
Received: from minos.noisy.ratp by arts.ratp.fr with SMTP id SAA002897
	  for <freebsd-scsi@freebsd.org>; Sun Jul  6 10:25:43 1997
Received: from fugue.noisy.ratp (taillandier.rtc.ratp [192.25.83.123])
 	  by minos.noisy.ratp  with ESMTP id KAA03590
 	  for <freebsd-scsi@freebsd.org>; Sun, 6 Jul 1997 10:25:42 +0200 (DST)
Received: by fugue.noisy.ratp id KAA00656
	  ; Sun, 6 Jul 1997 10:24:42 +0200 (DST)
From: Janick.Taillandier@ratp.fr (Janick Taillandier)
Message-ID: <19970706102441.37074@fugue.noisy.ratp>
Date: Sun, 6 Jul 1997 10:24:41 +0200
To: freebsd-scsi@freebsd.org
Subject: Re: Problem with worm in current
References: <19970706081144.27908@fugue.noisy.ratp> <19970706094530.PU64236@uriah.heep.sax.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.76e
In-Reply-To: <19970706094530.PU64236@uriah.heep.sax.de>; from J Wunsch on Sun, Jul 06, 1997 at 09:45:30AM +0200
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Sun, Jul 06, 1997 at 09:45:30AM +0200, J Wunsch wrote:

> > But when I am trying to burn a CD I get these messages :
> > 
> > |Jul  5 23:02:25 chaconne /kernel: worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC
> 
> Hmm, i don't have the Philips/HP manual handy.  Somebody with the
> manual might decode it.  When does it happen?

When I am trying to write to the disk, after initializing it, with
(for example) :

rtprio 5 team -v 1m 5 < /mnt/jt/track01.pcm  | rtprio 5 dd of=/dev/rworm0 obs=20k 

I get :

worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC
worm0: ILLEGAL REQUEST asc:2c,0 Command sequence error


Janick Taillandier

From owner-freebsd-scsi  Sun Jul  6 16:37:12 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id QAA11446
          for freebsd-scsi-outgoing; Sun, 6 Jul 1997 16:37:12 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id QAA11418
          for <freebsd-SCSI@freebsd.org>; Sun, 6 Jul 1997 16:36:40 -0700 (PDT)
Received: (qmail 3287 invoked by uid 1000); 6 Jul 1997 23:36:19 -0000
Message-ID: <XFMail.970706163618.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Date: Sun, 06 Jul 1997 16:36:18 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: freebsd-SCSI@freebsd.org
Subject: New Release - DPT RAID Controllers
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi Y'all SCSIers,

ftp.i-connect.net and/or sendero-ppp.i-connect.net now have version 1.1.6
of the FreeBSD driver for the DPT PCI SCSI RAID controllers.  These are in
the /pub/crash and /crash directories.  I will also upload a copy of the
patch to freefall.  This is against RELENG_2_2 as of today.

New in this release:

* Several new config options.  See sys/i386/conf/LINT for details.
  See more below on HANDLE_TIMEOUTS.

* Got rid of an annoying bug that caused biodone panics.

* SCSI software interrupts are now tested under heavy load (512 processes)
  and seem to be very healthy.

Patch 1.1.6 includes all these changes.

On SCSI Software Interrupts:

I was asked by a few of you what we have done here, so here goes:

We simply mirrored the net ISR code, and put it one notch below the network
priority.  To use it, do the following:

#include <scsi/softisr.h>

Pick an interrupt.  I normally use SCSISR_DPT (bit 0).  There are 32 bits
in the mask.  For every interrupt, use a separate interrupt bit.  For some
strange reason, the netisr code does not permit more than one interrupt
per source file.  As Justin Gibbs pointed out to me, you really do not need
more than that.

Next, write a routine that will execute every time that particular interrupt
happens.  Say you call it foo_isr:

static void
foo_isr(void);

is a good declaration, and the function should be written to match.
For this example, let us assume you want it to be associated with bit 7
of the SCSI software interrupts mask.
Remember:  when the function is called, it will be at a very high priority
(it appears to be higher than splbio()).  We really do not know why yet, but
it is under investigation.  In any case, minimize your critical section.  See
dpt_scsi.c for details.

Early in your code, put the following:

SCSI_SET(SCSISR_7, foo_isr);

Then, at any point in your code where you want foo_isr to execute
ASYNCHRONOUSLY with your code, call:

schedscsisr(SCSISR_7);

Once the kernel goes back to splzero, any request thus scheduled will be
called, at high priority!
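
As a rough illustration of the register/schedule/dispatch pattern described
above, here is a minimal user-space sketch in C.  It is not the softisr.h
code: softisr_set(), softisr_sched() and softisr_dispatch() are hypothetical
stand-ins for SCSI_SET, schedscsisr() and the kernel's own dispatch, and the
32-bit mask is the only detail taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SOFTISR_BITS 32              /* one handler slot per mask bit */

typedef void (*softisr_fn)(void);

static softisr_fn handlers[SOFTISR_BITS];
static uint32_t pending;             /* bits scheduled but not yet run */

/* Associate a handler with one bit (hypothetical analogue of SCSI_SET). */
static void softisr_set(unsigned bit, softisr_fn fn)
{
    assert(bit < SOFTISR_BITS);
    handlers[bit] = fn;
}

/* Mark a bit pending; the handler runs later (analogue of schedscsisr). */
static void softisr_sched(unsigned bit)
{
    assert(bit < SOFTISR_BITS);
    pending |= (uint32_t)1 << bit;
}

/* Run each pending handler once, lowest bit first; returns how many ran. */
static int softisr_dispatch(void)
{
    int ran = 0;

    for (unsigned bit = 0; bit < SOFTISR_BITS; bit++) {
        uint32_t mask = (uint32_t)1 << bit;

        if ((pending & mask) == 0)
            continue;
        pending &= ~mask;            /* clear first so a handler may reschedule */
        if (handlers[bit] != NULL) {
            handlers[bit]();
            ran++;
        }
    }
    return ran;
}

/* Example handler: only counts how often it was invoked. */
static int foo_calls;
static void foo_isr(void) { foo_calls++; }
```

Scheduling the same bit twice before a dispatch runs the handler only once,
which is why the handler should drain all outstanding work rather than
assume one call per request.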

What is it good for?  Just like in the networking code, it allows you to
(essentially) start another thread of execution in the kernel.
For example, the normal SCSI HBA driver receives a request for I/O, tinkers
with it a bit (S/G, etc.) and then sends it to the hardware.  This last
action involves I/O bus operations and a moderate amount of polling.
Instead, the DPT driver (almost) always puts the request in a queue and
immediately tells the SCSI system ``queued successfully''.  It then
schedules a software interrupt.  The interrupt routine runs whenever it runs
and processes the queue.  This allows I/O requests to never block on (or be
paced by) hardware.  Under moderate I/O loads, it is a waste of time.
Under heavy loads, it really makes a difference. What difference?
With 512 processes concurrently reading and writing raw devices, the load
average goes down from 280 to 0.03 (it went down to 20 with NET software
interrupts).  Yes, the system is still heavily loaded; disk I/O can take as
long as 13 seconds to complete.  But networking code, user code, etc. are
still unhampered.  Actually, even asynch I/O (buffered) improves
dramatically.  The maximum wait goes down to 85us waiting for the controller
and 30us past the interrupt service.  On that test load, the best interrupt
latency is 3us and the worst 37us.  This is within 10us of an idle system.
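
The queue-then-defer idea can be sketched in a few lines of user-space C.
This is only an illustration under stated assumptions, not code from
dpt_scsi.c: hba_submit(), hba_softintr(), struct request and the queue depth
are all hypothetical names and values.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 8                /* arbitrary depth for the sketch */

struct request { int id; };

static struct request *queue[QUEUE_DEPTH];
static size_t head, tail, count;
static int softintr_pending;         /* set when the drain routine should run */
static int submitted_to_hw;          /* stands in for real controller I/O */

/* Submission path: enqueue and return at once, never touching hardware. */
static int hba_submit(struct request *req)
{
    assert(req != NULL);
    if (count == QUEUE_DEPTH)
        return -1;                   /* queue full: caller must retry later */
    queue[tail] = req;
    tail = (tail + 1) % QUEUE_DEPTH;
    count++;
    softintr_pending = 1;            /* ask for the drain routine to run later */
    return 0;                        /* "queued successfully" */
}

/* Deferred path: runs later and hands every queued request to the hardware. */
static void hba_softintr(void)
{
    softintr_pending = 0;
    while (count > 0) {
        struct request *req = queue[head];

        head = (head + 1) % QUEUE_DEPTH;
        count--;
        (void)req;                   /* a real driver would program the HBA here */
        submitted_to_hw++;
    }
}
```

The submitter only ever pays for an enqueue; all slow bus work happens in
the deferred routine, which is the property the load figures above depend on.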

BTW, these numbers are with a queue of 64 commands on the DPT hardware.
Future releases of the firmware will increase that to 256, 1024, and 8192.

On DPT_HANDLE_TIMEOUTS:

Normally, the DPT driver has no timeout mechanism in it, nor does it need
one;  the firmware on the controller does all the I/O management, re-tries,
ECC, and other good stuff.

With this option, commands will timeout after a while.  The timeout
mechanism works as follows:

Once booted, every ten seconds, dpt_handle_timeouts() will be called.
This function scans all submitted commands (sent to the DPT and not done 
yet).  If a SCSI command is older than what the SCSI upper layer wants it
to be (times the current number of requests on the controller), it is
tagged.  Tagged commands are given that much time again, to get done.
If not, they are destroyed, and the upper layer is notified of the failure.
This manifests itself (in functions that examine read/write syscall
results :-) as an I/O error to the program.  Nothing more.
If a command is completed during this grace period, it will be handled
as if nothing happened (except for a console message).  If the command
completes after destruction, the results are tossed away.  We carefully
simulated all these conditions and it all appears to work.
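
The two-stage scheme above (tag a command as late, then destroy it if it is
still outstanding after the same allowance again) might look roughly like
the following user-space C sketch.  It is an assumption-laden illustration,
not dpt_handle_timeouts() itself: struct cmd, scan_timeouts() and
complete_cmd() are hypothetical, and times are plain microsecond counters.

```c
#include <assert.h>
#include <stddef.h>

enum cmd_state { CMD_ACTIVE, CMD_LATE, CMD_DESTROYED, CMD_DONE };

struct cmd {
    enum cmd_state state;
    unsigned long submitted_us;      /* when it went to the controller */
    unsigned long timeout_us;        /* what the upper layer asked for */
    unsigned long late_us;           /* when it was tagged late */
};

/* One periodic scan over in-flight commands; returns the number destroyed. */
static int scan_timeouts(struct cmd *cmds, size_t n, unsigned long now_us)
{
    size_t outstanding = 0;
    int destroyed = 0;

    assert(cmds != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (cmds[i].state == CMD_ACTIVE || cmds[i].state == CMD_LATE)
            outstanding++;

    for (size_t i = 0; i < n; i++) {
        struct cmd *c = &cmds[i];
        /* The allowance scales with how busy the controller is. */
        unsigned long allowance = c->timeout_us * outstanding;

        if (c->state == CMD_ACTIVE && now_us - c->submitted_us > allowance) {
            c->state = CMD_LATE;     /* first strike: tag it, start the grace period */
            c->late_us = now_us;
        } else if (c->state == CMD_LATE && now_us - c->late_us >= allowance) {
            c->state = CMD_DESTROYED; /* second strike: fail it to the upper layer */
            destroyed++;
        }
    }
    return destroyed;
}

/* Completion path: returns 1 if the result is delivered, 0 if it is tossed. */
static int complete_cmd(struct cmd *c)
{
    if (c->state == CMD_DESTROYED)
        return 0;                    /* arrived after destruction */
    c->state = CMD_DONE;             /* on time, or late but within the grace period */
    return 1;
}
```

Because a tagged command is only re-examined on the next scan, the effective
grace period is never shorter than one scan interval.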

Why bother?  Well, try to put a DPT behind certain PCI bridges.
What happens then is that, on occasion, an interrupt will reach the DPT
interrupt service routine before the DMA transfer of the data has
stabilized across the bridge (the DPT always does a DMA of a status struct
followed by an interrupt).  The driver reacts to this nonsense by promptly
tossing the whole completion report (we have NO way of telling what the
corrupt mailbox struct should have been).  While we so smartly tossed away
the corrupt message, the DPT has no way of sending it again (4us behind it
will be another DMA and another interrupt), and the upper layer is still
waiting for an event that will never happen.  The timeout hack allows the
application to be told about the failure and releases all the associated
resources, preventing a hang.
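
One way to picture the "toss the completion report" decision is a check
that the status struct points at a command the driver actually has in
flight.  The sketch below is a guess at the shape of such a check, not the
driver's real logic: struct ccb, struct status_packet, the inflight[] table
and validate_completion() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFLIGHT 4               /* arbitrary table size for the sketch */

struct ccb { int in_use; };

/* Hypothetical layout: the controller DMAs back a pointer to the CCB. */
struct status_packet { struct ccb *ccb; };

static struct ccb inflight[MAX_INFLIGHT];

/* Returns the matching in-flight CCB, or NULL if the report must be tossed. */
static struct ccb *validate_completion(const struct status_packet *sp)
{
    assert(sp != NULL);
    for (size_t i = 0; i < MAX_INFLIGHT; i++)
        if (sp->ccb == &inflight[i] && inflight[i].in_use)
            return &inflight[i];
    return NULL;                     /* unknown pointer: the status cannot be trusted */
}
```

A NULL result leaves the original command outstanding, which is exactly the
case the timeout scan exists to clean up.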

This is it for now.  Your feedback is very welcome.

Simon


From owner-freebsd-scsi  Mon Jul  7 07:49:21 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id HAA16817
          for freebsd-scsi-outgoing; Mon, 7 Jul 1997 07:49:21 -0700 (PDT)
Received: from cabri.obs-besancon.fr (cabri.obs-besancon.fr [193.52.184.3])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id HAA16762
          for <freebsd-scsi@freebsd.org>; Mon, 7 Jul 1997 07:48:38 -0700 (PDT)
Received: by cabri.obs-besancon.fr (5.57/Ultrix3.0-C)
	id AA05195; Mon, 7 Jul 97 16:49:15 +0100
Date: Mon, 7 Jul 97 16:49:15 +0100
Message-Id: <9707071549.AA05195@cabri.obs-besancon.fr>
From: Jean-Marc Zucconi <jmz@cabri.obs-besancon.fr>
To: Janick.Taillandier@ratp.fr
Cc: freebsd-scsi@freebsd.org
In-Reply-To: <19970706102441.37074@fugue.noisy.ratp>
	(Janick.Taillandier@ratp.fr)
Subject: Re: Problem with worm in current
X-Mailer: Emacs
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>>>>> Janick Taillandier writes:

 > On Sun, Jul 06, 1997 at 09:45:30AM +0200, J Wunsch wrote:
 >> > But when I am trying to burn a CD I get these messages :
 >> > 
 >> > |Jul  5 23:02:25 chaconne /kernel: worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC
 >> 
 >> Hmm, i don't have the Philips/HP manual handy.  Somebody with the
 >> manual might decode it.  When does it happen?

 > When I am trying to write to the disk, after initializing it, with
 > (for example) :

 > rtprio 5 team -v 1m 5 < /mnt/jt/track01.pcm  | rtprio 5 dd of=/dev/rworm0 obs=20k 

 > I get :

 > worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC

This means "Command Now Not Valid". Can you turn the debugging on so
that we can see what exactly happens?

Are you trying to write a data track or an audio track?

Jean-Marc
 _____________________________________________________________________________
 Jean-Marc Zucconi       Observatoire de Besancon       F 25010 Besancon cedex
                   PGP Key: finger jmz@cabri.obs-besancon.fr
 =============================================================================

From owner-freebsd-scsi  Mon Jul  7 11:44:32 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id LAA00384
          for freebsd-scsi-outgoing; Mon, 7 Jul 1997 11:44:32 -0700 (PDT)
Received: from soleil.uvsq.fr (soleil.uvsq.fr [193.51.24.1])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id LAA00379
          for <freebsd-scsi@freebsd.org>; Mon, 7 Jul 1997 11:44:29 -0700 (PDT)
Received: from arts.ratp.fr (arts.ratp.fr [193.106.40.1])
          by soleil.uvsq.fr (8.8.6/jtpda-5.2) with ESMTP id UAA06496
          for <freebsd-scsi@freebsd.org>; Mon, 7 Jul 1997 20:44:22 +0200 (METDST)
Received: by arts.ratp.fr id UAA13387
	  for <freebsd-scsi@freebsd.org>; Mon, 7 Jul 1997 20:44:18 +0200 (DST)
Received: from minos.noisy.ratp by arts.ratp.fr with SMTP id SAA013385
	  for <freebsd-scsi@freebsd.org>; Mon Jul  7 20:43:58 1997
Received: from fugue.noisy.ratp (taillandier.rtc.ratp [192.25.83.123])
 	  by minos.noisy.ratp  with ESMTP id UAA21194
 	  for <freebsd-scsi@freebsd.org>; Mon, 7 Jul 1997 20:43:57 +0200 (DST)
Received: by fugue.noisy.ratp id UAA00437
	  ; Mon, 7 Jul 1997 20:42:48 +0200 (DST)
From: Janick.Taillandier@ratp.fr (Janick Taillandier)
Message-ID: <19970707204247.46321@fugue.noisy.ratp>
Date: Mon, 7 Jul 1997 20:42:47 +0200
To: freebsd-scsi@freebsd.org
Subject: Re: Problem with worm in current
References: <19970706102441.37074@fugue.noisy.ratp> <9707071549.AA05195@cabri.obs-besancon.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.76e
In-Reply-To: <9707071549.AA05195@cabri.obs-besancon.fr>; from Jean-Marc Zucconi on Mon, Jul 07, 1997 at 04:49:15PM +0100
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Mon, Jul 07, 1997 at 04:49:15PM +0100, Jean-Marc Zucconi wrote:
> 
>  > worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC
> 
> This means "Command Now Not Valid". Can you turn the debugging on so
> that we can see what exactly happens?

Sure. I will mail you the results.

> Do you try to write a data or an audio track?

It was an audio track.

Janick Taillandier

From owner-freebsd-scsi  Mon Jul  7 17:34:10 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id RAA17670
          for freebsd-scsi-outgoing; Mon, 7 Jul 1997 17:34:10 -0700 (PDT)
Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA17665
          for <scsi@FreeBSD.ORG>; Mon, 7 Jul 1997 17:34:07 -0700 (PDT)
Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191])
          by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP
	  id RAA22214; Mon, 7 Jul 1997 17:33:52 -0700 (PDT)
Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194])
	by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA09389;
	Mon, 7 Jul 1997 17:33:51 -0700 (PDT)
Received: (from gdonl@localhost)
	by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA09068;
	Mon, 7 Jul 1997 17:33:45 -0700 (PDT)
From: Don Lewis <Don.Lewis@tsc.tdk.com>
Message-Id: <199707080033.RAA09068@salsa.gv.tsc.tdk.com>
Date: Mon, 7 Jul 1997 17:33:45 -0700
In-Reply-To: Keith Mitchell <kmitch@weenix.guru.org>
       "Re: Archive Viper and 3940UW (bad Drive?)" (Jul  5,  9:06pm)
X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95)
To: Keith Mitchell <kmitch@weenix.guru.org>,
        gibbs@plutotech.com (Justin T. Gibbs)
Subject: Re: Archive Viper and 3940UW (bad Drive?)
Cc: scsi@FreeBSD.ORG
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

On Jul 5,  9:06pm, Keith Mitchell wrote:
} Subject: Re: Archive Viper and 3940UW (bad Drive?)
} > You shouldn't have to erase used tapes before using them either.
} 
} OK, that is what I thought originally, but then given that erasing them
} "seemingly" solved the timeout problem I don't know what to think.  What
} does erasing a tape actually do?  It doesn't take but a few seconds.  My
} guess is it basically erases the header info that says there is data there,
} but I really don't know.
} 
} > If the tape drive needs to look at the media before it can respond, then
} > the .5s timeouts are way too short.

I've got some type of QIC-150 drive on a Sun, and it seems to require
several attempts to figure out the tape format or align its tape head or
whatever when it first tries to read a newly inserted tape.  It definitely
grinds and groans for quite a while.  This could be a problem when used with
Amanda, since Amanda always wants to read the tape to check the label before
it overwrites the tape.  I suspect that erasing the tape might speed things
up since the drive may be able to quickly detect that the tape is blank.
When the tape is used the next time, it may still respond faster since
either the data format on the tape may match the drive's expected "probe"
order or the head alignment might be better matched to the tape.

I'm surprised that you see the erase operation take only a few seconds.
It's been my experience that these drives make one full pass through the
tape with the erase head turned on, which erases all the serpentine tracks
in parallel.

FYI, the SunOS st driver defaults to a 2 minute I/O timeout and a 60
minute space timeout.  I had to increase the I/O timeout to 10 minutes
in order to reliably use a HP1553 DAT drive that occasionally decides
to do a head scrub if its error rate starts getting too high.

			---  Truck

From owner-freebsd-scsi  Mon Jul  7 19:47:02 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id TAA23299
          for freebsd-scsi-outgoing; Mon, 7 Jul 1997 19:47:02 -0700 (PDT)
Received: from mail.ican.net (mail.ican.net [204.92.49.5])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA23287
          for <scsi@freebsd.org>; Mon, 7 Jul 1997 19:46:53 -0700 (PDT)
Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7])
	by mail.ican.net (8.8.6/8.8.6) with ESMTP id WAA24476;
	Mon, 7 Jul 1997 22:46:32 -0400 (EDT)
Received: (from josh@localhost)
	by oddjob.ican.net (8.8.6/8.8.6) id WAA11504;
	Mon, 7 Jul 1997 22:46:47 -0400 (EDT)
Message-ID: <19970707224647.13985@ican.net>
Date: Mon, 7 Jul 1997 22:46:47 -0400
From: Josh Tiefenbach <josh@ican.net>
To: Simon Shapiro <Shimon@i-Connect.Net>
Cc: scsi@freebsd.org
Subject: Prob w/DPT driver v1.1.6
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.74
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Booting up a RELENG_2_2 box with the latest DPT driver:

dpt0: BAD (0) CCB in SP (status = 1011 0000).

from config file:

options DPT_USE_SINTR=1
options DPT_TRACK_CCB_USAGE
options DPT_MEASURE_PERFORMANCE
options DPT_HANDLE_TIMEOUTS

Error also occurs with the last 3 options turned off.

Suggestions?

josh

-- 
Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net

From owner-freebsd-scsi  Mon Jul  7 20:07:10 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id UAA23826
          for freebsd-scsi-outgoing; Mon, 7 Jul 1997 20:07:10 -0700 (PDT)
Received: from mail.ican.net (mail.ican.net [204.92.49.5])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA23821
          for <scsi@freebsd.org>; Mon, 7 Jul 1997 20:07:00 -0700 (PDT)
Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7])
	by mail.ican.net (8.8.6/8.8.6) with ESMTP id XAA29575;
	Mon, 7 Jul 1997 23:06:31 -0400 (EDT)
Received: (from josh@localhost)
	by oddjob.ican.net (8.8.6/8.8.6) id XAA19391;
	Mon, 7 Jul 1997 23:06:47 -0400 (EDT)
Message-ID: <19970707230647.52460@ican.net>
Date: Mon, 7 Jul 1997 23:06:47 -0400
From: Josh Tiefenbach <josh@ican.net>
To: Simon Shapiro <Shimon@i-Connect.Net>
Cc: scsi@freebsd.org
Subject: More on the DPT hangs/errors
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.74
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


Excerpts from console log:

dpt0: BAD (0) CCB in SP (status = 1110 0000 ).
dpt0: Marking 4048 (Read (10) [6.1.5]) on c0b0t1l0 as late after 11763232usec
dpt0: Destroying stale 4048 (Read (10) [6.1.5]) on c0b0t1l0 (21763232)
dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: Marking 10097 (Write (10) [6.1.18]) on c0b0t2l0 as late after 17962817usec
dpt0: Destroying stale 10097 (Write (10) [6.1.18]) on c0b0t2l0 (27962814)

Note: first occurrence during massive writes to non-RAIDed disks, second
occurrence during a newfs of the RAIDed disks.

In both occurrences, things `hung' at the time corresponding to the `BAD CCB',
and `unhung' at the time corresponding to the `Destroying stale...' message.

josh

-- 
Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net

From owner-freebsd-scsi  Mon Jul  7 23:29:29 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id XAA00707
          for freebsd-scsi-outgoing; Mon, 7 Jul 1997 23:29:29 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id XAA00687
          for <scsi@freebsd.org>; Mon, 7 Jul 1997 23:29:23 -0700 (PDT)
Received: (qmail 13741 invoked by uid 1000); 8 Jul 1997 06:29:34 -0000
Message-ID: <XFMail.970707232934.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <19970707224647.13985@ican.net>
Date: Mon, 07 Jul 1997 23:29:34 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: Josh Tiefenbach <josh@ican.net>
Subject: RE: Prob w/DPT driver v1.1.6
Cc: scsi@freebsd.org
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


Hi Josh Tiefenbach;  On 08-Jul-97 you wrote: 
> Booting up a RELENG_2_2 box with the latest DPT driver:

Does it work at all, or does it die after exactly this one message?

> dpt0: BAD (0) CCB in SP (status = 1011 0000).

This means that the DPT interrupted after a command completion but the
results are far from perfect:  the completed command has an invalid address
and the status is totally messed up.  This normally indicates hardware
problems.  Can you please tell me the board model, S/N, and rev level, along
with the firmware (the boot prompt reports that)?  Also, I need the
manufacturer and model of the motherboard this thing goes into.  How many
PCI slots?  Is the DPT in a ``secondary (behind a bridge)'' or primary
PCI slot?

I will contact DPT development with this data and try to resolve it.

> from config file:
> 
> options DPT_USE_SINTR=1
> options DPT_TRACK_CCB_USAGE
> options DPT_MEASURE_PERFORMANCE
> options DPT_HANDLE_TIMEOUTS

If you are on 1.1.6, you must have:

options DPT_SINTR_SPLHIGH

as well, or the above will also happen (at least).

> Error also occurs with the last 3 options turned off.

Do not turn off any of the above for a while.  The savings (if you do)
are on the order of 2-7 microseconds per command.  The DPT serves a cache
hit in 250-270us.  A typical SCSI command (that goes to disk) takes
5-25ms.  This is on an idle system, with a single command issued.

> Suggestions?

See above.

Keep me informed and thank you.

Simon

From owner-freebsd-scsi  Mon Jul  7 23:29:31 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id XAA00722
          for freebsd-scsi-outgoing; Mon, 7 Jul 1997 23:29:31 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id XAA00686
          for <scsi@freebsd.org>; Mon, 7 Jul 1997 23:29:23 -0700 (PDT)
Received: (qmail 13745 invoked by uid 1000); 8 Jul 1997 06:29:34 -0000
Message-ID: <XFMail.970707232934.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <19970707230647.52460@ican.net>
Date: Mon, 07 Jul 1997 23:29:34 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: Josh Tiefenbach <josh@ican.net>
Subject: RE: More on the DPT hangs/errors
Cc: scsi@freebsd.org
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


Hi Josh Tiefenbach;  On 08-Jul-97 you wrote: 
> 
> Excerpts from console log:
> 
> dpt0: BAD (0) CCB in SP (status = 1110 0000 ).
> dpt0: Marking 4048 (Read (10) [6.1.5]) on c0b0t1l0 as late after 11763232usec
> dpt0: Destroying stale 4048 (Read (10) [6.1.5]) on c0b0t1l0 (21763232)
> dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
> dpt0: Marking 10097 (Write (10) [6.1.18]) on c0b0t2l0 as late after 17962817usec
> dpt0: Destroying stale 10097 (Write (10) [6.1.18]) on c0b0t2l0 (27962814)

This is exactly how I wanted the timeouts to behave:  wait as long as sd.c
wants, multiplied by a ``busyness factor''.  If still there after twice as
long, destroy it and tell sd.c ``sorry''.  If the command somehow
completes before destruction, it will be salvaged.  If it arrives after
destruction, the log will tell you that too.

In your case the command actually completed (with bad status), so it will
never complete again.

> Note: first occurrence during massive writes to non-RAIDed disks, second
> occurrence during a newfs of the RAIDed disks.

Make SURE you have ``options DPT_SINTR_SPLHIGH'' in your kernel.
Justin has suggested a better (read correct:-) way of doing it.  As soon
as his patch arrives here, I will integrate it and get rid of this flag.

> In both occurrences, things `hung' at the time corresponding to the
> `BAD CCB', and `unhung' at the time corresponding to the
> `Destroying stale...' message.

Not the whole system, just the program going to disk, I presume (this is
what I see here).  This is normal:

Your program issues read or write syscalls.  These eventually translate
into calls to sd.c.  In the case of a raw device (newfs), the syscall
actually waits for the I/O to complete.  Since the DPT has completed, but the
driver could not make sense of it, it ``never'' completes.  The timeout
mechanism will get tired of this request and abort it.  Your application
will get an I/O error and all is (almost) well.  This is a crude way of
describing things but you get the point.

Simon

P.S.

As you may have gathered, there are some problems with DPT controllers on
certain motherboards.  This is being worked on.

From owner-freebsd-scsi  Tue Jul  8 07:05:13 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id HAA17978
          for freebsd-scsi-outgoing; Tue, 8 Jul 1997 07:05:13 -0700 (PDT)
Received: from mail.ican.net (mail.ican.net [204.92.49.5])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id HAA17969
          for <scsi@freebsd.org>; Tue, 8 Jul 1997 07:05:05 -0700 (PDT)
Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7])
	by mail.ican.net (8.8.6/8.8.6) with ESMTP id KAA08827;
	Tue, 8 Jul 1997 10:04:39 -0400 (EDT)
Received: (from josh@localhost)
	by oddjob.ican.net (8.8.6/8.8.6) id KAA23394;
	Tue, 8 Jul 1997 10:04:55 -0400 (EDT)
Message-ID: <19970708100455.34701@ican.net>
Date: Tue, 8 Jul 1997 10:04:55 -0400
From: Josh Tiefenbach <josh@ican.net>
To: Simon Shapiro <Shimon@i-Connect.Net>
Cc: scsi@freebsd.org
Subject: Re: Prob w/DPT driver v1.1.6
References: <19970707224647.13985@ican.net> <XFMail.970707232934.Shimon@i-Connect.Net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.74
In-Reply-To: <XFMail.970707232934.Shimon@i-Connect.Net>; from Simon Shapiro on Mon, Jul 07, 1997 at 11:29:34PM -0700
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Mon, Jul 07, 1997 at 11:29:34PM -0700, Simon Shapiro wrote:

> Hi Josh Tiefenbach;  On 08-Jul-97 you wrote: 
> > Booting up a RELENG_2_2 box with the latest DPT driver:
> 
> Does it work at all, or does it die after exactly this one message?

Referenced in my other message, but yes it works. It just hangs and comes back
a lot.

> and the status is totally messed up.  this normally indicates hardware 
> problems. Can you please tell me the board model, S/N, rev level, along

PM3334-UW. s/n: 66-010378. Firmware: 007L0 (3E7). Not sure of the rev, but
dmesg claims a rev of 2 on bootup. There's also a sticker on the back saying
``HA-0851-006-A'' if that helps.

> with the firmware (the boot prompt reports that).  Also, I need the Mfg,
> model of the motherboard this thing goes into.  How many PCI slots?
> Is the DPT in a ``secondary (behind a bridge)'' or primary PCI slot?

It was in a Compaq Deskpro 6000, Pentium Pro. The box is a production machine
(mostly), so it's kinda hard to crack the top and check # of slots right now.
The DPT was in the same slot the 2940 usually occupies, so I suspect it was
a primary slot, but don't hold me to it.

> If you are on 1.1.6, you must have:
> 
> options DPT_SINTR_SPLHIGH

Ok. I'll try that next. I'm in the process of testing the thing in a
development box I have lying around, so I'll keep you posted on the results
there.

> Keep me informed and thank you.

You're welcome :)

josh

-- 
Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net

From owner-freebsd-scsi  Tue Jul  8 10:41:57 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id KAA28259
          for freebsd-scsi-outgoing; Tue, 8 Jul 1997 10:41:57 -0700 (PDT)
Received: from mail.ican.net (mail.ican.net [204.92.49.5])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA28252
          for <scsi@freebsd.org>; Tue, 8 Jul 1997 10:41:51 -0700 (PDT)
Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7])
	by mail.ican.net (8.8.6/8.8.6) with ESMTP id NAA05309;
	Tue, 8 Jul 1997 13:41:30 -0400 (EDT)
Received: (from josh@localhost)
	by oddjob.ican.net (8.8.6/8.8.6) id NAA07644;
	Tue, 8 Jul 1997 13:41:35 -0400 (EDT)
Message-ID: <19970708134134.36830@ican.net>
Date: Tue, 8 Jul 1997 13:41:34 -0400
From: Josh Tiefenbach <josh@ican.net>
To: Simon Shapiro <Shimon@i-Connect.Net>
Cc: scsi@freebsd.org
Subject: Re: Prob w/DPT driver v1.1.6 (update)
References: <19970707224647.13985@ican.net> <XFMail.970707232934.Shimon@i-Connect.Net> <19970708100455.34701@ican.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.74
In-Reply-To: <19970708100455.34701@ican.net>; from Josh Tiefenbach on Tue, Jul 08, 1997 at 10:04:55AM -0400
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Tue, Jul 08, 1997 at 10:04:55AM -0400, I wrote:
> 
> Ok. I'll try that next. I'm in the process of testing the thing in a
> development box I have lying around

Update:

I've stuck the DPT into a Pentium box (apparently Magitronic brand - VX
chipset, P-100, DPT board in PCI slot 1. Buslogic card in PCI slot 2), and
recompiled with the DPT_SINTR_SPLHIGH option.

I pounded on the RAID (4 disks - Atlas I's upgraded to firmware rev L915,
RAID-5 conf) for a while - multiple tar's of large file trees, multiple dd's,
and a scp of /usr/src from a remote machine.

Everything seemed fine, until ~80 minutes into the scp, the machine locked.
Solid. Required power cycle. Note: This was the same behavior we had observed
previously (w/ v1.1.0 of the driver) on our production box (a news feeder) -
things would trundle along fine for ~ an hour, and then <wham> locked solid.

josh

-- 
Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net

From owner-freebsd-scsi  Tue Jul  8 11:41:32 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id LAA01272
          for freebsd-scsi-outgoing; Tue, 8 Jul 1997 11:41:32 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id LAA01253
          for <scsi@freebsd.org>; Tue, 8 Jul 1997 11:41:23 -0700 (PDT)
Received: (qmail 24102 invoked by uid 1000); 8 Jul 1997 18:41:29 -0000
Message-ID: <XFMail.970708114129.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <19970708134134.36830@ican.net>
Date: Tue, 08 Jul 1997 11:41:29 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: Josh Tiefenbach <josh@ican.net>
Subject: Re: Prob w/DPT driver v1.1.6 (update)
Cc: scsi@freebsd.org
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


Hi Josh Tiefenbach;  On 08-Jul-97 you wrote: 
> On Tue, Jul 08, 1997 at 10:04:55AM -0400, I wrote:
> > 
> > Ok. I'll try that next. I'm in the process of testing the thing in a
> > development box I have lying around
> 
> Update:
> 
> I've stuck the DPT into a Pentium box (apparently Magitronic brand - VX
> chipset, P-100, DPT board in PCI slot 1. Buslogic card in PCI slot 2),
> and
> recompiled with the DPT_SINTR_SPLHIGH option.
> 
> I pounded on the RAID (4 disks - Atlas I's upgraded to firmware rev L915,
> RAID-5 conf) for a while - multiple tar's of large file trees, multiple
> dd's,
> and a scp of /usr/src from a remote machine.
> 
> Everything seemed fine, until ~80 minutes into the scp, the machine
> locked.
> Solid. Required power cycle. Note: This was the same behavior we had
> observed
> previously (w/ v1.1.0 of the driver) on our production box (a news
> feeder) -
> things would trundle along fine for ~ an hour, and then <wham> locked
> solid.

I just received Justin's fixes.  I also introduced a BUG into the system in
the software interrupts.  Will be working on both today.  1.1.7 should be
out in a few short days.

Thanx!

Simon

From owner-freebsd-scsi  Tue Jul  8 16:46:09 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id QAA17180
          for freebsd-scsi-outgoing; Tue, 8 Jul 1997 16:46:09 -0700 (PDT)
Received: from tok.qiv.com ([204.214.141.211])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA17165
          for <freebsd-scsi@freebsd.org>; Tue, 8 Jul 1997 16:46:03 -0700 (PDT)
Received: (from uucp@localhost)
	by tok.qiv.com (8.8.5/8.8.5) with UUCP id SAA27891
	for freebsd-scsi@freebsd.org; Tue, 8 Jul 1997 18:45:27 -0500 (CDT)
Received: from localhost (jdn@localhost)
	by acp.qiv.com (8.8.5/8.8.5) with SMTP id SAA00398
	for <freebsd-scsi@freebsd.org>; Tue, 8 Jul 1997 18:33:14 -0500 (CDT)
X-Authentication-Warning: acp.qiv.com: jdn owned process doing -bs
Date: Tue, 8 Jul 1997 18:33:13 -0500 (CDT)
From: "Jay D. Nelson" <jdn@qiv.com>
To: freebsd-scsi@freebsd.org
Subject: Which Viper firmware level?
Message-ID: <Pine.BSF.3.96.970708182816.377A-100000@acp.qiv.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've finally located a source for firmware upgrades for Viper 2525 tapes.
The Archive products have ended up at TSSI who will happily sell me the
upgrade. Firmware levels available are 25462-01[134] with unusual caveats
about OS support (?!). Does anyone know which I should buy?

Otherwise, I'll go with 011 as per the handbook.

Thanks for any insight.

-- Jay


From owner-freebsd-scsi  Wed Jul  9 05:13:28 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id FAA11807
          for freebsd-scsi-outgoing; Wed, 9 Jul 1997 05:13:28 -0700 (PDT)
Received: from cabri.obs-besancon.fr (cabri.obs-besancon.fr [193.52.184.3])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id FAA11799
          for <freebsd-scsi@freebsd.org>; Wed, 9 Jul 1997 05:13:20 -0700 (PDT)
Received: by cabri.obs-besancon.fr (5.57/Ultrix3.0-C)
	id AA22669; Wed, 9 Jul 97 14:13:49 +0100
Date: Wed, 9 Jul 97 14:13:49 +0100
Message-Id: <9707091313.AA22669@cabri.obs-besancon.fr>
From: Jean-Marc Zucconi <jmz@cabri.obs-besancon.fr>
To: Janick.Taillandier@ratp.fr
Cc: freebsd-scsi@freebsd.org
In-Reply-To: <19970707204247.46321@fugue.noisy.ratp>
	(Janick.Taillandier@ratp.fr)
Subject: Re: Problem with worm in current
X-Mailer: Emacs
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>>>>> Janick Taillandier writes:

 > On Mon, Jul 07, 1997 at 04:49:15PM +0100, Jean-Marc Zucconi wrote:
 >> 
 >> > worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC
 >> 
 >> This means "Command Now Not Valid". Can you turn the debugging on so
 >> that we can see what exactly happens?

 > Sure. I will mail you the results.

 >> Do you try to write a data or an audio track?

 > It was an audio track.

Ok. It seems that the get capacity command returns a block size of
2048 bytes even if it was previously set to another value. Can you try
the patch below? 

Index: worm.c
===================================================================
RCS file: /home/ncvs/src/sys/scsi/worm.c,v
retrieving revision 1.42
diff -u -r1.42 worm.c
--- worm.c	1997/07/01 00:22:51	1.42
+++ worm.c	1997/07/08 17:49:40
@@ -228,10 +228,11 @@
 {
 	errval ret;
 	struct scsi_data *worm = sc_link->sd;
+	int blk_size;
 
 	SC_DEBUG(sc_link, SDEV_DB2, ("worm_size"));
 
-	worm->n_blks = scsi_read_capacity(sc_link, &worm->blk_size,
+	worm->n_blks = scsi_read_capacity(sc_link, &blk_size,
 					  flags);
 
 	/*
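
For anyone following along, here is a minimal userland sketch of what the
hunk above changes. It is not the driver code: struct worm_state,
fake_read_capacity() and the 2352-byte value are hypothetical stand-ins, and
the only thing taken from the thread is that the capacity query reports
2048 bytes regardless of what was set earlier. What worm_size does after
this hunk is not shown in the patch and is not modelled here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct worm_state {
	uint32_t n_blks;	/* number of blocks reported by the drive */
	uint32_t blk_size;	/* block size the driver intends to use */
};

/*
 * Hypothetical stand-in for scsi_read_capacity(): the simulated drive
 * always reports 2048-byte blocks, whatever was configured earlier.
 */
static uint32_t
fake_read_capacity(uint32_t *blk_size_out)
{
	*blk_size_out = 2048;
	return 330000;		/* arbitrary block count for the sketch */
}

/* Before the patch: the reply lands directly in the device state. */
static void
worm_size_old(struct worm_state *w)
{
	w->n_blks = fake_read_capacity(&w->blk_size);
}

/* After the patch: the reply lands in a local, the state is kept. */
static void
worm_size_new(struct worm_state *w)
{
	uint32_t blk_size;

	w->n_blks = fake_read_capacity(&blk_size);
	(void)blk_size;		/* reported size is deliberately ignored */
}
```

With blk_size preset to 2352 (used here only as an example of a non-data
block size), worm_size_old() leaves 2048 in the state and worm_size_new()
leaves 2352.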

Jean-Marc
 _____________________________________________________________________________
 Jean-Marc Zucconi       Observatoire de Besancon       F 25010 Besancon cedex
                   PGP Key: finger jmz@cabri.obs-besancon.fr
 =============================================================================

From owner-freebsd-scsi  Wed Jul  9 11:37:05 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id LAA29968
          for freebsd-scsi-outgoing; Wed, 9 Jul 1997 11:37:05 -0700 (PDT)
Received: from soleil.uvsq.fr (soleil.uvsq.fr [193.51.24.1])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id LAA29959
          for <freebsd-scsi@freebsd.org>; Wed, 9 Jul 1997 11:37:01 -0700 (PDT)
Received: from arts.ratp.fr (arts.ratp.fr [193.106.40.1])
          by soleil.uvsq.fr (8.8.6/jtpda-5.2) with ESMTP id UAA25631
          for <freebsd-scsi@freebsd.org>; Wed, 9 Jul 1997 20:36:55 +0200 (METDST)
Received: by arts.ratp.fr id UAA25613
	  for <freebsd-scsi@freebsd.org>; Wed, 9 Jul 1997 20:36:50 +0200 (DST)
Received: from minos.noisy.ratp by arts.ratp.fr with SMTP id SAA025611
	  for <freebsd-scsi@freebsd.org>; Wed Jul  9 20:36:42 1997
Received: from fugue.noisy.ratp (taillandier.rtc.ratp [192.25.83.123])
 	  by minos.noisy.ratp  with ESMTP id UAA01058
 	  for <freebsd-scsi@freebsd.org>; Wed, 9 Jul 1997 20:36:42 +0200 (DST)
Received: by fugue.noisy.ratp id UAA00503
	  ; Wed, 9 Jul 1997 20:35:21 +0200 (DST)
From: Janick.Taillandier@ratp.fr (Janick Taillandier)
Message-ID: <19970709203516.04741@fugue.noisy.ratp>
Date: Wed, 9 Jul 1997 20:35:16 +0200
To: freebsd-scsi@freebsd.org
Subject: Re: Problem with worm in current
References: <19970707204247.46321@fugue.noisy.ratp> <9707091313.AA22669@cabri.obs-besancon.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.76e
In-Reply-To: <9707091313.AA22669@cabri.obs-besancon.fr>; from Jean-Marc Zucconi on Wed, Jul 09, 1997 at 02:13:49PM +0100
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Wed, Jul 09, 1997 at 02:13:49PM +0100, Jean-Marc Zucconi wrote:
> 
> Ok. It seems that the get capacity command returns a block size of
> 2048 bytes even if it was previously set to another value. Can you try
> the patch below? 
> 
> Index: worm.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/scsi/worm.c,v
> retrieving revision 1.42
> diff -u -r1.42 worm.c
> --- worm.c	1997/07/01 00:22:51	1.42
> +++ worm.c	1997/07/08 17:49:40
> @@ -228,10 +228,11 @@
>  {
>  	errval ret;
>  	struct scsi_data *worm = sc_link->sd;
> +	int blk_size;
>  
>  	SC_DEBUG(sc_link, SDEV_DB2, ("worm_size"));
>  
> -	worm->n_blks = scsi_read_capacity(sc_link, &worm->blk_size,
> +	worm->n_blks = scsi_read_capacity(sc_link, &blk_size,
>  					  flags);
>  
>  	/*
> 
> Jean-Marc

Well... same result : worm0: ILLEGAL REQUEST asc:2c,0 Command sequence error
I am sending you the trace in debug mode.

Janick

From owner-freebsd-scsi  Wed Jul  9 17:14:58 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id RAA22110
          for freebsd-scsi-outgoing; Wed, 9 Jul 1997 17:14:58 -0700 (PDT)
Received: from krystal.sge.net (firewall-user@krystal.sge.net [152.91.9.1])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA22096
          for <freebsd-scsi@freebsd.org>; Wed, 9 Jul 1997 17:14:50 -0700 (PDT)
From: Wayne.Farmer@mailhost.dpie.gov.au
Received: (from uucp@localhost)
	by krystal.sge.net (8.8.5/8.8.6) id KAA26904
	for <freebsd-scsi@freebsd.org>; Thu, 10 Jul 1997 10:14:48 +1000 (EST)
Received: from jade.sge.net(10.1.1.254) by krystal.sge.net via smap (3.2)
	id xma026870; Thu, 10 Jul 97 10:14:26 +1000
Received: from SMTP (pryites.sge.net [10.1.1.246])
	by jade.sge.net (8.8.5/8.8.5) with SMTP id KAA13697
	for <freebsd-scsi@freebsd.org>; Thu, 10 Jul 1997 10:14:25 +1000 (EST)
Received: from zirconia.sge.net ([10.1.1.6]) by 10.1.1.246 (Norton AntiVirus for Internet Email Gateways 1.0) ; Thu, 10 Jul 1997 00:13:25 0000 (GMT)
Received: (from uucp@localhost)
	by zirconia.sge.net (8.8.5/8.8.5) id KAA04302
	for <freebsd-scsi@freebsd.org>; Thu, 10 Jul 1997 10:14:23 +1000 (EST)
Received: from ns2.dpie.gov.au(152.91.195.1) by zirconia.sge.net via smap (3.2)
	id xma004290; Thu, 10 Jul 97 10:14:05 +1000
Received: from talmalmo.dpie.gov.au (talmalmo.dpie.gov.au [152.91.195.222]) by conargo.dpie.gov.au with ESMTP id KAA14339
  (8.6.11/IDA-1.6 for <freebsd-scsi@freebsd.org>); Thu, 10 Jul 1997 10:14:06 +1000
X-Organisation: Department of Primary Industries and Energy
X-Url: http://www.dpie.gov.au/
X-Notice: Views expressed by this message are not necessarily those of the Department of Primary Industries and Energy or of the Government of the Commonwealth of Australia.
Received: (from x400@localhost) by talmalmo.dpie.gov.au (8.8.3/8.8.3+worldtalk-4.1) id KAA27105 for freebsd-scsi@freebsd.org; Thu, 10 Jul 1997 10:11:56 +1000 (EST)
Received: from TELEMEMO; Thu, 10 Jul 1997 10:11:26 +1000
Date: Thu, 10 Jul 1997 10:11:26 +1000
Subject: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2
To: freebsd-scsi@freebsd.org (Reply Requested)
Message-Id: <"970710001128Z.WT27093.  0*/PN=Wayne.Farmer/OU=CORPHQ/O=DPIE/PRMD=AUSGOVDPIE/ADMD=TELEMEMO/C=AU/"@MHS>
X-Mailer: Worldtalk (4.1)/MIME
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I think I read recently that doing a "newfs" on a wide SCSI disk with an 
Adaptec 2940UW hangs the system (a deadlock ?)

I can concur with this and confirm that turning off wide negotiation in the 
SCSI setup seems to correct this.

3 questions :

1)  Does anyone have any more info on this
2)  Is there updated driver code I can include in the kernel build
3)  Having "newfs"-ed, would turning back on wide negotiation lead to more 
problems similar to the "newfs" problem

Thanks

Wayne


From owner-freebsd-scsi  Wed Jul  9 18:00:44 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id SAA26504
          for freebsd-scsi-outgoing; Wed, 9 Jul 1997 18:00:44 -0700 (PDT)
Received: from mail.cdsnet.net (mail.cdsnet.net [204.118.244.5])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id SAA26494
          for <freebsd-scsi@FreeBSD.ORG>; Wed, 9 Jul 1997 18:00:39 -0700 (PDT)
Received: from mail.cdsnet.net (mail.cdsnet.net [204.118.244.5])
          by mail.cdsnet.net (8.8.5/8.7.3) with SMTP id SAA13941;
          Wed, 9 Jul 1997 18:00:32 -0700 (PDT)
Date: Wed, 9 Jul 1997 18:00:22 -0700 (PDT)
From: Jaye Mathisen  <mrcpu@cdsnet.net>
To: Wayne.Farmer@mailhost.dpie.gov.au
cc: Reply Requested <freebsd-scsi@FreeBSD.ORG>
Subject: Re: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2
In-Reply-To: <"970710001128Z.WT27093.  0*/PN=Wayne.Farmer/OU=CORPHQ/O=DPIE/PRMD=AUSGOVDPIE/ADMD=TELEMEMO/C=AU/"@MHS>
Message-ID: <Pine.NEB.3.95.970709180001.23446l-100000@mail.cdsnet.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


Works fine for me in 2.2.2 and greater, on literally dozens of disks.

On Thu, 10 Jul 1997 Wayne.Farmer@mailhost.dpie.gov.au wrote:

> I think I read recently that doing a "newfs" on a wide SCSI disk with an 
> Adaptec 2940UW hangs the system (a deadlock ?)
> 
> I can concur with this and confirm that turning off wide negotiation in the 
> SCSI setup seems to correct this.
> 
> 3 questions :
> 
> 1)  Does anyone have any more info on this
> 2)  Is there updated driver code I can include in the kernel build
> 3)  Having "newfs"-ed, would turning back on wide negotiation lead to more 
> problems similar to the "newfs" problem
> 
> Thanks
> 
> Wayne
> 


From owner-freebsd-scsi  Wed Jul  9 19:08:24 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id TAA29378
          for freebsd-scsi-outgoing; Wed, 9 Jul 1997 19:08:24 -0700 (PDT)
Received: from mail.ican.net (mail.ican.net [204.92.49.5])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA29357
          for <scsi@freebsd.org>; Wed, 9 Jul 1997 19:08:09 -0700 (PDT)
Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7])
	by mail.ican.net (8.8.6/8.8.6) with ESMTP id WAA19711;
	Wed, 9 Jul 1997 22:07:50 -0400 (EDT)
Received: (from josh@localhost)
	by oddjob.ican.net (8.8.6/8.8.6) id WAA16393;
	Wed, 9 Jul 1997 22:07:49 -0400 (EDT)
Message-ID: <19970709220749.25037@ican.net>
Date: Wed, 9 Jul 1997 22:07:49 -0400
From: Josh Tiefenbach <josh@ican.net>
To: Simon Shapiro <Shimon@i-Connect.Net>
Cc: scsi@freebsd.org
Subject: Yet Another DPT Update
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.74
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

More with the updates. We've stuck the DPT back into the production box
(Compaq PPro 200).

dmesg:

DPT:  PCI SCSI HBA Driver, version 1.1.6
dpt0 <DPT Caching SCSI RAID Controller> rev 2 int a irq 11 on pci0:18
dpt0: DPT type 3, model PM3334UW firmware 07L0, Protocol 0 on port 1410
dpt0: Options: USE_SINTR, TRACK_CCB_STATES, MEASURE_PERFORMANCE, HANDLE_TIMEOUTS, SINTR_SPLHIGH
dpt0 waiting for scsi devices to settle
(dpt0:0:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
sd0(dpt0:0:0): Direct-Access 2050MB (4199759 512 byte sectors)
(dpt0:1:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
sd1(dpt0:1:0): Direct-Access 2050MB (4199759 512 byte sectors)
(dpt0:2:0): "DPT RAID-5 07L0" type 0 fixed SCSI 2
sd2(dpt0:2:0): Direct-Access 8201MB (16796928 512 byte sectors)

The following happened during a newfs of the RAID drive:

dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: Marking 27627 (Write (10) [6.1.18]) on c0b0t2l0 as late after 10042353usec
dpt0: Destroying stale 27627 (Write (10) [6.1.18]) on c0b0t2l0 (20042335)
dpt0: Request 99041 recieved with clear EOC.  Marking as LOST.
dpt0: BAD (0) CCB in SP (status = 1110 0000 ).
dpt0: Marking 99948 (Write (10) [6.1.18]) on c0b0t2l0 as late after 18290511usec
dpt0: BAD (0) CCB in SP (status = 1110 0000 ).
dpt0: Destroying stale 99041 (Write (10) [6.1.18]) on c0b0t2l0 (29450684)
dpt0: Destroying stale 99948 (Write (10) [6.1.18]) on c0b0t2l0 (28293024)
dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: Marking 114193 (Write (10) [6.1.18]) on c0b0t2l0 as late after 19983892usec
dpt0: Destroying stale 114193 (Write (10) [6.1.18]) on c0b0t2l0 (29983890)
dpt0: Marking 125669 (Write (10) [6.1.18]) on c0b0t2l0 as late after 15260348usec
dpt0: Destroying stale 125669 (Write (10) [6.1.18]) on c0b0t2l0 (25258082)

And this while running diablo ( a news feeder program, *not* the game :)

dpt0: BAD (0) CCB in SP (status = 1100 0000 ).
dpt0: Marking 128087 (Read (10) [6.1.5]) on c0b0t1l0 as late after 10862820usec
dpt0: Destroying stale 128087 (Read (10) [6.1.5]) on c0b0t1l0 (20862822)
dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: Marking 129491 (Write (10) [6.1.18]) on c0b0t1l0 as late after 12998830usec
dpt0: Marking 129583 (Write (10) [6.1.18]) on c0b0t1l0 as late after 11917207usec
dpt0: Destroying stale 129491 (Write (10) [6.1.18]) on c0b0t1l0 (22998832)
dpt0: Destroying stale 129583 (Write (10) [6.1.18]) on c0b0t1l0 (21919370)

Again. I should point out that the above errors *did not happen* when using
the card, v1.1.6 of the driver w/same options, in a Pentium-100 box.

Shimon: a) Any other data that you need? b) any ETA on v1.1.7 of the driver?

josh

-- 
Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net

From owner-freebsd-scsi  Wed Jul  9 20:14:10 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id UAA02908
          for freebsd-scsi-outgoing; Wed, 9 Jul 1997 20:14:10 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id UAA02902
          for <scsi@freebsd.org>; Wed, 9 Jul 1997 20:14:02 -0700 (PDT)
Received: (qmail 23298 invoked by uid 1000); 10 Jul 1997 03:14:12 -0000
Message-ID: <XFMail.970709201412.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <19970709220749.25037@ican.net>
Date: Wed, 09 Jul 1997 20:14:12 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: Josh Tiefenbach <josh@ican.net>
Subject: RE: Yet Another DPT Update
Cc: scsi@freebsd.org
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


Hi Josh Tiefenbach;  On 10-Jul-97 you wrote: 
> More with the updates. We've stuck the DPT back into the production box
> (Compaq PPro 200).
> 
> dmesg:
> 
> DPT:  PCI SCSI HBA Driver, version 1.1.6
> dpt0 <DPT Caching SCSI RAID Controller> rev 2 int a irq 11 on pci0:18
> dpt0: DPT type 3, model PM3334UW firmware 07L0, Protocol 0 on port 1410
> dpt0: Options: USE_SINTR, TRACK_CCB_STATES, MEASURE_PERFORMANCE,
> HANDLE_TIMEOUTS, SINTR_SPLHIGH
> dpt0 waiting for scsi devices to settle
> (dpt0:0:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
> sd0(dpt0:0:0): Direct-Access 2050MB (4199759 512 byte sectors)
> (dpt0:1:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
> sd1(dpt0:1:0): Direct-Access 2050MB (4199759 512 byte sectors)
> (dpt0:2:0): "DPT RAID-5 07L0" type 0 fixed SCSI 2
> sd2(dpt0:2:0): Direct-Access 8201MB (16796928 512 byte sectors)
> 
> The following happened during a newfs of the RAID drive:
> 
> dpt0: BAD (0) CCB in SP (status = 0000 0000 ).

This is clearly what we see here on certain systems.  In this case BOTH 
the status register and the CCB are bogus.  This is not the data we expect,
nor can the DPT generate these.  The PCI bus or some hardware along the
line is eating it.

> dpt0: Marking 27627 (Write (10) [6.1.18]) on c0b0t2l0 as late after
> 10042353usec

Since we threw away the corrupt CCB (not knowing which one it is), the
real command simply times out.

> dpt0: Destroying stale 27627 (Write (10) [6.1.18]) on c0b0t2l0 (20042335)

Now we lost patience with this I/O request.  We are going to do it in.
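
To make the two messages above concrete, here is a rough sketch of the
aging they suggest. The thresholds are only read off the numbers in this
log (requests are marked late at 10 s or more, and destroyed at 20 s or
more); they are an assumption, not values taken from the driver source, and
all names are hypothetical.

```c
#include <stdint.h>

/* Assumed thresholds, inferred from the log lines quoted above. */
#define SKETCH_LATE_US	10000000ULL	/* about 10 s: "Marking ... as late" */
#define SKETCH_STALE_US	20000000ULL	/* about 20 s: "Destroying stale"    */

enum ccb_age_class {
	CCB_AGE_OK,	/* still within its normal service time */
	CCB_AGE_LATE,	/* overdue, but still given a chance     */
	CCB_AGE_STALE	/* given up on; the request is destroyed */
};

/* Classify an outstanding request by its age in microseconds. */
static enum ccb_age_class
ccb_classify_age(uint64_t age_us)
{
	if (age_us >= SKETCH_STALE_US)
		return CCB_AGE_STALE;
	if (age_us >= SKETCH_LATE_US)
		return CCB_AGE_LATE;
	return CCB_AGE_OK;
}
```

Feeding it the ages from the log for request 27627 gives "late" for
10042353 and "stale" for 20042335.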

> dpt0: Request 99041 recieved with clear EOC.  Marking as LOST.

This one is probably noise on the bus.  If this bit is off, it means no 
command completed.  We treat it as a loss, since we know (hope) what the
command was but have no confidence in its integrity.

...  more of the same ...

> And this while running diablo ( a news feeder program, *not* the game :)

... and yet more ...

> Again. I should point out that the above errors *did not happen* when
> using
> the card, v1.1.6 of the driver w/same options, in a Pentium-100 box.

Sort of proves the point...  :-(

> Shimon: a) Any other data that you need? b) any ETA on v1.1.7 of the
> driver?

I forwarded your message to my DPT contact.  The certification people 
there want specific hardware setups.  I think the FreeBSD driver may be 
a bit faster than usual and that is why this problem is not so visible
on other platforms.  We can always put some delays in dpt_intr() and see
if things improve.  You can add ``DELAY(xx);'' somewhere at the very top,
and see if it makes any difference.
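
Something along these lines is what I mean, as a userland sketch only. In
the kernel DELAY() busy-waits for the given number of microseconds; here it
is replaced by a stand-in that merely records the request so the sketch can
be compiled and tested outside the kernel. The struct, the function name
and the 10 usec value are hypothetical and not taken from the driver.

```c
#include <stdint.h>

#define SKETCH_INTR_DELAY_US	10	/* hypothetical value, tune by experiment */

/* Total microseconds requested so far; only used by the stand-in below. */
static uint64_t sketch_delay_total_us;

/* Userland stand-in for the kernel DELAY(): records, does not wait. */
static void
DELAY(unsigned int usec)
{
	sketch_delay_total_us += usec;
}

/* Hypothetical, much simplified view of a status packet. */
struct sketch_status_packet {
	uint8_t  eoc;		/* end-of-command bit; 0 means nothing completed */
	uint32_t ccb_id;	/* identifier of the completed request */
};

/*
 * Sketch of an interrupt handler with the delay at the very top.
 * Returns the completed request id, or -1 if EOC is clear.
 */
static int
dpt_intr_sketch(const volatile struct sketch_status_packet *sp)
{
	DELAY(SKETCH_INTR_DELAY_US);	/* let the status settle before reading */

	if (!sp->eoc)
		return -1;		/* treated as lost, as described above */
	return (int)sp->ccb_id;
}
```

The point of putting the delay first is that every read of the status
happens after it, so if the corruption is a timing problem the delay has a
chance to hide it.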

Let me know if that helps.  Version 1.1.7 is a merge of Justin's code
review.  It makes the code cleaner, somewhat leaner and (hopefully) much
more acceptable.  I also reversed the (reversed) priorities for the SCSI
software interrupts, putting them in line with bio, rather than net.

I will release 1.1.7 either tonight or tomorrow.

Simon

From owner-freebsd-scsi  Wed Jul  9 21:05:48 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id VAA05062
          for freebsd-scsi-outgoing; Wed, 9 Jul 1997 21:05:48 -0700 (PDT)
Received: from krystal.sge.net (firewall-user@krystal.sge.net [152.91.9.1])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id VAA05054
          for <freebsd-scsi@freebsd.org>; Wed, 9 Jul 1997 21:05:42 -0700 (PDT)
From: Wayne.Farmer@mailhost.dpie.gov.au
Received: (from uucp@localhost)
	by krystal.sge.net (8.8.5/8.8.6) id OAA18873
	for <freebsd-scsi@freebsd.org>; Thu, 10 Jul 1997 14:05:39 +1000 (EST)
Received: from jade.sge.net(10.1.1.254) by krystal.sge.net via smap (3.2)
	id xma018852; Thu, 10 Jul 97 14:05:35 +1000
Received: from SMTP (pryites.sge.net [10.1.1.246])
	by jade.sge.net (8.8.5/8.8.5) with SMTP id OAA25831
	for <freebsd-scsi@freebsd.org>; Thu, 10 Jul 1997 14:05:34 +1000 (EST)
Received: from zirconia.sge.net ([10.1.1.6]) by 10.1.1.246 (Norton AntiVirus for Internet Email Gateways 1.0) ; Thu, 10 Jul 1997 04:04:34 0000 (GMT)
Received: (from uucp@localhost)
	by zirconia.sge.net (8.8.5/8.8.5) id OAA01488
	for <freebsd-scsi@freebsd.org>; Thu, 10 Jul 1997 14:05:33 +1000 (EST)
Received: from ns2.dpie.gov.au(152.91.195.1) by zirconia.sge.net via smap (3.2)
	id xma001409; Thu, 10 Jul 97 14:05:07 +1000
Received: from talmalmo.dpie.gov.au (talmalmo.dpie.gov.au [152.91.195.222]) by conargo.dpie.gov.au with ESMTP id OAA21716
  (8.6.11/IDA-1.6 for <freebsd-scsi@freebsd.org>); Thu, 10 Jul 1997 14:05:08 +1000
X-Organisation: Department of Primary Industries and Energy
X-Url: http://www.dpie.gov.au/
X-Notice: Views expressed by this message are not necessarily those of the Department of Primary Industries and Energy or of the Government of the Commonwealth of Australia.
Received: (from x400@localhost) by talmalmo.dpie.gov.au (8.8.3/8.8.3+worldtalk-4.1) id OAA05866 for freebsd-scsi@freebsd.org; Thu, 10 Jul 1997 14:02:57 +1000 (EST)
Received: from TELEMEMO; Thu, 10 Jul 1997 14:02:16 +1000
Date: Thu, 10 Jul 1997 14:02:16 +1000
Subject: Re: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2
To: freebsd-scsi@freebsd.org (Reply Requested)
Message-Id: <"970710040210Z.WT05849.  0*/PN=Wayne.Farmer/OU=CORPHQ/O=DPIE/PRMD=AUSGOVDPIE/ADMD=TELEMEMO/C=AU/"@MHS>
X-Mailer: Worldtalk (4.1)/MIME
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I have had several responses indicating that no-one else has this problem.
I have gone back to square 1 again, souped up the SCSI card to do wide again 
and everything seems OK.

Maybe I did do a "newfs /dev/sd1?" instead of using the raw device (/dev/rsd1?)

Wayne

PS I will go back to the bunker.

---------------------- Forwarded by Wayne Farmer/CORPHQ on 10/07/97 14:00 
---------------------------


owner-freebsd-scsi#064#FreeBSD.ORG - SMTPGATE@WT400 on 10/07/97 10:33:30
To:	freebsd-scsi#064#FreeBSD.ORG - SMTPGATE@WT400
cc:	 
Subject:	Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2

I think I read recently that doing a "newfs" on a wide SCSI disk with an 
Adaptec 2940UW hangs the system (a deadlock ?)

I can concur with this and confirm that turning off wide negotiation in the 
SCSI setup seems to correct this.

3 questions :

1)  Does anyone have any more info on this
2)  Is there updated driver code I can include in the kernel build
3)  Having "newfs"-ed, would turning back on wide negotiation lead to more 
problems similar to the "newfs" problem

Thanks

Wayne


From owner-freebsd-scsi  Thu Jul 10 09:09:55 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id JAA09169
          for freebsd-scsi-outgoing; Thu, 10 Jul 1997 09:09:55 -0700 (PDT)
Received: from pluto.plutotech.com (root@[206.168.67.137])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA09160
          for <freebsd-scsi@FreeBSD.ORG>; Thu, 10 Jul 1997 09:09:48 -0700 (PDT)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130])
	by pluto.plutotech.com (8.8.5/8.8.5) with ESMTP id KAA17853;
	Thu, 10 Jul 1997 10:08:04 -0600 (MDT)
Message-Id: <199707101608.KAA17853@pluto.plutotech.com>
X-Mailer: exmh version 2.0beta 12/23/96
To: Wayne.Farmer@mailhost.dpie.gov.au
cc: freebsd-scsi@FreeBSD.ORG (Reply Requested)
Subject: Re: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2 
In-reply-to: Your message of "Thu, 10 Jul 1997 10:11:26 +1000."
             <"970710001128Z.WT27093. 0*/PN=Wayne.Farmer/OU=CORPHQ/O=DPIE/PRMD=AUSGOVDPIE/ADMD=TELEMEMO/C=AU/"@MHS> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 10 Jul 1997 10:08:04 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>I think I read recently that doing a "newfs" on a wide SCSI disk with an 
>Adaptec 2940UW hangs the system (a deadlock ?)

The problem has nothing to do with the Adaptec driver or controller.  This
was a buffer deadlock bug that could be triggered by using the block 
instead of raw device when performing a newfs (newfs sd0a instead of newfs 
rsd0a).  By disabling wide negotiation to the device, you are changing the
timing characteristics slightly and perhaps avoiding this deadlock.  My
guess is that if you perform a newfs on the raw partition it will work
just fine even if wide negotiation is turned on.
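
As an illustration of the naming only (sd0a versus rsd0a), here is a small
sketch of a helper that derives the raw device path from a block device
path by prefixing the last path component with "r". The function name is
made up for this example, and it assumes it is handed a block device name,
so it does not check whether the name already starts with "r".

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the raw device path for a block device path, for example
 * "/dev/sd0a" -> "/dev/rsd0a".  Returns 0 on success, or -1 if the
 * result does not fit in the output buffer.
 */
static int
raw_device_path(const char *blk, char *out, size_t outlen)
{
	const char *base = strrchr(blk, '/');
	size_t dirlen;
	int n;

	base = (base != NULL) ? base + 1 : blk;	/* last path component */
	dirlen = (size_t)(base - blk);		/* directory part, with '/' */

	n = snprintf(out, outlen, "%.*sr%s", (int)dirlen, blk, base);
	if (n < 0 || (size_t)n >= outlen)
		return -1;			/* error or truncated */
	return 0;
}
```

A script wrapping newfs could run the target through such a helper first,
so that the raw partition is always the one that gets used.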

--
Justin T. Gibbs
===========================================
  FreeBSD: Turning PCs into workstations
===========================================


From owner-freebsd-scsi  Fri Jul 11 08:33:49 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id IAA15498
          for freebsd-scsi-outgoing; Fri, 11 Jul 1997 08:33:49 -0700 (PDT)
Received: from shell.futuresouth.com (shell.futuresouth.com [207.141.254.20])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id IAA15493
          for <freebsd-scsi@freebsd.org>; Fri, 11 Jul 1997 08:33:46 -0700 (PDT)
Received: (from tim@localhost)
	by shell.futuresouth.com (8.8.5/8.8.5) id KAA24270;
	Fri, 11 Jul 1997 10:33:44 -0500 (CDT)
Message-ID: <19970711103344.05270@shell.futuresouth.com>
Date: Fri, 11 Jul 1997 10:33:44 -0500
From: Tim Tsai <tim@futuresouth.com>
To: freebsd-scsi@freebsd.org
Subject: help debugging this
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.74e
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

  What's the likely problem for this, the hard disk reset itself?

  Thanks,

  Tim

sd0(ahc0:0:0): UNIT ATTENTION asc:29,0
sd0(ahc0:0:0):  Power on, reset, or bus device reset occurred
, retries:4
sd1(ahc1:4:0): parity error during Command phase.
sd1(ahc1:4:0): SCB 0x0 - timed out in command phase, SCSISIGI == 0x56
SEQADDR = 0x42 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x17
sd1(ahc1:4:0): abort message in message buffer
sd1(ahc1:4:0): SCB 0x1 - timed out in command phase, SCSISIGI == 0x56
SEQADDR = 0x43 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x17
ahc1: Issued Channel A Bus Reset. 2 SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
sd1(ahc1:4:0): no longer in timeout
sd1(ahc1:4:0): UNIT ATTENTION asc:29,2 
, retries:3

From owner-freebsd-scsi  Fri Jul 11 10:02:59 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id KAA20563
          for freebsd-scsi-outgoing; Fri, 11 Jul 1997 10:02:59 -0700 (PDT)
Received: from feral-gw.feral.com (mjacob@feral.mauswerks.net [204.152.96.10])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA20558
          for <freebsd-scsi@FreeBSD.ORG>; Fri, 11 Jul 1997 10:02:57 -0700 (PDT)
Received: (from mjacob@localhost) by feral-gw.feral.com (8.8.6/8.7.3) id JAA06015; Fri, 11 Jul 1997 09:58:21 -0700
Date: Fri, 11 Jul 1997 09:58:21 -0700
From: Matthew Jacob <mjacob@feral.com>
Message-Id: <199707111658.JAA06015@feral-gw.feral.com>
To: freebsd-scsi@FreeBSD.ORG, tim@futuresouth.com
Subject: Re: help debugging this
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Cables, or possibly double termination.


From owner-freebsd-scsi  Fri Jul 11 12:12:16 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id MAA28596
          for freebsd-scsi-outgoing; Fri, 11 Jul 1997 12:12:16 -0700 (PDT)
Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA28591
          for <freebsd-scsi@FreeBSD.ORG>; Fri, 11 Jul 1997 12:12:12 -0700 (PDT)
Received: (from daemon@localhost)
	by alpo.whistle.com (8.8.5/8.8.5) id LAA04109;
	Fri, 11 Jul 1997 11:16:02 -0700 (PDT)
Received: from current1.whistle.com(207.76.205.22)
 via SMTP by alpo.whistle.com, id smtpd004102; Fri Jul 11 18:15:56 1997
Message-ID: <33C67781.6F5992E1@whistle.com>
Date: Fri, 11 Jul 1997 11:12:17 -0700
From: Julian Elischer <julian@whistle.com>
Organization: Whistle Communications
X-Mailer: Mozilla 3.0Gold (X11; I; FreeBSD 2.2-CURRENT i386)
MIME-Version: 1.0
To: Tim Tsai <tim@futuresouth.com>
CC: freebsd-scsi@FreeBSD.ORG
Subject: Re: help debugging this
References: <19970711103344.05270@shell.futuresouth.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Tim Tsai wrote:
> 
>   What's the likely cause of this?  Did the hard disk reset itself?
> 
>   Thanks,
> 
>   Tim
> 
> sd0(ahc0:0:0): UNIT ATTENTION asc:29,0
         ^^^^^
> sd0(ahc0:0:0):  Power on, reset, or bus device reset occurred
> , retries:4
> sd1(ahc1:4:0): parity error during Command phase.
> sd1(ahc1:4:0): SCB 0x0 - timed out in command phase, SCSISIGI == 0x56
> SEQADDR = 0x42 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x17
> sd1(ahc1:4:0): abort message in message buffer
> sd1(ahc1:4:0): SCB 0x1 - timed out in command phase, SCSISIGI == 0x56
> SEQADDR = 0x43 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x17
> ahc1: Issued Channel A Bus Reset. 2 SCBs aborted
> Clearing bus reset
> Clearing 'in-reset' flag
> sd1(ahc1:4:0): no longer in timeout
> sd1(ahc1:4:0): UNIT ATTENTION asc:29,2
         ^^^^^
> , retries:3

Two different disks had problems,
and on different SCSI buses too!

Your power supply (or its connectors) is bad.
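
For reference, the asc:29,0 and asc:29,2 pairs marked above are the
additional sense code and qualifier reported by each drive.  A minimal
Python sketch of decoding them follows.  The table is only a small excerpt
of the ASC 29h assignments, and the function name, the regular expression,
and the assumption that the kernel prints both values in hex are mine, not
taken from the driver source.

```python
import re

# Small excerpt of the SCSI ASC/ASCQ assignments for ASC 29h.
# Only entries relevant to the log above are listed.
ASC_TABLE = {
    (0x29, 0x00): "Power on, reset, or bus device reset occurred",
    (0x29, 0x01): "Power on occurred",
    (0x29, 0x02): "SCSI bus reset occurred",
    (0x29, 0x03): "Bus device reset function occurred",
}

# Hypothetical pattern for lines such as
# "sd1(ahc1:4:0): UNIT ATTENTION asc:29,2" (values assumed to be hex).
SENSE_RE = re.compile(r"asc:([0-9a-fA-F]+),([0-9a-fA-F]+)")


def decode_sense(line):
    """Return (asc, ascq, description) for a log line, or None if no match."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    asc, ascq = int(m.group(1), 16), int(m.group(2), 16)
    return asc, ascq, ASC_TABLE.get((asc, ascq), "unknown to this table")


if __name__ == "__main__":
    for line in ("sd0(ahc0:0:0): UNIT ATTENTION asc:29,0",
                 "sd1(ahc1:4:0): UNIT ATTENTION asc:29,2"):
        print(decode_sense(line))
```

Read this way, sd1 is reporting the bus reset that ahc1 itself issued,
while sd0 reports a generic power-on or reset condition.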

From owner-freebsd-scsi  Fri Jul 11 14:40:49 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id OAA04973
          for freebsd-scsi-outgoing; Fri, 11 Jul 1997 14:40:49 -0700 (PDT)
Received: from misery.sdf.com (misery.sdf.com [204.244.210.193])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id OAA04965
          for <freebsd-scsi@freebsd.org>; Fri, 11 Jul 1997 14:40:44 -0700 (PDT)
Received: from tom by misery.sdf.com with smtp (Exim 1.62 #1)
	id 0wmnLn-0007J2-00; Fri, 11 Jul 1997 14:35:39 -0700
Date: Fri, 11 Jul 1997 14:35:38 -0700 (PDT)
From: Tom Samplonius <tom@sdf.com>
To: freebsd-scsi@freebsd.org
Subject: location for DPT driver?
Message-ID: <Pine.BSF.3.95q.970711143058.28057A-100000@misery.sdf.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


  Where is the newer DPT driver?  I'm looking at
ftp.i-connect.com/crash but it doesn't look like it has been updated
recently.

Tom


From owner-freebsd-scsi  Sat Jul 12 03:45:35 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id DAA01262
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 03:45:35 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id DAA01250
          for <freebsd-SCSI@freebsd.org>; Sat, 12 Jul 1997 03:45:25 -0700 (PDT)
Received: (qmail 624 invoked by uid 1000); 12 Jul 1997 10:38:55 -0000
Message-ID: <XFMail.970712033855.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <199707120421.VAA11648@ns2.yahoo.com>
Date: Sat, 12 Jul 1997 03:38:55 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: filo@yahoo.com, freebsd-SCSI@freebsd.org
Subject: Re: problems with reboot
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


Hi David Filo;  On 12-Jul-97 you wrote:

> when running the latest 1.1.7 code i noticed that the command being
> "marked" and later "destroyed" during reboot was the "remove media"
> command.  so i removed the DPT_HANDLE_TIMEOUTS option and it works
> fine now.  the umount during reboot can take > 30 seconds which is
> beyond your max timeout for scsi commands (i think).  so looks like
> you need to be careful on which commands you timeout and destroy.

Yup.  This whole timeout thing is bogus.  I trust you understand it
by now :-)  It is only necessary for hardware platforms that corrupt
DMA transfers between the DPT and main memory.  Actually, it is
very probable that this is simply a delay, an out-of-sync delay, rather
than corruption.

If you can live without DPT_HANDLE_TIMEOUTS, do so.  I recommend it,
as I do so myself.  The DPT firmware handles timeouts much better.
There is no need for it in the kernel, except as a survival tool.

> the next test was to simply hit the "reset" button while the dpt was
> chugging away on lots of untars.  unfortunately the first time i did
> this, the machine got hung on reboot in the dpt bios - never got past
> "waiting for dpt" message (first led kept blinking).  hitting reset a
> second time worked and the machine booted.  of course the filesystems
> were hosed, but they fscked fine.  so this sounds like a DPT firmware
> bug.  i have yet to reproduce this one in a few tries.  do you have a
> suggestion of who to talk to at dpt about this, or should i just go
> through the normal support channel.

What you describe is sensible but not a bug.  When you forcefully reset
the machine while writing to a RAID-{1,5}, it is very possible that
you did so in mid-transaction.  The DPT, upon boot, will try to restore
the array to a consistent state.  This operation may take a very long
while.  Getting stuck, however, is not correct.  Did the card emit any
beeps?  These actually indicate what the problem is.
What version of the firmware is it running?  It is visible during boot,
and also in the syslog.  Upgrade to 7L0, try again (without the reset :-)
and call support.

> the next time i tried to duplicate the dpt hang (by hitting reset
> again), it came up fine (after fsck of course).  however as i started
> the multiple untars again, the machine panicked with the message
> "panic: blkfree: freeing free frag".  i was seeing this same behavior
> when the reboots weren't happening cleanly (i.e. machine comes up,
> fsck works, but then panic when accessing fs).  i would assume this is
> a 2.2 filesystem bug, but i'm not sure.  have you seen anything like
> this or have any reason to believe it's associated with the dpt
> driver?  i don't have much experience with 2.2 so i don't know if this
> is common.

Depending on how much memory you have, you can destroy up to 64MB of
disk writes.  There are very few filesystems that can survive this kind
of assault.  Even good filesystems like Veritas vxfs, or (yes) NotTested
ntfs, will not survive that.  Even one of the most robust filesystems
ever created, an Oracle RDBMS (they do not necessarily view their RDBMS
as a filesystem), will not survive losing 64MB of data it thinks was
already committed to disk.

Many years ago I raced cars that had turbochargers on them.  The best
way to destroy one (for something that spins at 130,000-200,000 rpm, bolted
to a car engine and sucking gasoline, they are very reliable) is to
open full throttle on a running engine and kill the power.
Why am I telling you this?  Every engineered product has a sure way
of destroying it by doing something that is doable and not clearly
marked ``DO NOT DO THAT''.  The DPT controller assumes that, normally,
the reset button does not get pushed.  It is designed to resist a
single point of failure (SPOF).  What you do is MMPOF :-)  Smoke will
be emitted.

In a truly critical application, where application-side integrity
is more important than speed considerations, do the following:

* Configure the DPT for write-through caches.
* Disable the caches on ALL the disk drives.
* Pray :-)  Some disk drives will NOT disable their caches when you tell
  them to.

> we have a lot more experience with 2.1 and the filesystem appears to
> be very stable.  which brings up the question: will your stuff work
> under 2.1?  if you think it's feasible i'll probably try to get it
> working under 2.1-stable to see if this filesystem problem persists.

The problem is not in the filesystem.  Put a good UPS between the CPU 
and the wall socket, cut off the reset button and it will work fine.
There is an issue with FreeBSD shutdown not waiting for the DPT to flush
caches as it should.

> finally, you've asked about posting/forwarding my questions/comments
> to other places.  no problems - do whatever you'd like with anything i
> say..

I do not know about that, but I think that this particular exchange will
help others.

Simon

From owner-freebsd-scsi  Sat Jul 12 04:12:28 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id EAA01918
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 04:12:28 -0700 (PDT)
Received: from implode.root.com (implode.root.com [198.145.90.17])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id EAA01913
          for <freebsd-SCSI@FreeBSD.ORG>; Sat, 12 Jul 1997 04:12:22 -0700 (PDT)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.5/8.8.5) with ESMTP id EAA15976;
	Sat, 12 Jul 1997 04:13:14 -0700 (PDT)
Message-Id: <199707121113.EAA15976@implode.root.com>
To: Simon Shapiro <Shimon@i-Connect.Net>
cc: filo@yahoo.com, freebsd-SCSI@FreeBSD.ORG
Subject: Re: problems with reboot 
In-reply-to: Your message of "Sat, 12 Jul 1997 03:38:55 PDT."
             <XFMail.970712033855.Shimon@i-Connect.Net> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Sat, 12 Jul 1997 04:13:14 -0700
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>There is an issue with FreeBSD shutdown not waiting for the DPT to flush
>caches as it should.

   Should be easy to fix by adding a shutdown routine to the driver that waits
for the flushes to complete.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project
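
The idea of a shutdown routine that waits for the flushes can be sketched
as a bounded polling loop.  This is only an illustration in Python under
stated assumptions: FakeController, its poll() method and the max_polls
limit are hypothetical stand-ins, not the DPT driver's real interface or
the kernel's shutdown hook API.

```python
class FakeController:
    """Stand-in for a caching controller; NOT the real DPT interface."""

    def __init__(self, dirty_blocks, blocks_per_poll):
        self.dirty_blocks = dirty_blocks
        self.blocks_per_poll = blocks_per_poll

    def poll(self):
        """Pretend some cached blocks reached disk since the last poll."""
        self.dirty_blocks = max(0, self.dirty_blocks - self.blocks_per_poll)
        return self.dirty_blocks


def shutdown_wait(ctrl, max_polls):
    """Poll until the cache is clean; return True if it drained in time.

    The bound on polls keeps a wedged controller from hanging the
    shutdown forever; the caller decides what to do on False.
    """
    for _ in range(max_polls):
        if ctrl.poll() == 0:
            return True
    return False


if __name__ == "__main__":
    print(shutdown_wait(FakeController(100, 30), max_polls=10))
    print(shutdown_wait(FakeController(100, 30), max_polls=3))
```

A real routine would sleep between polls and would have to pick the bound
generously, since a large cache can take a long time to drain.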

From owner-freebsd-scsi  Sat Jul 12 11:50:30 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id LAA14993
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 11:50:30 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id LAA14968
          for <freebsd-SCSI@FreeBSD.ORG>; Sat, 12 Jul 1997 11:50:16 -0700 (PDT)
Received: (qmail 23804 invoked by uid 1000); 12 Jul 1997 18:50:05 -0000
Message-ID: <XFMail.970712115005.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <199707121113.EAA15976@implode.root.com>
Date: Sat, 12 Jul 1997 11:50:05 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: dg@root.com
Subject: Re: problems with reboot
Cc: filo@yahoo.com, freebsd-SCSI@FreeBSD.ORG
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


Hi David Greenman;  On 12-Jul-97 you wrote: 
> >There is an issue with FreeBSD shutdown not waiting for the DPT to flush
> >caches as it should.
> 
>    Should be easy to fix by adding a shutdown routine to the driver
> that waits for the flushes to complete.

I have not checked the code in this area, but all that I think is necessary
is for the umount(2) syscall to wait and block shutdown until it returns.
Under normal operation, it generates the SCSI ``ALLOW MEDIA REMOVAL'',
which the DPT blocks until it is done flushing and invalidating.
I personally never have this problem on any of our machines, but...

BTW, on early UnixWare, /sbin/reboot was actually a call to another
program that took some mysterious arguments (foobar 1 2) which, if given
incorrectly, would cause Unix to execute a halt without any syncing and
thus produce similar results.  Can /sbin/reboot do something similar?

Simon

From owner-freebsd-scsi  Sat Jul 12 12:18:07 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id MAA15817
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 12:18:07 -0700 (PDT)
Received: from cais.cais.com (root@cais.com [199.0.216.4])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA15812
          for <freebsd-SCSI@FreeBSD.ORG>; Sat, 12 Jul 1997 12:18:02 -0700 (PDT)
Received: from earth.mat.net (root@earth.mat.net [205.252.122.1]) by cais.cais.com (8.8.5/CJKv1.99-CAIS) with SMTP id PAA13085; Sat, 12 Jul 1997 15:17:52 -0400 (EDT)
Received: from Journey2.mat.net (journey2.mat.net [205.252.122.116]) by earth.mat.net (8.6.12/8.6.12) with SMTP id PAA22521; Sat, 12 Jul 1997 15:17:49 -0400
Date: Sat, 12 Jul 1997 15:17:28 -0400 (EDT)
From: Chuck Robey <chuckr@glue.umd.edu>
X-Sender: chuckr@Journey2.mat.net
To: Simon Shapiro <Shimon@i-Connect.Net>
cc: dg@root.com, filo@yahoo.com, freebsd-SCSI@FreeBSD.ORG
Subject: Re: problems with reboot
In-Reply-To: <XFMail.970712115005.Shimon@i-Connect.Net>
Message-ID: <Pine.BSF.3.96.970712151441.28420C-100000@Journey2.mat.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

On Sat, 12 Jul 1997, Simon Shapiro wrote:

> 
> Hi David Greenman;  On 12-Jul-97 you wrote: 
> > >There is an issue with FreeBSD shutdown not waiting for the DPT to flush
> > >caches as it should.
> > 
> >    Should be easy to fix by adding a shutdown routine to the driver
> > that waits for the flushes to complete.
> 
> I have not checked the code in this area, but all that I think is necessary
> is for the umount(2) syscall to wait and block shutdown until it returns.
> Under normal operation, it generates the SCSI ``ALLOW MEDIA REMOVAL'',
> which the DPT blocks until it is done flushing and invalidating.
> I personally never have this problem on any of our machines, but...

Is this always safe?  I've had some instances where a umount call simply
hung, and never returned.  I think they were either nfs or msdos mounts
that gave this trouble, but the umount call could not be kill'ed, and
making shutdown wait?  Would halt still work, as an emergency measure?
I know the FSs that were hung wouldn't be closed, but at least my ufs FSs
would be clean.

----------------------------+-----------------------------------------------
Chuck Robey                 | Interests include any kind of voice or data 
chuckr@eng.umd.edu          | communications topic, C programming, and Unix.
213 Lakeside Drive Apt T-1  |
Greenbelt, MD 20770         | I run Journey2 and picnic, both FreeBSD
(301) 220-2114              | version 3.0 current -- and great FUN!
----------------------------+-----------------------------------------------


From owner-freebsd-scsi  Sat Jul 12 14:26:47 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id OAA20444
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 14:26:47 -0700 (PDT)
Received: from silvia.HIP.Berkeley.EDU (ala-ca32-05.ix.netcom.com [199.35.209.69])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id OAA20439;
          Sat, 12 Jul 1997 14:26:43 -0700 (PDT)
Received: (from asami@localhost) by silvia.HIP.Berkeley.EDU (8.8.6/8.6.9) id OAA09547; Sat, 12 Jul 1997 14:26:25 -0700 (PDT)
Date: Sat, 12 Jul 1997 14:26:25 -0700 (PDT)
Message-Id: <199707122126.OAA09547@silvia.HIP.Berkeley.EDU>
To: crb@Glue.umd.edu
CC: gary@tbe.net, freebsd-scsi@freebsd.org, freebsd-isp@freebsd.org,
        freebsd-hardware@freebsd.org
In-reply-to: <Pine.SOL.3.95q.970712164235.2380A-100000@periodic.eng.umd.edu> (crb@Glue.umd.edu)
Subject: Re: NCR SCSI controllers
From: asami@cs.berkeley.edu (Satoshi Asami)
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

 * From: crb@Glue.umd.edu

Gee, I know someone with a very similar mail address! ;)

 * > Sorry about the cross-post, but I thought the question is appropriate to
 * > both lists...

Unless you are asking something about performance under a huge number
of disks and heavy load or something, I don't see the relevance to
-isp.  And for -hardware...we have -scsi just for this kind of
discussion (;).  The next person to follow up, please chop -isp and
-hardware out of the CC: list.

 * > We were looking at the NCR 53C810 and -815 PCI SCSI controllers and were
 * > just wondering if anybody has experience/problems with them.  Also if
 * > anyone has happened to compare them to Adaptec controllers, I would be
 * > glad to hear how they turned out.  TIA!

I've had 810- and 825-based controllers, they have worked very well
over the years.  However, the Adaptec is very stable now too.  The
main difference is probably the price ($70 for 810, $120 for 875,
$200+ for 2940*) and configurability.  I'm not sure if the current
NCR's BIOSes let you change the Adapter's ID's, sync/wide negotiations
per device, etc. -- mine doesn't, in fact mine doesn't even have a
boot setup menu.

Also, I don't know how the NCR controllers perform under heavy load as
I never had more than two disks on them -- the Adaptec generally works
fine with 14 disks in 10MHz mode or 8 disks in 20MHz mode (cable
length problem).

 * I do have to admit, however, that I am not getting ultra-wide speeds out
 * of my Tekram even though I have an ultra-wide capable IBM UltraStar 2es
 * but I haven't really looked into it yet to see if this is just a configuration
 * problem or what.

Our NCR driver doesn't support it yet.  Stefan Esser (se) is working
on it.  Based on past experience, my guess is that it will start
working soon because se is working on it. :)

Satoshi

From owner-freebsd-scsi  Sat Jul 12 15:40:32 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id PAA23185
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 15:40:32 -0700 (PDT)
Received: from george.lbl.gov (george.lbl.gov [128.3.196.93])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id PAA23180;
          Sat, 12 Jul 1997 15:40:28 -0700 (PDT)
Received: (jin@localhost) by george.lbl.gov (8.6.10/8.6.5) id PAA21522; Sat, 12 Jul 1997 15:40:21 -0700
Date: Sat, 12 Jul 1997 15:40:21 -0700
From: "Jin Guojun[ITG]" <jin@george.lbl.gov>
Message-Id: <199707122240.PAA21522@george.lbl.gov>
To: asami@cs.berkeley.edu, crb@Glue.umd.edu
Subject: Re: NCR SCSI controllers
Cc: freebsd-hardware@FreeBSD.ORG, freebsd-isp@FreeBSD.ORG,
        freebsd-scsi@FreeBSD.ORG, gary@tbe.net
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>Also, I don't know how the NCR controllers perform under heavy load as
>I never had more than two disks on them -- the Adaptec generally works
>fine with 14 disks in 10MHz mode or 8 disks in 20MHz mode (cable
>length problem).

I have no problems with NCR at all.  Especially under FreeBSD, it does not
take CPU time.  Two disks or 14 disks is not the issue for SCSI controllers.
If you can saturate the SCSI bus with two disks (new tech can), then putting
100 disks on it (assuming the IDs were allowed) would not make any difference
at all.

See Hardware performance guide for Pentium family (new) under
http://www-itg.lbl.gov/ISS/hardware (two years old and it will be updated
soon :-)

> * I do have to admit, however, that I am not getting ultra-wide speeds out
> * of my Tekram even though I have an ultra-wide capable IBM UltraStar 2es
> * but I haven't really looked into it yet to see if this is just a configuration
> * problem or what.
>
>Our NCR driver doesn't support it yet.  Stefan Esser (se) is working
>on it.  Based on past experience, my guess is that it will start
>working soon because se is working on it. :)

Has anyone tested an ultra-wide SCSI controller and gotten more than
20 MB/s throughput over a single controller with a number of ultra-wide
disks?  I posted this question a few months ago and did not hear any
response.  I was wondering whether no one had gotten it working at that time.

-Jin


From owner-freebsd-scsi  Sat Jul 12 16:09:24 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id QAA24204
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 16:09:24 -0700 (PDT)
Received: from silvia.HIP.Berkeley.EDU (ala-ca32-05.ix.netcom.com [199.35.209.69])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA24199
          for <freebsd-scsi@FreeBSD.ORG>; Sat, 12 Jul 1997 16:09:18 -0700 (PDT)
Received: (from asami@localhost) by silvia.HIP.Berkeley.EDU (8.8.6/8.6.9) id QAA10000; Sat, 12 Jul 1997 16:09:07 -0700 (PDT)
Date: Sat, 12 Jul 1997 16:09:07 -0700 (PDT)
Message-Id: <199707122309.QAA10000@silvia.HIP.Berkeley.EDU>
To: jin@george.lbl.gov
CC: crb@Glue.umd.edu, freebsd-scsi@FreeBSD.ORG, gary@tbe.net
In-reply-to: <199707122240.PAA21522@george.lbl.gov> (jin@george.lbl.gov)
Subject: Re: NCR SCSI controllers
From: asami@cs.berkeley.edu (Satoshi Asami)
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

 * Cc: freebsd-hardware@FreeBSD.ORG, freebsd-isp@FreeBSD.ORG,
 *         freebsd-scsi@FreeBSD.ORG, gary@tbe.net

Please don't continue crossposting....

 * I have no problem with NCR at all. Specially under FreeBSD, It does not take
 * CPU time. Two disks or 14 disks is not the issue for SCSI controllers.
 * If you can saturate the SCSI bus with two disks (new tech can), then, putting
 * 100 disks (assume ID is allowed), would not make any difference at all.

You may want to note that there is more to performance than sequential
throughput.

 * Does some one have tested any ultra-wide SCSI controllers to have at least
 * more than 20 MB throughput over a single controller with number of ultra-wide
 * disks? 

I have seen over 30MB/s on one of the channels of an Adaptec 3940UW
with 6 or 7 of the newest IBM drives.

 * 	  I posted such question a few month ago, and did not hear any respond.
 * I was wondering no one had it worked at that time.

Maybe you asked in a wrong list? :)

Satoshi

From owner-freebsd-scsi  Sat Jul 12 18:21:26 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id SAA28249
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 18:21:26 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id SAA28232
          for <freebsd-SCSI@FreeBSD.ORG>; Sat, 12 Jul 1997 18:21:07 -0700 (PDT)
Received: (qmail 27240 invoked by uid 1000); 13 Jul 1997 01:21:03 -0000
Message-ID: <XFMail.970712182103.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <Pine.BSF.3.96.970712151441.28420C-100000@Journey2.mat.net>
Date: Sat, 12 Jul 1997 18:21:03 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: Chuck Robey <chuckr@glue.umd.edu>
Subject: Re: problems with reboot
Cc: freebsd-SCSI@FreeBSD.ORG, filo@yahoo.com, dg@root.com
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


Hi Chuck Robey;  On 12-Jul-97 you wrote: 

...

> > Under normal operation, it generates the SCSI ``ALLOW MEDIA REMOVAL'',
> > which the DPT blocks until it is done flushing and invalidating.
> > I personally never have this problem on any of our machines, but...
> 
> Is this always safe?  I've had some instances where a umount call simply
> hung, and never returned.  I think they were either nfs or msdos mounts
> that gave this trouble, but the umount call could not be kill'ed, and
> making shutdown wait?  Would halt still work, as an emergency measure?
> I know the FSs that were hung wouldn't be closed, but at least my ufs FSs
> would be clean.

Network Failure System is a special case (I AM being nice :-);
it is supposedly stateless, and the mount is a client and thus not governing
physical I/O.  A shutdown can (should) probably force a umount.  Even on a
local system, a forced umount is OK.  It is a FS issue.  But if the fs
layer calls a function that by definition blocks, it is ``none of the
caller's business'' how/what the callee does and how long it takes.
To assume anything about the nature of a callee's internals is not a good
idea.  Here we have a live example (of why it is a bad idea).

Simon

From owner-freebsd-scsi  Sat Jul 12 18:39:54 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id SAA28875
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 18:39:54 -0700 (PDT)
Received: from ns2.yahoo.com (ns2.yahoo.com [205.216.162.20])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id SAA28870
          for <freebsd-SCSI@FreeBSD.ORG>; Sat, 12 Jul 1997 18:39:52 -0700 (PDT)
Received: (from filo@localhost) by ns2.yahoo.com (8.8.5/8.6.12) id SAA14919; Sat, 12 Jul 1997 18:38:13 -0700 (PDT)
Date: Sat, 12 Jul 1997 18:38:13 -0700 (PDT)
Message-Id: <199707130138.SAA14919@ns2.yahoo.com>
From: David Filo <filo@yahoo.com>
To: Shimon@i-Connect.Net
cc: freebsd-SCSI@FreeBSD.ORG, dg@root.com
In-Reply-To: <XFMail.970712115005.Shimon@i-Connect.Net>
Subject: Re: problems with reboot
Reply-To: filo@yahoo.com
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> > >There is an issue with FreeBSD shutdown not waiting for the DPT to flush
> > >caches as it should.
> > > 
> >    Should be easy to fix by adding a shutdown routine to the driver
> > that waits for the flushes to complete.
> 
> I have not checked the code in this area, but all that I think is necessary
> is for the umount(2) syscall to wait and block shutdown until it returns.
> Under normal operation, it generates the SCSI ``ALLOW MEDIA REMOVAL'',
> which the DPT blocks until it is done flushing and invalidating.
> I personally never have this problem on any of our machines, but...

umount(2) does wait correctly.  The problem in this case was that the
DPT driver was timing out the "ALLOW MEDIA REMOVAL" command sent to
the controller before it had a chance to finish flushing its cache.
The problem went away when I removed "options DPT_HANDLE_TIMEOUTS"
from the kernel config.  The result of this was that the "ALLOW MEDIA
REMOVAL" command was allowed to complete, umount waited around, and
everything shut down cleanly.

If this explanation is correct, the DPT driver should be changed to
not time out the "ALLOW MEDIA REMOVAL" command when the
DPT_HANDLE_TIMEOUTS option is being used.
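
The change suggested here amounts to exempting certain commands from the
host-side timer.  A minimal Python sketch of that policy, assuming a
per-command opcode check: the names below are hypothetical, the 30 second
default is only the "> 30 seconds" guess made earlier in this thread, and
0x1E is the standard SCSI PREVENT ALLOW MEDIUM REMOVAL opcode.

```python
PREVENT_ALLOW_MEDIUM_REMOVAL = 0x1E  # opcode behind "ALLOW MEDIA REMOVAL"

# Opcodes the host-side timer should never abort, because the controller
# may legitimately hold them until its cache has been flushed.
NO_TIMEOUT_OPCODES = frozenset({PREVENT_ALLOW_MEDIUM_REMOVAL})

DEFAULT_TIMEOUT_S = 30  # assumed value, not read from the driver source


def should_abort(opcode, elapsed_s, timeout_s=DEFAULT_TIMEOUT_S):
    """Return True if the driver's timer may abort this command."""
    if opcode in NO_TIMEOUT_OPCODES:
        return False  # let the controller finish, however long it takes
    return elapsed_s > timeout_s


if __name__ == "__main__":
    print(should_abort(PREVENT_ALLOW_MEDIUM_REMOVAL, 120))  # exempt command
    print(should_abort(0x28, 31))                           # READ(10), late
```

Whether other long-running commands belong in the exempt set is a
separate question that this sketch does not try to answer.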

From owner-freebsd-scsi  Sat Jul 12 18:53:25 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id SAA29460
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 18:53:25 -0700 (PDT)
Received: from misery.sdf.com (misery.sdf.com [204.244.210.193])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id SAA29454
          for <freebsd-scsi@freebsd.org>; Sat, 12 Jul 1997 18:53:21 -0700 (PDT)
Received: from tom by misery.sdf.com with smtp (Exim 1.62 #1)
	id 0wnDls-0000nK-00; Sat, 12 Jul 1997 18:48:20 -0700
Date: Sat, 12 Jul 1997 18:48:19 -0700 (PDT)
From: Tom Samplonius <tom@sdf.com>
To: "Jin Guojun[ITG]" <jin@george.lbl.gov>
cc: asami@cs.berkeley.edu, crb@Glue.umd.edu, freebsd-scsi@freebsd.org,
        gary@tbe.net
Subject: Re: NCR SCSI controllers
In-Reply-To: <199707122240.PAA21522@george.lbl.gov>
Message-ID: <Pine.BSF.3.95q.970712184459.2861A-100000@misery.sdf.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


On Sat, 12 Jul 1997, Jin Guojun[ITG] wrote:

> I have no problem with NCR at all. Specially under FreeBSD, It does not take
> CPU time. Two disks or 14 disks is not the issue for SCSI controllers.
> If you can saturate the SCSI bus with two disks (new tech can), then, putting
> 100 disks (assume ID is allowed), would not make any difference at all.

  There is a difference.  Each SCSI channel has some transactional
limitation.

...
> Does some one have tested any ultra-wide SCSI controllers to have at least
> more than 20 MB throughput over a single controller with number of ultra-wide
> disks? I posted such question a few month ago, and did not hear any respond.
> I was wondering no one had it worked at that time.

  Not a problem.  I used 11 disks on a 3940UW, and was able to max out
both channels.  With drives being able to sustain 7MB writing, this is
getting easier to do.

> -Jin

Tom
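
As a rough check on the numbers in this thread, the peak rate of a SCSI
bus is the transfer clock times the bus width.  The Python sketch below
does that arithmetic.  It ignores protocol overhead and seek behaviour, so
it is an upper bound rather than a measurement, and the function names are
made up for illustration.

```python
import math


def bus_peak_mb_s(clock_mhz, width_bytes):
    """Peak bus rate in MB/s: transfers per microsecond times bytes each."""
    return clock_mhz * width_bytes


def disks_to_saturate(peak_mb_s, per_disk_mb_s):
    """Smallest number of disks whose combined rate reaches the bus peak."""
    return math.ceil(peak_mb_s / per_disk_mb_s)


if __name__ == "__main__":
    wide = bus_peak_mb_s(10, 2)        # fast-wide: 10MHz, 16-bit bus
    ultra_wide = bus_peak_mb_s(20, 2)  # ultra-wide: 20MHz, 16-bit bus
    for name, peak in (("wide", wide), ("ultra-wide", ultra_wide)):
        print(name, peak, "MB/s peak,",
              disks_to_saturate(peak, 7), "disks at 7MB/s each")
```

With drives sustaining about 7MB/s each, a handful of disks is enough to
approach the peak of one ultra-wide channel in sequential transfers.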


From owner-freebsd-scsi  Sat Jul 12 21:03:42 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id VAA03937
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 21:03:42 -0700 (PDT)
Received: from george.lbl.gov (george.lbl.gov [128.3.196.93])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id VAA03928
          for <freebsd-scsi@FreeBSD.ORG>; Sat, 12 Jul 1997 21:03:36 -0700 (PDT)
Received: (jin@localhost) by george.lbl.gov (8.6.10/8.6.5) id VAA24519; Sat, 12 Jul 1997 21:03:29 -0700
Date: Sat, 12 Jul 1997 21:03:29 -0700
From: "Jin Guojun[ITG]" <jin@george.lbl.gov>
Message-Id: <199707130403.VAA24519@george.lbl.gov>
To: tom@sdf.com
Subject: Re: NCR SCSI controllers
Cc: asami@cs.berkeley.edu, crb@Glue.umd.edu, freebsd-scsi@FreeBSD.ORG,
        gary@tbe.net
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

}> Does some one have tested any ultra-wide SCSI controllers to have at least
}> more than 20 MB throughput over a single controller with number of ultra-wide
}> disks? I posted such question a few month ago, and did not hear any respond.
}> I was wondering no one had it worked at that time.
}
}  Not a problem.  I used 11 disks on a 3940UW, and was able to max out
}both channels.  With drives being able to sustain 7MB writing, this is
}getting easier to do.

When you say the drives sustain 7MB writing, do you mean a single disk?  I
guess so, because I can get 15MB writing over three disks on a single NCR
SCSI channel (just wide, not ultra-wide).  So a 3940UW is supposed to manage
30MB writing and 35MB reading.  The same should be seen on the NCR-875 when
the driver is ready (S.E.).  Otherwise, the 7MB writing rate does not sound
right.

-Jin


From owner-freebsd-scsi  Sat Jul 12 22:11:24 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id WAA05883
          for freebsd-scsi-outgoing; Sat, 12 Jul 1997 22:11:24 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id WAA05873
          for <freebsd-SCSI@FreeBSD.ORG>; Sat, 12 Jul 1997 22:11:09 -0700 (PDT)
Received: (qmail 28877 invoked by uid 1000); 13 Jul 1997 05:11:04 -0000
Message-ID: <XFMail.970712221104.Shimon@i-Connect.Net>
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <199707130138.SAA14919@ns2.yahoo.com>
Date: Sat, 12 Jul 1997 22:11:04 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro <Shimon@i-Connect.Net>
To: filo@yahoo.com
Subject: Re: problems with reboot
Cc: freebsd-SCSI@FreeBSD.ORG, dg@root.com
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


Hi David Filo;  On 13-Jul-97 you wrote: 

...

> umount(2) does wait correctly.  The problem in this case was that the
> DPT driver was timing out the "ALLOW MEDIA REMOVAL" command sent to
> the controller before it had a chance to finish flushing its cache.
> The problem went away when I removed "options DPT_HANDLE_TIMEOUTS"
> from the kernel config.  The result of this was that the "ALLOW MEDIA
> REMOVAL" command was allowed to complete, umount waited around, and
> everything shutdown cleanly.

Ah...  Work from an incomplete dataset and you are assured bad results...
This is probably why it ``does not happen here'' (hate that expression).

> If this explanation is correct, the DPT driver should be changed to
> not timeout the "ALLOW MEDIA REMOVAL" when the DPT_HANDLE_TIMEOUTS
> option is being used.

What should be done is to leave DPT_HANDLE_TIMEOUTS disabled by default.
The DPT firmware knows how to time out commands better than you and me.
This is what we pay for :-)  The DPT_HANDLE_TIMEOUTS option is there
only to allow broken hardware to install, so that testing can be
conducted.
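
A toy model of the race described above, as a sketch only: the function name
and the timing values are hypothetical and do not come from the driver.

```python
def allow_media_removal_outcome(flush_seconds, host_timeout_seconds=None):
    """Model the host driver racing the controller's cache flush.

    host_timeout_seconds=None stands for a kernel built without
    DPT_HANDLE_TIMEOUTS: the host never times the command out itself.
    """
    if host_timeout_seconds is not None and flush_seconds > host_timeout_seconds:
        # The host gives up before the controller has flushed its cache.
        return "timed out by host"
    # The command runs to completion and umount waits for it.
    return "completed"

# Hypothetical numbers: a 30 second flush against a 10 second host timeout.
print(allow_media_removal_outcome(30, host_timeout_seconds=10))
print(allow_media_removal_outcome(30))
```

With the host timeout out of the picture, the only timeout left is whatever
the firmware itself applies.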

I had a report from a user who loaded the card to the max and pressed the
reset button, only to find corrupt filesystems upon reboot.  You simply
CANNOT do that with a standard DPT configuration.  We are building a
non-stop FreeBSD based transaction processor here.  To accomplish this
level of reliability, you need to: prevent the DPT from resetting
when the CPU resets, set up all the caches as write-through (including
those on the disk drives), and ensure N+1 power to the CPU.

In a stand-alone PC environment, you will get a very high degree of 
reliability if you simply have a decent UPS protecting the AC to your
computer and stay away from the reset button.

Simon