From owner-freebsd-scsi Sun Jul 6 00:51:40 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id AAA13090 for freebsd-scsi-outgoing; Sun, 6 Jul 1997 00:51:40 -0700 (PDT) Received: from sax.sax.de (sax.sax.de [193.175.26.33]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id AAA13085 for ; Sun, 6 Jul 1997 00:51:37 -0700 (PDT) Received: (from uucp@localhost) by sax.sax.de (8.6.12/8.6.12-s1) with UUCP id JAA20078; Sun, 6 Jul 1997 09:51:29 +0200 Received: (from j@localhost) by uriah.heep.sax.de (8.8.5/8.8.5) id JAA15819; Sun, 6 Jul 1997 09:30:53 +0200 (MET DST) Message-ID: <19970706093053.ZG59677@uriah.heep.sax.de> Date: Sun, 6 Jul 1997 09:30:53 +0200 From: j@uriah.heep.sax.de (J Wunsch) To: scsi@FreeBSD.ORG Cc: kmitch@weenix.guru.org (Keith Mitchell) Subject: Re: Archive Viper and 3940UW (bad Drive?) References: <199707052152.PAA26449@pluto.plutotech.com> <199707060106.VAA12128@weenix.guru.org> X-Mailer: Mutt 0.60_p2-3,5,8-9 Mime-Version: 1.0 X-Phone: +49-351-2012 669 X-PGP-Fingerprint: DC 47 E6 E4 FF A6 E9 8F 93 21 E0 7D F9 12 D6 4E Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch) Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk As Keith Mitchell wrote: > OK, that is what I thought originally, but then given that erasing them > "seemingly" solved the timeout problem I don't know what to think. What > does erasing a tape actually do? It doesn't take but a few seconds. This sounds wrong. On my drive, erasing makes an entire pass over the medium, which is also what I'd expect. (Tandberg TDC4222, arbitrary QIC-150 cartridge.) I know erasing a tape takes forever on a DAT medium. QICs are faster here, since the erase head is really a quarter-inch head that erases all the parallel tracks at once. QIC tapes are normally erased before track 1 is written (i.e., while writing from the very beginning).
-- cheers, J"org joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-) From owner-freebsd-scsi Sun Jul 6 00:51:58 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id AAA13136 for freebsd-scsi-outgoing; Sun, 6 Jul 1997 00:51:58 -0700 (PDT) Received: from sax.sax.de (sax.sax.de [193.175.26.33]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id AAA13126 for ; Sun, 6 Jul 1997 00:51:53 -0700 (PDT) Received: (from uucp@localhost) by sax.sax.de (8.6.12/8.6.12-s1) with UUCP id JAA20080; Sun, 6 Jul 1997 09:51:46 +0200 Received: (from j@localhost) by uriah.heep.sax.de (8.8.5/8.8.5) id JAA15872; Sun, 6 Jul 1997 09:45:31 +0200 (MET DST) Message-ID: <19970706094530.PU64236@uriah.heep.sax.de> Date: Sun, 6 Jul 1997 09:45:30 +0200 From: j@uriah.heep.sax.de (J Wunsch) To: freebsd-scsi@FreeBSD.ORG Cc: Janick.Taillandier@ratp.fr (Janick Taillandier) Subject: Re: Problem with worm in current References: <19970706081144.27908@fugue.noisy.ratp> X-Mailer: Mutt 0.60_p2-3,5,8-9 Mime-Version: 1.0 X-Phone: +49-351-2012 669 X-PGP-Fingerprint: DC 47 E6 E4 FF A6 E9 8F 93 21 E0 7D F9 12 D6 4E Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch) In-Reply-To: <19970706081144.27908@fugue.noisy.ratp>; from Janick Taillandier on Jul 6, 1997 08:11:44 +0200 Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk (Moved to freebsd-scsi.) As Janick Taillandier wrote: > But when I am trying to burn a CD I get these messages : > > |Jul 5 23:02:25 chaconne /kernel: worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC Hmm, I don't have the Philips/HP manual handy; somebody with the manual might be able to decode it. When does it happen? > What is the status of this problem ? Do I need to return to > 2.2.2 ? Going back to 2.2.2 very likely won't help you at all.
-- cheers, J"org joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-) From owner-freebsd-scsi Sun Jul 6 01:26:19 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id BAA14528 for freebsd-scsi-outgoing; Sun, 6 Jul 1997 01:26:19 -0700 (PDT) Received: from soleil.uvsq.fr (soleil.uvsq.fr [193.51.24.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id BAA14523 for ; Sun, 6 Jul 1997 01:26:16 -0700 (PDT) Received: from arts.ratp.fr (arts.ratp.fr [193.106.40.1]) by soleil.uvsq.fr (8.8.6/jtpda-5.2) with ESMTP id KAA09435 for ; Sun, 6 Jul 1997 10:26:14 +0200 (METDST) Received: by arts.ratp.fr id KAA02899 for ; Sun, 6 Jul 1997 10:26:11 +0200 (DST) Received: from minos.noisy.ratp by arts.ratp.fr with SMTP id SAA002897 for ; Sun Jul 6 10:25:43 1997 Received: from fugue.noisy.ratp (taillandier.rtc.ratp [192.25.83.123]) by minos.noisy.ratp with ESMTP id KAA03590 for ; Sun, 6 Jul 1997 10:25:42 +0200 (DST) Received: by fugue.noisy.ratp id KAA00656 ; Sun, 6 Jul 1997 10:24:42 +0200 (DST) From: Janick.Taillandier@ratp.fr (Janick Taillandier) Message-ID: <19970706102441.37074@fugue.noisy.ratp> Date: Sun, 6 Jul 1997 10:24:41 +0200 To: freebsd-scsi@freebsd.org Subject: Re: Problem with worm in current References: <19970706081144.27908@fugue.noisy.ratp> <19970706094530.PU64236@uriah.heep.sax.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.76e In-Reply-To: <19970706094530.PU64236@uriah.heep.sax.de>; from J Wunsch on Sun, Jul 06, 1997 at 09:45:30AM +0200 Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Sun, Jul 06, 1997 at 09:45:30AM +0200, J Wunsch wrote: > > But when I am trying to burn a CD I get these messages : > > > > |Jul 5 23:02:25 chaconne /kernel: worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC > > Hmm, i don't have the Philips/HP manual handy. Somebody with the > manual might decode it. 
When does it happen? When I try to write to the disk after initializing it, with for example: rtprio 5 team -v 1m 5 < /mnt/jt/track01.pcm | rtprio 5 dd of=/dev/rworm0 obs=20k I get: worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC worm0: ILLEGAL REQUEST asc:2c,0 Command sequence error Janick Taillandier From owner-freebsd-scsi Sun Jul 6 16:37:12 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id QAA11446 for freebsd-scsi-outgoing; Sun, 6 Jul 1997 16:37:12 -0700 (PDT) Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id QAA11418 for ; Sun, 6 Jul 1997 16:36:40 -0700 (PDT) Received: (qmail 3287 invoked by uid 1000); 6 Jul 1997 23:36:19 -0000 Message-ID: X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD Content-Type: text/plain; charset=iso-8859-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 Date: Sun, 06 Jul 1997 16:36:18 -0700 (PDT) Organization: Atlas Telecom From: Simon Shapiro To: freebsd-SCSI@freebsd.org Subject: New Release - DPT RAID Controllers Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi Y'all SCSIers, ftp.i-connect.net and/or sendero-ppp.i-connect.net now have version 1.1.6 of the FreeBSD driver for the DPT PCI SCSI RAID controllers. These are in the /pub/crash and /crash directories. I will also upload a copy of the patch to freefall. It is against RELENG_2_2 as of today. New in this release: * Several new config options. See sys/i386/conf/LINT for details. See more below on HANDLE_TIMEOUTS. * Got rid of an annoying bug that caused biodone panics. * SCSI software interrupts are now tested under heavy load (512 processes) and seem to be very healthy. Patch 1.1.6 includes all these changes. On SCSI Software Interrupts: A few of you asked what we have done here, so here goes: We simply mirrored the net ISR code and put it one notch below the network priority.
To use, do the following: #include Pick an interrupt. I normally use SCSISR_DPT (bit 0). There are 32 bits in the mask. For every interrupt, use a separate interrupt bit. For some strange reason, the netisr code does not permit more than one interrupt per source file. As Justin Gibbs pointed out to me, you really do not need more than that. Next, write a routine that will execute every time that particular interrupt happens. Say you call it foo_isr: static void foo_isr(void); is a good declaration, and the function should be written to match. For this example, let us assume you want it to be associated with bit 7 of the SCSI software interrupt mask. Remember: when the function is called, it will be at a very high priority (it appears to be higher than splbio()). We really do not know why yet, but it is under investigation. In any case, minimize your critical section. See dpt_scsi.c for details. Early in your code, put the following: SCSI_SET(SCSISR_7, foo_isr); Then, at any point in your code where you want foo_isr to execute ASYNCHRONOUSLY with your code, call: schedscsisr(SCSISR_7); Once the kernel goes back to splzero, any request thus scheduled will be called, at high priority! What is it good for? Just like in the networking code, it allows you to (essentially) start another thread of execution in the kernel. For example, the normal SCSI HBA driver receives a request for I/O, tinkers with it a bit (S/G, etc.) and then sends it to the hardware. This last action involves I/O bus operations and a moderate amount of polling. Instead, the DPT driver (almost) always puts the request in a queue and immediately tells the SCSI system ``queued successfully''. It then schedules a software interrupt. The interrupt routine runs whenever it runs and processes the queue. This means I/O requests never block on (or are paced by) the hardware. Under moderate I/O loads, it is a waste of time. Under heavy loads, it really makes a difference. What difference?
With 512 processes concurrently reading and writing raw devices, the load average goes down from 280 to 0.03 (it went down to 20 with NET software interrupts). Yes, the system is still heavily loaded; disk I/O can take as long as 13 seconds to complete. But networking code, user code, etc. remain unhampered. Actually, even asynch I/O (buffered) improves dramatically. The maximum wait goes down to 85us waiting for the controller and 30us past the interrupt service. On that test load, the best interrupt latency is 3us and the worst 37us. This is within 10us of an idle system. BTW, these numbers are with a queue of 64 commands on the DPT hardware. Future releases of the firmware will increase that to 256, 1024, and 8192. On DPT_HANDLE_TIMEOUTS: Normally, the DPT driver has no timeout mechanism in it, nor does it need one; the firmware on the controller does all the I/O management, retries, ECC, and other good stuff. With this option, commands will time out after a while. The timeout mechanism works as follows: Once booted, every ten seconds, dpt_handle_timeouts() will be called. This function scans all submitted commands (sent to the DPT and not done yet). If a SCSI command is older than the timeout the SCSI upper layer asked for (multiplied by the current number of requests on the controller), it is tagged. Tagged commands are given that much time again to complete. If they still do not, they are destroyed, and the upper layer is notified of the failure. This shows up (in functions that examine read/write syscall results :-) as an I/O error to the program. Nothing more. If a command completes during this grace period, it will be handled as if nothing happened (except for a console message). If the command completes after destruction, the results are tossed away. We carefully simulated all these conditions, and it all appears to work. Why bother? Well, try putting a DPT behind certain PCI bridges.
What happens then is that, on occasion, an interrupt will reach the DPT interrupt service routine before the DMA transfer of the data has stabilized across the bridge (the DPT always does a DMA of a status struct followed by an interrupt). The driver reacts to this nonsense by promptly tossing the whole completion report (we have NO way of telling what the corrupt mailbox struct should have been). While we so smartly tossed away the corrupt message, the DPT has no way of sending it again (4us behind it will be another DMA and another interrupt), and the upper layer is still waiting for an event that will never happen. The timeout hack lets the application be told about the failure and releases all the associated resources, preventing a hang. This is it for now. Your feedback is very welcome. Simon From owner-freebsd-scsi Mon Jul 7 07:49:21 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id HAA16817 for freebsd-scsi-outgoing; Mon, 7 Jul 1997 07:49:21 -0700 (PDT) Received: from cabri.obs-besancon.fr (cabri.obs-besancon.fr [193.52.184.3]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id HAA16762 for ; Mon, 7 Jul 1997 07:48:38 -0700 (PDT) Received: by cabri.obs-besancon.fr (5.57/Ultrix3.0-C) id AA05195; Mon, 7 Jul 97 16:49:15 +0100 Date: Mon, 7 Jul 97 16:49:15 +0100 Message-Id: <9707071549.AA05195@cabri.obs-besancon.fr> From: Jean-Marc Zucconi To: Janick.Taillandier@ratp.fr Cc: freebsd-scsi@freebsd.org In-Reply-To: <19970706102441.37074@fugue.noisy.ratp> (Janick.Taillandier@ratp.fr) Subject: Re: Problem with worm in current X-Mailer: Emacs Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >>>>> Janick Taillandier writes: > On Sun, Jul 06, 1997 at 09:45:30AM +0200, J Wunsch wrote: >> > But when I am trying to burn a CD I get these messages : >> > >> > |Jul 5 23:02:25 chaconne /kernel: worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC >> >> Hmm, i don't have the Philips/HP manual handy.
Somebody with the >> manual might decode it. When does it happen? > When I am trying to write to the disk, after initializing it, with > (for example) : > rtprio 5 team -v 1m 5 < /mnt/jt/track01.pcm | rtprio 5 dd of=/dev/rworm0 obs=20k > I get : > worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC This means "Command Now Not Valid". Can you turn the debugging on so that we can see what exactly happens? Do you try to write a data or an audio track? Jean-Marc _____________________________________________________________________________ Jean-Marc Zucconi Observatoire de Besancon F 25010 Besancon cedex PGP Key: finger jmz@cabri.obs-besancon.fr ============================================================================= From owner-freebsd-scsi Mon Jul 7 11:44:32 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id LAA00384 for freebsd-scsi-outgoing; Mon, 7 Jul 1997 11:44:32 -0700 (PDT) Received: from soleil.uvsq.fr (soleil.uvsq.fr [193.51.24.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id LAA00379 for ; Mon, 7 Jul 1997 11:44:29 -0700 (PDT) Received: from arts.ratp.fr (arts.ratp.fr [193.106.40.1]) by soleil.uvsq.fr (8.8.6/jtpda-5.2) with ESMTP id UAA06496 for ; Mon, 7 Jul 1997 20:44:22 +0200 (METDST) Received: by arts.ratp.fr id UAA13387 for ; Mon, 7 Jul 1997 20:44:18 +0200 (DST) Received: from minos.noisy.ratp by arts.ratp.fr with SMTP id SAA013385 for ; Mon Jul 7 20:43:58 1997 Received: from fugue.noisy.ratp (taillandier.rtc.ratp [192.25.83.123]) by minos.noisy.ratp with ESMTP id UAA21194 for ; Mon, 7 Jul 1997 20:43:57 +0200 (DST) Received: by fugue.noisy.ratp id UAA00437 ; Mon, 7 Jul 1997 20:42:48 +0200 (DST) From: Janick.Taillandier@ratp.fr (Janick Taillandier) Message-ID: <19970707204247.46321@fugue.noisy.ratp> Date: Mon, 7 Jul 1997 20:42:47 +0200 To: freebsd-scsi@freebsd.org Subject: Re: Problem with worm in current References: <19970706102441.37074@fugue.noisy.ratp> <9707071549.AA05195@cabri.obs-besancon.fr> Mime-Version: 1.0 
Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.76e In-Reply-To: <9707071549.AA05195@cabri.obs-besancon.fr>; from Jean-Marc Zucconi on Mon, Jul 07, 1997 at 04:49:15PM +0100 Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Mon, Jul 07, 1997 at 04:49:15PM +0100, Jean-Marc Zucconi wrote: > > > worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC > > This means "Command Now Not Valid". Can you turn the debugging on so > that we can see what exactly happens? Sure. I will mail you the results. > Do you try to write a data or an audio track? It was an audio track. Janick Taillandier From owner-freebsd-scsi Mon Jul 7 17:34:10 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA17670 for freebsd-scsi-outgoing; Mon, 7 Jul 1997 17:34:10 -0700 (PDT) Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA17665 for ; Mon, 7 Jul 1997 17:34:07 -0700 (PDT) Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id RAA22214; Mon, 7 Jul 1997 17:33:52 -0700 (PDT) Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA09389; Mon, 7 Jul 1997 17:33:51 -0700 (PDT) Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA09068; Mon, 7 Jul 1997 17:33:45 -0700 (PDT) From: Don Lewis Message-Id: <199707080033.RAA09068@salsa.gv.tsc.tdk.com> Date: Mon, 7 Jul 1997 17:33:45 -0700 In-Reply-To: Keith Mitchell "Re: Archive Viper and 3940UW (bad Drive?)" (Jul 5, 9:06pm) X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95) To: Keith Mitchell , gibbs@plutotech.com (Justin T. Gibbs) Subject: Re: Archive Viper and 3940UW (bad Drive?)
Cc: scsi@FreeBSD.ORG Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Jul 5, 9:06pm, Keith Mitchell wrote: } Subject: Re: Archive Viper and 3940UW (bad Drive?) } > You shouldn't have to erase used tapes before using them either. } } OK, that is what I thought originally, but then given that erasing them } "seemingly" solved the timeout problem I don't know what to think. What } does erasing a tape actually do? It doesn't take but a few seconds. My } guess is it basically erases the header info that says there is data there, } but I really don't know. } } > If the tape drive needs to look at the media before it can respond, then } > the .5s timeouts are way too short. I've got some type of QIC-150 drive on a Sun, and when it first reads a newly inserted tape it seems to need several attempts to figure out the tape format, align its head, or whatever. It definitely grinds and groans for quite a while. This could be a problem when used with Amanda, since Amanda always wants to read the tape to check the label before it overwrites the tape. I suspect that erasing the tape might speed things up, since the drive may be able to detect quickly that the tape is blank. When the tape is used the next time, it may still respond faster, since either the data format on the tape may match the drive's expected "probe" order or the head alignment may be better matched to the tape. I'm surprised that you see the erase operation take only a few seconds. It's been my experience that these drives make one full pass through the tape with the erase head turned on, which erases all the serpentine tracks in parallel. FYI, the SunOS st driver defaults to a 2-minute I/O timeout and a 60-minute space timeout. I had to increase the I/O timeout to 10 minutes in order to reliably use an HP1553 DAT drive that occasionally decides to do a head scrub if its error rate starts getting too high.
--- Truck From owner-freebsd-scsi Mon Jul 7 19:47:02 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id TAA23299 for freebsd-scsi-outgoing; Mon, 7 Jul 1997 19:47:02 -0700 (PDT) Received: from mail.ican.net (mail.ican.net [204.92.49.5]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA23287 for ; Mon, 7 Jul 1997 19:46:53 -0700 (PDT) Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7]) by mail.ican.net (8.8.6/8.8.6) with ESMTP id WAA24476; Mon, 7 Jul 1997 22:46:32 -0400 (EDT) Received: (from josh@localhost) by oddjob.ican.net (8.8.6/8.8.6) id WAA11504; Mon, 7 Jul 1997 22:46:47 -0400 (EDT) Message-ID: <19970707224647.13985@ican.net> Date: Mon, 7 Jul 1997 22:46:47 -0400 From: Josh Tiefenbach To: Simon Shapiro Cc: scsi@freebsd.org Subject: Prob w/DPT driver v1.1.6 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.74 Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Booting up a RELENG_2_2 box with the latest DPT driver: dpt0: BAD (0) CCB in SP (status = 1011 0000). from config file: options DPT_USE_SINTR=1 options DPT_TRACK_CCB_USAGE options DPT_MEASURE_PERFORMANCE options DPT_HANDLE_TIMEOUTS Error also occurs with the last 3 options turned off. Suggestions? 
josh -- Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net From owner-freebsd-scsi Mon Jul 7 20:07:10 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id UAA23826 for freebsd-scsi-outgoing; Mon, 7 Jul 1997 20:07:10 -0700 (PDT) Received: from mail.ican.net (mail.ican.net [204.92.49.5]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA23821 for ; Mon, 7 Jul 1997 20:07:00 -0700 (PDT) Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7]) by mail.ican.net (8.8.6/8.8.6) with ESMTP id XAA29575; Mon, 7 Jul 1997 23:06:31 -0400 (EDT) Received: (from josh@localhost) by oddjob.ican.net (8.8.6/8.8.6) id XAA19391; Mon, 7 Jul 1997 23:06:47 -0400 (EDT) Message-ID: <19970707230647.52460@ican.net> Date: Mon, 7 Jul 1997 23:06:47 -0400 From: Josh Tiefenbach To: Simon Shapiro Cc: scsi@freebsd.org Subject: More on the DPT hangs/errors Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.74 Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Excerpts from console log: dpt0: BAD (0) CCB in SP (status = 1110 0000 ). dpt0: Marking 4048 (Read (10) [6.1.5]) on c0b0t1l0 as late after 11763232usec dpt0: Destroying stale 4048 (Read (10) [6.1.5]) on c0b0t1l0 (21763232) dpt0: BAD (0) CCB in SP (status = 0000 0000 ). dpt0: Marking 10097 (Write (10) [6.1.18]) on c0b0t2l0 as late after 17962817usec dpt0: Destroying stale 10097 (Write (10) [6.1.18]) on c0b0t2l0 (27962814) Note: the first occurrence was during massive writes to non-RAIDed disks, the second during a newfs of the RAIDed disks. In both occurrences, things `hung' at the time corresponding to the `BAD CCB', and `unhung' at the time corresponding to the `Destroying stale...' message.
josh -- Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net From owner-freebsd-scsi Mon Jul 7 23:29:29 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id XAA00707 for freebsd-scsi-outgoing; Mon, 7 Jul 1997 23:29:29 -0700 (PDT) Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id XAA00687 for ; Mon, 7 Jul 1997 23:29:23 -0700 (PDT) Received: (qmail 13741 invoked by uid 1000); 8 Jul 1997 06:29:34 -0000 Message-ID: X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD Content-Type: text/plain; charset=iso-8859-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <19970707224647.13985@ican.net> Date: Mon, 07 Jul 1997 23:29:34 -0700 (PDT) Organization: Atlas Telecom From: Simon Shapiro To: Josh Tiefenbach Subject: RE: Prob w/DPT driver v1.1.6 Cc: scsi@freebsd.org Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi Josh Tiefenbach; On 08-Jul-97 you wrote: > Booting up a RELENG_2_2 box with the latest DPT driver: Does it work at all, or does it die after exactly this one message? > dpt0: BAD (0) CCB in SP (status = 1011 0000). This means that the DPT interrupted after a command completion, but the results are far from perfect: the completed command has an invalid address and the status is totally messed up. This normally indicates hardware problems. Can you please tell me the board model, S/N, and rev level, along with the firmware version (the boot prompt reports that)? Also, I need the manufacturer and model of the motherboard this thing goes into. How many PCI slots does it have? Is the DPT in a ``secondary (behind a bridge)'' or a primary PCI slot? I will contact DPT development with this data and try to resolve it.
> from config file: > > options DPT_USE_SINTR=1 > options DPT_TRACK_CCB_USAGE > options DPT_MEASURE_PERFORMANCE > options DPT_HANDLE_TIMEOUTS If you are on 1.1.6, you must also have: options DPT_SINTR_SPLHIGH Otherwise, the above will happen (at the very least). > Error also occurs with the last 3 options turned off. Do not turn off any of the above for now. The savings (if you do) are on the order of 2-7 microseconds per command, while the DPT serves a cache hit in 250-270us and a typical SCSI command that goes to disk takes 5-25ms. These figures are for an idle system with a single command issued. > Suggestions? See above. Keep me informed and thank you. Simon From owner-freebsd-scsi Mon Jul 7 23:29:31 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id XAA00722 for freebsd-scsi-outgoing; Mon, 7 Jul 1997 23:29:31 -0700 (PDT) Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id XAA00686 for ; Mon, 7 Jul 1997 23:29:23 -0700 (PDT) Received: (qmail 13745 invoked by uid 1000); 8 Jul 1997 06:29:34 -0000 Message-ID: X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD Content-Type: text/plain; charset=iso-8859-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <19970707230647.52460@ican.net> Date: Mon, 07 Jul 1997 23:29:34 -0700 (PDT) Organization: Atlas Telecom From: Simon Shapiro To: Josh Tiefenbach Subject: RE: More on the DPT hangs/errors Cc: scsi@freebsd.org Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi Josh Tiefenbach; On 08-Jul-97 you wrote: > > Excerpts from console log: > > dpt0: BAD (0) CCB in SP (status = 1110 0000 ). > dpt0: Marking 4048 (Read (10) [6.1.5]) on c0b0t1l0 as late after > 11763232usec > dpt0: Destroying stale 4048 (Read (10) [6.1.5]) on c0b0t1l0 (21763232) > dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
> dpt0: Marking 10097 (Write (10) [6.1.18]) on c0b0t2l0 as late after > 17962817usec > dpt0: Destroying stale 10097 (Write (10) [6.1.18]) on c0b0t2l0 (27962814) This is exactly how I wanted the timeouts to behave: wait as long as sd.c wants, multiplied by a ``busyness factor''. If the command is still there after twice as long, destroy it and tell sd.c ``sorry''. If the command somehow completes before destruction, it will be salvaged. If it arrives after destruction, the log will tell you that too. In your case the command actually completed (with a bad status), so it will never complete again. > Note: first occurance during massive writes to non-RAIDED disks, second > occurance during a newfs of the RAIDed disks. Make SURE you have ``options DPT_SINTR_SPLHIGH'' in your kernel. Justin has suggested a better (read correct :-) way of doing it. As soon as his patch arrives here, I will integrate it and get rid of this flag. > In both occurances, things `hung' at the time corresponding to the `BAD > CCB', > and `unhung' at the time corresponding to the `Destroying stale...' > message. Not the whole system, just the program going to disk, I presume (this is what I see here). This is normal: your program issues read or write syscalls. These eventually translate into calls to sd.c. In the case of a raw device (newfs), the syscall actually waits for the I/O to complete. Since the DPT has completed the command but the driver could not make sense of the completion, it ``never'' completes. The timeout mechanism will get tired of this request and abort it. Your application will get an I/O error and all is (almost) well. This is a crude way of describing things, but you get the point. Simon P.S. As you may have gathered, there are some problems with DPT controllers on certain motherboards. This is being worked on.
From owner-freebsd-scsi Tue Jul 8 07:05:13 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id HAA17978 for freebsd-scsi-outgoing; Tue, 8 Jul 1997 07:05:13 -0700 (PDT) Received: from mail.ican.net (mail.ican.net [204.92.49.5]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id HAA17969 for ; Tue, 8 Jul 1997 07:05:05 -0700 (PDT) Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7]) by mail.ican.net (8.8.6/8.8.6) with ESMTP id KAA08827; Tue, 8 Jul 1997 10:04:39 -0400 (EDT) Received: (from josh@localhost) by oddjob.ican.net (8.8.6/8.8.6) id KAA23394; Tue, 8 Jul 1997 10:04:55 -0400 (EDT) Message-ID: <19970708100455.34701@ican.net> Date: Tue, 8 Jul 1997 10:04:55 -0400 From: Josh Tiefenbach To: Simon Shapiro Cc: scsi@freebsd.org Subject: Re: Prob w/DPT driver v1.1.6 References: <19970707224647.13985@ican.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.74 In-Reply-To: ; from Simon Shapiro on Mon, Jul 07, 1997 at 11:29:34PM -0700 Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Mon, Jul 07, 1997 at 11:29:34PM -0700, Simon Shapiro wrote: > Hi Josh Tiefenbach; On 08-Jul-97 you wrote: > > Booting up a RELENG_2_2 box with the latest DPT driver: > > Does it work at all, or does it die after exactly this one message? Referenced in my other message, but yes it works. It just hangs and comes back a lot. > and the status is totally messed up. this normally indicates hardware > problems. Can you please tell me the board model, S/N, rev level, along PM3334-UW. s/n: 66-010378. Firmware: 007L0 (3E7). Not sure of the rev, but dmesg claims a rev of 2 on bootup. There's also a sticker on the back saying ``HA-0851-006-A'' if that helps. > with the firmware (the boot prompt reports that). Also, I need the Mfg, > model of the motherboard this thing goes into. How many PCI slots?
> Is the DPT in a ``secondary (behind a bridge)'' or primary PCI slot? It was in a Compaq Deskpro 6000, Pentium Pro. The box is a production machine (mostly), so it's kinda hard to crack the top and check # of slots right now. The DPT was in the slot the 2940 usually occupies, so I suspect it was a primary slot, but don't hold me to it. > If you are on 1.1.6, you must have: > > options DPT_SINTR_SPLHIGH OK. I'll try that next. I'm in the process of testing the thing in a development box I have lying around, so I'll keep you posted on the results there. > Keep me informed and thank you. You're welcome :) josh -- Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net From owner-freebsd-scsi Tue Jul 8 10:41:57 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id KAA28259 for freebsd-scsi-outgoing; Tue, 8 Jul 1997 10:41:57 -0700 (PDT) Received: from mail.ican.net (mail.ican.net [204.92.49.5]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA28252 for ; Tue, 8 Jul 1997 10:41:51 -0700 (PDT) Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7]) by mail.ican.net (8.8.6/8.8.6) with ESMTP id NAA05309; Tue, 8 Jul 1997 13:41:30 -0400 (EDT) Received: (from josh@localhost) by oddjob.ican.net (8.8.6/8.8.6) id NAA07644; Tue, 8 Jul 1997 13:41:35 -0400 (EDT) Message-ID: <19970708134134.36830@ican.net> Date: Tue, 8 Jul 1997 13:41:34 -0400 From: Josh Tiefenbach To: Simon Shapiro Cc: scsi@freebsd.org Subject: Re: Prob w/DPT driver v1.1.6 (update) References: <19970707224647.13985@ican.net> <19970708100455.34701@ican.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.74 In-Reply-To: <19970708100455.34701@ican.net>; from Josh Tiefenbach on Tue, Jul 08, 1997 at 10:04:55AM -0400 Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Tue, Jul 08, 1997 at 10:04:55AM -0400, I wrote: > > Ok. I'll try that next.
I'm in the process in testing the thing in a > development box I have lying around Update: I've stuck the DPT into a Pentium box (apparently Magitronic brand - VX chipset, P-100, DPT board in PCI slot 1, Buslogic card in PCI slot 2), and recompiled with the DPT_SINTR_SPLHIGH option. I pounded on the RAID (4 disks - Atlas I's upgraded to firmware rev L915, RAID-5 conf) for a while: multiple tars of large file trees, multiple dd's, and an scp of /usr/src from a remote machine. Everything seemed fine until, ~80 minutes into the scp, the machine locked up. Solid. It required a power cycle. Note: this was the same behavior we had observed previously (w/ v1.1.0 of the driver) on our production box (a news feeder): things would trundle along fine for ~an hour and then lock up solid. josh -- Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net From owner-freebsd-scsi Tue Jul 8 11:41:32 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id LAA01272 for freebsd-scsi-outgoing; Tue, 8 Jul 1997 11:41:32 -0700 (PDT) Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id LAA01253 for ; Tue, 8 Jul 1997 11:41:23 -0700 (PDT) Received: (qmail 24102 invoked by uid 1000); 8 Jul 1997 18:41:29 -0000 Message-ID: X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD Content-Type: text/plain; charset=iso-8859-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <19970708134134.36830@ican.net> Date: Tue, 08 Jul 1997 11:41:29 -0700 (PDT) Organization: Atlas Telecom From: Simon Shapiro To: Josh Tiefenbach Subject: Re: Prob w/DPT driver v1.1.6 (update) Cc: scsi@freebsd.org Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi Josh Tiefenbach; On 08-Jul-97 you wrote: > On Tue, Jul 08, 1997 at 10:04:55AM -0400, I wrote: > > > > Ok. I'll try that next.
> > I'm in the process of testing the thing in a
> > development box I have lying around
>
> Update:
>
> I've stuck the DPT into a Pentium box (apparently Magitronic brand - VX
> chipset, P-100, DPT board in PCI slot 1. Buslogic card in PCI slot 2),
> and recompiled with the DPT_SINTR_SPLHIGH option.
>
> I pounded on the RAID (4 disks - Atlas I's upgraded to firmware rev L915,
> RAID-5 conf) for a while - multiple tar's of large file trees, multiple
> dd's, and a scp of /usr/src from a remote machine.
>
> Everything seemed fine, until ~80 minutes into the scp, the machine
> locked. Solid. Required power cycle. Note: This was the same behavior we
> had observed previously (w/ v1.1.0 of the driver) on our production box
> (a news feeder) - things would trundle along fine for ~ an hour, and then
> locked solid.

I just received Justin's fixes. I also introduced a BUG into the system in the software interrupts. Will be working on both today. 1.1.7 should be out in a few short days.

Thanx!

Simon

From owner-freebsd-scsi Tue Jul 8 16:46:09 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id QAA17180 for freebsd-scsi-outgoing; Tue, 8 Jul 1997 16:46:09 -0700 (PDT)
Received: from tok.qiv.com ([204.214.141.211]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA17165 for ; Tue, 8 Jul 1997 16:46:03 -0700 (PDT)
Received: (from uucp@localhost) by tok.qiv.com (8.8.5/8.8.5) with UUCP id SAA27891 for freebsd-scsi@freebsd.org; Tue, 8 Jul 1997 18:45:27 -0500 (CDT)
Received: from localhost (jdn@localhost) by acp.qiv.com (8.8.5/8.8.5) with SMTP id SAA00398 for ; Tue, 8 Jul 1997 18:33:14 -0500 (CDT)
X-Authentication-Warning: acp.qiv.com: jdn owned process doing -bs
Date: Tue, 8 Jul 1997 18:33:13 -0500 (CDT)
From: "Jay D. Nelson"
To: freebsd-scsi@freebsd.org
Subject: Which Viper firmware level?
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've finally located a source for firmware upgrades for Viper 2525 tapes. The Archive products have ended up at TSSI, who will happily sell me the upgrade. Firmware levels available are 25462-01[134], with unusual caveats about OS support (?!).

Does anyone know which I should buy? Otherwise, I'll go with 011 as per the handbook.

Thanks for any insight.
--
Jay

From owner-freebsd-scsi Wed Jul 9 05:13:28 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id FAA11807 for freebsd-scsi-outgoing; Wed, 9 Jul 1997 05:13:28 -0700 (PDT)
Received: from cabri.obs-besancon.fr (cabri.obs-besancon.fr [193.52.184.3]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id FAA11799 for ; Wed, 9 Jul 1997 05:13:20 -0700 (PDT)
Received: by cabri.obs-besancon.fr (5.57/Ultrix3.0-C) id AA22669; Wed, 9 Jul 97 14:13:49 +0100
Date: Wed, 9 Jul 97 14:13:49 +0100
Message-Id: <9707091313.AA22669@cabri.obs-besancon.fr>
From: Jean-Marc Zucconi
To: Janick.Taillandier@ratp.fr
Cc: freebsd-scsi@freebsd.org
In-Reply-To: <19970707204247.46321@fugue.noisy.ratp> (Janick.Taillandier@ratp.fr)
Subject: Re: Problem with worm in current
X-Mailer: Emacs
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>>>>> Janick Taillandier writes:

> On Mon, Jul 07, 1997 at 04:49:15PM +0100, Jean-Marc Zucconi wrote:
>>
>> > worm0: ILLEGAL REQUEST asc:82,0 Vendor Specific ASC
>>
>> This means "Command Now Not Valid". Can you turn the debugging on so
>> that we can see what exactly happens?

> Sure. I will mail you the results.

>> Do you try to write a data or an audio track?

> It was an audio track.

Ok. It seems that the get capacity command returns a block size of 2048 bytes even if it was previously set to another value. Can you try the patch below?
Index: worm.c
===================================================================
RCS file: /home/ncvs/src/sys/scsi/worm.c,v
retrieving revision 1.42
diff -u -r1.42 worm.c
--- worm.c	1997/07/01 00:22:51	1.42
+++ worm.c	1997/07/08 17:49:40
@@ -228,10 +228,11 @@
 {
 	errval ret;
 	struct scsi_data *worm = sc_link->sd;
+	int blk_size;
 
 	SC_DEBUG(sc_link, SDEV_DB2, ("worm_size"));
 
-	worm->n_blks = scsi_read_capacity(sc_link, &worm->blk_size,
+	worm->n_blks = scsi_read_capacity(sc_link, &blk_size,
 	    flags);
 
 	/*

Jean-Marc
_____________________________________________________________________________
Jean-Marc Zucconi
Observatoire de Besancon
F 25010 Besancon cedex
PGP Key: finger jmz@cabri.obs-besancon.fr
=============================================================================

From owner-freebsd-scsi Wed Jul 9 11:37:05 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id LAA29968 for freebsd-scsi-outgoing; Wed, 9 Jul 1997 11:37:05 -0700 (PDT)
Received: from soleil.uvsq.fr (soleil.uvsq.fr [193.51.24.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id LAA29959 for ; Wed, 9 Jul 1997 11:37:01 -0700 (PDT)
Received: from arts.ratp.fr (arts.ratp.fr [193.106.40.1]) by soleil.uvsq.fr (8.8.6/jtpda-5.2) with ESMTP id UAA25631 for ; Wed, 9 Jul 1997 20:36:55 +0200 (METDST)
Received: by arts.ratp.fr id UAA25613 for ; Wed, 9 Jul 1997 20:36:50 +0200 (DST)
Received: from minos.noisy.ratp by arts.ratp.fr with SMTP id SAA025611 for ; Wed Jul 9 20:36:42 1997
Received: from fugue.noisy.ratp (taillandier.rtc.ratp [192.25.83.123]) by minos.noisy.ratp with ESMTP id UAA01058 for ; Wed, 9 Jul 1997 20:36:42 +0200 (DST)
Received: by fugue.noisy.ratp id UAA00503 ; Wed, 9 Jul 1997 20:35:21 +0200 (DST)
From: Janick.Taillandier@ratp.fr (Janick Taillandier)
Message-ID: <19970709203516.04741@fugue.noisy.ratp>
Date: Wed, 9 Jul 1997 20:35:16 +0200
To: freebsd-scsi@freebsd.org
Subject: Re: Problem with worm in current
References: <19970707204247.46321@fugue.noisy.ratp>
    <9707091313.AA22669@cabri.obs-besancon.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.76e
In-Reply-To: <9707091313.AA22669@cabri.obs-besancon.fr>; from Jean-Marc Zucconi on Wed, Jul 09, 1997 at 02:13:49PM +0100
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Wed, Jul 09, 1997 at 02:13:49PM +0100, Jean-Marc Zucconi wrote:
>
> Ok. It seems that the get capacity command returns a block size of
> 2048 bytes even if it was previously set to another value. Can you try
> the patch below?
>
> Index: worm.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/scsi/worm.c,v
> retrieving revision 1.42
> diff -u -r1.42 worm.c
> --- worm.c	1997/07/01 00:22:51	1.42
> +++ worm.c	1997/07/08 17:49:40
> @@ -228,10 +228,11 @@
>  {
>  	errval ret;
>  	struct scsi_data *worm = sc_link->sd;
> +	int blk_size;
>  
>  	SC_DEBUG(sc_link, SDEV_DB2, ("worm_size"));
>  
> -	worm->n_blks = scsi_read_capacity(sc_link, &worm->blk_size,
> +	worm->n_blks = scsi_read_capacity(sc_link, &blk_size,
>  	    flags);
>  
>  	/*
>
> Jean-Marc

Well... same result :

worm0: ILLEGAL REQUEST asc:2c,0 Command sequence error

I am sending you the trace in debug mode.
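Jean-Marc's patch reads the block length reported by READ CAPACITY into a local variable, so it can no longer overwrite `worm->blk_size` after the driver has set a different size (for example the 2352-byte sectors used for CD audio). As Janick reports, that alone did not make the burn succeed, but the idea can be shown in isolation. The following is a minimal, hypothetical sketch. `struct worm_geom`, `worm_update_geom()` and the rule "adopt the reported size only if none was set" are assumptions and do not come from worm.c. The 8-byte READ CAPACITY(10) layout is the standard one: the last LBA, then the block length, both big-endian.

```c
#include <stdint.h>

/* Hypothetical driver state: the field names echo worm.c, but this is
 * not the FreeBSD structure. */
struct worm_geom {
    uint32_t n_blks;   /* number of blocks reported by the drive */
    uint32_t blk_size; /* block size chosen by the driver; 0 = not set yet */
};

/* Decode a 4-byte big-endian field from SCSI response data. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* READ CAPACITY(10) returns 8 bytes: the last logical block address in
 * bytes 0-3 and the block length in bytes in bytes 4-7. */
static void read_capacity_parse(const uint8_t resp[8],
                                uint32_t *n_blks, uint32_t *blk_len)
{
    *n_blks = be32(resp) + 1;   /* last LBA + 1 = block count */
    *blk_len = be32(resp + 4);
}

/* Same idea as the patch: the reported length lands in a local, so it
 * cannot silently replace a size the driver set earlier. */
static void worm_update_geom(struct worm_geom *g, const uint8_t resp[8])
{
    uint32_t blk_len;

    read_capacity_parse(resp, &g->n_blks, &blk_len);
    if (g->blk_size == 0)       /* assumption: adopt it only if unset */
        g->blk_size = blk_len;
}
```

If a drive reports 2048 bytes while the driver has already chosen 2352, `worm_update_geom()` leaves the 2352 in place. This sketch does not address what `n_blks` means when the two sizes differ. That question is outside what the thread covers.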
Janick

From owner-freebsd-scsi Wed Jul 9 17:14:58 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id RAA22110 for freebsd-scsi-outgoing; Wed, 9 Jul 1997 17:14:58 -0700 (PDT)
Received: from krystal.sge.net (firewall-user@krystal.sge.net [152.91.9.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id RAA22096 for ; Wed, 9 Jul 1997 17:14:50 -0700 (PDT)
From: Wayne.Farmer@mailhost.dpie.gov.au
Received: (from uucp@localhost) by krystal.sge.net (8.8.5/8.8.6) id KAA26904 for ; Thu, 10 Jul 1997 10:14:48 +1000 (EST)
Received: from jade.sge.net(10.1.1.254) by krystal.sge.net via smap (3.2) id xma026870; Thu, 10 Jul 97 10:14:26 +1000
Received: from SMTP (pryites.sge.net [10.1.1.246]) by jade.sge.net (8.8.5/8.8.5) with SMTP id KAA13697 for ; Thu, 10 Jul 1997 10:14:25 +1000 (EST)
Received: from zirconia.sge.net ([10.1.1.6]) by 10.1.1.246 (Norton AntiVirus for Internet Email Gateways 1.0) ; Thu, 10 Jul 1997 00:13:25 0000 (GMT)
Received: (from uucp@localhost) by zirconia.sge.net (8.8.5/8.8.5) id KAA04302 for ; Thu, 10 Jul 1997 10:14:23 +1000 (EST)
Received: from ns2.dpie.gov.au(152.91.195.1) by zirconia.sge.net via smap (3.2) id xma004290; Thu, 10 Jul 97 10:14:05 +1000
Received: from talmalmo.dpie.gov.au (talmalmo.dpie.gov.au [152.91.195.222]) by conargo.dpie.gov.au with ESMTP id KAA14339 (8.6.11/IDA-1.6 for ); Thu, 10 Jul 1997 10:14:06 +1000
X-Organisation: Department of Primary Industries and Energy
X-Url: http://www.dpie.gov.au/
X-Notice: Views expressed by this message are not necessarily those of the Department of Primary Industries and Energy or of the Government of the Commonwealth of Australia.
Received: (from x400@localhost) by talmalmo.dpie.gov.au (8.8.3/8.8.3+worldtalk-4.1) id KAA27105 for freebsd-scsi@freebsd.org; Thu, 10 Jul 1997 10:11:56 +1000 (EST)
Received: from TELEMEMO; Thu, 10 Jul 1997 10:11:26 +1000
Date: Thu, 10 Jul 1997 10:11:26 +1000
Subject: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2
To: freebsd-scsi@freebsd.org (Reply Requested)
Message-Id: <"970710001128Z.WT27093. 0*/PN=Wayne.Farmer/OU=CORPHQ/O=DPIE/PRMD=AUSGOVDPIE/ADMD=TELEMEMO/C=AU/"@MHS>
X-Mailer: Worldtalk (4.1)/MIME
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I think I read recently that doing a "newfs" on a wide SCSI disk with an Adaptec 2940UW hangs the system (a deadlock ?)

I can concur with this and confirm that turning off wide negotiation in the SCSI setup seems to correct this.

3 questions :

1) Does anyone have any more info on this
2) Is there updated driver code I can include in the kernel build
3) Having "newfs"-ed, would turning back on wide negotiation lead to more problems similar to the "newfs" problem

Thanks

Wayne

From owner-freebsd-scsi Wed Jul 9 18:00:44 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id SAA26504 for freebsd-scsi-outgoing; Wed, 9 Jul 1997 18:00:44 -0700 (PDT)
Received: from mail.cdsnet.net (mail.cdsnet.net [204.118.244.5]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id SAA26494 for ; Wed, 9 Jul 1997 18:00:39 -0700 (PDT)
Received: from mail.cdsnet.net (mail.cdsnet.net [204.118.244.5]) by mail.cdsnet.net (8.8.5/8.7.3) with SMTP id SAA13941; Wed, 9 Jul 1997 18:00:32 -0700 (PDT)
Date: Wed, 9 Jul 1997 18:00:22 -0700 (PDT)
From: Jaye Mathisen
To: Wayne.Farmer@mailhost.dpie.gov.au
cc: Reply Requested
Subject: Re: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2
In-Reply-To: <"970710001128Z.WT27093.
    0*/PN=Wayne.Farmer/OU=CORPHQ/O=DPIE/PRMD=AUSGOVDPIE/ADMD=TELEMEMO/C=AU/"@MHS>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Works fine for me in 2.2.2 and greater, on literally dozens of disks.

On Thu, 10 Jul 1997 Wayne.Farmer@mailhost.dpie.gov.au wrote:

> I think I read recently that doing a "newfs" on a wide SCSI disk with an
> Adaptec 2940UW hangs the system (a deadlock ?)
>
> I can concur with this and confirm that turning off wide negotiation in the
> SCSI setup seems to correct this.
>
> 3 questions :
>
> 1) Does anyone have any more info on this
> 2) Is there updated driver code I can include in the kernel build
> 3) Having "newfs"-ed, would turning back on wide negotiation lead to more
> problems similar to the "newfs" problem
>
> Thanks
>
> Wayne
>

From owner-freebsd-scsi Wed Jul 9 19:08:24 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id TAA29378 for freebsd-scsi-outgoing; Wed, 9 Jul 1997 19:08:24 -0700 (PDT)
Received: from mail.ican.net (mail.ican.net [204.92.49.5]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA29357 for ; Wed, 9 Jul 1997 19:08:09 -0700 (PDT)
Received: from oddjob.ican.net (oddjob.ican.net [204.92.49.7]) by mail.ican.net (8.8.6/8.8.6) with ESMTP id WAA19711; Wed, 9 Jul 1997 22:07:50 -0400 (EDT)
Received: (from josh@localhost) by oddjob.ican.net (8.8.6/8.8.6) id WAA16393; Wed, 9 Jul 1997 22:07:49 -0400 (EDT)
Message-ID: <19970709220749.25037@ican.net>
Date: Wed, 9 Jul 1997 22:07:49 -0400
From: Josh Tiefenbach
To: Simon Shapiro
Cc: scsi@freebsd.org
Subject: Yet Another DPT Update
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.74
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

More with the updates. We've stuck the DPT back into the production box (Compaq PPro 200).
dmesg:

DPT: PCI SCSI HBA Driver, version 1.1.6
dpt0 rev 2 int a irq 11 on pci0:18
dpt0: DPT type 3, model PM3334UW firmware 07L0, Protocol 0 on port 1410
dpt0: Options: USE_SINTR, TRACK_CCB_STATES, MEASURE_PERFORMANCE, HANDLE_TIMEOUTS, SINTR_SPLHIGH
dpt0 waiting for scsi devices to settle
(dpt0:0:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
sd0(dpt0:0:0): Direct-Access 2050MB (4199759 512 byte sectors)
(dpt0:1:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
sd1(dpt0:1:0): Direct-Access 2050MB (4199759 512 byte sectors)
(dpt0:2:0): "DPT RAID-5 07L0" type 0 fixed SCSI 2
sd2(dpt0:2:0): Direct-Access 8201MB (16796928 512 byte sectors)

The following happened during a newfs of the RAID drive:

dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: Marking 27627 (Write (10) [6.1.18]) on c0b0t2l0 as late after 10042353usec
dpt0: Destroying stale 27627 (Write (10) [6.1.18]) on c0b0t2l0 (20042335)
dpt0: Request 99041 recieved with clear EOC. Marking as LOST.
dpt0: BAD (0) CCB in SP (status = 1110 0000 ).
dpt0: Marking 99948 (Write (10) [6.1.18]) on c0b0t2l0 as late after 18290511usec
dpt0: BAD (0) CCB in SP (status = 1110 0000 ).
dpt0: Destroying stale 99041 (Write (10) [6.1.18]) on c0b0t2l0 (29450684)
dpt0: Destroying stale 99948 (Write (10) [6.1.18]) on c0b0t2l0 (28293024)
dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: Marking 114193 (Write (10) [6.1.18]) on c0b0t2l0 as late after 19983892usec
dpt0: Destroying stale 114193 (Write (10) [6.1.18]) on c0b0t2l0 (29983890)
dpt0: Marking 125669 (Write (10) [6.1.18]) on c0b0t2l0 as late after 15260348usec
dpt0: Destroying stale 125669 (Write (10) [6.1.18]) on c0b0t2l0 (25258082)

And this while running diablo ( a news feeder program, *not* the game :)

dpt0: BAD (0) CCB in SP (status = 1100 0000 ).
dpt0: Marking 128087 (Read (10) [6.1.5]) on c0b0t1l0 as late after 10862820usec
dpt0: Destroying stale 128087 (Read (10) [6.1.5]) on c0b0t1l0 (20862822)
dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
dpt0: Marking 129491 (Write (10) [6.1.18]) on c0b0t1l0 as late after 12998830usec
dpt0: Marking 129583 (Write (10) [6.1.18]) on c0b0t1l0 as late after 11917207usec
dpt0: Destroying stale 129491 (Write (10) [6.1.18]) on c0b0t1l0 (22998832)
dpt0: Destroying stale 129583 (Write (10) [6.1.18]) on c0b0t1l0 (21919370)

Again. I should point out that the above errors *did not happen* when using the card, v1.1.6 of the driver w/same options, in a Pentium-100 box.

Shimon: a) Any other data that you need? b) any ETA on v1.1.7 of the driver?

josh
--
Josh Tiefenbach - Assistant Gopher - ACC TelEnterprises - josh@ican.net

From owner-freebsd-scsi Wed Jul 9 20:14:10 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id UAA02908 for freebsd-scsi-outgoing; Wed, 9 Jul 1997 20:14:10 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id UAA02902 for ; Wed, 9 Jul 1997 20:14:02 -0700 (PDT)
Received: (qmail 23298 invoked by uid 1000); 10 Jul 1997 03:14:12 -0000
Message-ID:
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <19970709220749.25037@ican.net>
Date: Wed, 09 Jul 1997 20:14:12 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro
To: Josh Tiefenbach
Subject: RE: Yet Another DPT Update
Cc: scsi@freebsd.org
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi Josh Tiefenbach;

On 10-Jul-97 you wrote:
> More with the updates. We've stuck the DPT back into the production box
> (Compaq PPro 200).
>
> dmesg:
>
> DPT: PCI SCSI HBA Driver, version 1.1.6
> dpt0 rev 2 int a irq 11 on pci0:18
> dpt0: DPT type 3, model PM3334UW firmware 07L0, Protocol 0 on port 1410
> dpt0: Options: USE_SINTR, TRACK_CCB_STATES, MEASURE_PERFORMANCE,
> HANDLE_TIMEOUTS, SINTR_SPLHIGH
> dpt0 waiting for scsi devices to settle
> (dpt0:0:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
> sd0(dpt0:0:0): Direct-Access 2050MB (4199759 512 byte sectors)
> (dpt0:1:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
> sd1(dpt0:1:0): Direct-Access 2050MB (4199759 512 byte sectors)
> (dpt0:2:0): "DPT RAID-5 07L0" type 0 fixed SCSI 2
> sd2(dpt0:2:0): Direct-Access 8201MB (16796928 512 byte sectors)
>
> The following happened during a newfs of the RAID drive:
>
> dpt0: BAD (0) CCB in SP (status = 0000 0000 ).

This is clearly what we see here on certain systems. In this case BOTH the status register and the CCB are bogus. This is not the data we expect, nor can the DPT generate these. The PCI bus or some hardware along the line is eating it.

> dpt0: Marking 27627 (Write (10) [6.1.18]) on c0b0t2l0 as late after
> 10042353usec

Since we threw away the corrupt CCB (not knowing which one it is), the real command simply times out.

> dpt0: Destroying stale 27627 (Write (10) [6.1.18]) on c0b0t2l0 (20042335)

Now we lost patience with this I/O request. We are going to do it in.

> dpt0: Request 99041 recieved with clear EOC. Marking as LOST.

This one is probably noise on the bus. If the EOC bit is off, it means no command completed. We treat it as a loss, since we know (hope) what the command was but have no confidence in its integrity.

... more of the same ...

> And this while running diablo ( a news feeder program, *not* the game :)

... and yet more ...

> Again. I should point out that the above errors *did not happen* when
> using the card, v1.1.6 of the driver w/same options, in a Pentium-100 box.

Sort of proves the point... :-(

> Shimon: a) Any other data that you need? b) any ETA on v1.1.7 of the
> driver?
I forwarded your message to my DPT contact. The certification people there want specific hardware setups. I think the FreeBSD driver may be a bit faster than usual and that is why this problem is not so visible on other platforms.

We can always put some delays in dpt_intr() and see if things improve. You can add ``DELAY(xx);'' somewhere at the very top, and see if it makes any difference. Let me know if that helps.

Version 1.1.7 is a merge of Justin's code review. It makes the code cleaner, somewhat leaner and (hopefully) much more acceptable. I also reversed the (reversed) priorities for the SCSI software interrupts, putting them in line with bio, rather than net. I will release 1.1.7 either tonight or tomorrow.

Simon

From owner-freebsd-scsi Wed Jul 9 21:05:48 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id VAA05062 for freebsd-scsi-outgoing; Wed, 9 Jul 1997 21:05:48 -0700 (PDT)
Received: from krystal.sge.net (firewall-user@krystal.sge.net [152.91.9.1]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id VAA05054 for ; Wed, 9 Jul 1997 21:05:42 -0700 (PDT)
From: Wayne.Farmer@mailhost.dpie.gov.au
Received: (from uucp@localhost) by krystal.sge.net (8.8.5/8.8.6) id OAA18873 for ; Thu, 10 Jul 1997 14:05:39 +1000 (EST)
Received: from jade.sge.net(10.1.1.254) by krystal.sge.net via smap (3.2) id xma018852; Thu, 10 Jul 97 14:05:35 +1000
Received: from SMTP (pryites.sge.net [10.1.1.246]) by jade.sge.net (8.8.5/8.8.5) with SMTP id OAA25831 for ; Thu, 10 Jul 1997 14:05:34 +1000 (EST)
Received: from zirconia.sge.net ([10.1.1.6]) by 10.1.1.246 (Norton AntiVirus for Internet Email Gateways 1.0) ; Thu, 10 Jul 1997 04:04:34 0000 (GMT)
Received: (from uucp@localhost) by zirconia.sge.net (8.8.5/8.8.5) id OAA01488 for ; Thu, 10 Jul 1997 14:05:33 +1000 (EST)
Received: from ns2.dpie.gov.au(152.91.195.1) by zirconia.sge.net via smap (3.2) id xma001409; Thu, 10 Jul 97 14:05:07 +1000
Received: from talmalmo.dpie.gov.au (talmalmo.dpie.gov.au
    [152.91.195.222]) by conargo.dpie.gov.au with ESMTP id OAA21716 (8.6.11/IDA-1.6 for ); Thu, 10 Jul 1997 14:05:08 +1000
X-Organisation: Department of Primary Industries and Energy
X-Url: http://www.dpie.gov.au/
X-Notice: Views expressed by this message are not necessarily those of the Department of Primary Industries and Energy or of the Government of the Commonwealth of Australia.
Received: (from x400@localhost) by talmalmo.dpie.gov.au (8.8.3/8.8.3+worldtalk-4.1) id OAA05866 for freebsd-scsi@freebsd.org; Thu, 10 Jul 1997 14:02:57 +1000 (EST)
Received: from TELEMEMO; Thu, 10 Jul 1997 14:02:16 +1000
Date: Thu, 10 Jul 1997 14:02:16 +1000
Subject: Re: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2
To: freebsd-scsi@freebsd.org (Reply Requested)
Message-Id: <"970710040210Z.WT05849. 0*/PN=Wayne.Farmer/OU=CORPHQ/O=DPIE/PRMD=AUSGOVDPIE/ADMD=TELEMEMO/C=AU/"@MHS>
X-Mailer: Worldtalk (4.1)/MIME
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I have had several responses indicating that no-one else has this problem.

I have gone back to square 1 again, souped up the SCSI card to do wide again and everything seems OK.

Maybe I did do a "newfs /dev/sd1?" instead of using the raw device (/dev/rsd1?)

Wayne

PS I will go back to the bunker.

---------------------- Forwarded by Wayne Farmer/CORPHQ on 10/07/97 14:00 ---------------------------

owner-freebsd-scsi#064#FreeBSD.ORG - SMTPGATE@WT400 on 10/07/97 10:33:30
To: freebsd-scsi#064#FreeBSD.ORG - SMTPGATE@WT400
cc:
Subject: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2

I think I read recently that doing a "newfs" on a wide SCSI disk with an Adaptec 2940UW hangs the system (a deadlock ?)

I can concur with this and confirm that turning off wide negotiation in the SCSI setup seems to correct this.
3 questions :

1) Does anyone have any more info on this
2) Is there updated driver code I can include in the kernel build
3) Having "newfs"-ed, would turning back on wide negotiation lead to more problems similar to the "newfs" problem

Thanks

Wayne

From owner-freebsd-scsi Thu Jul 10 09:09:55 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id JAA09169 for freebsd-scsi-outgoing; Thu, 10 Jul 1997 09:09:55 -0700 (PDT)
Received: from pluto.plutotech.com (root@[206.168.67.137]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA09160 for ; Thu, 10 Jul 1997 09:09:48 -0700 (PDT)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130]) by pluto.plutotech.com (8.8.5/8.8.5) with ESMTP id KAA17853; Thu, 10 Jul 1997 10:08:04 -0600 (MDT)
Message-Id: <199707101608.KAA17853@pluto.plutotech.com>
X-Mailer: exmh version 2.0beta 12/23/96
To: Wayne.Farmer@mailhost.dpie.gov.au
cc: freebsd-scsi@FreeBSD.ORG (Reply Requested)
Subject: Re: Adaptec 2940UW hang with multiple Wide SCSI Disks - FreeBSD 2.2.2
In-reply-to: Your message of "Thu, 10 Jul 1997 10:11:26 +1000." <"970710001128Z.WT27093. 0*/PN=Wayne.Farmer/OU=CORPHQ/O=DPIE/PRMD=AUSGOVDPIE/ADMD=TELEMEMO/C=AU/"@MHS>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 10 Jul 1997 10:08:04 -0600
From: "Justin T. Gibbs"
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>I think I read recently that doing a "newfs" on a wide SCSI disk with an
>Adaptec 2940UW hangs the system (a deadlock ?)

The problem has nothing to do with the Adaptec driver or controller. This was a buffer deadlock bug that could be triggered by using the block instead of raw device when performing a newfs (newfs sd0a instead of newfs rsd0a). By disabling wide negotiation to the device, you are changing the timing characteristics slightly and perhaps avoiding this deadlock.
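Justin's explanation depends on the difference between the block node (`sd0a`) and the raw character node (`rsd0a`). Here is a minimal sketch that assumes a POSIX `stat()`. A tool could refuse to proceed unless it was handed a character device. `is_raw_device()` and `check_newfs_target()` are hypothetical helpers, not part of newfs. They only check the device type and do nothing about the buffer deadlock itself.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if path is a character (raw) device, 0 if it is anything
 * else (a block device, a directory, ...), -1 if stat() fails. */
static int is_raw_device(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return S_ISCHR(sb.st_mode) ? 1 : 0;
}

/* Hypothetical pre-flight check for a newfs wrapper: refuse a block
 * node such as /dev/sd0a and ask for the raw node (/dev/rsd0a). */
static int check_newfs_target(const char *path)
{
    int raw = is_raw_device(path);

    if (raw < 0) {
        perror(path);
        return -1;
    }
    if (!raw) {
        fprintf(stderr, "%s: not a raw (character) device; "
                "use the r-prefixed node\n", path);
        return -1;
    }
    return 0;
}
```

For example, a wrapper would call `check_newfs_target("/dev/rsd0a")` and invoke newfs only when it returns 0.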
My guess is that if you perform a newfs on the raw partition it will work just fine even if wide negotiation is turned on.

--
Justin T. Gibbs
===========================================
FreeBSD: Turning PCs into workstations
===========================================

From owner-freebsd-scsi Fri Jul 11 08:33:49 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id IAA15498 for freebsd-scsi-outgoing; Fri, 11 Jul 1997 08:33:49 -0700 (PDT)
Received: from shell.futuresouth.com (shell.futuresouth.com [207.141.254.20]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id IAA15493 for ; Fri, 11 Jul 1997 08:33:46 -0700 (PDT)
Received: (from tim@localhost) by shell.futuresouth.com (8.8.5/8.8.5) id KAA24270; Fri, 11 Jul 1997 10:33:44 -0500 (CDT)
Message-ID: <19970711103344.05270@shell.futuresouth.com>
Date: Fri, 11 Jul 1997 10:33:44 -0500
From: Tim Tsai
To: freebsd-scsi@freebsd.org
Subject: help debugging this
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.74e
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

What's the likely problem for this, the hard disk reset itself?

Thanks,

Tim

sd0(ahc0:0:0): UNIT ATTENTION asc:29,0
sd0(ahc0:0:0): Power on, reset, or bus device reset occurred
, retries:4
sd1(ahc1:4:0): parity error during Command phase.
sd1(ahc1:4:0): SCB 0x0 - timed out in command phase, SCSISIGI == 0x56
SEQADDR = 0x42 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x17
sd1(ahc1:4:0): abort message in message buffer
sd1(ahc1:4:0): SCB 0x1 - timed out in command phase, SCSISIGI == 0x56
SEQADDR = 0x43 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x17
ahc1: Issued Channel A Bus Reset.
2 SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
sd1(ahc1:4:0): no longer in timeout
sd1(ahc1:4:0): UNIT ATTENTION asc:29,2
, retries:3

From owner-freebsd-scsi Fri Jul 11 10:02:59 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id KAA20563 for freebsd-scsi-outgoing; Fri, 11 Jul 1997 10:02:59 -0700 (PDT)
Received: from feral-gw.feral.com (mjacob@feral.mauswerks.net [204.152.96.10]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA20558 for ; Fri, 11 Jul 1997 10:02:57 -0700 (PDT)
Received: (from mjacob@localhost) by feral-gw.feral.com (8.8.6/8.7.3) id JAA06015; Fri, 11 Jul 1997 09:58:21 -0700
Date: Fri, 11 Jul 1997 09:58:21 -0700
From: Matthew Jacob
Message-Id: <199707111658.JAA06015@feral-gw.feral.com>
To: freebsd-scsi@FreeBSD.ORG, tim@futuresouth.com
Subject: Re: help debugging this
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Cables, or possibly double termination.

From owner-freebsd-scsi Fri Jul 11 12:12:16 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id MAA28596 for freebsd-scsi-outgoing; Fri, 11 Jul 1997 12:12:16 -0700 (PDT)
Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA28591 for ; Fri, 11 Jul 1997 12:12:12 -0700 (PDT)
Received: (from daemon@localhost) by alpo.whistle.com (8.8.5/8.8.5) id LAA04109; Fri, 11 Jul 1997 11:16:02 -0700 (PDT)
Received: from current1.whistle.com(207.76.205.22) via SMTP by alpo.whistle.com, id smtpd004102; Fri Jul 11 18:15:56 1997
Message-ID: <33C67781.6F5992E1@whistle.com>
Date: Fri, 11 Jul 1997 11:12:17 -0700
From: Julian Elischer
Organization: Whistle Communications
X-Mailer: Mozilla 3.0Gold (X11; I; FreeBSD 2.2-CURRENT i386)
MIME-Version: 1.0
To: Tim Tsai
CC: freebsd-scsi@FreeBSD.ORG
Subject: Re: help debugging this
References: <19970711103344.05270@shell.futuresouth.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Tim Tsai wrote:
>
> What's the likely problem for this, the hard disk reset itself?
>
> Thanks,
>
> Tim
>
> sd0(ahc0:0:0): UNIT ATTENTION asc:29,0
                                   ^^^^^
> sd0(ahc0:0:0): Power on, reset, or bus device reset occurred
> , retries:4
> sd1(ahc1:4:0): parity error during Command phase.
> sd1(ahc1:4:0): SCB 0x0 - timed out in command phase, SCSISIGI == 0x56
> SEQADDR = 0x42 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x17
> sd1(ahc1:4:0): abort message in message buffer
> sd1(ahc1:4:0): SCB 0x1 - timed out in command phase, SCSISIGI == 0x56
> SEQADDR = 0x43 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x17
> ahc1: Issued Channel A Bus Reset. 2 SCBs aborted
> Clearing bus reset
> Clearing 'in-reset' flag
> sd1(ahc1:4:0): no longer in timeout
> sd1(ahc1:4:0): UNIT ATTENTION asc:29,2
                                   ^^^^^
> , retries:3

two different disks had problems.. on different SCSI busses too!
your power supply (or its connectors) is bad.

From owner-freebsd-scsi Fri Jul 11 14:40:49 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id OAA04973 for freebsd-scsi-outgoing; Fri, 11 Jul 1997 14:40:49 -0700 (PDT)
Received: from misery.sdf.com (misery.sdf.com [204.244.210.193]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id OAA04965 for ; Fri, 11 Jul 1997 14:40:44 -0700 (PDT)
Received: from tom by misery.sdf.com with smtp (Exim 1.62 #1) id 0wmnLn-0007J2-00; Fri, 11 Jul 1997 14:35:39 -0700
Date: Fri, 11 Jul 1997 14:35:38 -0700 (PDT)
From: Tom Samplonius
To: freebsd-scsi@freebsd.org
Subject: location for DPT driver?
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Where is the newer DPT driver? I'm looking at ftp.i-connect.com/crash but it doesn't look like it has been updated recently.
Tom

From owner-freebsd-scsi Sat Jul 12 03:45:35 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA01262 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 03:45:35 -0700 (PDT)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id DAA01250 for ; Sat, 12 Jul 1997 03:45:25 -0700 (PDT)
Received: (qmail 624 invoked by uid 1000); 12 Jul 1997 10:38:55 -0000
Message-ID:
X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD
Content-Type: text/plain; charset=iso-8859-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <199707120421.VAA11648@ns2.yahoo.com>
Date: Sat, 12 Jul 1997 03:38:55 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro
To: filo@yahoo.com, freebsd-SCSI@freebsd.org
Subject: Re: problems with reboot
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi David Filo;

On 12-Jul-97 you wrote:
> when running the latest 1.1.7 code i noticed that the command being
> "marked" and later "destroyed" during reboot was the "remove media"
> command. so i removed the DPT_HANDLE_TIMEOUTS option and it works
> fine now. the umount during reboot can take > 30 seconds which is
> beyond your max timeout for scsi commands (i think). so looks like
> you need to be careful on which commands you timeout and destroy.

Yup. This whole timeout thing is bogus. I trust you understand it by now :-)

It is only necessary for hardware platforms that corrupt DMA transfers between the DPT and the main memory. Actually, it is very probable that this is simply a delay, out of sync delay, rather than corruption.

If you can live without DPT_HANDLE_TIMEOUTS, do so. I recommend so, as I do that myself. The DPT firmware handles timeouts much better. There is no need for it in the kernel, except as a survival tool.

> the next test was to simply hit the "reset" button while the dpt was
> chugging away on lots of untars. unfortunately the first time i did
unfortunately the first time i did > this, the machine got hung on reboot in the dpt bios - never got past > "waiting for dpt" message (first led kept blinking). hitting reset a > second time worked and the machine booted. of course the filesystems > were hosed, but they fscked fine. so this sounds like a DPT firmware > bug. i have yet to reproduce this one in a few tries. do you have a > suggestion of who to talk to at dpt about this, or should i just go > through the normal support channel. what you describe is sensible but not a bug; When you forcefully reset the machine, if you were writing to a RAID-{1,5}, it is very possible you did so in mid-transaction. The DPT, upon boot, will try to restore the array to consistent state. This operation may take a very long while. Getting stuck is not correct. Did the card emit any beeps? these actually indicate what the problem is. What version of the firmware is it running? It is visible durin boot, and also in the syslog. Upgrade to 7L0, try again (without the reset :-) and call support. > the next time i tried to duplicate the dpt hang (by hitting reset > again), it came up fine (after fsck of course). however as i started > the multiple untars again, the machine panicked with the message > "panic: blkfree: freeing free frag". i was seeing this same behavior > when the reboots weren't happening cleanly (i.e. machine comes up, > fsck works, but then panic when accessing fs). i would assume this is > a 2.2 filesystem bug, but i'm not sure. have you seen anything like > this or have any reason to believe it's associated with the dpt > driver? i don't have much experience with 2.2 so i don't know if this > is common. Depends on how much memory you have, you can destroy up to 64Mb of disk writes. There are very few filesystems that can survive this kind of assault. Even good file systems like Veritas vxfs, or (yes) NotTested ntfs will not survive that. 
Even one of the most robust filesystems ever created, an Oracle RDBMS (they do not necessarily view their RDBMS as a filesystem), will not survive losing 64MB of data it thinks was already committed to disk. Many years ago I raced cars that had turbochargers on them. The best way to destroy one (for something that spins at 130,000-200,000 rpm, bolted to a car engine and sucking gasoline, they are very reliable) is to open the throttle fully on a running engine and kill the power. Why am I telling you this? Every engineered product has a sure way of being destroyed by doing something that is doable and not clearly marked ``DO NOT DO THAT''. The DPT controller assumes that, normally, nobody pushes the computer's reset button. It is designed to resist a single point of failure (SPOF). What you did is MMPOF :-) Smoke will be emitted. In a truly critical application, where application-side integrity is more important than speed considerations, do the following: * configure the DPT for write-through caches * disable the caches on ALL the disk drives. * Pray :-) Some disk drives will NOT disable their caches when you tell them to. > we have a lot more experience with 2.1 and the filesystem appears to > be very stable. which brings up the question: will your stuff work > under 2.1? if you think it's feasible i'll probably try to get it > working under 2.1-stable to see if this filesystem problem persists. The problem is not in the filesystem. Put a good UPS between the CPU and the wall socket, cut off the reset button, and it will work fine. There is an issue with FreeBSD shutdown not waiting for the DPT to flush caches as it should. > finally, you've asked about posting/forwarding my questions/comments > to other places. no problems - do whatever you'd like with anything i > say.. I do not know about that, but I think this particular exchange will help others.
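Simon's advice to disable the caches on the disk drives, and his warning that some drives ignore the request, can be illustrated at the SCSI level. What follows is a minimal sketch, assuming the standard Caching mode page layout (page code 08h, Write Cache Enable in bit 2 of byte 2). The helper names are hypothetical, and actually fetching the page with MODE SENSE and sending it back with MODE SELECT is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHING_PAGE_CODE 0x08u /* SCSI Caching mode page */
#define CACHING_WCE       0x04u /* byte 2, bit 2: Write Cache Enable */

/* Hypothetical helper: returns nonzero if the page reports the drive's
 * write cache as enabled. */
int caching_page_wce(const uint8_t *page, size_t len)
{
    assert(page != NULL && len >= 3);
    assert((page[0] & 0x3Fu) == CACHING_PAGE_CODE);
    return (page[2] & CACHING_WCE) != 0;
}

/* Hypothetical helper: edits a Caching page (as returned by MODE SENSE)
 * so that it can be sent back with MODE SELECT to turn the write cache off. */
void caching_page_disable_wce(uint8_t *page, size_t len)
{
    assert(page != NULL && len >= 3);
    assert((page[0] & 0x3Fu) == CACHING_PAGE_CODE);
    page[0] &= 0x7Fu;                 /* PS bit is reserved in MODE SELECT data */
    page[2] &= (uint8_t)~CACHING_WCE; /* clear WCE; leave RCD and the rest alone */
}
```

After the MODE SELECT, reading the page back and calling caching_page_wce() again is a cheap way to catch the drives that, as noted above, silently keep their write cache on.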
Simon From owner-freebsd-scsi Sat Jul 12 04:12:28 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id EAA01918 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 04:12:28 -0700 (PDT) Received: from implode.root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id EAA01913 for ; Sat, 12 Jul 1997 04:12:22 -0700 (PDT) Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.5/8.8.5) with ESMTP id EAA15976; Sat, 12 Jul 1997 04:13:14 -0700 (PDT) Message-Id: <199707121113.EAA15976@implode.root.com> To: Simon Shapiro cc: filo@yahoo.com, freebsd-SCSI@FreeBSD.ORG Subject: Re: problems with reboot In-reply-to: Your message of "Sat, 12 Jul 1997 03:38:55 PDT." From: David Greenman Reply-To: dg@root.com Date: Sat, 12 Jul 1997 04:13:14 -0700 Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk >There is an issue with FreeBSD shutdown not waiting for the DPt to flush >caches as it should. Should be easy to fix by adding a shutdown routine to the driver that waits for the flushes to complete. 
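David's suggested fix, a driver shutdown routine that waits for outstanding flushes, might look roughly like the following user-space sketch. Everything here is hypothetical. struct ctlr, ctlr_shutdown_wait() and fake_poll_one() stand in for the real DPT driver state and interrupt handling, and registering the routine with the kernel's shutdown hooks is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical controller state; the real DPT driver keeps far more. */
struct ctlr {
    int pending_flushes;         /* cache flushes not yet completed */
    void (*poll)(struct ctlr *); /* services completions, as the ISR would */
};

/* Hypothetical stand-in for the hardware: one flush finishes per poll. */
void fake_poll_one(struct ctlr *c)
{
    if (c->pending_flushes > 0)
        c->pending_flushes--;
}

/* Shutdown routine: poll until every flush has completed or the budget of
 * max_polls runs out. Returns true only if the cache fully drained. */
bool ctlr_shutdown_wait(struct ctlr *c, int max_polls)
{
    assert(c != NULL && c->poll != NULL);
    for (int i = 0; i < max_polls && c->pending_flushes > 0; i++)
        c->poll(c); /* a kernel version would also delay briefly here */
    return c->pending_flushes == 0;
}
```

The poll budget is a deliberate choice. It bounds how long shutdown can be held up, so a wedged controller degrades to an unclean halt instead of a machine that never finishes shutting down.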
-DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-scsi Sat Jul 12 11:50:30 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id LAA14993 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 11:50:30 -0700 (PDT) Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id LAA14968 for ; Sat, 12 Jul 1997 11:50:16 -0700 (PDT) Received: (qmail 23804 invoked by uid 1000); 12 Jul 1997 18:50:05 -0000 Message-ID: X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD Content-Type: text/plain; charset=iso-8859-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <199707121113.EAA15976@implode.root.com> Date: Sat, 12 Jul 1997 11:50:05 -0700 (PDT) Organization: Atlas Telecom From: Simon Shapiro To: dg@root.com Subject: Re: problems with reboot Cc: filo@yahoo.com, freebsd-SCSI@FreeBSD.ORG Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk Hi David Greenman; On 12-Jul-97 you wrote: > >There is an issue with FreeBSD shutdown not waiting for the DPt to flush > >caches as it should. > > Should be easy to fix by adding a shutdown routine to the driver that > waits > for the flushes to complete. I have not checked the code in this area, but I think all that is necessary is for the umount(2) syscall to block shutdown until it returns. Under normal operation, it generates the SCSI ``ALLOW MEDIA REMOVAL'' command, which the DPT blocks until it is done flushing and invalidating its cache. I have never had this problem on any of our machines, but... BTW, on early UnixWare, /sbin/reboot was actually a call to another program that took some mysterious arguments (foobar 1 2) which, if given incorrectly, would cause Unix to execute a halt without any syncing, and thus produce similar results. Can /sbin/reboot do something similar?
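The ``ALLOW MEDIA REMOVAL'' request Simon mentions is the allow form of the SCSI PREVENT ALLOW MEDIUM REMOVAL command (opcode 1Eh, 6-byte CDB, Prevent bit in bit 0 of byte 4). Below is a minimal sketch of building that CDB, assuming the SCSI-2 layout with the LUN field left at zero. build_prevent_allow() is a hypothetical helper, not a FreeBSD kernel function. Per the thread, the DPT simply does not complete the allow form until its cache is flushed, so the host sees a long-running command rather than any special status.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCSI_PREVENT_ALLOW 0x1Eu /* PREVENT ALLOW MEDIUM REMOVAL opcode */

/* Hypothetical helper: fills a 6-byte CDB. prevent != 0 locks the medium;
 * prevent == 0 is the "allow removal" form, which (per this thread) is
 * what an unmount ends up sending to the DPT. */
void build_prevent_allow(uint8_t cdb[6], int prevent)
{
    assert(cdb != NULL);
    memset(cdb, 0, 6);                /* LUN, reserved bytes, control = 0 */
    cdb[0] = SCSI_PREVENT_ALLOW;
    cdb[4] = prevent ? 0x01u : 0x00u; /* byte 4, bit 0: Prevent */
}
```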
Simon From owner-freebsd-scsi Sat Jul 12 12:18:07 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id MAA15817 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 12:18:07 -0700 (PDT) Received: from cais.cais.com (root@cais.com [199.0.216.4]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA15812 for ; Sat, 12 Jul 1997 12:18:02 -0700 (PDT) Received: from earth.mat.net (root@earth.mat.net [205.252.122.1]) by cais.cais.com (8.8.5/CJKv1.99-CAIS) with SMTP id PAA13085; Sat, 12 Jul 1997 15:17:52 -0400 (EDT) Received: from Journey2.mat.net (journey2.mat.net [205.252.122.116]) by earth.mat.net (8.6.12/8.6.12) with SMTP id PAA22521; Sat, 12 Jul 1997 15:17:49 -0400 Date: Sat, 12 Jul 1997 15:17:28 -0400 (EDT) From: Chuck Robey X-Sender: chuckr@Journey2.mat.net To: Simon Shapiro cc: dg@root.com, filo@yahoo.com, freebsd-SCSI@FreeBSD.ORG Subject: Re: problems with reboot In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Sat, 12 Jul 1997, Simon Shapiro wrote: > > Hi David Greenman; On 12-Jul-97 you wrote: > > >There is an issue with FreeBSD shutdown not waiting for the DPt to flush > > >caches as it should. > > > > Should be easy to fix by adding a shutdown routine to the driver that > > waits > > for the flushes to complete. > > I have not checked the code in this area, but all that I think is necessary > is for the umount(2) syscall to wait and block shutdown until it returns. > Under normal operation, it generates the SCSI ``ALLOW MEDIA REMOVAL'', > which the DPT blocks until it is done flushing and invalidating. > I personally never have this problem on any of our machines, but... Is this always safe? I've had some instances where a umount call simply hung and never returned. I think they were either nfs or msdos mounts that gave this trouble, and the umount call could not be killed. What happens, then, if we make shutdown wait?
Would halt still work, as an emergency measure? I know the FSs that were hung wouldn't be closed, but at least my ufs FSs would be clean. ----------------------------+----------------------------------------------- Chuck Robey | Interests include any kind of voice or data chuckr@eng.umd.edu | communications topic, C programming, and Unix. 213 Lakeside Drive Apt T-1 | Greenbelt, MD 20770 | I run Journey2 and picnic, both FreeBSD (301) 220-2114 | version 3.0 current -- and great FUN! ----------------------------+----------------------------------------------- From owner-freebsd-scsi Sat Jul 12 14:26:47 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id OAA20444 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 14:26:47 -0700 (PDT) Received: from silvia.HIP.Berkeley.EDU (ala-ca32-05.ix.netcom.com [199.35.209.69]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id OAA20439; Sat, 12 Jul 1997 14:26:43 -0700 (PDT) Received: (from asami@localhost) by silvia.HIP.Berkeley.EDU (8.8.6/8.6.9) id OAA09547; Sat, 12 Jul 1997 14:26:25 -0700 (PDT) Date: Sat, 12 Jul 1997 14:26:25 -0700 (PDT) Message-Id: <199707122126.OAA09547@silvia.HIP.Berkeley.EDU> To: crb@Glue.umd.edu CC: gary@tbe.net, freebsd-scsi@freebsd.org, freebsd-isp@freebsd.org, freebsd-hardware@freebsd.org In-reply-to: (crb@Glue.umd.edu) Subject: Re: NCR SCSI controllers From: asami@cs.berkeley.edu (Satoshi Asami) Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk * From: crb@Glue.umd.edu Gee, I know someone with a very similar mail address! ;) * > Sorry about the cross-post, but I thought the question is appropriate to * > both lists... Unless you are asking something about performance with a huge number of disks under heavy load or something, I don't see the relevance to -isp. As for -hardware... we have -scsi just for these kinds of discussions (;). The next person to follow up, please chop -isp and -hardware out of the CC: list.
* > We were looking at the NCR 53C810 and -815 PCI SCSI controllers and were * > just wondering if anybody has experience/problems with them. Also if * > anyone has happened to compare them to Adaptec controllers, I would be * > glad to hear how they turned out. TIA! I've had 810- and 825-based controllers; they have worked very well over the years. However, the Adaptec is very stable now too. The main difference is probably the price ($70 for 810, $120 for 875, $200+ for 2940*) and configurability. I'm not sure whether the current NCR BIOSes let you change the adapter's ID, sync/wide negotiation per device, etc. -- mine doesn't; in fact, mine doesn't even have a boot setup menu. Also, I don't know how the NCR controllers perform under heavy load, as I never had more than two disks on them -- the Adaptec generally works fine with 14 disks in 10MHz mode or 8 disks in 20MHz mode (cable length problem). * I do have to admit, however, that I am not getting ultra-wide speeds out * of my Tekram even though I have an ultra-wide capable IBM UltraStar 2es * but I haven't really looked into it yet to see if this is just a configuration * problem or what. Our NCR driver doesn't support it yet. Stefan Esser (se) is working on it. Based on past experience, my guess is that it will start working soon because se is working on it.
:) Satoshi From owner-freebsd-scsi Sat Jul 12 15:40:32 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id PAA23185 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 15:40:32 -0700 (PDT) Received: from george.lbl.gov (george.lbl.gov [128.3.196.93]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id PAA23180; Sat, 12 Jul 1997 15:40:28 -0700 (PDT) Received: (jin@localhost) by george.lbl.gov (8.6.10/8.6.5) id PAA21522; Sat, 12 Jul 1997 15:40:21 -0700 Date: Sat, 12 Jul 1997 15:40:21 -0700 From: "Jin Guojun[ITG]" Message-Id: <199707122240.PAA21522@george.lbl.gov> To: asami@cs.berkeley.edu, crb@Glue.umd.edu Subject: Re: NCR SCSI controllers Cc: freebsd-hardware@FreeBSD.ORG, freebsd-isp@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG, gary@tbe.net Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk >Also, I don't know how the NCR controllers perform under heavy load as >I never had more than two disks on them -- the Adaptec generally works >fine with 14 disks in 10MHz mode or 8 disks in 20MHz mode (cable >length problem). I have had no problems with NCR at all. Especially under FreeBSD, it takes very little CPU time. Two disks or 14 disks is not the issue for SCSI controllers. If you can saturate the SCSI bus with two disks (newer drives can), then putting 100 disks on it (assuming enough IDs were available) would not make any difference at all. See the Hardware performance guide for the Pentium family (new) under http://www-itg.lbl.gov/ISS/hardware (two years old; it will be updated soon :-) > * I do have to admit, however, that I am not getting ultra-wide speeds out > * of my Tekram even though I have an ultra-wide capable IBM UltraStar 2es > * but I haven't really looked into it yet to see if this is just a configuration > * problem or what. > >Our NCR driver doesn't support it yet. Stefan Esser (se) is working >on it. Based on past experience, my guess is that it will start >working soon because se is working on it.
:) Has anyone tested an ultra-wide SCSI controller and gotten more than 20 MB/s of throughput over a single controller with a number of ultra-wide disks? I posted this question a few months ago and did not get any response. I wondered whether anyone had it working at that time. -Jin From owner-freebsd-scsi Sat Jul 12 16:09:24 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id QAA24204 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 16:09:24 -0700 (PDT) Received: from silvia.HIP.Berkeley.EDU (ala-ca32-05.ix.netcom.com [199.35.209.69]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA24199 for ; Sat, 12 Jul 1997 16:09:18 -0700 (PDT) Received: (from asami@localhost) by silvia.HIP.Berkeley.EDU (8.8.6/8.6.9) id QAA10000; Sat, 12 Jul 1997 16:09:07 -0700 (PDT) Date: Sat, 12 Jul 1997 16:09:07 -0700 (PDT) Message-Id: <199707122309.QAA10000@silvia.HIP.Berkeley.EDU> To: jin@george.lbl.gov CC: crb@Glue.umd.edu, freebsd-scsi@FreeBSD.ORG, gary@tbe.net In-reply-to: <199707122240.PAA21522@george.lbl.gov> (jin@george.lbl.gov) Subject: Re: NCR SCSI controllers From: asami@cs.berkeley.edu (Satoshi Asami) Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk * Cc: freebsd-hardware@FreeBSD.ORG, freebsd-isp@FreeBSD.ORG, * freebsd-scsi@FreeBSD.ORG, gary@tbe.net Please don't continue crossposting.... * I have no problem with NCR at all. Specially under FreeBSD, It does not take * CPU time. Two disks or 14 disks is not the issue for SCSI controllers. * If you can saturate the SCSI bus with two disks (new tech can), then, putting * 100 disks (assume ID is allowed), would not make any difference at all. You may want to note that there is more to performance than sequential throughput. * Does some one have tested any ultra-wide SCSI controllers to have at least * more than 20 MB throughput over a single controller with number of ultra-wide * disks?
I have seen over 30MB/s on one of the channels of an Adaptec 3940UW with 6 or 7 of the newest IBM drives. * I posted such question a few month ago, and did not hear any respond. * I was wondering no one had it worked at that time. Maybe you asked on the wrong list? :) Satoshi From owner-freebsd-scsi Sat Jul 12 18:21:26 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id SAA28249 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 18:21:26 -0700 (PDT) Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id SAA28232 for ; Sat, 12 Jul 1997 18:21:07 -0700 (PDT) Received: (qmail 27240 invoked by uid 1000); 13 Jul 1997 01:21:03 -0000 Message-ID: X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD Content-Type: text/plain; charset=iso-8859-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Sat, 12 Jul 1997 18:21:03 -0700 (PDT) Organization: Atlas Telecom From: Simon Shapiro To: Chuck Robey Subject: Re: problems with reboot Cc: freebsd-SCSI@FreeBSD.ORG, filo@yahoo.com, dg@root.com Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk Hi Chuck Robey; On 12-Jul-97 you wrote: ... > > Under normal operation, it generates the SCSI ``ALLOW MEDIA REMOVAL'', > > which the DPT blocks until it is done flushing and invalidating. > > I personally never have this problem on any of our machines, but... > > Is this always safe? I've had some instances where a umount call simply > hung, and never returned. I think they were either nfs or msdos mounts > that gave this trouble, but the umount call could not be kill'ed, and > making shutdown wait? Would halt still work, as an emergency measure? > I know the FSs that were hung wouldn't be closed, but at least my ufs FSs > would be clean. The Network Failure System is a special case (I AM being nice :-). It is supposedly stateless, and the mount is a client and thus does not govern physical I/O.
A shutdown can (and should) probably force a umount. Even on a local system, a forced umount is OK; it is a FS issue. But if the fs layer calls a function that by definition blocks, it is ``none of the caller's business'' how/what the callee does and how long it takes. To assume anything about the nature of a callee's internals is not a good idea. Here we have a live example of why it is a bad idea. Simon From owner-freebsd-scsi Sat Jul 12 18:39:54 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id SAA28875 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 18:39:54 -0700 (PDT) Received: from ns2.yahoo.com (ns2.yahoo.com [205.216.162.20]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id SAA28870 for ; Sat, 12 Jul 1997 18:39:52 -0700 (PDT) Received: (from filo@localhost) by ns2.yahoo.com (8.8.5/8.6.12) id SAA14919; Sat, 12 Jul 1997 18:38:13 -0700 (PDT) Date: Sat, 12 Jul 1997 18:38:13 -0700 (PDT) Message-Id: <199707130138.SAA14919@ns2.yahoo.com> From: David Filo To: Shimon@i-Connect.Net cc: freebsd-SCSI@FreeBSD.ORG, dg@root.com In-Reply-To: Subject: Re: problems with reboot Reply-To: filo@yahoo.com Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > > >There is an issue with FreeBSD shutdown not waiting for the DPt to flush > > >caches as it should. > > > > > Should be easy to fix by adding a shutdown routine to the driver that > > waits > > for the flushes to complete. > > I have not checked the code in this area, but all that I think is necessary > is for the umount(2) syscall to wait and block shutdown until it returns. > Under normal operation, it generates the SCSI ``ALLOW MEDIA REMOVAL'', > which the DPT blocks until it is done flushing and invalidating. > I personally never have this problem on any of our machines, but... umount(2) does wait correctly.
The problem in this case was that the DPT driver was timing out the "ALLOW MEDIA REMOVAL" command sent to the controller before it had a chance to finish flushing its cache. The problem went away when I removed "options DPT_HANDLE_TIMEOUTS" from the kernel config. The result of this was that the "ALLOW MEDIA REMOVAL" command was allowed to complete, umount waited around, and everything shut down cleanly. If this explanation is correct, the DPT driver should be changed not to time out the "ALLOW MEDIA REMOVAL" command when the DPT_HANDLE_TIMEOUTS option is being used. From owner-freebsd-scsi Sat Jul 12 18:53:25 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id SAA29460 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 18:53:25 -0700 (PDT) Received: from misery.sdf.com (misery.sdf.com [204.244.210.193]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id SAA29454 for ; Sat, 12 Jul 1997 18:53:21 -0700 (PDT) Received: from tom by misery.sdf.com with smtp (Exim 1.62 #1) id 0wnDls-0000nK-00; Sat, 12 Jul 1997 18:48:20 -0700 Date: Sat, 12 Jul 1997 18:48:19 -0700 (PDT) From: Tom Samplonius To: "Jin Guojun[ITG]" cc: asami@cs.berkeley.edu, crb@Glue.umd.edu, freebsd-scsi@freebsd.org, gary@tbe.net Subject: Re: NCR SCSI controllers In-Reply-To: <199707122240.PAA21522@george.lbl.gov> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-scsi@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Sat, 12 Jul 1997, Jin Guojun[ITG] wrote: > I have no problem with NCR at all. Specially under FreeBSD, It does not take > CPU time. Two disks or 14 disks is not the issue for SCSI controllers. > If you can saturate the SCSI bus with two disks (new tech can), then, putting > 100 disks (assume ID is allowed), would not make any difference at all. There is a difference. Each SCSI channel has some limit on the number of transactions it can handle. ...
> Does some one have tested any ultra-wide SCSI controllers to have at least > more than 20 MB throughput over a single controller with number of ultra-wide > disks? I posted such question a few month ago, and did not hear any respond. > I was wondering no one had it worked at that time. Not a problem. I used 11 disks on a 3940UW and was able to max out both channels. With drives able to sustain 7MB/s writes, this is getting easier to do. > -Jin Tom From owner-freebsd-scsi Sat Jul 12 21:03:42 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id VAA03937 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 21:03:42 -0700 (PDT) Received: from george.lbl.gov (george.lbl.gov [128.3.196.93]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id VAA03928 for ; Sat, 12 Jul 1997 21:03:36 -0700 (PDT) Received: (jin@localhost) by george.lbl.gov (8.6.10/8.6.5) id VAA24519; Sat, 12 Jul 1997 21:03:29 -0700 Date: Sat, 12 Jul 1997 21:03:29 -0700 From: "Jin Guojun[ITG]" Message-Id: <199707130403.VAA24519@george.lbl.gov> To: tom@sdf.com Subject: Re: NCR SCSI controllers Cc: asami@cs.berkeley.edu, crb@Glue.umd.edu, freebsd-scsi@FreeBSD.ORG, gary@tbe.net Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk }> Does some one have tested any ultra-wide SCSI controllers to have at least }> more than 20 MB throughput over a single controller with number of ultra-wide }> disks? I posted such question a few month ago, and did not hear any respond. }> I was wondering no one had it worked at that time. } } Not a problem. I used 11 disks on a 3940UW, and was able to max out }both channels. With drives being able to sustain 7MB writing, this is }getting easier to do. When you say sustaining 7MB writing, do you mean a single disk? I guess so, because I can get 15MB/s writing over three disks on a single NCR SCSI channel (just wide, not ultra-wide). So a 3940UW should be able to do 30MB/s writing and 35MB/s reading.
The same should be seen on the NCR-875 once the driver is ready (S.E.). Otherwise, the 7MB/s writing rate does not sound right. -Jin From owner-freebsd-scsi Sat Jul 12 22:11:24 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id WAA05883 for freebsd-scsi-outgoing; Sat, 12 Jul 1997 22:11:24 -0700 (PDT) Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id WAA05873 for ; Sat, 12 Jul 1997 22:11:09 -0700 (PDT) Received: (qmail 28877 invoked by uid 1000); 13 Jul 1997 05:11:04 -0000 Message-ID: X-Mailer: XFMail 1.2-alpha [p0] on FreeBSD Content-Type: text/plain; charset=iso-8859-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <199707130138.SAA14919@ns2.yahoo.com> Date: Sat, 12 Jul 1997 22:11:04 -0700 (PDT) Organization: Atlas Telecom From: Simon Shapiro To: filo@yahoo.com Subject: Re: problems with reboot Cc: freebsd-SCSI@FreeBSD.ORG, dg@root.com Sender: owner-freebsd-scsi@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk Hi David Filo; On 13-Jul-97 you wrote: ... > umount(2) does wait correctly. The problem in this case was that the > DPT driver was timing out the "ALLOW MEDIA REMOVAL" command sent to > the controller before it had a chance to finish flushing its cache. > The problem went away when I removed "options DPT_HANDLE_TIMEOUTS" > from the kernel config. The result of this was that the "ALLOW MEDIA > REMOVAL" command was allowed to complete, umount waited around, and > everything shutdown cleanly. Ah... Work from an incomplete dataset and you are assured of bad results... This is probably why it ``does not happen here'' (I hate that expression). > If this explanation is correct, the DPT driver should be changed to > not timeout the "ALLOW MEDIA REMOVAL" when the DPT_HANDLE_TIMEOUTS > option is being used. What should be done is to disable DPT_HANDLE_TIMEOUTS by default. The DPT firmware knows how to time out commands better than you and me.
This is what we pay for :-) The DPT_HANDLE_TIMEOUTS option is there only to allow installation on broken hardware, so that testing can be conducted. I had a report from a user who loaded the card to the max and pressed the reset button, only to find corrupt filesystems upon reboot. You simply CANNOT do that with a standard DPT configuration. We are building a non-stop, FreeBSD-based transaction processor here. To accomplish this level of reliability, you need to: prevent the DPT from resetting when the CPU resets, set up all the caches as write-through (including those on the disk drives), and provide N+1 redundant power to the CPU. In a stand-alone PC environment, you will get a very high degree of reliability if you simply have a decent UPS protecting the AC to your computer and stay away from the reset button. Simon
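David Filo's alternative earlier in the thread keeps DPT_HANDLE_TIMEOUTS but never lets the driver abort ALLOW MEDIA REMOVAL. That amounts to a per-command timeout policy. Below is a minimal sketch, in which driver_timeout_ms() and NO_TIMEOUT are hypothetical names rather than anything in the actual DPT driver. Simon's preferred fix is simpler still: turn the option off and leave all timeouts to the firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCSI_PREVENT_ALLOW 0x1Eu /* PREVENT ALLOW MEDIUM REMOVAL opcode */
#define NO_TIMEOUT         0u    /* hypothetical: leave it to the firmware */

/* Hypothetical timeout policy for a driver built with DPT_HANDLE_TIMEOUTS.
 * The "allow removal" form (Prevent bit clear) may legitimately wait for
 * the controller to flush its entire cache, so the driver never aborts it;
 * every other command keeps the driver's default timeout. */
unsigned driver_timeout_ms(const uint8_t cdb[6], unsigned default_ms)
{
    assert(cdb != NULL);
    if (cdb[0] == SCSI_PREVENT_ALLOW && (cdb[4] & 0x01u) == 0)
        return NO_TIMEOUT;
    return default_ms;
}
```

The exemption is deliberately narrow. The "prevent" form and ordinary I/O commands still get the default timeout, so only the command that is expected to wait on a full cache flush escapes the driver's watchdog.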