Date: Fri, 23 Oct 1998 18:31:10 -0700 (PDT)
From: Chris Timmons
To: freebsd-scsi@FreeBSD.ORG
Subject: Thrashing CAM on SMP

I tried recently to reproduce the problems Mark Murray has with CAM & SMP (panic with X going and lots of filesystem activity). I couldn't get it to panic, but I did have the machine wedge with recurring, non-recoverable device timeouts on the system and swap disks. The machine is a server and doesn't have a workstation video card. Of course, I forgot BREAK_TO_DEBUGGER, so I couldn't get a dump.

Using an SMP -CURRENT from just before the 3.0 release, I set up 3 256M bonnies on different spindles, an md5 of a 280MB file, and a 'make -j 12 buildworld' - all in loops to repeat over and over. The buildworld also unmounted, newfs-ed and remounted /usr/obj after each turn.

The machine is a dual-PII 266 Tyan Tiger. The system lasted for a couple of days with a load average between 5 and 12. The activity lights on the 3 bonnie drives were almost always solid green and the box sounded like a popcorn popper.

at scbus0 target 0 lun 0 (pass0,da0)
at scbus0 target 1 lun 0 (pass1,da1)
at scbus1 target 0 lun 0 (pass2,da2)
at scbus1 target 1 lun 0 (pass3,da3)
at scbus1 target 4 lun 0 (pass4,da4)

During the time it was alive, the bonnies were running on da2, da3, and da4. The only trouble I had was device timeouts on the firmware-buggy Atlas-II, and an occasional hiccup on the Seagates. I'm using 40MHz xfer rates and Adaptec cables with the active terminators - drive termination off.

midtest3:/root#> grep BDR /var/log/messages
Oct 21 04:16:13 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 21 05:27:29 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 21 14:44:18 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 21 17:10:31 midtest3 /kernel: (da3:ahc1:0:1:0): BDR message in message buffer
Oct 21 17:11:31 midtest3 /kernel: (da3:ahc1:0:1:0): BDR message in message buffer
Oct 21 17:12:30 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 21 19:47:07 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 21 20:04:24 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 01:38:54 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 02:51:36 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 04:10:12 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 05:51:51 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 07:41:04 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 07:47:00 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 09:22:32 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 10:50:59 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 11:06:40 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 13:34:20 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 15:12:07 midtest3 /kernel: (da2:ahc1:0:0:0): BDR message in message buffer
Oct 22 15:13:07 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 15:28:40 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 15:43:34 midtest3 /kernel: (da2:ahc1:0:0:0): BDR message in message buffer
Oct 22 15:44:34 midtest3 /kernel: (da2:ahc1:0:0:0): BDR message in message buffer
Oct 22 15:45:34 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 16:18:28 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB
Oct 22 17:06:45 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB

When it finally died, I'd swear it was telling me that da0 and/or da1 kept timing out - messages to the serial console which I of course didn't trap. The machine would respond to pings and print out the BDR timeout messages, but would not do anything else, so it was apparently stuck at a fairly high spl.

I'm getting up-to-date, having noticed Ken's recent mega-commit. I'll be able to break in with ddb now, and can take a dump if the situation recurs.

The system is in a mega rack-mount case with multiple cooling fans blowing directly on the drives, which were cool to the touch during the middle of the run, so I don't think we overheated.

-Chris
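
The grep output in the message is easier to compare across drives when tallied per device and message type. The following is a minimal sketch of such a tally, not something from the original post: the helper name tally_bdr is hypothetical, the regular expression assumes the "(daN:ahcN:b:t:l):" path format seen in the log lines above, and the sample list holds only four lines copied from that log.

```python
import re
from collections import Counter

# Assumed format: "(da4:ahc1:0:4:0): message", as seen in the posted log.
PATH_RE = re.compile(r"\((\w+):\w+:\d+:\d+:\d+\):\s*(.*)$")


def tally_bdr(lines):
    """Count BDR-related kernel messages per (device, message text)."""
    counts = Counter()
    for line in lines:
        if "BDR" not in line:
            continue  # same filter as 'grep BDR'
        match = PATH_RE.search(line)
        if match:
            device, message = match.group(1), match.group(2).strip()
            counts[(device, message)] += 1
    return counts


# Four lines copied from the log in the post, for illustration only.
sample = [
    "Oct 21 04:16:13 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB",
    "Oct 21 17:10:31 midtest3 /kernel: (da3:ahc1:0:1:0): BDR message in message buffer",
    "Oct 21 17:11:31 midtest3 /kernel: (da3:ahc1:0:1:0): BDR message in message buffer",
    "Oct 21 17:12:30 midtest3 /kernel: (da4:ahc1:0:4:0): Queuing a BDR SCB",
]

for (device, message), n in sorted(tally_bdr(sample).items()):
    print(f"{device}  {n:3d}  {message}")
```

To run it against a real log, pass an open file instead of the sample list, for example tally_bdr(open("/var/log/messages")).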