Date: Tue, 18 Mar 2003 10:27:16 -0800 (PST)
From: Nate Lawson <nate@root.org>
To: Matthew Reimer <mreimer@vpop.net>
Cc: scsi@freebsd.org
Subject: Re: Help interpreting SCSI errors
Message-ID: <Pine.BSF.4.21.0303181013560.10496-100000@root.org>
In-Reply-To: <3E77594A.7020702@vpop.net>

On Tue, 18 Mar 2003, Matthew Reimer wrote:
> swap_pager: indefinite wait buffer: device: #da/0x20001, blkno: 608,
> size: 4096
> swap_pager: indefinite wait buffer: device: #da/0x20001, blkno: 7568,
> size: 4096

sys/vm/vm_swap.c shows that waiting for a read of swapped out page to
complete is timing out (20 seconds).

> Running "raidutil -e nonrecov d0" shows several sequences like the
> following (separated by varying amounts of time). Every time, the
> initial "Bad SCSI Status - Check Condition" comes from id 5.
>
> Can anyone interpret the initial "bad scsi status" that kicks off the
> bus reset, etc.? Whatever it is, it isn't degrading the volume, but if a
> disk is going bad or if there's some other problem I would like to know.
> ----
>
> 03/13/2003 13:37:48 Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> Bad SCSI Status - Check Condition
> 28 00 02 5C EE BF 00 00 20 00 00 00

A read_10 to 5 failed.

> 03/13/2003 13:37:48 Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> Request Sense
> 70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00
> No Sense

Why did it fail? Sorry, I'm not going to tell you.

> 03/13/2003 13:38:24 Level 3
> Bus reset occurred on channel 0 - Command watchdog time-out caused the
> bus to be reset

A command timed out (didn't complete before watchdog went off)

> 03/13/2003 13:38:24 Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> HBA Error - SCSI Bus Reset

Something wrong, reset bus.

> 03/13/2003 13:38:27 Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> Bad SCSI Status - Check Condition
> 28 00 02 5C EA 7F 00 00 01 00 00 00

Another read from 5 failed.

> 03/13/2003 13:38:27 Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> Request Sense
> 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00
> Unit Attention

Why? The drive is lonely and needs attention because a bus reset
occurred (29, 02).

> 03/13/2003 13:38:27 Level 3
> HBA=0 BUS=0 ID=4 LUN=0
> Bad SCSI Status - Check Condition
> 2A 00 02 5C E8 7F 00 00 01 00 00 00

A write to 4 failed.

> 03/13/2003 13:38:27 Level 3
> HBA=0 BUS=0 ID=4 LUN=0
> Request Sense
> 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00
> Unit Attention

Now 4 wants to tell you about the bus reset.

> 03/13/2003 13:38:27 Level 3
> HBA=0 BUS=0 ID=3 LUN=0
> Bad SCSI Status - Check Condition
> 2A 00 02 7D 8E 9F 00 00 20 00 00 00

Write to 3 failed.

> 03/13/2003 13:38:27 Level 3
> HBA=0 BUS=0 ID=3 LUN=0
> Request Sense
> 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00
> Unit Attention

Yes, it also wants to tell you that a bus reset occurred.

The problem is that swap is timing out reading from 5 and the rest of the
errors are just harmless cascades from that one. Check your termination,
cabling, etc. on 5.

-Nate

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
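
The byte strings in the raidutil log above can also be decoded mechanically. What follows is a minimal sketch and not part of the original message: the function names decode_cdb10 and decode_fixed_sense are hypothetical, and the lookup tables cover only the codes that appear in this log. It assumes the log prints each 10-byte command block padded to 12 bytes, and that the sense data is in fixed format (response code 0x70), where the sense key is the low nibble of byte 2 and the additional sense code and qualifier are bytes 12 and 13.

```python
import struct

# Only the values seen in this log; real tables are much larger.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
SENSE_KEYS = {0x0: "No Sense", 0x6: "Unit Attention"}
ASC_ASCQ = {
    (0x00, 0x00): "No additional sense information",
    (0x29, 0x02): "SCSI bus reset occurred",
}


def decode_cdb10(text):
    """Decode a 10-byte READ/WRITE command block from a hex string.

    Returns (operation name, logical block address, transfer length in blocks).
    Trailing pad bytes after the 10-byte command are ignored.
    """
    cdb = bytes.fromhex(text)
    if len(cdb) < 10:
        raise ValueError("need at least 10 bytes for a 10-byte command")
    opcode = cdb[0]
    (lba,) = struct.unpack(">I", cdb[2:6])     # bytes 2..5, big-endian
    (length,) = struct.unpack(">H", cdb[7:9])  # bytes 7..8, big-endian
    name = OPCODES.get(opcode, "opcode 0x%02X" % opcode)
    return name, lba, length


def decode_fixed_sense(text):
    """Decode fixed-format sense data from a hex string.

    Returns (sense key name, ASC, ASCQ, description).
    """
    sense = bytes.fromhex(text)
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    if sense[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    key_name = SENSE_KEYS.get(key, "sense key 0x%X" % key)
    description = ASC_ASCQ.get((asc, ascq), "unknown ASC/ASCQ")
    return key_name, asc, ascq, description


if __name__ == "__main__":
    # First failed command on ID 5 and the sense data that followed the reset.
    name, lba, length = decode_cdb10("28 00 02 5C EE BF 00 00 20 00 00 00")
    print("%s lba=0x%08X blocks=%d" % (name, lba, length))
    print(decode_fixed_sense(
        "70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00"))
```

Run against the log, this labels the first entry as a 32-block READ(10) and the later sense data as Unit Attention with 29/02, which matches the reading given in the reply: only the first failure on ID 5 carries no explanation, and everything after the reset is the drives reporting the reset itself.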