Date:      Tue, 18 Mar 2003 10:27:16 -0800 (PST)
From:      Nate Lawson <nate@root.org>
To:        Matthew Reimer <mreimer@vpop.net>
Cc:        scsi@freebsd.org
Subject:   Re: Help interpreting SCSI errors
Message-ID:  <Pine.BSF.4.21.0303181013560.10496-100000@root.org>
In-Reply-To: <3E77594A.7020702@vpop.net>

On Tue, 18 Mar 2003, Matthew Reimer wrote:
> swap_pager: indefinite wait buffer: device: #da/0x20001, blkno: 608, 
> size: 4096
> swap_pager: indefinite wait buffer: device: #da/0x20001, blkno: 7568, 
> size: 4096

sys/vm/vm_swap.c shows that the wait for a read of a swapped-out page to
complete is timing out (after 20 seconds).
 
> Running "raidutil -e nonrecov d0" shows several sequences like the 
> following (separated by varying amounts of time). Every time, the 
> initial "Bad SCSI Status - Check Condition" comes from id 5.
> 
> Can anyone interpret the initial "bad scsi status" that kicks off the 
> bus reset, etc.? Whatever it is, it isn't degrading the volume, but if a 
> disk is going bad or if there's some other problem I would like to know.
> ----
> 
> 03/13/2003  13:37:48   Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> Bad SCSI Status - Check Condition
> 28 00 02 5C EE BF 00 00 20 00 00 00

A read_10 (opcode 28) to ID 5 failed.
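
If you want to pull these apart yourself, the logged bytes follow the
standard 10-byte CDB layout: opcode, flags, 4-byte big-endian LBA, group,
2-byte transfer length, control. Here is a minimal Python sketch, assuming
the two trailing 00 bytes in each logged line are just padding in the log.
The name decode_rw10 is made up for illustration.

```python
# Standard SCSI opcodes for the 10-byte read and write commands.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(hex_bytes):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes.

    Returns (command name, starting LBA, number of blocks).
    Bytes beyond the first 10 are ignored (assumed to be log padding).
    """
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) < 10:
        raise ValueError("need at least 10 CDB bytes")
    name = OPCODES.get(cdb[0])
    if name is None:
        raise ValueError("not a READ(10)/WRITE(10) opcode: 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return name, lba, blocks


print(decode_rw10("28 00 02 5C EE BF 00 00 20 00 00 00"))
```

For the line above this gives a READ(10) of 32 blocks (0x20) starting at
LBA 0x025CEEBF. The same function handles the 2A (write) lines further down.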
 
> 03/13/2003  13:37:48   Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> Request Sense
> 70 00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00
> No Sense

Why did it fail?  Sorry, I'm not going to tell you (sense key 0, No Sense,
carries no error information).
 
> 03/13/2003  13:38:24   Level 3
> Bus reset occurred on channel 0 - Command watchdog time-out caused the 
> bus to be reset

A command timed out (it didn't complete before the watchdog went off).

> 03/13/2003  13:38:24   Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> HBA Error - SCSI Bus Reset

Something went wrong, so the bus was reset.
 
> 03/13/2003  13:38:27   Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> Bad SCSI Status - Check Condition
> 28 00 02 5C EA 7F 00 00 01 00 00 00

Another read from 5 failed.
 
> 03/13/2003  13:38:27   Level 3
> HBA=0 BUS=0 ID=5 LUN=0
> Request Sense
> 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00
> Unit Attention

Why?  The drive is lonely and needs attention because a bus reset occurred
(29, 02).
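
The (29, 02) pair comes straight out of the sense bytes. In fixed-format
sense data (response code 70), the sense key is the low nibble of byte 2,
and the additional sense code and its qualifier are bytes 12 and 13. A rough
Python sketch follows. The lookup tables are deliberately tiny and cover
only the values that show up in this log, and decode_sense is a made-up
name.

```python
# Only the values seen in this log; the full tables are in the SCSI spec.
SENSE_KEYS = {0x0: "No Sense", 0x6: "Unit Attention"}
ASC_ASCQ = {
    (0x00, 0x00): "No additional sense information",
    (0x29, 0x02): "SCSI bus reset occurred",
}


def decode_sense(hex_bytes):
    """Decode fixed-format sense data given as space-separated hex bytes.

    Returns (sense key text, ASC/ASCQ text). Unknown values are
    reported numerically rather than guessed.
    """
    sense = bytes.fromhex(hex_bytes)
    if len(sense) < 14:
        raise ValueError("need at least 14 sense bytes")
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F              # low nibble of byte 2
    asc, ascq = sense[12], sense[13]   # additional sense code and qualifier
    key_text = SENSE_KEYS.get(key, "sense key 0x%x" % key)
    asc_text = ASC_ASCQ.get((asc, ascq), "ASC/ASCQ %02x/%02x" % (asc, ascq))
    return key_text, asc_text


print(decode_sense("70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00"))
```

Run against the earlier all-zero sense data, the same function reports
No Sense with no additional information, which is why that first failure
can't be explained from the log alone.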
 
> 03/13/2003  13:38:27   Level 3
> HBA=0 BUS=0 ID=4 LUN=0
> Bad SCSI Status - Check Condition
> 2A 00 02 5C E8 7F 00 00 01 00 00 00

A write to 4 failed. 
 
> 03/13/2003  13:38:27   Level 3
> HBA=0 BUS=0 ID=4 LUN=0
> Request Sense
> 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00
> Unit Attention

Now 4 wants to tell you about the bus reset.

> 03/13/2003  13:38:27   Level 3
> HBA=0 BUS=0 ID=3 LUN=0
> Bad SCSI Status - Check Condition
> 2A 00 02 7D 8E 9F 00 00 20 00 00 00

Write to 3 failed.

> 03/13/2003  13:38:27   Level 3
> HBA=0 BUS=0 ID=3 LUN=0
> Request Sense
> 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00
> Unit Attention

Yes, it also wants to tell you that a bus reset occurred.

The problem is that swap is timing out reading from 5 and the rest of the
errors are just harmless cascades from that one.  Check your termination,
cabling, etc. on 5.

-Nate





