Date:      Tue, 13 Apr 1999 16:18:04 -0700 (PDT)
From:      asami@FreeBSD.ORG (Satoshi - Ports Wraith - Asami)
To:        ken@plutotech.com
Cc:        scsi@FreeBSD.ORG
Subject:   Re: timed out while idle?
Message-ID:  <199904132318.QAA49274@silvia.hip.berkeley.edu>
In-Reply-To: <199904131623.KAA03308@panzer.plutotech.com> (ken@plutotech.com)
References:   <199904131623.KAA03308@panzer.plutotech.com>

 * From: "Kenneth D. Merry" <ken@plutotech.com>

 * The timed out while idle message means that the drive took longer than the
 * timeout (60 seconds) to respond to a read or write request, and nothing was
 * going on on the bus at the time.  In other words, your drive went out to
 * lunch, and we hit it with a BDR (Bus Device Reset) to get it to come back.

Wow, 60 seconds?  That's indeed a pretty good lunch. :)

 * Yep.  There's a timeout for each transaction.  If the transaction doesn't
 * complete in the specified period of time (60 seconds for disk
 * reads/writes), the timeout fires, a BDR is sent and all transactions that
 * were queued to the disk are requeued.
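The recovery Ken describes can be modeled in a few lines of C. This is a hypothetical, simplified sketch: the struct and function names, the fixed queue size and the use of whole seconds are assumptions made for illustration, and it is not the actual CAM or ahc(4) code. It shows the two points from the explanation: the timeout is tracked per transaction, and once one fires, the reset aborts everything queued at the drive, so every unfinished transaction is requeued.

```c
#include <assert.h>

/* Hypothetical, simplified model of per-transaction timeout recovery.
 * Names and layout are illustrative only, not the CAM or ahc(4) code. */
#define DISK_RW_TIMEOUT_SECS 60  /* disk read/write timeout quoted above */
#define MAX_QUEUED 8             /* assumed queue depth for the sketch */

struct xact {
    int  id;
    long started;  /* time the transaction was issued, in seconds */
    int  done;     /* nonzero once the drive has completed it */
};

struct target {
    struct xact queued[MAX_QUEUED];  /* outstanding at the drive */
    int nqueued;
    struct xact retry[MAX_QUEUED];   /* to be reissued after the reset */
    int nretry;
    int bdrs_sent;
};

/* Called periodically; returns 1 if a timeout fired and a BDR was "sent". */
static int check_timeouts(struct target *t, long now)
{
    assert(t->nqueued >= 0 && t->nqueued <= MAX_QUEUED);
    for (int i = 0; i < t->nqueued; i++) {
        const struct xact *x = &t->queued[i];
        if (x->done || now - x->started < DISK_RW_TIMEOUT_SECS)
            continue;
        t->bdrs_sent++;  /* stand-in for sending a Bus Device Reset */
        /* The reset aborts everything queued at the drive, so every
         * unfinished transaction, not only the late one, is requeued.
         * This sketch simply caps the retry list at MAX_QUEUED. */
        for (int j = 0; j < t->nqueued; j++)
            if (!t->queued[j].done && t->nretry < MAX_QUEUED)
                t->retry[t->nretry++] = t->queued[j];
        t->nqueued = 0;
        return 1;
    }
    return 0;
}
```

For example, with transactions issued at t=0, 10 and 20 seconds and only the second one completed, a check at t=59 fires nothing, while a check at t=60 sends one reset and moves both unfinished transactions onto the retry list.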

I see.  By the way, can any of these cause panics?  Here's an example:

===
Mar 31 08:46:29 m0 /kernel: (da14:ahc0:0:14:0): SCB 0x31 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Mar 31 08:46:30 m0 /kernel: SEQADDR == 0x8
Mar 31 08:46:30 m0 /kernel: SSTAT1 == 0xa
Mar 31 08:46:30 m0 /kernel: (da14:ahc0:0:14:0): Queuing a BDR SCB
Mar 31 08:46:30 m0 /kernel: (da14:ahc0:0:14:0): Bus Device Reset Message Sent
Mar 31 08:46:30 m0 /kernel: (da14:ahc0:0:14:0): no longer in timeout, status = 34b
Mar 31 08:46:30 m0 /kernel: ahc0: Bus Device Reset on A:14. 1 SCBs aborted
Mar 31 08:47:33 m0 /kernel: (da10:ahc0:0:10:0): SCB 0x23 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Mar 31 08:47:33 m0 /kernel: SEQADDR == 0x8
Mar 31 08:47:33 m0 /kernel: SSTAT1 == 0xa
Mar 31 08:47:33 m0 /kernel: (da10:ahc0:0:10:0): Queuing a BDR SCB
Mar 31 08:47:33 m0 /kernel: (da10:ahc0:0:10:0): Bus Device Reset Message Sent
Mar 31 08:47:33 m0 /kernel: (da10:ahc0:0:10:0): no longer in timeout, status = 34b
Mar 31 08:47:33 m0 /kernel: ahc0: Bus Device Reset on A:10. 1 SCBs aborted
<a few more da10's deleted>
Mar 31 09:09:00 m0 /kernel: (da10:ahc0:0:10:0): SCB 0x31 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Mar 31 09:09:01 m0 /kernel: SEQADDR == 0x8
Mar 31 09:09:01 m0 /kernel: SSTAT1 == 0xa
Mar 31 09:09:01 m0 /kernel: (da10:ahc0:0:10:0): Queuing a BDR SCB
Mar 31 09:09:01 m0 /kernel: (da10:ahc0:0:10:0): Bus Device Reset Message Sent
Mar 31 09:09:01 m0 /kernel: (da10:ahc0:0:10:0): no longer in timeout, status = 34b
Mar 31 09:09:01 m0 /kernel: ahc0: Bus Device Reset on A:10. 1 SCBs aborted
Mar 31 09:33:07 m0 /kernel: (da2:ahc0:0:2:0): SCB 0x31 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Mar 31 09:33:07 m0 /kernel: SEQADDR == 0x8
Mar 31 09:33:07 m0 /kernel: SSTAT1 == 0xa
Mar 31 09:33:07 m0 /kernel: (da2:ahc0:0:2:0): Queuing a BDR SCB
Mar 31 09:33:07 m0 /kernel: (da2:ahc0:0:2:0): Bus Device Reset Message Sent
Mar 31 09:33:07 m0 /kernel: (da2:ahc0:0:2:0): no longer in timeout, status = 34b
Mar 31 09:33:07 m0 /kernel: ahc0: Bus Device Reset on A:2. 1 SCBs aborted
===

which leads to a panic a few minutes later:

===
## gdb -aout -k *.67
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd), 
Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 2334720
initial pcb at 215ba8
panicstr: integer divide fault
panic messages:
---
Fatal trap 18: integer divide fault while in kernel mode
instruction pointer     = 0x8:0xf011c3c6
stack pointer           = 0x10:0xf4f74c84
frame pointer           = 0x10:0xf4f74cbc
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13423 (md5)
interrupt mask          = bio 
trap number             = 18
panic: integer divide fault

syncing disks... 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 giving up

dumping to dev 1, offset 853068
dump 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
---
#0  0xf0134b0b in boot ()
(kgdb) bt
#0  0xf0134b0b in boot ()
#1  0xf0134e27 in panic ()
#2  0xf01e606d in trap_fatal ()
#3  0xf01e588a in trap ()
#4  0xf011c3c6 in ccdbuffer ()
#5  0xf011c204 in ccdstart ()
#6  0xf011c184 in ccdstrategy ()
#7  0xf01625f2 in spec_strategy ()
#8  0xf0161d29 in spec_vnoperate ()
#9  0xf01cac65 in ufs_vnoperatespec ()
#10 0xf01ca63f in ufs_strategy ()
#11 0xf01cac35 in ufs_vnoperate ()
#12 0xf0152c6d in cluster_read ()
#13 0xf01c3765 in ffs_read ()
#14 0xf015c211 in vn_read ()
#15 0xf013d3a9 in read ()
#16 0xf01e629f in syscall ()
#17 0xf01dc56c in Xint0x80_syscall ()
#18 0x80481c8 in ?? ()
#19 0x80480ca in ?? ()
===

Hmm, so it's dying in ccdbuffer with an integer divide fault.  Divide by
zero?  I looked in ccdbuffer(), but the only divisions are by
cs->sc_ileave or ii->ii_ndisk (we're using simple striping).  I don't
see how either of those could possibly cause an integer divide fault....
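To make the suspicion concrete, here is a generic round-robin striping map in C. It is a hypothetical sketch, not the real ccdbuffer() code: the function name, the struct and the mapping formula are assumptions, with `ileave` and `ndisk` standing in for cs->sc_ileave and ii->ii_ndisk. It shows where those two divisors enter the arithmetic and guards them explicitly. Note that on x86 the same divide-error trap is raised both for a zero divisor and for a quotient that does not fit in the destination register (for example `INT_MIN / -1`), so a zero divisor is the likeliest, but not the only, explanation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified model of simple striping (not the ccd(4) code):
 * logical block bn is split into interleave-sized chunks that are dealt
 * round-robin across ndisk component disks. */
struct stripe_map {
    int     disk;  /* which component disk holds the block */
    int64_t cbn;   /* block number on that component disk */
};

static int map_block(int64_t bn, int64_t ileave, int ndisk,
                     struct stripe_map *out)
{
    assert(out != NULL);
    /* A zero divisor here is exactly what would raise an integer divide
     * fault in kernel mode, so reject it instead of dividing. */
    if (bn < 0 || ileave <= 0 || ndisk <= 0)
        return EINVAL;
    int64_t chunk = bn / ileave;  /* which interleave chunk overall */
    int64_t off = bn % ileave;    /* offset within that chunk */
    out->disk = (int)(chunk % ndisk);
    out->cbn = (chunk / ndisk) * ileave + off;
    return 0;
}
```

With `ileave` = 16 and `ndisk` = 4, block 17 maps to disk 1, block 1, and block 70 maps to disk 0, block 22; passing a zero for either divisor returns EINVAL rather than trapping.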

 * Those are SCSI bus phases.  If you're seeing timeouts in datain phase or
 * command phase, that often indicates a termination or cabling problem.
 * 
 * Just look at the SCSI specs if you want to find out about the different
 * SCSI bus phases.
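For reference, SCSI-2 defines the information transfer phases by the MSG, C/D and I/O signals driven by the target, and Bus Free as BSY and SEL both deasserted. The C sketch below decodes those signals into a phase. It works on individual signal values rather than on any adapter register, because the bit layout of registers such as LASTPHASE or SCSISIGI is driver-specific and is not assumed here; the enum and function names are hypothetical. One consequence worth noting: the phase lines only mean something while BSY is asserted, so a snapshot with every signal deasserted is simply a free bus, not Data Out.

```c
#include <assert.h>
#include <stdbool.h>

/* Phases as defined by SCSI-2; names are illustrative. */
enum scsi_phase {
    PHASE_DATA_OUT, PHASE_DATA_IN, PHASE_COMMAND, PHASE_STATUS,
    PHASE_RESERVED, PHASE_MESSAGE_OUT, PHASE_MESSAGE_IN,
    PHASE_BUS_FREE, PHASE_SELECTION
};

static const char *const phase_names[] = {
    "DATA OUT", "DATA IN", "COMMAND", "STATUS", "RESERVED",
    "MESSAGE OUT", "MESSAGE IN", "BUS FREE",
    "ARBITRATION/SELECTION/RESELECTION"
};

static enum scsi_phase decode_phase(bool bsy, bool sel,
                                    bool msg, bool cd, bool io)
{
    if (!bsy && !sel)
        return PHASE_BUS_FREE;   /* nobody owns the bus */
    if (sel)
        return PHASE_SELECTION;  /* not decoded further in this sketch */
    /* BSY asserted, SEL not: treated as an information transfer phase.
     * This sketch cannot distinguish it from early arbitration. */
    if (!msg)
        return cd ? (io ? PHASE_STATUS : PHASE_COMMAND)
                  : (io ? PHASE_DATA_IN : PHASE_DATA_OUT);
    if (!cd)
        return PHASE_RESERVED;   /* MSG=1, C/D=0 is reserved */
    return io ? PHASE_MESSAGE_IN : PHASE_MESSAGE_OUT;
}

static const char *phase_name(enum scsi_phase p)
{
    assert((unsigned)p < sizeof phase_names / sizeof phase_names[0]);
    return phase_names[p];
}
```

Decoding from signals rather than from a register keeps the sketch tied to the spec; mapping a particular adapter's register bits onto these arguments would have to come from that driver's own headers.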

Oh yes, the SCSI specs.  That was my other question. ;)

There used to be SCSI specs on-line at

http://scitexdv.com:8080/SCSI2/

but it doesn't seem to exist anymore.  Do you know of some other
place?  (I did a little search but couldn't find any....)

Thanks
Satoshi

