Date:      Tue, 13 Apr 1999 17:25:34 -0600 (MDT)
From:      "Kenneth D. Merry" <ken@plutotech.com>
To:        asami@FreeBSD.ORG (Satoshi - Ports Wraith - Asami)
Cc:        scsi@FreeBSD.ORG
Subject:   Re: timed out while idle?
Message-ID:  <199904132325.RAA05814@panzer.plutotech.com>
In-Reply-To: <199904132318.QAA49274@silvia.hip.berkeley.edu> from Satoshi - Ports Wraith - Asami at "Apr 13, 1999  4:18: 4 pm"

Satoshi - Ports Wraith - Asami wrote...
>  * From: "Kenneth D. Merry" <ken@plutotech.com>
> 
>  * The timed out while idle message means that the drive took longer than the
>  * timeout (60 seconds) to respond to a read or write request, and nothing was
>  * going on on the bus at the time.  In other words, your drive went out to
>  * lunch, and we hit it with a BDR to get it to come back.
> 
> Wow, 60 seconds?  That's indeed a pretty good lunch. :)

Yep.

>  * Yep.  There's a timeout for each transaction.  If the transaction doesn't
>  * complete in the specified period of time (60 seconds for disk
>  * reads/writes), the timeout fires, a BDR is sent and all transactions that
>  * were queued to the disk are requeued.
> 
> I see.  By the way, can any of these cause panics?  Here's an example:

I don't think the timeouts in and of themselves will cause panics, unless
maybe the drive never recovers.  
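
To make the per-transaction timeout quoted above concrete, here is a minimal sketch in Python. It is not the actual CAM or ahc driver code: the names `Transaction`, `DeviceQueue`, `check_timeouts`, and `bus_device_reset` are hypothetical, and only the 60-second figure and the "send a BDR, requeue everything queued to the disk" behavior come from this thread.

```python
from dataclasses import dataclass, field

DISK_IO_TIMEOUT = 60.0  # seconds; the disk read/write timeout mentioned above


@dataclass
class Transaction:
    tag: int
    started_at: float  # time the transaction was handed to the device


@dataclass
class DeviceQueue:
    """Hypothetical model of one disk's outstanding and pending work."""
    active: list = field(default_factory=list)   # transactions sent to the device
    pending: list = field(default_factory=list)  # tags waiting to be (re)sent
    resets_sent: int = 0

    def submit(self, tag, now):
        self.active.append(Transaction(tag, now))

    def complete(self, tag):
        # Normal completion: the transaction finished before its timeout.
        self.active = [t for t in self.active if t.tag != tag]

    def bus_device_reset(self):
        # Stand-in for sending a BDR message to the target.
        self.resets_sent += 1

    def check_timeouts(self, now):
        """If any active transaction is overdue, reset the device and requeue all."""
        overdue = [t for t in self.active
                   if now - t.started_at >= DISK_IO_TIMEOUT]
        if not overdue:
            return False
        self.bus_device_reset()
        # Everything that was queued to the disk goes back on the pending list.
        self.pending.extend(t.tag for t in self.active)
        self.active.clear()
        return True
```

With transactions submitted at t=0 and t=30, a check at t=59 does nothing, while a check at t=60 sends one reset and moves both transactions back to the pending list.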

> ===
> Mar 31 08:46:29 m0 /kernel: (da14:ahc0:0:14:0): SCB 0x31 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> Mar 31 08:46:30 m0 /kernel: SEQADDR == 0x8
> Mar 31 08:46:30 m0 /kernel: SSTAT1 == 0xa
> Mar 31 08:46:30 m0 /kernel: (da14:ahc0:0:14:0): Queuing a BDR SCB
> Mar 31 08:46:30 m0 /kernel: (da14:ahc0:0:14:0): Bus Device Reset Message Sent
> Mar 31 08:46:30 m0 /kernel: (da14:ahc0:0:14:0): no longer in timeout, status = 34b
> Mar 31 08:46:30 m0 /kernel: ahc0: Bus Device Reset on A:14. 1 SCBs aborted

[ ... ]
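
When sifting through logs like the ones quoted above, it can help to pull the device path out of each line. The sketch below assumes the prefix has the form (peripheral:controller:bus:target:lun), which matches how "(da14:ahc0:0:14:0)" lines up with the "Bus Device Reset on A:14" message; the function name and the dictionary keys are made up for illustration.

```python
import re

# Matches prefixes such as "(da14:ahc0:0:14:0)".
PATH_RE = re.compile(
    r"\((?P<periph>[a-z]+\d+):(?P<sim>[a-z]+\d+):"
    r"(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\)"
)


def parse_cam_path(line):
    """Return the device path fields from a kernel log line, or None."""
    m = PATH_RE.search(line)
    if m is None:
        return None
    return {
        "periph": m.group("periph"),      # peripheral driver instance, e.g. da14
        "sim": m.group("sim"),            # controller instance, e.g. ahc0
        "bus": int(m.group("bus")),
        "target": int(m.group("target")),
        "lun": int(m.group("lun")),
    }
```

Lines without such a prefix, like the SEQADDR and SSTAT1 register dumps, simply return None.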

> 
> which leads to a panic a few minutes later:
> 
> ===
> ## gdb -aout -k *.67
> GDB is free software and you are welcome to distribute copies of it
>  under certain conditions; type "show copying" to see the conditions.
> There is absolutely no warranty for GDB; type "show warranty" for details.
> GDB 4.16 (i386-unknown-freebsd), 
> Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
> IdlePTD 2334720
> initial pcb at 215ba8
> panicstr: integer divide fault
> panic messages:
> ---
> Fatal trap 18: integer divide fault while in kernel mode
> instruction pointer     = 0x8:0xf011c3c6
> stack pointer           = 0x10:0xf4f74c84
> frame pointer           = 0x10:0xf4f74cbc
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 13423 (md5)
> interrupt mask          = bio 
> trap number             = 18
> panic: integer divide fault
> 
> syncing disks... 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 giving up
> 
> dumping to dev 1, offset 853068
> dump 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
> ---
> #0  0xf0134b0b in boot ()
> (kgdb) bt
> #0  0xf0134b0b in boot ()
> #1  0xf0134e27 in panic ()
> #2  0xf01e606d in trap_fatal ()
> #3  0xf01e588a in trap ()
> #4  0xf011c3c6 in ccdbuffer ()
> #5  0xf011c204 in ccdstart ()
> #6  0xf011c184 in ccdstrategy ()
> #7  0xf01625f2 in spec_strategy ()
> #8  0xf0161d29 in spec_vnoperate ()
> #9  0xf01cac65 in ufs_vnoperatespec ()
> #10 0xf01ca63f in ufs_strategy ()
> #11 0xf01cac35 in ufs_vnoperate ()
> #12 0xf0152c6d in cluster_read ()
> #13 0xf01c3765 in ffs_read ()
> #14 0xf015c211 in vn_read ()
> #15 0xf013d3a9 in read ()
> #16 0xf01e629f in syscall ()
> #17 0xf01dc56c in Xint0x80_syscall ()
> #18 0x80481c8 in ?? ()
> #19 0x80480ca in ?? ()
> ===
> 
> Hmm, so it's dying in ccdbuffer().  Integer divide fault... divide by
> zero?  I looked in ccdbuffer(), but the only divisions are by
> cs->sc_ileave or ii->ii_ndisk (we're using simple striping), and I
> don't see how either could cause an integer divide fault.

I dunno what's going on there.  It could be indirectly caused by the
timeout, but I really don't know how that could happen.
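
For reference, the kind of arithmetic under discussion looks roughly like the sketch below. This is a simplified Python model of uniform striping, not the real ccdbuffer() code: `interleave` and `ndisks` stand in for cs->sc_ileave and ii->ii_ndisk, and the function name is made up. It only illustrates that both values are used as divisors, so either one being zero would fault.

```python
def map_block(bn, interleave, ndisks):
    """Map a logical block number to (disk index, block on that disk)."""
    stripe = bn // interleave        # which interleave-sized chunk bn falls in
    offset = bn % interleave         # offset inside that chunk
    disk = stripe % ndisks           # chunks rotate across the component disks
    disk_stripe = stripe // ndisks   # chunks that precede it on the same disk
    return disk, disk_stripe * interleave + offset
```

In Python a zero divisor raises ZeroDivisionError; in the kernel the corresponding divide instruction would trap, which is consistent with the "integer divide fault" in the panic, though it does not explain how either value would end up as zero.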

>  * Those are SCSI bus phases.  If you're seeing timeouts in datain phase or
>  * command phase, that often indicates a termination or cabling problem.
>  * 
>  * Just look at the SCSI specs if you want to find out about the different
>  * SCSI bus phases.
> 
> Oh yes, the SCSI specs.  That was my other question. ;)
> 
> There used to be SCSI specs on-line at
> 
> http://scitexdv.com:8080/SCSI2/
> 
> but it doesn't seem to exist anymore.  Do you know of some other
> place?  (I did a little search but couldn't find any....)

Try:

http://www.symbios.com/x3t10/

Ken
-- 
Kenneth Merry
ken@plutotech.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message



