Date: Sun, 17 Oct 1999 22:00:23 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: sos@freebsd.org
Cc: alpha@freebsd.org, "Erik H. Bakke" <erik@habatech.no>
Subject: workaround for ata driver woes on alpha
Message-ID: <14346.31193.248797.237477@grits.cs.duke.edu>
Søren,
There's a problem with the ata driver on alphas. Under heavy disk
load, the machine will complain "ad_timeout: lost disk contact -
resetting" and then promptly panic, leaving a stack trace like the
following:
panic: trap
#0 0xfffffc0000386c2c in boot (howto=260) at ../../kern/kern_shutdown.c:278
278 savectx(&dumppcb);
(kgdb) bt
#0 0xfffffc0000386c2c in boot (howto=260) at ../../kern/kern_shutdown.c:278
#1 0xfffffc0000344530 in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0x0)
at ../../ddb/db_command.c:532
#2 0xfffffc00003441a4 in db_command (last_cmdp=0xfffffc00005b1a60,
cmd_table=0x0, aux_cmd_tablep=0xfffffc00005d6990)
at ../../ddb/db_command.c:333
#3 0xfffffc0000344320 in db_command_loop () at ../../ddb/db_command.c:455
#4 0xfffffc0000347ff8 in db_trap (type=0, code=0) at ../../ddb/db_trap.c:71
#5 0xfffffc00005051c8 in kdb_trap (a0=1, a1=1, a2=9600, entry=3,
regs=0xfffffe0011955500) at ../../alpha/alpha/db_interface.c:194
#6 0xfffffc0000512d58 in trap (a0=1, a1=15, a2=9600, entry=3,
framep=0xfffffe0011955500) at ../../alpha/alpha/trap.c:285
#7 0xfffffc0000505ad0 in XentIF () at ../../alpha/alpha/exception.s:63
#8 0xfffffc000050538c in Debugger (msg=0x0)
at ../../alpha/alpha/db_interface.c:256
#9 0xfffffc0000387354 in panic (fmt=0xfffffc00005a76fc "trap")
at ../../kern/kern_shutdown.c:528
#10 0xfffffc00005131ec in trap (a0=40, a1=1, a2=0, entry=2,
framep=0xfffffe0011955740) at ../../alpha/alpha/trap.c:530
#11 0xfffffc0000505b2c in XentMM () at ../../alpha/alpha/exception.s:94
#12 0xfffffc0000523b04 in ad_transfer (request=0xfffffe00087e3c00)
at ../../dev/ata/ata-disk.c:431
#13 0xfffffc0000521d38 in ata_start (scp=0xfffffe0008713400)
at ../../dev/ata/ata-all.c:583
#14 0xfffffc0000522338 in ata_reinit (scp=0xfffffe0008713400)
at ../../dev/ata/ata-all.c:716
#15 0xfffffc000052448c in ad_timeout (request=0xfffffe00087e3c00)
at ../../dev/ata/ata-disk.c:648
#16 0xfffffc000039025c in softclock () at ../../kern/kern_timeout.c:131
#17 0xfffffc0000376d70 in hardclock (frame=0xfffffe00119559e0)
at ../../kern/kern_clock.c:253
#18 0xfffffc000051564c in handleclock (arg=0xfffffe00119559e0)
at ../../alpha/alpha/clock.c:266
#19 0xfffffc0000513e34 in interrupt (a0=0, a1=1536, a2=18446739675668704635,
framep=0xfffffe00119559e0) at ../../alpha/alpha/interrupt.c:101
#20 0xfffffc0000505afc in XentInt () at ../../alpha/alpha/exception.s:78
I admit to not understanding callouts, so you might want to take this
theory with a grain of salt:
I believe what is happening is that ad_timeout() gets called (quite
prematurely) at spl0. While ad_timeout() is executing, the interrupt
for the request in question comes in. The interrupt handler frees the
request that the ad_timeout() call chain is currently operating on (or
otherwise modifies it). The request is then corrupted, and chaos
(a machine check, or a trap for an invalid access) ensues. I'm tempted
to wrap ad_timeout() in splbio(), but there would still be a window
while ad_callout() is being called during which we're at spl0. (Is
this right? Is it called at spl0? This is the part I don't know.)
Anyway, we see this on the alpha because the timeout is hardcoded to
fire after 300 ticks. This is roughly 3 seconds on an x86
(typically hz<=128), but it is less than 1/3 of a second on an alpha
(typically hz>=1024). The following patch levels the playing field and
seems to "fix" the problem on alpha (at least I'm now able to untar
ports and then rm -rf the tree).
Index: sys/dev/ata/ata-disk.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/ata/ata-disk.c,v
retrieving revision 1.31
diff -u -r1.31 ata-disk.c
--- ata-disk.c 1999/10/10 18:08:36 1.31
+++ ata-disk.c 1999/10/18 01:13:48
@@ -417,7 +417,7 @@
if (request->donecount == 0) {
/* start timeout for this transfer */
- request->timeout_handle = timeout((timeout_t*)ad_timeout, request, 300);
+ request->timeout_handle = timeout((timeout_t*)ad_timeout, request, 3*hz);
/* setup transfer parameters */
count = howmany(request->bytecount, DEV_BSIZE);
Drew
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin
Duke University Email: gallatin@cs.duke.edu
Department of Computer Science Phone: (919) 660-6590
