Date:      Sun, 17 Oct 1999 22:00:23 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        sos@freebsd.org
Cc:        alpha@freebsd.org, "Erik H. Bakke" <erik@habatech.no>
Subject:   workaround for ata driver woes on alpha 
Message-ID:  <14346.31193.248797.237477@grits.cs.duke.edu>

Søren,

There's a problem with the ata-driver on alphas.  Under heavy disk
load, the machine will complain "ad_timeout: lost disk contact -
resetting" and then promptly panic & leave something like the
following stack trace:

panic: trap
#0  0xfffffc0000386c2c in boot (howto=260) at ../../kern/kern_shutdown.c:278
278                     savectx(&dumppcb);
(kgdb) bt
#0  0xfffffc0000386c2c in boot (howto=260) at ../../kern/kern_shutdown.c:278
#1  0xfffffc0000344530 in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0x0)
    at ../../ddb/db_command.c:532
#2  0xfffffc00003441a4 in db_command (last_cmdp=0xfffffc00005b1a60, 
    cmd_table=0x0, aux_cmd_tablep=0xfffffc00005d6990)
    at ../../ddb/db_command.c:333
#3  0xfffffc0000344320 in db_command_loop () at ../../ddb/db_command.c:455
#4  0xfffffc0000347ff8 in db_trap (type=0, code=0) at ../../ddb/db_trap.c:71
#5  0xfffffc00005051c8 in kdb_trap (a0=1, a1=1, a2=9600, entry=3, 
    regs=0xfffffe0011955500) at ../../alpha/alpha/db_interface.c:194
#6  0xfffffc0000512d58 in trap (a0=1, a1=15, a2=9600, entry=3, 
    framep=0xfffffe0011955500) at ../../alpha/alpha/trap.c:285
#7  0xfffffc0000505ad0 in XentIF () at ../../alpha/alpha/exception.s:63
#8  0xfffffc000050538c in Debugger (msg=0x0)
    at ../../alpha/alpha/db_interface.c:256
#9  0xfffffc0000387354 in panic (fmt=0xfffffc00005a76fc "trap")
    at ../../kern/kern_shutdown.c:528
#10 0xfffffc00005131ec in trap (a0=40, a1=1, a2=0, entry=2, 
    framep=0xfffffe0011955740) at ../../alpha/alpha/trap.c:530
#11 0xfffffc0000505b2c in XentMM () at ../../alpha/alpha/exception.s:94
#12 0xfffffc0000523b04 in ad_transfer (request=0xfffffe00087e3c00)
    at ../../dev/ata/ata-disk.c:431
#13 0xfffffc0000521d38 in ata_start (scp=0xfffffe0008713400)
    at ../../dev/ata/ata-all.c:583
#14 0xfffffc0000522338 in ata_reinit (scp=0xfffffe0008713400)
    at ../../dev/ata/ata-all.c:716
#15 0xfffffc000052448c in ad_timeout (request=0xfffffe00087e3c00)
    at ../../dev/ata/ata-disk.c:648
#16 0xfffffc000039025c in softclock () at ../../kern/kern_timeout.c:131
#17 0xfffffc0000376d70 in hardclock (frame=0xfffffe00119559e0)
    at ../../kern/kern_clock.c:253
#18 0xfffffc000051564c in handleclock (arg=0xfffffe00119559e0)
    at ../../alpha/alpha/clock.c:266
#19 0xfffffc0000513e34 in interrupt (a0=0, a1=1536, a2=18446739675668704635, 
    framep=0xfffffe00119559e0) at ../../alpha/alpha/interrupt.c:101
#20 0xfffffc0000505afc in XentInt () at ../../alpha/alpha/exception.s:78

I admit to not understanding callouts, so you might want to take this
theory with a grain of salt:

I believe what is happening is that ad_timeout() gets called (quite
prematurely) at spl0.  While ad_timeout() is executing, the interrupt
for the request in question comes in.  The interrupt handler frees the
request that the ad_timeout() call chain is currently operating on (or
otherwise messes with it).  The request is then corrupted, and chaos
(a machine check, or a trap for an invalid access) ensues.  I'm tempted
to wrap ad_timeout() in splbio(), but there would still be a window
while ad_callout() is being called during which we'd be at spl0.  (Is
that right?  Is it called at spl0?  This is the part I don't know.)

Anyway, we see this on the alpha because the timeout is hardcoded to
fire after 300 ticks.  On an x86 (typically hz<=128) that is at least
2.3 seconds (3 seconds at hz=100), but on an alpha (typically hz>=1024)
it is less than 1/3 of a second.  The following patch levels the
playing field & seems to "fix" the problem on alpha (at least I'm now
able to untar ports & then rm -rf the tree).

Index: sys/dev/ata/ata-disk.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/ata/ata-disk.c,v
retrieving revision 1.31
diff -u -r1.31 ata-disk.c
--- ata-disk.c  1999/10/10 18:08:36     1.31
+++ ata-disk.c  1999/10/18 01:13:48
@@ -417,7 +417,7 @@
     if (request->donecount == 0) {
 
        /* start timeout for this transfer */
-       request->timeout_handle = timeout((timeout_t*)ad_timeout, request, 300);
+       request->timeout_handle = timeout((timeout_t*)ad_timeout, request, 3*hz);
 
        /* setup transfer parameters */
        count = howmany(request->bytecount, DEV_BSIZE);


Drew
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590

