Date: Mon, 18 Oct 1999 09:37:04 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: mjacob@feral.com
Cc: alpha@freebsd.org
Subject: Re: workaround for ata driver woes on alpha
Message-ID: <14347.6330.820928.627692@grits.cs.duke.edu>
In-Reply-To: <Pine.BSF.4.10.9910171902490.77636-100000@beppo.feral.com>
References: <14346.31193.248797.237477@grits.cs.duke.edu> <Pine.BSF.4.10.9910171902490.77636-100000@beppo.feral.com>

Matthew Jacob writes:
 >
 > actually, this happens right away for me w/o even heavy disk load.
 >
 > callouts should not be happening at spl0, should they? shouldn't they
 > happen at the lowest level that is *not* non-interrupt context? At any
 > rate, callouts have to always protect the resources they might share with
 > interrupts (whether via spls or locks) like any other shared resource
 > problem, no?
 >
 > -matt

<I hope you don't mind me CC'ing this to -alpha, but I want to hear
what other people have to say>

Sorry, spl0 was a typo. I'd meant to say softclock, i.e.
ALPHA_PSL_IPL_SOFT.

hardclock() calls (void)splsoftclock(); to set the ipl to
ALPHA_PSL_IPL_SOFT just before calling softclock(). softclock() itself
goes to splhigh() while it processes the callouts. But just before it
calls the timeout function, it lowers the ipl back to ALPHA_PSL_IPL_SOFT:

        splx(s);
        c_func(c_arg);
        s = splhigh();

According to alpha/include/alpha_cpu.h:

#define ALPHA_PSL_IPL_0         0x0000  /* all interrupts enabled */
#define ALPHA_PSL_IPL_SOFT      0x0001  /* software ints disabled */
#define ALPHA_PSL_IPL_IO        0x0004  /* I/O dev ints disabled */
#define ALPHA_PSL_IPL_CLOCK     0x0005  /* clock ints disabled */
#define ALPHA_PSL_IPL_HIGH      0x0006  /* all but mchecks disabled */

So we should be open to device interrupts when the ad_timeout() routine
is called.

I'm pretty sure this is what is happening, for the following reason: I
bzero() the request structure in ad_interrupt() just prior to freeing it
(around line 593 of ata-disk.c). Then, at the top of ad_timeout(), I print
out some values from the request. When ad_timeout() is called, I see
non-zero values for the request fields that I have printed, but my panic
changes from a machine check to a memory access fault when attempting to
deref a pointer in the request struct that was bzeroed. If, after the
crash, I examine the fields that I'd printed out, they are now zero.
However, not all of the fields are zero: request->retries == 1, for
example.
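
To make the ipl sequence above concrete, here is a minimal user-space sketch. It is not kernel code: splraise(), intr_deliverable(), softclock_model() and callout_fn() are hypothetical names made up for illustration, and the rule "an interrupt is deliverable when its level is above the current ipl" is a simplified model of the masking implied by the alpha_cpu.h comments. Only the ALPHA_PSL_IPL_* values and the splx()/c_func()/splhigh() ordering come from the message.

```c
#include <assert.h>

/* Values quoted from alpha/include/alpha_cpu.h in the message. */
#define ALPHA_PSL_IPL_0     0x0000  /* all interrupts enabled */
#define ALPHA_PSL_IPL_SOFT  0x0001  /* software ints disabled */
#define ALPHA_PSL_IPL_IO    0x0004  /* I/O dev ints disabled */
#define ALPHA_PSL_IPL_CLOCK 0x0005  /* clock ints disabled */
#define ALPHA_PSL_IPL_HIGH  0x0006  /* all but mchecks disabled */

/* Simulated current interrupt priority level (assumption: one CPU). */
static int ipl = ALPHA_PSL_IPL_0;

/* Hypothetical helper: raise the ipl, never lower it; return the old level. */
static int splraise(int level)
{
    int old = ipl;
    if (level > ipl)
        ipl = level;
    return old;
}

/* Restore a level previously returned by splraise(). */
static void splx(int s)
{
    ipl = s;
}

/* Simplified model: a source masked at 'level' can fire only above the ipl. */
static int intr_deliverable(int level)
{
    return level > ipl;
}

/* Records whether a device interrupt could arrive during the callout. */
static int io_open_in_callout;

static void callout_fn(void *arg)
{
    (void)arg;
    io_open_in_callout = intr_deliverable(ALPHA_PSL_IPL_IO);
}

/* Mirrors the splx(s); c_func(c_arg); s = splhigh(); ordering quoted above. */
static void softclock_model(void (*c_func)(void *), void *c_arg)
{
    int s = splraise(ALPHA_PSL_IPL_HIGH);   /* callout list processing */
    assert(!intr_deliverable(ALPHA_PSL_IPL_IO));
    splx(s);                                /* back to the caller's level */
    c_func(c_arg);                          /* timeout function runs here */
    s = splraise(ALPHA_PSL_IPL_HIGH);
    splx(s);
}
```

Entering softclock_model() with ipl set to ALPHA_PSL_IPL_SOFT, as hardclock() would leave it, the callout runs with device interrupts unmasked in this model, which is the window described above.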

I got the crashdump via 'call boot(RB_NOSYNC|RB_DUMP)' in the debugger,
so I do not think that sync'ing disks changed anything.

I think there is a serious problem with the ad_timeout() function in the
case where the request has actually completed and the timeout was too
short. ad_timeout() has no way to know whether the request it has been
passed is still valid or has been deallocated. Wrapping the function in
splbio() will only narrow the race, not close it, because we're still
going to be at splsoftclock when the function is called.

I think setting the timeout to a reasonable value is a good workaround,
but I'm still concerned about very slow hardware.

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University                         Email: gallatin@cs.duke.edu
Department of Computer Science          Phone: (919) 660-6590

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message
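
As an illustration of the "no way to know if the request is still valid" problem, the following user-space sketch shows one generic way a timeout handler could detect a stale request: hand the timeout a (slot, generation) handle instead of a raw pointer. This is an assumption-laden toy, not the ata driver's code or a proposed patch; struct request, struct handle, req_alloc(), req_complete() and req_timeout() are all hypothetical names. In a real kernel the check in req_timeout() would itself have to run with the completion interrupt blocked, otherwise the race is only narrowed.

```c
#include <assert.h>

#define NSLOTS 4   /* hypothetical fixed-size request pool */

struct request {
    unsigned gen;      /* bumped each time the slot is freed */
    int      in_use;   /* nonzero while the request is outstanding */
    int      retries;  /* incremented by the timeout path */
};

/* What the timeout would be armed with instead of a raw pointer. */
struct handle {
    unsigned slot;
    unsigned gen;
};

static struct request pool[NSLOTS];

/* Allocate a slot; a handle with slot == NSLOTS means the pool is full. */
static struct handle req_alloc(void)
{
    struct handle h = { NSLOTS, 0 };
    for (unsigned i = 0; i < NSLOTS; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].retries = 0;
            h.slot = i;
            h.gen = pool[i].gen;
            break;
        }
    }
    return h;
}

/* Stands in for the interrupt path that completes and frees a request. */
static void req_complete(struct handle h)
{
    assert(h.slot <= NSLOTS);
    if (h.slot < NSLOTS && pool[h.slot].in_use && pool[h.slot].gen == h.gen) {
        pool[h.slot].in_use = 0;
        pool[h.slot].gen++;   /* invalidates every outstanding handle */
    }
}

/* Stands in for the timeout: returns 1 if it acted, 0 if the handle is stale. */
static int req_timeout(struct handle h)
{
    if (h.slot >= NSLOTS)
        return 0;
    struct request *r = &pool[h.slot];
    if (!r->in_use || r->gen != h.gen)
        return 0;             /* request already completed or slot reused */
    r->retries++;
    return 1;
}
```

With this scheme a timeout that fires after completion sees a generation mismatch and returns without touching the request, even if the slot has since been reallocated to a new request.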