Date: Thu, 5 Aug 2004 19:11:22 +0200
From: "Daniel Eriksson" <daniel_k_eriksson@telia.com>
To: 'Søren Schmidt' <sos@DeepCore.dk>
Cc: 'Ville-Pertti Keinonen' <will+freebsd-current@will.iki.fi>
Subject: RE: ATA driver races with interrupts
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA0VcX9IoJqUaXPS8MjT1PdsKAAAAQAAAAMYTGmVxmR0Oc6P/8t/P6dgEAAAAA@telia.com>
In-Reply-To: <411127F0.6080407@DeepCore.dk>
Søren Schmidt wrote:
> > I just applied your patch to clean sources dated 2004.08.04.13.00.00 and ran
> > some tests. Everything seems to be working as it should (just like after the
> > serialization patch from Ville-Pertti that I tried earlier). I will continue
> > running with this patch applied to see if it stays stable.
>
> Good! please keep me posted!

Unfortunately, the machine disconnected one of the SATA discs earlier today. It happened out of the blue: there was no activity at all on either of the two discs other than the SMART monitor.

Aug  5 11:45:47 fortify kernel: ad20: WARNING - removed from configuration
Aug  5 11:45:47 fortify kernel: ata10-master: FAILURE - unknown CMD (0xb0) timed out
Aug  5 11:45:47 fortify smartd[882]: Device: /dev/ad20, not capable of SMART self-check

There were no other interesting messages in the log. As usual, the channel was completely locked after this, and it took an extended power-off (2 min) to unlock it (I really don't know what is up with that).

Once the channel was unlocked, the machine booted but page-faulted in the middle of detecting the attached discs. Another reboot took care of that problem. I'm not sure the page fault info is interesting at all, but here it is:

[...]
ad16: 114473MB <WDC WD1200JB-00DUA3> [232581/16/63] at ata8-master UDMA100
ad18: 26059MB <Maxtor 92732U8> [52946/16/63] at ata9-master UDMA66
ad20: 239372MB <Maxtor 7Y250M0> [486344/16/63] at ata10-master SATA150
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0580904
stack pointer           = 0x10:0xdd6e5c1c
frame pointer           = 0x10:0xdd6e5c44
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 35 (swi5: clock sio)
[thread 100036]
Stopped at      propagate_priority+0x84:        movl    0x24(%eax),%eax
db> trace
propagate_priority(c2734420,c078a9a0,c056f8a9,c0790780,c26e47d0) at propagate_priority+0x84
turnstile_wait(c2735bc0,c078e960,c078a9a0,0,c27440ac) at turnstile_wait+0x31c
_mtx_lock_sleep(c078e960,c2734420,0,0,0) at _mtx_lock_sleep+0xe8
softclock(0,0,ffffffff,ffffbfff,ffffffff) at softclock+0x248
ithread_loop(c26d0080,dd6e5d48,ffffffff,ffffffff,ffffffff) at ithread_loop+0x1a8
fork_exit(c05439c0,c26d0080,dd6e5d48) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xdd6e5d7c, ebp = 0 ---

It should have looked something like this:

[...]
ad16: 114473MB <WDC WD1200JB-00DUA3> [232581/16/63] at ata8-master UDMA100
ad18: 26059MB <Maxtor 92732U8> [52946/16/63] at ata9-master UDMA66
ad20: 239372MB <Maxtor 7Y250M0> [486344/16/63] at ata10-master SATA150
ad22: 238475MB <WDC WD2500JD-00FYB0> [484521/16/63] at ata11-master SATA150
ar0: 476950MB <ATA RAID0 array> [60802/255/63] status: READY subdisks:
 disk0 READY on ad4 at ata2-master
 disk1 READY on ad5 at ata2-slave
ar1: 478744MB <ATA RAID0 array> [61031/255/63] status: READY subdisks:
 disk0 READY on ad6 at ata3-master
 disk1 READY on ad7 at ata3-slave
ar2: 388962MB <ATA RAID0 array> [49585/255/63] status: READY subdisks:
 disk0 READY on ad9 at ata4-slave
 disk1 READY on ad8 at ata4-master
ar3: 228946MB <ATA RAID0 array> [29186/255/63] status: READY subdisks:
 disk0 READY on ad15 at ata7-slave
 disk1 READY on ad16 at ata8-master
Waiting 5 seconds for SCSI devices to settle

[...]

For now I have switched back to Ville-Pertti's patch that serializes the controller, to see if that is more stable.

/Daniel Eriksson