Date: Tue, 24 Jun 2003 00:12:49 +1000
From: Stephen McKay <smckay@internode.on.net>
To: freebsd-hardware@freebsd.org
Cc: Stephen McKay <smckay@internode.on.net>
Subject: Re: ATA READ command timeout (and worse)
Message-ID: <200306231412.h5NECn7K006239@dungeon.home>
In-Reply-To: <200306181052.h5IAqTu2008960@dungeon.home> from Stephen McKay at "Wed, 18 Jun 2003 20:52:29 +1000"
References: <200306171554.h5HFs2DQ041575@mail.synology.com> <200306181052.h5IAqTu2008960@dungeon.home>
On Wednesday, 18th June 2003, Stephen McKay wrote:
>I recompiled the kernel with DDB. A few test runs and I got this:
>
>Jun 18 19:19:44 peon /kernel: ad4: no status, reselecting device
>Jun 18 19:19:44 peon /kernel: ad4: timeout sending command=c8 s=ff e=00
>Jun 18 19:19:44 peon /kernel: ad4: error executing command - resetting
>Jun 18 19:19:44 peon /kernel: ata2: resetting devices ..
>Jun 18 19:19:44 peon /kernel: ad4: removed from configuration
>Jun 18 19:19:44 peon /kernel: ad5: removed from configuration
>Jun 18 19:19:44 peon /kernel: done
>
>Fatal trap 12: page fault while in kernel mode
>fault virtual address = 0x63657865

After I compiled with INVARIANTS, my crash changes a little:

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xdeadc0de
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc012c7f0
stack pointer = 0x10:0xcd7fdbbc
frame pointer = 0x10:0xcd7fdbc8
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 397 (diff)
interrupt mask = bio
kernel: type 12 trap, code=0
Stopped at ad_detach+0x34: cmpl %esi,0(%ebx)
db> trace
ad_detach(c11ba12c,0) at ad_detach+0x34
ata_reinit(c11ba100,c11ba100,0,0,0) at ata_reinit+0x86
ad_transfer(c1302280) at ad_transfer+0x49c
ata_start(c11ba100,0,c12468a4,c651b860,c1246958) at ata_start+0x98
adstrategy(c651b860,c651b860,cd05e780,cd7fdc74,c0192262) at adstrategy+0x95
diskstrategy(c651b860,c1253800,c651b860,c1293a00,cd7fdc80) at diskstrategy+0x95
...

The "0xdeadc0de" address implies the illegal reuse of a freed structure. After poking about a bit in DDB, it looks as though ad_detach is using a data structure after it has been freed. And indeed, it is. After the ad_free(request) call, the TAILQ_FOREACH() macro implicitly reads the request->chain field to find the next element, causing hideous painful death. The solution is to do the TAILQ_FOREACH() manually, and a little more carefully.
Then I'll be able to see if having my disks disappear is recoverable.

Woo hoo! First bug found. I'll see if I can find more. :-)

Stephen.