Date: Wed, 18 Jun 2003 20:52:29 +1000
From: Stephen McKay <smckay@internode.on.net>
To: joshuah@synology.com
Cc: Stephen McKay <smckay@internode.on.net>
Subject: Re: ATA READ command timeout (and worse)
Message-ID: <200306181052.h5IAqTu2008960@dungeon.home>
In-Reply-To: <200306171554.h5HFs2DQ041575@mail.synology.com> from Jaw-Shiang Joshua Huang at "Tue, 17 Jun 2003 23:54:02 +0800"
References: <200306171554.h5HFs2DQ041575@mail.synology.com>
On Tuesday, 17th June 2003, Jaw-Shiang Joshua Huang wrote:

>Because your machine will reboot automatically when the disk driver operation
>is abnormal, it makes me want to know more.
>
>Is your kernel compiled with DDB? If not, it will reboot after 15 seconds
>while hitting panic. If it's reproducable, would you mind to compile a new
>kernel and try to find out where it panic or page fault? I just want to know
>this bug will make FreeBSD kernel reboot or just hit panic or page fault.

I recompiled the kernel with DDB. A few test runs and I got this:

Jun 18 19:19:44 peon /kernel: ad4: no status, reselecting device
Jun 18 19:19:44 peon /kernel: ad4: timeout sending command=c8 s=ff e=00
Jun 18 19:19:44 peon /kernel: ad4: error executing command - resetting
Jun 18 19:19:44 peon /kernel: ata2: resetting devices ..
Jun 18 19:19:44 peon /kernel: ad4: removed from configuration
Jun 18 19:19:44 peon /kernel: ad5: removed from configuration
Jun 18 19:19:44 peon /kernel: done

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x63657865
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0164cd9
stack pointer           = 0x10:0xc02bd438
frame pointer           = 0x10:0xc02bd4c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          =
kernel: type 12 trap, code=0
Stopped at      kvprintf+0x545: repne scasb     (%esi)
db> trace
kvprintf(c028add6,c016456c,c02bd4e0,a,c02bd4fc) at kvprintf+0x545
printf(c028add4,63657865,c1246800,c02bd528,c012d908) at printf+0x44
ata_prtdev(c139a400,c028d280,c028d271,5b512a0,0,0) at ata_prtdev+0x1a
ad_timeout(c13bb200,400000,0,0,ffffffff) at ad_timeout+0x40
softclock(0,10,10,10,ffffffff) at softclock+0xd1
doreti_swi(e,665,2,183f9ff,756e6547) at doreti_swi+0xf
idle_loop() at idle_loop+0x1d
db>

Obviously 0x63657865 is suspicious. On further investigation, the
ata_device structure at 0xc139a400 has been corrupted.
The unit and subsequent fields have been replaced by the text string
"/libexec/ld-elf.so.1", which is odd, to say the least.

Now I don't know what I'm chasing: a random VM bug, bad memory, PCI bus
errors, sagging power, bugs in the ata driver, cosmic rays, space aliens.

It's been a long time since I've had to do any kernel debugging, but I
suppose I'll have to set up a serial console and get to it.

Stephen.
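
A quick way to see why 0x63657865 looks suspicious is to read it as four
ASCII bytes in i386 (little-endian) order. The sketch below is only an
illustration: the helper name fault_address_as_text is made up for this
note, and nothing here assumes any particular layout of the ata_device
structure. It just checks whether the faulting address is a fragment of the
string found in the corrupted memory.

```python
import struct


def fault_address_as_text(addr: int) -> str:
    """Pack a 32-bit value in little-endian (i386) byte order and decode it as ASCII.

    Hypothetical helper for illustration; raises UnicodeDecodeError if the
    bytes are not ASCII.
    """
    return struct.pack("<I", addr).decode("ascii")


# Fault virtual address from the trap message (also printf's second argument
# in the DDB trace).
fault_addr = 0x63657865

# String reported as overwriting the ata_device fields.
overwriting_string = "/libexec/ld-elf.so.1"

fragment = fault_address_as_text(fault_addr)
offset = overwriting_string.find(fragment)

print(fragment)  # exec
print(offset)    # 4
```

The address decodes to "exec", which sits at offset 4 of
"/libexec/ld-elf.so.1". That is consistent with printf being handed a piece
of the string where it expected a pointer, though it says nothing about what
wrote the string there.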