Date: Tue, 18 Oct 2011 18:40:17 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Alexey Shuvaev <shuvaev@physik.uni-wuerzburg.de>
Cc: freebsd-current@freebsd.org
Subject: Re: Panics after AHCI timeouts
Message-ID: <4E9D9DE1.8060501@FreeBSD.org>
In-Reply-To: <20111017190027.GA9873@lexx.ifp.tuwien.ac.at>
References: <20111008201456.GA3529@lexx.ifp.tuwien.ac.at> <20111017190027.GA9873@lexx.ifp.tuwien.ac.at>

Hi.

Alexey Shuvaev wrote:
> On Sat, Oct 08, 2011 at 10:14:56PM +0200, Alexey Shuvaev wrote:
>
> Errr... Replying to myself... Ping? Should I file a PR and put it
> in the back burner? :)

Sorry for not replying, I wasn't home to look at it closely.

>> In the view of upcoming RELEASE-9.0 I should have reported it earlier,
>> but it is better later than never... Every time I wanted to report
>> this, the system was ~one month old and I tried to upgrade it
>> to see, if the problem was still there, waiting for the next panic...
>> and when it finally paniced it was one month old again.
>>
> [snip]
>> >From core.txt.5:
>> [snip]
>> Unread portion of the kernel message buffer:
>> Memory modified after free 0xfffffe000416e200(248) val=79e8800 @ 0xfffffe000416e200
>> panic: Most recently used by cred
>>
>> cpuid = 2
>> Uptime: 20h11m1s
>> Dumping 1308 out of 7914 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
>> [snip]
>> #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:252
>> 252 if (textdump && textdump_pending) {
>> (kgdb) #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:252
>> #1 0xffffffff808234aa in kern_reboot (howto=260)
>>     at /usr/src/sys/kern/kern_shutdown.c:430
>> #2 0xffffffff80822f41 in panic (fmt=Variable "fmt" is not available.
>> )
>>     at /usr/src/sys/kern/kern_shutdown.c:595
>> #3 0xffffffff80a6f7b4 in mtrash_ctor (mem=Variable "mem" is not available.
>> ) at /usr/src/sys/vm/uma_dbg.c:137
>> #4 0xffffffff80a6f01c in uma_zalloc_arg (zone=0xfffffe021ffe0700, udata=0x0,
>>     flags=258) at /usr/src/sys/vm/uma_core.c:2018
>> #5 0xffffffff808108be in malloc (size=Variable "size" is not available.
>> ) at uma.h:305
>> #6 0xffffffff8081c21f in crget () at /usr/src/sys/kern/kern_prot.c:1809
>> #7 0xffffffff8081c269 in crdup (cr=0xfffffe0143103300)
>>     at /usr/src/sys/kern/kern_prot.c:1911
>> #8 0xffffffff808c5ca6 in kern_accessat (td=0xfffffe0007dd7000, fd=-100,
>>     path=0x80065c000 <Address 0x80065c000 out of bounds>,
>>     pathseg=UIO_USERSPACE, flags=Variable "flags" is not available.
>> ) at /usr/src/sys/kern/vfs_syscalls.c:2201
>> #9 0xffffffff8086719a in syscallenter (td=0xfffffe0007dd7000,
>>     sa=0xffffff8223f67bb0) at /usr/src/sys/kern/subr_trap.c:344
>> #10 0xffffffff80b0b43c in syscall (frame=0xffffff8223f67c50)
>>     at /usr/src/sys/amd64/amd64/trap.c:910
>> #11 0xffffffff80af617d in Xfast_syscall ()
>>     at /usr/src/sys/amd64/amd64/exception.S:384
>> #12 0x000000080062dbdc in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> [snip]
>> [last message in dmesg]
>> ahcich0: Timeout on slot 29 port 0
>> ahcich0: is 00000000 cs 00000000 ss ffffffff rs ffffffff tfd 40 serr 00000000 cmd 0000fc17
>> [snip]

Looking at your two backtraces, I don't see anything in common between
them. While the first crash happened within a timer event handler, it
was not an AHCI-related event. The second crash happened inside some
unrelated syscall. I suppose some memory corruption could cause both,
but I have no idea what it is or how it could be related to AHCI. I
could just as well say that some other hardware problem causes both.
Try to collect more statistics.

--
Alexander Motin