Date: Sat, 22 Dec 2001 17:09:38 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: "Kristian K. Nielsen" <jkkn@jkkn.dk>, <bradym@mail.hydrologue.com>,
    <peter.jeremy@alcatel.com.au>, <davidc@acns.ab.ca>, <rnyberg@it.su.se>,
    <david@catwhisker.org>, <freebsd-stable@FreeBSD.ORG>
Subject: More research (was Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers)
Message-ID: <200112230109.fBN19cK98748@apollo.backplane.com>
References: <200112202333.fBKNXZ679605@apollo.backplane.com>
    <29650.193.88.88.10.1008956812.squirrel@webmail.jkkn.net>
    <200112211930.fBLJUs988388@apollo.backplane.com>

    I've been examining Brady's crash dump a bit more and decided to
    try to locate a copy of the corrupted data elsewhere in the dump.

    Here is an excerpt from the dump associated with the corrupted
    vm_page.  Everything from the 70 72 65 2d ... through ... 72 0a 08
    is corruption.

    07e85d14  00 00 00 00 d8 56 b0 c8  00 00 00 00 70 72 65 2d
    --------  66 65 74 63 68 00 00 00  00 e0 0a 08 ff ff ff ff
    --------  00 00 ff ff 00 00 00 00  00 72 0a 08 00 00 00 00
    --------  80 9f 9e c0 08 35 9f c0  00 00 00 00 80 9f 9e c0
    --------  14 35 9f c0 e0 34 ac c8  02 00 00 00 00 00 2a 03
    --------  00 00 00 00 00 00 00 00  68 9d 91 c0

    Here is a matching sequence from elsewhere in the dump.

    06f2dc10  a0 6b 0a 08 60 6c 0a 08  00 00 00 00 80 7f 0a 08  |.k..`l..........|
    06f2dc20  b0 6b 0a 08 70 6c 0a 08  00 00 00 00 80 7f 0a 08  |.k..pl..........|
    06f2dc30  70 72 65 2d 66 65 74 63  68 00 00 00 00 e0 0a 08  |pre-fetch.......|
    06f2dc40  ff ff ff ff ff ff ff ff  00 00 00 00 80 72 0a 08  |.............r..|
    06f2dc50  e0 6b 0a 08 ff ff ff ff  00 00 00 00 80 7f 0a 08  |.k..............|
    06f2dc60  10 6c 0a 08 b0 6c 0a 08  00 00 00 00 00 f0 0a 08  |.l...l..........|
    06f2dc70  20 6c 0a 08 c0 6c 0a 08  00 00 00 00 00 f0 0a 08  | l...l..........|
    06f2dc80  70 6f 73 74 2d 66 65 74  63 68 00 00 00 e1 0a 08  |post-fetch......|
    06f2dc90  ff ff ff ff ff ff ff ff  00 00 00 00 01 bf 0a 08  |................|
    06f2dca0  00 6c 0a 08 f0 6c 0a 08  00 00 00 00 00 bf 0a 08  |.l...l..........|

    As you can see, starting at 06f2dc30 there is an almost exact
    match... including the '72 0a 08'.  The parts that do not match
    exactly can be explained by modifications made to the vm_page
    prior to the panic.

    This portion of the data space *appears* to be anonymous memory
    associated with a make, possibly anonymous memory that was paged
    in or out.  I'm not sure.

    The question still stands:  How the hell did this segment of data,
    approximately 32 bytes (possibly exactly 32 bytes!), on a 32 byte
    boundary, wind up in the middle of a vm_page_t ?

    32 bytes, 32 bytes... cache line?  Two cache lines?  Possible bug
    in our TRAP assembly (if an interrupt occurs during the insw or
    insl rep loop)?  Possible bug in our insb()/insw() inline assembly
    in i386/include/cpufunc.h?  But these crashes occur whether DMA is
    turned on or not, so I dunno.  Ideas anyone?

    Just for the hell of it (shooting in the dark), I would like people
    having this crashing problem with 4.x to apply the following patch.
    It disables the use of FP registers in a bcopy operation.  I know,
    I know, probably can't be it, but as I said, I'm shooting in the
    dark here.

						-Matt

Index: i386/i386/support.s
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/support.s,v
retrieving revision 1.67.2.5
diff -u -r1.67.2.5 support.s
--- i386/i386/support.s	15 Aug 2001 01:23:50 -0000	1.67.2.5
+++ i386/i386/support.s	23 Dec 2001 00:36:27 -0000
@@ -243,6 +243,9 @@
 	 */
 	cmpl	$0,_npxproc
 	je	i586_bz1
+#if 1
+	jmp	intreg_i586_bzero
+#endif
 	cmpl	$256+184,%ecx		/* empirical; not quite 2*108 more */
 	jb	intreg_i586_bzero
 	sarb	$1,kernel_fpu_lock

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
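
The comparison between the two excerpts in the message can be checked mechanically. The following is a minimal Python sketch, not the tooling used in the original analysis: the byte strings are transcribed from the two excerpts, the left-hand column of each excerpt is assumed to be a byte offset, and the names `CORRUPT`, `SOURCE`, `find_all`, and `mismatches` are hypothetical. It measures the length and alignment of the corrupted span and lists the byte positions where the two copies differ.

```python
# Bytes transcribed from the message. CORRUPT is the span marked as
# corruption in the vm_page excerpt; SOURCE is the 32 bytes starting at
# 06f2dc30 in the matching excerpt.
CORRUPT = bytes.fromhex(
    "70 72 65 2d 66 65 74 63 68 00 00 00 00 e0 0a 08 "
    "ff ff ff ff 00 00 ff ff 00 00 00 00 00 72 0a 08"
)
SOURCE = bytes.fromhex(
    "70 72 65 2d 66 65 74 63 68 00 00 00 00 e0 0a 08 "
    "ff ff ff ff ff ff ff ff 00 00 00 00 80 72 0a 08"
)

# Assumption: the first column of each excerpt is a byte offset.
# The corruption begins 12 bytes into the row labelled 07e85d14.
CORRUPT_START = 0x07E85D14 + 12
SOURCE_START = 0x06F2DC30


def find_all(data: bytes, pattern: bytes):
    """Yield every offset at which pattern occurs in data."""
    start = 0
    while True:
        idx = data.find(pattern, start)
        if idx == -1:
            return
        yield idx
        start = idx + 1


def mismatches(a: bytes, b: bytes):
    """Return the indices at which two equal-length byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print("corrupted span length:", len(CORRUPT))
    print("corrupted span offset mod 32:", CORRUPT_START % 32)
    print("source span offset mod 32:", SOURCE_START % 32)
    print("differing byte indices:", mismatches(CORRUPT, SOURCE))
    # To search a real dump, read it as bytes and scan for the leading
    # 16 bytes, e.g. list(find_all(dump_bytes, SOURCE[:16])), where
    # dump_bytes is whatever file contents you have loaded.
```

Searching on only the leading 16 bytes is a deliberate choice in this sketch, since the later bytes may have been modified after the copy landed in the vm_page.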