Date: Mon, 9 Jun 2014 14:23:31 -0500
From: Alan Cox <alc@rice.edu>
To: John-Mark Gurney <jmg@funkthat.com>
Cc: alc@freebsd.org, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject: Re: svn commit: r266850 - in head/sys/arm/xscale: i80321 i8134x ixp425 pxa
Message-ID: <9100CDFA-0C40-4BC8-AA9C-1DE37EEA6208@rice.edu>
In-Reply-To: <20140609174431.GT31367@funkthat.com>
References: <20140601081153.GU43976@funkthat.com> <53935755.70908@rice.edu> <20140608003944.GK31367@funkthat.com> <53949D96.3060409@rice.edu> <20140608235611.GP31367@funkthat.com> <53950BB9.3090808@rice.edu> <20140609042206.GQ31367@funkthat.com> <5395D312.5000302@rice.edu> <20140609163302.GS31367@funkthat.com> <5395E725.7020807@rice.edu> <20140609174431.GT31367@funkthat.com>
On Jun 9, 2014, at 12:44 PM, John-Mark Gurney wrote:

> Alan Cox wrote this message on Mon, Jun 09, 2014 at 11:56 -0500:
>> On 06/09/2014 11:33, John-Mark Gurney wrote:
>>> Alan Cox wrote this message on Mon, Jun 09, 2014 at 10:30 -0500:
>>>> On 06/08/2014 23:22, John-Mark Gurney wrote:
>>>>> Alan Cox wrote this message on Sun, Jun 08, 2014 at 20:19 -0500:
>>>>>> On 06/08/2014 18:56, John-Mark Gurney wrote:
>>>>>>> Alan Cox wrote this message on Sun, Jun 08, 2014 at 12:29 -0500:
>>>>>>>> On 06/07/2014 19:39, John-Mark Gurney wrote:
>>>>>>>>> Alan Cox wrote this message on Sat, Jun 07, 2014 at 13:17 -0500:
>>>>>>>>>> On 06/01/2014 03:11, John-Mark Gurney wrote:
>>>>>>>>>>> Alan Cox wrote this message on Fri, May 30, 2014 at 11:04 -0500:
>>>>>>>>>>>> On 05/30/2014 01:32, John-Mark Gurney wrote:
>>>>>>>>>>>>> Olivier Houchard wrote this message on Thu, May 29, 2014 at 19:38 +0200:
>>>>>>>>>>>>>> On Thu, May 29, 2014 at 10:19:18AM -0700, Adrian Chadd wrote:
>>>>>>>>>>>>>>> On 29 May 2014 10:16, Olivier Houchard <cognet@ci0.org> wrote:
>>>>>>>>>>>>>>>> On Thu, May 29, 2014 at 10:14:53AM -0700, Adrian Chadd wrote:
>>>>>>>>>>>>>>>>> Have you tested this on xscale hardware?
>>>>>>>>>>>>>>>> Yeah, my two last commits were an attempt to get the AVILA kernel to boot
>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>> Woo! What can I provide to help you do this? :-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (Drinks? Food? Donations?)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Drinks and food are always appreciated ;)
>>>>>>>>>>>>>> It almost boots for me now, except a few userland programs gets SIGSEGV or
>>>>>>>>>>>>>> SIGILL along the way, trying to figure out why.
>>>>>>>>>>>>> Thanks for fixing ddb... I'm getting panic messages again... bad
>>>>>>>>>>>>> news is that my panic is still around:
>>>>>>>>>>>>> panic: vm_page_alloc: page 0xc07e73b0 is wired
>>>>>>>>>>>>>
>>>>>>>>>>>>> Though, interestingly, it looks like sparc64 has a similar panic:
>>>>>>>>>>>>> https://www.freebsd.org/cgi/query-pr.cgi?pr=187080
>>>>>>>>>>>>>
>>>>>>>>>>>>> kib, Alan, any clue to why this is happening? Any suggestions as to
>>>>>>>>>>>>> help track it down?
>>>>>>>>>>>> I'm afraid not. The dump below shows a perfectly normal, in-use page.
>>>>>>>>>>>> If this page had actually been free prior to the vm_page_alloc() call,
>>>>>>>>>>>> then other fields, like dirty, would have been different. In other
>>>>>>>>>>>> words, this isn't just a problem with the wire count.
>>>>>>>>>>>>
>>>>>>>>>>>> What object is vm_page_alloc() being performed on?
>>>>>>>>>>> Is this enough? Or do you need more?
>>>>>>>>>>>
>>>>>>>>>>> panic: vm_page_alloc: page 0xc07e73b0 is wired, obj: 0xc1500b40
>>>>>>>>>>> KDB: enter: panic
>>>>>>>>>>> [ thread pid 781 tid 100051 ]
>>>>>>>>>>> Stopped at kdb_enter+0x40: ldrb r15, [r15, r15, ror r15]!
>>>>>>>>>>> db> show object/f 0xc1500b40
>>>>>>>>>>> Object 0xc1500b40: type=2, size=0xa, res=9, ref=0, flags=0x0 ruid -1 charge 0
>>>>>>>>>>>  sref=0, backing_object(0)=(0)+0x0
>>>>>>>>>>>  memory:=(off=0x0,page=0x8f0000),(off=0x1,page=0x8f1000),(off=0x2,page=0x8ee000),(off=0x3,page=0x8ef000),(off=0x4,page=0x8f3000),(off=0x5,page=0x8f4000)
>>>>>>>>>>>  ...(off=0x6,page=0x8fa000),(off=0x7,page=0x8fb000),(off=0x8,page=0x8fc000)
>>>>>>>>>>>
>>>>>>>>>>> If you need more, let me know what/how to get it, and I will...
>>>>>>>>>>>
>>>>>>>>>> Anyone who has seen the "wired page" panic, please try the attached
>>>>>>>>>> patch. It introduces some new KASSERT()s that may help me to narrow
>>>>>>>>>> down the problem.
>>>>>>>>>> I haven't been able to trigger these KASSERT()s on
>>>>>>>>>> amd64, but the symptoms that you guys are reporting are consistent with
>>>>>>>>>> a bug that would trigger these KASSERT()s.
>>>>>>>>> Ok, it triggered the xxx one:
>>>>>>>>> Starting sendmail_msp_queue.
>>>>>>>>> panic: vm_phys_free_contig: xxx
>>>>>>>>> KDB: enter: panic
>>>>>>>>> [ thread pid 782 tid 100051 ]
>>>>>>>>> Stopped at kdb_enter+0x40: ldrb r15, [r15, r15, ror r15]!
>>>>>>>>> db> bt
>>>>>>>>> Tracing pid 782 tid 100051 td 0xc1470000
>>>>>>>>> db_trace_self() at db_trace_self
>>>>>>>>>          pc = 0xc0566ec8  lr = 0xc0566f54 (db_trace_thread+0x50)
>>>>>>>>>          sp = 0xcd830850  fp = 0xc03db694
>>>>>>>>> db_trace_thread() at db_trace_thread+0x50
>>>>>>>>>          pc = 0xc0566f54  lr = 0xc022cd14 (db_command_init+0x620)
>>>>>>>>>          sp = 0xcd8308b0  fp = 0xc03db694
>>>>>>>>> db_command_init() at db_command_init+0x620
>>>>>>>>>          pc = 0xc022cd14  lr = 0xc022c3ec (db_skip_to_eol+0x480)
>>>>>>>>>          sp = 0xcd8308c8  fp = 0xc03db694
>>>>>>>>>          r4 = 0xc0683c30  r5 = 0x00000000
>>>>>>>>> db_skip_to_eol() at db_skip_to_eol+0x480
>>>>>>>>>          pc = 0xc022c3ec  lr = 0xc022c554 (db_command_loop+0x5c)
>>>>>>>>>          sp = 0xcd830968  fp = 0xc03db694
>>>>>>>>>          r4 = 0xcd83097c  r5 = 0xc0683efc
>>>>>>>>>          r6 = 0x00000000  r7 = 0x00000000
>>>>>>>>>          r8 = 0x00000001 r10 = 0x600000d3
>>>>>>>>> db_command_loop() at db_command_loop+0x5c
>>>>>>>>>          pc = 0xc022c554  lr = 0xc022e99c (X_db_sym_numargs+0xec)
>>>>>>>>>          sp = 0xcd830970  fp = 0xc03db694
>>>>>>>>> X_db_sym_numargs() at X_db_sym_numargs+0xec
>>>>>>>>>          pc = 0xc022e99c  lr = 0xc03db8c4 (kdb_trap+0x94)
>>>>>>>>>          sp = 0xcd830a88  fp = 0xc03db694
>>>>>>>>>          r4 = 0x00000000
>>>>>>>>> kdb_trap() at kdb_trap+0x94
>>>>>>>>>          pc = 0xc03db8c4  lr = 0xc0578eb0 (undefinedinstruction+0x2c8)
>>>>>>>>>          sp = 0xcd830aa8  fp = 0xc03db694
>>>>>>>>>          r4 = 0x00000000  r5 = 0x00000000
>>>>>>>>>          r6 = 0x00000000  r7 = 0xcd830b20
>>>>>>>>>          r8 = 0xe7ffffff r10 = 0xe7ffffff
>>>>>>>>> undefinedinstruction() at undefinedinstruction+0x2c8
>>>>>>>>>          pc = 0xc0578eb0  lr = 0xc0568a0c (exception_exit)
>>>>>>>>>          sp = 0xcd830b20  fp = 0xc0613e70
>>>>>>>>>          r4 = 0xffffffff  r5 = 0xffff1004
>>>>>>>>>          r6 = 0xc06d0ebc  r7 = 0xcd830ba4
>>>>>>>>>          r8 = 0xc1470000  r9 = 0x00000013
>>>>>>>>>         r10 = 0x00000010
>>>>>>>>> exception_exit() at exception_exit
>>>>>>>>>          pc = 0xc0568a0c  lr = 0xc03db68c (kdb_enter+0x38)
>>>>>>>>>          sp = 0xcd830b70  fp = 0xc0613e70
>>>>>>>>>          r0 = 0x00000012  r1 = 0x60000013
>>>>>>>>>          r2 = 0xc06df2ac  r3 = 0xc06d0ee8
>>>>>>>>>          r4 = 0xc05e5258  r5 = 0xc06155e8
>>>>>>>>>          r6 = 0xc06d0ebc  r7 = 0xcd830ba4
>>>>>>>>>          r8 = 0xc1470000  r9 = 0x00000013
>>>>>>>>>         r10 = 0x00000010 r12 = 0xc05e2518
>>>>>>>>> kdb_enter() at kdb_enter+0x44
>>>>>>>>>          pc = 0xc03db698  lr = 0xc03aa094 (kern_reboot+0x948)
>>>>>>>>>          sp = 0xcd830b78  fp = 0xc0613e70
>>>>>>>>>          r4 = 0x00000100
>>>>>>>>> kern_reboot() at kern_reboot+0x948
>>>>>>>>>          pc = 0xc03aa094  lr = 0xc03aa164 (kassert_panic+0x68)
>>>>>>>>>          sp = 0xcd830b90  fp = 0xc0613e70
>>>>>>>>>          r4 = 0xc06155e8  r5 = 0xc07e74a0
>>>>>>>>>          r6 = 0xc07e6fa0  r7 = 0x00000004
>>>>>>>>>          r8 = 0x00000010
>>>>>>>>> kassert_panic() at kassert_panic+0x68
>>>>>>>>>          pc = 0xc03aa164  lr = 0xc055a0a8 (vm_phys_free_contig+0x8c)
>>>>>>>>>          sp = 0xcd830bb0  fp = 0xc0613e70
>>>>>>>>>          r0 = 0xc06155e8  r1 = 0xc07e6d20
>>>>>>>>>          r2 = 0xc06e6a70  r3 = 0x00000000
>>>>>>>>>          r4 = 0xc07e73b0
>>>>>>>>> vm_phys_free_contig() at vm_phys_free_contig+0x8c
>>>>>>>>>          pc = 0xc055a0a8  lr = 0xc055ca70 (vm_reserv_startup+0x4bc)
>>>>>>>>>          sp = 0xcd830bd0  fp = 0xc0613e70
>>>>>>>>>          r4 = 0xc08fb2cc  r5 = 0x00000008
>>>>>>>>>          r6 = 0x000000e8  r7 = 0xc08fb280
>>>>>>>>>          r8 = 0x00000005 r10 = 0x00000001
>>>>>>>>> vm_reserv_startup() at vm_reserv_startup+0x4bc
>>>>>>>>>          pc = 0xc055ca70  lr = 0xc055cb40 (vm_reserv_startup+0x58c)
>>>>>>>>>          sp = 0xcd830be8  fp = 0xc0613e70
>>>>>>>>>          r4 = 0xc08fb280  r5 = 0x00000000
>>>>>>>>>          r6 = 0xc14b7280  r7 = 0x00000040
>>>>>>>>>          r8 = 0x00000000
>>>>>>>>> vm_reserv_startup() at vm_reserv_startup+0x58c
>>>>>>>>>          pc = 0xc055cb40  lr = 0xc055ce08 (vm_reserv_reclaim_inactive+0x34)
>>>>>>>>>          sp = 0xcd830bf0  fp = 0xc0613e70
>>>>>>>>>          r4 = 0xc06e6550
>>>>>>>>> vm_reserv_reclaim_inactive() at vm_reserv_reclaim_inactive+0x34
>>>>>>>>>          pc = 0xc055ce08  lr = 0xc0554cb8 (vm_page_alloc+0x280)
>>>>>>>>>          sp = 0xcd830bf8  fp = 0xc0613e70
>>>>>>>>> vm_page_alloc() at vm_page_alloc+0x280
>>>>>>>>>          pc = 0xc0554cb8  lr = 0xc0540eb0 (vm_fault_hold+0x60c)
>>>>>>>>>          sp = 0xcd830c30  fp = 0xcd830dac
>>>>>>>>>          r4 = 0xc14b7280  r5 = 0xc0618d00
>>>>>>>>>          r6 = 0xcd830eb0  r7 = 0xc1470000
>>>>>>>>>          r8 = 0xcd830e60  r9 = 0x00000000
>>>>>>>>>         r10 = 0x00000000
>>>>>>>>> vm_fault_hold() at vm_fault_hold+0x60c
>>>>>>>>>          pc = 0xc0540eb0  lr = 0xc05426b8 (vm_fault+0x44)
>>>>>>>>>          sp = 0xcd830db0  fp = 0x00000002
>>>>>>>>>          r4 = 0xc14c8a0c  r5 = 0xc0618d00
>>>>>>>>>          r6 = 0xcd830eb0  r7 = 0xc1470000
>>>>>>>>>          r8 = 0xcd830e60  r9 = 0x00000001
>>>>>>>>>         r10 = 0x00000000
>>>>>>>>> vm_fault() at vm_fault+0x44
>>>>>>>>>          pc = 0xc05426b8  lr = 0xc05782d0 (data_abort_handler+0x35c)
>>>>>>>>>          sp = 0xcd830dc0  fp = 0x00000002
>>>>>>>>> data_abort_handler() at data_abort_handler+0x35c
>>>>>>>>>          pc = 0xc05782d0  lr = 0xc0568a0c (exception_exit)
>>>>>>>>>          sp = 0xcd830dc0  fp = 0x00000002
>>>>>>>>> data_abort_handler() at data_abort_handler+0x35c
>>>>>>>>>          pc = 0xc05782d0  lr = 0xc0568a0c (exception_exit)
>>>>>>>>>          sp = 0xcd830e60  fp = 0x20c43000
>>>>>>>>>          r4 = 0xffffffff  r5 = 0xffff1004
>>>>>>>>>          r6 = 0x00000000  r7 = 0x20443740
>>>>>>>>>          r8 = 0x0009b8e4  r9 = 0x00000001
>>>>>>>>>         r10 = 0x00000004
>>>>>>>>> exception_exit() at exception_exit
>>>>>>>>>          pc = 0xc0568a0c  lr = 0x204140d0 (0x204140d0)
>>>>>>>>>          sp = 0xcd830eb0  fp = 0x20c43000
>>>>>>>>>          r0 = 0x00000000  r1 = 0x20c4302c
>>>>>>>>>          r2 = 0x00000004  r3 = 0x00000000
>>>>>>>>>          r4 = 0x20446190  r5 = 0x20c4302c
>>>>>>>>>          r6 = 0x00000000  r7 = 0x20443740
>>>>>>>>>          r8 = 0x0009b8e4  r9 = 0x00000001
>>>>>>>>>         r10 = 0x00000004 r12 = 0x00000001
>>>>>>>>> Unable to unwind into user mode
>>>>>>>>>
>>>>>>>>> Hope this helps, let me know if you need anything else...
>>>>>>>>>
>>>>>>>> Please try the attached patch. It adds another KASSERT() loop.
>>>>>>>>
>>>>>>>> Depending on which KASSERT() fires, that will tell us whether to look
>>>>>>>> deeper at this function or its caller for the source of the problem.
>>>>>>> Ok, that panic is:
>>>>>>> panic: vm_phys_free_contig: start 0xc07e6d20 21 24
>>>>>>>
>>>>>>> Let me know if you need any more info... oh, btw, the last %u needed
>>>>>>> to be %lu since it was a u_long, not an unsigned...
>>>>>>>
>>>>>> Ok. Here is the next debug patch.
>>>>> so, it's crashing in the same place:
>>>>> panic: vm_phys_free_contig: start 0xc07e6d20 21 24
>>>>>
>>>>> so, I commented out this KASSERT, and now it panics with:
>>>>> panic: vm_phys_free_contig: xxx 0xc07e6fa0 13 16
>>>>>
>>>>> so I commented out this KASSERT too, and it panics back w/ the original
>>>>> panic.. So it didn't hit the new KASSERT in vm_reserv_break...
>>>> Next patch...It should panic in vm_reserv_break this time and tell me if
>>>> the reservation being broken belongs to the same object as the inuse
>>>> page that is being inappropriately freed.
>>> So, bad news... still panics with:
>>> panic: vm_phys_free_contig: start 0xc07e6d20 21 24
>>>
>>> This panic seems to be consistent now, in that the start address is
>>> always the same... Is there a way you could add various debugging
>>> for this specific vm page to catch a stack trace (stack(9)) where it's
>>> going wrong?
>>>
>>
>> I made a mistake with the new KASSERT()s in vm_reserv_break(). Try this.
>
> No worried, the new patch panics:
> panic: vm_reserv_break: 2 saved_object=0xc06e6378 x=253 m_tmp->object=0xc06e6378 (1)
>

Is your arm processor running in big-endian or little-endian mode?

> w/ a bt of:
> [...]
> vm_reserv_startup() at vm_reserv_startup+0x570
>          pc = 0xc055cd94  lr = 0xc055cec8 (vm_reserv_startup+0x6a4)
>          sp = 0xcd833be8  fp = 0xc06142d0
>          r4 = 0xc08fb280  r5 = 0x00000000
>          r6 = 0xc14b76e0  r7 = 0x00000000
>          r8 = 0x00000000  r9 = 0x00000033
>         r10 = 0x00000001
> vm_reserv_startup() at vm_reserv_startup+0x6a4
>          pc = 0xc055cec8  lr = 0xc055d190 (vm_reserv_reclaim_inactive+0x34)
>          sp = 0xcd833bf0  fp = 0xc06142d0
>          r4 = 0xc06e6550
> vm_reserv_reclaim_inactive() at vm_reserv_reclaim_inactive+0x34
>          pc = 0xc055d190  lr = 0xc0554eb0 (vm_page_alloc+0x280)
>          sp = 0xcd833bf8  fp = 0xc06142d0
> vm_page_alloc() at vm_page_alloc+0x280
>          pc = 0xc0554eb0  lr = 0xc0540ebc (vm_fault_hold+0x60c)
>          sp = 0xcd833c30  fp = 0xcd833dac
>          r4 = 0xc14b76e0  r5 = 0xc0619288
>          r6 = 0xcd833eb0  r7 = 0xc0f7ec80
>          r8 = 0xcd833e60  r9 = 0x00000000
>         r10 = 0x00000000
> vm_fault_hold() at vm_fault_hold+0x60c
>          pc = 0xc0540ebc  lr = 0xc05426c4 (vm_fault+0x44)
>          sp = 0xcd833db0  fp = 0x00000002
>          r4 = 0xc14c66ec  r5 = 0xc0619288
>          r6 = 0xcd833eb0  r7 = 0xc0f7ec80
>          r8 = 0xcd833e60  r9 = 0x00000001
>         r10 = 0x00000000
> vm_fault() at vm_fault+0x44
>          pc = 0xc05426c4  lr = 0xc05786d0 (data_abort_handler+0x35c)
>          sp = 0xcd833dc0  fp = 0x00000002
> data_abort_handler() at data_abort_handler+0x35c
>          pc = 0xc05786d0  lr = 0xc0568dc8 (exception_exit)
>          sp = 0xcd833e60  fp = 0x00000000
>          r4 = 0xffffffff  r5 = 0xffff1004
>          r6 = 0x001b7740  r7 = 0x00052ec4
>          r8 = 0x00000000  r9 = 0x000cc4b0
>         r10 = 0x00000000
> exception_exit() at exception_exit
>          pc = 0xc0568dc8  lr = 0x203f1208 (0x203f1208)
>          sp = 0xcd833eb0  fp = 0x00000000
>          r0 = 0x20c53e60  r1 = 0x00000000
>          r2 = 0x000eeb40  r3 = 0x00000001
>          r4 = 0x00000000  r5 = 0x000e9654
>          r6 = 0x001b7740  r7 = 0x00052ec4
>          r8 = 0x00000000  r9 = 0x000cc4b0
>         r10 = 0x00000000 r12 = 0x200d26a4
>
> Let me know if you need any more information..
>
> Thanks for tracking this down.
>
> --
> John-Mark Gurney				Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."
>