Date: Tue, 25 Mar 2014 07:35:55 -0600 From: Ian Lepore <ian@FreeBSD.org> To: Wojciech Macek <wma@semihalf.com> Cc: freebsd-arm@FreeBSD.org Subject: Re: arm SMP on Cortex-A15 Message-ID: <1395754555.81853.65.camel@revolution.hippie.lan> In-Reply-To: <CANsEV8fJaBmSD7i01BoG8%2BKDPg0ZD5mdHkpLeyyYNB-8sSse_w@mail.gmail.com> References: <CANsEV8euHTsfviiCMP_aet3qYiK2T-oK%2B-37eay7zAPH2S2vaA@mail.gmail.com> <20131220125638.GA5132@mail.bsdpad.com> <20131222092913.GA89153@mail.bsdpad.com> <CANsEV8fSoygoSUyQqKoEQ7tRxjqDOwrPD8dU7O2V2PXRj35j4A@mail.gmail.com> <20131222123636.GA61193@ci0.org> <CANsEV8fWvUkFHi8DP6Nr807RwPDB1iZrO39fpfa44qOkJPidZA@mail.gmail.com> <1395149146.1149.586.camel@revolution.hippie.lan> <1395254911.80941.9.camel@revolution.hippie.lan> <CANsEV8c047SNF61EgP6AiMR2oY=ofcMuTWYZnd60bRmp2Sk9HA@mail.gmail.com> <1395320561.80941.13.camel@revolution.hippie.lan> <CANsEV8f-Cyte-TO%2BCfWVcC_zw5dkYJT8Qfi92eH7yrfhqBvgjg@mail.gmail.com> <1395494401.81853.34.camel@revolution.hippie.lan> <CANsEV8eB1SQHjKrBbG=HCZyb6wuwZmmpq6BKfVatOHm8UtZ9ig@mail.gmail.com> <CANsEV8fJaBmSD7i01BoG8%2BKDPg0ZD5mdHkpLeyyYNB-8sSse_w@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Good, that means we can stop searching for the missing tlb flush when a pte is invalidated. :) You guys can commit whenever you're ready (or I can do it if you'd like). -- Ian On Tue, 2014-03-25 at 11:14 +0100, Wojciech Macek wrote: > Hi Ian, > > The r263251 fix helped, thanks! > So now, if you don';t have any objections, I will clean up a little the > cpufunc.S & pmap-v6 changes and make them ready for submitting. > > Regards, > Wojtek > > > 2014-03-24 14:31 GMT+01:00 Wojciech Macek <wma@semihalf.com>: > > > Without the unconditional invalidation, the panic shows up just at the > > beginning, after rootfs is mounted and init scripts are running. When a > > userspace process is exitting, its memory resources are freed - this is the > > moment pmap_remove_pages fails due to tharanslation fault. It is the > > "typical" crash I observed when TLB-cache holds an old entry. Below there > > is a backtrace, but I doubt if it can be helpful. > > > > Regarding old pte/tlb, the TLB cache contains entry from old process > > context, when in-memory-PTE value is already correct - at least this was > > the scenario when I debugged it last year. So, invalidating after *pte=0 is > > definitely not our case. The issue shows up only on a15, where the > > tlb-prefetcher can cache pte entries anytime. > > > > I believe I don't have r263251 integrated. I'll give it a try - typically, > > the tlb-caused crash appears only on pages containing shared libraries code > > (with executable attr), so there is a chance Olivier's fix help. > > > > > > The fault: > > > > vm_fault(0xc5b894f0, 0, 2, 0) -> 1 > > Fatal kernel mode data abort: 'Translation Fault (P)' > > trapframe: 0xef2cca40 > > FSR=00000817, FAR=00000030, spsr=60000013 > > r0 =00000000, r1 =c320a048, r2 =00000000, r3 =c3208074 > > r4 =c5b7cd08, r5 =c5b7cd04, r6 =c5b05800, r7 =c5b895ac > > r8 =c320a044, r9 =fffffffe, r10=c5b895ac, r11=ef2ccae0 > > r12=00000000, ssp=ef2cca90, slr=c0604148, pc =c0628a60 > > > > [ thread pid 83 tid 100050 ] > > Stopped at pmap_remove_pages+0x270: streq r3, [r0, #0x030] > > db> bt > > Tracing pid 83 tid 100050 td 0xc5bc4320 > > db_trace_self() at db_trace_self > > pc = 0xc061f62c lr = 0xc024ddbc (db_hex2dec+0x498) > > sp = 0xef2cc738 fp = 0xef2cc750 > > r10 = 0xc0708270 > > db_hex2dec() at db_hex2dec+0x498 > > pc = 0xc024ddbc lr = 0xc024d76c (db_command_loop+0x2f0) > > sp = 0xef2cc758 fp = 0xef2cc7f8 > > r4 = 0x00000000 r5 = 0x00000000 > > r6 = 0xc0695cf1 > > db_command_loop() at db_command_loop+0x2f0 > > pc = 0xc024d76c lr = 0xc024d4dc (db_command_loop+0x60) > > sp = 0xef2cc800 fp = 0xef2cc810 > > r4 = 0xc0666f88 r5 = 0xc067b997 > > r6 = 0xc0752954 r7 = 0xc0748f80 > > r8 = 0xef2cca40 r9 = 0xc07084e0 > > r10 = 0xc0748f84 > > db_command_loop() at db_command_loop+0x60 > > pc = 0xc024d4dc lr = 0xc024ffb8 (X_db_symbol_values+0x254) > > sp = 0xef2cc818 fp = 0xef2cc938 > > r4 = 0x00000000 r5 = 0xef2cc820 > > r6 = 0xc0748fb0 > > X_db_symbol_values() at X_db_symbol_values+0x254 > > pc = 0xc024ffb8 lr = 0xc0430554 (kdb_trap+0x164) > > sp = 0xef2cc940 fp = 0xef2cc968 > > r4 = 0x00000000 r5 = 0x00000817 > > r6 = 0xc0748fb0 r7 = 0xc0748f80 > > kdb_trap() at kdb_trap+0x164 > > pc = 0xc0430554 lr = 0xc0632ef0 (data_abort_handler+0x7dc) > > sp = 0xef2cc970 fp = 0xef2cc988 > > r4 = 0xef2cca40 r5 = 0x600000d3 > > r6 = 0x00000030 r7 = 0x00000817 > > r8 = 0xc5b894f0 r9 = 0x00000001 > > r10 = 0xef2cca40 > > data_abort_handler() at data_abort_handler+0x7dc > > pc = 0xc0632ef0 lr = 0xc0632cc0 (data_abort_handler+0x5ac) > > sp = 0xef2cc990 fp = 0xef2cca38 > > r4 = 0x00000817 r5 = 0xc5bc4320 > > r6 = 0xc5a47a0c r7 = 0x00000004 > > data_abort_handler() at data_abort_handler+0x5ac > > pc = 0xc0632cc0 lr = 0xc0621214 (exception_exit) > > sp = 0xef2cca40 fp = 0xef2ccae0 > > r4 = 0xc5b7cd08 r5 = 0xc5b7cd04 > > r6 = 0xc5b05800 r7 = 0xc5b895ac > > r8 = 0xc320a044 r9 = 0xfffffffe > > r10 = 0xc5b895ac > > exception_exit() at exception_exit > > pc = 0xc0621214 lr = 0xc0604148 (PHYS_TO_VM_PAGE+0x48) > > sp = 0xef2cca94 fp = 0xef2ccae0 > > r0 = 0x00000000 r1 = 0xc320a048 > > r2 = 0x00000000 r3 = 0xc3208074 > > r4 = 0xc5b7cd08 r5 = 0xc5b7cd04 > > r6 = 0xc5b05800 r7 = 0xc5b895ac > > r8 = 0xc320a044 r9 = 0xfffffffe > > r10 = 0xc5b895ac r12 = 0x00000000 > > pmap_remove_pages() at pmap_remove_pages+0x270 > > pc = 0xc0628a60 lr = 0xc05f2d08 (vmspace_exit+0xd8) > > sp = 0xef2ccae8 fp = 0xef2ccb10 > > r4 = 0xc5b895a8 r5 = 0xc5bc4320 > > r6 = 0x00000001 r7 = 0xc5a47960 > > r8 = 0xc5b895ac r9 = 0xc5b894f0 > > r10 = 0xc0753be0 > > vmspace_exit() at vmspace_exit+0xd8 > > pc = 0xc05f2d08 lr = 0xc03a7348 (exit1+0x930) > > sp = 0xef2ccb18 fp = 0xef2ccb70 > > r4 = 0xc5a479fc r5 = 0x00000004 > > r6 = 0xc583861c r7 = 0x00000001 > > r8 = 0xc5a47960 r9 = 0xc5bc4320 > > r10 = 0xc5a47a0c > > exit1() at exit1+0x930 > > pc = 0xc03a7348 lr = 0xc03f1604 (sigexit+0x8c4) > > sp = 0xef2ccb78 fp = 0xef2ccd68 > > r4 = 0x00000002 r5 = 0xc5bc4320 > > r6 = 0xc5a47960 r7 = 0xc5a47a0c > > r8 = 0xc5bc4320 r9 = 0xc5b7a000 > > r10 = 0x00000002 > > sigexit() at sigexit+0x8c4 > > pc = 0xc03f1604 lr = 0xc03f23a0 (postsig+0x39c) > > sp = 0xef2ccd70 fp = 0xef2cce18 > > r4 = 0x00000001 r5 = 0xc5bc4320 > > r6 = 0xc5a47960 r7 = 0xc5b7aab8 > > r8 = 0xc5a47a0c r9 = 0xc5b7a000 > > r10 = 0x00000002 > > postsig() at postsig+0x39c > > pc = 0xc03f23a0 lr = 0xc044388c (ast+0x4f4) > > sp = 0xef2cce20 fp = 0xef2cce58 > > r4 = 0x00000001 r5 = 0xc5bc4320 > > r6 = 0xc5a47960 r7 = 0xc5a47a0c > > r8 = 0xc5a47a0c r9 = 0x01020804 > > r10 = 0x00000ab8 > > ast() at ast+0x4f4 > > pc = 0xc044388c lr = 0xc0621080 (swi_entry+0x6c) > > sp = 0xef2cce60 fp = 0xbfffe438 > > r4 = 0x40000013 r5 = 0xc5bc4320 > > r6 = 0x00000001 r7 = 0x00000154 > > r8 = 0x20037008 r9 = 0xbfffee5c > > r10 = 0xbfffea10 > > swi_entry() at swi_entry+0x6c > > pc = 0xc0621080 lr = 0xc0621080 (swi_entry+0x6c) > > sp = 0xef2cce60 fp = 0xbfffe438 > > Unable to unwind further > > db> > > > > > > > > 2014-03-22 14:20 GMT+01:00 Ian Lepore <ian@freebsd.org>: > > > > On Fri, 2014-03-21 at 07:20 +0100, Wojciech Macek wrote: > >> > No, changing flushD to flushID did not make any difference, but I think > >> it > >> > should be there - D-only flushing might not be sufficient. > >> > > >> > >> Olivier reminded me right after I posted that: last week I made a change > >> to cpufunc.c that makes flushD and flushID the same. So of course it > >> made no difference. :) It really should be flushID though, in case > >> that ever changes. > >> > >> You didn't say whether you have that change, which was r263251. > >> > >> > Currently, I'm running pmap_kernel_internal attached below. It is doing > >> > unconditional flushID at the end, just like the old comment was saying > >> :) > >> > SMP seems to be stable. > >> > > >> > >> That seems to say that somehow there is a valid TLB entry even though > >> the old pte for that entry is zero. That means there's a problem > >> somewhere else in the code, but I don't see it. It looks to me like we > >> do a TLB flush everywhere that we zero out a pte. > >> > >> You said without the unconditional flush it panics at startup. Where in > >> startup? Early, or after init is launched or what? Where does the > >> panic backtrace to? > >> > >> If we've got some other pte/tlb maintenance problem, I'd hate to hide it > >> with this unconditional flush and have it appear as some other problem > >> later that will be even harder to track down. > >> > >> -- Ian > >> > >> > >> > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1395754555.81853.65.camel>