Date: Fri, 28 Mar 2014 17:12:43 -0600 From: Ian Lepore <ian@FreeBSD.org> To: Wojciech Macek <wma@semihalf.com> Cc: freebsd-arm@FreeBSD.org Subject: Re: arm SMP on Cortex-A15 Message-ID: <1396048363.81853.177.camel@revolution.hippie.lan> In-Reply-To: <CANsEV8dmTZbWfNWDh77-CWOdVSCZfOucZLEs%2BdXvZQbb8PGbgQ@mail.gmail.com> References: <CANsEV8euHTsfviiCMP_aet3qYiK2T-oK%2B-37eay7zAPH2S2vaA@mail.gmail.com> <20131220125638.GA5132@mail.bsdpad.com> <20131222092913.GA89153@mail.bsdpad.com> <CANsEV8fSoygoSUyQqKoEQ7tRxjqDOwrPD8dU7O2V2PXRj35j4A@mail.gmail.com> <20131222123636.GA61193@ci0.org> <CANsEV8fWvUkFHi8DP6Nr807RwPDB1iZrO39fpfa44qOkJPidZA@mail.gmail.com> <1395149146.1149.586.camel@revolution.hippie.lan> <1395254911.80941.9.camel@revolution.hippie.lan> <CANsEV8c047SNF61EgP6AiMR2oY=ofcMuTWYZnd60bRmp2Sk9HA@mail.gmail.com> <1395320561.80941.13.camel@revolution.hippie.lan> <CANsEV8f-Cyte-TO%2BCfWVcC_zw5dkYJT8Qfi92eH7yrfhqBvgjg@mail.gmail.com> <1395494401.81853.34.camel@revolution.hippie.lan> <CANsEV8eB1SQHjKrBbG=HCZyb6wuwZmmpq6BKfVatOHm8UtZ9ig@mail.gmail.com> <CANsEV8fJaBmSD7i01BoG8%2BKDPg0ZD5mdHkpLeyyYNB-8sSse_w@mail.gmail.com> <1395754555.81853.65.camel@revolution.hippie.lan> <CANsEV8fGAuRvGYf5cNLd8YuMFwbaGra6DLmv5_S2J1sgj9eNnQ@mail.gmail.com> <CANsEV8dmTZbWfNWDh77-CWOdVSCZfOucZLEs%2BdXvZQbb8PGbgQ@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
I finally got around to looking at your latest patch. Some comments... I think the changes to use ID flush in the following functions can't possibly be needed, because the memory in question is page table entries or other things that cannot be executable: pmap_l2ptp_ctor(), pmap_grow_map(), pmap_growkernel(), pmap_copy_page_generic(), pmap_copy_pages(). The change in vector_page_setprot() is probably unnecessary, but harmless. There is no need for the L2_S_EXECUTABLE() check -- we just forced it to be executable on the previous line. In pmap_postinit() I'm pretty sure an ID flush isn't needed, but harmless since it's only called once at startup In general, since flushD and flushID are the same code none of these changes is bad right now. There may come a day when we want to remove the BTB flush in the the flushD case for performance, and having a bunch of routines calling flushID when flushD is all that's really needed will mean that someone will have to do all this analysis again to actually get any performance benefit. -- Ian On Thu, 2014-03-27 at 14:27 +0100, Wojciech Macek wrote: > Sorry, I meant > https://drive.google.com/file/d/0B-7yTLrPxaWtRjJvTzJZY3g5akU/edit?usp=sharing > > W. > > > 2014-03-27 14:10 GMT+01:00 Wojciech Macek <wma@semihalf.com>: > > > I started cleaning up and have some concerns. Sorry guys, the r263251 > > still looks like a workaround to me :) > > The fact that TLB cache issue is seen only on executable pages and that > > Olivier's fix helps suggests, that we are missing flush-I somewhere (crash > > appears only when deallocating userspace vm, so my guess is pmap_kremove, > > where definitely should be flushID). For test purposes I reverted r263251 > > and added additional flushID/flushD logic to the pmap-v6 in > > performance-critical sections, or modified flushD to flushID where the > > speed is not the case. All changes integrated with previous patches can be > > found here: > > > > https://drive.google.com/file/d/0B-7yTLrPxaWtRFRHTUtpZ3VKbG8/edit?usp=sharing > > With above patch I'm able to run all tests without any problems. > > > > Maybe it would be better to fix these suspicious places in pmap instead? > > > > Regards, > > Wojtek > > > > > > 2014-03-25 14:35 GMT+01:00 Ian Lepore <ian@freebsd.org>: > > > > Good, that means we can stop searching for the missing tlb flush when a > >> pte is invalidated. :) You guys can commit whenever you're ready (or I > >> can do it if you'd like). > >> > >> -- Ian > >> > >> On Tue, 2014-03-25 at 11:14 +0100, Wojciech Macek wrote: > >> > Hi Ian, > >> > > >> > The r263251 fix helped, thanks! > >> > So now, if you don';t have any objections, I will clean up a little the > >> > cpufunc.S & pmap-v6 changes and make them ready for submitting. > >> > > >> > Regards, > >> > Wojtek > >> > > >> > > >> > 2014-03-24 14:31 GMT+01:00 Wojciech Macek <wma@semihalf.com>: > >> > > >> > > Without the unconditional invalidation, the panic shows up just at the > >> > > beginning, after rootfs is mounted and init scripts are running. When > >> a > >> > > userspace process is exitting, its memory resources are freed - this > >> is the > >> > > moment pmap_remove_pages fails due to tharanslation fault. It is the > >> > > "typical" crash I observed when TLB-cache holds an old entry. Below > >> there > >> > > is a backtrace, but I doubt if it can be helpful. > >> > > > >> > > Regarding old pte/tlb, the TLB cache contains entry from old process > >> > > context, when in-memory-PTE value is already correct - at least this > >> was > >> > > the scenario when I debugged it last year. So, invalidating after > >> *pte=0 is > >> > > definitely not our case. The issue shows up only on a15, where the > >> > > tlb-prefetcher can cache pte entries anytime. > >> > > > >> > > I believe I don't have r263251 integrated. I'll give it a try - > >> typically, > >> > > the tlb-caused crash appears only on pages containing shared > >> libraries code > >> > > (with executable attr), so there is a chance Olivier's fix help. > >> > > > >> > > > >> > > The fault: > >> > > > >> > > vm_fault(0xc5b894f0, 0, 2, 0) -> 1 > >> > > Fatal kernel mode data abort: 'Translation Fault (P)' > >> > > trapframe: 0xef2cca40 > >> > > FSR=00000817, FAR=00000030, spsr=60000013 > >> > > r0 =00000000, r1 =c320a048, r2 =00000000, r3 =c3208074 > >> > > r4 =c5b7cd08, r5 =c5b7cd04, r6 =c5b05800, r7 =c5b895ac > >> > > r8 =c320a044, r9 =fffffffe, r10=c5b895ac, r11=ef2ccae0 > >> > > r12=00000000, ssp=ef2cca90, slr=c0604148, pc =c0628a60 > >> > > > >> > > [ thread pid 83 tid 100050 ] > >> > > Stopped at pmap_remove_pages+0x270: streq r3, [r0, > >> #0x030] > >> > > db> bt > >> > > Tracing pid 83 tid 100050 td 0xc5bc4320 > >> > > db_trace_self() at db_trace_self > >> > > pc = 0xc061f62c lr = 0xc024ddbc (db_hex2dec+0x498) > >> > > sp = 0xef2cc738 fp = 0xef2cc750 > >> > > r10 = 0xc0708270 > >> > > db_hex2dec() at db_hex2dec+0x498 > >> > > pc = 0xc024ddbc lr = 0xc024d76c (db_command_loop+0x2f0) > >> > > sp = 0xef2cc758 fp = 0xef2cc7f8 > >> > > r4 = 0x00000000 r5 = 0x00000000 > >> > > r6 = 0xc0695cf1 > >> > > db_command_loop() at db_command_loop+0x2f0 > >> > > pc = 0xc024d76c lr = 0xc024d4dc (db_command_loop+0x60) > >> > > sp = 0xef2cc800 fp = 0xef2cc810 > >> > > r4 = 0xc0666f88 r5 = 0xc067b997 > >> > > r6 = 0xc0752954 r7 = 0xc0748f80 > >> > > r8 = 0xef2cca40 r9 = 0xc07084e0 > >> > > r10 = 0xc0748f84 > >> > > db_command_loop() at db_command_loop+0x60 > >> > > pc = 0xc024d4dc lr = 0xc024ffb8 (X_db_symbol_values+0x254) > >> > > sp = 0xef2cc818 fp = 0xef2cc938 > >> > > r4 = 0x00000000 r5 = 0xef2cc820 > >> > > r6 = 0xc0748fb0 > >> > > X_db_symbol_values() at X_db_symbol_values+0x254 > >> > > pc = 0xc024ffb8 lr = 0xc0430554 (kdb_trap+0x164) > >> > > sp = 0xef2cc940 fp = 0xef2cc968 > >> > > r4 = 0x00000000 r5 = 0x00000817 > >> > > r6 = 0xc0748fb0 r7 = 0xc0748f80 > >> > > kdb_trap() at kdb_trap+0x164 > >> > > pc = 0xc0430554 lr = 0xc0632ef0 (data_abort_handler+0x7dc) > >> > > sp = 0xef2cc970 fp = 0xef2cc988 > >> > > r4 = 0xef2cca40 r5 = 0x600000d3 > >> > > r6 = 0x00000030 r7 = 0x00000817 > >> > > r8 = 0xc5b894f0 r9 = 0x00000001 > >> > > r10 = 0xef2cca40 > >> > > data_abort_handler() at data_abort_handler+0x7dc > >> > > pc = 0xc0632ef0 lr = 0xc0632cc0 (data_abort_handler+0x5ac) > >> > > sp = 0xef2cc990 fp = 0xef2cca38 > >> > > r4 = 0x00000817 r5 = 0xc5bc4320 > >> > > r6 = 0xc5a47a0c r7 = 0x00000004 > >> > > data_abort_handler() at data_abort_handler+0x5ac > >> > > pc = 0xc0632cc0 lr = 0xc0621214 (exception_exit) > >> > > sp = 0xef2cca40 fp = 0xef2ccae0 > >> > > r4 = 0xc5b7cd08 r5 = 0xc5b7cd04 > >> > > r6 = 0xc5b05800 r7 = 0xc5b895ac > >> > > r8 = 0xc320a044 r9 = 0xfffffffe > >> > > r10 = 0xc5b895ac > >> > > exception_exit() at exception_exit > >> > > pc = 0xc0621214 lr = 0xc0604148 (PHYS_TO_VM_PAGE+0x48) > >> > > sp = 0xef2cca94 fp = 0xef2ccae0 > >> > > r0 = 0x00000000 r1 = 0xc320a048 > >> > > r2 = 0x00000000 r3 = 0xc3208074 > >> > > r4 = 0xc5b7cd08 r5 = 0xc5b7cd04 > >> > > r6 = 0xc5b05800 r7 = 0xc5b895ac > >> > > r8 = 0xc320a044 r9 = 0xfffffffe > >> > > r10 = 0xc5b895ac r12 = 0x00000000 > >> > > pmap_remove_pages() at pmap_remove_pages+0x270 > >> > > pc = 0xc0628a60 lr = 0xc05f2d08 (vmspace_exit+0xd8) > >> > > sp = 0xef2ccae8 fp = 0xef2ccb10 > >> > > r4 = 0xc5b895a8 r5 = 0xc5bc4320 > >> > > r6 = 0x00000001 r7 = 0xc5a47960 > >> > > r8 = 0xc5b895ac r9 = 0xc5b894f0 > >> > > r10 = 0xc0753be0 > >> > > vmspace_exit() at vmspace_exit+0xd8 > >> > > pc = 0xc05f2d08 lr = 0xc03a7348 (exit1+0x930) > >> > > sp = 0xef2ccb18 fp = 0xef2ccb70 > >> > > r4 = 0xc5a479fc r5 = 0x00000004 > >> > > r6 = 0xc583861c r7 = 0x00000001 > >> > > r8 = 0xc5a47960 r9 = 0xc5bc4320 > >> > > r10 = 0xc5a47a0c > >> > > exit1() at exit1+0x930 > >> > > pc = 0xc03a7348 lr = 0xc03f1604 (sigexit+0x8c4) > >> > > sp = 0xef2ccb78 fp = 0xef2ccd68 > >> > > r4 = 0x00000002 r5 = 0xc5bc4320 > >> > > r6 = 0xc5a47960 r7 = 0xc5a47a0c > >> > > r8 = 0xc5bc4320 r9 = 0xc5b7a000 > >> > > r10 = 0x00000002 > >> > > sigexit() at sigexit+0x8c4 > >> > > pc = 0xc03f1604 lr = 0xc03f23a0 (postsig+0x39c) > >> > > sp = 0xef2ccd70 fp = 0xef2cce18 > >> > > r4 = 0x00000001 r5 = 0xc5bc4320 > >> > > r6 = 0xc5a47960 r7 = 0xc5b7aab8 > >> > > r8 = 0xc5a47a0c r9 = 0xc5b7a000 > >> > > r10 = 0x00000002 > >> > > postsig() at postsig+0x39c > >> > > pc = 0xc03f23a0 lr = 0xc044388c (ast+0x4f4) > >> > > sp = 0xef2cce20 fp = 0xef2cce58 > >> > > r4 = 0x00000001 r5 = 0xc5bc4320 > >> > > r6 = 0xc5a47960 r7 = 0xc5a47a0c > >> > > r8 = 0xc5a47a0c r9 = 0x01020804 > >> > > r10 = 0x00000ab8 > >> > > ast() at ast+0x4f4 > >> > > pc = 0xc044388c lr = 0xc0621080 (swi_entry+0x6c) > >> > > sp = 0xef2cce60 fp = 0xbfffe438 > >> > > r4 = 0x40000013 r5 = 0xc5bc4320 > >> > > r6 = 0x00000001 r7 = 0x00000154 > >> > > r8 = 0x20037008 r9 = 0xbfffee5c > >> > > r10 = 0xbfffea10 > >> > > swi_entry() at swi_entry+0x6c > >> > > pc = 0xc0621080 lr = 0xc0621080 (swi_entry+0x6c) > >> > > sp = 0xef2cce60 fp = 0xbfffe438 > >> > > Unable to unwind further > >> > > db> > >> > > > >> > > > >> > > > >> > > 2014-03-22 14:20 GMT+01:00 Ian Lepore <ian@freebsd.org>: > >> > > > >> > > On Fri, 2014-03-21 at 07:20 +0100, Wojciech Macek wrote: > >> > >> > No, changing flushD to flushID did not make any difference, but I > >> think > >> > >> it > >> > >> > should be there - D-only flushing might not be sufficient. > >> > >> > > >> > >> > >> > >> Olivier reminded me right after I posted that: last week I made a > >> change > >> > >> to cpufunc.c that makes flushD and flushID the same. So of course it > >> > >> made no difference. :) It really should be flushID though, in case > >> > >> that ever changes. > >> > >> > >> > >> You didn't say whether you have that change, which was r263251. > >> > >> > >> > >> > Currently, I'm running pmap_kernel_internal attached below. It is > >> doing > >> > >> > unconditional flushID at the end, just like the old comment was > >> saying > >> > >> :) > >> > >> > SMP seems to be stable. > >> > >> > > >> > >> > >> > >> That seems to say that somehow there is a valid TLB entry even though > >> > >> the old pte for that entry is zero. That means there's a problem > >> > >> somewhere else in the code, but I don't see it. It looks to me like > >> we > >> > >> do a TLB flush everywhere that we zero out a pte. > >> > >> > >> > >> You said without the unconditional flush it panics at startup. > >> Where in > >> > >> startup? Early, or after init is launched or what? Where does the > >> > >> panic backtrace to? > >> > >> > >> > >> If we've got some other pte/tlb maintenance problem, I'd hate to > >> hide it > >> > >> with this unconditional flush and have it appear as some other > >> problem > >> > >> later that will be even harder to track down. > >> > >> > >> > >> -- Ian > >> > >> > >> > >> > >> > >> > >> > > > >> > >> > >> > > > _______________________________________________ > freebsd-arm@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arm > To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1396048363.81853.177.camel>