From owner-freebsd-arm@FreeBSD.ORG Mon Mar 31 07:57:04 2014 Return-Path: Delivered-To: freebsd-arm@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6AE0853D for ; Mon, 31 Mar 2014 07:57:04 +0000 (UTC) Received: from mail-vc0-f170.google.com (mail-vc0-f170.google.com [209.85.220.170]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 19DB683A for ; Mon, 31 Mar 2014 07:57:03 +0000 (UTC) Received: by mail-vc0-f170.google.com with SMTP id hu19so8171212vcb.29 for ; Mon, 31 Mar 2014 00:57:03 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=2oae+dvcCNvQ1nXPBiq/syczK52ALBfVSVEyQ20UJvI=; b=GgUvnjMjjmbhkK9l7dh5M0XJEU3O4DZi0ZSgsBKytFNMfZe2uadPPPA2MyjHoZklnA COHTRLVnDApCN7fBWxRTc+/Tja6jiQDrXwHvrIUai0bnggpnpbU2SxkrcQecFbNkzohB fip/1zt6GTE2sk6W/c4Y11wGuoAco8jaTy8CAZ4i6oHlFvvC2OEfnipPkCHDs6X0vl/k 8DF6W4aOHxipM3coHMtiBmyZ+MicO4oEnMQFVeZOlvDJMdPuQMR+OXM5umUy+q0/te5C X2bt9oNyPClmX6blcYCUMCLw8jzkvUm4a1iC5gbev6Dc/ObxbVDqI+1JWYbavbtiIrZE a7tA== X-Gm-Message-State: ALoCoQlKBb8J7hNnqFKelPkxqehMdOy1i/T/ElVyAuFg9nH1Z3s0PyMFxV+qxmmtQN/ox4uW8jDF MIME-Version: 1.0 X-Received: by 10.58.23.6 with SMTP id i6mr21388262vef.12.1396250799541; Mon, 31 Mar 2014 00:26:39 -0700 (PDT) Received: by 10.220.209.135 with HTTP; Mon, 31 Mar 2014 00:26:39 -0700 (PDT) In-Reply-To: <1396048363.81853.177.camel@revolution.hippie.lan> References: <20131220125638.GA5132@mail.bsdpad.com> <20131222092913.GA89153@mail.bsdpad.com> <20131222123636.GA61193@ci0.org> <1395149146.1149.586.camel@revolution.hippie.lan> <1395254911.80941.9.camel@revolution.hippie.lan> <1395320561.80941.13.camel@revolution.hippie.lan> <1395494401.81853.34.camel@revolution.hippie.lan> <1395754555.81853.65.camel@revolution.hippie.lan> <1396048363.81853.177.camel@revolution.hippie.lan> Date: Mon, 31 Mar 2014 09:26:39 +0200 Message-ID: Subject: Re: arm SMP on Cortex-A15 From: Wojciech Macek To: Ian Lepore Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: freebsd-arm@freebsd.org X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Porting FreeBSD to ARM processors." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Mar 2014 07:57:04 -0000 Sorry for late response. The word "workaround" might have been a little misleading, sorry. Yes, I agree that it is necessary step for armv7-mp, but I was afraid that by accident it can mask code errors. I just want to make sure if what I see in pmap is caused by any missing tlb-inv or not. The Olivier's change stopped the issue reproducing, so that's the reason I reverted it for a while. Well, each panic is an opportunity to make the code working better :) Also, that pmap-v6.c runs on platforms with separate D/I TBL flushing, and that's the reason why I'd like to spend some time debugging. Thanks a lot for comments. I remove the unnecessary lines and only leave changes in pmap_kenter_internal and pmap_kremove, where pages can be executable - it does not change anything for armv7, but might help on R-Pi. I try to close it up this week. Thanks, Wojtek 2014-03-29 0:12 GMT+01:00 Ian Lepore : > I finally got around to looking at your latest patch. Some comments... > > I think the changes to use ID flush in the following functions can't > possibly be needed, because the memory in question is page table entries > or other things that cannot be executable: pmap_l2ptp_ctor(), > pmap_grow_map(), pmap_growkernel(), pmap_copy_page_generic(), > pmap_copy_pages(). > > The change in vector_page_setprot() is probably unnecessary, but > harmless. There is no need for the L2_S_EXECUTABLE() check -- we just > forced it to be executable on the previous line. > > In pmap_postinit() I'm pretty sure an ID flush isn't needed, but > harmless since it's only called once at startup > > In general, since flushD and flushID are the same code none of these > changes is bad right now. There may come a day when we want to remove > the BTB flush in the the flushD case for performance, and having a bunch > of routines calling flushID when flushD is all that's really needed will > mean that someone will have to do all this analysis again to actually > get any performance benefit. > > -- Ian > > On Thu, 2014-03-27 at 14:27 +0100, Wojciech Macek wrote: > > Sorry, I meant > > > https://drive.google.com/file/d/0B-7yTLrPxaWtRjJvTzJZY3g5akU/edit?usp=sharing > > > > W. > > > > > > 2014-03-27 14:10 GMT+01:00 Wojciech Macek : > > > > > I started cleaning up and have some concerns. Sorry guys, the r263251 > > > still looks like a workaround to me :) > > > The fact that TLB cache issue is seen only on executable pages and that > > > Olivier's fix helps suggests, that we are missing flush-I somewhere > (crash > > > appears only when deallocating userspace vm, so my guess is > pmap_kremove, > > > where definitely should be flushID). For test purposes I reverted > r263251 > > > and added additional flushID/flushD logic to the pmap-v6 in > > > performance-critical sections, or modified flushD to flushID where the > > > speed is not the case. All changes integrated with previous patches > can be > > > found here: > > > > > > > https://drive.google.com/file/d/0B-7yTLrPxaWtRFRHTUtpZ3VKbG8/edit?usp=sharing > > > With above patch I'm able to run all tests without any problems. > > > > > > Maybe it would be better to fix these suspicious places in pmap > instead? > > > > > > Regards, > > > Wojtek > > > > > > > > > 2014-03-25 14:35 GMT+01:00 Ian Lepore : > > > > > > Good, that means we can stop searching for the missing tlb flush when a > > >> pte is invalidated. :) You guys can commit whenever you're ready (or > I > > >> can do it if you'd like). > > >> > > >> -- Ian > > >> > > >> On Tue, 2014-03-25 at 11:14 +0100, Wojciech Macek wrote: > > >> > Hi Ian, > > >> > > > >> > The r263251 fix helped, thanks! > > >> > So now, if you don';t have any objections, I will clean up a little > the > > >> > cpufunc.S & pmap-v6 changes and make them ready for submitting. > > >> > > > >> > Regards, > > >> > Wojtek > > >> > > > >> > > > >> > 2014-03-24 14:31 GMT+01:00 Wojciech Macek : > > >> > > > >> > > Without the unconditional invalidation, the panic shows up just > at the > > >> > > beginning, after rootfs is mounted and init scripts are running. > When > > >> a > > >> > > userspace process is exitting, its memory resources are freed - > this > > >> is the > > >> > > moment pmap_remove_pages fails due to tharanslation fault. It is > the > > >> > > "typical" crash I observed when TLB-cache holds an old entry. > Below > > >> there > > >> > > is a backtrace, but I doubt if it can be helpful. > > >> > > > > >> > > Regarding old pte/tlb, the TLB cache contains entry from old > process > > >> > > context, when in-memory-PTE value is already correct - at least > this > > >> was > > >> > > the scenario when I debugged it last year. So, invalidating after > > >> *pte=0 is > > >> > > definitely not our case. The issue shows up only on a15, where the > > >> > > tlb-prefetcher can cache pte entries anytime. > > >> > > > > >> > > I believe I don't have r263251 integrated. I'll give it a try - > > >> typically, > > >> > > the tlb-caused crash appears only on pages containing shared > > >> libraries code > > >> > > (with executable attr), so there is a chance Olivier's fix help. > > >> > > > > >> > > > > >> > > The fault: > > >> > > > > >> > > vm_fault(0xc5b894f0, 0, 2, 0) -> 1 > > >> > > Fatal kernel mode data abort: 'Translation Fault (P)' > > >> > > trapframe: 0xef2cca40 > > >> > > FSR=00000817, FAR=00000030, spsr=60000013 > > >> > > r0 =00000000, r1 =c320a048, r2 =00000000, r3 =c3208074 > > >> > > r4 =c5b7cd08, r5 =c5b7cd04, r6 =c5b05800, r7 =c5b895ac > > >> > > r8 =c320a044, r9 =fffffffe, r10=c5b895ac, r11=ef2ccae0 > > >> > > r12=00000000, ssp=ef2cca90, slr=c0604148, pc =c0628a60 > > >> > > > > >> > > [ thread pid 83 tid 100050 ] > > >> > > Stopped at pmap_remove_pages+0x270: streq r3, [r0, > > >> #0x030] > > >> > > db> bt > > >> > > Tracing pid 83 tid 100050 td 0xc5bc4320 > > >> > > db_trace_self() at db_trace_self > > >> > > pc = 0xc061f62c lr = 0xc024ddbc (db_hex2dec+0x498) > > >> > > sp = 0xef2cc738 fp = 0xef2cc750 > > >> > > r10 = 0xc0708270 > > >> > > db_hex2dec() at db_hex2dec+0x498 > > >> > > pc = 0xc024ddbc lr = 0xc024d76c (db_command_loop+0x2f0) > > >> > > sp = 0xef2cc758 fp = 0xef2cc7f8 > > >> > > r4 = 0x00000000 r5 = 0x00000000 > > >> > > r6 = 0xc0695cf1 > > >> > > db_command_loop() at db_command_loop+0x2f0 > > >> > > pc = 0xc024d76c lr = 0xc024d4dc (db_command_loop+0x60) > > >> > > sp = 0xef2cc800 fp = 0xef2cc810 > > >> > > r4 = 0xc0666f88 r5 = 0xc067b997 > > >> > > r6 = 0xc0752954 r7 = 0xc0748f80 > > >> > > r8 = 0xef2cca40 r9 = 0xc07084e0 > > >> > > r10 = 0xc0748f84 > > >> > > db_command_loop() at db_command_loop+0x60 > > >> > > pc = 0xc024d4dc lr = 0xc024ffb8 > (X_db_symbol_values+0x254) > > >> > > sp = 0xef2cc818 fp = 0xef2cc938 > > >> > > r4 = 0x00000000 r5 = 0xef2cc820 > > >> > > r6 = 0xc0748fb0 > > >> > > X_db_symbol_values() at X_db_symbol_values+0x254 > > >> > > pc = 0xc024ffb8 lr = 0xc0430554 (kdb_trap+0x164) > > >> > > sp = 0xef2cc940 fp = 0xef2cc968 > > >> > > r4 = 0x00000000 r5 = 0x00000817 > > >> > > r6 = 0xc0748fb0 r7 = 0xc0748f80 > > >> > > kdb_trap() at kdb_trap+0x164 > > >> > > pc = 0xc0430554 lr = 0xc0632ef0 > (data_abort_handler+0x7dc) > > >> > > sp = 0xef2cc970 fp = 0xef2cc988 > > >> > > r4 = 0xef2cca40 r5 = 0x600000d3 > > >> > > r6 = 0x00000030 r7 = 0x00000817 > > >> > > r8 = 0xc5b894f0 r9 = 0x00000001 > > >> > > r10 = 0xef2cca40 > > >> > > data_abort_handler() at data_abort_handler+0x7dc > > >> > > pc = 0xc0632ef0 lr = 0xc0632cc0 > (data_abort_handler+0x5ac) > > >> > > sp = 0xef2cc990 fp = 0xef2cca38 > > >> > > r4 = 0x00000817 r5 = 0xc5bc4320 > > >> > > r6 = 0xc5a47a0c r7 = 0x00000004 > > >> > > data_abort_handler() at data_abort_handler+0x5ac > > >> > > pc = 0xc0632cc0 lr = 0xc0621214 (exception_exit) > > >> > > sp = 0xef2cca40 fp = 0xef2ccae0 > > >> > > r4 = 0xc5b7cd08 r5 = 0xc5b7cd04 > > >> > > r6 = 0xc5b05800 r7 = 0xc5b895ac > > >> > > r8 = 0xc320a044 r9 = 0xfffffffe > > >> > > r10 = 0xc5b895ac > > >> > > exception_exit() at exception_exit > > >> > > pc = 0xc0621214 lr = 0xc0604148 (PHYS_TO_VM_PAGE+0x48) > > >> > > sp = 0xef2cca94 fp = 0xef2ccae0 > > >> > > r0 = 0x00000000 r1 = 0xc320a048 > > >> > > r2 = 0x00000000 r3 = 0xc3208074 > > >> > > r4 = 0xc5b7cd08 r5 = 0xc5b7cd04 > > >> > > r6 = 0xc5b05800 r7 = 0xc5b895ac > > >> > > r8 = 0xc320a044 r9 = 0xfffffffe > > >> > > r10 = 0xc5b895ac r12 = 0x00000000 > > >> > > pmap_remove_pages() at pmap_remove_pages+0x270 > > >> > > pc = 0xc0628a60 lr = 0xc05f2d08 (vmspace_exit+0xd8) > > >> > > sp = 0xef2ccae8 fp = 0xef2ccb10 > > >> > > r4 = 0xc5b895a8 r5 = 0xc5bc4320 > > >> > > r6 = 0x00000001 r7 = 0xc5a47960 > > >> > > r8 = 0xc5b895ac r9 = 0xc5b894f0 > > >> > > r10 = 0xc0753be0 > > >> > > vmspace_exit() at vmspace_exit+0xd8 > > >> > > pc = 0xc05f2d08 lr = 0xc03a7348 (exit1+0x930) > > >> > > sp = 0xef2ccb18 fp = 0xef2ccb70 > > >> > > r4 = 0xc5a479fc r5 = 0x00000004 > > >> > > r6 = 0xc583861c r7 = 0x00000001 > > >> > > r8 = 0xc5a47960 r9 = 0xc5bc4320 > > >> > > r10 = 0xc5a47a0c > > >> > > exit1() at exit1+0x930 > > >> > > pc = 0xc03a7348 lr = 0xc03f1604 (sigexit+0x8c4) > > >> > > sp = 0xef2ccb78 fp = 0xef2ccd68 > > >> > > r4 = 0x00000002 r5 = 0xc5bc4320 > > >> > > r6 = 0xc5a47960 r7 = 0xc5a47a0c > > >> > > r8 = 0xc5bc4320 r9 = 0xc5b7a000 > > >> > > r10 = 0x00000002 > > >> > > sigexit() at sigexit+0x8c4 > > >> > > pc = 0xc03f1604 lr = 0xc03f23a0 (postsig+0x39c) > > >> > > sp = 0xef2ccd70 fp = 0xef2cce18 > > >> > > r4 = 0x00000001 r5 = 0xc5bc4320 > > >> > > r6 = 0xc5a47960 r7 = 0xc5b7aab8 > > >> > > r8 = 0xc5a47a0c r9 = 0xc5b7a000 > > >> > > r10 = 0x00000002 > > >> > > postsig() at postsig+0x39c > > >> > > pc = 0xc03f23a0 lr = 0xc044388c (ast+0x4f4) > > >> > > sp = 0xef2cce20 fp = 0xef2cce58 > > >> > > r4 = 0x00000001 r5 = 0xc5bc4320 > > >> > > r6 = 0xc5a47960 r7 = 0xc5a47a0c > > >> > > r8 = 0xc5a47a0c r9 = 0x01020804 > > >> > > r10 = 0x00000ab8 > > >> > > ast() at ast+0x4f4 > > >> > > pc = 0xc044388c lr = 0xc0621080 (swi_entry+0x6c) > > >> > > sp = 0xef2cce60 fp = 0xbfffe438 > > >> > > r4 = 0x40000013 r5 = 0xc5bc4320 > > >> > > r6 = 0x00000001 r7 = 0x00000154 > > >> > > r8 = 0x20037008 r9 = 0xbfffee5c > > >> > > r10 = 0xbfffea10 > > >> > > swi_entry() at swi_entry+0x6c > > >> > > pc = 0xc0621080 lr = 0xc0621080 (swi_entry+0x6c) > > >> > > sp = 0xef2cce60 fp = 0xbfffe438 > > >> > > Unable to unwind further > > >> > > db> > > >> > > > > >> > > > > >> > > > > >> > > 2014-03-22 14:20 GMT+01:00 Ian Lepore : > > >> > > > > >> > > On Fri, 2014-03-21 at 07:20 +0100, Wojciech Macek wrote: > > >> > >> > No, changing flushD to flushID did not make any difference, > but I > > >> think > > >> > >> it > > >> > >> > should be there - D-only flushing might not be sufficient. > > >> > >> > > > >> > >> > > >> > >> Olivier reminded me right after I posted that: last week I made a > > >> change > > >> > >> to cpufunc.c that makes flushD and flushID the same. So of > course it > > >> > >> made no difference. :) It really should be flushID though, in > case > > >> > >> that ever changes. > > >> > >> > > >> > >> You didn't say whether you have that change, which was r263251. > > >> > >> > > >> > >> > Currently, I'm running pmap_kernel_internal attached below. It > is > > >> doing > > >> > >> > unconditional flushID at the end, just like the old comment was > > >> saying > > >> > >> :) > > >> > >> > SMP seems to be stable. > > >> > >> > > > >> > >> > > >> > >> That seems to say that somehow there is a valid TLB entry even > though > > >> > >> the old pte for that entry is zero. That means there's a problem > > >> > >> somewhere else in the code, but I don't see it. It looks to me > like > > >> we > > >> > >> do a TLB flush everywhere that we zero out a pte. > > >> > >> > > >> > >> You said without the unconditional flush it panics at startup. > > >> Where in > > >> > >> startup? Early, or after init is launched or what? Where does > the > > >> > >> panic backtrace to? > > >> > >> > > >> > >> If we've got some other pte/tlb maintenance problem, I'd hate to > > >> hide it > > >> > >> with this unconditional flush and have it appear as some other > > >> problem > > >> > >> later that will be even harder to track down. > > >> > >> > > >> > >> -- Ian > > >> > >> > > >> > >> > > >> > >> > > >> > > > > >> > > >> > > >> > > > > > _______________________________________________ > > freebsd-arm@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-arm > > To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org" > > >