From owner-freebsd-hackers@freebsd.org Sun Apr 9 01:02:08 2017
Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them
From: Mark Millard
In-Reply-To: <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net>
Date: Sat, 8 Apr 2017 18:02:00 -0700
Cc: andrew@freebsd.org, Konstantin Belousov
Message-Id: <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net>
References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net>
To: freebsd-arm, freebsd-hackers@freebsd.org

[I've identified the code path involved in the arm64 small allocations
turning into zeros for later fork-then-swapout-then-back-in,
specifically the ongoing RES(ident memory) size decrease that
"top -PCwaopid" shows before the fork/swap sequence. Hopefully
I've also exposed enough related information for someone that
knows what they are doing to get started with a specific
investigation, looking for a fix. I'd like for a pine64+
2GB to have buildworld complete despite the forking and
swapping involved (yep: for a time zero RES(ident memory) for
some processes involved in the build).]

On 2017-Apr-7, at 1:16 AM, Mark Millard wrote:

> [I now can: (A) crudely control the number of allocated
> pages that get zeros (that should not). (B) Watch a
> "top -PCwaopid" display and predict if the
> test-architecture will fail or not before the fork()
> or swap-out happens.]
>
> On 2017-Apr-4, at 8:00 PM, Mark Millard wrote:
>
>> Uncommenting/commenting parts of the below program allows
>> exploring the problems with fork-then-swap-out-then-in on
>> arm64.
>>
>> Note: By swap-out I mean that zero RES(ident memory) results,
>> for the process(s) of interest, as shown by
>> "top -PCwaopid" .
>>
>> I discovered recently that swapping-out just before the
>> fork() prevents the failure from the swapping after the
>> fork().
>>
>> Note:
>> Without the fork() no problem happens. Without the later
>> swap-out no problem happens. Both are required. But some
>> activities before the fork() or between fork() and the
>> swap-out prevent the failures.
>>
>> Some of the comments are based on a pine64+ 2GB context.
>> I use stress to force swap-outs during some sleeps in
>> the program. See also Bugzilla 217239 and 217138. (I now
>> expect that they have the same cause.)
>>
>> In my environment I've seen the fork-then-swap-out/swap-in
>> failures on a pine64+ 2GB and a rpi3. They are repeatable
>> on both. I do not have access to server-class machines, or
>> any other arm64 machines.
>>
>>
>> // swap_testing5.c
>>
>> // Built via (cc was clang 4.0 in my case):
>> //
>> // cc -g -std=c11 -Wpedantic -o swaptesting5 swap_testing5.c
>> // -O0 and -O2 also gets the problem.
>>
>> // Note: jemalloc's tcache needs to be enabled to get the failure.
>> //       But FreeBSD can get into a state where /etc/malloc.conf
>> //       -> 'tcache:false' is ineffective. Also: the allocation
>> //       size needs to be sufficiently small (<= SMALL_MAXCLASS)
>> //       to see the problem. Other comments are based on a specific
>> //       context (pine64+ 2GB).
>>
>> #include <signal.h>     // for raise(.), SIGABRT (induce core dump)
>> #include <unistd.h>     // for fork(), sleep(.)
>> #include <sys/types.h>  // for pid_t
>> #include <sys/wait.h>   // for wait(.)
>>
>> extern void test_setup(void);      // Sets up the memory byte patterns.
>> extern void test_check(void);      // Tests the memory byte patterns.
>> extern void memory_willneed(void); // For seeing if
>>                                    // posix_madvise(.,.,POSIX_MADV_WILLNEED)
>>                                    // makes a difference.
>>
>> int main(void) {
>>     sleep(30); // Potentialy force swap-out here.
>>                // [Swap-out here does not avoid later failures.]
>>
>>     test_setup();
>>     test_check(); // Before potential sleep(.)/swap-out or fork(.) [passes]
>>
>>     sleep(30); // Potentialy force swap-out here.
>>                // [Everything below passes if swapped-out here,
>>                // no matter if there are later swap-outs
>>                // or not.]
>>
>>     pid_t pid = fork(); // To test no-fork use: = 0; no-fork does not fail.
>>     int wait_status = 0;
>>
>>     // HERE: After fork; before sleep/swap-out/wait.
>>
>>     // if (0 < pid) memory_willneed(); // Does not prevent either parent or
>>                                        // child failure if enabled.
>>
>>     // if (0 == pid) memory_willneed(); // Prevents both the parent and the
>>                                         // child failure. Disable to see
>>                                         // failure of both parent and child.
>>                                         // [Presuming no prior swap-out: that
>>                                         // would make everything pass.]
>>
>>     // During sleep/wait: manually force this process to
>>     // swap out. I use something like:
>>     //     stress -m 1 --vm-bytes 1800M
>>     // in another shell and ^C'ing it after top shows the
>>     // swapped status desired. 1800M just happened to work
>>     // on the Pine64+ 2GB that I was using. I watch with
>>     // top -PCwaopid [checking for zero RES(ident memory)].
>>
>>     if (0 < pid) {
>>         sleep(30); // Intend to swap-out during sleep.
>>         // test_check(); // Test in parent before child runs (longer sleep).
>>                          // This test fails if run for a failing region_size
>>                          // unless earlier preventing-activity happened.
>>         wait(&wait_status); // Only if test_check above passes or is
>>                             // disabled above.
>>     }
>>     if (-1 != wait_status && 0 <= pid) {
>>         if (0 == pid) { sleep(90); } // Intend to swap-out during sleep.
>>         test_check(); // Fails for small-enough region_size, both
>>                       // parent and child processes, unless earlier
>>                       // preventing-activity happened.
>>     }
>> }
>>
>> // The memory and test code follows.
>>
>> #include <stddef.h>   // for size_t, NULL
>> #include <stdlib.h>   // for malloc(.), free(.)
>> #include <sys/mman.h> // for POSIX_MADV_WILLNEED, posix_madvise(.,.,.)
>>
>> #define region_size (14u*1024u)
>>     // Bad dyn_region pattern, parent and child processes examples:
>>     //     256u, 2u*1024u, 4u*1024u, 8u*1024u, 9u*1024u, 12u*1024u, 14u*1024u
>>     // No failure examples:
>>     //     14u*1024u+1u, 15u*1024u, 16u*1024u, 32u*1024u, 256u*1024u*1024u
>> #define num_regions (256u*1024u*1024u/region_size)
>>
>> typedef volatile unsigned char value_type;
>> struct region_struct { value_type array[region_size]; };
>> typedef struct region_struct region;
>> static region * volatile dyn_regions[num_regions] = {NULL,};
>>
>> static value_type value(size_t v) { return (value_type)((v&0xFEu)|0x1u); }
>>     // value avoids zero values: the bad values are zeros.
>>
>> void test_setup(void) {
>>     for(size_t i=0u; i<num_regions; i++) {
>>         dyn_regions[i] = malloc(sizeof(region));
>>         if (!dyn_regions[i]) raise(SIGABRT);
>>
>>         for(size_t j=0u; j<region_size; j++) {
>>             (*dyn_regions[i]).array[j] = value(j);
>>         }
>>     }
>> }
>>
>> void memory_willneed(void) {
>>     for(size_t i=0u; i<num_regions; i++) {
>>         (void) posix_madvise(dyn_regions[i], region_size, POSIX_MADV_WILLNEED);
>>     }
>> }
>>
>> static volatile size_t first_failure_idx = 0u; // dyn_regions index
>> static volatile size_t first_failure_pos = 0u; // sub-array index
>> static volatile size_t after_bad_idx     = 0u; // dyn_regions index
>> static volatile size_t after_bad_pos     = 0u; // sub-array index
>> static volatile size_t after_good_idx    = 0u; // dyn_regions index
>> static volatile size_t after_good_pos    = 0u; // sub-array index
>>
>> // Note: Some failing cases get (conjunctive notation):
>> //
>> //    0 == first_failure_idx < after_bad_idx < after_good_idx == num_regions
>> //    && 0 == first_failure_pos && 0<=after_bad_pos<=region_size && after_good_pos==0
>> //    && (after_bad_pos is a multiple of the page size in Bytes, here:
>> //        after_bad_pos==N*4096 for some non-negative integral value N)
>> //
>> // other failing cases instead fail with:
>> //
>> //    0 == first_failure_idx && num_regions == after_bad_idx == after_good_idx
>> //    && 0 == first_failure_pos == after_bad_pos == after_good_pos
>> //
>> // after_bad_idx strongly tends to vary from failing run to failing run
>> // as does after_bad_pos.
>>
>> // Note: The working cases get:
>> //
>> //    num_regions == first_failure_idx == after_bad_idx == after_good_idx
>> //    && 0 == first_failure_pos == after_bad_pos == after_good_pos
>>
>> void test_check(void) {
>>     first_failure_idx = first_failure_pos = 0u;
>>
>>     while (first_failure_idx < num_regions) {
>>         while (   first_failure_pos < region_size
>>                && (   value(first_failure_pos)
>>                    == (*dyn_regions[first_failure_idx]).array[first_failure_pos]
>>                   )
>>               ) {
>>             first_failure_pos++;
>>         }
>>
>>         if (region_size != first_failure_pos) break;
>>
>>         first_failure_idx++;
>>         first_failure_pos = 0u;
>>     }
>>
>>     after_bad_idx = first_failure_idx;
>>     after_bad_pos = first_failure_pos;
>>
>>     while (after_bad_idx < num_regions) {
>>         while (   after_bad_pos < region_size
>>                && (   value(after_bad_pos)
>>                    != (*dyn_regions[after_bad_idx]).array[after_bad_pos]
>>                   )
>>               ) {
>>             after_bad_pos++;
>>         }
>>
>>         if(region_size != after_bad_pos) break;
>>
>>         after_bad_idx++;
>>         after_bad_pos = 0u;
>>     }
>>
>>     after_good_idx = after_bad_idx;
>>     after_good_pos = after_bad_pos;
>>
>>     while (after_good_idx < num_regions) {
>>         while (   after_good_pos < region_size
>>                && (   value(after_good_pos)
>>                    == (*dyn_regions[after_good_idx]).array[after_good_pos]
>>                   )
>>               ) {
>>             after_good_pos++;
>>         }
>>
>>         if(region_size != after_good_pos) break;
>>
>>         after_good_idx++;
>>         after_good_pos = 0u;
>>     }
>>
>>     if (num_regions != first_failure_idx) raise(SIGABRT);
>> }
>
>
> I've found that for the above swap_testing5.c
> I can make variations that change how much of the
> allocated region prefix ends up zero vs. stays good.
>
> I vary the sleep time between testing the initialized
> allocations and doing the fork. The longer the sleep
> the more zero pages show up (be sure to read the
> comments):
>
> # diff swap_testing[56].c
> 1c1
> < // swap_testing5.c
> ---
> > // swap_testing6.c
> 5c5
> < // cc -g -std=c11 -Wpedantic -o swaptesting5 swap_testing5.c
> ---
> > // cc -g -std=c11 -Wpedantic -o swaptesting5 swap_testing6.c
> 33c33
> < sleep(30); // Potentialy force swap-out here.
> ---
> > sleep(150); // Potentialy force swap-out here.
> 37a38,48
> > // For no-swap-out here cases:
> > //
> > // The longer the sleep here the more allocations
> > // that end up as zero.
> > //
> > // top's Mem Active, Inact, Wired, Buf, Free and
> > // Swap Total, Used, and Free stay unchanged.
> > // What does change is the process RES decreases
> > // while the process SIZE and SWAP stay unchanged
> > // during this sleep.
> >
>
> NOTE: On other architectures that I've tried (such as armv6/v7)
> RES does not decrease during the sleep --and the problem
> does not happen even for as long of sleeps as I've tried.
>
> (I use "stress -m 2 --vm-bytes 900M" on armv6/v7 instead
> of -m 1 --vm-bytes 1800M because that large in one
> process is not allowed.)
>
> So watching top's RES during the sleep (longer than a few
> seconds) just before the fork() predicts the later
> fails-vs.-not status: If RES decreases (while other things
> associated with the process status stay the same) then
> there will be a failure.
>
> At this point I've no clue why the sleeping process has
> a decreasing RES(ident memory) size.
>
> I infer that without the sleep there still is a small
> amount of loss of RES but on too short of a timescale
> to observe in a "top -PCwaopid" or other such: in other
> words that the same behavior is causing the failure then
> as well, possibly for a loss of only one page of RES.
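
The RES observation quoted above could also be probed from inside a
test program instead of by watching top. What follows is only a rough
sketch under stated assumptions: it uses mincore(2), whose notion of
"in core" is not necessarily the same thing as the pmap resident count
that top's RES reports, so the two may disagree. resident_pages() and
its parameters are hypothetical names; nothing like it is in
swap_testing5.c.

```c
#define _DEFAULT_SOURCE   /* glibc: expose mincore() and MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages of [base, base+len) that
 * mincore(2) reports as in core. base must be page aligned.
 * Returns -1 on error.
 */
long resident_pages(void *base, size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0) return -1;
    assert((uintptr_t)base % (uintptr_t)page == 0);

    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    if (!vec) return -1;

    long count = -1;
    /* The vec parameter is char* on FreeBSD and unsigned char* on
     * Linux; a void* argument converts implicitly to either in C. */
    if (mincore(base, len, (void *)vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++) {
            if (vec[i] & 1) count++;   /* low bit: page is in core */
        }
    }
    free(vec);
    return count;
}
```

Calling such a helper on a page-aligned mapping once per second during
the pre-fork sleep would show whether its count moves along with RES;
whether it does on arm64 is exactly the kind of thing the sketch cannot
promise.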

I've been able to identify what code sequence is gradually
removing the "small_mappings" via some breakpointing in the
kernel after reaching the "should be just sleeping" status.

Specifically I started with breakpointing when
pmap_resident_count_dec was on the call stack in order to
see the call chain(s) that lead to it being called while
RES(ident memory) is gradually decreasing during the sleep
that is just before forking.

(tid 100067 is [pagedaemon{pagedaemon}], which is in
vm_pageout_worker. bt does not show inlined layers.)

[ thread pid 17 tid 100067 ]
Breakpoint at $x.1: undefined d65f03c0
db> bt
Tracing pid 17 tid 100067 td 0xfffffd0001c4aa00
. . .
handle_el1h_sync() at pmap_remove_l3+0xdc
         pc = 0xffff000000604870  lr = 0xffff000000611158
         sp = 0xffff000083a49980  fp = 0xffff000083a49a40

pmap_remove_l3() at pmap_ts_referenced+0x580
         pc = 0xffff000000611158  lr = 0xffff000000615c50
         sp = 0xffff000083a49a50  fp = 0xffff000083a49ac0

pmap_ts_referenced() at vm_pageout+0xe60
         pc = 0xffff000000615c50  lr = 0xffff0000005d1f74
         sp = 0xffff000083a49ad0  fp = 0xffff000083a49b50

vm_pageout() at fork_exit+0x94
         pc = 0xffff0000005d1f74  lr = 0xffff0000002e01c0
         sp = 0xffff000083a49b60  fp = 0xffff000083a49b90

fork_exit() at fork_trampoline+0x10
         pc = 0xffff0000002e01c0  lr = 0xffff0000006177b4
         sp = 0xffff000083a49ba0  fp = 0x0000000000000000

It turns out that pmap_ts_referenced is on its:

small_mappings:
. . .

path for the above so the pmap_remove_l3 call is
the one from that execution path. (Found by more
breakpointing after enabling such on the paths.)
So this is the path with: (breakpoint hook not shown)

                /*
                 * Wired pages cannot be paged out so
                 * doing accessed bit emulation for
                 * them is wasted effort. We do the
                 * hard work for unwired pages only.
                 */
                pmap_remove_l3(pmap, pte, pv->pv_va, tpde,
                    &free, &lock);
                pmap_invalidate_page(pmap, pv->pv_va);
                cleared++;
                if (pvf == pv)
                        pvf = NULL;
                pv = NULL;
. . .

pmap_remove_l3 decrements the resident_count
in this sequence.

From what I can tell this code is eliminating the
content of pages that, in the failing tests, have
no backing store yet (not swapped-out yet by test
design). The observed behavior is that the pages
that have the above happen end up as zero pages
after the later fork-then-swapout-then-back-in.

I do not see anything putting the pages that this
happens to into any other lists to keep track of
the page content. The swap-out and swap-in seem to
have ignored these pages and to have been based on
automatically zeroed pages instead.

Note that the (or a) question might be if these
pages should have ever gotten to this code at
all. (I'm no expert overall.) But that might get
into why POSIX_MADV_WILLNEED spanning each page
is sufficient to avoid the zeros issue for
fork-then-swapout-and-back-in. I'll only write
here about what the backtrace code seems to
be doing if I'm interpreting correctly.

One oddity here is that pmap_remove_l3 does its own
pmap_invalidate_page to invalidate the same
tlb entry as the above pmap_invalidate_page, so
a double-invalidate. (I've no clue if such is
just suboptimal vs. a form of error.)

pmap_remove_l3 here does things that the analogous
sys/arm/arm/pmap-v6.c's pmap_ts_referenced does
not do and pmap-v6 does something this code does
not.

arm64's pmap_remove_l3 does (in summary):

   pmap_invalidate_page
   decrements the resident_count
   pmap_unwire_l3
   (then pmap_ts_referenced's small_mappings code
    does another pmap_invalidate_page for the
    same argument values)

arm pmap-v6's pmap_ts_referenced's small_mappings code
does:

   conditional vm_page_dirty
   pte2_clear_bit for PTE2_A
   pmap_tlb_flush

There is, for example, no decrement of the resident_count
involved (that I found anyway).

But I've no clue just what should be analogous vs. what
should not between pmap-v6 and arm64's pmap code in
this area.

I'll also note that the code before the arm64 small_mappings
code also uses pmap_remove_l3 but does not do the decrement
nor the extra pmap_invalidate_page (for example). But again
I do not know how analogous the two paths should be.

Only the small_mappings path seems to have the
end-up-with-zeros problem for the later
fork-then-swap-out and then swap-back-in context.

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-hackers@freebsd.org Sun Apr 9 10:13:49 2017
Subject: Re: Understanding the FreeBSD locking mechanism
To: Ed Schouten
Cc: freebsd-hackers@freebsd.org
From: Yubin Ruan
Date: Sun, 9 Apr 2017 18:13:40 +0800

On 2017/4/6 17:31, Ed Schouten wrote:
> Hi Yubin,
>
> 2017-04-06 11:16 GMT+02:00 Yubin Ruan :
>> Does this function provides the ordinary "spinlock" functionality? There
>> is no special "test-and-set" instruction, and neither any extra locking
>> to protect internal data structure manipulation. Isn't this subjected to
>> race condition?
>
> Locking a spinlock is done through macro mtx_lock_spin(), which
> expands to __mtx_lock_spin() in sys/sys/mutex.h. That macro first
> calls into the function you looked at, spinlock_enter(), to disable
> interrupts. It then calls into the _mtx_obtain_lock_fetch() to do the
> test-and-set operation you were looking for.

Thanks for replying. I have read some of that code.

Just a few more questions, if you don't mind:

(1) Why are spinlocks forced to disable interrupts in FreeBSD?

In the book "The Design and Implementation of the FreeBSD Operating
System", the authors say "spinning can result in deadlock if a thread
interrupted the thread that held a mutex and then tried to acquire the
mutex"...(section 4.3, Mutex Synchronization, paragraph 4)

I don't get why a spinlock (or *spin mutex* in the FreeBSD world) has
to disable interrupts. Being interrupted does not necessarily mean a
deadlock. Assume that thread A, holding a lock T, gets interrupted by
another thread B (context switch here) and thread B tries to acquire
the lock T. After finding out that lock T has already been acquired,
thread B will just spin until it gets preempted, after which thread A
gets woken up, runs, and releases the lock T. So, you see, there is
not necessarily any deadlock even if thread A gets interrupted.

I can only think of two conditions where using a spinlock without
disabling interrupts will cause deadlock:

#######1, spinlock used in an interrupt handler
If a thread A holding a spinlock T gets interrupted and the interrupt
handler responsible for this interrupt tries to acquire T, then we have
deadlock, because A would never have a chance to run before the
interrupt handler returns, and the interrupt handler, unfortunately,
will continue to spin ... so in this situation, one has to disable
interrupts before spinning.
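
Case 1 can be made concrete with a small user-space model. This is a
minimal sketch assuming C11 <stdatomic.h>; toy_spinlock and the
toy_spin_*() functions are hypothetical names for illustration and are
not FreeBSD's mtx_lock_spin() or Linux's spin_lock(). It shows only the
test-and-set step: once the lock is held, a second acquisition attempt
from the same execution context (standing in for the interrupt
handler) cannot succeed until the holder runs again and unlocks.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spin lock; not FreeBSD's struct mtx. */
typedef struct { atomic_flag held; } toy_spinlock;

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One test-and-set attempt: returns true if the lock was acquired. */
static bool toy_spin_trylock(toy_spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Spin until the test-and-set succeeds. */
static void toy_spin_lock(toy_spinlock *l) {
    while (!toy_spin_trylock(l)) {
        /* busy-wait: nothing else can make progress on this CPU */
    }
}

/* Release the lock so that a later test-and-set can succeed. */
static void toy_spin_unlock(toy_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

With the lock held, toy_spin_trylock() keeps returning false. A handler
that called toy_spin_lock() at that point on the same CPU would spin
forever, because the interrupted holder never gets to run
toy_spin_unlock(); that is the situation disabling interrupts avoids.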

As far as I know, Linux provides two kinds of spinlocks:

    spin_lock(..);          /* spinlock that does not disable interrupts */
    spin_lock_irqsave(...); /* spinlock that disables local interrupts */

#######2, priority inversion problem
If thread B with a higher priority gets in and tries to acquire the lock
that thread A currently holds, then thread B would spin, while at the
same time thread A has no chance to run because it has lower priority,
thus not being able to release the lock.
(I haven't investigated the source code enough, so I don't know
how FreeBSD and Linux handle this priority inversion problem. Maybe
they use priority inheritance or random boosting?)

thanks,
Yubin Ruan

From owner-freebsd-hackers@freebsd.org Sun Apr 9 12:27:25 2017
Date: Sun, 9 Apr 2017 15:27:15 +0300
From: Konstantin Belousov
To: Mark Millard
Cc: freebsd-arm, freebsd-hackers@freebsd.org, andrew@freebsd.org
Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them
Message-ID: <20170409122715.GF1788@kib.kiev.ua>
References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net>
In-Reply-To: <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net>

On Sat, Apr 08, 2017 at 06:02:00PM -0700, Mark Millard wrote:
> [I've identified the code path involved is the arm64 small allocations
> turning into zeros for later fork-then-swapout-then-back-in,
> specifically the ongoing RES(ident memory) size decrease that
> "top -PCwaopid" shows before the fork/swap sequence. Hopefully
> I've also exposed enough related information for someone that
> knows what they are doing to get started with a specific
> investigation, looking for a fix. I'd like for a pine64+
> 2GB to have buildworld complete despite the forking and
> swapping involved (yep: for a time zero RES(ident memory) for
> some processes involved in the build).]

I was not able to follow the walls of text, but I do not think that
pmap_ts_referenced() is the real culprit there.

Is my impression right that the issue occurs on fork, and looks like
memory corruption, where some page suddenly becomes zero-filled?
And swapping seems to be involved? It would be somewhat interesting to
see whether the problem is reproducible on non-arm64 machines, e.g.
armv7 or amd64.

If the answers to my two questions are yes, there is probably some bug
in the arm64 pmap handling of the dirty bit emulation. ARMv8.0 does not
provide a hardware dirty bit, and pmap interprets an accessed writeable
page as unconditionally dirty. Moreover, the accessed bit is also not
maintained by hardware; instead it should be set by pmap. And arm64
pmap sets the AF bit unconditionally when creating a valid pte.

Hmm, could you try the following patch? I did not even compile it.

diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
index 3d5756ba891..55aa402eb1c 100644
--- a/sys/arm64/arm64/pmap.c
+++ b/sys/arm64/arm64/pmap.c
@@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
 		    sva += L3_SIZE) {
 			l3 = pmap_load(l3p);
 			if (pmap_l3_valid(l3)) {
+				if ((l3 & ATTR_SW_MANAGED) &&
+				    pmap_page_dirty(l3)) {
+					vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
+					    ~ATTR_MASK));
+				}
 				pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
 				PTE_SYNC(l3p);
 				/* XXX: Use pmap_invalidate_range */

From owner-freebsd-hackers@freebsd.org Sun Apr 9 13:28:53 2017
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=MvooiGLvDttcl//L/B/qEVo9aBPnzYDm2YR5CZZI1xA=; b=JME6LdWKTYMqwGo62ty6Yi9tziia5GvNv5D470u29z38p9QBKmNm8YgH2DhJs2TE9q FnTilbvcqipZ6bhQ8KAnsRPrrSunmlEN9o1uLeyp3VIVEE3wcsKD2tHJQLfioWVKD7C5 dIVLDaXmb2epdrjtB2kTUHCHCGa7WUDbWsP/Tyyr9O1kSOaQ/zW2x7ccBK6yw5l4gUTx 5YtAG0TfhEi3xFmFIoGSrLm2Vy5yiChxCvFJyCH7Nw9aIp5Ge9xx+oyBWrI6tY0+rIjm cSa1X9QCYVlvq4nnSoonzms+V+wCVCHMJSUPsY+in4SjrwuFoLA1/DhzB2dUz0twcKdR RvuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=MvooiGLvDttcl//L/B/qEVo9aBPnzYDm2YR5CZZI1xA=; b=AEv6nYpTfaV9FqmnjfEGy5l1nix0bpNwG5X9fnld13knaGiUcM3G6Jj0ql+CC43Q/i xXeaHGxDt2ga8Nurt1FvoYQYyFo71t0jeDwQPYOBjCuE/aZtfywb25DBJDReIUS2A2Cw c5LGXlPxe35l+5lRCCarsbKmw16QuiaDHJE9ClD3rYmWtPkwR1DKvjl+kkScw8xCxEL3 Pj7nT7uU7KnDem3rM2RQ7WtCaB2sDTcigZSTdq73h+8qgVkjzc4e0bHyBhhSfS49MfQ3 jRhdNZseW5cKwI6buoTFu/iYRssyYSMd+EUTT4ruI0F7d8jqZ4049ipo4twBJ/E2r6pz Pvfw== X-Gm-Message-State: AN3rC/4k9xG6DtH7W9WHi/IBz066/NHndLhRDVfOBgQg/lWalSc5lSOPwri0wRloGFI1Hot3UjnPAk1alv47EQ== X-Received: by 10.202.245.137 with SMTP id t131mr798786oih.149.1491744532658; Sun, 09 Apr 2017 06:28:52 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: vasanth sabavat Date: Sun, 09 Apr 2017 13:28:41 +0000 Message-ID: Subject: Re: Understanding the FreeBSD locking mechanism To: Ed Schouten , Yubin Ruan Cc: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 13:28:53 -0000 On Sun, Apr 9, 2017 at 3:14 AM Yubin Ruan wrote: > On 
2017/4/6 17:31, Ed Schouten wrote: > > Hi Yubin, > > > > 2017-04-06 11:16 GMT+02:00 Yubin Ruan : > >> Does this function provides the ordinary "spinlock" functionality? There > >> is no special "test-and-set" instruction, and neither any extra locking > >> to protect internal data structure manipulation. Isn't this subjected to > >> race condition? > > > > Locking a spinlock is done through macro mtx_lock_spin(), which > > expands to __mtx_lock_spin() in sys/sys/mutex.h. That macro first > > calls into the function you looked at, spinlock_enter(), to disable > > interrupts. It then calls into the _mtx_obtain_lock_fetch() to do the > > test-and-set operation you were looking for. > > Thanks for replying. I have read some of those codes. > > Just a few more questions, if you don't mind: > > (1) why are spinlocks forced to disable interrupt in FreeBSD? > > From the book "The design and implementation of the FreeBSD Operating > System", the authors say "spinning can result in deadlock if a thread > interrupted the thread that held a mutex and then tried to acquire the > mutex"...(section 4.3, Mutex Synchronization, paragraph 4) > > I don't get the point why a spinlock(or *spin mutex* in the FreeBSD > world) has to disable interrupt. Being interrupted does not necessarily > mean a deadlock. Assume that thread A holding a lock T gets interrupted > by another thread B(context switch here) and thread B try to acquire > the lock T. After finding out that lock T has already been acquired, > thread B will just spin until it gets preempted, after which thread A > gets waken up and run and release the lock T. Assume single CPU, If thread B spins where will thread A get to run and finish up its critical section and release the lock? The one CPU you have is held by thread b for spinning. For spin locks on single core, it does not make sense to spin. We just disable interrupts as we are currently the only ones running we just need to make sure no others will get to preempt us. 
That's why spin locks should be held only for a short duration.

When you have multiple cores, thread A can spin on cpu1 while thread B,
holding the lock on cpu2, finishes up and releases it. We disable
interrupts only on cpu1 because we don't want thread A to be preempted.
The cost of preemption is very high compared to a short spin. Note: short
spin.

Look at adaptive spin locks.

> So, you see there is not
> necessarily any deadlock even if thread A get interrupted.
>
> I can only remember two conditions where using spinlock without
> disabling interrupts will cause deadlock:
>
> #######1, spinlock used in an interrupt handler
> If a thread A holding a spinlock T get interrupted and the interrupt
> handler responsible for this interrupt try to acquire T, then we have
> deadlock, because A would never have a chance to run before the
> interrupt handler return, and the interrupt handler, unfortunately,
> will continue to spin ... so in this situation, one has to disable
> interrupt before spinning.
>
> As far as I know, in Linux, they provide two kinds of spinlocks:
>
> spin_lock(..); /* spinlock that does not disable interrupts */
> spin_lock_irqsave(...); /* spinlock that disable local interrupt */
>
>
> #######2, priority inversion problem
> If thread B with a higher priority get in and try to acquire the lock
> that thread A currently holds, then thread B would spin, while at the
> same time thread A has no chance to run because it has lower priority,
> thus not being able to release the lock.
> (I haven't investigate enough into the source code, so I don't know
> how FreeBSD and Linux handle this priority inversion problem. Maybe
> they use priority inheritance or random boosting?)
> > thanks, > Yubin Ruan > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > -- Thanks, Vasanth From owner-freebsd-hackers@freebsd.org Sun Apr 9 14:40:12 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7A0EDD3661E; Sun, 9 Apr 2017 14:40:12 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from mail.metricspace.net (mail.metricspace.net [IPv6:2001:470:1f11:617::107]) by mx1.freebsd.org (Postfix) with ESMTP id 4DF8417D; Sun, 9 Apr 2017 14:40:12 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from [172.16.0.205] (unknown [172.16.0.205]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: eric) by mail.metricspace.net (Postfix) with ESMTPSA id B1EF8186E; Sun, 9 Apr 2017 14:40:11 +0000 (UTC) Subject: Re: Proposal for a design for signed kernel/modules/etc To: "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org References: <6f6b47ed-84e0-e4c0-9df5-350620cff45b@metricspace.net> <20170408111144.GC14604@brick> <181f7b78-64c3-53a6-a143-721ef0cb5186@metricspace.net> <20170408115222.GA64207@brick> From: Eric McCorkle Message-ID: <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> Date: Sun, 9 Apr 2017 10:40:07 -0400 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <20170408115222.GA64207@brick> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="mtArbQXOnqfKwxkx45JFK13QqcR6Jn47r" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 14:40:12 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --mtArbQXOnqfKwxkx45JFK13QqcR6Jn47r Content-Type: multipart/mixed; boundary="LRivelBAQdLMTNuKBGmlcXn2bavWRRppf"; protected-headers="v1" From: Eric McCorkle To: "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org Message-ID: <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> Subject: Re: Proposal for a design for signed kernel/modules/etc References: <6f6b47ed-84e0-e4c0-9df5-350620cff45b@metricspace.net> <20170408111144.GC14604@brick> <181f7b78-64c3-53a6-a143-721ef0cb5186@metricspace.net> <20170408115222.GA64207@brick> In-Reply-To: <20170408115222.GA64207@brick> --LRivelBAQdLMTNuKBGmlcXn2bavWRRppf Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 04/08/2017 07:52, Edward Tomasz Napiera=C5=82a wrote: > On 0408T0803, Eric McCorkle wrote: >> On 04/08/2017 07:11, Edward Tomasz Napiera=C5=82a wrote: >>> On 0327T1354, Eric McCorkle wrote: >>>> Hello everyone, >>>> >>>> The following is a design proposal for signed kernel and kernel modu= le >>>> loading, both at boot- and runtime (with the possibility open for si= gned >>>> executables and libraries if someone wanted to go that route). I'm >>>> interested in feedback on the idea before I start actually writing c= ode >>>> for it. >>> >>> I see two potential problems with this. >>> >>> First, our current loader(8) depends heavily on Forth code. By makin= g >>> it load modified 4th files, you can do absolutely anything you want; >>> AFAIK they have unrestricted access to hardware. So you should prefe= rably >>> be able to sign them as well. You _might_ (not sure on this one) als= o >>> want to be able to restrict access to some of the loader configuratio= n >>> variables. >> >> Loader is handled by the UEFI secure boot framework, though the concer= ns >> about the 4th code are still valid. 
In a secure system, you'd want to
>> do something about that, but the concerns are different enough (and it's
>> isolated enough) that it could be done separately.
>
> Unless the way to address those ends up being a signature mechanism
> that doesn't depend on the format of the files being signed.

I explored the idea of wrapped or detached signatures in the previous
discussion. Envelopes or detached signatures could make sense for the
4th files. It's a small, obscure set of code that probably isn't
changed very often.

Envelopes or detached signatures for kernel modules and especially
signed executables and libraries both have extensive, far-reaching
consequences for system administration, packaging, tooling, the ports
collection, and so on, whereas signing the executable with an additional
section has no such consequences.

Config files (and the 4th files really are more like config files) have
a different set of constraints, and detached signatures are probably the
way to go there. So loader should probably support detached PKCS#7
signature checks.
--LRivelBAQdLMTNuKBGmlcXn2bavWRRppf-- --mtArbQXOnqfKwxkx45JFK13QqcR6Jn47r Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iHUEARYIAB0WIQRELMWN3SgpoYkrmidWwohAqoAEjQUCWOpHyAAKCRBWwohAqoAE jT0zAQCjaQTkFbS5xkr4eixhwOysahTZRg1iKojdfj/NpbIwyQEAj8MuUJvPSi12 xIqgCFSa47WyfCEAoAMOcjMqwdSEpgs= =i63w -----END PGP SIGNATURE----- --mtArbQXOnqfKwxkx45JFK13QqcR6Jn47r-- From owner-freebsd-hackers@freebsd.org Sun Apr 9 15:52:45 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6378AD3642C; Sun, 9 Apr 2017 15:52:45 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: from mail-wm0-x22a.google.com (mail-wm0-x22a.google.com [IPv6:2a00:1450:400c:c09::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E00DC370; Sun, 9 Apr 2017 15:52:44 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: by mail-wm0-x22a.google.com with SMTP id t189so21549268wmt.1; Sun, 09 Apr 2017 08:52:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=cSWNPRDYzpXGgut0TcmsMKgHX9IWDb8BS4PWSe4Bwrw=; b=d+XCugcs4eR/vvrYVnu2s8AMj4oCtvs8SlCR/PjpqarV7L9pGpe8ISWF3NxMME7Xar E/xg/MGw9d1P9O8Qm7DRZ4oR5uV6rXmYwbF8+oMrIofSBaIB/s9skvcSKZiQwVVMs4/7 bTx/EXcc2akVr64wmdJIj3vDnDmk6Dm/iPCYqsrQxjbGIbK0fpsRw2uDWZOrcyUCYdCi caefe7TtRMXx8bjDg2d05/PO04X/beUpMhi+Cf6gSHB/It/m0mc/y7P3s0RM6RNxb2q9 SebnWXt2osiNi9Coa83Q3NZXt9fH5YSc77MrTc0NZL1ejYFVQSlmmwtkw1ozJQ/iIp2y DrBA== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=cSWNPRDYzpXGgut0TcmsMKgHX9IWDb8BS4PWSe4Bwrw=; b=f96PP7XQuGTIM6Sb+omP1tmo/ytWysmcJ+pEiElt7HiJDcNvkPNf2+CDVotrPl/V35 rmtytIqBHxn+OxUNxNGDcL1Q/C2S87Hlv/5rUolLZh1LDWmolcKzX37qgftIj7yK1GAd iEJXe2GOnuW79DbaadBqpWyBNmAFTANYecjc1FiKRrivf8q8Xcj/WL5keDSVHXptwmjV ttTtZbyIk2tSJjFzdTcdyEpn8gKkoEW+iGRPjjcBhONQE7RMYr+9hK2fMe/ZIsV1AtJQ 3HDEAtVS7KP4a9ypFzplB+mu0qt6agiA7H7aGkFLsAxJ/A1soPtH+qlsB2WkVktpdkRX JdVg== X-Gm-Message-State: AN3rC/49buf3i1xQMGn6u86ZX/cq55QDm863joY2x9oMXvZgbM7ikqgK SS95vJU8tHE7p/gk X-Received: by 10.28.90.2 with SMTP id o2mr6544309wmb.53.1491753162403; Sun, 09 Apr 2017 08:52:42 -0700 (PDT) Received: from brick (cpc92310-cmbg19-2-0-cust934.5-4.cable.virginm.net. [82.9.227.167]) by smtp.gmail.com with ESMTPSA id v186sm6809403wmv.2.2017.04.09.08.52.41 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 09 Apr 2017 08:52:41 -0700 (PDT) Sender: =?UTF-8?Q?Edward_Tomasz_Napiera=C5=82a?= Date: Sun, 9 Apr 2017 16:52:40 +0100 From: Edward Tomasz =?utf-8?Q?Napiera=C5=82a?= To: Eric McCorkle Cc: "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org Subject: Re: Proposal for a design for signed kernel/modules/etc Message-ID: <20170409155240.GA18363@brick> Mail-Followup-To: Eric McCorkle , "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org References: <6f6b47ed-84e0-e4c0-9df5-350620cff45b@metricspace.net> <20170408111144.GC14604@brick> <181f7b78-64c3-53a6-a143-721ef0cb5186@metricspace.net> <20170408115222.GA64207@brick> <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> User-Agent: 
Mutt/1.8.0 (2017-02-23) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 15:52:45 -0000 On 0409T1040, Eric McCorkle wrote: > On 04/08/2017 07:52, Edward Tomasz Napierała wrote: > > On 0408T0803, Eric McCorkle wrote: > >> On 04/08/2017 07:11, Edward Tomasz Napierała wrote: > >>> On 0327T1354, Eric McCorkle wrote: > >>>> Hello everyone, > >>>> > >>>> The following is a design proposal for signed kernel and kernel module > >>>> loading, both at boot- and runtime (with the possibility open for signed > >>>> executables and libraries if someone wanted to go that route). I'm > >>>> interested in feedback on the idea before I start actually writing code > >>>> for it. > >>> > >>> I see two potential problems with this. > >>> > >>> First, our current loader(8) depends heavily on Forth code. By making > >>> it load modified 4th files, you can do absolutely anything you want; > >>> AFAIK they have unrestricted access to hardware. So you should preferably > >>> be able to sign them as well. You _might_ (not sure on this one) also > >>> want to be able to restrict access to some of the loader configuration > >>> variables. > >> > >> Loader is handled by the UEFI secure boot framework, though the concerns > >> about the 4th code are still valid. In a secure system, you'd want to > >> do something about that, but the concerns are different enough (and it's > >> isolated enough) that it could be done separately. > > > > Unless the way to address those ends up being a signature mechanism > > that doesn't depend on the format of the files being signed. > > I explored the idea of wrapped or detached signatures in the previous > discussion. Envelopes or detached signatures could make sense for the > 4th files. It's a small, obscure set of code that probably isn't > changed very often. 
> > Envelopes or detached signatures for kernel modules and especially > signed executables and libraries both have extensive, far-reaching > consequences for system administration, packaging, tooling, the ports > collection, and so on, whereas signing the executable with an additional > section has no such consequences. > > Config files (and the 4th files really are more like config files) have > a different set of constraints, and detached signatures are probably the > way to go there. So loader should probably support detached PKCS#7 > signature checks. The third way that might be worth considering would be to just append the signature. This would work for both 4th (if you prepend it with whatever is the 4th comment character) and ELF, without the need for changing or extending either format. From owner-freebsd-hackers@freebsd.org Sun Apr 9 15:56:36 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D99EBD365C9 for ; Sun, 9 Apr 2017 15:56:36 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-pg0-x241.google.com (mail-pg0-x241.google.com [IPv6:2607:f8b0:400e:c05::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A7AF27A1 for ; Sun, 9 Apr 2017 15:56:36 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-pg0-x241.google.com with SMTP id 81so22803373pgh.3 for ; Sun, 09 Apr 2017 08:56:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=8yKtTJGqBRM5lN7JcFOLcgahMmg9BBFrov2P3XgjQiM=; b=TtrSR+cnwoyzQyyR0+WPMMc/0Szy+JWVz0nm3nrV9xEvswRQ5ugJtK/zKNLFwhQRSz 
3XHJPJB5JL8L15JmNYl7Q109N1wWSygtY0ZPom5/jW1iujSpQPUCsyLp+sqOCv2rjO7G 6ZTOdzCaok6j32t0JCVWVkUD7A/8XBsAcLg/XbXm3hmeEmOvHZwiPZ6icmkrvLjuwQDq 2CI96FOfmmR08G7L0VZm08GF3zCrHoH3QXXxHC9u3O8l+oARco7je0GgsRwOJ7Gh5Hcf t0/jOhACsKghzMYWcszXx/lwkF132cj2ffWXST+w0CKJeqlQEFsfUfzmIJzWjqKevByn aXgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=8yKtTJGqBRM5lN7JcFOLcgahMmg9BBFrov2P3XgjQiM=; b=VFbpv93sHljt7zpwBZQuHeauKFKKbQBBeRYa2iYNU6nGPLNEXlRlc9lIwyVZRX9K4v XbRymTceWVo82lBv9Id/pvEfnM2G6QYxI1xxIJDFYN0aERO5BH44+fN56h5wbeLqxNqR vJoisICvGGGFq2QhQfbbq/bYJHdTJ5D2HvYzgsNefOBbUEhZWkH2jWk96fbG9F2jw/X4 70u6oTEuGKf+O9dCzcBg0ax70YP/uMQDsbs7q0YX1qw1K4vNjkirYw8IznkcZOYnx3Ke I5Ye9m0ldZhG9wOfy90AMoYPJIbegPZVShP66oDQQilcu/T1vjOc+fjcHPIhcQIgQ05h giUw== X-Gm-Message-State: AFeK/H3nwWI5SDsFFlKoObg+DhST2Sfnl5Qim3NAiK0O3eIwMEEm1IQdYDNpirJVf7PiVg== X-Received: by 10.84.231.193 with SMTP id g1mr48577104pln.84.1491753396150; Sun, 09 Apr 2017 08:56:36 -0700 (PDT) Received: from [192.168.0.100] ([110.64.91.54]) by smtp.gmail.com with ESMTPSA id f1sm19719389pfc.105.2017.04.09.08.56.32 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 09 Apr 2017 08:56:34 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: vasanth sabavat References: Cc: Ed Schouten , freebsd-hackers@freebsd.org From: Yubin Ruan Message-ID: Date: Sun, 9 Apr 2017 23:56:29 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 
15:56:36 -0000 On 2017/4/9 21:28, vasanth sabavat wrote: > > On Sun, Apr 9, 2017 at 3:14 AM Yubin Ruan > wrote: > > On 2017/4/6 17:31, Ed Schouten wrote: > > Hi Yubin, > > > > 2017-04-06 11:16 GMT+02:00 Yubin Ruan >: > >> Does this function provides the ordinary "spinlock" > functionality? There > >> is no special "test-and-set" instruction, and neither any extra > locking > >> to protect internal data structure manipulation. Isn't this > subjected to > >> race condition? > > > > Locking a spinlock is done through macro mtx_lock_spin(), which > > expands to __mtx_lock_spin() in sys/sys/mutex.h. That macro first > > calls into the function you looked at, spinlock_enter(), to disable > > interrupts. It then calls into the _mtx_obtain_lock_fetch() to do the > > test-and-set operation you were looking for. > > Thanks for replying. I have read some of those codes. > > Just a few more questions, if you don't mind: > > (1) why are spinlocks forced to disable interrupt in FreeBSD? > > From the book "The design and implementation of the FreeBSD Operating > System", the authors say "spinning can result in deadlock if a thread > interrupted the thread that held a mutex and then tried to acquire the > mutex"...(section 4.3, Mutex Synchronization, paragraph 4) > > I don't get the point why a spinlock(or *spin mutex* in the FreeBSD > world) has to disable interrupt. Being interrupted does not necessarily > mean a deadlock. Assume that thread A holding a lock T gets interrupted > by another thread B(context switch here) and thread B try to acquire > the lock T. After finding out that lock T has already been acquired, > thread B will just spin until it gets preempted, after which thread A > gets waken up and run and release the lock T. > > > Assume single CPU, If thread B spins where will thread A get to run and > finish up its critical section and release the lock? The one CPU you > have is held by thread b for spinning. 
> > For spin locks on single core, it does not make sense to spin. We just > disable interrupts as we are currently the only ones running we just > need to make sure no others will get to preempt us. That's why spin > locks should be held for short duration. > > When you have multiple cores, ThreadA can spin on cpu1, while thread B > holding the lock on cpu2 can finish up and release it. We disable > interrupts only on cpu1 so we don't want to preempt threadA. The cost of > preemption is very high compared to short spin. Note: short spin. > > Look at adaptive spin locks. Can't the scheduler preempt thread B and put thread A to run? After all, we did not disable interrupt. regards, Yubin Ruan > > So, you see there is not > necessarily any deadlock even if thread A get interrupted. > > I can only remember two conditions where using spinlock without > disabling interrupts will cause deadlock: > > #######1, spinlock used in an interrupt handler > If a thread A holding a spinlock T get interrupted and the interrupt > handler responsible for this interrupt try to acquire T, then we have > deadlock, because A would never have a chance to run before the > interrupt handler return, and the interrupt handler, unfortunately, > will continue to spin ... so in this situation, one has to disable > interrupt before spinning. > > As far as I know, in Linux, they provide two kinds of spinlocks: > > spin_lock(..); /* spinlock that does not disable interrupts */ > spin_lock_irqsave(...); /* spinlock that disable local interrupt */ > > > #######2, priority inversion problem > If thread B with a higher priority get in and try to acquire the lock > that thread A currently holds, then thread B would spin, while at the > same time thread A has no chance to run because it has lower priority, > thus not being able to release the lock. > (I haven't investigate enough into the source code, so I don't know > how FreeBSD and Linux handle this priority inversion problem. 
Maybe > they use priority inheritance or random boosting?) > > thanks, > Yubin Ruan > _______________________________________________ > freebsd-hackers@freebsd.org > mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org > " > > -- > Thanks, > Vasanth From owner-freebsd-hackers@freebsd.org Sun Apr 9 16:01:46 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D02ACD3688F; Sun, 9 Apr 2017 16:01:46 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from mail.metricspace.net (mail.metricspace.net [IPv6:2001:470:1f11:617::107]) by mx1.freebsd.org (Postfix) with ESMTP id 9A58CCD2; Sun, 9 Apr 2017 16:01:46 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from [172.16.0.205] (unknown [172.16.0.205]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: eric) by mail.metricspace.net (Postfix) with ESMTPSA id 1E01A189B; Sun, 9 Apr 2017 16:01:46 +0000 (UTC) Subject: Re: Proposal for a design for signed kernel/modules/etc To: "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org References: <6f6b47ed-84e0-e4c0-9df5-350620cff45b@metricspace.net> <20170408111144.GC14604@brick> <181f7b78-64c3-53a6-a143-721ef0cb5186@metricspace.net> <20170408115222.GA64207@brick> <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> <20170409155240.GA18363@brick> From: Eric McCorkle Message-ID: <8a60d967-eb7f-b529-df03-c0bfccbe9747@metricspace.net> Date: Sun, 9 Apr 2017 12:01:42 -0400 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <20170409155240.GA18363@brick> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; 
boundary="O4gwvhcQpw7ukKHrkL2c3DUgq5CLSkPG4" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 16:01:46 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --O4gwvhcQpw7ukKHrkL2c3DUgq5CLSkPG4 Content-Type: multipart/mixed; boundary="P2CjpF9BIF78trnltNW12bP2AkfCfbv8U"; protected-headers="v1" From: Eric McCorkle To: "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org Message-ID: <8a60d967-eb7f-b529-df03-c0bfccbe9747@metricspace.net> Subject: Re: Proposal for a design for signed kernel/modules/etc References: <6f6b47ed-84e0-e4c0-9df5-350620cff45b@metricspace.net> <20170408111144.GC14604@brick> <181f7b78-64c3-53a6-a143-721ef0cb5186@metricspace.net> <20170408115222.GA64207@brick> <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> <20170409155240.GA18363@brick> In-Reply-To: <20170409155240.GA18363@brick> --P2CjpF9BIF78trnltNW12bP2AkfCfbv8U Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 04/09/2017 11:52, Edward Tomasz Napiera=C5=82a wrote: > On 0409T1040, Eric McCorkle wrote: >> On 04/08/2017 07:52, Edward Tomasz Napiera=C5=82a wrote: >>> On 0408T0803, Eric McCorkle wrote: >>>> On 04/08/2017 07:11, Edward Tomasz Napiera=C5=82a wrote: >>>>> On 0327T1354, Eric McCorkle wrote: >>>>>> Hello everyone, >>>>>> >>>>>> The following is a design proposal for signed kernel and kernel mo= dule >>>>>> loading, both at boot- and runtime (with the possibility open for = signed >>>>>> executables and libraries if someone wanted to go that route). I'= m >>>>>> interested in feedback on the idea before I start actually writing= code >>>>>> for it. >>>>> >>>>> I see two potential problems with this. >>>>> >>>>> First, our current loader(8) depends heavily on Forth code. 
By making
>>>>> it load modified 4th files, you can do absolutely anything you want;
>>>>> AFAIK they have unrestricted access to hardware. So you should preferably
>>>>> be able to sign them as well. You _might_ (not sure on this one) also
>>>>> want to be able to restrict access to some of the loader configuration
>>>>> variables.
>>>>
>>>> Loader is handled by the UEFI secure boot framework, though the concerns
>>>> about the 4th code are still valid. In a secure system, you'd want to
>>>> do something about that, but the concerns are different enough (and it's
>>>> isolated enough) that it could be done separately.
>>>
>>> Unless the way to address those ends up being a signature mechanism
>>> that doesn't depend on the format of the files being signed.
>>
>> I explored the idea of wrapped or detached signatures in the previous
>> discussion. Envelopes or detached signatures could make sense for the
>> 4th files. It's a small, obscure set of code that probably isn't
>> changed very often.
>>
>> Envelopes or detached signatures for kernel modules and especially
>> signed executables and libraries both have extensive, far-reaching
>> consequences for system administration, packaging, tooling, the ports
>> collection, and so on, whereas signing the executable with an additional
>> section has no such consequences.
>>
>> Config files (and the 4th files really are more like config files) have
>> a different set of constraints, and detached signatures are probably the
>> way to go there. So loader should probably support detached PKCS#7
>> signature checks.
>
> The third way that might be worth considering would be to just append
> the signature. This would work for both 4th (if you prepend it with
> whatever is the 4th comment character) and ELF, without the need for
> changing or extending either format.

No, that won't work at all.
That's going to break the tooling for ELF
files as well as applications that use them, and it won't work for any
configuration file aside from loader.4th. It wouldn't even work for
boot.conf, for example.

More generally, that's basing an entire standard off a dead language
that's used in only one place, and in a way that precludes compatibility
with any file format that uses a different comment character. It also
mandates some kind of ASCII encoding scheme to avoid newlines.

If I were going to adopt a solution that broke existing tooling, I'd at
least go with a proper envelope scheme.

--P2CjpF9BIF78trnltNW12bP2AkfCfbv8U-- --O4gwvhcQpw7ukKHrkL2c3DUgq5CLSkPG4 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iHUEARYIAB0WIQRELMWN3SgpoYkrmidWwohAqoAEjQUCWOpa5gAKCRBWwohAqoAE jWZbAP4iE8lRz6j0hwlRq4UEs8FRld4Okk4KzkmhwOJ4Wm8Z7QD+KTupXfPRXknm 6S8BLi6wyH1kgDDmwp8CGw/iQTv66Q8= =EaXS -----END PGP SIGNATURE----- --O4gwvhcQpw7ukKHrkL2c3DUgq5CLSkPG4-- From owner-freebsd-hackers@freebsd.org Sun Apr 9 16:24:36 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BA036D36EE5 for ; Sun, 9 Apr 2017 16:24:36 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-io0-x232.google.com (mail-io0-x232.google.com [IPv6:2607:f8b0:4001:c06::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8278CA12 for ; Sun, 9 Apr 2017 16:24:36 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: by mail-io0-x232.google.com with SMTP id a103so6831390ioj.1 for ; Sun, 09 Apr 2017 09:24:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=PGcYTtZ5TIDJ77DbPiJQ2wzZLs4147SEwUmNQEtEpzM=; b=ib1mlHiAngJi9ECbeWcxseHQhhgETA+HW0k7ppwTmoJ5scvlrgi7MxHwOpZWpoTtKV xoKSKQ9zBipwB3tibVxJVqW1AMiOtCU10kM5yys6gUTLFcQMbfyGUL1eIeDR5pKUSEOx FAc8ctSahadGIkwH7n3cKemmlc/WwR5TRNd15M4Q6VZtxqH5Tgi0JpWv+knmXEM5aLii caL8bbRSCB2bxxnEtnfwVRn8DJuxhpQk8MtOKuAbThrNrNbZgKvfznPqgT/MFx6NHoWJ uOzFdRKqB5srWA38wBxdUpVv+WNoQhOw/GV78MMT0uED8ASxQR7ra0uHRq/6rmujlmQO biFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=PGcYTtZ5TIDJ77DbPiJQ2wzZLs4147SEwUmNQEtEpzM=; b=APPWrkHtkNfTsAekwckFbr+VrB/3w8Azv+ynqTAl6MtlPhqZe8k5e1e4CewifLgHlj eQP+xKIPfhefdgzH9JwTM0CY/rr4Yi++ELS2leeQydbf3YD+KSe763W+kE7744KmNfb6 Z1ofzRMcKSU9sedZaLwhvkNtpwaH0qaQAzZp8SH5ZD1G+OwzkK1vhaCFMtAMvGm74ZVR Hw/k7ii0sbHkPdELpIX4YJgt9xHPO0C8cIksYZ6Lsdy3K5vTOq2CDa+0tA+dFHo5IW85 5TnZVMOQfj+8L7VpUyC3iArPqwfWXPO5q0QCb6b2jyY5/gQWZXg+La95YyWPO/T2Lo8A C11Q== X-Gm-Message-State: AN3rC/7dwtgGfntc+Tq39E3vxQ21UF1KC6cHtx4ZXvEFDQt2Me7SaxxzZZ+S/CH7+jr2mK3OsGm1JDjwuozeuQ== X-Received: by 10.107.11.159 with SMTP id 31mr2397501iol.41.1491755075934; Sun, 09 Apr 2017 09:24:35 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.19.33 with HTTP; Sun, 9 Apr 2017 09:24:35 -0700 (PDT) In-Reply-To: References: From: Ryan Stone Date: Sun, 9 Apr 2017 12:24:35 -0400 Message-ID: Subject: Re: Understanding the FreeBSD locking mechanism To: Yubin Ruan Cc: Ed Schouten , "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 16:24:36 -0000 On Sun, Apr 9, 2017 at 6:13 AM, Yubin 
Ruan wrote:

>
> #######1, spinlock used in an interrupt handler
> If a thread A holding a spinlock T get interrupted and the interrupt
> handler responsible for this interrupt try to acquire T, then we have
> deadlock, because A would never have a chance to run before the
> interrupt handler return, and the interrupt handler, unfortunately,
> will continue to spin ... so in this situation, one has to disable
> interrupt before spinning.
>
> As far as I know, in Linux, they provide two kinds of spinlocks:
>
> spin_lock(..); /* spinlock that does not disable interrupts */
> spin_lock_irqsave(...); /* spinlock that disable local interrupt */

In the FreeBSD locking style, a spinlock is only used in the case where
one needs to synchronize with an interrupt handler. This is why spinlocks
always disable local interrupts in FreeBSD.

FreeBSD's lock for the first case (a lock that does not disable
interrupts) is the MTX_DEF mutex, which is an adaptively spinning,
blocking mutex implementation. In short, the MTX_DEF mutex will spin
waiting for the lock if the owner is running, but will block if the owner
is descheduled. This prevents expensive trips through the scheduler for
the common case where the mutex is only held for short periods, without
wasting CPU cycles spinning in cases where the owner thread is
descheduled and therefore will not be completing soon.

> #######2, priority inversion problem
> If thread B with a higher priority get in and try to acquire the lock
> that thread A currently holds, then thread B would spin, while at the
> same time thread A has no chance to run because it has lower priority,
> thus not being able to release the lock.
> (I haven't investigate enough into the source code, so I don't know
> how FreeBSD and Linux handle this priority inversion problem. Maybe
> they use priority inheritance or random boosting?)
>

FreeBSD's spin locks prevent priority inversion by preventing the holder
thread from being descheduled. MTX_DEF locks implement priority
inheritance.
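
The spin-or-block policy described above for MTX_DEF can be sketched in
userspace C. This is only a toy model under stated assumptions, not
FreeBSD kernel code: toy_mtx, toy_mtx_try(), toy_mtx_set_running() and
toy_mtx_unlock() are hypothetical names made up for this illustration,
and the owner_running flag stands in for scheduler state that a kernel
would consult itself. A C11 compare-and-swap plays the role of the
test-and-set step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* What a contending thread should do next (hypothetical toy model). */
enum toy_action { TOY_ACQUIRED, TOY_SPIN, TOY_BLOCK };

/* Toy lock: owner == 0 means unlocked, otherwise it holds a thread id. */
struct toy_mtx {
    atomic_int  owner;
    atomic_bool owner_running;  /* assumption: stand-in for scheduler state */
};

/*
 * One acquisition attempt by thread `tid` (must be nonzero).
 * On failure, report the adaptive decision: spin while the owner is on
 * a CPU, block once the owner has been descheduled.  The decision can
 * be stale by the time the caller acts on it; this is a sketch only.
 */
static enum toy_action
toy_mtx_try(struct toy_mtx *m, int tid)
{
    int expected = 0;

    if (atomic_compare_exchange_strong(&m->owner, &expected, tid)) {
        atomic_store(&m->owner_running, true);
        return TOY_ACQUIRED;
    }
    return atomic_load(&m->owner_running) ? TOY_SPIN : TOY_BLOCK;
}

/* Model the owner being descheduled (false) or put back on a CPU (true). */
static void
toy_mtx_set_running(struct toy_mtx *m, bool running)
{
    atomic_store(&m->owner_running, running);
}

/* Release the lock; only the current owner may call this. */
static void
toy_mtx_unlock(struct toy_mtx *m, int tid)
{
    assert(atomic_load(&m->owner) == tid);
    atomic_store(&m->owner_running, false);
    atomic_store(&m->owner, 0);
}
```

A waiter would call toy_mtx_try() in a loop, retrying immediately on
TOY_SPIN and yielding or sleeping on TOY_BLOCK. A spin mutex, by
contrast, never needs the TOY_BLOCK branch, because its holder runs with
local interrupts disabled and so is not descheduled while holding it.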
From owner-freebsd-hackers@freebsd.org Sun Apr 9 16:48:51 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AA113D364CF for ; Sun, 9 Apr 2017 16:48:51 +0000 (UTC) (envelope-from vasanth.raonaik@gmail.com) Received: from mail-oi0-x22e.google.com (mail-oi0-x22e.google.com [IPv6:2607:f8b0:4003:c06::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6507E3B0 for ; Sun, 9 Apr 2017 16:48:51 +0000 (UTC) (envelope-from vasanth.raonaik@gmail.com) Received: by mail-oi0-x22e.google.com with SMTP id b187so127163346oif.0 for ; Sun, 09 Apr 2017 09:48:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=6Y0XDOknjJXK/q5/gAGI4h8hsW3WVLx4f39J0Vi01jg=; b=ZQCNiWGfb8lq2JU2qeQfiYvTcJnqiFmKbYQOIPNtyRVYFvvVR7xyn5+ezk1mdsxKpK A1X0wCzg1f/suT986BCqP0Z2qNI2dOj9GcDIYYjHucMCHKlOG3lNYfxj+sQwnGOsKlIu vB9AITCzxypu3lwAyNbCmsfNz3LNGrgoVt6BBD8eSiiaYKbf9FHM38XjhStb8CNYqXeJ mIANAcoCMAnHzenxobVW+ksk3nCXgEGazdbT2PU2L/ol7HNo/1qxXeCZwuaSHbURUTkM WwwKIxnAYBHTLWwM/dY/fXWHiq3RmDGLK9JEKZ7QO9vZLgp5ehWy+Wf+Q2Z9OSLHFz+6 mX+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6Y0XDOknjJXK/q5/gAGI4h8hsW3WVLx4f39J0Vi01jg=; b=FjEzEcynYdbnLxe7pN3OFRED8/By1vzZhN7FYpVcwOrncS9P3WVS/5z5tp0qk3QIfn ahgK5B1MzS/jVyJvG++ep7iiNn3GHrhoZMaUZsC8smlw0WYCGfzmEH1jP8Pe6ol+iE6h XRGSF7UtJWyqEltI9p0yzxBakoLrBs020F6h7/HJbCshSdUWLpMxZ/252lluWt+IO17w DBt8caQKoEII+PWyZmpUV9H1HVaSIZIIpBGTDm+LgLHfjjqOM1gVOK4NfjxKn/j5C9jV AjMA8hdHD0i/TaMCR5oslYGM7baJDi/EvnnKS1sty0SxD9ZOYXOHy1PNq+BGmNLbcLdl 
QHLw== X-Gm-Message-State: AN3rC/45zT5DtAjTfWZ8h3nroQl8uwzrdlUmyAzMzFeOwR/azlG149cEhuoLFSEERXgMGO0JinRc+NBfBe8gVA== X-Received: by 10.157.11.123 with SMTP id p56mr7577784otd.149.1491756530356; Sun, 09 Apr 2017 09:48:50 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: vasanth sabavat Date: Sun, 09 Apr 2017 16:48:39 +0000 Message-ID: Subject: Re: Understanding the FreeBSD locking mechanism To: Ryan Stone , Yubin Ruan Cc: Ed Schouten , "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 16:48:51 -0000 On Sun, Apr 9, 2017 at 9:24 AM Ryan Stone wrote: > On Sun, Apr 9, 2017 at 6:13 AM, Yubin Ruan wrote: > > > > > #######1, spinlock used in an interrupt handler > > If a thread A holding a spinlock T get interrupted and the interrupt > > handler responsible for this interrupt try to acquire T, then we have > > deadlock, because A would never have a chance to run before the > > interrupt handler return, and the interrupt handler, unfortunately, > > will continue to spin ... so in this situation, one has to disable > > interrupt before spinning. > > > > As far as I know, in Linux, they provide two kinds of spinlocks: > > > > spin_lock(..); /* spinlock that does not disable interrupts */ > > spin_lock_irqsave(...); /* spinlock that disable local interrupt * > > > In the FreeBSD locking style, a spinlock is only used in the case where one > needs to > synchronize with an interrupt handler. This is why spinlocks always > disable local > interrupts in FreeBSD. Isn't it true that interrupt handlers instead of running on the current thread stack now have their own thread? 
>
> FreeBSD's lock for the first case is the MTX_DEF mutex, which is an
> adaptively-spinning blocking mutex implementation. In short, the MTX_DEF
> mutex will spin waiting for the lock if the owner is running, but will
> block if the owner is descheduled. This prevents expensive trips through
> the scheduler for the common case where the mutex is only held for short
> periods, without wasting CPU cycles spinning in cases where the owner
> thread is descheduled and therefore will not be completing soon.
>
> #######2, priority inversion problem
> > If thread B with a higher priority get in and try to acquire the lock
> > that thread A currently holds, then thread B would spin, while at the
> > same time thread A has no chance to run because it has lower priority,
> > thus not being able to release the lock.
> > (I haven't investigate enough into the source code, so I don't know
> > how FreeBSD and Linux handle this priority inversion problem. Maybe
> > they use priority inheritance or random boosting?)
> >
>
> FreeBSD's spin locks prevent priority inversion by preventing the holder
> thread from being descheduled.
>
> MTX_DEF locks implement priority inheritance.
> _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > -- Thanks, Vasanth From owner-freebsd-hackers@freebsd.org Sun Apr 9 16:50:37 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AE10CD365D0 for ; Sun, 9 Apr 2017 16:50:37 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-io0-x22c.google.com (mail-io0-x22c.google.com [IPv6:2607:f8b0:4001:c06::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7A811789 for ; Sun, 9 Apr 2017 16:50:37 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-io0-x22c.google.com with SMTP id a103so7153059ioj.1 for ; Sun, 09 Apr 2017 09:50:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-transfer-encoding; bh=uq4N62VcObOpr6hm1+yt24jW5Lh4EG5i+bJ9+JSrzn8=; b=Yy8+Q7CiyFdr9jOo5b0lcI6gWUdlae3qFXQgXDEpw7wne6mSQPVMBpmATDg579XYDu 5S6axvlm+TyNFVZhLr/N0pmR0UYF3DQHgYy33nW9Wh6fQ5Htwhw8oBm3FClQPMAs74R3 BQeK2ADJtQr2PHMkndIFAg3xO1zP/H5l7fGcWjRdDRbsSl6HVeQIOI3f8JygKMCRTwGV 3M5PjZ8itgU0ufL7UvkEVFWaXuM4yxl71UZR5AzGElQlNbhfZcqglKTapItY31miP7+L DbFugGr1IfUcVj8JBBNeRqR586HSx90LQvbHWbktfoY0IUmya4u2x1uekENOhY9BF2Nh Wnaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc:content-transfer-encoding; bh=uq4N62VcObOpr6hm1+yt24jW5Lh4EG5i+bJ9+JSrzn8=; 
b=OTYeGrhtkvV+TlQnLR3Ra0bPcXLgCgBU7vBCvOds7jsCErDahY/ELpVrvziueaKQ4v t9jE2n1PsjtAWpGZxkCjTR+10LAn1ZwLMuVATkZVwgUfO1Vnyj7rkAq9fyiVjLDSVPYx EKR5qGnxODX6/p6x6eVIPdVV8F5cI7dfn5NZwWQ8PVK0dKs1gfDfCWqi8zYIeNtPV6a7 w19oZhq9Yse9FDEDevkj1inoSGZLaL9h6kBX3cG77k5RkteCoqqYAXV1gIqND6WXrvWY VRDGb/Qded5Sx+M6o9pCIorw7P1DvFETZY3Pum7LinwqWfqRxe71z7XItauN5YAVXRCl 6xfg== X-Gm-Message-State: AN3rC/53UZlGsKPu34znOT5JiWojidwCmIiDaOmqzxx1fQPpXERBlMnkIJU1Jsk0YuX7lk8ZfgehIWhk1JgNkg== X-Received: by 10.107.198.137 with SMTP id w131mr2892855iof.19.1491756636689; Sun, 09 Apr 2017 09:50:36 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.146.24 with HTTP; Sun, 9 Apr 2017 09:50:36 -0700 (PDT) X-Originating-IP: [2607:fb10:7021:1::b517] In-Reply-To: <8a60d967-eb7f-b529-df03-c0bfccbe9747@metricspace.net> References: <6f6b47ed-84e0-e4c0-9df5-350620cff45b@metricspace.net> <20170408111144.GC14604@brick> <181f7b78-64c3-53a6-a143-721ef0cb5186@metricspace.net> <20170408115222.GA64207@brick> <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> <20170409155240.GA18363@brick> <8a60d967-eb7f-b529-df03-c0bfccbe9747@metricspace.net> From: Warner Losh Date: Sun, 9 Apr 2017 10:50:36 -0600 X-Google-Sender-Auth: zkX6U72fUJc7oD084joHMA5XRig Message-ID: Subject: Re: Proposal for a design for signed kernel/modules/etc To: Eric McCorkle Cc: "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 16:50:37 -0000 On Sun, Apr 9, 2017 at 10:01 AM, Eric McCorkle wrote= : > On 04/09/2017 11:52, Edward Tomasz Napiera=C5=82a wrote: >> On 0409T1040, Eric McCorkle wrote: >>> On 04/08/2017 07:52, Edward Tomasz Napiera=C5=82a wrote: >>>> On 0408T0803, Eric McCorkle wrote: 
>>>>> On 04/08/2017 07:11, Edward Tomasz Napierała wrote:
>>>>>> On 0327T1354, Eric McCorkle wrote:
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> The following is a design proposal for signed kernel and kernel module
>>>>>>> loading, both at boot- and runtime (with the possibility open for signed
>>>>>>> executables and libraries if someone wanted to go that route). I'm
>>>>>>> interested in feedback on the idea before I start actually writing code
>>>>>>> for it.
>>>>>>
>>>>>> I see two potential problems with this.
>>>>>>
>>>>>> First, our current loader(8) depends heavily on Forth code. By making
>>>>>> it load modified 4th files, you can do absolutely anything you want;
>>>>>> AFAIK they have unrestricted access to hardware. So you should preferably
>>>>>> be able to sign them as well. You _might_ (not sure on this one) also
>>>>>> want to be able to restrict access to some of the loader configuration
>>>>>> variables.
>>>>>
>>>>> Loader is handled by the UEFI secure boot framework, though the concerns
>>>>> about the 4th code are still valid. In a secure system, you'd want to
>>>>> do something about that, but the concerns are different enough (and it's
>>>>> isolated enough) that it could be done separately.
>>>>
>>>> Unless the way to address those ends up being a signature mechanism
>>>> that doesn't depend on the format of the files being signed.
>>>
>>> I explored the idea of wrapped or detached signatures in the previous
>>> discussion. Envelopes or detached signatures could make sense for the
>>> 4th files. It's a small, obscure set of code that probably isn't
>>> changed very often.
>>>
>>> Envelopes or detached signatures for kernel modules and especially
>>> signed executables and libraries both have extensive, far-reaching
>>> consequences for system administration, packaging, tooling, the ports
>>> collection, and so on, whereas signing the executable with an additional
>>> section has no such consequences.
>>>
>>> Config files (and the 4th files really are more like config files) have
>>> a different set of constraints, and detached signatures are probably the
>>> way to go there. So loader should probably support detached PKCS#7
>>> signature checks.
>>
>> The third way that might be worth considering would be to just append
>> the signature. This would work for both 4th (if you prepend it with
>> whatever is the 4th comment character) and ELF, without the need for
>> changing or extending either format.
>
> No, that won't work at all. That's going to break the tooling for ELF
> files as well as applications that use them, and it won't work for any
> configuration file aside from loader.4th. It wouldn't even work for
> boot.conf, for example.
>
> More generally, that's basing an entire standard off a dead language
> that's used in only one place, and in a way that precludes compatibility
> with any file format that uses a different comment character. It also
> mandates some kind of ASCII encoding scheme to avoid newlines.

You don't need to avoid new lines with 4th. It doesn't even need to be
an ASCII encoding scheme, unless you are doing something crazy like
trying to push the signature through the 4th parser, which is nuts.
Forth can read binary files just fine. But I think arguing over the
4th stuff is a distraction, see below.

> If I was going to adopt a solution that broke existing tooling, I'd at
> least go with a proper envelope scheme.

That would be preferable.

But why the either-or dichotomy? Seems like you're looking at the
problem wrong if you are arguing about 4th code. You should be
thinking more in terms of, at most, a couple of 4th words that can
implement this stuff (so the loader could show that the kernel is
signed and valid vs is not signed vs is signed, but the signature is
bogus). 99% of the functionality should be in C, and should be
sharable between the loader, the kernel and whatever else may wish to
verify signatures before loading.
It would also allow the same functionality to be pushed into the on-again-off-again LUA boot project (which seems to have momentum this time). Warner From owner-freebsd-hackers@freebsd.org Sun Apr 9 17:17:37 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5BB91D3601E; Sun, 9 Apr 2017 17:17:37 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from mail.metricspace.net (mail.metricspace.net [IPv6:2001:470:1f11:617::107]) by mx1.freebsd.org (Postfix) with ESMTP id 2C2459CD; Sun, 9 Apr 2017 17:17:37 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from [172.16.0.205] (unknown [172.16.0.205]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: eric) by mail.metricspace.net (Postfix) with ESMTPSA id A712318CF; Sun, 9 Apr 2017 17:17:29 +0000 (UTC) Subject: Re: Proposal for a design for signed kernel/modules/etc To: Warner Losh References: <6f6b47ed-84e0-e4c0-9df5-350620cff45b@metricspace.net> <20170408111144.GC14604@brick> <181f7b78-64c3-53a6-a143-721ef0cb5186@metricspace.net> <20170408115222.GA64207@brick> <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> <20170409155240.GA18363@brick> <8a60d967-eb7f-b529-df03-c0bfccbe9747@metricspace.net> Cc: "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org From: Eric McCorkle Message-ID: <082223a0-1768-f5f0-9f4a-2e9fd45716c7@metricspace.net> Date: Sun, 9 Apr 2017 13:17:26 -0400 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="MKJ6xGh0nJg9jVG0lQanCjkQ11rg9qNAe" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 17:17:37 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --MKJ6xGh0nJg9jVG0lQanCjkQ11rg9qNAe Content-Type: multipart/mixed; boundary="CL7dlJ1GfWJTi96v7o8i4fBVIpSo8E8P3"; protected-headers="v1" From: Eric McCorkle To: Warner Losh Cc: "freebsd-hackers@freebsd.org" , freebsd-security@freebsd.org Message-ID: <082223a0-1768-f5f0-9f4a-2e9fd45716c7@metricspace.net> Subject: Re: Proposal for a design for signed kernel/modules/etc References: <6f6b47ed-84e0-e4c0-9df5-350620cff45b@metricspace.net> <20170408111144.GC14604@brick> <181f7b78-64c3-53a6-a143-721ef0cb5186@metricspace.net> <20170408115222.GA64207@brick> <7611f7a3-3e50-65f2-4347-e37018ae1abc@metricspace.net> <20170409155240.GA18363@brick> <8a60d967-eb7f-b529-df03-c0bfccbe9747@metricspace.net> In-Reply-To: --CL7dlJ1GfWJTi96v7o8i4fBVIpSo8E8P3 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 04/09/2017 12:50, Warner Losh wrote: > On Sun, Apr 9, 2017 at 10:01 AM, Eric McCorkle w= rote: >> On 04/09/2017 11:52, Edward Tomasz Napiera=C5=82a wrote: >>> On 0409T1040, Eric McCorkle wrote: >>>> On 04/08/2017 07:52, Edward Tomasz Napiera=C5=82a wrote: >>>>> On 0408T0803, Eric McCorkle wrote: >>>>>> On 04/08/2017 07:11, Edward Tomasz Napiera=C5=82a wrote: >>>>>>> On 0327T1354, Eric McCorkle wrote: >>>>>>>> Hello everyone, >>>>>>>> >>>>>>>> The following is a design proposal for signed kernel and kernel = module >>>>>>>> loading, both at boot- and runtime (with the possibility open fo= r signed >>>>>>>> executables and libraries if someone wanted to go that route). = I'm >>>>>>>> interested in feedback on the idea before I start actually writi= ng code >>>>>>>> for it. >>>>>>> >>>>>>> I see two potential problems with this. >>>>>>> >>>>>>> First, our current loader(8) depends heavily on Forth code. 
By making
>>>>>>> it load modified 4th files, you can do absolutely anything you want;
>>>>>>> AFAIK they have unrestricted access to hardware. So you should preferably
>>>>>>> be able to sign them as well. You _might_ (not sure on this one) also
>>>>>>> want to be able to restrict access to some of the loader configuration
>>>>>>> variables.
>>>>>>
>>>>>> Loader is handled by the UEFI secure boot framework, though the concerns
>>>>>> about the 4th code are still valid. In a secure system, you'd want to
>>>>>> do something about that, but the concerns are different enough (and it's
>>>>>> isolated enough) that it could be done separately.
>>>>>
>>>>> Unless the way to address those ends up being a signature mechanism
>>>>> that doesn't depend on the format of the files being signed.
>>>>
>>>> I explored the idea of wrapped or detached signatures in the previous
>>>> discussion. Envelopes or detached signatures could make sense for the
>>>> 4th files. It's a small, obscure set of code that probably isn't
>>>> changed very often.
>>>>
>>>> Envelopes or detached signatures for kernel modules and especially
>>>> signed executables and libraries both have extensive, far-reaching
>>>> consequences for system administration, packaging, tooling, the ports
>>>> collection, and so on, whereas signing the executable with an additional
>>>> section has no such consequences.
>>>>
>>>> Config files (and the 4th files really are more like config files) have
>>>> a different set of constraints, and detached signatures are probably the
>>>> way to go there. So loader should probably support detached PKCS#7
>>>> signature checks.
>>>
>>> The third way that might be worth considering would be to just append
>>> the signature. This would work for both 4th (if you prepend it with
>>> whatever is the 4th comment character) and ELF, without the need for
>>> changing or extending either format.
>>
>> No, that won't work at all.
That's going to break the tooling for ELF
>> files as well as applications that use them, and it won't work for any
>> configuration file aside from loader.4th. It wouldn't even work for
>> boot.conf, for example.
>>
>> More generally, that's basing an entire standard off a dead language
>> that's used in only one place, and in a way that precludes compatibility
>> with any file format that uses a different comment character. It also
>> mandates some kind of ASCII encoding scheme to avoid newlines.
>
> You don't need to avoid new lines with 4th. It doesn't even need to be
> an ASCII encoding scheme, unless you are doing something crazy like
> trying to push the signature through the 4th parser, which is nuts.
> Forth can read binary files just fine. But I think arguing over the
> 4th stuff is a distraction, see below.
>
>> If I was going to adopt a solution that broke existing tooling, I'd at
>> least go with a proper envelope scheme.
>
> That would be preferable.
>
> But why the either-or dichotomy? Seems like you're looking at the
> problem wrong if you are arguing about 4th code. You should be
> thinking more in terms of, at most, a couple of 4th words that can
> implement this stuff (so the loader could show that the kernel is
> signed and valid vs is not signed vs is signed, but the signature is
> bogus). 99% of the functionality should be in C, and should be
> sharable between the loader, the kernel and whatever else may wish to
> verify signatures before loading. It would also allow the same
> functionality to be pushed into the on-again-off-again LUA boot
> project (which seems to have momentum this time).
>

I'm not following what you're saying. I don't think anyone was
suggesting doing signature *verification* in 4th (at least I hope not!).
The issue is about the format of the signatures.

Basically, the crux of my proposal is about using an ELF section to
store signatures, which has immediate use for kernel module loading as
well as in the boot loader for the same purpose.

Now, the boot programs, loader, and perhaps the kernel too all load
various additional config files (boot.conf, loader.4th, loader.conf,
etc). These also need to be signed, so there needs to be a solution
for this as well.

There are significant advantages to the ELF .sign section, and all the
alternatives have serious disadvantages. For these reasons, I'm pretty
set on the .sign section.

With the config files (which includes the 4th code), you don't have a
file format that transparently supports additional metadata (like ELF
does). So you have a choice between storing detached signatures in an
external file (the way GRUB does) or using an envelope format. Of the
two, the envelope is preferable, I think, though it should probably have
a different name (ex: loader.4th.pk7, loader.conf.pk7) and be understood
to contain an envelope, not a raw config file.

The implementation of all this would be in C, of course. The
verification stuff would be compiled into loader and kernel. The
ELF signing would be done by a command-line utility (which I've
half-written at this point). Ideally, the signing of config files would
be doable with the standard openssl command-line.
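
As an illustration of why an ELF section is a convenient carrier, the
sketch below locates a named section (for example .sign) in an ELF64 image
that has already been read into memory, which is roughly the first step any
verifier would need. It is a sketch under assumptions, not the proposed
implementation: it assumes a native-endian ELF64 file and the <elf.h>
header, find_section() is a hypothetical helper, and nothing is implied
about the layout of the signature bytes themselves, which this thread does
not specify.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return a pointer to the contents of the section called `name` inside the
 * ELF64 image `img` of `len` bytes, storing its size in *size_out.
 * Returns NULL if the section is absent or the image looks malformed.
 */
const unsigned char *
find_section(const unsigned char *img, size_t len, const char *name,
    size_t *size_out)
{
	Elf64_Ehdr eh;
	Elf64_Shdr strtab, sh;

	assert(img != NULL && name != NULL && size_out != NULL);

	if (len < sizeof(eh))
		return (NULL);
	memcpy(&eh, img, sizeof(eh));
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_shentsize != sizeof(Elf64_Shdr) ||
	    eh.e_shstrndx >= eh.e_shnum)
		return (NULL);

	/* The whole section header table must lie inside the image. */
	if (eh.e_shoff > len ||
	    (uint64_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
		return (NULL);

	/* Section names are stored in the string table at e_shstrndx. */
	memcpy(&strtab, img + eh.e_shoff + eh.e_shstrndx * sizeof(sh),
	    sizeof(sh));
	if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
		return (NULL);

	for (unsigned i = 0; i < eh.e_shnum; i++) {
		memcpy(&sh, img + eh.e_shoff + i * sizeof(sh), sizeof(sh));
		if (sh.sh_name >= strtab.sh_size)
			continue;
		const char *s = (const char *)img + strtab.sh_offset +
		    sh.sh_name;
		size_t room = strtab.sh_size - sh.sh_name;
		/* Only compare names that are NUL-terminated in the table. */
		if (memchr(s, '\0', room) == NULL || strcmp(s, name) != 0)
			continue;
		/* The section contents must also lie inside the image. */
		if (sh.sh_offset > len || sh.sh_size > len - sh.sh_offset)
			return (NULL);
		*size_out = sh.sh_size;
		return (img + sh.sh_offset);
	}
	return (NULL);
}
```

Because tools that do not know about the extra section simply ignore it,
the rest of the file stays usable by existing ELF tooling, which is the
property argued for above. A config file has no such slot, hence the
detached-signature or envelope options.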
--CL7dlJ1GfWJTi96v7o8i4fBVIpSo8E8P3-- --MKJ6xGh0nJg9jVG0lQanCjkQ11rg9qNAe Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iHUEARYIAB0WIQRELMWN3SgpoYkrmidWwohAqoAEjQUCWOpspgAKCRBWwohAqoAE jQ38AQCBS/XagV7XTbcddwhcVSvvwPw1iQKYnMYAUUumSSJ9ZQD/ahJsW5QVbf7R d8z+nk1a4SUI98zbv4crR0O+pXjHSgE= =BuYb -----END PGP SIGNATURE----- --MKJ6xGh0nJg9jVG0lQanCjkQ11rg9qNAe-- From owner-freebsd-hackers@freebsd.org Sun Apr 9 17:24:38 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 64FBAD36426 for ; Sun, 9 Apr 2017 17:24:38 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-7.reflexion.net [208.70.210.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 29E6BC8 for ; Sun, 9 Apr 2017 17:24:38 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 31129 invoked from network); 9 Apr 2017 17:24:31 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 9 Apr 2017 17:24:31 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Sun, 09 Apr 2017 13:24:31 -0400 (EDT) Received: (qmail 20454 invoked from network); 9 Apr 2017 17:24:31 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 9 Apr 2017 17:24:31 -0000 Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 5CF31EC8630; Sun, 9 Apr 2017 10:24:30 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) 
Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them From: Mark Millard In-Reply-To: <20170409122715.GF1788@kib.kiev.ua> Date: Sun, 9 Apr 2017 10:24:29 -0700 Cc: andrew@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arm Content-Transfer-Encoding: quoted-printable Message-Id: <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net> References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> <20170409122715.GF1788@kib.kiev.ua> To: Konstantin Belousov X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 17:24:38 -0000 On 2017-Apr-9, at 5:27 AM, Konstantin Belousov = wrote: > On Sat, Apr 08, 2017 at 06:02:00PM -0700, Mark Millard wrote: >> [I've identified the code path involved is the arm64 small = allocations >> turning into zeros for later fork-then-swapout-then-back-in, >> specifically the ongoing RES(ident memory) size decrease that >> "top -PCwaopid" shows before the fork/swap sequence. Hopefully >> I've also exposed enough related information for someone that >> knows what they are doing to get started with a specific >> investigation, looking for a fix. I'd like for a pine64+ >> 2GB to have buildworld complete despite the forking and >> swapping involved (yep: for a time zero RES(ident memory) for >> some processes involved in the build).] >=20 > I was not able to follow the walls of text, but do not think that > I pmap_ts_reference() is the real culprit there. >=20 > Is my impression right that the issue occurs on fork, and looks as > a memory corruption, where some page suddently becomes zero-filled ? > And swapping seems to be involved ? 
It is somewhat interesting to see
> if the problem is reproducible on non-arm64 machines, e.g. armv7 or amd64.

Yes, yes, non-arm64 that I've tried works.

But I think that the following extra detail may be of use: what top
shows for RES over time is also odd on arm64 (only) and the number
of pages that are zeroed is proportional to the decrease in RES.

In the test sequence:

A) Allocate lots of 14 KiByte allocations and initialize the content of each
   to non-zero. The example ends up with RES of about 265M.

B) sleep some amount of time, I've been using well over 30 seconds here.

C) fork

D) sleep again (parent and child), also forcing swapping during the sleep
   (I used stress, manually run.)

E) Test the memory pattern in the parent and child process, passing over
   all the bytes, failed and good.

Both the parent and the child in (E) see the first pages allocated as zero,
with the number of zeroed pages increasing as the sleep time in (B)
increases (as long as the sleep is over 30 sec or so). The parent and child
match for which pages are zero vs. not.

It fails with (B) being a no-op as well. But the proportionality with
the time for the sleep is interesting.

During (B) "top -PCwaopid" shows RES decreasing, starting after 30 sec
or so. The fork in (C) produces a child that does not have the same RES
as the parent but instead a tiny RES (80K as I remember). During (E) the
child's RES increases to full size.

My powerpc64, armv7, and amd64 tests of such do not fail, nor does
RES decrease during (B). The child process gets the same RES as the
parent as well, unlike for arm64.

In the failing context (arm64) RES in the parent decreases during (D)
before the swap-out as well.

> If answers to my two questions are yes, there is probably some bug with
> arm64 pmap handling of the dirty bit emulation. ARMv8.0 does not provide
> hardware dirty bit, and pmap interprets an accessed writeable page as
> unconditionally dirty.
More, accessed bit is also not maintained by
> hardware, instead it should be set by pmap. And arm64 pmap sets the
> AF bit unconditionally when creating valid pte.

fork-then-swap-out/in is required to see the problem. Neither fork by
itself nor swapping (zero RES as shown in top) by itself has shown
the problem so far.

> Hmm, could you try the following patch, I did not even compile it.

I'll try it later today.

> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
> index 3d5756ba891..55aa402eb1c 100644
> --- a/sys/arm64/arm64/pmap.c
> +++ b/sys/arm64/arm64/pmap.c
> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>  		    sva += L3_SIZE) {
>  			l3 = pmap_load(l3p);
>  			if (pmap_l3_valid(l3)) {
> +				if ((l3 & ATTR_SW_MANAGED) &&
> +				    pmap_page_dirty(l3)) {
> +					vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
> +					    ~ATTR_MASK));
> +				}
>  				pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>  				PTE_SYNC(l3p);
>  				/* XXX: Use pmap_invalidate_range */

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-hackers@freebsd.org Sun Apr 9 18:02:08 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C43DCD3607A for ; Sun, 9 Apr 2017 18:02:08 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-io0-x230.google.com (mail-io0-x230.google.com [IPv6:2607:f8b0:4001:c06::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8ACAF895 for ; Sun, 9 Apr 2017 18:02:08 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: by mail-io0-x230.google.com with SMTP id r16so19129880ioi.2 for ; Sun, 09 Apr 2017 11:02:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=uuoVb0oAtQXZ/NyuchdxgsKc/vAdEk+3WWY3C6y6/HA=; b=TFzidzG5LLGHLFZBiHNcCZOSGvHzlkQPZ1GVbvaUnFH34Eddap/PZn9IedLvAhz32j 0/StQdzqEq8vTTtdeFFytHoFdqraWZ8Tb/bgyLAeYavygNlz9Ts2tbecgf7OnW4y79nY LwsbfSADRxJjaRhB8Qy8i7r5gaOburE0gR+CIxXLJggUQAMjL0NnqSq4ekQySJalSw7J qBrsch3TvqAgKx0H8GaT72mRQlPWnNetHVDRbxCFHChqpJUkqRlOD/lqja6h1Gras/+H skkttrOv9mnVdZgxfmHm0TK/voQ48q/HbQwCwJygZ/G5zokii4MPiom/jb9ztY9FRS9Q PjJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=uuoVb0oAtQXZ/NyuchdxgsKc/vAdEk+3WWY3C6y6/HA=; b=AZE9dH5+dtwJW8xLuxlh+9fSDh9DM32keOkF70MWLBwJuZBCR0FJMOkdarMpwSXTNz SVJhz4oUNWxTFGByxNnbeK8eVR3ZRmkE9zeXxPeXcFAXO7rHqFe1tC65scnS+rjdBd+V 1UlYCt8YH/GAtPFguepBI9IJkFcIySycvclWnAW28JDNgMC95vDQAeO1YEKrpwNQHUB2 TURhB74bwk1g75H3MSRs/0fPnlDw640MonFd13o7U/GPVH2cbpOs3telA9QrozRk5v7/ 2SeKU7MQ4Gtdv+cjYVLxuLsoRg/5BbpFFWG07WwjmsZnA4S1RqqBDdZPJtdUjhgFwCkp axXw== X-Gm-Message-State: AFeK/H12LNHWavsepZmXn3VRggpOU38TrpP75uVwqgMl+3AjJV1iyT2RArpq5TIbNs9ARJX06Ro9mjGmSVzSQQ== X-Received: by 10.107.164.36 with SMTP id n36mr46312095ioe.103.1491760927967; Sun, 09 Apr 2017 11:02:07 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.19.33 with HTTP; Sun, 9 Apr 2017 11:02:07 -0700 (PDT) In-Reply-To: References: From: Ryan Stone Date: Sun, 9 Apr 2017 14:02:07 -0400 Message-ID: Subject: Re: Understanding the FreeBSD locking mechanism To: vasanth sabavat Cc: Yubin Ruan , Ed Schouten , "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Apr 2017 18:02:08 -0000 On Sun, Apr 9, 2017 at 12:48 PM, vasanth 
sabavat wrote:

> Isn't it true that interrupt handlers instead of running on the current
> thread stack now have their own thread?
>

It depends on what you mean by "interrupt handler" in this context, as
that's ambiguous in FreeBSD. Most driver interrupt handling is done through
an ithread, which does have its own thread context, and MTX_DEF mutexes are
the appropriate locking primitive to use with them. However, it is possible
to handle an interrupt through what FreeBSD calls an "interrupt filter",
which runs on the kernel stack of whatever thread happened to be running on
the CPU, and therefore you must use a spinlock to synchronize with it.
FreeBSD prefers the use of ithreads and MTX_DEF mutexes over filters and
spinlocks.

Sorry for the use of confusing terminology. I considered referring to
interrupt filters in my last message, but I figured the term would be
unfamiliar to someone not intimately familiar with FreeBSD internals so I
decided to avoid it.

From owner-freebsd-hackers@freebsd.org Sun Apr 9 18:25:03 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F3C19D36B09 for ; Sun, 9 Apr 2017 18:25:02 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-7.reflexion.net [208.70.210.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B9D54AD0 for ; Sun, 9 Apr 2017 18:25:02 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 25550 invoked from network); 9 Apr 2017 18:25:01 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 9 Apr 2017 18:25:01 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Sun, 09 Apr 2017 14:25:01 -0400 (EDT)
Received: (qmail 6531 invoked from network); 9 Apr 2017 18:25:00 -0000
Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 9 Apr 2017 18:25:00 -0000
Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 2747CEC8630; Sun, 9 Apr 2017 11:25:00 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them
From: Mark Millard
In-Reply-To: <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net>
Date: Sun, 9 Apr 2017 11:24:59 -0700
Cc: andrew@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arm
Content-Transfer-Encoding: quoted-printable
Message-Id: <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net>
References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> <20170409122715.GF1788@kib.kiev.ua> <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net>
To: Konstantin Belousov
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 09 Apr 2017 18:25:03 -0000

On 2017-Apr-9, at 10:24 AM, Mark Millard wrote:

> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov wrote:
>
>> On Sat, Apr 08, 2017 at 06:02:00PM -0700, Mark Millard wrote:
>>> [I've identified the code path involved in the arm64 small allocations
>>> turning into zeros for later fork-then-swapout-then-back-in,
>>> specifically the ongoing RES(ident memory) size decrease that
>>> "top -PCwaopid" shows before the fork/swap sequence.
>>> Hopefully
>>> I've also exposed enough related information for someone that
>>> knows what they are doing to get started with a specific
>>> investigation, looking for a fix. I'd like for a pine64+
>>> 2GB to have buildworld complete despite the forking and
>>> swapping involved (yep: for a time zero RES(ident memory) for
>>> some processes involved in the build).]
>>
>> I was not able to follow the walls of text, but do not think that
>> pmap_ts_reference() is the real culprit there.
>>
>> Is my impression right that the issue occurs on fork, and looks as
>> a memory corruption, where some page suddenly becomes zero-filled ?
>> And swapping seems to be involved ? It is somewhat interesting to see
>> if the problem is reproducible on non-arm64 machines, e.g. armv7 or amd64.
>
> Yes, yes, non-arm64 that I've tried works.
>
> But I think that the following extra detail may be of use: what top
> shows for RES over time is also odd on arm64 (only) and the amount
> of pages that are zeroed is proportional to the decrease in RES.
>
> In the test sequence:
>
> A) Allocate lots of 14 KiByte allocations and initialize the content of each
> to non-zero. The example ends up with RES of about 265M.

I did forget to list one important property: why I picked 14 KiBytes.

A) Any allocation size <= 14 KiBytes that I've tried
gets the zeros problem in my arm64 contexts (bpim3 and rip3).

B) Any allocation size >= 14 KiBytes + 1 Byte that I've
tried works in those contexts.

For the arm64 contexts that I use this happens to match with
the jemalloc SMALL_MAXCLASS size boundary. When I looked it
appeared that 14 Ki was the smallest SMALL_MAXCLASS value
in jemalloc so it would always fit the category.

> B) sleep some amount of time, I've been using well over 30 seconds here.
>
> C) fork
>
> D) sleep again (parent and child), also forcing swapping during the sleep
> (I used stress, manually run.)
>
> E) Test the memory pattern in the parent and child process, passing over
> all the bytes, failed and good.
>
> Both the parent and the child in (E) see the first pages allocated as zero,
> with the number of pages being zero increasing as the sleep time in (B)
> increases (as long as the sleep is over 30 sec or so). The parent and child
> match for which pages are zero vs. not.
>
> It fails with (B) being a no-op as well. But the proportionality with
> the time for the sleep is interesting.
>
> During (B) "top -PCwaopid" shows RES decreasing, starting after 30 sec
> or so. The fork in (C) produces a child that does not have the same RES
> as the parent but instead a tiny RES (80K as I remember). During (E)
> the child's RES increases to full size.
>
> My powerpc64, armv7, and amd64 tests of such do not fail, nor does RES
> decrease during (B). The child process gets the same RES as the parent
> as well, unlike for arm64.
>
> In the failing context (arm64) RES in the parent decreases during (D)
> before the swap-out as well.
>
>> If answers to my two questions are yes, there is probably some bug with
>> arm64 pmap handling of the dirty bit emulation. ARMv8.0 does not provide
>> hardware dirty bit, and pmap interprets an accessed writeable page as
>> unconditionally dirty. More, accessed bit is also not maintained by
>> hardware, instead it should be set by pmap. And arm64 pmap sets the
>> AF bit unconditionally when creating valid pte.
>
> fork-then-swap-out/in is required to see the problem. Neither fork
> by itself nor swapping (zero RES as shown in top) by itself have
> shown the problem so far.
>
>> Hmm, could you try the following patch, I did not even compile it.
>
> I'll try it later today.
>
>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
>> index 3d5756ba891..55aa402eb1c 100644
>> --- a/sys/arm64/arm64/pmap.c
>> +++ b/sys/arm64/arm64/pmap.c
>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>>              sva += L3_SIZE) {
>>                  l3 = pmap_load(l3p);
>>                  if (pmap_l3_valid(l3)) {
>> +                        if ((l3 & ATTR_SW_MANAGED) &&
>> +                            pmap_page_dirty(l3)) {
>> +                                vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
>> +                                    ~ATTR_MASK));
>> +                        }
>>                          pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>>                          PTE_SYNC(l3p);
>>                          /* XXX: Use pmap_invalidate_range */

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-hackers@freebsd.org Sun Apr 9 20:13:06 2017
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 223DBD360A9 for ; Sun, 9 Apr 2017 20:13:06 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com)
Received: from smtpq3.tb.ukmail.iss.as9143.net (smtpq3.tb.ukmail.iss.as9143.net [212.54.57.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DAB7A107 for ; Sun, 9 Apr 2017 20:13:04 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com)
Received: from [212.54.57.81] (helo=smtp2.tb.ukmail.iss.as9143.net) by smtpq3.tb.ukmail.iss.as9143.net with esmtp (Exim 4.86_2) (envelope-from ) id 1cxIsd-0002cD-Bb for freebsd-hackers@freebsd.org; Sun, 09 Apr 2017 21:52:11 +0200
Received: from oxbe4.tb.ukmail.iss.as9143.net ([172.25.160.135]) by smtp2.tb.ukmail.iss.as9143.net with bizsmtp id 6Ks71v0052vaL8C01Ks7TV; Sun, 09 Apr 2017 21:52:07 +0200
X-SourceIP: 172.25.160.135
X-Authenticated-User: j.deboynepollard-newsgroups@ntlworld.com
Date: Sun, 9 Apr 2017 20:52:07 +0100 (BST)
From: Jonathan de Boyne Pollard
Reply-To: Jonathan de Boyne Pollard
To: Debian users , FreeBSD Hackers
, Supervision
Message-ID: <731531599.156033.1491767527334.JavaMail.open-xchange@oxbe4.tb.ukmail.iss.as9143.net>
In-Reply-To:
References: <54430B41.3010301@NTLWorld.com> <76c00c13-4cc9-ed9c-f48f-81a3f050b80b@NTLWorld.com> <0d6afc48-3465-3509-ff46-494da45022bc@NTLWorld.com>
Subject: nosh version 1.33
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Priority: 3
Importance: Medium
X-Mailer: Open-Xchange Mailer v7.6.2-Rev60
X-Originating-IP: 86.10.211.13
X-Originating-Client: com.openexchange.ox.gui.dhtml
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 09 Apr 2017 20:13:06 -0000

The nosh package is now up to version 1.33 .

* http://jdebp.eu./Softwares/nosh/
* https://www.freebsd.org/news/status/report-2015-07-2015-09.html#The-nosh-Project
* http://jdebp.info./Softwares/nosh/

This has been held back because of work being done by someone else. I don't want to steal xyr thunder, so I'll leave the announcement of that work to xem. Suffice it to say that it will interest a new group of people.

There are several major improvements in 1.33 .

Packaging
---------

In the version 1.29 announcement I said that the Debian packaging system was going to be brought into line with the system used for FreeBSD/TrueOS and OpenBSD. This is now done. Debian and the BSDs all now use a similar system for generating each package manager's package maintenance instructions from an abstract package description.

==============================================================
=========== IMPORTANT UPGRADE NOTE FOR Debian: ===============
==============================================================

An important consequence of the aforementioned is that the semantics of the nosh-bundles package have changed.
In earlier versions, the various nosh-run-* packages were how one set services running, except for a small rump set of services that were set up by the nosh-bundles package. This is now no longer the case. The nosh-bundles package now presets and starts no services at all. *All* running of services must be achieved with the nosh-run-* packages or some other sets of scripts and presets.

To this end, there are now two new packages, nosh-run-debian-desktop-base and nosh-run-debian-server-base. These parallel the nosh-run-{freebsd,trueos}-{desktop,server}-base packages already available since 1.29 for FreeBSD/TrueOS.

You must install, for a working fully-nosh-managed system, exactly one of the nosh-run-debian-{desktop,server}-base packages. If you are running nosh service management under systemd, you can of course run as many or as few services under the nosh service manager as you care to switch over from systemd. But if you are running a fully-nosh-managed system these packages will arrange to run the various fundamentals that one pretty much cannot do without, such as mounting/unmounting volumes, running udev/eudev/vdev/mdev, binfmt loading, and initializing the PRNG.

Log service account names
-------------------------

The naming scheme used for the user accounts for dedicated log service users has changed. Installing the new nosh-bundles package should automatically rename all existing log service accounts to use the new scheme.

The new naming scheme is slightly more compact, and copes better with services that have things like underscores and plus characters (e.g. powerd++) in their names. As an ancillary to this, system-control now has an "escape" subcommand which can be (and indeed is) used in scripts to perform the escaping transformations.

More packages
-------------

There are now four more -shims packages, for commands whose names conflict with commands from other packages: nosh-kbd-shims, nosh-bsd-shims, nosh-core-shims, and nosh-execline-shims.
nosh-kbd-shims, for example, contains a chvt shim that is an alias for the (also new) console-multiplexor-control command; with it, and suitable privileges to access the virtual terminal's input queue, one can switch between multiplexed user-space virtual terminals in much the same way as the old chvt command does with kernel virtual terminals.

The Z Shell command-line completion for the various commands in the toolset (system-control, svcadm, shutdown, svstat, and so forth), which has been available to the people building from source for a while, is now also available as a binary package.

Configuration import
--------------------

ldconfig on TrueOS is now properly handled. In particular, the external configuration import subsystem now correctly pulls in and converts all of the ldconfig directories. (TrueOS has a lot more things that require ldconfig support than stock FreeBSD does.)

The configuration import subsystem also now handles instances of Percona server, alongside MySQL and MariaDB. Moreover, these are now handled by the same set of service bundles, which always produce service bundles named mysql@*. MySQL version 5.7 or later is now assumed.

The configuration import subsystem now automatically generates OpenVPN service bundles based upon the current OpenVPN configuration.

=======================
==== CAVE: OpenVPN ====
=======================

The upgrade process attempts to remove the old hardwired openvpn@server and openvpn@client service bundles. However, you might encounter remnants of these service bundles lying around in /var/sv that you will find that you need to clean up by hand.

GOPHER
------

To accompany the new gopherd server in djbwares 5, there is a gopher6d service bundle that runs it, serving up the same static files area as http6d, https6d, and ftp4d do.

The FreeBSD, OpenBSD, and Debian package repositories can now be browsed with GOPHER. This is gopherd in action.
On the server side, generating the index.gopher files is a fairly humdrum exercise in the use of redo (to regenerate the indexes only when the directory contents change) and printf (to construct the GOPHER format menus).

UCSPI-UNIX
----------

Two new UCSPI tools have been added to enable UCSPI-UNIX servers to listen on and accept connections on AF_UNIX sequential packet sockets. udevd is one such server, and it is now handed its listening socket at startup rather than expected to open its own.

From owner-freebsd-hackers@freebsd.org Sun Apr 9 20:25:20 2017
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9409DD3634B for ; Sun, 9 Apr 2017 20:25:20 +0000 (UTC) (envelope-from markmi@dsl-only.net)
Received: from asp.reflexion.net (outbound-mail-210-8.reflexion.net [208.70.210.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4B5E383A for ; Sun, 9 Apr 2017 20:25:19 +0000 (UTC) (envelope-from markmi@dsl-only.net)
Received: (qmail 10386 invoked from network); 9 Apr 2017 20:26:17 -0000
Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 9 Apr 2017 20:26:17 -0000
Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Sun, 09 Apr 2017 16:25:18 -0400 (EDT)
Received: (qmail 21557 invoked from network); 9 Apr 2017 20:25:17 -0000
Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 9 Apr 2017 20:25:17 -0000
Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 28AD4EC8630; Sun, 9 Apr 2017 13:25:17 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject:
Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them
From: Mark Millard
In-Reply-To: <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net>
Date: Sun, 9 Apr 2017 13:25:16 -0700
Cc: andrew@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arm
Content-Transfer-Encoding: quoted-printable
Message-Id: <8FFE95AA-DB40-4D1E-A103-4BA9FCC6EDEE@dsl-only.net>
References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> <20170409122715.GF1788@kib.kiev.ua> <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net> <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net>
To: Konstantin Belousov
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 09 Apr 2017 20:25:20 -0000

[I've not tried building the kernel with your patch yet.]

Top post of new, independent information.

Jordan Gordeev made a testing suggestion that got me to look at kdumps of runs with jemalloc allocation sizes that fail (14*1024) vs. work (14*1024+1). Example comparison:

2258 swaptesting6 0.000169 CALL mmap(0,0x200000,0x3,0x1002,0xffffffff,0)
2258 swaptesting6 0.000047 RET mmap 1080033280/0x40600000

vs.

2325 swaptesting7 0.000091 CALL mmap(0,0x200000,0x3,0x1002,0xffffffff,0)
2325 swaptesting7 0.000024 RET mmap 1080033280/0x40600000

No difference. And so it goes.

What varies is the number of mmap's: the larger jemalloc allocation size gets more mmap's for the same number of jemalloc allocations. (All the mmap's from my program's explicit allocations are together, back-to-back, with no other traced activity between.)
But varying the number of jemalloc allocations in the program varies the number of mmap calls, yet the size of the individual jemalloc allocations still makes the difference between failure (zeroed pages after fork-then-swap) and success.

This problem is a complicated one to classify/isolate.

After the allocations there is not much activity visible in kdump output. I traced with "-t +" and so avoided page fault tracing but got most everything else. I may have to ktrace the page faults for the two jemalloc allocation sizes and see if anything stands out.

On 2017-Apr-9, at 11:24 AM, Mark Millard wrote:

> On 2017-Apr-9, at 10:24 AM, Mark Millard wrote:
>
>> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov wrote:
>>
>>> On Sat, Apr 08, 2017 at 06:02:00PM -0700, Mark Millard wrote:
>>>> [I've identified the code path involved in the arm64 small allocations
>>>> turning into zeros for later fork-then-swapout-then-back-in,
>>>> specifically the ongoing RES(ident memory) size decrease that
>>>> "top -PCwaopid" shows before the fork/swap sequence. Hopefully
>>>> I've also exposed enough related information for someone that
>>>> knows what they are doing to get started with a specific
>>>> investigation, looking for a fix. I'd like for a pine64+
>>>> 2GB to have buildworld complete despite the forking and
>>>> swapping involved (yep: for a time zero RES(ident memory) for
>>>> some processes involved in the build).]
>>>
>>> I was not able to follow the walls of text, but do not think that
>>> pmap_ts_reference() is the real culprit there.
>>>
>>> Is my impression right that the issue occurs on fork, and looks as
>>> a memory corruption, where some page suddenly becomes zero-filled ?
>>> And swapping seems to be involved ? It is somewhat interesting to see
>>> if the problem is reproducible on non-arm64 machines, e.g. armv7 or amd64.
>>
>> Yes, yes, non-arm64 that I've tried works.
>>
>> But I think that the following extra detail may be of use: what top
>> shows for RES over time is also odd on arm64 (only) and the amount
>> of pages that are zeroed is proportional to the decrease in RES.
>>
>> In the test sequence:
>>
>> A) Allocate lots of 14 KiByte allocations and initialize the content of each
>> to non-zero. The example ends up with RES of about 265M.
>
> I did forget to list one important property: why I picked 14 KiBytes.
>
> A) Any allocation size <= 14 KiBytes that I've tried
> gets the zeros problem in my arm64 contexts (bpim3 and rip3).
>
> B) Any allocation size >= 14 KiBytes + 1 Byte that I've
> tried works in those contexts.
>
> For the arm64 contexts that I use this happens to match with
> the jemalloc SMALL_MAXCLASS size boundary. When I looked it
> appeared that 14 Ki was the smallest SMALL_MAXCLASS value
> in jemalloc so it would always fit the category.
>
>> B) sleep some amount of time, I've been using well over 30 seconds here.
>>
>> C) fork
>>
>> D) sleep again (parent and child), also forcing swapping during the sleep
>> (I used stress, manually run.)
>>
>> E) Test the memory pattern in the parent and child process, passing over
>> all the bytes, failed and good.
>>
>> Both the parent and the child in (E) see the first pages allocated as zero,
>> with the number of pages being zero increasing as the sleep time in (B)
>> increases (as long as the sleep is over 30 sec or so). The parent and child
>> match for which pages are zero vs. not.
>>
>> It fails with (B) being a no-op as well. But the proportionality with
>> the time for the sleep is interesting.
>>
>> During (B) "top -PCwaopid" shows RES decreasing, starting after 30 sec
>> or so. The fork in (C) produces a child that does not have the same RES
>> as the parent but instead a tiny RES (80K as I remember). During (E)
>> the child's RES increases to full size.
>>
>> My powerpc64, armv7, and amd64 tests of such do not fail, nor does RES
>> decrease during (B). The child process gets the same RES as the parent
>> as well, unlike for arm64.
>>
>> In the failing context (arm64) RES in the parent decreases during (D)
>> before the swap-out as well.
>>
>>> If answers to my two questions are yes, there is probably some bug with
>>> arm64 pmap handling of the dirty bit emulation. ARMv8.0 does not provide
>>> hardware dirty bit, and pmap interprets an accessed writeable page as
>>> unconditionally dirty. More, accessed bit is also not maintained by
>>> hardware, instead it should be set by pmap. And arm64 pmap sets the
>>> AF bit unconditionally when creating valid pte.
>>
>> fork-then-swap-out/in is required to see the problem. Neither fork
>> by itself nor swapping (zero RES as shown in top) by itself have
>> shown the problem so far.
>>
>>> Hmm, could you try the following patch, I did not even compile it.
>>
>> I'll try it later today.
>>
>>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
>>> index 3d5756ba891..55aa402eb1c 100644
>>> --- a/sys/arm64/arm64/pmap.c
>>> +++ b/sys/arm64/arm64/pmap.c
>>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>>>              sva += L3_SIZE) {
>>>                  l3 = pmap_load(l3p);
>>>                  if (pmap_l3_valid(l3)) {
>>> +                        if ((l3 & ATTR_SW_MANAGED) &&
>>> +                            pmap_page_dirty(l3)) {
>>> +                                vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
>>> +                                    ~ATTR_MASK));
>>> +                        }
>>>                          pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>>>                          PTE_SYNC(l3p);
>>>                          /* XXX: Use pmap_invalidate_range */

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-hackers@freebsd.org Sun Apr 9 22:10:01 2017
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 24872D3683F for ; Sun, 9 Apr 2017 22:10:01 +0000 (UTC) (envelope-from alfred@freebsd.org)
Received: from elvis.mu.org (elvis.mu.org [IPv6:2001:470:1f05:b76::196]) by mx1.freebsd.org (Postfix) with ESMTP id 16E07A4C for ; Sun, 9 Apr 2017 22:10:01 +0000 (UTC) (envelope-from alfred@freebsd.org)
Received: from Alfreds-MacBook-Pro-2.local (unknown [IPv6:2601:645:8003:a4d6:80a8:3cdd:4e29:76fd]) by elvis.mu.org (Postfix) with ESMTPSA id B19E9346DDF5 for ; Sun, 9 Apr 2017 15:10:00 -0700 (PDT)
Subject: Re: One Priority Per Run Queue
To: freebsd-hackers@freebsd.org
References: <1aafd6a2-828c-06f5-bdac-e4c953a403b5@FreeBSD.org>
From: Alfred Perlstein
Organization: FreeBSD
Message-ID: <836da108-7e25-fc94-2c84-bc2f85bb6398@freebsd.org>
Date: Sun, 9 Apr 2017 15:10:33 -0700
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id:
Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 09 Apr 2017 22:10:01 -0000

On 3/29/17 2:18 PM, Warner Losh wrote:

> On Wed, Mar 29, 2017 at 2:00 PM, Eric van Gyzen wrote:
>> The FreeBSD schedulers assign four priorities to each run queue, making
>> those priorities effectively equal. This breaks POSIX real-time priorities.
>>
>> Applications that use real-time scheduling use sched_get_priority_min()
>> and sched_get_priority_max() [0] to determine the available range of
>> priorities, and then use simple arithmetic to assign relatively higher
>> or lower priorities. If an application configures two threads with
>> priorities MAX and MAX-1 (for example), POSIX says the thread at
>> priority MAX must be chosen if it is runnable. Since our implementation
>> puts these two priorities in the same run queue, it may choose either
>> thread, so it does not conform.
>>
>> The above functions currently return 0 and 31, respectively. One
>> solution would change max() to return 7 and change other code to
>> translate the 8 POSIX values into the 32 FreeBSD values. However, this
>> would also not conform, because "conforming implementations shall
>> provide a priority range of at least 32 priorities for this policy." [1]
>>
>> I propose that we assign one priority per run queue:
>>
>> https://reviews.freebsd.org/D10188
>>
>> This would conform to POSIX. On a certain commercial block storage
>> product, this change made no difference in performance. Benchmarks of
>> buildworld on two different machines actually showed a tiny improvement
>> in performance. [2]
>>
>> Please test the above change, especially if you have an interesting
>> workload that might be sensitive to scheduler behavior. If you already
>> know this change would cause problems, please point me toward the details.
>>
>> Assigning 4 priorities per run queue also caused a recent portability
>> issue in ZFS, although that was fixed by r314058.
> How does this scheme prevent starvation of low priority processes? Or
> rather, how will this change after this change.
>

It would seem that for userland this should allow for starvation, as that's the point. However, once inside the kernel with any locks taken, you must at minimum do priority lending or bump the priority higher; otherwise you can cause deadlock. I thought we already do this...?

-Alfred

From owner-freebsd-hackers@freebsd.org Mon Apr 10 00:10:13 2017
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A32E6D36D83 for ; Mon, 10 Apr 2017 00:10:13 +0000 (UTC) (envelope-from markmi@dsl-only.net)
Received: from asp.reflexion.net (outbound-mail-210-4.reflexion.net [208.70.210.4]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 491DAAA5 for ; Mon, 10 Apr 2017 00:10:12 +0000 (UTC) (envelope-from markmi@dsl-only.net)
Received: (qmail 25736 invoked from network); 10 Apr 2017 00:10:11 -0000
Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 10 Apr 2017 00:10:11 -0000
Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Sun, 09 Apr 2017 20:10:11 -0400 (EDT)
Received: (qmail 24322 invoked from network); 10 Apr 2017 00:10:10 -0000
Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 10 Apr 2017 00:10:10 -0000
Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 46C38EC7901; Sun, 9 Apr 2017 17:10:10 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version:
1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them
From: Mark Millard
In-Reply-To: <8FFE95AA-DB40-4D1E-A103-4BA9FCC6EDEE@dsl-only.net>
Date: Sun, 9 Apr 2017 17:10:09 -0700
Cc: andrew@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arm
Content-Transfer-Encoding: quoted-printable
Message-Id: <89D6D677-3BE2-45E2-A902-CC6A0305F3F9@dsl-only.net>
References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> <20170409122715.GF1788@kib.kiev.ua> <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net> <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net> <8FFE95AA-DB40-4D1E-A103-4BA9FCC6EDEE@dsl-only.net>
To: Konstantin Belousov
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 10 Apr 2017 00:10:13 -0000

On 2017-Apr-9, at 10:24 AM, Mark Millard wrote:

> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov wrote:
>
>> Hmm, could you try the following patch, I did not even compile it.
>
> I'll try it later today.
>
>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
>> index 3d5756ba891..55aa402eb1c 100644
>> --- a/sys/arm64/arm64/pmap.c
>> +++ b/sys/arm64/arm64/pmap.c
>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>>              sva += L3_SIZE) {
>>                  l3 = pmap_load(l3p);
>>                  if (pmap_l3_valid(l3)) {
>> +                        if ((l3 & ATTR_SW_MANAGED) &&
>> +                            pmap_page_dirty(l3)) {
>> +                                vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
>> +                                    ~ATTR_MASK));
>> +                        }
>>                          pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>>                          PTE_SYNC(l3p);
>>                          /* XXX: Use pmap_invalidate_range */

Preliminary testing indicates that this fixes the some-pages-become-zero problem for fork-then-swapout/in. Thanks!

I'll see if a buildworld can go through without being stopped by this type of issue. But that will take a while. (It is how I originally ran into the problem(s) that others had been reporting on the lists.)

Side notes:

The decreasing-RES(ident memory) behavior was unchanged. The "child gets only 80K RES initially" behavior was also unchanged. (These are as shown by "top -PCwaopid". These are just differences from what I see for other TARGET_ARCH's.)
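
For readers who want to try the test sequence (A) through (E) described earlier in this thread, the steps can be sketched roughly as below. This is not the swaptesting program referenced above; it is a minimal illustration under stated assumptions. The names fork_swap_check, count_bad_bytes, and FILL_BYTE are hypothetical, the fill pattern is arbitrary, and the swapping in step (D) still has to be forced from outside the program (for example by running stress by hand, as in the original report).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FILL_BYTE 0xA5	/* arbitrary non-zero pattern (an assumption) */

/* Count the bytes that no longer hold the fill pattern. */
static size_t
count_bad_bytes(unsigned char **blocks, size_t nblocks, size_t size)
{
	size_t bad = 0;

	assert(blocks != NULL && size > 0);
	for (size_t i = 0; i < nblocks; i++)
		for (size_t j = 0; j < size; j++)
			if (blocks[i][j] != FILL_BYTE)
				bad++;
	return (bad);
}

/*
 * Hypothetical driver for steps (A)-(E).  Returns 0 if both the parent
 * and the child still see the pattern, 1 if either sees corruption, and
 * -1 on a setup failure (error paths leak; fine for a throwaway test).
 */
static int
fork_swap_check(size_t nblocks, size_t size, unsigned pre_sleep,
    unsigned post_sleep)
{
	unsigned char **blocks;
	int bad, status;
	pid_t pid;

	blocks = calloc(nblocks, sizeof(*blocks));
	if (blocks == NULL)
		return (-1);
	for (size_t i = 0; i < nblocks; i++) {	/* (A) allocate and fill */
		blocks[i] = malloc(size);
		if (blocks[i] == NULL)
			return (-1);
		memset(blocks[i], FILL_BYTE, size);
	}
	sleep(pre_sleep);			/* (B) sleep before fork */
	pid = fork();				/* (C) fork */
	if (pid < 0)
		return (-1);
	sleep(post_sleep);			/* (D) force swapping externally */
	bad = count_bad_bytes(blocks, nblocks, size) != 0;	/* (E) verify */
	if (pid == 0)
		_exit(bad);	/* child reports through its exit status */
	if (waitpid(pid, &status, 0) < 0)
		return (-1);
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		bad = 1;
	for (size_t i = 0; i < nblocks; i++)
		free(blocks[i]);
	free(blocks);
	return (bad);
}
```

A call such as fork_swap_check(19000, 14 * 1024, 60, 120) roughly approximates the figures mentioned in the thread (about 265M of 14 KiByte allocations, sleeps of well over 30 seconds); using 14 * 1024 + 1 as the size gives the comparison case reported as working.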
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-hackers@freebsd.org Mon Apr 10 01:28:38 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 244F6D3571E for ; Mon, 10 Apr 2017 01:28:38 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-oi0-x242.google.com (mail-oi0-x242.google.com [IPv6:2607:f8b0:4003:c06::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CC868350 for ; Mon, 10 Apr 2017 01:28:37 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-oi0-x242.google.com with SMTP id w197so9934847oiw.1 for ; Sun, 09 Apr 2017 18:28:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=m7+gMY3lTT1hn2K6qNpp4NVm7bDJCwBzlsnyQ7RhTrE=; b=u3LNmAxNLhVRNJoBfwUIyPRTLFOqfLiFvBIYYzpGoJ7NM99PnPlprQ4nGbTD/TjV8q W6m81iYn3bAsVD/wvnMexgHx1tMruvi0M60ir4lNfMDpXbg7Go3jeIvX2yMc/WmRjTYx 8SScBBCcM6EjApMObiho0a+Q6MW5t+I2/QxoMuCvwlcY7qJkiiuidxChlR++yXj5cez7 7iP8tivsF4Ha9X3DMWIGmUEJojype/I2q7TiGqOmkdUaIiSrI76NgaQy4njcoI1PeAyt 8dQ4vvEJ++4KvHr+3EipmhlDXF9+K4OiwFgNNix/tma8GadA18BpaNrmTvcla1tIYe8D RbzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=m7+gMY3lTT1hn2K6qNpp4NVm7bDJCwBzlsnyQ7RhTrE=; b=tmsJlzFYpfhsF9Oix7p6IoGT4Nu+7pecdsyIbe2kbfVNrQpt9AvX8liYrJ6ubfdtH5 RxD6B8kaf/agZiDqba9unarU/qJDGoLd5Eoauq/VqU6Ot+0bLLuXK+PJCwvUyO0e0PN6 6fD5kEqEOY88Zgk0WQnP8j38gZhTWg7YC/SI/vYUWvR/5rsJnFKpMelqo5e0ffo/kF2M 
18aZRa8Zz7PmauDZR4bfwIcVl+fJlm4nMLPojUInKtBMvjnDQ8eMs/hIO5LUkgyn5rIQ jyb2edBkbp4barJOTIPS07HMhSINRV48uIm4MOUo6YcU05q3SfGXP9P9ZFYlSW94WIYA mEWA== X-Gm-Message-State: AFeK/H0DOo7y1rm2KTLmrueWFt1FkdXtgwsuRZ9JSx2Iinymc38Iyb1U8lNfOXTEt7xm0A== X-Received: by 10.202.228.17 with SMTP id b17mr26697188oih.212.1491787716997; Sun, 09 Apr 2017 18:28:36 -0700 (PDT) Received: from [192.168.0.100] ([110.64.91.54]) by smtp.gmail.com with ESMTPSA id v49sm5638242otb.13.2017.04.09.18.28.34 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 09 Apr 2017 18:28:35 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: Ryan Stone References: Cc: Ed Schouten , "freebsd-hackers@freebsd.org" From: Yubin Ruan Message-ID: <3f93930c-7f10-4d0b-35f2-2b07d64081f0@gmail.com> Date: Mon, 10 Apr 2017 09:28:25 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 01:28:38 -0000 On 2017/4/10 0:24, Ryan Stone wrote: > > > On Sun, Apr 9, 2017 at 6:13 AM, Yubin Ruan > wrote: > > > #######1, spinlock used in an interrupt handler > If a thread A holding a spinlock T get interrupted and the interrupt > handler responsible for this interrupt try to acquire T, then we have > deadlock, because A would never have a chance to run before the > interrupt handler return, and the interrupt handler, unfortunately, > will continue to spin ... so in this situation, one has to disable > interrupt before spinning. 
> > As far as I know, in Linux, they provide two kinds of spinlocks: > > spin_lock(..); /* spinlock that does not disable interrupts */ > spin_lock_irqsave(...); /* spinlock that disables local interrupts */ > > > In the FreeBSD locking style, a spinlock is only used in the case where > one needs to synchronize with an interrupt handler. This is why spinlocks > always disable local interrupts in FreeBSD. > > FreeBSD's lock for the first case is the MTX_DEF mutex, which is an > adaptively-spinning blocking mutex implementation. In short, the MTX_DEF > mutex will spin waiting for the lock if the owner is running, but will > block if the owner is descheduled. This prevents expensive trips through > the scheduler for the common case where the mutex is only held for short > periods, without wasting CPU cycles spinning in cases where the owner thread > is descheduled and therefore will not be completing soon. Great explanation! I read the man page at: > https://www.freebsd.org/cgi/man.cgi?query=mutex&sektion=9&apropos=0&manpath=FreeBSD+11.0-RELEASE+and+Ports and am now clear about MTX_DEF and MTX_SPIN mutexes. But I still have a few more questions, if you don't mind: Is it true that a thread holding a MTX_DEF mutex can be descheduled? (Shouldn't it disable interrupts like a MTX_SPIN mutex?) It is said on the man page that the MTX_DEF mutex is used by default in FreeBSD, so its use case must be very common. If a thread holding a MTX_DEF mutex can be descheduled, which means that it did not disable interrupts, then we may have lots of deadlocks here, right? > > #######2, priority inversion problem > If thread B with a higher priority gets in and tries to acquire the lock > that thread A currently holds, then thread B would spin, while at the > same time thread A has no chance to run because it has lower priority, > thus not being able to release the lock. > (I haven't investigated enough into the source code, so I don't know > how FreeBSD and Linux handle this priority inversion problem. 
Maybe > they use priority inheritance or random boosting?) > > > FreeBSD's spin locks prevent priority inversion by preventing the holder > thread from being descheduled. > > MTX_DEF locks implement priority inheritance. Nice hints. Thanks! regards, Yubin Ruan From owner-freebsd-hackers@freebsd.org Mon Apr 10 01:51:47 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3C7B3D35DA0 for ; Mon, 10 Apr 2017 01:51:47 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-io0-x242.google.com (mail-io0-x242.google.com [IPv6:2607:f8b0:4001:c06::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 06683FED for ; Mon, 10 Apr 2017 01:51:46 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-io0-x242.google.com with SMTP id 68so13145607ioh.3 for ; Sun, 09 Apr 2017 18:51:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=MFIO27u4jXiPmjPOhFGd/MFX/n4NDaWQVAaq0tNGpmw=; b=eX/lkFirGcuoOEUaDvNFUCN3nP38/78LypHmxs66SbLaoIkhr1pZTlE7ATDFKsG7E6 xujQH4uFK3cVxwkPx7SswQDFxde86dkDNwEgLeoYcyiPSpcH8quJr2I6tzU/kluyfNPj lp8UOXS+8ULo6HkgQWfZggXh97eDuWTzvHkDlSoXqs8NAJQ3y+4aVf9yIflNfvcAG813 ig+zcU1t93J2P4YMnE+jGLuHPvXxk970k8SbPTJXbtdTHEWZOSGr3dYfklT8w7aYnm9W F48XXhk9+k2FfjtxCWOVw2jy5fVmDC5ANiZsXcF3qJAAALQmtlwP3mPQceZwGGGh6x5r xQtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=MFIO27u4jXiPmjPOhFGd/MFX/n4NDaWQVAaq0tNGpmw=; 
b=XpjKy2TdhZ4uhv4+bQ6n9M0nUgLRlBkerE75tQhyWZdTutzYM4WkYIb6XeUJobQKqW fXZIZfnlrXm010Ms3aAu7qOGNZKW8j8poB9CAAvhgtuLa/+w9nS7u2JuvktBHyaUNqmV o72x2mzPYmPaVYMzievUnsGxbib8v6X2bU6tQLDMMJ0/wcqFFw9DA0Sl64NPLFV1rjv4 RFhwK+taekhfq7Czxb8c1GvMpBv53ANOpmry/Y5ymLLSTyibh3ThJ3HxqbPRgrAmRo2T RT92i+79h/xPvGcIpccoJyZ/R8BVZbrUVCfG+80UvMJV0o9DYe/XzJhVDCny9f3minDz wt9w== X-Gm-Message-State: AN3rC/7sJAYQACOCsE+rer4KUETvOQOWDHTB7tHjSbHXGYBJFXeMyVAWCqhq+o8mitMWpWTksvOPdUChRrN6aA== X-Received: by 10.107.134.76 with SMTP id i73mr5787258iod.0.1491789105993; Sun, 09 Apr 2017 18:51:45 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.146.24 with HTTP; Sun, 9 Apr 2017 18:51:45 -0700 (PDT) X-Originating-IP: [2607:fb10:7021:1::b517] In-Reply-To: <3f93930c-7f10-4d0b-35f2-2b07d64081f0@gmail.com> References: <3f93930c-7f10-4d0b-35f2-2b07d64081f0@gmail.com> From: Warner Losh Date: Sun, 9 Apr 2017 19:51:45 -0600 X-Google-Sender-Auth: UCHBvUdt3vKI0KRKbrTdZaRoxUE Message-ID: Subject: Re: Understanding the FreeBSD locking mechanism To: Yubin Ruan Cc: Ryan Stone , "freebsd-hackers@freebsd.org" , Ed Schouten Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 01:51:47 -0000 On Sun, Apr 9, 2017 at 7:28 PM, Yubin Ruan wrote: > On 2017/4/10 0:24, Ryan Stone wrote: >> >> >> >> On Sun, Apr 9, 2017 at 6:13 AM, Yubin Ruan > > wrote: >> >> >> #######1, spinlock used in an interrupt handler >> If a thread A holding a spinlock T get interrupted and the interrupt >> handler responsible for this interrupt try to acquire T, then we have >> deadlock, because A would never have a chance to run before the >> interrupt handler return, and the interrupt handler, unfortunately, >> will continue to spin ... 
so in this situation, one has to disable >> interrupts before spinning. >> >> As far as I know, in Linux, they provide two kinds of spinlocks: >> >> spin_lock(..); /* spinlock that does not disable interrupts */ >> spin_lock_irqsave(...); /* spinlock that disables local interrupts */ >> >> >> In the FreeBSD locking style, a spinlock is only used in the case where >> one needs to synchronize with an interrupt handler. This is why spinlocks >> always disable local interrupts in FreeBSD. >> >> FreeBSD's lock for the first case is the MTX_DEF mutex, which is an >> adaptively-spinning blocking mutex implementation. In short, the MTX_DEF >> mutex will spin waiting for the lock if the owner is running, but will >> block if the owner is descheduled. This prevents expensive trips through >> the scheduler for the common case where the mutex is only held for short >> periods, without wasting CPU cycles spinning in cases where the owner >> thread >> is descheduled and therefore will not be completing soon. > > > Great explanation! I read the man page at: > >> >> https://www.freebsd.org/cgi/man.cgi?query=mutex&sektion=9&apropos=0&manpath=FreeBSD+11.0-RELEASE+and+Ports > > and am now clear about MTX_DEF and MTX_SPIN mutexes. But I still have a few more > questions, if you don't mind: > > Is it true that a thread holding a MTX_DEF mutex can be descheduled? > (Shouldn't it disable interrupts like a MTX_SPIN mutex?) It is said on > the man page that the MTX_DEF mutex is used by default in FreeBSD, so its > use case must be very common. If a thread holding a MTX_DEF mutex can be > descheduled, which means that it did not disable interrupts, then we may > have lots of deadlocks here, right? Yes, they can be descheduled. But that's not a problem. No other thread can acquire the MTX_DEF lock. If another thread tries, it will sleep and wait for the thread that holds the MTX_DEF lock to release it. Eventually, the thread will get time to run again, and then release the lock. 
Threads that just hold a MTX_DEF lock may also migrate from CPU to CPU too. Warner >> #######2, priority inversion problem >> If thread B with a higher priority get in and try to acquire the lock >> that thread A currently holds, then thread B would spin, while at the >> same time thread A has no chance to run because it has lower priority, >> thus not being able to release the lock. >> (I haven't investigate enough into the source code, so I don't know >> how FreeBSD and Linux handle this priority inversion problem. Maybe >> they use priority inheritance or random boosting?) >> >> >> FreeBSD's spin locks prevent priority inversion by preventing the holder >> thread from being descheduled. >> >> MTX_DEF locks implement priority inheritance. > > > Nice hints. Thanks! > > regards, > Yubin Ruan > > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@freebsd.org Mon Apr 10 02:01:53 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 50B2CD32010; Mon, 10 Apr 2017 02:01:53 +0000 (UTC) (envelope-from alan.l.cox@gmail.com) Received: from mail-io0-x22e.google.com (mail-io0-x22e.google.com [IPv6:2607:f8b0:4001:c06::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 157446E7; Mon, 10 Apr 2017 02:01:53 +0000 (UTC) (envelope-from alan.l.cox@gmail.com) Received: by mail-io0-x22e.google.com with SMTP id l7so79076932ioe.3; Sun, 09 Apr 2017 19:01:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=mime-version:reply-to:in-reply-to:references:from:date:message-id :subject:to:cc; bh=z6n7AHYe/CeqWlfBZJM8VcXFSXQNpkIfC+ppK2DMQPo=; b=nXg6GCeXhiclxm7DtE3Y6Vbp6HgKF2WMa6ylfbPHzpi3uz+7fsP0VGz7iZwW/+2M4c B6OlsIym84KwZbSOtXVKYdcWfF+zquLbTzrStecnhDeVh4UG0Y2tUt3YFsMu7aGF61fH ygCS/E1EUxLgWsQFXmRBfYhCR5x8AOjvTGBTWFSDcaJEYNmG1gqTOT12gx/C6bMTrXKF ZUMtrLRMMEPxZ+ZGcrWRibX8LfsaFGP+uGReTHzRhTJJifRgG+HR8jB/oSTLChOqsdfW OwsVLx2Gy0e5DeES1N+43LzSIHTsO5lBZviBAAXxCLOQKo38EPc5Ej1JayOfHOavRSNk yO5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:reply-to:in-reply-to:references :from:date:message-id:subject:to:cc; bh=z6n7AHYe/CeqWlfBZJM8VcXFSXQNpkIfC+ppK2DMQPo=; b=GMt/FUcJb3P0Vv7RGSMLLXK47LZkHqgculVM3w1h6XuhsIUZIElE+iFHv0yY6P3UiS yA8/sxc+iOcSoCYP1TTqqMz79UIBvVpeINhjQWmuERDdQTFhiGLPOiljWT2WWTptubOp r3qaXxK/Hi6lf9z/Twv84CFt/ZfRMYYszi1rKiFbPBA5niG2+ASvA85lFynrOUzn8GjY tOn6c2Mn7GQ51uv0tbsLek9FCY6n3OUekQkS0NbEaE6b3q5t3OMiegSg+zDaPU4U7dZB uQMIQf7bHuh1sWWLZ7NmPRnjtL99MS2jY0cP8ebOo421Uh69uW9vuBfX/7ioKz+zqAqx vdig== X-Gm-Message-State: AN3rC/5Xiet9lwe3Yexe2t4qbzEqeKjWGf4w4ZjDplCRVYnZ0TfWk40K /62Qx34QWrLvvbQEfBp1PuAgZHeQrg== X-Received: by 10.36.36.131 with SMTP id f125mr10145622ita.45.1491789712474; Sun, 09 Apr 2017 19:01:52 -0700 (PDT) MIME-Version: 1.0 Received: by 10.79.15.130 with HTTP; Sun, 9 Apr 2017 19:01:51 -0700 (PDT) Reply-To: alc@freebsd.org In-Reply-To: <89D6D677-3BE2-45E2-A902-CC6A0305F3F9@dsl-only.net> References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> <20170409122715.GF1788@kib.kiev.ua> <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net> <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net> <8FFE95AA-DB40-4D1E-A103-4BA9FCC6EDEE@dsl-only.net> <89D6D677-3BE2-45E2-A902-CC6A0305F3F9@dsl-only.net> From: Alan Cox Date: Sun, 9 Apr 2017 21:01:51 -0500 Message-ID: Subject: Re: The arm64 
fork-then-swap-out-then-swap-in failures: a program source for exploring them To: Mark Millard Cc: Konstantin Belousov , andrew@freebsd.org, freebsd-hackers , freebsd-arm Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 02:01:53 -0000 On Sun, Apr 9, 2017 at 7:10 PM, Mark Millard wrote: > On 2017-Apr-9, at 10:24 AM, Mark Millard wrote: > > > On 2017-Apr-9, at 5:27 AM, Konstantin Belousov > wrote: > > > > >> Hmm, could you try the following patch, I did not even compiled it. > > > > I'll try it later today. > > > >> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c > >> index 3d5756ba891..55aa402eb1c 100644 > >> --- a/sys/arm64/arm64/pmap.c > >> +++ b/sys/arm64/arm64/pmap.c > >> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, > vm_offset_t eva, vm_prot_t prot) > >> sva += L3_SIZE) { > >> l3 = pmap_load(l3p); > >> if (pmap_l3_valid(l3)) { > >> + if ((l3 & ATTR_SW_MANAGED) && > >> + pmap_page_dirty(l3)) { > >> + vm_page_dirty(PHYS_TO_VM_PAGE(l3 & > >> + ~ATTR_MASK)); > >> + } > >> pmap_set(l3p, ATTR_AP(ATTR_AP_RO)); > >> PTE_SYNC(l3p); > >> /* XXX: Use pmap_invalidate_range */ > > > Preliminary testing indicates that this fixes the > some-pages-become-zero problem for fork-then-swapout/in. > > Thanks! > > I'll see if a buildworld can go through without being stopped > by the type of issue. But that will take a while. (It is how > I originally ran into the problem(s) that others had been > reporting on the lists.) > > > Side notes: > > The decreasing-RES(ident memory) behavior was unchanged. > > The "child gets only 80K RES initially" behavior was also > unchanged. > > That is because the arm64 pmap doesn't implement pmap_copy(). 
> (These are as shown by "top -PCwaopid" . These are just > differences with what I see for other TARGET_ARCH's.) > > === > Mark Millard > markmi at dsl-only.net > > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@freebsd.org Mon Apr 10 02:20:25 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2FEB9D324CA for ; Mon, 10 Apr 2017 02:20:25 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 04753E5C for ; Mon, 10 Apr 2017 02:20:24 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3A2GQ9G032228 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sun, 9 Apr 2017 19:16:26 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3A2GQ2s032227; Sun, 9 Apr 2017 19:16:26 -0700 (PDT) (envelope-from torek) Date: Sun, 9 Apr 2017 19:16:26 -0700 (PDT) From: Chris Torek Message-Id: <201704100216.v3A2GQ2s032227@elf.torek.net> To: rysto32@gmail.com, vasanth.raonaik@gmail.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ablacktshirt@gmail.com, ed@nuxi.nl, freebsd-hackers@freebsd.org In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Sun, 09 Apr 2017 19:16:26 -0700 (PDT) X-Mailman-Approved-At: Mon, 10 Apr 2017 02:50:47 +0000 
X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 02:20:25 -0000 Ryan Stone is of course correct here. I have not kept up with the latest terminology shifts, but I can describe a bit of the history of all of this. (I was somewhat involved with writing the original MTX_DEF and MTX_SPIN mutex code way back when). (None of the rest of this should be new to experienced kernel hackers.) In the old non-SMP days, BSD, like traditional V6 Unix, divided the kernel into "top half" and "bottom half" sections. The top half was anything driven from something other than an interrupt, such as initial bootstrap or any user-sourced system call. Each of these had just one (per-process) kernel stack, in the "u. area", which was UPAGES * NBPG (number of bytes per page) bytes long, but also had to contain "struct user". (In other words, the stack space available was actually smaller than that. The "user" struct was *above* the kernel stack, so that ksp would not grow down into the structure; there was also signal trampoline code wedged in there, at least on the VAX and some of the early other ports. I desperately wanted to move the trampoline code to libc for the sparc port. It was *in theory* easy to do this :-) ... practice was another matter.) When an interrupt arrived, as long as it was not interrupting another interrupt, the system would get on a separate "interrupt stack" -- some hardware supports this directly, with a separate interrupt stack register -- which meant we did not have to provide enough interrupt-handling space in the per-process kernel stack, nor take interrupts on some possibly dodgy user stack. (Interrupts can occur at any time, so the system may be running user code, not kernel code.) 
It also meant that a simple:

    s = splfoo();

call in the top half would block any interrupts at priority foo or
lower, so that "top half" code could know that "bottom half" code
for foo would not run at this point.  With prioritized interrupts,
*taking* an interrupt at level foo automatically raised the CPU's
priority to foo, so any "bottom half" code for foo would know that
no important "top half" code was running at the time -- if that had
been the case, the top half would have done an splfoo() to block
it -- and of course no other "bottom half" code for foo could run
now.

Meanwhile, no bottom-half code was *ever* allowed to block.  When
you took an interrupt, you were committed to finishing all of the
work for that interrupt before issuing a "return from interrupt"
instruction (which would, if the interrupt was not interrupting
another interrupt, switch back to the appropriate stack).

A good way to describe this strategy:

    s = splfoo(); /* block all bottom half code at this priority */
    ...
    splx(s); /* resume ability of blocked bottom half code to run */

is that spl (set priority level) provides mutual exclusion to
*code paths*.  The top half blocks out the bottom half with an
spl, and the bottom half blocks out the top half by simply *being*
bottom-half code, handling interrupts.

-----

With SMP, this whole strategy is a non-starter.  We don't have
just one CPU running code; we cannot block *code* paths at all.
Instead, we switch to mutually exclusive access to *data*.  We
then make several observations:

 * Most data structures are mostly uncontested.  (If not, we
   need to redesign them!)  "Get lock" should be fast in this
   usual case.

 * If we provide what used to be "bottom half" drivers with
   *their own* stacks / interrupt threads ("ithreads"), they
   can block if they need to: when the data structure *is*
   actually contested.

This means we mainly need just one kind of mutex.
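As a rough illustration of the "fast in the usual case" observation, here is a minimal userland sketch using C11 atomics. It is not the FreeBSD mutex(9) implementation: struct toy_mtx and the toy_mtx_*() functions are made-up names, and the caller-supplied id "self" (any nonzero value) is an assumption of the sketch. The uncontested acquire is a single compare-and-swap on the owner word. Only when that fails does the slow path run, and here the slow path merely yields the CPU where a real kernel mutex would choose between spinning and blocking.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MTX_UNOWNED ((uintptr_t)0)

struct toy_mtx {
    _Atomic uintptr_t owner;  /* 0 when free, else the holder's id */
};

static void
toy_mtx_init(struct toy_mtx *m)
{
    atomic_init(&m->owner, TOY_MTX_UNOWNED);
}

/* Fast path: one compare-and-swap, which succeeds if the lock is free. */
static bool
toy_mtx_trylock(struct toy_mtx *m, uintptr_t self)
{
    uintptr_t expected = TOY_MTX_UNOWNED;

    return atomic_compare_exchange_strong_explicit(&m->owner, &expected,
        self, memory_order_acquire, memory_order_relaxed);
}

/* Slow path placeholder: retry, giving up the CPU between attempts. */
static void
toy_mtx_lock(struct toy_mtx *m, uintptr_t self)
{
    while (!toy_mtx_trylock(m, self))
        sched_yield();
}

/* Release: only the current owner may unlock. */
static void
toy_mtx_unlock(struct toy_mtx *m, uintptr_t self)
{
    uintptr_t expected = self;
    bool ok;

    ok = atomic_compare_exchange_strong_explicit(&m->owner, &expected,
        TOY_MTX_UNOWNED, memory_order_release, memory_order_relaxed);
    assert(ok);
    (void)ok;
}
```

Storing the owner in the lock word, rather than a bare locked/unlocked bit, is what would let a slow path find out who holds the lock; the sketch does not use that information beyond the unlock check.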
For read-mostly data structures, we would like to have shared-read
locks and so on, but we can build them on this base mutex.  (As it
turns out, this view is a little simplistic; we want to build them
on a base compare-and-swap, typically, rather than a base
full-blown mutex.  It would also be nice to have CAS-based
multi-producer single-consumer and single-producer multi-consumer
queues.  These are particularly useful on systems with hundreds or
thousands of cores.)

Of course, we also have to start dealing with issues like priority
inversion and lock order / possible deadlock, any time we lock
data instead of code paths.  But that's mostly a separate issue.

-----

This is all fine for most code, including most device drivers, but
the code for manipulating the hardware itself, the lowest-level
interrupt dispatching code, and also the system scheduler still
must block interrupts from time to time.  We also have some
special oddball cases, such as profiling interrupts, where we want
to know: "What was running when the interrupt fired?"

For these cases we *don't* want to switch to a separate interrupt
thread:

 * In the hardware interrupt dispatcher, we may not *know which
   thread to switch to* yet.  We must find the right ithread,
   then schedule it to run.  (Then we have to manipulate the
   hardware based on whether the interrupt is edge or level
   triggered, and so on, but that's an in-the-weeds detail also
   mostly unrelated to this scheduling.)

   For the profiling "what was running" case, we'd like to
   sample what was running, which we *can't* do from a separate
   thread: we need access to the current stack.  (Strictly
   speaking, we merely need said access ... but we also need
   that thread to remain paused while we sample it.)

   And, for some low-cost paths such as gathering entropy, we
   may not want or need to *pay* the up-front cost of a separate
   ithread.

 * In the scheduler, we're either in the process of choosing
   threads and changing stacks, or setting up data structures to
   tell the chooser which threads to choose.  We need to block
   all scheduling events, including all interrupts, for some of
   these super-critical sections.

These use the MTX_SPIN type lock, which is similar to MTX_DEF, but:

 * does block interrupts, and
 * never *invokes* the scheduler: never tries to put the current
   thread out of the running until the locked data are available.

Since then, we have added another special case:

 * In a "critical section", we wish to make sure that the current
   thread does not migrate from one CPU to another.  This does
   not, strictly speaking, require blocking interrupts entirely,
   but because the scheduler does its thing by blocking
   interrupts, we block interrupts for short durations here as
   well (actually when *leaving* the critical section, where we
   check to see if the scheduler would *like* us to migrate).

This is not really a mutex at all, but it does interact with them,
so it's worth mentioning.  Essentially, if you are in a critical
section, you may not switch threads, so if you need a mutex, you
must use a spin mutex.  (This *is* well-documented in
"man 9 critical_enter".)

One might argue that being in a critical section should turn an
MTX_DEF mutex into an MTX_SPIN mutex, but it's not that easy to
do, and if you're taking a possibly slow MTX_DEF lock *while* in a
critical section, "you're doing it wrong" (as with the heavily
contested datum problem, we should rewrite the code so that the
critical section happens *while* holding the MTX_DEF object, not
the other way around).

Anyway, that's how we got here, and why things are the way they
are.

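The critical-section rule described above can be written down as a tiny userland model. This is only a sketch: struct toy_thread, toy_critical_enter(), toy_critical_exit() and toy_may_acquire() are made-up names, not kernel interfaces. The model captures just one rule: a blocking (MTX_DEF-style) lock must not be taken while the thread is inside a critical section, whereas a spin (MTX_SPIN-style) lock may be.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_lock_kind {
    TOY_LOCK_BLOCKING,  /* stands in for an MTX_DEF-style mutex */
    TOY_LOCK_SPIN       /* stands in for an MTX_SPIN-style mutex */
};

struct toy_thread {
    int critnest;  /* depth of nested critical sections */
};

static void
toy_critical_enter(struct toy_thread *td)
{
    td->critnest++;
}

static void
toy_critical_exit(struct toy_thread *td)
{
    assert(td->critnest > 0);
    td->critnest--;
}

/*
 * A blocking lock may have to switch threads while waiting, which is
 * not allowed inside a critical section.  A spin lock never switches
 * threads, so it is always permitted in this model.
 */
static bool
toy_may_acquire(const struct toy_thread *td, enum toy_lock_kind kind)
{
    if (kind == TOY_LOCK_SPIN)
        return true;
    return td->critnest == 0;
}
```

In this model the recommended ordering is to take the blocking lock first, while critnest is still 0, and only then call toy_critical_enter(). The reverse order is the one that toy_may_acquire() rejects.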
Chris From owner-freebsd-hackers@freebsd.org Mon Apr 10 04:02:23 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6D596D365CF for ; Mon, 10 Apr 2017 04:02:23 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-oi0-x242.google.com (mail-oi0-x242.google.com [IPv6:2607:f8b0:4003:c06::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 32F837A4 for ; Mon, 10 Apr 2017 04:02:23 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-oi0-x242.google.com with SMTP id w197so10298429oiw.1 for ; Sun, 09 Apr 2017 21:02:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=wqmoaea+s9YKC28jD3SwkdZ2MtrYqVk7eDe0InvWfv8=; b=sj4jc96EFNt5t/LDwRW71YGB1nuYyO+hhQoccrj+4/Nqfbz2N5i6Xwhj1GfUy7SDY+ YCP5tambaaCVc/DOlluipV/WNliezaWiKhEnBx/zj10g1XDQGgw3P8N0KLpK0+5MVX53 M2FZJe+K70vxn6uHeCfzlWauf7nHCACqRJUj/+ck1zEWFmLDbo0iIP2teqE9TWU/PpMz umy2lh91avEJw23QqSm6mOFaAOLdFByUMXL0hyscqJIR+UI8D/rMg6VFT3EQAVGmZbbJ NL9KkcGTjWwscfw3icMUnn099SvUSuZwreiKiol+v0huBlMU3auSud+uzGCKKQIZB8Mn Xlwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=wqmoaea+s9YKC28jD3SwkdZ2MtrYqVk7eDe0InvWfv8=; b=pPxREbvl3dL+6zrgeQXw6iXsJdPIGuFsc3uY3lYcGzC1u1M5CDOHmM19/V/lXgaTCU +Z/WFJBAWIF0iODGYgtJQelzTNrILYDLf2B9kJyA2w71kjOz8xn9bHVLpzb7wrMhDNSH k6H+4SefCGLfO342xYyGNjEog8E8kmGJjQmXiYHeFb+D0RxWROOO38jdeFuL3H247ETY 
f4pCy1RECpCqihuu3pxKBaD4fmxvmGC6ugJVUBfkA+pA35QdSGHceCtiWr3/Z7XN/B25 W4Y6Ny9xtGYjrIi/YuTspLUj8TKllS2zradeCyKOADLwRV80j3gx/hU+IuXSdSvVUhBR RQIg== X-Gm-Message-State: AN3rC/6RbfN812Kpr/CF2Ym5BRM9KVPxEM0xx7B6nUFkbtAYfWTwqoGufLCELsgF0ukiVA== X-Received: by 10.157.82.9 with SMTP id e9mr7888758oth.50.1491796942394; Sun, 09 Apr 2017 21:02:22 -0700 (PDT) Received: from [192.168.0.100] ([110.64.91.54]) by smtp.gmail.com with ESMTPSA id p3sm4067200ota.54.2017.04.09.21.02.19 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 09 Apr 2017 21:02:21 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: Warner Losh References: <3f93930c-7f10-4d0b-35f2-2b07d64081f0@gmail.com> Cc: Ryan Stone , "freebsd-hackers@freebsd.org" , Ed Schouten From: Yubin Ruan Message-ID: Date: Mon, 10 Apr 2017 12:01:51 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 04:02:23 -0000 On 2017/4/10 9:51, Warner Losh wrote: > On Sun, Apr 9, 2017 at 7:28 PM, Yubin Ruan wrote: >> On 2017/4/10 0:24, Ryan Stone wrote: >>> >>> >>> >>> On Sun, Apr 9, 2017 at 6:13 AM, Yubin Ruan >> > wrote: >>> >>> >>> #######1, spinlock used in an interrupt handler >>> If a thread A holding a spinlock T get interrupted and the interrupt >>> handler responsible for this interrupt try to acquire T, then we have >>> deadlock, because A would never have a chance to run before the >>> interrupt handler return, and the interrupt handler, unfortunately, >>> will continue to spin ... so in this situation, one has to disable >>> interrupt before spinning. 
>>> >>> As far as I know, in Linux, they provide two kinds of spinlocks: >>> >>> spin_lock(..); /* spinlock that does not disable interrupts */ >>> spin_lock_irqsave(...); /* spinlock that disables local interrupts */ >>> >>> >>> In the FreeBSD locking style, a spinlock is only used in the case where >>> one needs to synchronize with an interrupt handler. This is why spinlocks >>> always disable local interrupts in FreeBSD. >>> >>> FreeBSD's lock for the first case is the MTX_DEF mutex, which is an >>> adaptively-spinning blocking mutex implementation. In short, the MTX_DEF >>> mutex will spin waiting for the lock if the owner is running, but will >>> block if the owner is descheduled. This prevents expensive trips through >>> the scheduler for the common case where the mutex is only held for short >>> periods, without wasting CPU cycles spinning in cases where the owner >>> thread >>> is descheduled and therefore will not be completing soon. >> >> >> Great explanation! I read the man page at: >> >>> >>> https://www.freebsd.org/cgi/man.cgi?query=mutex&sektion=9&apropos=0&manpath=FreeBSD+11.0-RELEASE+and+Ports >> >> and am now clear about MTX_DEF and MTX_SPIN mutexes. But I still have a few more >> questions, if you don't mind: >> >> Is it true that a thread holding a MTX_DEF mutex can be descheduled? >> (Shouldn't it disable interrupts like a MTX_SPIN mutex?) It is said on >> the man page that the MTX_DEF mutex is used by default in FreeBSD, so its >> use case must be very common. If a thread holding a MTX_DEF mutex can be >> descheduled, which means that it did not disable interrupts, then we may >> have lots of deadlocks here, right? > > Yes, they can be descheduled. But that's not a problem. No other > thread can acquire the MTX_DEF lock. If another thread tries, it will > sleep and wait for the thread that holds the MTX_DEF lock to release > it. Eventually, the thread will get time to run again, and then > release the lock. 
Threads that just hold a MTX_DEF lock may also > migrate from CPU to CPU too. > > Warner > Does that imply that MTX_DEF should not be used in something like interrupt handler? Putting an interrupt handler into sleep doesn't make so much sense. Yubin >>> #######2, priority inversion problem >>> If thread B with a higher priority get in and try to acquire the lock >>> that thread A currently holds, then thread B would spin, while at the >>> same time thread A has no chance to run because it has lower priority, >>> thus not being able to release the lock. >>> (I haven't investigate enough into the source code, so I don't know >>> how FreeBSD and Linux handle this priority inversion problem. Maybe >>> they use priority inheritance or random boosting?) >>> >>> >>> FreeBSD's spin locks prevent priority inversion by preventing the holder >>> thread from being descheduled. >>> >>> MTX_DEF locks implement priority inheritance. >> >> >> Nice hints. Thanks! >> >> regards, >> Yubin Ruan >> >> >> _______________________________________________ >> freebsd-hackers@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers >> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@freebsd.org Mon Apr 10 04:26:30 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 206B8D36AE0 for ; Mon, 10 Apr 2017 04:26:30 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id F2186FBC for ; Mon, 10 Apr 2017 04:26:29 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net 
(8.15.2/8.15.2) with ESMTPS id v3A4QRua042762 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sun, 9 Apr 2017 21:26:27 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3A4QR9Q042761; Sun, 9 Apr 2017 21:26:27 -0700 (PDT) (envelope-from torek) Date: Sun, 9 Apr 2017 21:26:27 -0700 (PDT) From: Chris Torek Message-Id: <201704100426.v3A4QR9Q042761@elf.torek.net> To: ablacktshirt@gmail.com, imp@bsdimp.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Sun, 09 Apr 2017 21:26:27 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 04:26:30 -0000 >>> Is it true that a thread holding a MTX_DEF mutex can be descheduled? >> Yes, they can be descheduled. But that's not a problem. No other >> thread can acquire the MTX_DEF lock. ... >Does that imply that MTX_DEF should not be used in something like >interrupt handler? Putting an interrupt handler into sleep doesn't >make so much sense. Go back to the old top-half / bottom-half model, and consider that now that there are interrupt *threads*, your ithread is also in the "top half". It's therefore OK to suspend. ("Sleep" is not quite correct here: a mutex wait is not a "sleep" state but instead is just a waiting, not-scheduled-to-run state. The precise difference is irrelevant at this level though.) It's not *great* to suspend here, but all your alternatives are *also* bad: * You may grab incoming data and stuff it into a ring buffer, and schedule some other thread to handle it later. 
But if the ring buffer is full you have a problem, and all you have done is push the actual processing off to another thread, adding more overhead. * You may put the device itself on hold so that no more data can come in (if it's that kind of device). On the other hand, if you are handling an interrupt but not in an interrupt thread, you are running in the "bottom half". It is therefore *not OK* to suspend. You must now use one of those alternatives. Note that if you suspend on an MTX_DEF mutex, and your priority is *higher* than the priority of whatever thread actually holds that mutex now, that other thread gets a priority boost to your level (priority propagation, to prevent priority inversion). So letting your ithread suspend, assuming you have an ithread, is probably your best bet. Chris From owner-freebsd-hackers@freebsd.org Mon Apr 10 07:11:44 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 92935D373BB for ; Mon, 10 Apr 2017 07:11:44 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 371DDBC3 for ; Mon, 10 Apr 2017 07:11:44 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v3A7BcUx095781 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 10 Apr 2017 10:11:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v3A7BcUx095781 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v3A7Bb1h095780; Mon, 10 Apr 2017 10:11:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik 
set sender to kostikbel@gmail.com using -f Date: Mon, 10 Apr 2017 10:11:37 +0300 From: Konstantin Belousov To: Chris Torek Cc: rysto32@gmail.com, vasanth.raonaik@gmail.com, freebsd-hackers@freebsd.org, ed@nuxi.nl, ablacktshirt@gmail.com Subject: Re: Understanding the FreeBSD locking mechanism Message-ID: <20170410071137.GH1788@kib.kiev.ua> References: <201704100216.v3A2GQ2s032227@elf.torek.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201704100216.v3A2GQ2s032227@elf.torek.net> User-Agent: Mutt/1.8.0 (2017-02-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 07:11:44 -0000 On Sun, Apr 09, 2017 at 07:16:26PM -0700, Chris Torek wrote: > In the old non-SMP days, BSD, like traditional V6 Unix, divided > the kernel into "top half" and "bottom half" sections. The top > half was anything driven from something other than an interrupt, > such as initial bootstrap or any user-sourced system call. Each > of these had just one (per-process) kernel stack, in the "u. > area", which was UPAGES * NBPG (number of bytes per page) bytes > long, but also had to contain "struct user". > > (In other words, the stack space available was actually smaller > than that. The "user" struct was *above* the kernel stack, so > that ksp would not grow down into the structure; there was also > signal trampoline code wedged in there, at least on the VAX and > some of the early other ports. I desperately wanted to move the > trampoline code to libc for the sparc port. 
It was *in theory*
> easy to do this :-) ... practice was another matter.)

Signal trampolines were never put on the kernel stack, simply because
the uarea/kstack is not accessible from user space. They lived on top
of the user-mode stack of the main thread. Currently on x86/powerpc/arm,
signal trampolines are mapped from the 'shared page', which was done to
allow marking the user stack as non-executable.

Kstack still contains the remnants of the uarea, renamed to the
(per-thread) pcb. There is not much sense in the split of struct thread
vs. struct pcb, but it has historically survived up to this moment, and
clearing things up requires too much MD work. My opinion is that the pcb
on kstack indeed only eats space and would better be put into td_md.

Yet another thing which shares the kstack is the usermode FPU save
area for x86 and arm64. At least on x86, the save area is dynamically
sized at boot to support extensions like AVX/AVX256/AVX512 etc., and
chomping part of the kstack saves one more contiguous KVA allocation and
allows reuse of the kstack cache. Again historically, pre-AVX kernels
put the XMM save area into pcb->kstack.

>
> When an interrupt arrived, as long as it was not interrupting
> another interrupt, the system would get on a separate "interrupt
> stack" -- some hardware supports this directly, with a separate
> interrupt stack register -- which meant we did not have to provide
> enough interrupt-handling space in the per-process kernel stack,
> nor take interrupts on some possibly dodgy user stack.
> (Interrupts can occur at any time, so the system may be running
> user code, not kernel code.)

No, this is not the case, at least on x86. There, 'normal' interrupts
and exceptions reuse the current thread's kstack, thus participating in
the common stack overflow business. On i386, only NMI and double fault
exceptions are routed through task gates in the IDT, and are provided
with a separate stack [a double fault almost always indicates a stack
overflow].
On amd64, TSS switching is impossible, but IDT descriptors may be
marked with a non-zero IST, which basically references a static stack
other than the kstack. Only NMI uses IST.

> Since then, we have added another special case:
>
> * In a "critical section", we wish to make sure that the current
>   thread does not migrate from one CPU to another. This does
>   not, strictly speaking, require blocking interrupts entirely,
>   but because the scheduler does its thing by blocking interrupts,
>   we block interrupts for short durations here as well (actually
>   when *leaving* the critical section, where we check to see if
>   the scheduler would *like* us to migrate).

This is not true, both in the explanation of intent and in the
implementation details. A critical section prevents de-scheduling of
the current thread, disabling any context switches on the current CPU.
It works by incrementing the current thread's td_critnest counter. Note
that interrupts are still enabled while a critical section is entered,
so the flow of control can still be 'preempted' by an interrupt, but
after return from the interrupt, the current thread continues to
execute. If any higher-priority thread needs to be scheduled due to the
interrupt, the scheduling and context switch are done after td_critnest
returns to zero.

>
>   This is not really a mutex at all, but it does interact with
>   them, so it's worth mentioning. Essentially, if you are in a
>   critical section, you may not switch threads, so if you need
>   a mutex, you must use a spin mutex.

You probably mixed up critical_enter() and spinlock_enter() there.
The latter indeed disables interrupts and is intended to be used as
part of the spinlock (spin mutex) implementation.

>
> (This *is* well-documented in "man 9 critical_enter".)

The explanation in critical_enter(9) is somewhat misleading.
The consequences of a spinlock_enter() call include most side-effects
of critical_enter(), because interrupts are disabled by
spinlock_enter() and thus context-switching cannot occur at all.
Spinlocks do not enter the critical section technically, i.e. the td_critnest is not incremented. From owner-freebsd-hackers@freebsd.org Mon Apr 10 08:11:27 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 09B56D36128 for ; Mon, 10 Apr 2017 08:11:27 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id CF1E0B19 for ; Mon, 10 Apr 2017 08:11:26 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3A8BP3c049596 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 10 Apr 2017 01:11:25 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3A8BP8B049595; Mon, 10 Apr 2017 01:11:25 -0700 (PDT) (envelope-from torek) Date: Mon, 10 Apr 2017 01:11:25 -0700 (PDT) From: Chris Torek Message-Id: <201704100811.v3A8BP8B049595@elf.torek.net> To: kostikbel@gmail.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ablacktshirt@gmail.com, ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com, vasanth.raonaik@gmail.com In-Reply-To: <20170410071137.GH1788@kib.kiev.ua> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Mon, 10 Apr 2017 01:11:25 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 08:11:27 -0000 >Signal trampolines never were put 
on the kernel stack ... Oops, right, not sure why I was thinking that. However I would still prefer to have libc supply the trampoline address (the underlying signal system calls can do this, since until you are catching a signal in the first place, there is no need for a known-in-advance trampoline address). >Kstack still contains the remnants of the uarea, renamed to (per-thread) >pcb. There is no much sense in the split of struct thread vs. struct >pcb, but it is historically survived up to this moment, and clearing >things up requires too much MD work. >My opinion is that pcb on kstack indeed only eats the space and better be >put into td_md. That would be good. >> When an interrupt arrived, as long as it was not interrupting >> another interrupt, the system would get on a separate "interrupt >> stack" ... No, this is not a case, at least on x86. On VAX, and (emulated without hardware support) in my SPARC port, it was. :-) >There, 'normal' interrupts and exceptions reuse the current thread >kstack ... I never liked this very much, but if it's faster on x86, it's not unreasonable. And without hardware support (or if the TSS switch is too slow) it's OK. >> * In a "critical section", we wish to make sure that the current >> thread does not migrate from one CPU to another. >> ... we block interrupts for short durations here as well (actually >> when *leaving* the critical section, where we check to see if >> the scheduler would *like* us to migrate). >This is not true, both in explanation of intent, and in the implementation >details. Ah, and I see you added a compiler_membar and some comments here recently. I did indeed misread the micro-optimization. >You probably mixed critical_enter() and spinlock_enter() there. >The later indeed disables interrupt and intended to be used as part >of the spinlock (spin mutexes) implementation. What I meant was that it's a dreadful error to do, e.g.: critical_enter(); mtx_lock(mtx); ... 
mtx_unlock(mtx); critical_exit(); but the other order (lock first, then enter/exit) is OK. This is similar to the prohibition against obtaining a default mutex while holding a spin mutex. Chris From owner-freebsd-hackers@freebsd.org Mon Apr 10 08:48:03 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7F984D36E00 for ; Mon, 10 Apr 2017 08:48:03 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E63B723E for ; Mon, 10 Apr 2017 08:48:02 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v3A8lvou016982 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 10 Apr 2017 11:47:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v3A8lvou016982 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v3A8lvrt016981; Mon, 10 Apr 2017 11:47:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 10 Apr 2017 11:47:56 +0300 From: Konstantin Belousov To: Chris Torek Cc: ablacktshirt@gmail.com, ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com, vasanth.raonaik@gmail.com Subject: Re: Understanding the FreeBSD locking mechanism Message-ID: <20170410084756.GJ1788@kib.kiev.ua> References: <20170410071137.GH1788@kib.kiev.ua> <201704100811.v3A8BP8B049595@elf.torek.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201704100811.v3A8BP8B049595@elf.torek.net> User-Agent: Mutt/1.8.0 
(2017-02-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 08:48:03 -0000 On Mon, Apr 10, 2017 at 01:11:25AM -0700, Chris Torek wrote: > >Signal trampolines never were put on the kernel stack ... > > Oops, right, not sure why I was thinking that. However I would still > prefer to have libc supply the trampoline address (the underlying > signal system calls can do this, since until you are catching a > signal in the first place, there is no need for a known-in-advance > trampoline address). I considered some variation of this scheme when I worked on the non-executable stack support. AFAIR the reason why I decided not to do this was that the kernel-injected signal trampoline is still needed for backward ABI-compat. In other words, the shared page would be still needed, and we would end up with both libc trampoline and kernel trampoline, which felt somewhat excessive. Selecting one scheme or another based e.g. on the binary osrel was too fragile, e.g. new binary might have loaded old library, and the kernel trampoline still must be present in this situation. > What I meant was that it's a dreadful error to do, e.g.: > > critical_enter(); > mtx_lock(mtx); > ... > mtx_unlock(mtx); > critical_exit(); > > but the other order (lock first, then enter/exit) is OK. This > is similar to the prohibition against obtaining a default mutex > while holding a spin mutex. Sure, this is a bug. 
Debugging kernel would catch this, at least mi_switch() asserts that td_critnest == 0 (technically it checks that td_critnest == 1 but the thread lock is owned there). So if such code tries to lock contested mutex, the bug causes panic. I am sorry my previous mail contained an error: the spinlock_enter() also increments td_critnest. Still, since interrupts are disabled, this is mostly cosmetics. The more important consequence is that critical_exit() on spinlock unlock re-checks td_owepreempt and executes potential postponed scheduling actions. From owner-freebsd-hackers@freebsd.org Mon Apr 10 08:57:42 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A269CD371E1 for ; Mon, 10 Apr 2017 08:57:42 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 8CFE2B17 for ; Mon, 10 Apr 2017 08:57:42 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3A8vf3B049846 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 10 Apr 2017 01:57:41 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3A8vffM049845; Mon, 10 Apr 2017 01:57:41 -0700 (PDT) (envelope-from torek) Date: Mon, 10 Apr 2017 01:57:41 -0700 (PDT) From: Chris Torek Message-Id: <201704100857.v3A8vffM049845@elf.torek.net> To: kostikbel@gmail.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ablacktshirt@gmail.com, ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com, vasanth.raonaik@gmail.com In-Reply-To: 
<20170410084756.GJ1788@kib.kiev.ua> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Mon, 10 Apr 2017 01:57:41 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 08:57:42 -0000 >I considered some variation of this scheme when I worked on the >non-executable stack support. AFAIR the reason why I decided not to do >this was that the kernel-injected signal trampoline is still needed >for backward ABI-compat. In other words, the shared page would be >still needed, and we would end up with both libc trampoline and kernel >trampoline, which felt somewhat excessive. Those are pretty much the same reasons I never did it as well. >Selecting one scheme or another based e.g. on the binary osrel was too >fragile, e.g. new binary might have loaded old library, and the kernel >trampoline still must be present in this situation. The method by which to select the scheme, though, is straightforward: old vs new signal system call numbers and/or flags. ("Flags" presents issues if users of existing mechanism are not good about clearing unknown flag bits.) Besides non-executable stack / shared-page, this would also be particularly good for cases where a runtime library (not necessarily libc itself, perhaps for other languages) wants a different signal handling method in user space. For instance, instead of signals being delivered to some existing thread as interrupts, they might spin off new threads entirely. I think it's still worth pursuing, but it's one of those "forever in the future, low priority" ideas. I can't even seem to get back to my medium-priority ideas these days... 
Chris From owner-freebsd-hackers@freebsd.org Mon Apr 10 09:51:49 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B7A5D34100 for ; Mon, 10 Apr 2017 09:51:49 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-5.reflexion.net [208.70.210.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ABADBC99 for ; Mon, 10 Apr 2017 09:51:48 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 25882 invoked from network); 10 Apr 2017 09:52:41 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 10 Apr 2017 09:52:41 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Mon, 10 Apr 2017 05:51:41 -0400 (EDT) Received: (qmail 29193 invoked from network); 10 Apr 2017 09:51:41 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 10 Apr 2017 09:51:41 -0000 Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 85F66EC8630; Mon, 10 Apr 2017 02:51:40 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them From: Mark Millard In-Reply-To: <89D6D677-3BE2-45E2-A902-CC6A0305F3F9@dsl-only.net> Date: Mon, 10 Apr 2017 02:51:39 -0700 Cc: andrew@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arm Content-Transfer-Encoding: quoted-printable Message-Id: <585B43F7-D4C8-431A-BFFE-68B48C3214AE@dsl-only.net> References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> 
<163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net>
 <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net>
 <20170409122715.GF1788@kib.kiev.ua>
 <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net>
 <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net>
 <8FFE95AA-DB40-4D1E-A103-4BA9FCC6EDEE@dsl-only.net>
 <89D6D677-3BE2-45E2-A902-CC6A0305F3F9@dsl-only.net>
To: Konstantin Belousov
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 10 Apr 2017 09:51:49 -0000

On 2017-Apr-9, at 5:10 PM, Mark Millard wrote:

> On 2017-Apr-9, at 10:24 AM, Mark Millard wrote:
>
>> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov wrote:
>
>>
>>> Hmm, could you try the following patch, I did not even compiled it.
>>
>> I'll try it later today.
>>
>>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
>>> index 3d5756ba891..55aa402eb1c 100644
>>> --- a/sys/arm64/arm64/pmap.c
>>> +++ b/sys/arm64/arm64/pmap.c
>>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>>>      sva += L3_SIZE) {
>>>          l3 = pmap_load(l3p);
>>>          if (pmap_l3_valid(l3)) {
>>> +                if ((l3 & ATTR_SW_MANAGED) &&
>>> +                    pmap_page_dirty(l3)) {
>>> +                        vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
>>> +                            ~ATTR_MASK));
>>> +                }
>>>                  pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>>>                  PTE_SYNC(l3p);
>>>                  /* XXX: Use pmap_invalidate_range */
>
>
> Preliminary testing indicates that this fixes the
> some-pages-become-zero problem for fork-then-swapout/in.
>
> Thanks!
>
> I'll see if a buildworld can go through without being stopped
> by the type of issue. But that will take a while. (It is how
> I originally ran into the problem(s) that others had been
> reporting on the lists.)

buildworld buildkernel completed non-stop for the first time
on a BPI-M3 board.

Looks good for a check-in to svn to me (head and stable/11).

This combined with 2017-Feb-15's -r313772's fix to the fork
trampoline code's updating of sp_el0 makes arm64 far more stable
for my purposes.

-r313772 was never MFC'd to stable/11. In my view it should be.

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-hackers@freebsd.org Mon Apr 10 14:44:36 2017
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D249D37745 for ; Mon, 10 Apr 2017 14:44:36 +0000 (UTC) (envelope-from julian@freebsd.org)
Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 34579B6 for ; Mon, 10 Apr 2017 14:44:35 +0000 (UTC) (envelope-from julian@freebsd.org)
Received: from Julian-MBP3.local (106-68-100-234.dyn.iinet.net.au [106.68.100.234]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id v3AEiTSw050386 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Mon, 10 Apr 2017 07:44:32 -0700 (PDT) (envelope-from julian@freebsd.org)
Subject: Re: Understanding the FreeBSD locking mechanism
To: Yubin Ruan , Ed Schouten
References:
Cc: freebsd-hackers@freebsd.org
From: Julian Elischer
Message-ID: <56c36e41-e1cb-6e87-dc6e-922dd5abbccc@freebsd.org>
Date: Mon, 10 Apr 2017 22:44:23 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 14:44:36 -0000 On 9/4/17 6:13 pm, Yubin Ruan wrote: > On 2017/4/6 17:31, Ed Schouten wrote: >> Hi Yubin, >> >> 2017-04-06 11:16 GMT+02:00 Yubin Ruan : >>> Does this function provides the ordinary "spinlock" functionality? >>> There >>> is no special "test-and-set" instruction, and neither any extra >>> locking >>> to protect internal data structure manipulation. Isn't this >>> subjected to >>> race condition? >> >> Locking a spinlock is done through macro mtx_lock_spin(), which >> expands to __mtx_lock_spin() in sys/sys/mutex.h. That macro first >> calls into the function you looked at, spinlock_enter(), to disable >> interrupts. It then calls into the _mtx_obtain_lock_fetch() to do the >> test-and-set operation you were looking for. > > Thanks for replying. I have read some of those codes. just in case it somehow slipped your attention or has not yet been brought up there is the following overview: https://www.freebsd.org/cgi/man.cgi?locking(9) > > Just a few more questions, if you don't mind: > > (1) why are spinlocks forced to disable interrupt in FreeBSD? > > From the book "The design and implementation of the FreeBSD Operating > System", the authors say "spinning can result in deadlock if a > thread interrupted the thread that held a mutex and then tried to > acquire the mutex"...(section 4.3, Mutex Synchronization, paragraph 4) > > I don't get the point why a spinlock(or *spin mutex* in the FreeBSD > world) has to disable interrupt. Being interrupted does not necessarily > mean a deadlock. Assume that thread A holding a lock T gets interrupted > by another thread B(context switch here) and thread B try to acquire > the lock T. After finding out that lock T has already been acquired, > thread B will just spin until it gets preempted, after which thread A > gets waken up and run and release the lock T. So, you see there is not > necessarily any deadlock even if thread A get interrupted. 
> > I can only remember two conditions where using spinlock without > disabling interrupts will cause deadlock: > > #######1, spinlock used in an interrupt handler > If a thread A holding a spinlock T get interrupted and the interrupt > handler responsible for this interrupt try to acquire T, then we have > deadlock, because A would never have a chance to run before the > interrupt handler return, and the interrupt handler, unfortunately, > will continue to spin ... so in this situation, one has to disable > interrupt before spinning. > > As far as I know, in Linux, they provide two kinds of spinlocks: > > spin_lock(..); /* spinlock that does not disable interrupts */ > spin_lock_irqsave(...); /* spinlock that disable local interrupt */ > > > #######2, priority inversion problem > If thread B with a higher priority get in and try to acquire the lock > that thread A currently holds, then thread B would spin, while at the > same time thread A has no chance to run because it has lower priority, > thus not being able to release the lock. > (I haven't investigate enough into the source code, so I don't know > how FreeBSD and Linux handle this priority inversion problem. Maybe > they use priority inheritance or random boosting?) 
> > thanks, > Yubin Ruan > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@freebsd.org Mon Apr 10 20:16:00 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3EB7BD3871E for ; Mon, 10 Apr 2017 20:16:00 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-11.reflexion.net [208.70.210.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E2A212E6 for ; Mon, 10 Apr 2017 20:15:59 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 23069 invoked from network); 10 Apr 2017 20:15:58 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 10 Apr 2017 20:15:58 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Mon, 10 Apr 2017 16:15:58 -0400 (EDT) Received: (qmail 31433 invoked from network); 10 Apr 2017 20:15:58 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 10 Apr 2017 20:15:58 -0000 Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 96DBBEC7C08; Mon, 10 Apr 2017 13:15:57 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them From: Mark Millard In-Reply-To: <585B43F7-D4C8-431A-BFFE-68B48C3214AE@dsl-only.net> Date: Mon, 10 Apr 2017 13:15:57 -0700 Cc: 
andrew@freebsd.org, freebsd-hackers@freebsd.org, freebsd-arm
Content-Transfer-Encoding: quoted-printable
Message-Id: <876EA1E4-E5A9-411C-AFFD-989713037C19@dsl-only.net>
References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> <20170409122715.GF1788@kib.kiev.ua> <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net> <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net> <8FFE95AA-DB40-4D1E-A103-4BA9FCC6EDEE@dsl-only.net> <89D6D677-3BE2-45E2-A902-CC6A0305F3F9@dsl-only.net> <585B43F7-D4C8-431A-BFFE-68B48C3214AE@dsl-only.net>
To: Konstantin Belousov
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 10 Apr 2017 20:16:00 -0000

On 2017-Apr-10, at 2:51 AM, Mark Millard wrote:

> On 2017-Apr-9, at 5:10 PM, Mark Millard wrote:
>
>> On 2017-Apr-9, at 10:24 AM, Mark Millard wrote:
>>
>>> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov wrote:
>>
>>>
>>>> Hmm, could you try the following patch, I did not even compile it.
>>>
>>> I'll try it later today.
>>>
>>>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
>>>> index 3d5756ba891..55aa402eb1c 100644
>>>> --- a/sys/arm64/arm64/pmap.c
>>>> +++ b/sys/arm64/arm64/pmap.c
>>>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>>>>  		    sva += L3_SIZE) {
>>>>  			l3 = pmap_load(l3p);
>>>>  			if (pmap_l3_valid(l3)) {
>>>> +				if ((l3 & ATTR_SW_MANAGED) &&
>>>> +				    pmap_page_dirty(l3)) {
>>>> +					vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
>>>> +					    ~ATTR_MASK));
>>>> +				}
>>>>  				pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>>>>  				PTE_SYNC(l3p);
>>>>  				/* XXX: Use pmap_invalidate_range */
>>
>>
>> Preliminary testing indicates that this fixes the
>> some-pages-become-zero problem for fork-then-swapout/in.
>>
>> Thanks!
>>
>> I'll see if a buildworld can go through without being stopped
>> by the type of issue. But that will take a while. (It is how
>> I originally ran into the problem(s) that others had been
>> reporting on the lists.)
>
> buildworld buildkernel completed non-stop for the first time
> on a BPI-M3 board.

I had been thinking of the BPI-M3 for other reasons and
typed that instead of the correct: Pine64+ 2GB. (True
elsewhere as well.) I do really mean arm64 here, not armv7.

> Looks good for a check-in to svn to me (head and stable/11).
>
> This combined with 2017-Feb-15's -r313772's fix to the fork
> trampoline code's updating of sp_el0 makes arm64 far more stable
> for my purposes.
>
> -r313772 was never MFC'd to stable/11. In my view it should be.
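
For anyone following the patch quoted above, here is a minimal userspace sketch of the failure mode it appears to address. It is a toy model, not FreeBSD code: the bit layout, the toy_* names, and the idea that dirtiness is inferred from "accessed and writable" are all assumptions made for illustration. Under those assumptions, clearing the writable bit without first recording the dirty state makes a modified page look clean, so a pager could discard it instead of writing it out, which would be consistent with the some-pages-become-zero symptom.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this toy model; NOT the real arm64 PTE format. */
#define TOY_PTE_ACCESSED  (1u << 0)  /* page has been touched */
#define TOY_PTE_WRITABLE  (1u << 1)  /* mapping allows writes */

struct toy_page {
	bool dirty;  /* machine-independent "must be written out" flag */
};

/* Assumption: a page counts as dirty when it is accessed and writable. */
static bool
toy_pte_dirty(uint32_t pte)
{
	return ((pte & (TOY_PTE_ACCESSED | TOY_PTE_WRITABLE)) ==
	    (TOY_PTE_ACCESSED | TOY_PTE_WRITABLE));
}

/* Write-protect without saving the dirty state: the modification is forgotten. */
static void
toy_protect_lossy(uint32_t *pte, struct toy_page *pg)
{
	(void)pg;
	*pte &= ~TOY_PTE_WRITABLE;
}

/* Write-protect in the spirit of the patch: record dirtiness first. */
static void
toy_protect_fixed(uint32_t *pte, struct toy_page *pg)
{
	if (toy_pte_dirty(*pte))
		pg->dirty = true;
	*pte &= ~TOY_PTE_WRITABLE;
}

/* A pager would only write the page out if it is known to be dirty. */
static bool
toy_page_needs_pageout(uint32_t pte, const struct toy_page *pg)
{
	return (pg->dirty || toy_pte_dirty(pte));
}
```

Starting from a PTE with both bits set, toy_protect_lossy() leaves toy_page_needs_pageout() false, while toy_protect_fixed() leaves it true.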
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-hackers@freebsd.org Tue Apr 11 07:17:02 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B3118D397BD for ; Tue, 11 Apr 2017 07:17:02 +0000 (UTC) (envelope-from crb@chrisbowman.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 91F10FCC for ; Tue, 11 Apr 2017 07:17:02 +0000 (UTC) (envelope-from crb@chrisbowman.com) Received: by mailman.ysv.freebsd.org (Postfix) id 8E5DED397BC; Tue, 11 Apr 2017 07:17:02 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E02DD397BB for ; Tue, 11 Apr 2017 07:17:02 +0000 (UTC) (envelope-from crb@chrisbowman.com) Received: from mail-pg0-x229.google.com (mail-pg0-x229.google.com [IPv6:2607:f8b0:400e:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 69CA7FCB for ; Tue, 11 Apr 2017 07:17:02 +0000 (UTC) (envelope-from crb@chrisbowman.com) Received: by mail-pg0-x229.google.com with SMTP id g2so116615572pge.3 for ; Tue, 11 Apr 2017 00:17:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chrisbowman-com.20150623.gappssmtp.com; s=20150623; h=from:content-transfer-encoding:mime-version:subject:message-id:date :to; bh=fyoDCcmdvyHZkM4J65ZwAeZDiG/V1Uq5Qsi0MYqj7Os=; b=Rxl7x//Nxw1VOsxCIbXjHaUC/HOLQM3dXio7rQ/cbqX835iVvT+OWrMReDyXPHM6LN 3X+tPb1xkPUClInh1vKL1I2BPkuECh4J3OMbTDw+WA9P4CYUctStJGDSsk5XUu5Wf300 sFnid3NEdlvglKgOSVe3rcZZmpsfN4tSYQhd4L48uWKW973OcieXgQNtupCE0/Ws2ZO9 
ax9yFBF6zvFB9nMtxLxTOJ3V3h89Mj94RagWfYEfF1AbQXkxQuHHBQGbQEh3A5qwe07x TdRs5G49Xzq/2XrSJsrPlBpdu/sBjvM0FXYMX9D7fO8Gtx6uoOb+VbIto7tsAU3+HhiW mODw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:content-transfer-encoding:mime-version :subject:message-id:date:to; bh=fyoDCcmdvyHZkM4J65ZwAeZDiG/V1Uq5Qsi0MYqj7Os=; b=rfiy66sU9kH8CoA/ApN4ypm188xF/2Z8xk//PAv56zWHNGZUz6Ehev7veJM2/JpqQc Re+R/FjXYSCsZr0UWrcfRcJ+IXfl6gF7GnypJblXoQ8bghj1h8ZeVYyf0WoH52ZLFlVH W/UUn9nalV7HTFTkiS64iCQxu/jpXPAe7vtFUNmv1X/ma2vWBtfA2+L1ua3RKbLWRaIx B0MRNN0aWwTp3w1arla6NiACuZSWUNOhCQYBLQV6J0K7EnnlmxASTuZ1gYXU0Ed5QxxQ o3iqyFe6y60AYB+yfbeWSacsAOi5ToTgeT9gSmW2kMSC19N2vf5mLoWz1csjMqJTKdX4 G57w==
X-Gm-Message-State: AFeK/H3aC5UxLv9bpMFaSEmiC/R2jKQwXzdtMzktQTSwUHmYn5sN80f3Up4l9xxiR9Bjww==
X-Received: by 10.84.140.235 with SMTP id 98mr73654035plt.161.1491895021410; Tue, 11 Apr 2017 00:17:01 -0700 (PDT)
Received: from ?IPv6:2601:647:4e00:bbb5:8918:714a:df41:33f0? ([2601:647:4e00:bbb5:8918:714a:df41:33f0]) by smtp.gmail.com with ESMTPSA id n65sm28467853pga.8.2017.04.11.00.17.00 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 11 Apr 2017 00:17:00 -0700 (PDT)
From: Christopher Bowman
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Dtrace oddity
Message-Id:
Date: Tue, 11 Apr 2017 00:16:59 -0700
To: hackers@freebsd.org
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 11 Apr 2017 07:17:02 -0000

Apologies if I’m sending to the wrong list. I have a small test program shown at the bottom. It tries to mmap a device for which I’ve written (a possibly incorrect) driver.
When I run the program I get the following output:

crb@retread:63> ./test /dev/sp6050
argc = 2
argv[0] = ./test
argv[1] = /dev/sp6050
opening device /dev/sp6050
open returned non-zero value
mmap failed: EINVAL

The man page lists a bunch of reasons for EINVAL so I want to investigate this and I don’t quite know good strategies to debug the kernel (yet) so I thought I’d experiment with Dtrace a bit. Here is the oddity: when I run Dtrace and then run my test program I get the following output from Dtrace:

crb@retread:60> dtrace -n 'syscall:freebsd:mmap:entry /execname == "test"/ {}'
dtrace: description 'syscall:freebsd:mmap:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry
  0  63401                       mmap:entry

I think Dtrace is indicating that the mmap syscall was called 12 times by my test program, yet I can't see how the program below would have done that. Here is my program:

/* Copyright (c) 2011 by Christopher R. Bowman. All rights reserved.
*/

#include
#include
#include
#include

int main (int argc, char ** argv)
{
    int i;

    printf("argc = %d\n", argc);
    for (i=0; i < argc; i++)
        printf ("argv[%i] = %s\n", i, argv[i]);
    if (argc < 2) {
        printf("usage: test device\n");
        return 0;
    }
    printf("opening device %s\n", argv[1]);
    int device = open (argv[1], O_RDWR);
    if (device == 0) {
        printf ("open of device %s failed\n", argv[1]);
        return 0;
    }
    printf("open returned non-zero value\n");
    void *pa = mmap (0, 4095, PROT_READ | PROT_WRITE, 0, device, 0);
    if (pa == MAP_FAILED) {
        printf ("mmap failed: ");
        switch (errno) {
        case EACCES: printf("EACCESS\n"); break;
        case EBADF:  printf("EBADF\n");   break;
        case EINVAL: printf("EINVAL\n");  break;
        case ENODEV: printf("ENODEV\n");  break;
        case ENOMEM: printf("ENOMEM\n");  break;
        }
        return 0;
    }
    printf("mmap returned non-zero value: %lx\n", (unsigned long)pa);
    unsigned int *p = (unsigned int *) pa;
    unsigned char *c = (unsigned char *) pa;
#define NUM_ITERATIONS 16
    for (i=0; i < NUM_ITERATIONS; i++){ //BARs are 2Kbytes
        //*p++ = (0xa5a5 + i);
        *p++ = (0x5aa5a5a5);
    }
    p = (unsigned int *) pa;
    for (i=0; i < NUM_ITERATIONS; i++){
        printf("i = %d, read_val = %x\n", i, *p++);
    }
}

Thanks in advance for comments on Dtrace or perhaps program corrections or ideas why the mmap failed or places to read on kernel debugging or pointers to a better list to which to send this.
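
On the "program corrections" point, a minimal sketch of tighter open/mmap error handling follows. It assumes an ordinary POSIX environment; map_file() is a hypothetical helper introduced only for illustration, and whether MAP_SHARED is the right flag for this particular driver is an assumption. The sketch checks open() against -1 (0 is a valid descriptor), passes the mapping flags explicitly, and reports errors with strerror() so that every errno value is covered rather than only the ones listed in a switch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: open 'path' read/write and map 'len' bytes of it
 * with the given mmap flags. Returns the mapping, or MAP_FAILED with
 * errno describing the failure.
 */
static void *
map_file(const char *path, size_t len, int flags)
{
	int fd = open(path, O_RDWR);
	if (fd == -1) {  /* open() reports failure as -1, not 0 */
		int open_errno = errno;
		fprintf(stderr, "open %s: %s\n", path, strerror(open_errno));
		errno = open_errno;
		return (MAP_FAILED);
	}

	void *pa = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
	int saved = errno;  /* keep mmap's errno across the calls below */
	if (pa == MAP_FAILED)
		fprintf(stderr, "mmap %s: %s\n", path, strerror(saved));

	close(fd);  /* an established mapping stays valid after close() */
	errno = saved;
	return (pa);
}
```

A caller would use it as `void *pa = map_file(argv[1], 4096, MAP_SHARED);` and compare the result with MAP_FAILED.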
Thanks Christopher From owner-freebsd-hackers@freebsd.org Tue Apr 11 12:37:29 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 63AAFD39C68 for ; Tue, 11 Apr 2017 12:37:29 +0000 (UTC) (envelope-from f.v.anton@gmail.com) Received: from mail-wm0-x230.google.com (mail-wm0-x230.google.com [IPv6:2a00:1450:400c:c09::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id F10FAB0D for ; Tue, 11 Apr 2017 12:37:28 +0000 (UTC) (envelope-from f.v.anton@gmail.com) Received: by mail-wm0-x230.google.com with SMTP id u2so59946975wmu.0 for ; Tue, 11 Apr 2017 05:37:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to :content-transfer-encoding; bh=WvxxcZgDNDwMwhMhgwSXfFh/eyysDmXlZLA+Jvla+WA=; b=dMG5l+LKqlY/9Ko17ZNKSCdJrd8Ok6RJhFNdML6xnBrfnaY0J61BRSqzozDnO2QYm+ SurW/z1jg9rPoMlt4EKXzQAE7CukQXVreiYkATswU4fD1wJz7bwoj4UrH9WfyLVGJst4 xP4VbN676hKqZgAzwzFqgl9/4bX1fx55a6Ds3s0xeLcYmJ3+pnibiRobp8tiMJwf96lc LEYvi8qGcmznOmXJKbNfWem4GXQC4RzriJaYeHwtFZdR1auoWlnUjhWVsO5RppOIJRcQ Ae4LepPggA3YLURLo1XubefbzakNqtSlXYBu9JyY8ZT9n+G3KkGp7hW5LHuzlad2K++3 HyHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to :content-transfer-encoding; bh=WvxxcZgDNDwMwhMhgwSXfFh/eyysDmXlZLA+Jvla+WA=; b=ZeBW43nXvZLvxI93OCD9+BsohTC9WU1oUKUdSaBgEi/xHk+xmW/iAUvgUHbMh7dh5O 5OPnmLBonRrWA8ISD6O+x2VNjW1yeNgVGh7s99CT1Ts6GeGKZd5NmEqbEhy2IqPQkDJl Gy++UKS3HDcrBIDUNxrFIDXBkQSclrufWU+qJtUttCYYVZGxtkdaOALz1vxJR74EhKuV Lq10q7Y/nkKYmPHIkfNArU7uGdrVWP2WpkPNdHHM3yagDwvB8tTULJ25R2qRCOaZhbo1 
0xayrK+GV+oDiUdzV5Q1SQtqnecHe3ADnjUESQP4T5/XaX7sfi9/vG+uYCR6Fw9kpoZT Xn9w==
X-Gm-Message-State: AN3rC/5OCneQ2ZNdIi5Za2WBDaqLHsmkqoBk1kE3SGt6HFAvXVEdJq7TUHgez+/M+yV1rLGlcYx+9wpB05w5xA==
X-Received: by 10.28.72.67 with SMTP id v64mr14580261wma.98.1491914246611; Tue, 11 Apr 2017 05:37:26 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.223.178.10 with HTTP; Tue, 11 Apr 2017 05:37:26 -0700 (PDT)
From: Flavius Anton
Date: Tue, 11 Apr 2017 15:37:26 +0300
Message-ID:
Subject: On COW memory mapping in d_mmap_single
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 11 Apr 2017 12:37:29 -0000

Hi everyone,

I'll start by giving some context, so you can better understand the
problem I'm trying to solve. I’ve been working for a while on
bhyve trying to implement save/restore [1]. We've currently managed to
get it working for VMs using a ramdisk and no devices, so just vCPU
and memory states are saved and restored so far.

Last week I started looking into network devices, specifically
virtio-net devices. The problem was that when I issue a checkpoint
operation, the guest virtio driver stops working. After digging for a
while, I figured out the problem is marking VM memory as COW. If I
don't do this, the driver continues with no problem after
checkpointing.

Each VM has an associated vmspace and a /dev/vmm/VM_NAME device. When
the user space does a mmap on the /dev device, we would like to mark
VM memory as COW, so that the VM can continue touching pages while the
user space is writing the 'frozen', COW-marked memory to persistent
storage.
We do this by iterating through all vm_entries from VM's
vmspace, we find which entry is mapping the object that has VM memory
and then we roughly just set MAP_ENTRY_COW and MAP_ENTRY_NEEDS_COPY on
that entry. You can see the code here [2].

I'm not sure if the above is sufficient for our purpose. In other
words, how would you do this? You have a vm_object that is referenced
via a vm_entry by process A (the user space). Somebody else, process B
let's say, does an mmap() on your device and you'd like to freeze that
object, such that process B can see a consistent snapshot of it, while
you want process A to be able to continue reading and writing from/to
it.

I've also read through Design Elements of the FreeBSD VM system [3],
but I am still afraid (I am sure) that I have some misunderstandings.

Thank you very much for bearing with me and going through this wall of text.

--
Flavius

[1] https://github.com/flaviusanton/freebsd/tree/bhyve-save-restore
[2] https://github.com/flaviusanton/freebsd/blob/bhyve-save-restore/sys/amd64/vmm/vmm_dev.c#L862
[3] https://www.freebsd.org/doc/en/articles/vm-design/index.html

From owner-freebsd-hackers@freebsd.org Tue Apr 11 13:00:21 2017
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E65FD394EB for ; Tue, 11 Apr 2017 13:00:21 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1B749A83 for ; Tue, 11 Apr 2017 13:00:20 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v3BD0CKo090298 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 11 Apr 2017
16:00:12 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v3BD0CKo090298 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v3BD0C6a090297; Tue, 11 Apr 2017 16:00:12 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 11 Apr 2017 16:00:12 +0300 From: Konstantin Belousov To: Flavius Anton Cc: freebsd-hackers@freebsd.org Subject: Re: On COW memory mapping in d_mmap_single Message-ID: <20170411130012.GQ1788@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.0 (2017-02-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Apr 2017 13:00:21 -0000 On Tue, Apr 11, 2017 at 03:37:26PM +0300, Flavius Anton wrote: > Hi everyone, > > I'll start by giving some context, so you can better understand what > is the problem I'm trying to solve. I???ve been working for a while on > bhyve trying to implement save/restore [1]. We've currently managed to > get it working for VMs using a ramdisk and no devices, so just vCPU > and memory states are saved and restored so far. > > Last week I started looking into network devices, specifically > virtio-net devices. The problem was that when I issue a checkpoint > operation, the guest virtio driver stops working. After digging for a > while, I figured out the problem is marking VM memory as COW. 
If I
> don't do this, the driver continues with no problem after
> checkpointing.
>
> Each VM has an associated vmspace and a /dev/vmm/VM_NAME device. When
> the user space does a mmap on the /dev device, we would like to mark
> VM memory as COW, thus the VM can continue touching pages while the
> user space is writing the 'frozen', COW marked memory to a persistent
> storage. We do this by iterating through all vm_entries from VM's
> vmspace, we find which entry is mapping the object that has VM memory
> and then we roughly just set MAP_ENTRY_COW and MAP_ENTRY_NEEDS_COPY on
> that entry. You can see the code here [2].

This is a very strange operation, to put it mildly. First, do other vCPUs
operate while you do your 'COW'? If yes, you are guaranteed to get an
inconsistent snapshot. If not, then you do not need 'COW'.

More, what kinds of VM objects are mapped into the vmspace? FreeBSD VM
does not support shadowing of device objects (which means, inserting
shadow objects into the device object chain breaks VM invariants). One
of the main reasons why it did not need to be supported is that a shadow
copy cannot see changes which are performed on the shadowed pages,
supposedly done by the device. If vmm mmaps some devices into the guest
vmspace, the devices would kind of 'freeze' from the guest PoV.

Next, how do you undo the damage done by your 'COW'?

> I'm not sure if the above is sufficient for our purpose. In other
> words, how would you do this? You have a vm_object that is referenced
> via a vm_entry by process A (the user space). Somebody else, process B
> let's say, does an mmap() on your device and you'd like to freeze that
> object, such that process B can see a consistent snapshot of it, while
> you want process A to be able to continue reading and writing from/to
> it.

This is not supported. I have no idea why a copy of a page which
reflects the device state would even be considered a good idea.
But you cannot make the consistent copy without device cooperation anyway, since device might modify its state while CPU reads. > > I've also read through Design Elements of the FreeBSD VM system [3], > but I am still afraid (I am sure) that I have some misunderstandings. > > Thank you very much for bearing with me and going through this wall of text. > > -- > Flavius > > [1] https://github.com/flaviusanton/freebsd/tree/bhyve-save-restore > [2] https://github.com/flaviusanton/freebsd/blob/bhyve-save-restore/sys/amd64/vmm/vmm_dev.c#L862 > [3] https://www.freebsd.org/doc/en/articles/vm-design/index.html > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@freebsd.org Tue Apr 11 13:42:46 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 207B6D39747 for ; Tue, 11 Apr 2017 13:42:46 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [80.67.31.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DE130C65 for ; Tue, 11 Apr 2017 13:42:45 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from [78.35.167.42] (helo=fabiankeil.de) by smtprelay01.ispgateway.de with esmtpsa (TLSv1.2:AES256-GCM-SHA384:256) (Exim 4.84) (envelope-from ) id 1cxvfK-0001wT-NY; Tue, 11 Apr 2017 15:17:02 +0200 Date: Tue, 11 Apr 2017 15:14:26 +0200 From: Fabian Keil To: Christopher Bowman Cc: freebsd-hackers@freebsd.org Subject: Re: Dtrace oddity Message-ID: <20170411151426.3b760182@fabiankeil.de> In-Reply-To: References: MIME-Version: 1.0 Content-Type: 
multipart/signed; micalg=pgp-sha1; boundary="Sig_/Wjcw_RzPb68pzBvBRYkVM4L"; protocol="application/pgp-signature"
X-Df-Sender: Nzc1MDY3
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 11 Apr 2017 13:42:46 -0000

--Sig_/Wjcw_RzPb68pzBvBRYkVM4L
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Christopher Bowman wrote:

> The man page lists a bunch of reasons for EINVAL so I want to
> investigate this and I don’t quite know good strategies to debug the
> kernel (yet) so I thought I’d experiment with Dtrace a bit. Here is the
> oddity: when I run Dtrace and then run my test program I get the
> following output from Dtrace:
>
> crb@retread:60> dtrace -n 'syscall:freebsd:mmap:entry /execname == "test"/ {}'
> dtrace: description 'syscall:freebsd:mmap:entry ' matched 1 probe
> CPU     ID                    FUNCTION:NAME
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>   0  63401                       mmap:entry
>
> I think Dtrace is indicating that the mmap syscall was called 12 times
> by my test program, yet I can't see how the program below would have done
> that.

A bunch of mmap syscalls occur before main is even entered.
Try running your program with truss to see what's going on.

> Here is my program:
[...]
> printf("opening device %s\n", argv[1]);
> int device = open (argv[1], O_RDWR);
> if (device == 0) {

You should check for -1 here.

> void *pa = mmap (0, 4095, PROT_READ | PROT_WRITE, 0, device, 0);

No flags?

From the mmap man page:

| [EINVAL] None of MAP_ANON, MAP_PRIVATE, MAP_SHARED, or
| MAP_STACK was specified.
At least one of these flags
| must be included.

Fabian

--Sig_/Wjcw_RzPb68pzBvBRYkVM4L
Content-Type: application/pgp-signature
Content-Description: OpenPGP digital signature

-----BEGIN PGP SIGNATURE-----

iF0EARECAB0WIQTKUNd6H/m3+ByGULIFiohV/3dUnQUCWOzWswAKCRAFiohV/3dU
naR9AKC88uaGiPliml1AEINPpCMkoYMAWQCfSPsCr/Gj/fo9J+0zFGmy+EYYvXU=
=JFvI
-----END PGP SIGNATURE-----

--Sig_/Wjcw_RzPb68pzBvBRYkVM4L--

From owner-freebsd-hackers@freebsd.org Tue Apr 11 13:55:04 2017
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 50646D39B13 for ; Tue, 11 Apr 2017 13:55:04 +0000 (UTC) (envelope-from f.v.anton@gmail.com)
Received: from mail-wm0-x235.google.com (mail-wm0-x235.google.com [IPv6:2a00:1450:400c:c09::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CEA683E3 for ; Tue, 11 Apr 2017 13:55:03 +0000 (UTC) (envelope-from f.v.anton@gmail.com)
Received: by mail-wm0-x235.google.com with SMTP id y18so13906783wmh.0 for ; Tue, 11 Apr 2017 06:55:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=u7sLiW9JSsAD/x7usLCGROHZF40bWzgGIItUmh5O1HY=; b=o3paD3Nd7QOOJmGpfSMRy65xrn+TxyX3RPgDLb454ISBnJ0rPf7LxsRPCmkrZFX0t8 j1lRfShzjuRxcsgf85hBixS434YMO/7SboyWBmVLE9b5TEmB5eJ+uo1fr9I5Ee5nznR+ 1au54a4AJx/pkBe1glxv9mxBt/diCgseLfnLqOxNv4TkAiqcmOP70Csv7oQqdusPqKHj ZMHGyhKigxbQeBo/RApnZDLIdJMYgxR/MY6g25mk0vCfn2tMVYromaYtK590GatX6OxW o49yhVD/+6/R+6LQ0PHK3oFBj//wFmhC+KZPYh3nNhmxQXQMnqGxi06HAZ3km6wTEnKQ hs7Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to;
bh=u7sLiW9JSsAD/x7usLCGROHZF40bWzgGIItUmh5O1HY=; b=gr9OH4517osGZ85W/sGn0J7GXSh3fzopR/fHAzG56XnlLthtPS1Btlv1bPmIaZ6PQC 3jBaSuev32s8X/Lb5nifouMh1MPa6a5lnlU1SpH4BbRAeqToPn6OkA7TngbMBvlk81aF pEsBG11wkZ6D/Xd2RIl4Fk2c241r8SECHC4cW2hc8lfdiTfZ/jPw0CgQTFGMNnZBln9J kW3t0GKeWEs6gCi+uTQdo+oXnfksyefEpV/b5YBGJ0+ZfIKYTO/k1xETyxBxVoi0zNdg HpX0WLXEkqDjoVgSuIieUEc4YbcVuEHAKcji5bRpJkW0IY+u6OpHnQkHerZNjqgOPAMy 7oDw== X-Gm-Message-State: AN3rC/4prRnTdApqPJn/CuCUbLK35xyK2FvM9mGodqCzPvK13s+0bnJzwKvwuCe7FLfBQwNUB6lxerSDdgi2ag== X-Received: by 10.28.7.144 with SMTP id 138mr15014079wmh.125.1491918900958; Tue, 11 Apr 2017 06:55:00 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.178.10 with HTTP; Tue, 11 Apr 2017 06:55:00 -0700 (PDT) In-Reply-To: References: From: Flavius Anton Date: Tue, 11 Apr 2017 16:55:00 +0300 Message-ID: Subject: Re: On COW memory mapping in d_mmap_single To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Apr 2017 13:55:04 -0000 >On Tue, Apr 11, 2017 at 04:00:21PM +0300, Konstantin Belousov wrote: >>On Tue, Apr 11, 2017 at 03:37:26PM +0300, Flavius Anton wrote: >> Hi everyone, >> >> I'll start by giving some context, so you can better understand what >> is the problem I'm trying to solve. I???ve been working for a while on >> bhyve trying to implement save/restore [1]. We've currently managed to >> get it working for VMs using a ramdisk and no devices, so just vCPU >> and memory states are saved and restored so far. >> >> Last week I started looking into network devices, specifically >> virtio-net devices. The problem was that when I issue a checkpoint >> operation, the guest virtio driver stops working. After digging for a >> while, I figured out the problem is marking VM memory as COW. 
If I
>> don't do this, the driver continues with no problem after
>> checkpointing.
>>
>> Each VM has an associated vmspace and a /dev/vmm/VM_NAME device. When
>> the user space does a mmap on the /dev device, we would like to mark
>> VM memory as COW, thus the VM can continue touching pages while the
>> user space is writing the 'frozen', COW marked memory to a persistent
>> storage. We do this by iterating through all vm_entries from VM's
>> vmspace, we find which entry is mapping the object that has VM memory
>> and then we roughly just set MAP_ENTRY_COW and MAP_ENTRY_NEEDS_COPY on
>> that entry. You can see the code here [2].
>
>This is a very strange operation, to put it mildly. First, do other vCPUs
>operate while you do your 'COW'? If yes, you are guaranteed to get an
>inconsistent snapshot. If not, then you do not need 'COW'.

Yes, all vCPUs are locked before calling mmap(). I agree that we don't
need 'COW', as long as we keep all vCPUs locked while we copy the
entire VM memory. But this might take a while: imagine a VM with 32GB
or more of RAM. This will take maybe minutes to write to disk, so we
don't actually want the VM to be frozen for so long. That's the
reason we'd like to map the memory COW and then unlock vCPUs.

>More, what kinds of VM objects are mapped into the vmspace? FreeBSD VM
>does not support shadowing of device objects (which means, inserting
>shadow objects into the device object chain breaks VM invariants). One
>of the main reasons why it did not need to be supported is that a shadow
>copy cannot see changes which are performed on the shadowed pages,
>supposedly done by the device. If vmm mmaps some devices into the guest
>vmspace, the devices would kind of 'freeze' from the guest PoV.

It's an OBJT_DEFAULT. It's not a device object, it's the memory object
given to the guest to use as physical memory.

>Next, how do you undo the damage done by your 'COW'?

This is one thing that we've thought about, but we don't have a
solution for now.
I agree it is very important, though. I figured that it might be possible to 'unmark' the memory object as COW with some additional tricks. >> I'm not sure if the above is sufficient for our purpose. In other >> words, how would you do this? You have a vm_object that is referenced >> via a vm_entry by process A (the user space). Somebody else, process B >> let's say, does an mmap() on your device and you'd like to freeze that >> object, such that process B can see a consistent snapshot of it, while >> you want process A to be able to continue reading and writing from/to >> it. >This is not supported. I have no idea why would a copy of a page which >reflects the device state even considered as a good idea. But you cannot >make the consistent copy without device cooperation anyway, since device >might modify its state while CPU reads. I'm sorry if I haven't been too clear. The object that I'm trying to map as COW is not a device object. It's just the object that contains VM memory. That object shouldn't change if all VM vCPUs are locked and I make sure they are when calling mmap(). Thanks for your input on this. -- Flavius >> I've also read through Design Elements of the FreeBSD VM system [3], >> but I am still afraid (I am sure) that I have some misunderstandings. >> >> Thank you very much for bearing with me and going through this wall of text. 
>> >> [1] https://github.com/flaviusanton/freebsd/tree/bhyve-save-restore >> [2] https://github.com/flaviusanton/freebsd/blob/bhyve-save-restore/sys/amd64/vmm/vmm_dev.c#L862 >> [3] https://www.freebsd.org/doc/en/articles/vm-design/index.html From owner-freebsd-hackers@freebsd.org Tue Apr 11 14:30:13 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1ED72D3986F for ; Tue, 11 Apr 2017 14:30:13 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BAB4E13B for ; Tue, 11 Apr 2017 14:30:12 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v3BEU3V4010382 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 11 Apr 2017 17:30:03 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v3BEU3V4010382 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v3BEU31S010380; Tue, 11 Apr 2017 17:30:03 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 11 Apr 2017 17:30:03 +0300 From: Konstantin Belousov To: Flavius Anton Cc: freebsd-hackers@freebsd.org Subject: Re: On COW memory mapping in d_mmap_single Message-ID: <20170411143003.GT1788@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.0 (2017-02-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no 
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 11 Apr 2017 14:30:13 -0000

On Tue, Apr 11, 2017 at 04:55:00PM +0300, Flavius Anton wrote:
> >On Tue, Apr 11, 2017 at 04:00:21PM +0300, Konstantin Belousov wrote:
> >>On Tue, Apr 11, 2017 at 03:37:26PM +0300, Flavius Anton wrote:
> >> Hi everyone,
> >>
> >> I'll start by giving some context, so you can better understand what
> >> is the problem I'm trying to solve. I've been working for a while on
> >> bhyve trying to implement save/restore [1]. We've currently managed to
> >> get it working for VMs using a ramdisk and no devices, so just vCPU
> >> and memory states are saved and restored so far.
> >>
> >> Last week I started looking into network devices, specifically
> >> virtio-net devices. The problem was that when I issue a checkpoint
> >> operation, the guest virtio driver stops working. After digging for a
> >> while, I figured out the problem is marking VM memory as COW. If I
> >> don't do this, the driver continues with no problem after
> >> checkpointing.
> >>
> >> Each VM has an associated vmspace and a /dev/vmm/VM_NAME device. When
> >> the user space does a mmap on the /dev device, we would like to mark
> >> VM memory as COW, thus the VM can continue touching pages while the
> >> user space is writing the 'frozen', COW marked memory to a persistent
> >> storage. We do this by iterating through all vm_entries from VM's
> >> vmspace, we find which entry is mapping the object that has VM memory
> >> and then we roughly just set MAP_ENTRY_COW and MAP_ENTRY_NEEDS_COPY on
> >> that entry. You can see the code here [2].
> >
> >This is very strange operation, to put it mildly. First, are other vCPUs
> >operate while you do your 'COW' ? If yes, you are guaranteed to get
> >inconsistent snapshot. If not, then you do not need 'COW'.
>
> Yes, all vCPUs are locked before calling mmap(). I agree that we don't
> need 'COW', as long as we keep all vCPUs locked while we copy the
> entire VM memory. But this might take a while, imagine a VM with 32GB
> or more of RAM. This will take maybe minutes to write to disk, so we
> don't actually want the VM to be frozen for so long. That's the
> reason we'd like to map the memory COW and then unlock vCPUs.
>
> >More, what kinds of VM objects are mapped into the vmspace ? FreeBSD VM
> >does not support shadowing of device objects (which means, inserting
> >shadow objects into the device object chain breaks VM invariants). One
> >of the main reasons why it not needed to be supported is because shadow
> >copy cannot see changes which are performed on the shadowed pages,
> >supposedly done by device. If vmm mmaps some devices into guest vmspace,
> >the devices would kind of 'freeze' from the guest PoV.
>
> It's a OBJT_DEFAULT. It's not a device object, it's the memory object
> given to guest to use as physical memory.

Perhaps add asserts that you only shadow default/swap/vnode objects.
Then you will see if the issue is what I noted above, or not.

>
> >Next, how do you undo the damage done by your 'COW' ?
>
> This is one thing that we've thought about, but we don't have a
> solution for now. I agree it is very important, though. I figured that
> it might be possible to 'unmark' the memory object as COW with some
> additional tricks.

You might consider using vm_object_collapse().
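
For illustration only, the type check suggested above could be modelled
roughly as follows. This is a userland sketch, not kernel code: the enum
is a local stand-in that borrows the names of enum obj_type from
sys/vm/vm.h, and object_type_may_be_shadowed() is a hypothetical helper
name. In the kernel the same predicate would presumably be wrapped in a
KASSERT() at the point where the map entry is marked COW.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Local stand-in for the kernel's enum obj_type (sys/vm/vm.h), so the
 * sketch builds in userland.  Only the names matter here.
 */
enum obj_type {
	OBJT_DEFAULT, OBJT_SWAP, OBJT_VNODE, OBJT_DEVICE,
	OBJT_PHYS, OBJT_DEAD, OBJT_SG, OBJT_MGTDEVICE
};

/*
 * Hypothetical helper: true only for the object types named in the
 * reply above (default, swap, vnode); everything else is refused.
 */
bool
object_type_may_be_shadowed(enum obj_type type)
{
	switch (type) {
	case OBJT_DEFAULT:
	case OBJT_SWAP:
	case OBJT_VNODE:
		return (true);
	default:
		return (false);
	}
}
```

If the assertion fires for the guest memory object, the problem is the
device-object shadowing described above; if it never fires, the cause
lies elsewhere.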
From owner-freebsd-hackers@freebsd.org Tue Apr 11 17:15:44 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 20CD4D3AD63 for ; Tue, 11 Apr 2017 17:15:44 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-pf0-x242.google.com (mail-pf0-x242.google.com [IPv6:2607:f8b0:400e:c00::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E4578F01 for ; Tue, 11 Apr 2017 17:15:43 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-pf0-x242.google.com with SMTP id o126so553844pfb.1 for ; Tue, 11 Apr 2017 10:15:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=ovrJSqlRWmu3DimgzZ0CfywoGCZcEcUX70IYYI41pg0=; b=nKoeaKSTcgqgsIODVW/jfh0qH+8dPKqF6+fqPkGDr5LiSC2GPelJW55Az7aQsMfpHs 8YKkUtwA+n8USAeq9hCvYwg1VFZWECvW8Hrh932XKay2mrYRnNEGXZgDqQsdTvn68XOF 39Pd80+6YWl+idLWvF9O4Y/uuf5lFt9J9POKGZg0wTdwDY/UXqwSuXzJrSiNzhgkvtFi xiTxdbhVxyfago2oMgJNOp3738zmSzTbVWAhSiezWPwKU10kVqPTs/vZh8zzxP5nx0la omb2r/vbHICWqWXf1KPnUJy8dySMRVrpuytw1uo9GCL2foitYMmUEpSjm6owyFXsVnPU WCsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=ovrJSqlRWmu3DimgzZ0CfywoGCZcEcUX70IYYI41pg0=; b=oQwTWrQ6Jz7o//joCdz4IOjcemni1ZYtlMgoHEPCqY6ysxs1sYo0c6nrAP7RKmxMqJ SXjkjorDme55rdUtC3vuiP9sJ77o9ktP3a2n10VqzTVtPAXt1owJ/p/kLElewxZMT3uC xgw4w5ehbibj1D86wFBNZg60sKiYZbIYXNgZ1mFy/JMSbM29uk2b0UBZd2a7swfZHA0C uw3uMdMlqClaQ6lkKQ3VVQRJBOMcufGj0kQzSfWFRv09MEiizzD/vYMssRI1P9DEqaQl 
Q8JGhditGBfDMt3GMvhoixmp1J3NBCXh/apRTmRjIthqC9Yh9BfXt5CFZGXiwj4qIQmW N7Jw== X-Gm-Message-State: AFeK/H32wSbxEjrcvIwOAg/Zy92SUe2bGVCY9dfnXiWs4+QXarFQD9rd1E+XZM/a1/340Q== X-Received: by 10.84.218.68 with SMTP id f4mr76690785plm.146.1491930943443; Tue, 11 Apr 2017 10:15:43 -0700 (PDT) Received: from [192.168.0.100] ([110.64.91.54]) by smtp.gmail.com with ESMTPSA id 133sm25559138pfy.106.2017.04.11.10.15.39 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 11 Apr 2017 10:15:42 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: Chris Torek , imp@bsdimp.com References: <201704100426.v3A4QR9Q042761@elf.torek.net> Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com From: Yubin Ruan Message-ID: <4768e26a-cdec-6f40-1463-ece9847ca34d@gmail.com> Date: Wed, 12 Apr 2017 01:15:34 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <201704100426.v3A4QR9Q042761@elf.torek.net> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Apr 2017 17:15:44 -0000 Thanks for your reply. I have read your mails and your discussion with Konstantin Belousov On 2017/4/10 12:26, Chris Torek wrote: >>>> Is it true that a thread holding a MTX_DEF mutex can be descheduled? > >>> Yes, they can be descheduled. But that's not a problem. No other >>> thread can acquire the MTX_DEF lock. ... > >> Does that imply that MTX_DEF should not be used in something like >> interrupt handler? Putting an interrupt handler into sleep doesn't >> make so much sense. > > Go back to the old top-half / bottom-half model, and consider that > now that there are interrupt *threads*, your ithread is also in the > "top half". 
It's therefore OK to suspend. ("Sleep" is not quite
> correct here: a mutex wait is not a "sleep" state but instead is
> just a waiting, not-scheduled-to-run state. The precise difference
> is irrelevant at this level though.)

I don't truly understand the "top-half/bottom-half" model you proposed,
but I think I get the idea of how things work now. Basically, we can
assume that if a thread is in the "bottom-half", then it should never
suspend (or, in other words, be preempted). This is the case of the
"interrupt filter" in FreeBSD. On the other hand, if a thread is in the
"top-half", then it is safe to suspend/block. This is the case of the
"ithread".

The difference between the "ithread" and the "interrupt filter" is
that an ithread has its own thread context, while interrupt handling
through an interrupt filter shares the same kernel stack.

So, for an ithread, we should use MTX_DEF, which doesn't disable
interrupts, and for an "interrupt filter", we should use MTX_SPIN, which
disables interrupts.

What really confuses me is that I don't really see how owning an
"independent" thread context (i.e., an ithread) makes a thread run in the
"top-half" and how sharing the same kernel stack makes a thread run in
the "bottom-half".

I did read your long explanation in the previous mail. For the non-SMP
case, the "top-half/bottom-half" model goes well and I understand how
the *code* path/*data* path things go. But I still cannot fully
understand the model for the SMP case. Maybe you can draw something like

     -----                      -----
    |     |<-- top-half        |     | <-- top-half
    |     |                    |     |
    |     |                    |     |
    |     |                    |     |
    |     |<-- bottom-half     |     | <-- bottom-half
     -----                      -----
      CPU1                       CPU2

to make things less abstract.

Thanks,
Yubin Ruan

> It's not *great* to suspend here, but all your alternatives are
> *also* bad:
>
>  * You may grab incoming data and stuff it into a ring buffer, and
>    schedule some other thread to handle it later.
But if the ring > buffer is full you have a problem, and all you have done is push > the actual processing off to another thread, adding more overhead. > > * You may put the device itself on hold so that no more data can > come in (if it's that kind of device). > > On the other hand, if you are handling an interrupt but not in an > interrupt thread, you are running in the "bottom half". It is > therefore *not OK* to suspend. You must now use one of those > alternatives. > > Note that if you suspend on an MTX_DEF mutex, and your priority is > *higher* than the priority of whatever thread actually holds that > mutex now, that other thread gets a priority boost to your level > (priority propagation, to prevent priority inversion). So letting > your ithread suspend, assuming you have an ithread, is probably your > best bet. > > Chris > From owner-freebsd-hackers@freebsd.org Tue Apr 11 17:17:11 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C9B67D3AEE9 for ; Tue, 11 Apr 2017 17:17:11 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-pg0-x241.google.com (mail-pg0-x241.google.com [IPv6:2607:f8b0:400e:c05::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 98D0D1320 for ; Tue, 11 Apr 2017 17:17:11 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-pg0-x241.google.com with SMTP id o123so579717pga.1 for ; Tue, 11 Apr 2017 10:17:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=S0ugCMLYngrYkykv/9b4twhqQF1hRsfGxiB1t9VdqkU=; 
b=VqL5kkpFzIb3aEh8SXgrf3YwpwUGlk2XIhNzgIVsJf/Fiz+DCLvro/kMOBxMbpi5I1 X6GKEmFZupM2ZBeFbsGD1/QX6tpqF16MuKAIFrEoC0+j3otgPCR92auUIYHdA0mjRLsC jnM+VaaEVcxXoBmIRCz7zRl3Crbp7qrzqUaTf8e44V+2zRaL2LWqntqBu34aTGcKhZSI xvWOgRqbZfZmGnKF1M+ZnTsx0F+V16NDLvQCBgY3HvMMYCtJpjlMEQraUMIoWHykNfLB 5lfH/4KKRJB6sV7CnSFrpAy+D1kAf7afOvIrF5/RiAxsE2GLj6g2nnfMGP5nWLEFQpoJ 2rew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=S0ugCMLYngrYkykv/9b4twhqQF1hRsfGxiB1t9VdqkU=; b=oFtWPWoJ2KvN3AZC/YUNDLi4LsPk0wC22PezqDIQcXlMtBLXzFIaW4Xfu0taw+He78 s8SW3KwA+mRtqr4wiM/3RGI9FWjItyatPQzMk9+RrfERaLWrjnbUbtVfkqkLMgYmnyJs LnL3cAYpDH+oCXOJWnN893J6TEMoU/3ONOU5JJRnbwb+eBX94WLSrbTBXlG8SYwVh/DX +e01GWICsNMk57fquIrdF/qwxKMNbuTZC0ryRJAEhj7N64Ji4MS+ha2O2NObEcrAUY9h OODlRdWkS4IYGi+jjaQwZgNdsi1lIwyJ6PcYFL3TmkgkDqyqJpDRci7HIIfKXfZ0UGnV pWYA== X-Gm-Message-State: AFeK/H2X85yYknyQkyGG8MiypSu5QDQM5rXH5DVrjAPfY4ZnImPqqmFz7DHH0pXMMcB9Nw== X-Received: by 10.99.212.69 with SMTP id i5mr62137301pgj.36.1491931031107; Tue, 11 Apr 2017 10:17:11 -0700 (PDT) Received: from [192.168.0.100] ([110.64.91.54]) by smtp.gmail.com with ESMTPSA id v17sm31868381pgc.20.2017.04.11.10.17.07 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 11 Apr 2017 10:17:09 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: Chris Torek , imp@bsdimp.com References: <201704100426.v3A4QR9Q042761@elf.torek.net> <4768e26a-cdec-6f40-1463-ece9847ca34d@gmail.com> Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com From: Yubin Ruan Message-ID: <04b3328f-7bfb-bb70-c665-b43038cdd768@gmail.com> Date: Wed, 12 Apr 2017 01:17:02 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <4768e26a-cdec-6f40-1463-ece9847ca34d@gmail.com> Content-Type: text/plain; charset=windows-1252; 
 format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 11 Apr 2017 17:17:11 -0000

On 2017/4/12 1:15, Yubin Ruan wrote:
>
> Thanks for your reply. I have read your mails and your discussion with
> Konstantin Belousov
>
> On 2017/4/10 12:26, Chris Torek wrote:
>>>>> Is it true that a thread holding a MTX_DEF mutex can be descheduled?
>>
>>>> Yes, they can be descheduled. But that's not a problem. No other
>>>> thread can acquire the MTX_DEF lock. ...
>>
>>> Does that imply that MTX_DEF should not be used in something like
>>> interrupt handler? Putting an interrupt handler into sleep doesn't
>>> make so much sense.
>>
>> Go back to the old top-half / bottom-half model, and consider that
>> now that there are interrupt *threads*, your ithread is also in the
>> "top half". It's therefore OK to suspend. ("Sleep" is not quite
>> correct here: a mutex wait is not a "sleep" state but instead is
>> just a waiting, not-scheduled-to-run state. The precise difference
>> is irrelevant at this level though.)
>
> I don't truly understand the "top-half/bottom-half" model you proposed,
> but I think I get the idea of how things work now. Basically, we can
> assume that if a thread is in the "bottom-half", then it should never
> suspend (or, in other words, be preempted). This is the case of the
> "interrupt filter" in FreeBSD. On the other hand, if a thread is in the
> "top-half", then it is safe to suspend/block. This is the case of the
> "ithread".
>
> The difference between the "ithread" and the "interrupt filter" is
> that an ithread has its own thread context, while interrupt handling
> through an interrupt filter shares the same kernel stack.
>
> So, for an ithread, we should use MTX_DEF, which doesn't disable
> interrupts, and for an "interrupt filter", we should use MTX_SPIN, which
> disables interrupts.
>
> What really confuses me is that I don't really see how owning an
> "independent" thread context (i.e., an ithread) makes a thread run in the
> "top-half" and how sharing the same kernel stack makes a thread run in
> the "bottom-half".
>
> I did read your long explanation in the previous mail. For the non-SMP
> case, the "top-half/bottom-half" model goes well and I understand how
> the *code* path/*data* path things go. But I still cannot fully
> understand the model for the SMP case. Maybe you can draw something like
>
>      -----                      -----
>     |     |<-- top-half        |     | <-- top-half
>     |     |                    |     |
>     |     |                    |     |
>     |     |                    |     |
>     |     |<-- bottom-half     |     | <-- bottom-half
>      -----                      -----
>       CPU1                       CPU2
>
> to make things less abstract.
>
> Thanks,
> Yubin Ruan
>
>> It's not *great* to suspend here, but all your alternatives are
>> *also* bad:
>>
>>  * You may grab incoming data and stuff it into a ring buffer, and
>>    schedule some other thread to handle it later. But if the ring
>>    buffer is full you have a problem, and all you have done is push
>>    the actual processing off to another thread, adding more overhead.
>>
>>  * You may put the device itself on hold so that no more data can
>>    come in (if it's that kind of device).
>>
>> On the other hand, if you are handling an interrupt but not in an
>> interrupt thread, you are running in the "bottom half". It is
>> therefore *not OK* to suspend. You must now use one of those
>> alternatives.
>>
>> Note that if you suspend on an MTX_DEF mutex, and your priority is
>> *higher* than the priority of whatever thread actually holds that
>> mutex now, that other thread gets a priority boost to your level
>> (priority propagation, to prevent priority inversion). So letting
>> your ithread suspend, assuming you have an ithread, is probably your
>> best bet.
>>
>> Chris
>>
>

Sorry for the ugly format.
The mail client sucks. Yubin Ruan From owner-freebsd-hackers@freebsd.org Tue Apr 11 20:21:28 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 32A5FD3ACB9; Tue, 11 Apr 2017 20:21:28 +0000 (UTC) (envelope-from kevans91@ksu.edu) Received: from NAM02-BL2-obe.outbound.protection.outlook.com (mail-bl2nam02on0078.outbound.protection.outlook.com [104.47.38.78]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9A4DF1AF; Tue, 11 Apr 2017 20:21:27 +0000 (UTC) (envelope-from kevans91@ksu.edu) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ksu.edu; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=LERGuygVnhSHc2ek/QihWWNIveDS2IXN6/Tq6XE3U9U=; b=SbrIfIBSbFGgDGuRwT/XQvAmCBsUU/VulI3p2SLOuPM9eYojQipNxzoCjlKvYOEzAjOzUMxpmgQFQd6yskWRlvNmW84lk1j8dPRCY1omCGPBkIvzCeXLbk+a86wB0EsjfpP3l3Ri4GCZxONd0SkX6UwSX4lhyiU/vrNvHo8Su4I= Received: from DM2PR0501CA0040.namprd05.prod.outlook.com (10.162.29.178) by BY1PR0501MB1109.namprd05.prod.outlook.com (10.160.103.143) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.5; Tue, 11 Apr 2017 20:21:25 +0000 Received: from SN1NAM02FT055.eop-nam02.prod.protection.outlook.com (2a01:111:f400:7e44::208) by DM2PR0501CA0040.outlook.office365.com (2a01:111:e400:5148::50) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.5 via Frontend Transport; Tue, 11 Apr 2017 20:21:25 +0000 Authentication-Results: spf=pass (sender IP is 129.130.18.151) smtp.mailfrom=ksu.edu; freebsd.org; dkim=none (message not signed) header.d=none;freebsd.org; dmarc=bestguesspass action=none header.from=ksu.edu; Received-SPF: Pass 
(protection.outlook.com: domain of ksu.edu designates 129.130.18.151 as permitted sender) receiver=protection.outlook.com; client-ip=129.130.18.151; helo=ome-vm-smtp1.campus.ksu.edu; Received: from ome-vm-smtp1.campus.ksu.edu (129.130.18.151) by SN1NAM02FT055.mail.protection.outlook.com (10.152.72.174) with Microsoft SMTP Server id 15.1.1019.14 via Frontend Transport; Tue, 11 Apr 2017 20:21:24 +0000 Received: from calypso.engg.ksu.edu (calypso.engg.ksu.edu [129.130.43.181]) by ome-vm-smtp1.campus.ksu.edu (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id v3BKLM0s004639; Tue, 11 Apr 2017 15:21:22 -0500 Received: by calypso.engg.ksu.edu (Postfix, from userid 110) id 68001248005; Tue, 11 Apr 2017 15:21:22 -0500 (CDT) Received: from mail-wr0-f182.google.com (mail-wr0-f182.google.com [209.85.128.182]) by calypso.engg.ksu.edu (Postfix) with ESMTPA id 15271248004; Tue, 11 Apr 2017 15:21:20 -0500 (CDT) Received: by mail-wr0-f182.google.com with SMTP id o21so5073596wrb.2; Tue, 11 Apr 2017 13:21:20 -0700 (PDT) X-Gm-Message-State: AFeK/H1vWhcGvsElaH3P+Nd2zlfXYxgh/F/HLMFKTydqZ0wcnkgYM6lXZISW0xZkKNLlaZZUdTNsNVpCXJBKQA== X-Received: by 10.223.154.54 with SMTP id z51mr32463232wrb.76.1491942079266; Tue, 11 Apr 2017 13:21:19 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.39.134 with HTTP; Tue, 11 Apr 2017 13:20:58 -0700 (PDT) From: Kyle Evans Date: Tue, 11 Apr 2017 15:20:58 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Replacing libgnuregex To: , X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:129.130.18.151; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; 
X-OriginatorOrg: ksu.edu
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Apr 2017 20:21:24.1899 (UTC)
X-MS-Exchange-CrossTenant-Id: d9a2fa71-d67d-4cb6-b541-06ccaa8013fb
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=d9a2fa71-d67d-4cb6-b541-06ccaa8013fb; Ip=[129.130.18.151]; Helo=[ome-vm-smtp1.campus.ksu.edu]
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY1PR0501MB1109
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.23
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 11 Apr 2017 20:21:28 -0000

Hello!

To start, I'm cross-posting to freebsd-hackers@ and freebsd-standards@,
since it seems to pertain to both as a question of how strictly we
follow the standards, as well as potential approach. The following
e-mail will somewhat outline my questions, then my personal opinion.

== Almost objective, obviously biased stuff ==

The first question we must answer: is it strictly necessary that we
maintain a separate library for gnuregex, or would it be
feasible/desirable to extend libc/regex to include GNU extensions?
There are obvious benefits to both, but the former (a drop-in
replacement for libgnuregex) seems like it's going to be more difficult
to find. We only have two base-consumers of libgnuregex (at the
moment), but one must consider the potential other consumers since this
doesn't seem to be a private library.

On the other hand, I think I could fairly easily implement most of
these into libc/regex. Here's a summary of what this option entails
adding to libc/regex, from what I've found:

* Empty subexpressions(*)
* Add missing quantifiers to BREs: \?, \+
* Add branching to BREs: \|
* Add backreferences (\1 through \9) to EREs
* Add \w, \W, \s, and \S corresponding to [[:alnum:]_], [^[:alnum:]_],
  [[:space:]], and [^[:space:]] respectively
* Add word boundaries and anchors:
** \b: word boundary
** \B: not word boundary
** \<: Start of word
** \>: End of word
** \`: Start of subject string
** \': End of subject string

(*) I didn't actually find anything explicitly stating this as a GNU
extension, but it's certainly not conformant to POSIX specifications to
use, it gets used a tiny bit in some ports, and we implement a
workaround in bsdgrep(1) for the simplest case of empty expressions
("") to match everything and produce zero length matches.

The main benefit of this is not having to maintain a completely
separate regex parser and the potential for inconsistencies that come
along with it. The downside is that it would seem to promote
expressions that are not strictly POSIX conformant. Is this a problem?
Is this a problem worth worrying about?

== Opinion ==

My personal opinion is that we should go the latter route and implement
these features into libc/regex as a default behavior. Perhaps with a
flag or something so that an application *could* opt out of GNU
extensions ("strict POSIX" type of flag) if it so chooses or finds them
undesirable, but that may not be deemed necessary.

Ultimately, the GNU extensions are just that: extensions.
There's no direct harm that I can think of in accepting them in our libc, and they do indeed provide some sensible features with little cost added to our current implementation. I'd personally like to have one parser that does it all so that when a regex-parsing bug does come in, there's no initial triage *at all* of whether it's a gnuregex bug or a libc/regex bug. Thoughts? What all have I missed? Thanks, Kyle Evans From owner-freebsd-hackers@freebsd.org Tue Apr 11 22:10:38 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0553FD3ADB0 for ; Tue, 11 Apr 2017 22:10:38 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D59A3957 for ; Tue, 11 Apr 2017 22:10:37 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3BMAVhu093703 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 11 Apr 2017 15:10:31 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3BMAVSe093702; Tue, 11 Apr 2017 15:10:31 -0700 (PDT) (envelope-from torek) Date: Tue, 11 Apr 2017 15:10:31 -0700 (PDT) From: Chris Torek Message-Id: <201704112210.v3BMAVSe093702@elf.torek.net> To: f.v.anton@gmail.com, freebsd-hackers@freebsd.org Subject: Re: On COW memory mapping in d_mmap_single In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Tue, 11 Apr 2017 15:10:31 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: 
Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Apr 2017 22:10:38 -0000 >Yes, all vCPUs are locked before calling mmap(). I agree that we don't >need 'COW', as long as we keep all vCPUs locked while we copy the >entire VM memory. But this might take a while, imagine a VM with 32GB >or more of RAM. This will take maybe minutes to write to disk, so we >don't actually want the VM to be freezed for so long. That's the >reason we'd like to map the memory COW and then unlock vCPUs. You'll need to save the device state while holding the CPUs locked, too, so that the virtio queues can be in sync when you restore. >It's a OBJT_DEFAULT. It's not a device object, it's the memory object >given to guest to use as physical memory. Your copy code path is basically a simplified vm_map_copy_entry() as called from vmspace_fork() for the MAP_INHERIT case. But if these are OBJT_DEFAULT, shouldn't you be calling vm_object_collapse()? See https://github.com/flaviusanton/freebsd/blob/bhyve-save-restore/sys/vm/vm_map.c#L3170 (Maybe src_object->handle is never NULL? There are several things in the VM object code that I do not understand fully here, so this might be the case.) >>Next, how do you undo the damage done by your 'COW' ? >This is one thing that we've thought about, but we don't have a >solution for now. I agree it is very important, though. I figured that >it might be possible to 'unmark' the memory object as COW with some >additional tricks. I think you may be better off doing actual vm_map_copy_entry() calls. I am assuming, here, that snapshot-saving is implemented by sending a request to the running bhyve, which spins off a thread or process that does the snapshot-save. If you spin it off as a real process, i.e., do a fork(), you will get the existing VM system to do all the work for you. 
The overall strategy then looks something like this:

    handle_external_suspend_or_snapshot_request()
    {
        set global suspending flag /* if needed */
        stop all vcpus
        signal virtio and emulated devices to quiesce, if needed
        if (snapshot) {
            open snapshot file
            pid = fork()
            if (pid == 0) {
                /* child */
                COW is now in effect on memory: save more-volatile vcpu and dev state
                pthread_cond_signal parent that it's safe to resume
                save RAM state
                close snapshot file
                _exit(0)
            }
            if (pid < 0)
                ... handle error ...
            /* parent */
            close snapshot file
            wait for child to signal OK to resume
        } else {
            wait for external resume signal
        }
        clear suspending flag
        resume devices and vcpus
    }

To resume a snapshot from a file, we load its state and then run the last two steps (clear suspending flag and resume devices and vcpus).

This way all the COW action happens through fork(), so there is no new kernel side code required.

(Frankly, I think the hard part here is saving device and virtual APIC state. If you have the vlapic state saving working, you have made pretty good progress.)
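As a rough userspace illustration of the fork()-based strategy above, here is a minimal C sketch. It is not bhyve code: snapshot_via_fork() and wait_for_resume() are hypothetical names, the "RAM" is just a caller-supplied buffer, and a pipe stands in for the "signal parent that it's safe to resume" step (an assumption made here because an ordinary condition variable is not shared across fork()).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not bhyve code): write `len` bytes at `ram` to `path`
 * from a forked child.  After fork() the child has a copy-on-write view of
 * memory as it was at that instant, so the parent can keep running.
 * On success returns the child's pid and stores in *resume_fd a descriptor
 * that becomes readable once the child says it is safe to resume.
 */
static pid_t
snapshot_via_fork(const char *ram, size_t len, const char *path, int *resume_fd)
{
	int pipefd[2];
	pid_t pid;

	assert(ram != NULL && path != NULL && resume_fd != NULL);
	if (pipe(pipefd) == -1)
		return (-1);
	pid = fork();
	if (pid == -1) {
		close(pipefd[0]);
		close(pipefd[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: volatile vcpu/device state would be saved first. */
		close(pipefd[0]);
		if (write(pipefd[1], "k", 1) != 1)	/* parent may resume */
			_exit(1);
		close(pipefd[1]);
		/* Now the slow part: dump the frozen view of memory. */
		FILE *fp = fopen(path, "wb");
		if (fp == NULL)
			_exit(1);
		size_t n = fwrite(ram, 1, len, fp);
		if (fclose(fp) != 0 || n != len)
			_exit(1);
		_exit(0);
	}
	/* Parent: keep only the read end of the pipe. */
	close(pipefd[1]);
	*resume_fd = pipefd[0];
	return (pid);
}

/* Block until the child signals; returns 0 on success, -1 otherwise. */
static int
wait_for_resume(int resume_fd)
{
	char c;
	ssize_t n = read(resume_fd, &c, 1);

	close(resume_fd);
	return (n == 1 ? 0 : -1);
}
```

A caller would stop its vcpus, call snapshot_via_fork(), call wait_for_resume() on the returned descriptor, resume the vcpus, and later reap the child with waitpid(). Writes the parent makes after the fork() do not show up in the file, because the child is dumping its own copy-on-write view.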
Chris From owner-freebsd-hackers@freebsd.org Tue Apr 11 23:11:06 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C627CD3A0D1 for ; Tue, 11 Apr 2017 23:11:06 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id B27191B24 for ; Tue, 11 Apr 2017 23:11:06 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3BNB45w094086 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 11 Apr 2017 16:11:04 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3BNB4fc094085; Tue, 11 Apr 2017 16:11:04 -0700 (PDT) (envelope-from torek) Date: Tue, 11 Apr 2017 16:11:04 -0700 (PDT) From: Chris Torek Message-Id: <201704112311.v3BNB4fc094085@elf.torek.net> To: ablacktshirt@gmail.com, imp@bsdimp.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com In-Reply-To: <4768e26a-cdec-6f40-1463-ece9847ca34d@gmail.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Tue, 11 Apr 2017 16:11:04 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Apr 2017 23:11:06 -0000 >The difference between the "ithread" and "interrupt filter" things >is that ithread has its own thread context, while interrupt handling 
>through interrupt filter shares the same kernel stack. Right -- though rather than "the same" I would just say "shares a stack", i.e., we're not concerned with *whose* stack and/or thread we're borrowing, just that we have one borrowed. >So, for ithread, we should use the MTX_DEF, which don't disable >interrupt, and for "interrupt filter", we should use the MTX_SPIN, which >disable interrupt. Right. >What really confuses me is that I don't really see how owning an >"independent" thread context(i.e ithread) makes a thread run in the >"top-half" and how sharing the same kernel stack makes a thread run in >the "bottom-half". It's not that it *makes* it run that way, it's that it *allows* it to run that way -- and then the scheduler *does* run it that way. >I did read your long explanation in the previous mail. For the non-SMP >case, the "top-half/bottom-half" model goes well and I understand how >the *code* path/*data* path things go. But I cannot still fully >understand the model for the SMP case. It's fundamentally fairly tricky, but we start with that same first notion: * If you have your own state (i.e., stack), you can be suspended (stopped in the scheduler, giving the CPU to other threads): *your* (private) state is preserved on *your* (private) stack. * If you have borrowed someone else's state, anything that suspends you, suspends them too. Since this may deadlock, you are not allowed to do it at all. Once we block interrupts locally (as for MTX_SPIN, or automatically inside a filter style or "bottom half" interrupt), we are in a special state: we may not take *any* MTX_DEF locks at all (the kernel should panic if we do). This in turn means that data structures are protected *either* by a spin mutex *or* by a default (non-spin) mutex, never both. 
So if you need to touch a spin-mutex data structure from thread-y ("top half") code, you obtain the spin mutex, and now no interrupts will occur *on this CPU*, and as a key side effect, you won't move *off* this CPU either. If an interrupt occurs on another CPU and it goes to take the spin lock that protects that data structure, it loops at that point, not switching tasks, waiting for the MTX_SPIN mutex to be released:

            CPU 1               |         CPU 2
    ----------------------------|-----------------------------
    func() {                    | ... code not involving mtx
        mtx_lock_spin(&mtx);    | ...
        do some work            | mtx_lock_spin(&mtx); /* loops */
        .                       | [stuck]
        .                       | [stuck]
        .                       | [stuck]
        mtx_unlock_spin(&mtx);  | [unstuck]
        ...                     | do some work

If an interrupt occurs on CPU 2, and that interrupt-handling code wants to touch the data protected by the spin lock, that code obtains the spin lock as usual. Meanwhile the interrupt *cannot* occur on CPU 1, as holding the spin lock has blocked interrupts. So the code path on CPU 2 blocks -- looping in mtx_lock_spin(), not giving CPU 2 over to the scheduler -- for as long as CPU 1 holds the spin lock. The corresponding code path is already blocked on CPU 1, the same way it was back in the non-SMP, single-CPU days.

This means it is unwise to hold spin locks for long periods. In fact, if CPU 2 waits too long in that [stuck] section, it will panic, on the assumption that CPU 1 has done something terrible and the system is now hung.

This is also what gives rise to the constraint that you must take MTX_SPIN locks "inside" any outer MTX_DEF locks.
Chris From owner-freebsd-hackers@freebsd.org Wed Apr 12 00:10:58 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BB73CD3A509 for ; Wed, 12 Apr 2017 00:10:58 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 9BB3CBD0 for ; Wed, 12 Apr 2017 00:10:58 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 98005D3A508; Wed, 12 Apr 2017 00:10:58 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 979D4D3A507 for ; Wed, 12 Apr 2017 00:10:58 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: from mail-ua0-f176.google.com (mail-ua0-f176.google.com [209.85.217.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5B4E2BCF for ; Wed, 12 Apr 2017 00:10:57 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: by mail-ua0-f176.google.com with SMTP id q26so7748926uaa.0 for ; Tue, 11 Apr 2017 17:10:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:reply-to:in-reply-to:references :from:date:message-id:subject:to:cc; bh=13A4uRpsv4+eJqty5OBONh8XJhB+haky4HDmP7qTYn4=; b=d5PkPFbrnmO6naA0+o6+D3RMyViH1UCdwk7sZM3YxRDBGVzbrlpac/OYByhMUqBjQz RoJcgGC6yx4rNKhi194NwpdNoFYgbnnRGX5FjUD9/vW0JL32lvVbkFV1ygrlETC7QKgE JPPR0W5i91RHEeumPVnZ4NK6Qgts1Owe5h7nc0aVpFLGrp0oyxc2zzeuF9gA34ZY858p FH9fdV9n0gtz+ZJY7Ct5lMCXbYDqY2NuvYcGmC7buS9JsY2L2My7u6wpOsxe17KI+BIs I6J2Pi9bdRE7FhlEIH63Z9CwiklL9vlQfx+e3p1p0WquhZ4o0pSljeyjcehIV8H0AvbX Gygw== 
X-Gm-Message-State: AN3rC/4120Bn8fwi/JDZCCCPYRXjLyU/BKt8FVtGi6FKFpqihPoL47PJ61OEU9ueFt3t9Q== X-Received: by 10.176.80.65 with SMTP id z1mr122700uaz.99.1491954434998; Tue, 11 Apr 2017 16:47:14 -0700 (PDT) Received: from mail-ua0-f169.google.com (mail-ua0-f169.google.com. [209.85.217.169]) by smtp.gmail.com with ESMTPSA id 21sm4824243vkg.38.2017.04.11.16.47.14 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 11 Apr 2017 16:47:14 -0700 (PDT) Received: by mail-ua0-f169.google.com with SMTP id u103so7508393uau.1 for ; Tue, 11 Apr 2017 16:47:14 -0700 (PDT) X-Received: by 10.159.32.163 with SMTP id 32mr139999uaa.160.1491954434017; Tue, 11 Apr 2017 16:47:14 -0700 (PDT) MIME-Version: 1.0 Reply-To: cem@freebsd.org Received: by 10.103.13.3 with HTTP; Tue, 11 Apr 2017 16:47:13 -0700 (PDT) In-Reply-To: References: From: Conrad Meyer Date: Tue, 11 Apr 2017 16:47:13 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Dtrace oddity To: Christopher Bowman Cc: "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Apr 2017 00:10:58 -0000 On Tue, Apr 11, 2017 at 12:16 AM, Christopher Bowman wrote: > Here is the oddity: when I run Dtrace and then run my test program I get the following output from Dtrace: > > crb@retread:60> dtrace -n 'syscall:freebsd:mmap:entry /execname == "test"/ {}' > dtrace: description 'syscall:freebsd:mmap:entry ' matched 1 probe > CPU ID FUNCTION:NAME > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > 0 63401 mmap:entry > > I think Dtrace is indicating that the mmap syscall was called 12 times by my 
test program yet I can see how the program below would have done that. A configuration file for dynamic linking is mapped; libc needs to be mapped (several different regions); jemalloc sets up some memory for allocations with anonymous mmap. So this is not unreasonable as part of crt0 / program startup. Best, Conrad From owner-freebsd-hackers@freebsd.org Wed Apr 12 02:32:29 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3E495D3A59C for ; Wed, 12 Apr 2017 02:32:29 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-pf0-x242.google.com (mail-pf0-x242.google.com [IPv6:2607:f8b0:400e:c00::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0E24BBC9 for ; Wed, 12 Apr 2017 02:32:29 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-pf0-x242.google.com with SMTP id c198so2482863pfc.0 for ; Tue, 11 Apr 2017 19:32:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=JpJKDgntHWxojUVtXwn4X9vTiZKDsMvcHbvnhPGm9Lk=; b=TARhOEexxZhw3TGDnXFx/XwLmyLp1+sSKwmIkN9E3a7BJdepmNCD/xS3QALEf0gXcX 4KSVbpHZ1tzCkqsneJpe4OmxTlIdIDWqc7EqftY860WYQ3k0E+2O4on0IciGTDl3Knm8 TwU9VOQcntr47eHq8GvFyqHUkLtQN1GgujLVdZA1DiZAme5mcbM09I5o3WqlP15xItLB ytMTNde4w782ALEashNHOomhgBqBqf/FnZeKoCZLsO3Ok9IvNxSLlcbs6lu39ip2kVaY HchglRzjDo7UxQko5BUdpHC7Eun1Q6QbklRmdbqiCLYjXCmr/3pTlMLarw3K7+ILtQ4V JtFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; 
bh=JpJKDgntHWxojUVtXwn4X9vTiZKDsMvcHbvnhPGm9Lk=; b=ZNvVAnBGn9GnlUVQyQGFBB/1hmNiUjG3jsiXvRBGOXIawKrLwITKQfv6b+N9ZjuO/N L1VFMNighpBW1A5aiA2Hlw5TB306o0KHSR7FyRVSRDV60eZfOb9Ki5IqS2RPv6UyBiX+ QVcJRoULkhJUArQu9UhKAElmcazImwNBJv+ZI10IlvkrwkrK+wiXyEjwVL2njiCCWvH3 a9B8zpF8LcCLopFwmeqlynBrM1cc4thO3RSbdkM/sU4nvdL0PN6if+gWhXxEmjoegKlw 1q+ijWYi+oGWMs00qKggRkOaa8VMNYf0uI5WUdLIrwP/zv0i7n/QyXtf0KSAV3svt078 Tr+A== X-Gm-Message-State: AFeK/H189taxi12rLIYis3AzSfSELyd6HjAWtWeQFDGLalQjpQ3QqYFZxjxPSrs3EE8z2A== X-Received: by 10.99.94.66 with SMTP id s63mr62735071pgb.34.1491964348499; Tue, 11 Apr 2017 19:32:28 -0700 (PDT) Received: from [192.168.2.211] ([116.56.129.146]) by smtp.gmail.com with ESMTPSA id r17sm32995928pgg.19.2017.04.11.19.32.25 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 11 Apr 2017 19:32:27 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: Chris Torek , imp@bsdimp.com References: <201704112311.v3BNB4fc094085@elf.torek.net> Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com From: Yubin Ruan Message-ID: <99e3673e-d490-faef-359d-c6ec8a36ee0c@gmail.com> Date: Wed, 12 Apr 2017 10:32:18 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <201704112311.v3BNB4fc094085@elf.torek.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Apr 2017 02:32:29 -0000 On 2017年04月12日 07:11, Chris Torek wrote: >> The difference between the "ithread" and "interrupt filter" things >> is that ithread has its own thread context, while interrupt handling >> through interrupt filter shares the same kernel stack. 
> > Right -- though rather than "the same" I would just say "shares > a stack", i.e., we're not concerned with *whose* stack and/or > thread we're borrowing, just that we have one borrowed. > >> So, for ithread, we should use the MTX_DEF, which don't disable >> interrupt, and for "interrupt filter", we should use the MTX_SPIN, which >> disable interrupt. > > Right. > >> What really confuses me is that I don't really see how owning an >> "independent" thread context(i.e ithread) makes a thread run in the >> "top-half" and how sharing the same kernel stack makes a thread run in >> the "bottom-half". > > It's not that it *makes* it run that way, it's that it *allows* it > to run that way -- and then the scheduler *does* run it that way. > >> I did read your long explanation in the previous mail. For the non-SMP >> case, the "top-half/bottom-half" model goes well and I understand how >> the *code* path/*data* path things go. But I cannot still fully >> understand the model for the SMP case. > > It's fundamentally fairly tricky, but we start with that same first > notion: > > * If you have your own state (i.e., stack), you can be suspended > (stopped in the scheduler, giving the CPU to other threads): > *your* (private) state is preserved on *your* (private) stack. > > * If you have borrowed someone else's state, anything that suspends > you, suspends them too. Since this may deadlock, you are not > allowed to do it at all. clear. How can I distinguish these two conditions? I mean, whether I am using my own state/stack or borrowing others' state. > Once we block interrupts locally (as for MTX_SPIN, or > automatically inside a filter style or "bottom half" interrupt), > we are in a special state: we may not take *any* MTX_DEF locks at > all (the kernel should panic if we do). > > This in turn means that data structures are protected *either* by > a spin mutex *or* by a default (non-spin) mutex, never both. 
> So if you need to touch a spin-mutex data structure from thread-y
> ("top half") code, you obtain the spin mutex, and now no interrupts
> will occur *on this CPU*, and as a key side effect, you won't move
> *off* this CPU either. If an interrupt occurs on another CPU and
> it goes to take the spin lock that protects that CPU, it loops
> at that point, not switching tasks, waiting for the MTX_SPIN mutex
> to be released:
>
>             CPU 1               |         CPU 2
>     ----------------------------|-----------------------------
>     func() {                    | ... code not involving mtx
>         mtx_lock_spin(&mtx);    | ...
>         do some work            | mtx_lock_spin(&mtx); /* loops */
>         .                       | [stuck]
>         .                       | [stuck]
>         .                       | [stuck]
>         mtx_unlock_spin(&mtx);  | [unstuck]
>         ...                     | do some work
>
> If an interrupt occurs on CPU 2, and that interrupt-handling code
> wants to touch the data protected by the spin lock, that code
> obtains the spin lock as usual. Meanwhile the interrupt *cannot*
> occur on CPU 1, as holding the spin lock has blocked interrupts.
> So the code path on CPU 2 blocks -- looping in mtx_lock_spin(),
> not giving CPU 2 over to the scheduler -- for as long as CPU 1
> holds the spin lock. The corresponding code path is already
> blocked on CPU 1, the same way it was back in the non-SMP,
> single-CPU days.

Things become clearer now. Thanks for your reply. If I understand correctly, which kind of lock should be used depends on which thread model (i.e., "interrupt filter" or "ithread") we use. If I want to use a lock, I must know in advance which kind of thread model I am in; otherwise the interrupt handling code might cause a deadlock or a kernel panic. The problem is, how can I tell which thread model I am in? I am not so clear about the thread model and the scheduling code of FreeBSD...

> This means it is unwise to hold spin locks for long periods. In
> fact, if CPU 2 waits too long in that [stuck] section, it will
> panic, on the assumption that CPU 1 has done something terrible
> and the system is now hung.
> > This is also waht gives rise to the constrant that you must take > MTX_SPIN locks "inside" any outer MTX_DEF locks. What do you mean by "must take MTX_SPIN locks 'inside' any outer MTX_DEF locks? Regards, Yubin Ruan From owner-freebsd-hackers@freebsd.org Wed Apr 12 03:57:19 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 16F0DD37B11 for ; Wed, 12 Apr 2017 03:57:19 +0000 (UTC) (envelope-from crb@chrisbowman.com) Received: from mail-pg0-x231.google.com (mail-pg0-x231.google.com [IPv6:2607:f8b0:400e:c05::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E20D9A3E for ; Wed, 12 Apr 2017 03:57:18 +0000 (UTC) (envelope-from crb@chrisbowman.com) Received: by mail-pg0-x231.google.com with SMTP id 81so8220328pgh.2 for ; Tue, 11 Apr 2017 20:57:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chrisbowman-com.20150623.gappssmtp.com; s=20150623; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=qq6KklBUCgjaidDLjYU4IdnW+9X6goX8WPcZ5iJ03lg=; b=VEiOU9624Ii7TLINxtJ7dTrnY/ZX9IkM1EzTPbidmVRO0xv0yLmQbpjKWEhnAfqpW3 60dv1yAVnFHlueq+8PGhVPHCLxVBAeMW2IJOy5OUJpH37MHLuE7o6xyrb84msYQTkXCy aS0jPYVtZybGmDzi8tAe7Z8hroJtGF9CF+4JW8wIv9AES88x3Ge8Xu0UrYPbwBfsg47y q0+QVzeTX5pUAwmlGgLdrytSe2VpTsZG6gVRYMrmFLBvyoMHJN3jWr1vdSG45fXAPlRk hC09LK3KtZjpyV33jT/696PmR9se/2GqhpZWBVUFAJIdQuHTkxBCsgwTA06Dx6dJ3h/2 0+XQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=qq6KklBUCgjaidDLjYU4IdnW+9X6goX8WPcZ5iJ03lg=; b=hdcG9FZ+kjXr7x6Ir6mzwBwh7xAn1GF4OPNqvtwxn774YzB+RBVkB4igGQjHVuUcp4 
cZVY+JA+rv7iMp6lN1xx3FUliUnLbCQntMpLJLwdQY5/dM3fkid/a2FiugrlHK4nipL3 pPqbkz930YNariILzPAHcCYCMGMnYI+JO4QMb5elxodNYPJB8UfXTuLVT2P1/66O6Nba hbI7tpjxkpfn8m5zcQYZg9NjcyjqWz2r+vDyo6DavGIXN6LDYzqcrhYkAdpUpdcJmkvC byz10BPyfH5KUXnUBkC31iHFqskqh4ics882DeugDIsyRZZ0gYqdzRr10fnvpkykZZGK g9aw== X-Gm-Message-State: AFeK/H2dxjjkMDg7GSTNRty2SJ+G99irTbqrioaAiXu991JK70V/BFHLTI9R8NSy7c0+Zg== X-Received: by 10.99.97.12 with SMTP id v12mr65838575pgb.124.1491969438431; Tue, 11 Apr 2017 20:57:18 -0700 (PDT) Received: from ?IPv6:2601:647:4e00:bbb5:a1e9:e0d1:714c:d747? ([2601:647:4e00:bbb5:a1e9:e0d1:714c:d747]) by smtp.gmail.com with ESMTPSA id v86sm33094945pfa.86.2017.04.11.20.57.17 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 11 Apr 2017 20:57:17 -0700 (PDT) From: "Christopher R. Bowman" X-Google-Original-From: "Christopher R. Bowman" Message-Id: <15DF9D2C-40A4-4341-AE7E-E8A776ED3F09@ChrisBowman.com> Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: Dtrace oddity Date: Tue, 11 Apr 2017 20:57:16 -0700 In-Reply-To: <20170411151426.3b760182@fabiankeil.de> Cc: freebsd-hackers@freebsd.org To: Fabian Keil References: <20170411151426.3b760182@fabiankeil.de> X-Mailer: Apple Mail (2.3273) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Apr 2017 03:57:19 -0000 Fabian, That was hugely helpful. I should have known about the extra = mmap sys calls, but sometimes your mind only sees what it expects to = see. Checking for negative values on open is also the right thing to do = (I had mis-read the man page to imply that zero indicated a failure to = open). But the real help was putting one of the flags for mmap. 
I don’t think FreeBSD used to check for that as I have a vague recollection that this code used to work on a previous version. Thanks SO SO much for the help!

Christopher

--------
Christopher R. Bowman
email: crb@ChrisBowman.com
World Wide GSM cell: +1 (408) 476-2299

> On Apr 11, 2017, at 6:14 AM, Fabian Keil wrote:
>
> Christopher Bowman wrote:
>
>> The man page lists a bunch of reasons for EINVAL so I want to
>> investigate this and I don’t quite know good strategies to debug the
>> kernel (yet) so I thought I’d experiment with Dtrace a bit. Here is the
>> oddity: when I run Dtrace and then run my test program I get the
>> following output from Dtrace:
>>
>> crb@retread:60> dtrace -n 'syscall:freebsd:mmap:entry /execname == "test"/ {}'
>> dtrace: description 'syscall:freebsd:mmap:entry ' matched 1 probe
>> CPU     ID                    FUNCTION:NAME
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>   0  63401                       mmap:entry
>>
>> I think Dtrace is indicating that the mmap syscall was called 12 times
>> by my test program yet I can see how the program below would have done
>> that.
>
> A bunch of mmap syscalls occur before main is even entered.
> Try running your program with truss to see what's going on.
>
>> Here is my program:
> [...]
>> printf("opening device %s\n", argv[1]);
>> int device = open (argv[1], O_RDWR);
>> if (device == 0) {
>
> You should check for -1 here.
>
>> void *pa = mmap (0, 4095, PROT_READ | PROT_WRITE, 0, device, 0);
>
> No flags? From the mmap man page:
>
> | [EINVAL] None of MAP_ANON, MAP_PRIVATE, MAP_SHARED, or
> | MAP_STACK was specified. At least one of these flags
> | must be included.
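Putting the two fixes from this exchange together (compare the open() result against -1, and pass a sharing flag to mmap()), a corrected version of the mapping step might look like the sketch below. The helper name map_shared() is hypothetical, and MAP_SHARED is an assumption about what the original program wanted for a device mapping; MAP_PRIVATE would satisfy the flag check as well.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of `path` read/write and shared.
 * Returns the mapping, or MAP_FAILED on any error.
 */
static void *
map_shared(const char *path, size_t len)
{
	int fd;
	void *pa;

	assert(path != NULL);
	fd = open(path, O_RDWR);
	if (fd == -1)		/* open() fails with -1; 0 is a valid fd */
		return (MAP_FAILED);
	/* At least one of MAP_SHARED, MAP_PRIVATE, ... must be given. */
	pa = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	return (pa);
}
```

The caller compares the result against MAP_FAILED (not NULL), reports errno on failure, and eventually releases the mapping with munmap().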
>=20 > Fabian From owner-freebsd-hackers@freebsd.org Wed Apr 12 07:55:37 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 73D27D3A1C8 for ; Wed, 12 Apr 2017 07:55:37 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 51020F45 for ; Wed, 12 Apr 2017 07:55:36 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3C7tYdL016700 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 12 Apr 2017 00:55:34 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3C7tYUH016699; Wed, 12 Apr 2017 00:55:34 -0700 (PDT) (envelope-from torek) Date: Wed, 12 Apr 2017 00:55:34 -0700 (PDT) From: Chris Torek Message-Id: <201704120755.v3C7tYUH016699@elf.torek.net> To: ablacktshirt@gmail.com, imp@bsdimp.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com In-Reply-To: <99e3673e-d490-faef-359d-c6ec8a36ee0c@gmail.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Wed, 12 Apr 2017 00:55:34 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Apr 2017 07:55:37 -0000 >clear. How can I distinguish these two conditions? I mean, whether I >am using my own state/stack or borrowing others' state. 
You choose it when you establish your interrupt handler. If you say you are a filter interrupt, then you *are* one, and the rest of your code must be written as one. Unless you know what you are doing, don't do this, and then you *aren't* one and the rest of your code can be written using the much more relaxed model.

>What do you mean by "must take MTX_SPIN locks 'inside' any outer
>MTX_DEF locks?

This means that any code path that is going to hold a spin-type lock must obtain it while already holding any applicable non-spin locks. For instance, if we look at we find these:

    #define PROC_STATLOCK(p) mtx_lock_spin(&(p)->p_statmtx)
    #define PROC_ITIMLOCK(p) mtx_lock_spin(&(p)->p_itimmtx)
    #define PROC_PROFLOCK(p) mtx_lock_spin(&(p)->p_profmtx)

Let's find a bit of code that uses one, such as in kern_time.c:

https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_time.c#L338

(kern_clock_gettime()). This code reads:

    case CLOCK_PROF:
        PROC_LOCK(p);
        PROC_STATLOCK(p);
        calcru(p, &user, &sys);
        PROC_STATUNLOCK(p);
        PROC_UNLOCK(p);
        timevaladd(&user, &sys);
        TIMEVAL_TO_TIMESPEC(&user, ats);
        break;

Note that the call to PROC_LOCK comes first, then the call to PROC_STATLOCK. This is because PROC_LOCK

https://github.com/freebsd/freebsd/blob/master/sys/sys/proc.h#L825

is defined as:

    #define PROC_LOCK(p) mtx_lock(&(p)->p_mtx)

If you obtain the locks in the other order -- i.e., if you grab the PROC_STATLOCK first, then try to lock PROC_LOCK -- you are trying to take a default mutex while holding a spin-type mutex, and this is not allowed (can cause deadlock). But taking the PROC_LOCK first (which may block), then taking the PROC_STATLOCK (a spin lock) "inside" the outer PROC_LOCK default mutex, is OK.

(This is one of my mild objections to macros like PROC_LOCK and PROC_STATLOCK: they hide whether the mutex in question is a spin lock.)
Incidentally, any time you take *any* lock while holding any other lock (e.g., lock A, then lock B while holding A), you have created a "lock order" in which A precedes B. If some other code path locks B first, then while holding B, attempts to lock A, you get a deadlock if both code paths are running at the same time. The WITNESS code dynamically discovers these various orders and warns you at run time if you have a "lock order reversal" (a case where one code path does A-then-B while another does B-then-A).

(This is, in a sense, the same problem as discovering whether there is a loop in a directed graph, or whether this directed graph is acyclic. If you can force the graph to take the shape of a tree, rather than the more general graph, there will never be any loops in it, and you will never have lock order reversals. And of course if you have only *one* lock for some data, there is nothing to be reversed. Not all lock order reversals are guaranteed to lead to deadlock, but sorting out which ones are really OK, and which are not, is ... challenging.)
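The directed-graph view can be made concrete with a toy checker. The sketch below is not the WITNESS implementation, only an illustration under simple assumptions: locks are small integers, lock_graph, lg_init(), lg_reaches() and lg_acquire() are invented names, and each "acquired inner while holding outer" event adds an edge unless that edge would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* order[a][b] is true once some code path acquired b while holding a. */
struct lock_graph {
	bool order[MAX_LOCKS][MAX_LOCKS];
};

static void
lg_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is `to` reachable from `from` via recorded orders? */
static bool
lg_reaches(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return (true);
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (g->order[from][next] && !seen[next] &&
		    lg_reaches(g, next, to, seen))
			return (true);
	return (false);
}

/*
 * Record "acquired `inner` while holding `outer`".  Returns false, and
 * records nothing, if `outer` is already reachable from `inner`: adding
 * the edge would create a cycle, i.e. a lock order reversal.  Taking the
 * same lock as both outer and inner is reported as a reversal too.
 */
static bool
lg_acquire(struct lock_graph *g, int outer, int inner)
{
	bool seen[MAX_LOCKS] = { false };

	assert(outer >= 0 && outer < MAX_LOCKS);
	assert(inner >= 0 && inner < MAX_LOCKS);
	if (lg_reaches(g, inner, outer, seen))
		return (false);
	g->order[outer][inner] = true;
	return (true);
}
```

With locks A, B and C numbered 0, 1 and 2, recording A-then-B and B-then-C succeeds, while a later C-then-A is rejected because the path A, B, C already exists. A-then-C is still accepted, since it is consistent with the recorded orders.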
Chris From owner-freebsd-hackers@freebsd.org Wed Apr 12 11:11:34 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A8899D3BE55 for ; Wed, 12 Apr 2017 11:11:34 +0000 (UTC) (envelope-from f.v.anton@gmail.com) Received: from mail-wm0-x230.google.com (mail-wm0-x230.google.com [IPv6:2a00:1450:400c:c09::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 61F3DF53; Wed, 12 Apr 2017 11:11:34 +0000 (UTC) (envelope-from f.v.anton@gmail.com) Received: by mail-wm0-x230.google.com with SMTP id w204so17963605wmd.1; Wed, 12 Apr 2017 04:11:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=01JWzR2h3/k+MbRsSu7klNwFdmVaCbHD6sYRuRRfEPw=; b=KK597+ncNU3vNRkDpAZotL6Nlg1MvPfNeweqQG3OCChnWc+bZRWcQZV+xCvvEcbNzE t2ku03xGEPfstF5DXPu+pVZ1w4fv9gtbbXm9y9datLaHVQPmRK5dsSH0vqF9FI/APayo bm4zTTopn4gEm3Pbrl3kYLhy3s0Cc9UDW7rVCj4leU+e382ctSjL+bL8u1SArU6CdwJs USsHb4wYeZmxEo+u4KRI+nnygEyA2uJNBRNEWLOD68oTaEu77u6XwQlO794/88AUuX7h n6D9UXa05p3vc6yNlvMLanhLW+bl+wIG/pHlj80d9pY1YdE8d2eJzpcnb//T76/pMjEQ FnoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=01JWzR2h3/k+MbRsSu7klNwFdmVaCbHD6sYRuRRfEPw=; b=nnbZdCG4sd1Vt2AK/kQeC91dw1TYkl9F+ivbrkUpUwKhBKww6m6dAndXPyJs/RjSFx LhXfEexY4CfQx/fyaS5/ueNLsBvrvD33x6t1Bq4r3q/1I3+8Iov9UmNemmXyuAIZqqRm RmrLG53opiQVy0rz8BspOW+5hvNwgEge1YM7I894gxxwtG4YW5Z2c+3r9VbC8+vhcNBz gLu6SIY/RyFwurVBPYTFY6KYqz1k+ykyZ5UEedqmim/Ezq0CCzYCvlynoXb7womqrKNO QFtmZai3Nq+iYptEyYxZ7mo2EsxF7nE8j6wtK1tjKeAgv1LC1txqpIS7aS3AtVsUS8h2 agew== 
X-Gm-Message-State: AN3rC/7nIYM0lYVjHBp4+TMIcNcTMJFzBfU+G1U0dXKLaw83JfpHd3jl hlOc6ubllGIfmkBmL58aLrPHLKCB/OAf X-Received: by 10.28.6.203 with SMTP id 194mr20107190wmg.125.1491995491399; Wed, 12 Apr 2017 04:11:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.178.10 with HTTP; Wed, 12 Apr 2017 04:11:30 -0700 (PDT) In-Reply-To: <201704112210.v3BMAVSe093702@elf.torek.net> References: <201704112210.v3BMAVSe093702@elf.torek.net> From: Flavius Anton Date: Wed, 12 Apr 2017 14:11:30 +0300 Message-ID: Subject: Re: On COW memory mapping in d_mmap_single To: Chris Torek , freebsd-hackers@freebsd.org Cc: Peter Grehan Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Apr 2017 11:11:34 -0000 Hi Chris, Thanks a lot for your answer. I've added Peter to CC, as he knows about this ongoing project and some of the design decisions, like the COW mapping, were already taken to some extent when I joined. Please see my in-lined answers below. On Wed, Apr 12, 2017 at 1:10 AM, Chris Torek wrote: >>Yes, all vCPUs are locked before calling mmap(). I agree that we don't >>need 'COW', as long as we keep all vCPUs locked while we copy the >>entire VM memory. But this might take a while, imagine a VM with 32GB >>or more of RAM. This will take maybe minutes to write to disk, so we >>don't actually want the VM to be freezed for so long. That's the >>reason we'd like to map the memory COW and then unlock vCPUs. > > You'll need to save the device state while holding the CPUs locked, > too, so that the virtio queues can be in sync when you restore. Yes, saving vCPU state, vlapic, ioapic etc is done with all vCPUs locked. Memory, on the other hand, may be too large and take too much time to copy. I am working right now on saving virtio queues and device state. 
>>It's a OBJT_DEFAULT. It's not a device object, it's the memory object >>given to guest to use as physical memory. > > Your copy code path is basically a simplified vm_map_copy_entry() > as called from vmspace_fork() for the MAP_INHERIT case. But if > these are OBJT_DEFAULT, shouldn't you be calling vm_object_collapse()? > See https://github.com/flaviusanton/freebsd/blob/bhyve-save-restore/sys/vm/vm_map.c#L3170 > (Maybe src_object->handle is never NULL? There are several things > in the VM object code that I do not understand fully here, so this > might be the case.) I saw those functions: vm_map_copy_entry() and vm_object_collapse(), but I didn't have enough understanding of the whole system to be able to tell if they might do some other things that we don't want them to. I'll read them again after this e-mail. >>>Next, how do you undo the damage done by your 'COW' ? > >>This is one thing that we've thought about, but we don't have a >>solution for now. I agree it is very important, though. I figured that >>it might be possible to 'unmark' the memory object as COW with some >>additional tricks. > > I think you may be better off doing actual vm_map_copy_entry() > calls. > > I am assuming, here, that snapshot-saving is implemented by > sending a request to the running bhyve, which spins off a thread > or process that does the snapshot-save. If you spin it off as > a real process, i.e., do a fork(), you will get the existing > VM system to do all the work for you. 
The overall strategy > then looks something like this: > > handle_external_suspend_or_snapshot_request() { > set global suspending flag /* if needed */ > stop all vcpus > signal virtio and emulated devices to quiesce, if needed > if (snapshot) { > open snapshot file > pid = fork() > if (pid == 0) { /* child */ > COW is now in effect on memory: save more-volatile > vcpu and dev state > pthread_cond_signal parent that it's safe to resume > save RAM state > close snapshot file > _exit(0) > } > if (pid < 0) ... handle error ... > /* parent */ > close snapshot file > wait for child to signal OK to resume > } else { > wait for external resume signal > } > clear suspending flag > resume devices and vcpus > } > > To resume a snapshot from a file, we load its state and then run > the last two steps (clear suspending flag and resume devices and > vcpus). > > This way all the COW action happens through fork(), so there is no > new kernel side code required This looks perfect to me, this was one of my first questions when I joined. However, I am not sure if it's ok to fork the entire bhyve memory space, I remember that I've seen some discussion about this, that's why I CCed Peter. Right now we have a checkpoint thread that listens for the checkpoint signal (via a UNIX socket), then it proceeds to locking the CPUs, saving some state, requests COW mapping (via ioctl), unlocks vCPUs and copy COW memory to a checkpoint file. I haven't done anything about unmapping the COW entry yet. > (Frankly, I think the hard part here is saving device and virtual > APIC state. If you have the vlapic state saving working, you have > made pretty good progress.) Thanks. I am almost sure it is not complete yet, but I have vlapic state saved. Actually, I am able to restore VMs using a ramdisk and no devices except the console. I'd like to open a pull request for review as soon as possible, but in the meantime I started looking on virtio devices and save/restore virtio-net too. 
-- Flavius From owner-freebsd-hackers@freebsd.org Wed Apr 12 11:56:55 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BD77AD39CE6 for ; Wed, 12 Apr 2017 11:56:55 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-pg0-x241.google.com (mail-pg0-x241.google.com [IPv6:2607:f8b0:400e:c05::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8D80DB18 for ; Wed, 12 Apr 2017 11:56:55 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-pg0-x241.google.com with SMTP id 79so4917595pgf.0 for ; Wed, 12 Apr 2017 04:56:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=nxeN4e5qP1PDoFMH8fkHZssfH3T8E3KVVH241a9rSBg=; b=LNLPD/WmNWQtmgoKvmxgt86srQQ0XLlc9iY7hCfuj5CLSE6g6/J7EZBB9UgA1s9/M+ LBY8PYw4l1npYhGw1kkAEB2RLT3MVSJpVVzwY1wTEPoB7UXJgmCPDdeYL9u803ztaai/ r9GmAKh45Za6NL19X5qRJ33hf6TmCMFDpbq6azIYxoEcN+hZ18w8hfm7wLrNqpFVmrsN YTmZExkNP6HaLGwghpkyHtBS89342QdtV0GducSBJSi8/s12tr1FkPMz3mZvuXizTt8V AVOLfo6waeXkbfAmM4JzeA2gYxmmpwnGe59S0sFdF9KJtfhK+TbmB+60WjQT90NFtQfM COEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=nxeN4e5qP1PDoFMH8fkHZssfH3T8E3KVVH241a9rSBg=; b=CGEsVTesrtebwm6gNBdRbTIylCgrX6jx6ytPmuOYFXgExwIhIKhVfglyI8nZojoHTa YrfAH8Xz1dmgNUQdcp2LyvQC0aeSUh21rpksusrEBKyrzS71WNdtGog+GUq0A7vLMXEF BPaq8KY2ps7B0y8NjbXTKlZWN3F1KRD2eJUOjPoYRMnbyr6OICinzVT/v0KxuVf5X7Pl 
BlGsyWm3/tQC4r7zz2+VSWNVirg6xJO6bLIRRQfacSe2cGB6pbzr/JT4PV8vbO2UzuJQ z6R+KnV3O/IuMzYFbzua9AdzFzGpzqrn2gLkp0CJFGCqU3utHe2fQM/8wDUHcoJk0lbj u8/w== X-Gm-Message-State: AFeK/H3QTuy1NCKMfizJSWdEuH64lNB3H5hKxSsFUa1tIOP1ikdsFrYwXBgBIwTCKR1wAA== X-Received: by 10.84.212.8 with SMTP id d8mr82135131pli.152.1491998215161; Wed, 12 Apr 2017 04:56:55 -0700 (PDT) Received: from [192.168.2.211] ([116.56.129.146]) by smtp.gmail.com with ESMTPSA id b10sm5238515pfc.27.2017.04.12.04.56.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 12 Apr 2017 04:56:54 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: Chris Torek , imp@bsdimp.com References: <201704120755.v3C7tYUH016699@elf.torek.net> Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com From: Yubin Ruan Message-ID: Date: Wed, 12 Apr 2017 19:56:50 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <201704120755.v3C7tYUH016699@elf.torek.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Apr 2017 11:56:55 -0000 On 2017年04月12日 15:55, Chris Torek wrote: >> clear. How can I distinguish these two conditions? I mean, whether I >> am using my own state/stack or borrowing others' state. > > You choose it when you establish your interrupt handler. If you > say you are a filter interrupt, then you *are* one, and the rest > of your code must be written as one. Unless you know what you > are doing, don't do this, and then you *aren't* one and the rest > of your code can be written using the much more relaxed model. > >> What do you mean by "must take MTX_SPIN locks 'inside' any outer >> MTX_DEF locks? 
> > This means that any code path that is going to hold a spin-type > lock must obtain it while already holding any applicable non-spin > locks. For instance, if we look at we find these: > > #define PROC_STATLOCK(p) mtx_lock_spin(&(p)->p_statmtx) > #define PROC_ITIMLOCK(p) mtx_lock_spin(&(p)->p_itimmtx) > #define PROC_PROFLOCK(p) mtx_lock_spin(&(p)->p_profmtx) > > Let's find a bit of code that uses one, such as in kern_time.c: > > https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_time.c#L338 > > (kern_clock_gettime()). This code reads: > > case CLOCK_PROF: > PROC_LOCK(p); > PROC_STATLOCK(p); > calcru(p, &user, &sys); > PROC_STATUNLOCK(p); > PROC_UNLOCK(p); > timevaladd(&user, &sys); > TIMEVAL_TO_TIMESPEC(&user, ats); > break; > > Note that the call to PROC_LOCK comes first, then the call to > PROC_STATLOCK. This is because PROC_LOCK > > https://github.com/freebsd/freebsd/blob/master/sys/sys/proc.h#L825 > > is defined as: > > #define PROC_LOCK(p) mtx_lock(&(p)->p_mtx) > > If you obtain the locks in the other order -- i.e., if you grab > the PROC_STATLOCK first, then try to lock PROC_LOCK -- you are > trying to take a spin-type mutex while holding a default mutex, Is this a typo? I guess you mean something like "you are trying to take a blocking mutex while holding spin-type mutex". > and this is not allowed (can cause deadlock). But taking the > PROC_LOCK first (which may block), then taking the PROC_STATLOCK > (a spin lock) "inside" the outer PROC_LOCK default mutex, is OK. I think I get your point: if you take a spin-type mutex, you already disable interrupt, which in effect means that no other code can preempt you. Under this circumstance, if you continue to take a blocking mutex, you may get blocked. 
Since you already disable interrupt and nobody can interrupt/preempt you, you are blocked on that CPU, not being able to do anything, which is pretty much a "deadlock" (actually this is not a deadlock, but, it is similar)

Regards,
Yubin Ruan

> (This is one of my mild objections to macros like PROC_LOCK and
> PROC_STATLOCK: they hide whether the mutex in question is a spin
> lock.)
>
> Incidentally, any time you take *any* lock while holding any
> other lock (e.g., lock A, then lock B while holding A), you have
> created a "lock order" in which A precedes B. If some other
> code path locks B first, then while holding B, attempts to lock
> A, you get a deadlock if both code paths are running at the same
> time. The WITNESS code dynamically discovers these various orders
> and warns you at run time if you have a "lock order reversal"
> (a case where one code path does A-then-B while another does
> B-then-A).
>
> (This is, in a sense, the same problem as discovering whether
> there is a loop in a directed graph, or whether this directed
> graph is acyclic. If you can force the graph to take the shape of
> a tree, rather than the more general graph, there will never be
> any loops in it, and you will never have lock order reversals.
> And of course if you have only *one* lock for some data, there is
> nothing to be reversed. Not all lock order reversals are
> guaranteed to lead to deadlock, but sorting out which ones are
> really OK, and which are not, is ... challenging.)
> > Chris > From owner-freebsd-hackers@freebsd.org Wed Apr 12 18:53:45 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9ED29D3BB1A for ; Wed, 12 Apr 2017 18:53:45 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 8A0E1B29 for ; Wed, 12 Apr 2017 18:53:44 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3CIrgrQ055169 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 12 Apr 2017 11:53:43 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3CIrg5d055158; Wed, 12 Apr 2017 11:53:42 -0700 (PDT) (envelope-from torek) Date: Wed, 12 Apr 2017 11:53:42 -0700 (PDT) From: Chris Torek Message-Id: <201704121853.v3CIrg5d055158@elf.torek.net> To: ablacktshirt@gmail.com, imp@bsdimp.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Wed, 12 Apr 2017 11:53:43 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Apr 2017 18:53:45 -0000 >> If you obtain the locks in the other order -- i.e., if you grab >> the PROC_STATLOCK first, then try to lock PROC_LOCK -- you are >> trying to take a spin-type mutex while holding a 
default mutex, >Is this a typo? I guess you mean something like "you are trying >to take a blocking mutex while holding spin-type mutex". Yes, or rather brain-o (swapping words) -- these most often happen if I am interrupted while composing a message :-) >I think I get your point: if you take a spin-type mutex, you >already disable interrupt, which in effect means that no other >code can preempt you. Under this circumstance, if you continue to >take a blocking mutex, you may get blocked. Since you already >disable interrupt and nobody can interrupt/preempt you, you are blocked >on that CPU, not being able to do anything, which is pretty much a >"deadlock" (actually this is not a deadlock, but, it is similar) Right. It *may* deadlock, and it is definitely not good -- and the INVARIANTS kernel will check and panic. Chris From owner-freebsd-hackers@freebsd.org Thu Apr 13 01:26:24 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D3795D38DA8 for ; Thu, 13 Apr 2017 01:26:24 +0000 (UTC) (envelope-from otacilio.neto@bsd.com.br) Received: from mail-qk0-x22a.google.com (mail-qk0-x22a.google.com [IPv6:2607:f8b0:400d:c09::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8CF29DCB for ; Thu, 13 Apr 2017 01:26:23 +0000 (UTC) (envelope-from otacilio.neto@bsd.com.br) Received: by mail-qk0-x22a.google.com with SMTP id f133so37619793qke.2 for ; Wed, 12 Apr 2017 18:26:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsd.com.br; s=capeta; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-transfer-encoding; bh=N383ORst11g2PiAMLlRNgwH1Bxa1LHUEDkL69gyEK+g=; 
b=NGJrwtu2HcImIkgWg/AupeZmZPg22XGB2Hnf4fVc8hiTSmyVfy3Nbx479DvChINQEv etgNJbhVbrkQugs2EWIb9t/kk4B185f3DIiUxO2O3tMw6cUXia7XtWAmdftaXCvZSuKO L88+9L3Cuix3mQJrhdghn2d1YQ0bYtMVAyx9E= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=N383ORst11g2PiAMLlRNgwH1Bxa1LHUEDkL69gyEK+g=; b=nnfJninENWqWQGgX2JR1UEVjK42PCPYns6vcj2lUxpOH7ZRGoRHH2J+71N0eCaTNpb OC8PAlVgh/RwIAX6FAi8qznaC5Ak0LOwY//9BmbK/eSS+YWu6fAFaU/k5A11AZxbPlrE ikzlfbB6dI8Bs5g4opFNN1EhaC3lGLluW5Ljbj2ZNqFq04ytznhZsRa2irTvHON7boQ+ 68nA5OcOEASQg0xFGAlaf502V3//9qhgQ46NOIVJLgNoBilB0+25wFKH5Drw8LbcIIiL uJHDQq2j+RyY2qftGBbtFX2pXUB6J0LMaMkhnyzsAIW75GZYXVcUH/sM8Nlno9ra77/7 26CQ== X-Gm-Message-State: AN3rC/7zGUzpA8zheX4hm6O6At+QyxC/hEA4osAHp8plF876ZNc3cl+O HWrVcvS0N4iiuxTL X-Received: by 10.55.102.193 with SMTP id a184mr411486qkc.309.1492046782712; Wed, 12 Apr 2017 18:26:22 -0700 (PDT) Received: from ?IPv6:2804:54:19ef:cc00:c47d:8860:c52b:6c79? 
([2804:54:19ef:cc00:c47d:8860:c52b:6c79]) by smtp.googlemail.com with ESMTPSA id q80sm14735457qkq.16.2017.04.12.18.26.21 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 12 Apr 2017 18:26:22 -0700 (PDT) Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them To: freebsd-hackers@freebsd.org References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> <20170409122715.GF1788@kib.kiev.ua> <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net> <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net> <8FFE95AA-DB40-4D1E-A103-4BA9FCC6EDEE@dsl-only.net> <89D6D677-3BE2-45E2-A902-CC6A0305F3F9@dsl-only.net> <585B43F7-D4C8-431A-BFFE-68B48C3214AE@dsl-only.net> <876EA1E4-E5A9-411C-AFFD-989713037C19@dsl-only.net> From: =?UTF-8?B?T3RhY8OtbGlv?= Message-ID: <7adada71-e089-e105-eec8-6136d4b8c083@bsd.com.br> Date: Wed, 12 Apr 2017 22:25:43 -0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <876EA1E4-E5A9-411C-AFFD-989713037C19@dsl-only.net> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Apr 2017 01:26:24 -0000 Em 10/04/2017 17:15, Mark Millard escreveu: > On 2017-Apr-10, at 2:51 AM, Mark Millard wrote: > >> On 2017-Apr-9, at 5:10 PM, Mark Millard wrote: >> >>> On 2017-Apr-9, at 10:24 AM, Mark Millard wrote: >>> >>>> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov wrote: >>>>> Hmm, could you try the following patch, I did not even compiled it. >>>> I'll try it later today. 
>>>> >>>>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c >>>>> index 3d5756ba891..55aa402eb1c 100644 >>>>> --- a/sys/arm64/arm64/pmap.c >>>>> +++ b/sys/arm64/arm64/pmap.c >>>>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) >>>>> sva += L3_SIZE) { >>>>> l3 = pmap_load(l3p); >>>>> if (pmap_l3_valid(l3)) { >>>>> + if ((l3 & ATTR_SW_MANAGED) && >>>>> + pmap_page_dirty(l3)) { >>>>> + vm_page_dirty(PHYS_TO_VM_PAGE(l3 & >>>>> + ~ATTR_MASK)); >>>>> + } >>>>> pmap_set(l3p, ATTR_AP(ATTR_AP_RO)); >>>>> PTE_SYNC(l3p); >>>>> /* XXX: Use pmap_invalidate_range */ >>> >>> Preliminary testing indicates that this fixes the >>> some-pages-become-zero problem for fork-then-swapout/in. >>> >>> Thanks! >>> >>> I'll see if a buildworld can go through without being stopped >>> by the type of issue. But that will take a while. (It is how >>> I originally ran into the problem(s) that others had been >>> reporting on the lists.) >> buildworld buildkernel completed non-stop for the first time >> on a BPI-M3 board. > I had been thinking of the BPI-M3 for other reasons > and typed that instead of the correct: Pine64+ 2GB. > (True elsewhere as well.) I do really mean arm64 > here, not armv7. > >> Looks good for a check-in to svn to me (head and stable/11). >> >> This combined with 2017-Feb-15's -r313772's fix to the fork >> trampline code's updating of sp_el0 makes arm64 far more stable >> for my purposes. >> >> -r313772 was never MFC'd to stable/11. In my view it should be. > === > Mark Millard > markmi at dsl-only.net > Dears Will this patch be committed to HEAD? 
[]'s -Otacilio From owner-freebsd-hackers@freebsd.org Thu Apr 13 04:59:37 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 402ECD3BBA7 for ; Thu, 13 Apr 2017 04:59:37 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-43.reflexion.net [208.70.210.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0296680C for ; Thu, 13 Apr 2017 04:59:36 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 18806 invoked from network); 13 Apr 2017 04:59:34 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 13 Apr 2017 04:59:34 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Thu, 13 Apr 2017 00:59:34 -0400 (EDT) Received: (qmail 15295 invoked from network); 13 Apr 2017 04:59:34 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 13 Apr 2017 04:59:34 -0000 Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 65DFFEC8B66; Wed, 12 Apr 2017 21:59:33 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them From: Mark Millard In-Reply-To: <7adada71-e089-e105-eec8-6136d4b8c083@bsd.com.br> Date: Wed, 12 Apr 2017 21:59:32 -0700 Cc: freebsd-hackers@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <4DEA2D76-9F27-426D-A8D2-F07B16575FB9@dsl-only.net> <163B37B0-55D6-498E-8F52-9A95C036CDFA@dsl-only.net> <08E7A5B0-8707-4479-9D7A-272C427FF643@dsl-only.net> 
<20170409122715.GF1788@kib.kiev.ua> <9D152170-5F19-47A2-A06A-66F83CA88A09@dsl-only.net> <9DCAF95B-39A5-4346-88FC-6AFDEE8CF9BB@dsl-only.net> <8FFE95AA-DB40-4D1E-A103-4BA9FCC6EDEE@dsl-only.net> <89D6D677-3BE2-45E2-A902-CC6A0305F3F9@dsl-only.net> <585B43F7-D4C8-431A-BFFE-68B48C3214AE@dsl-only.net> <876EA1E4-E5A9-411C-AFFD-989713037C19@dsl-only.net> <7adada71-e089-e105-eec8-6136d4b8c083@bsd.com.br> To: =?utf-8?B?T3RhY8OtbGlv?= X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Apr 2017 04:59:37 -0000 On 2017-Apr-12, at 6:25 PM, Otac=C3=ADlio = wrote: > Em 10/04/2017 17:15, Mark Millard escreveu: >> On 2017-Apr-10, at 2:51 AM, Mark Millard = wrote: >>=20 >>> On 2017-Apr-9, at 5:10 PM, Mark Millard = wrote: >>>=20 >>>> On 2017-Apr-9, at 10:24 AM, Mark Millard = wrote: >>>>=20 >>>>> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov = wrote: >>>>>> Hmm, could you try the following patch, I did not even compiled = it. >>>>> I'll try it later today. >>>>>=20 >>>>>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c >>>>>> index 3d5756ba891..55aa402eb1c 100644 >>>>>> --- a/sys/arm64/arm64/pmap.c >>>>>> +++ b/sys/arm64/arm64/pmap.c >>>>>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, = vm_offset_t eva, vm_prot_t prot) >>>>>> sva +=3D L3_SIZE) { >>>>>> l3 =3D pmap_load(l3p); >>>>>> if (pmap_l3_valid(l3)) { >>>>>> + if ((l3 & ATTR_SW_MANAGED) && >>>>>> + pmap_page_dirty(l3)) { >>>>>> + = vm_page_dirty(PHYS_TO_VM_PAGE(l3 & >>>>>> + ~ATTR_MASK)); >>>>>> + } >>>>>> pmap_set(l3p, ATTR_AP(ATTR_AP_RO)); >>>>>> PTE_SYNC(l3p); >>>>>> /* XXX: Use pmap_invalidate_range */ >>>>=20 >>>> Preliminary testing indicates that this fixes the >>>> some-pages-become-zero problem for fork-then-swapout/in. >>>>=20 >>>> Thanks! 
>>>>=20 >>>> I'll see if a buildworld can go through without being stopped >>>> by the type of issue. But that will take a while. (It is how >>>> I originally ran into the problem(s) that others had been >>>> reporting on the lists.) >>> buildworld buildkernel completed non-stop for the first time >>> on a BPI-M3 board. >> I had been thinking of the BPI-M3 for other reasons >> and typed that instead of the correct: Pine64+ 2GB. >> (True elsewhere as well.) I do really mean arm64 >> here, not armv7. >>=20 >>> Looks good for a check-in to svn to me (head and stable/11). >>>=20 >>> This combined with 2017-Feb-15's -r313772's fix to the fork >>> trampline code's updating of sp_el0 makes arm64 far more stable >>> for my purposes. >>>=20 >>> -r313772 was never MFC'd to stable/11. In my view it should be. >> =3D=3D=3D >> Mark Millard >> markmi at dsl-only.net >>=20 > Dears >=20 > Will this patch be committed to HEAD? It was: Author: kib Date: Mon Apr 10 15:32:26 2017 New Revision: 316679 URL:=20 https://svnweb.freebsd.org/changeset/base/316679 Log: Do not lose dirty bits for removing PROT_WRITE on arm64. =20 Arm64 pmap interprets accessed writable ptes as modified, since ARMv8.0 does not track Dirty Bit Modifier in hardware. If writable bit is removed, page must be marked as dirty for MI VM. =20 This change is most important for COW, where fork caused losing content of the dirty pages which were not yet scanned by pagedaemon. 
  Reviewed by:  alc, andrew
  Reported and tested by:  Mark Millard
  PR:  217138, 217239
  Sponsored by:  The FreeBSD Foundation
  MFC after:  2 weeks

Modified:
  head/sys/arm64/arm64/pmap.c

Modified: head/sys/arm64/arm64/pmap.c
==============================================================================
--- head/sys/arm64/arm64/pmap.c Mon Apr 10 12:35:58 2017 (r316678)
+++ head/sys/arm64/arm64/pmap.c Mon Apr 10 15:32:26 2017 (r316679)
@@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sv
                     sva += L3_SIZE) {
                         l3 = pmap_load(l3p);
                         if (pmap_l3_valid(l3)) {
+                                if ((l3 & ATTR_SW_MANAGED) &&
+                                    pmap_page_dirty(l3)) {
+                                        vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
+                                            ~ATTR_MASK));
+                                }
                                 pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
                                 PTE_SYNC(l3p);
                                 /* XXX: Use pmap_invalidate_range */

There was a patch ( -r313772 ) committed to head back in Feb. for
interrupts sometimes trashing a special register during fork.
It takes both of these patches to get fork working reliably.

[stable/11 should eventually get both of these patches so that fork
becomes reliable there for aarch64 (armv8.0).]
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-hackers@freebsd.org Thu Apr 13 09:28:17 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1B93ED3BAA3 for ; Thu, 13 Apr 2017 09:28:17 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-pg0-x242.google.com (mail-pg0-x242.google.com [IPv6:2607:f8b0:400e:c05::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DFB02C5F for ; Thu, 13 Apr 2017 09:28:16 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-pg0-x242.google.com with SMTP id g2so10283164pge.2 for ; Thu, 13 Apr 2017 02:28:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=a2j1IuhOD6HMFFaoSk4s7VuV09u2nX42VOKZHelFk04=; b=QOHWhM+dLG6bmRWRId5jeVy5IGIVufHvKfJf7+5e/zLci7cd7S0XL9mS2IcOLl8sms nzHrjXoyepdwZ5QL4k4q1ia9PDd7VOc/BmBUmIxDdhOVL32jWP+nM+I4ZQWWg430VojY TWg47ObeVpqpnIw4U/t9N654BRrOXSkCNmRDBBTbxLp6MaT5O5Kqbq/o8+8r2l0VUHQ5 mtt8uMXqW+SauLqro/ctLvlfnl2WKlLiTMqs8FcrSPcVjRAwI4g1bKSjBKzKeqU7Ulx8 ohN0TgKWsENRqN6O5LSrsyTfUQO22KfOzugPa8RD9tZsZp+rjKpozFMGs9ydNrUNF1xv 00LQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=a2j1IuhOD6HMFFaoSk4s7VuV09u2nX42VOKZHelFk04=; b=fgogqMPPKh+I9oSeQ2CcH8QP1yoLZIXdczIYv4+0aUPLVjDY0HaUAYjktqItqCOQJJ MkD7USaYf7NW10wPsxbhKoCSBtBe9vsIEmFslP1QVyxU1yTZDmaTt9v8UKWwnxBKpjgy g1DOeD8JHCwHav7FP0LKVcqAOUvsIInzJPQ4of87p/upB0tPEoaxo/ZHJ+QZ43U9Qw6o 
DktxTvTgxOdobxga3t3eUTxabt/TAkL7hqq7m6lO5b+PfQLPKBz6P4NYKGjpPxP3wxxF 2FVKc/gApvjNQpxuyFz6iN0s1sbOWBt4HW86wlVYdIpGcq/k/DHWl12f9anM+qi3SGZL mSsw== X-Gm-Message-State: AN3rC/5WRQd6ot8lARfXkM2Ek1hW8IJLQm/YlrugUk8ynChTYzoEmVdl iQbq4eR1eMJuzSUjBpw= X-Received: by 10.98.14.28 with SMTP id w28mr2343869pfi.59.1492075696305; Thu, 13 Apr 2017 02:28:16 -0700 (PDT) Received: from [192.168.2.211] ([116.56.129.146]) by smtp.gmail.com with ESMTPSA id l127sm41436091pga.7.2017.04.13.02.28.07 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Apr 2017 02:28:15 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: Chris Torek , imp@bsdimp.com References: <201704121853.v3CIrg5d055158@elf.torek.net> Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com, kostikbel@gmail.com From: Yubin Ruan Message-ID: <06a30d21-acff-efb2-ff58-9aa66793e929@gmail.com> Date: Thu, 13 Apr 2017 17:28:04 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <201704121853.v3CIrg5d055158@elf.torek.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Apr 2017 09:28:17 -0000

On 2017-04-13 02:53, Chris Torek wrote:
>>> If you obtain the locks in the other order -- i.e., if you grab
>>> the PROC_STATLOCK first, then try to lock PROC_LOCK -- you are
>>> trying to take a spin-type mutex while holding a default mutex,
>
>> Is this a typo? I guess you mean something like "you are trying
>> to take a blocking mutex while holding spin-type mutex".
>
> Yes, or rather brain-o (swapping words) -- these most often happen
> if I am interrupted while composing a message :-)
>
>> I think I get your point: if you take a spin-type mutex, you
>> already disable interrupt, which in effect means that no other
>> code can preempt you. Under this circumstance, if you continue to
>> take a blocking mutex, you may get blocked. Since you already
>> disable interrupt and nobody can interrupt/preempt you, you are blocked
>> on that CPU, not being able to do anything, which is pretty much a
>> "deadlock" (actually this is not a deadlock, but, it is similar)
>
> Right. It *may* deadlock, and it is definitely not good -- and
> the INVARIANTS kernel will check and panic.

I discover that in the current implementation in FreeBSD, spinlock
does not disable interrupt entirely:

        for (;;) {
                if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid))
                        break;
                /* Give interrupts a chance while we spin. */
                spinlock_exit();
                while (m->mtx_lock != MTX_UNOWNED) {
                        if (i++ < 10000000) {
                                cpu_spinwait();
                                continue;
                        }
                        if (i < 60000000 || kdb_active || panicstr != NULL)
                                DELAY(1);
                        else
                                _mtx_lock_spin_failed(m);
                        cpu_spinwait();
                }
                spinlock_enter();
        }

This is `_mtx_lock_spin_cookie(...)` in kern/kern_mutex.c, which
implements the core logic of spinning. However, as you can see, while
spinning, it would enable interrupt "occasionally" and disable it
again... What is the rationale for that?
Regards, Yubin Ruan From owner-freebsd-hackers@freebsd.org Thu Apr 13 12:18:20 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 70AF4D380E3 for ; Thu, 13 Apr 2017 12:18:20 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id DA38CF1C for ; Thu, 13 Apr 2017 12:18:19 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3DCIBg4093208 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 13 Apr 2017 05:18:11 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3DCIBJg093207; Thu, 13 Apr 2017 05:18:11 -0700 (PDT) (envelope-from torek) Date: Thu, 13 Apr 2017 05:18:11 -0700 (PDT) From: Chris Torek Message-Id: <201704131218.v3DCIBJg093207@elf.torek.net> To: ablacktshirt@gmail.com, imp@bsdimp.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, kostikbel@gmail.com, rysto32@gmail.com In-Reply-To: <06a30d21-acff-efb2-ff58-9aa66793e929@gmail.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Thu, 13 Apr 2017 05:18:11 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Apr 2017 12:18:20 -0000 >I discover that in the current implementation in FreeBSD, spinlock >does not disable interrupt entirely: 
[extra-snipped here]

>		/* Give interrupts a chance while we spin. */
>		spinlock_exit();
>		while (m->mtx_lock != MTX_UNOWNED) {

[more snip]

>This is `_mtx_lock_spin_cookie(...)` in kern/kern_mutex.c, which
>implements the core logic of spinning. However, as you can see, while
>spinning, it would enable interrupt "occasionally" and disable it
>again... What is the rationale for that?

This code snippet is slightly misleading. The full code path runs
from mtx_lock_spin() through __mtx_lock_spin(), which first
invokes spinlock_enter() and then, in the *contested* case (only),
calls _mtx_lock_spin_cookie().

spinlock_enter() is:

	td = curthread;
	if (td->td_md.md_spinlock_count == 0) {
		flags = intr_disable();
		td->td_md.md_spinlock_count = 1;
		td->td_md.md_saved_flags = flags;
	} else
		td->td_md.md_spinlock_count++;
	critical_enter();

so it actually disables interrupts *only* on the transition from
td->td_md.md_spinlock_count = 0 to td->td_md.md_spinlock_count = 1,
i.e., the first time we take a spin lock in this thread, whether
this is a borrowed thread or not. It's possible that interrupts
are actually disabled at this point. If so, td->td_md.md_saved_flags
has interrupts disabled as well. This is all just an optimization
to use a thread-local variable so as to avoid touching hardware.
The details vary widely, but typically, touching the actual hardware
controls requires flushing the CPU's instruction pipeline.

If the compare-and-swap fails, we enter _mtx_lock_spin_cookie()
and loop waiting to see if we can obtain the spin lock in time.
In that case, we don't actually *hold* this particular spin lock
itself yet, so we can call spinlock_exit() to undo the effect
of the outermost spinlock_enter() (in __mtx_lock_spin). That
decrements the counter. *If* it goes to zero, that also calls
intr_restore(td->td_md.md_saved_flags).

Hence, if we have failed to obtain our first spin lock, we restore
the interrupt setting to whatever we saved.
If interrupts were
already locked out (as in a filter type interrupt handler) this is
a potentially-somewhat-expensive no-op. If interrupts were
enabled previously, this is a somewhat expensive re-enable of
interrupts -- but that's OK, and maybe good, because we have no
spin locks of our own yet. That means we can take hardware
interrupts now, and let them borrow our current thread if they are
that kind of interrupt, or schedule another thread to run if
appropriate. That might even preempt us, since we do not yet hold
any spin locks. (But it won't preempt us if we have done a
critical_enter() before this point.)

(In fact, the spinlock exit/enter calls that you see inside
_mtx_lock_spin_cookie() wrap a loop that does not use
compare-and-swap operations at all, but rather ordinary memory reads.
These are cheaper than CAS operations on a lot of CPUs, but they may
produce wrong answers when two CPUs are racing to write the same
location; only a CAS produces a guaranteed answer, which might
still be "you lost the race". The inner loop you are looking at
occurs after losing a CAS race. Once we think we might *win* a
future CAS race, _mtx_lock_spin_cookie() calls spinlock_enter()
again and tries the actual CAS operation, _mtx_obtain_lock_fetch(),
with interrupts disabled. Note also the calls to cpu_spinwait()
-- the Linux equivalent macro is cpu_relax() -- which translates
to a "pause" instruction on amd64.)
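
The nesting-counter behaviour described above can be modelled in user
space. What follows is only a sketch under stated assumptions, not the
kernel code: the interrupt flag is a plain boolean, critical_enter() is
left out, and every model_* name is hypothetical. It shows that only the
0 -> 1 transition disables "interrupts" and only the 1 -> 0 transition
restores whatever state was saved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread machine-dependent state. */
struct model_thread {
	int  spinlock_count;	/* nesting depth, like md_spinlock_count */
	bool saved_flags;	/* interrupt state saved on the 0 -> 1 step */
};

/* Simulated CPU interrupt-enable flag; true means interrupts are on. */
static bool model_intr_enabled = true;

/* Turn the simulated interrupts off and return the previous state. */
static bool
model_intr_disable(void)
{
	bool old = model_intr_enabled;

	model_intr_enabled = false;
	return (old);
}

/* Put the simulated interrupt flag back to a previously saved state. */
static void
model_intr_restore(bool flags)
{
	model_intr_enabled = flags;
}

/* Only the outermost enter touches the (simulated) hardware. */
static void
model_spinlock_enter(struct model_thread *td)
{
	if (td->spinlock_count == 0) {
		td->saved_flags = model_intr_disable();
		td->spinlock_count = 1;
	} else
		td->spinlock_count++;
}

/* Only the outermost exit restores the saved interrupt state. */
static void
model_spinlock_exit(struct model_thread *td)
{
	assert(td->spinlock_count > 0);
	td->spinlock_count--;
	if (td->spinlock_count == 0)
		model_intr_restore(td->saved_flags);
}
```

Calling model_spinlock_enter() twice and model_spinlock_exit() once
leaves the simulated interrupts off; they come back only after the
second exit, which is the case the spinning code relies on when it has
no other spin locks held.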
Chris From owner-freebsd-hackers@freebsd.org Thu Apr 13 13:46:28 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B8E57D3BC97 for ; Thu, 13 Apr 2017 13:46:28 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: from mail-pf0-x242.google.com (mail-pf0-x242.google.com [IPv6:2607:f8b0:400e:c00::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 898F9151 for ; Thu, 13 Apr 2017 13:46:28 +0000 (UTC) (envelope-from ablacktshirt@gmail.com) Received: by mail-pf0-x242.google.com with SMTP id o126so10966280pfb.1 for ; Thu, 13 Apr 2017 06:46:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=zAOW7bioTJCjAtfqpCe+yL3nIsRs0wAHd4JyPxzGVCc=; b=SmEWB0yDWfm8bSmLNHEwOk2vhhL1uTNREjQ3cidPWLOuCy9+x9dq6mok9m6/aJKr+Z x/AKmDQatPScz/E8abVAu+GSSp20+ntTAC5PAv/QJvivSq7ZuBIXW9Lz/NIDLipkMvXZ okuf7/SsVMR3DWGwUVyRCWbNk6a7gx1LJZy8lLS1qSSgpdVpRDOGSCj3SBnSJonlBUWP 2o8jY0Pg2G6Fzd6bm8s33np+beU03qzHoJkdbFPXBeR0D7Mx2AoFkayOoa/xndhbvpDW mcA94T92PhAaf/pUQ2dNm3Oq049KAxO3qbOJ+HJ79Tc06/iCvhUoN7iQzgxo1k5ufy18 MAkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=zAOW7bioTJCjAtfqpCe+yL3nIsRs0wAHd4JyPxzGVCc=; b=apiRI+zU/ZS6uDXKFDTWN9+e/l93gnkBNJ2wodRBzOdOjiTZtoI4X4aPuSBWecnO6G P6hhZDcPMhB/O+NZJf8knnJlyqMiqDk2boFomTQWYWOxqmOZ+kDCxfm/mxROteHo1Oqx 45xfBFdFJSxNhnzvjL1vmyXn4hyrJzx4V5J1SuS2FFKRpbuSkOYMDd8Wx7rRRcEx+qT4 
Y16RAF4CsdWLNJTwN4SB0az3ZnNUNcta2zaKyh71iP8Cg7UVCFW4c6lC/bJKJQJFCCmp lvff5quHt6cJujz+RQhfLB5Vkj+6OojsfS7OhrehS1+6VegvzrO7ajeG6u09TR6NvTYG 4G1w== X-Gm-Message-State: AN3rC/4y0cDgk2SfY0EUd5Uv9aVj9FAxRjuvfNfh81mYvZDt03dbewiB m1VEpVf5cqMZRA== X-Received: by 10.99.65.4 with SMTP id o4mr3478761pga.90.1492091187944; Thu, 13 Apr 2017 06:46:27 -0700 (PDT) Received: from [192.168.2.211] ([116.56.129.146]) by smtp.gmail.com with ESMTPSA id 74sm42776530pfn.102.2017.04.13.06.46.24 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Apr 2017 06:46:26 -0700 (PDT) Subject: Re: Understanding the FreeBSD locking mechanism To: Chris Torek , imp@bsdimp.com References: <201704131218.v3DCIBJg093207@elf.torek.net> Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, kostikbel@gmail.com, rysto32@gmail.com From: Yubin Ruan Message-ID: Date: Thu, 13 Apr 2017 21:46:26 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <201704131218.v3DCIBJg093207@elf.torek.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Apr 2017 13:46:28 -0000 On 2017年04月13日 20:18, Chris Torek wrote: >> I discover that in the current implementation in FreeBSD, spinlock >> does not disable interrupt entirely: > [extra-snipped here] >> 610 /* Give interrupts a chance while we spin. */ >> 611 spinlock_exit(); >> 612 while (m->mtx_lock != MTX_UNOWNED) { > [more snip] > >> This is `_mtx_lock_spin_cookie(...)` in kern/kern_mutex.c, which >> implements the core logic of spinning. However, as you can see, while >> spinning, it would enable interrupt "occasionally" and disable it >> again... What is the rationale for that? > > This code snippet is slightly misleading. 
The full code path runs > from mtx_lock_spin() through __mtx_lock_spin(), which first > invokes spinlock_enter() and then, in the *contested* case (only), > calls _mtx_lock_spin_cookie(). > > spinlock_enter() is: > > td = curthread; > if (td->td_md.md_spinlock_count == 0) { > flags = intr_disable(); > td->td_md.md_spinlock_count = 1; > td->td_md.md_saved_flags = flags; > } else > td->td_md.md_spinlock_count++; > critical_enter(); > > so it actualy disables interrupts *only* on the transition from > td->td_md.md_spinlock_count = 0 to td->td_md.md_spinlock_count = 1, > i.e., the first time we take a spin lock in this thread, whether > this is a borrowed thread or not. It's possible that interrupts > are actually disabled at this point. If so, td->td_md.md_saved_flags > has interrupts disabled as well. This is all just an optimization > to use a thread-local variable so as to avoid touching hardware. > The details vary widely, but typically, touching the actual hardware > controls requires flushing the CPU's instruction pipeline. > > If the compare-and-swap fails, we enter _mtx_lock_spin_cookie() > and loop waiting to see if we can obtain the spin lock in time. > In that case, we don't actually *hold* this particular spin lock > itself yet, so we can call spinlock_exit() to undo the effect > of the outermost spinlock_enter() (in __mtx_lock_spin). That > decrements the counter. *If* it goes to zero, that also calls > intr_restore(td->td_md.md_saved_flags). > > Hence, if we have failed to obtain our first spin lock, we restore > the interrupt setting to whatever we saved. If interrupts were > already locked out (as in a filter type interrupt handler) this is > a potentially-somewhat-expensive no-op. If interrupts were > enabled previously, this is a somewhat expensive re-enable of > interrupts -- but that's OK, and maybe good, because we have no > spin locks of our own yet. 
That means we can take hardware > interrupts now, and let them borrow our current thread if they are > that kind of interrupt, or schedule another thread to run if > appropriate. That might even preempt us, since we do not yet hold > any spin locks. (But it won't preempt us if we have done a > critical_enter() before this point.) Good explanation. I just missed that "local" interrupt point. > (In fact, the spinlock exit/enter calls that you see inside > _mtx_lock_spin_cookie() wrap a loop that does not use compare-and- > swap operations at all, but rather ordinary memory reads. These > are cheaper than CAS operations on a lot of CPUs, but they may > produce wrong answers when two CPUs are racing to write the same why would that produce wrong result? I think what the inner loop wants to do is to perform some no-op for a while before it tries again to acquire the spinlock. So there is no race here. > location; only a CAS produces a guaranteed answer, which might > still be "you lost the race". The inner loop you are looking at > occurs after losing a CAS race. Once we think we might *win* a > future CAS race, _mtx_lock_spin_cookie() calls spinlock_enter() > again and tries the actual CAS operation, _mtx_obtain_lock_fetch(), > with interrupts disabled. Note also the calls to cpu_spinwait() > -- the Linux equivalent macro is cpu_relax() -- which translates > to a "pause" instruction on amd64.) 
Regards, Yubin Ruan From owner-freebsd-hackers@freebsd.org Thu Apr 13 22:46:48 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 58531D3CF29 for ; Thu, 13 Apr 2017 22:46:48 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "elf.torek.net", Issuer "elf.torek.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 42DA09DC for ; Thu, 13 Apr 2017 22:46:47 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.15.2/8.15.2) with ESMTPS id v3DMkjGK027792 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 13 Apr 2017 15:46:45 -0700 (PDT) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.15.2/8.15.2/Submit) id v3DMkj27027791; Thu, 13 Apr 2017 15:46:45 -0700 (PDT) (envelope-from torek) Date: Thu, 13 Apr 2017 15:46:45 -0700 (PDT) From: Chris Torek Message-Id: <201704132246.v3DMkj27027791@elf.torek.net> To: ablacktshirt@gmail.com, imp@bsdimp.com Subject: Re: Understanding the FreeBSD locking mechanism Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, kostikbel@gmail.com, rysto32@gmail.com In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (elf.torek.net [127.0.0.1]); Thu, 13 Apr 2017 15:46:45 -0700 (PDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Apr 2017 22:46:48 -0000 (This is getting a bit far afield; let me know if we should take this off-list.) >why would [regular read, vs CAS] produce wrong result? 
There are both hardware architecture (and sometimes individual CPU
architecture) and compiler reasons for this.

First, compilers may try to optimize load and store operations,
especially on register-rich architectures. What's coded as:

	... some code section A ...
	x = p->foo;
	y = p->bar;
	... some code section B ...

might actually move the loads of p->foo and/or p->bar into either
the A or B sections. The same goes for stores. The compiler
makes the (somewhat reasonable for most programming) assumption
that only the instructions the compiler itself emits, actually
access the data -- not some instructions running on some other
CPU.

For any lock, this assumption is automatically wrong. We can
defeat part of this with the "volatile" keyword, but we need to
insert compiler level memory barriers to make sure that the
operations proceed in a temporally-defined manner, i.e., so that
time appears to be linear.

Second, the CPU itself may also have both temporal and
non-temporal loads and stores (with arbitrarily complicated rules
about using them). In this case there may be special instructions
("sfence", "mfence", etc; "membar" on SPARC) for forcing order.
For more about non-temporal operations, see, e.g.:

http://stackoverflow.com/q/37070/1256452
http://infocenter.arm.com/help/topic/com.arm.doc.den0024a/CJACGJJF.html

There are some lock algorithms that work without most of this,
but they tend to be a bit hard to set up. Even then we usually
depend on an atomic compare-and-swap: see
http://wiki.c2.com/?LockFreeSynchronization for instance.

>I think what the inner loop wants to do is to perform some no-op
>for a while before it tries again to acquire the spinlock.

Yes - but the point is that it tries to "gently" read the actual
mutex lock value, and inspect the result to see whether to try
the more-savage (at the hardware level) CAS.

Some of this gets deep into the weeds of hardware implementations.
I had this in my earlier reply (but ripped it out as too much detail).
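
The "gentle read first, CAS only when it might win" idea can be
sketched with C11 atomics. This is an illustration under stated
assumptions, not the kern_mutex.c implementation: the sketch_* names and
the struct are hypothetical, there is no interrupt handling, and the
busy-wait has no pause hint where a kernel would call cpu_spinwait().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_UNOWNED	((uintptr_t)0)

/* Hypothetical lock word: holds the owner's id, or SKETCH_UNOWNED. */
struct sketch_spin {
	_Atomic uintptr_t owner;
};

/* The expensive step: one compare-and-swap, which needs the line exclusively. */
static bool
sketch_try_lock(struct sketch_spin *l, uintptr_t tid)
{
	uintptr_t expected = SKETCH_UNOWNED;

	return (atomic_compare_exchange_strong_explicit(&l->owner,
	    &expected, tid, memory_order_acquire, memory_order_relaxed));
}

/* Test-and-test-and-set: spin on plain loads, CAS only when it looks free. */
static void
sketch_lock(struct sketch_spin *l, uintptr_t tid)
{
	assert(tid != SKETCH_UNOWNED);
	for (;;) {
		if (atomic_load_explicit(&l->owner, memory_order_relaxed) ==
		    SKETCH_UNOWNED && sketch_try_lock(l, tid))
			return;
		/* Gentle spin: read-only, so the cache line can stay shared. */
		while (atomic_load_explicit(&l->owner,
		    memory_order_relaxed) != SKETCH_UNOWNED)
			;
	}
}

/* Release with a store; this is the write that wakes the spinners. */
static void
sketch_unlock(struct sketch_spin *l)
{
	atomic_store_explicit(&l->owner, SKETCH_UNOWNED,
	    memory_order_release);
}
```

The atomic loads also address the compiler side of the problem: unlike
an ordinary read, they may not be hoisted out of the loop, so each
iteration really re-reads the lock word.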

On Intel dual-socket systems, for instance, there is a special bus
that connects the two sockets called the QPI, and then there are
caches around each core within any one given socket. These caches
come in multiple levels (L1 and higher, details vary and one
should always expect the *next* CPU to do something different)
with some caches physically local to one core and others shared
between multiple cores in one socket.

These caches tend to coordinate using protocols called MESI or
MOESI. The letters stand for cache line states: Modified, Owned,
Exclusive, Shared, or Invalid. A Modified cache line has data not
yet written to the next level out (whether that's a higher level,
larger cache, or main memory). An Exclusive line is in this cache
only and can therefore be written-to. (I'm ignoring "owned", it is
kind of a tweak between M and E.) A Shared line is in this cache
*and some other cache* and therefore can *not* be written to, but
*can* be read from; and finally, an Invalid line has no valid data
at all.

As a rule, the closer a cache line is to the CPU, the faster its
access is. (This rule is pretty reliable since otherwise there is
no point in having that cache.) *Writing* to a cache line
requires exclusive access, though, so we must know if the line is
shared. If it *is* shared, we must signal higher level caches
that we intend to write, and wait for them to give up their cached
copies of data. In other words we fire a bullet at them: "I want
exclusive access, kill off your shared copy." Then we must wait
for a reply, or a time delay (whichever is architecturally
appropriate), so that we know that this succeeded, or get a
failure indication ("you may not have that exclusively, not just
yet anyway"). This reply or delay can take as long as the *worst
case*, slowest, access time. For dual-socket Intel that means
doing a QPI bus transaction to the other socket.

(This is true for any write operation, not just compare-and-swap.
For this reason, we often like our mutexes to fill out a complete
cache line, so that any data *protected by* the mutex is not shot
out of the cache every time we poke at the mutex itself.)

Note that when we *read* the object, however, we're doing a read,
not a write. This does not need exclusive access to the cache
line: shared access suffices. If we do not have the data in
cache, we send out a request: "I would like to read this." Any
CPU that has the item cached, e.g., whoever actually locked the
lock, must drop it back to whatever level accomplishes sharing
-- if it's dirty, writing it out -- and take his core-side cache
line status back from M or E to S. Any other CPU also spinning,
waiting for the lock, must go to this shared state.

Now all CPUs interested in the lock -- the holder, and all waiters
-- have it shared. They can all *read* the line and see whether
it's still locked. There is no traffic over inter-cache or
inter-socket busses at this point. These are the "gentle" spins,
that only *read* the lock.

Eventually, whoever owns the lock, unlocks it. This requires a
write (or CAS) operation, which yanks the cache line away from all
the spinners (that part is unavoidable, and slow, and causes all
this bus traffic we were avoiding) and releases the lock. The
spinners then take the shared cache lines back to shared state and
see that they *may* be able to get the lock. At this point they
attempt the expensive operation, and *that* produces a reliable
answer -- which may be "someone else beat us to the lock so we
go back to the gentle spin code".

Note that some architectures do not have an actual
compare-and-swap instruction. For instance, PowerPC and MIPS use a
different technique: there is a load instruction that takes the
cache line to exclusive state *and* sets an internal CPU register
to remember this.
If the cache line drops back out of exclusive
state, a subsequent "store conditional" instruction fails the
condition, does *not* store, and lets you branch to a loop that
repeats the load if needed. If it is still exclusive, the write
succeeds (and the cache line goes to M state, if the cache is
write-back). This lets you build a compare-and-swap from the low
level cache-line synchronization operations that are what the
hardware uses. There is more on this at:

http://stackoverflow.com/q/151783/1256452

and specifically for x86 (64 bit Intel and AMD) at:

http://stackoverflow.com/q/151783/1256452

(These are not the only ways to implement synchronization. Some
more exotic architectures have special regions of physical memory
that can act transactionally, or that auto-increment upon load so
that each CPU can "take a ticket" and use a version of Lamport's
Bakery algorithm: see
https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm for
details. However, the BSD mtx_lock() is designed around
compare-and-swap plus optimizations for MESI cache
implementations.)
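
The "store conditional fails, so loop and reload" pattern is visible
from portable C as well. C11's weak compare-exchange is allowed to fail
spuriously, which is what lets it map onto a load-linked /
store-conditional pair, so it is normally wrapped in a retry loop. The
sketch below builds a fetch-and-add that way; the sketch_fetch_add name
is hypothetical and the code is an illustration, not anything taken from
the FreeBSD tree.

```c
#include <stdatomic.h>

/*
 * Add delta to *p atomically and return the value seen before the add.
 * On failure (real contention or a spurious store-conditional failure)
 * the weak CAS reloads `old` with the current value and we retry.
 */
static unsigned
sketch_fetch_add(_Atomic unsigned *p, unsigned delta)
{
	unsigned old = atomic_load_explicit(p, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(p, &old, old + delta,
	    memory_order_acq_rel, memory_order_relaxed))
		;	/* `old` was refreshed; try again with the new value */
	return (old);
}
```

On a machine with a native compare-and-swap instruction the same source
compiles to that instruction, so the loop body rarely runs more than
once without contention.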
Chris From owner-freebsd-hackers@freebsd.org Fri Apr 14 18:56:13 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B964D3E24D for ; Fri, 14 Apr 2017 18:56:13 +0000 (UTC) (envelope-from kevans91@ksu.edu) Received: from NAM01-BN3-obe.outbound.protection.outlook.com (mail-bn3nam01on0042.outbound.protection.outlook.com [104.47.33.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8FD1B9C6; Fri, 14 Apr 2017 18:56:11 +0000 (UTC) (envelope-from kevans91@ksu.edu) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ksu.edu; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=3YvMaCZrxv8xPxTq608KPk97saL0YW9nNR5OHRmqoRQ=; b=oDXfapV1WTsHXvApjmpZr5h2phVcH+ZR6c2NleGAIEQhdfs42p/MOz6h4gSOUK30oe77LxQ2ffNXSIQbG3lvMtXbuZWNoLSAg7PYzmmKhIvvl8ife8HaIHFNHekoV9d8HbXJW71DtP/vEgPVwB7Kp3stTUanJkFJccJZNOVJ9DU= Received: from DM5PR05CA0023.namprd05.prod.outlook.com (10.173.226.33) by BN6PR05MB3570.namprd05.prod.outlook.com (10.174.234.159) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1047.6; Fri, 14 Apr 2017 18:56:09 +0000 Received: from CY1NAM02FT050.eop-nam02.prod.protection.outlook.com (2a01:111:f400:7e45::209) by DM5PR05CA0023.outlook.office365.com (2603:10b6:3:d4::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1047.6 via Frontend Transport; Fri, 14 Apr 2017 18:56:09 +0000 Authentication-Results: spf=pass (sender IP is 129.130.18.151) smtp.mailfrom=ksu.edu; freebsd.org; dkim=none (message not signed) header.d=none;freebsd.org; dmarc=bestguesspass action=none header.from=ksu.edu; Received-SPF: Pass (protection.outlook.com: domain of ksu.edu 
designates 129.130.18.151 as permitted sender) receiver=protection.outlook.com; client-ip=129.130.18.151; helo=ome-vm-smtp2.campus.ksu.edu; Received: from ome-vm-smtp2.campus.ksu.edu (129.130.18.151) by CY1NAM02FT050.mail.protection.outlook.com (10.152.75.65) with Microsoft SMTP Server id 15.1.1019.14 via Frontend Transport; Fri, 14 Apr 2017 18:56:08 +0000 Received: from calypso.engg.ksu.edu (calypso.engg.ksu.edu [129.130.43.181]) by ome-vm-smtp2.campus.ksu.edu (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id v3EIu8k8006637; Fri, 14 Apr 2017 13:56:08 -0500 Received: by calypso.engg.ksu.edu (Postfix, from userid 110) id 1F9F3248318; Fri, 14 Apr 2017 13:56:08 -0500 (CDT) Received: from mail-wm0-f51.google.com (mail-wm0-f51.google.com [74.125.82.51]) by calypso.engg.ksu.edu (Postfix) with ESMTPA id C28402482FB; Fri, 14 Apr 2017 13:56:05 -0500 (CDT) Received: by mail-wm0-f51.google.com with SMTP id t189so69721399wmt.1; Fri, 14 Apr 2017 11:56:05 -0700 (PDT) X-Gm-Message-State: AN3rC/6FioYP8KBcIDQZ2s5zBGQnFyEHzz4PuPcdqpmPCbdfFULfCY+7 2+HR6PWbO28NSOdb1rcEk30YDj2OzQ== X-Received: by 10.28.98.66 with SMTP id w63mr36546wmb.33.1492196164629; Fri, 14 Apr 2017 11:56:04 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.167.206 with HTTP; Fri, 14 Apr 2017 11:55:44 -0700 (PDT) In-Reply-To: References: From: Kyle Evans Date: Fri, 14 Apr 2017 13:55:44 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Replacing libgnuregex To: CC: Pedro Giffuni , Ed Maste X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:129.130.18.151; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; 
5:xfgnGkhrZIsynwdzCdoXY9P2hGWu/2dezAHlWSfnEh6w8Sk+BU49KGq0Iii5s/PEhNlTkN+DTW/lzfAaUFkIss7+2RuiZYcOgf6Wsi0Xob1mU5QxB7Slb+yWfGqw60y3J59GTik0Xl7X0ekXwPFKxQ==; 24:kAga1hxYnflteNGzDQP3/2sKPaO0OQN3Bq7k+P/oJpIrg3XQKnkkCf5iyXa4whxP0u05lVkBG71e3m3UbpIQ+Peh+vmgm0QahGzxojaAtAI= SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1; BN6PR05MB3570; 7:RN8s8QPqX6KY/sz2hCYd/tUjcEP9cu6WSra2CRrSBGMqRYAUL7TAF9DAxKQNc/BujZ5X3SuVdD9khcQrE3fzm237iMgnJySosmEo5heAdOy7p0Ys8V9Fs12jKsaSS1FCMGSjgzzV0r8OzE1DrSUqMMXgYlBlUs9HxJQgCl6z4gAlBWYS1sDjpCKF7NflODNC75E5NNTRtmpTbrx0AWvdXLNrmA8MlMTkNhmNhd2sMtcrD3Q/hxzT1EwltUXiWSnRMsj2YhaBAq9OJSKHOHNm5/6BuLVu35tFVGuRFXPEifhXSs/WP6vDE6XT/LReuNygp7UFgtIYpPIwXguvRHRCVA==; 20:5/HjOUXBIYkJ/1/oL8snETnFk3LZ6nki51peconWfz0z6T5eM4GH45KN/ufMRII0MWRaaPemug+zY/LFdnBluelJipRFXuUYW0OCOmQXqD/UJGq6s+9dUI7XuaQ5XfWV59b8LFx76uGiMOhf8R+wWd5ukpRTIwgNLMOz3gz0310= X-OriginatorOrg: ksu.edu X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Apr 2017 18:56:08.8660 (UTC) X-MS-Exchange-CrossTenant-Id: d9a2fa71-d67d-4cb6-b541-06ccaa8013fb X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=d9a2fa71-d67d-4cb6-b541-06ccaa8013fb; Ip=[129.130.18.151]; Helo=[ome-vm-smtp2.campus.ksu.edu] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN6PR05MB3570 Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Apr 2017 18:56:13 -0000 On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans wrote: > > On the other hand, I think I could fairly easily implement most of these > into libc/regex. 
Here's a summary of what this option entails adding to
> libc/regex, from what I've found:
>
> * Empty subexpressions(*)
> * Add missing quantifiers to BREs: \?, \+
> * Add branching to BREs: \|
> * Add backreferences (\1 through \9) to EREs
> * Add \w, \W, \s, and \S corresponding to [[:alnum:]_], [^[:alnum:]_],
>   [[:space:]], and [^[:space:]] respectively
> * Add word boundaries and anchors:
> ** \b: Word boundary
> ** \B: Not a word boundary
> ** \<: Start of word
> ** \>: End of word
> ** \`: Start of subject string
> ** \': End of subject string
>
> (*) I didn't actually find anything explicitly stating this as a GNU
> extension, but it's certainly not conformant to POSIX specifications to
> use, it gets used a tiny bit in some ports, and we implement a workaround
> in bsdgrep(1) for the simplest case of empty expressions ("") to match
> everything and produce zero length matches.
>
> The main benefit of this is not having to maintain a completely separate
> regex parser and the potential for inconsistencies that come along with it.
> The downside is that that would seem to promote expressions that are not
> strictly POSIX conformant. Is this a problem? Is this a problem worth
> worrying about?
>

FYI: here is a patch [1] showing what implementing all of the above in
libc/regex looks like. Some cleanup is still in order and the test set is not
exhaustive, but it should implement all of the GNU extensions listed above,
and it is at least functional.

It will break some things (one of the existing tests, for instance) that
relied on being able to escape an ordinary character (e.g. \b) and get that
ordinary character back. POSIX specifies this as producing undefined
behavior [2], though, so I don't feel terrible about breaking it.

If this seems desirable, I can work on cleaning it up and splitting it into
more consumable bites for FreeBSD's libc.
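
For anyone who wants to check what a given libc does with these escapes
today, a small probe along the following lines may help. This is only a
sketch and is not taken from the patch: regex_probe() is a hypothetical
helper name, and it relies on nothing beyond the standard regcomp(3),
regexec(3), and regfree(3) calls. The idea is to compare a GNU-style escape
with its portable bracket-expression spelling. Because escaping an ordinary
character is undefined per [2], the answer for the escaped form can differ
between libcs, while the bracket form should behave the same everywhere.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): compile `pattern` with the
 * given cflags and try it against `subject`.
 *
 * Returns 1 on a match, 0 on no match, and -1 if the pattern is rejected
 * by regcomp() or regexec() reports an error other than REG_NOMATCH.
 */
static int
regex_probe(const char *pattern, const char *subject, int cflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && subject != NULL);

	/* REG_NOSUB: only ask whether it matches, not where. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return (-1);

	rc = regexec(&re, subject, 0, NULL, 0);
	regfree(&re);

	if (rc == 0)
		return (1);
	return (rc == REG_NOMATCH ? 0 : -1);
}
```

As a usage example, regex_probe("[[:space:]]", "a b", REG_EXTENDED) is the
portable spelling, whereas regex_probe("\\s", "a b", REG_EXTENDED) is the
one whose result depends on whether the libc treats \s as an extension, as
an ordinary 's', or as an error. Running both against the same subjects is
a quick way to spot code that would change behavior.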
Thanks, Kyle Evans [1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff [2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03 From owner-freebsd-hackers@freebsd.org Fri Apr 14 20:41:31 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C367FD3AAC9 for ; Fri, 14 Apr 2017 20:41:31 +0000 (UTC) (envelope-from kevans91@ksu.edu) Received: from NAM02-BL2-obe.outbound.protection.outlook.com (mail-bl2nam02on0060.outbound.protection.outlook.com [104.47.38.60]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49BC837A; Fri, 14 Apr 2017 20:41:30 +0000 (UTC) (envelope-from kevans91@ksu.edu) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ksu.edu; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=L4Zye8I6kY+zyo7jVNmXGvFaqfvTSvMoarOMrpWVSJ0=; b=eo+G77EXn3Ek8qN15bZCx0ZUSrz7iXxCvGJ8lGXmHHgwpG1TuK3YF2EtVT4DF+heHZsJNyrOYoynUwvEa5dgH96zqvu22+WR7IDQwo590hnqdtDIoBSKn3KQbmuN2ElSlRVLNxUy+j1IqcujFXA8WDgT8bvBvixDC4qmjg4JqjA= Received: from BLUPR05CA0061.namprd05.prod.outlook.com (10.141.20.31) by BN3PR0501MB1107.namprd05.prod.outlook.com (10.160.113.141) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.5; Fri, 14 Apr 2017 20:41:28 +0000 Received: from CY1NAM02FT011.eop-nam02.prod.protection.outlook.com (2a01:111:f400:7e45::209) by BLUPR05CA0061.outlook.office365.com (2a01:111:e400:855::31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1047.6 via Frontend Transport; Fri, 14 Apr 2017 20:41:28 +0000 Authentication-Results: spf=pass (sender IP is 129.130.18.151) smtp.mailfrom=ksu.edu; freebsd.org; dkim=none 
(message not signed) header.d=none;freebsd.org; dmarc=bestguesspass action=none header.from=ksu.edu; Received-SPF: Pass (protection.outlook.com: domain of ksu.edu designates 129.130.18.151 as permitted sender) receiver=protection.outlook.com; client-ip=129.130.18.151; helo=ome-vm-smtp2.campus.ksu.edu; Received: from ome-vm-smtp2.campus.ksu.edu (129.130.18.151) by CY1NAM02FT011.mail.protection.outlook.com (10.152.75.156) with Microsoft SMTP Server id 15.1.1019.14 via Frontend Transport; Fri, 14 Apr 2017 20:41:26 +0000 Received: from calypso.engg.ksu.edu (calypso.engg.ksu.edu [129.130.43.181]) by ome-vm-smtp2.campus.ksu.edu (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id v3EKfQlL030735; Fri, 14 Apr 2017 15:41:26 -0500 Received: by calypso.engg.ksu.edu (Postfix, from userid 110) id 68A11248319; Fri, 14 Apr 2017 15:41:26 -0500 (CDT) Received: from mail-wm0-f53.google.com (mail-wm0-f53.google.com [74.125.82.53]) by calypso.engg.ksu.edu (Postfix) with ESMTPA id 102DB248318; Fri, 14 Apr 2017 15:41:24 -0500 (CDT) Received: by mail-wm0-f53.google.com with SMTP id t189so1116178wmt.1; Fri, 14 Apr 2017 13:41:23 -0700 (PDT) X-Gm-Message-State: AN3rC/4d0vkFlwi5dcxA4jLMZFa401CIA5W/ahb9Uxq4KVN7gc3A+LM9 eyNjU+Y7RJD8b8cojnQSUQQbUFXlwA== X-Received: by 10.28.88.2 with SMTP id m2mr316897wmb.12.1492202482966; Fri, 14 Apr 2017 13:41:22 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.39.134 with HTTP; Fri, 14 Apr 2017 13:41:02 -0700 (PDT) In-Reply-To: <10004f0d-acb7-f81a-f3d5-b368e606a105@FreeBSD.org> References: <10004f0d-acb7-f81a-f3d5-b368e606a105@FreeBSD.org> From: Kyle Evans Date: Fri, 14 Apr 2017 15:41:02 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Replacing libgnuregex To: Pedro Giffuni CC: , Ed Maste X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:129.130.18.151; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; 
SFS:(10009020)(39860400002)(39400400002)(39840400002)(39450400003)(39410400002)(2980300002)(438002)(24454002)(199003)(189002)(377454003)(50986999)(61266001)(4326008)(93516999)(90966002)(7906003)(305945005)(54356999)(38730400002)(2906002)(110136004)(88552002)(55446002)(53546009)(63696999)(76176999)(189998001)(450100002)(2950100002)(6916009)(221733001)(229853002)(6246003)(106466001)(42186005)(512874002)(8936002)(61726006)(54906002)(9686003)(7116003)(84326002)(356003)(5660300001)(606005)(236005)(498394004)(86362001)(8676002)(9896002)(45336002)(98316002)(75432002)(8576002)(46386002)(6306002)(3480700004)(55456009); DIR:OUT; SFP:1101; SCL:1; SRVR:BN3PR0501MB1107; H:ome-vm-smtp2.campus.ksu.edu; FPR:; SPF:Pass; MLV:sfv; A:3; MX:1; LANG:en; X-Microsoft-Exchange-Diagnostics: 1; CY1NAM02FT011; 1:bXFPhA2g0KkbMpybI2rcQoJ43x6gw5a3+HR5ubKD+yjoNAopnHcxPggB6rjta4OAaKJz41R9cGRy39FkU/REvNckjunLpbSOIF50zwZm6hRE5Ajq2BY8ImYbv9oEELbIqQWytw7hNYTg8jmQ++zakqIzXFH0AKzLD3vXGi+OMjfLO/BjRRY+w6fCMpGmqXC+WBu9a1+wlypzLe2kdm+vDwqsdQgRZ9/0BOARhp4dkGsL7rcWiF+FR6LZgo4mTIqDP26b4/F5juhp/NM9ByjkVIMD2LhRqU950ZneORxV3WGHp0fOsUR+7rlV+ou5ZFa/1w9mHl4fiXV5MHnAxfFb7TT+lK/vbK9aivqJw0EzAQqx3Y7EL6fU7CoZDp/HQDoIDpcxbJ1k3LrPydAyegYl5goLPVHU/dWuH2Ajwp/IZoIfGrpAIe99aMrUdtBiUeDgkDkWl1KemPoqTUrfXOyHqA4OwT++XXRZwCMhz5NAhqXtAJPIoo7FqZx5ZxHb0Tjs+4fnug3s4pF0F5QpqeTRS4UCeVglqxj6EsYjTsuNlek= X-MS-Office365-Filtering-Correlation-Id: 281fd37b-0217-47a5-808e-08d483769f87 X-Microsoft-Antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001)(8251501002)(2017030254075)(201703131423075)(201703031133081)(201702281549075); SRVR:BN3PR0501MB1107; X-Microsoft-Exchange-Diagnostics: 1; BN3PR0501MB1107; 
24:eKtTwljrdN4s0tKHV2mNrjzL6jr2lpfpmtzKHPzY26SazJhl+fNQC4BOd+jAfxfcOhC2o3GAJD8xlfL0Vq2nY/5/yHjIBDPZTjTS67WhOx4= SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1; BN3PR0501MB1107; 7:khH6ojj6xFb7mYVUy/QMuPkOlC5DdhzDJQR2KXZKy+tbb1Tl84dRllCiMX8mXWF8/Kon0lR5KhOahAhNjcQWzcWFdxmXJROHVZw0G+qTibFs+xAvgvXA6QtubGUcS/KDddhmwczJYiNFMW/ilnV/xigbB4izePdL+E01RadfxoLXEOJriEQNjUT3r41DdDCbWCdQX0c4WhHFSeVzV8vFE6E6bPtbDAyjvm0CbubqQZBenqsfgHiuQX9qoJXlzZSvXn0XDwUp+xKWbWJCJeWDIkRhogMRqpSq+ByJs4XHAiXs0REIYqC+ZMX05UFO5//uyAlCNnZR0DpfwiSpIrli0g==; 20:pzVrc8QNL3bEvJ90u/goTwnDSNRu3hvDi0/OdCHHDCdyWrA59NiLS1lNzOmBZq+dfHu12flKCOltA5VxU9LpWwqucx0tRLJodGp21508/rp47YPZTzCeazfC3KF0xwDUFQqkDUPZCSds0qP0IJLtRhre0p66upGKZSi5UHHs/2g= X-OriginatorOrg: ksu.edu X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Apr 2017 20:41:26.9263 (UTC) X-MS-Exchange-CrossTenant-Id: d9a2fa71-d67d-4cb6-b541-06ccaa8013fb X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=d9a2fa71-d67d-4cb6-b541-06ccaa8013fb; Ip=[129.130.18.151]; Helo=[ome-vm-smtp2.campus.ksu.edu] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN3PR0501MB1107 Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Apr 2017 20:41:31 -0000 On Fri, Apr 14, 2017 at 3:24 PM, Pedro Giffuni wrote: > > That doesn't seem good: anything that breaks tests is very likely to have > other side-effects. > Keep in mind that any regex change will likely have to go through a ports > exp-run and > ports will still have to work fine in three versions of FreeBSD. > Yeah, I anticipate other side-effects from this. 
Fortunately, there aren't many ports relying on GNU extensions, and as a part of [1] I'm trying to get them to start using textproc/gnugrep since this is more up-to-date and well-tested. As far as sed goes, the only potential breakage should come from \<, \>, \b, \B, \w, \W, \s, and \S expecting to be ordinary. This is easy to fix in a way that is actually POSIX compliant (unlike expecting them to be ordinary), so no worries there. It's worth noting that I have absolutely no intention of changing anything to actually expect GNU extensions, but I tend to use them myself in my own daily grep(1) usage- some of them are nice to have. > > It is difficult to know exactly how far we want to keep the GNU grep > behavior. It is perfectly fine for BSD grep to keep a slightly incompatible > behavior as long as we keep within standards. > > Just my $0.02, > > Much appreciated. =) Kyle Evans [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218385 From owner-freebsd-hackers@freebsd.org Fri Apr 14 22:28:37 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 585BAD3D252 for ; Fri, 14 Apr 2017 22:28:37 +0000 (UTC) (envelope-from kevans91@ksu.edu) Received: from NAM02-SN1-obe.outbound.protection.outlook.com (mail-sn1nam02on0069.outbound.protection.outlook.com [104.47.36.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D8387E82; Fri, 14 Apr 2017 22:28:36 +0000 (UTC) (envelope-from kevans91@ksu.edu) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ksu.edu; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=gGR2MTCgceHizSh9kfwfvaYGN36EWBe4svNjhFw0PpI=; 
b=HqVNA0LXZC/974eZ/EgHXgDEM6KVWqM0mbZKSiv0gnJ6bU5hbM7boYJIMBfY+jvhcaJNx/HI8aLso9n+uneBhidqkwlnH3d6dTho/J/6JJD9cFde5lNAlebX9VUeypO487v80WIxOikbNWbq6x7k9wIDgI/UR+M3M4VfePxEJq8= Received: from BN6PR05CA0007.namprd05.prod.outlook.com (10.174.92.148) by DM2PR0501MB1049.namprd05.prod.outlook.com (10.160.25.20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.5; Fri, 14 Apr 2017 22:28:34 +0000 Received: from SN1NAM02FT006.eop-nam02.prod.protection.outlook.com (2a01:111:f400:7e44::203) by BN6PR05CA0007.outlook.office365.com (2603:10b6:405:39::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1047.6 via Frontend Transport; Fri, 14 Apr 2017 22:28:34 +0000 Authentication-Results: spf=pass (sender IP is 129.130.18.151) smtp.mailfrom=ksu.edu; freebsd.org; dkim=none (message not signed) header.d=none;freebsd.org; dmarc=bestguesspass action=none header.from=ksu.edu; Received-SPF: Pass (protection.outlook.com: domain of ksu.edu designates 129.130.18.151 as permitted sender) receiver=protection.outlook.com; client-ip=129.130.18.151; helo=ome-vm-smtp2.campus.ksu.edu; Received: from ome-vm-smtp2.campus.ksu.edu (129.130.18.151) by SN1NAM02FT006.mail.protection.outlook.com (10.152.72.68) with Microsoft SMTP Server id 15.1.1019.14 via Frontend Transport; Fri, 14 Apr 2017 22:28:33 +0000 Received: from calypso.engg.ksu.edu (calypso.engg.ksu.edu [129.130.43.181]) by ome-vm-smtp2.campus.ksu.edu (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id v3EMSXWk028436; Fri, 14 Apr 2017 17:28:33 -0500 Received: by calypso.engg.ksu.edu (Postfix, from userid 110) id 099B0248319; Fri, 14 Apr 2017 17:28:33 -0500 (CDT) Received: from mail-wm0-f44.google.com (mail-wm0-f44.google.com [74.125.82.44]) by calypso.engg.ksu.edu (Postfix) with ESMTPA id AB02B248318; Fri, 14 Apr 2017 17:28:30 -0500 (CDT) Received: by mail-wm0-f44.google.com with SMTP id w64so2144884wma.0; Fri, 14 Apr 2017 15:28:30 -0700 
(PDT) X-Gm-Message-State: AN3rC/7EfT19nhx70pJnorlb9WBlP7AunJQXgQaOXxj3CBWdQnA/3bJ5 uDzXe9yRZ2VEFP4/HhfA11y3Z/tEmA== X-Received: by 10.28.98.66 with SMTP id w63mr488539wmb.33.1492208909847; Fri, 14 Apr 2017 15:28:29 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.39.134 with HTTP; Fri, 14 Apr 2017 15:28:29 -0700 (PDT) Received: by 10.28.39.134 with HTTP; Fri, 14 Apr 2017 15:28:29 -0700 (PDT) In-Reply-To: References: <10004f0d-acb7-f81a-f3d5-b368e606a105@FreeBSD.org> From: Kyle Evans Date: Fri, 14 Apr 2017 17:28:29 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Replacing libgnuregex To: Pedro Giffuni CC: Ed Maste , X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:129.130.18.151; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10009020)(39400400002)(39410400002)(39860400002)(39450400003)(39850400002)(39840400002)(2980300002)(438002)(24454002)(377454003)(189002)(199003)(55446002)(9686003)(90966002)(88552002)(54906002)(356003)(236005)(221733001)(512874002)(229853002)(8576002)(110136004)(4326008)(450100002)(38730400002)(8676002)(3480700004)(6246003)(53546009)(8936002)(86362001)(2906002)(54356999)(50986999)(2950100002)(63696999)(76176999)(42186005)(498394004)(61266001)(6916009)(106466001)(93516999)(45336002)(7116003)(61726006)(46386002)(305945005)(93886004)(5660300001)(75432002)(189998001)(84326002)(55456009); DIR:OUT; SFP:1101; SCL:1; SRVR:DM2PR0501MB1049; H:ome-vm-smtp2.campus.ksu.edu; FPR:; SPF:Pass; MLV:sfv; MX:1; A:1; LANG:en; X-Microsoft-Exchange-Diagnostics: 1; SN1NAM02FT006; 
1:Wcz5L6yu15tcei3G9vTbY5s/CxXUYIc6LWToLbGq9zwJa0e9r12UWjcYGq8nAJuqTZMfp1ubydINW9Cdev4KrnoL9r2oZyskCbQYGQKI6BkOj7CW1LjCFCuIs3wyHHioAI5AWerAdJ0pHpCwVPNgl57UHEGAvt69fDCggjPeTtru91CEB2t2aKNxb0A+SG4BQtSoKZU0uzsTTbakpPu/jgWo7omZeH3Zf4T3u653RmQpwkqMVbpFdhm1eMZ3/7qCme8+a+AEdqcnpAWZMqP5JMiBydJzWZyZpVaVqrGBWM+7G9JiHdPJNe8uFvqNAUEap9eLBSNEMQUkuu9iiPm0539C8YRNjLmoedkAPku1wD3vdObTniiJlhVnnnohFB8d7ywabtsEzWXNRyn+wl3vHYDDCHUj5De77TrBcsW/2B8JPuypqU4sSXv64gT0Zl/Lugdycd+5KXIM4+3P/V65lY2WShRnkgjogEBLOccgyN5nniawVPynDYYSyObp3/6v9KPjrhvQyxZM1yONDBdSqQfPqYfK5e+wBtaMX1K6gbE= X-MS-Office365-Filtering-Correlation-Id: c0737644-330e-49bb-c1f8-08d48385961b X-Microsoft-Antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001)(8251501002)(2017030254075)(201703131423075)(201703031133081); SRVR:DM2PR0501MB1049; X-Microsoft-Exchange-Diagnostics: 1; DM2PR0501MB1049; 3:s70eIdh3mCeqwvYJNy19i/s/77jq4xEUuWQGPIJNexsjKciMLy+FGDV1mcwgnZQUE4XtGnb3ihG0d93UZHpJibxMJJ1NWowPgvOTJPJuwaPjYhJGBzH0VFk80CzFhpNr8H8/zokuNCUqzeKnqw5t0x7q4+ch3l7Re7btW8jD1m+Ig74oMes3yHTeUQVimvz+2R1F23/WWoa9vjc5udjrmHt9HBszPC8u37Jn0oRlQpw0UqoksMdjOu/dRogCJ3E6YEAJ9jm6qzsuGERP+S7Ym9mkmoLqGwEM/V1X5PAKZ9hgtutVGbFsyzygZ3FB73HjB8yCn3ztlt51fzVKbLr78dUqc/QVuJ+qX7l8MtBu90JrKL6ZMZEcTIG7ym5sylls7NdzdsT4MtC+cQ8JPRrMWqhbi6IpE9pdBUMRhwGenvXdpxwtme2FRzmX6ruQtQPTbmSsyYnpB/4tIGSOUw4ta24MezuNmVh4Mo33KVTKzhKbkBzveOfjAkK3l4RPif0F X-Microsoft-Exchange-Diagnostics: 1; DM2PR0501MB1049; 25:MCMHs4LUXllaCgELsrTIQuE6RsphtnHHPbGodcQA9eOjLTE6eZmU09Mh0B4tgb4Rtu1V1oagE3CQyZA1myz1rha6gIMMOAO4QXDndDUaiHivt9AA7bCRYnfquuoaJ/gA4qVsQmSuXE2xr6oQjvbxERqqlQEZhsUfNj5gSr8yJWziMo97zsOV/1/ZKQdwU/hk6YxhIiXWDeObOh/CURqOisPTbFdfbnmTWlevHal7lH4lM/irACS1tE9sCiWHwuFZRXo1Noij7z+8gT5P7gv1qXrLdSEdGCZvblGMS0Fda2GweoF7dypxFHHGdvhFmx277d7K9x+Ilf+WnbhmdJVsabK7aMIDOxIEEPHEdXzGxGJYhV1S2y65lHqMbNWL5hRTJ4UhnJ2+hv/BWvykwS3I79RzM0d1FyzbZZzLGrSfvFLTI0ERN+i4Alv8wmHP1FzZJbtYaRxc+h4sngh0h7PY9w==; 
7:JZ059vuBAfMaM6eVMilsJ46QB97iHxrc1wrBrjbyiaNS9IaKfsE2WpDOPMhL5w7J027WvUG11IHl7eEJ2Nr+WQAiwjCygtbN5q/OpfBuus6J2lUoeq1Ux31EjS80bYgJm/mdwQWkPCZBSKwAd1e/6DRDgbfqArnx2/wRvN9VLXQt8w3KQRsMSItYpAFVgfz0ZW1oNRum6YBGNBxbRNmGDk+oPDFqbn470hFn78guNQtB/2XpSKq/maYx0EUCjegbRkT/nhYl4inb/WBdj9WVPCsflDACdgEPl1UK6TdYgA2kvVUDCjHyxYQnh8wbHDGHzhfLIGmzuHxuuyXGkxKUBQ==; 20:gKPZ4JjpJNCAlnMlpf/2oGXsxW+j6A1D6ejDIkPuGrivoaFikww+N8jVlj2fhwz1U1VcbItofcokE6i34v1+2WaSWWIiQnTGO6phbTuPO/FCWOV8WBjc0ZYRB2dJu47uGILob+cy5ZJBoG2OiDvo/y8dGuDpEoX9aYVW5+dw0xc= X-OriginatorOrg: ksu.edu X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Apr 2017 22:28:33.5560 (UTC) X-MS-Exchange-CrossTenant-Id: d9a2fa71-d67d-4cb6-b541-06ccaa8013fb X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=d9a2fa71-d67d-4cb6-b541-06ccaa8013fb; Ip=[129.130.18.151]; Helo=[ome-vm-smtp2.campus.ksu.edu] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM2PR0501MB1049 Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Apr 2017 22:28:37 -0000 (apologies, wrong email, resending for list) On Apr 14, 2017 3:41 PM, "Kyle Evans" wrote: On Fri, Apr 14, 2017 at 3:24 PM, Pedro Giffuni wrote: > > That doesn't seem good: anything that breaks tests is very likely to have > other side-effects. > Keep in mind that any regex change will likely have to go through a ports > exp-run and > ports will still have to work fine in three versions of FreeBSD. > Yeah, I anticipate other side-effects from this. Fortunately, there aren't many ports relying on GNU extensions, and as a part of [1] I'm trying to get them to start using textproc/gnugrep since this is more up-to-date and well-tested. 
As far as sed goes, the only potential breakage should come from \<, \>, \b, \B, \w, \W, \s, and \S expecting to be ordinary. This is easy to fix in a way that is actually POSIX compliant (unlike expecting them to be ordinary), so no worries there. It's worth noting that I have absolutely no intention of changing anything to actually expect GNU extensions, but I tend to use them myself in my own daily grep(1) usage- some of them are nice to have. > On second thought, I should add a REG_POSIX flag so that we can make sure to maintain POSIX compatibility instead of removing the tests with expectations that cannot hold. I think it should be opt-in though for the sake of, say, gdb, which expects GNU extensions. I do still intend to fix the regressions that occur because of undefined behavior, though. From owner-freebsd-hackers@freebsd.org Sat Apr 15 06:03:13 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 35BC8D3ECE3 for ; Sat, 15 Apr 2017 06:03:13 +0000 (UTC) (envelope-from kevans91@ksu.edu) Received: from NAM01-SN1-obe.outbound.protection.outlook.com (mail-sn1nam01on0069.outbound.protection.outlook.com [104.47.32.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CC9A938A; Sat, 15 Apr 2017 06:03:12 +0000 (UTC) (envelope-from kevans91@ksu.edu) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ksu.edu; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=VJFfAlO+msErk2NxE7A4+/32btE1s8PRk6PbvaBiLAQ=; b=ER5GG7hNJuYmtitJHdnkqtoQpUY4w8HrG//ter3WGuylaw2/L0tADqlN2YNJy7O2xbsb1JtEyIJIZZcMIYonLJzASk8fp0lMOlUTkrzQjztE7pSJwQq41BCXRI2h/QMkCwHwRF3VnxOuHZyeTBOFkLLA9r5VxfYTuvU3ggYYXMg= Received: from SN1PR05CA0024.namprd05.prod.outlook.com 
(10.163.68.162) by SN1PR0501MB2047.namprd05.prod.outlook.com (10.163.227.20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1034.5; Sat, 15 Apr 2017 06:03:09 +0000 Received: from BL2NAM02FT044.eop-nam02.prod.protection.outlook.com (2a01:111:f400:7e46::202) by SN1PR05CA0024.outlook.office365.com (2a01:111:e400:5197::34) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1047.6 via Frontend Transport; Sat, 15 Apr 2017 06:03:09 +0000 Authentication-Results: spf=pass (sender IP is 129.130.18.151) smtp.mailfrom=ksu.edu; freebsd.org; dkim=none (message not signed) header.d=none;freebsd.org; dmarc=bestguesspass action=none header.from=ksu.edu; Received-SPF: Pass (protection.outlook.com: domain of ksu.edu designates 129.130.18.151 as permitted sender) receiver=protection.outlook.com; client-ip=129.130.18.151; helo=ome-vm-smtp2.campus.ksu.edu; Received: from ome-vm-smtp2.campus.ksu.edu (129.130.18.151) by BL2NAM02FT044.mail.protection.outlook.com (10.152.77.35) with Microsoft SMTP Server id 15.1.1019.14 via Frontend Transport; Sat, 15 Apr 2017 06:03:08 +0000 Received: from calypso.engg.ksu.edu (calypso.engg.ksu.edu [129.130.43.181]) by ome-vm-smtp2.campus.ksu.edu (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id v3F638fa015535; Sat, 15 Apr 2017 01:03:08 -0500 Received: by calypso.engg.ksu.edu (Postfix, from userid 110) id 57FA3248319; Sat, 15 Apr 2017 01:03:08 -0500 (CDT) Received: from mail-wr0-f179.google.com (mail-wr0-f179.google.com [209.85.128.179]) by calypso.engg.ksu.edu (Postfix) with ESMTPA id D7499248318; Sat, 15 Apr 2017 01:03:05 -0500 (CDT) Received: by mail-wr0-f179.google.com with SMTP id o21so59175369wrb.2; Fri, 14 Apr 2017 23:03:05 -0700 (PDT) X-Gm-Message-State: AN3rC/76pMIK9NG8BnohSPllYBvmW1WO1W44i9cAwoZYsR2xHZa3tU3O Rqifv3LaiskvMXRoQtYEsin1WDU/zw== X-Received: by 10.223.154.54 with SMTP id z51mr9936411wrb.76.1492236182773; Fri, 14 Apr 2017 23:03:02 
-0700 (PDT) MIME-Version: 1.0 Received: by 10.28.39.134 with HTTP; Fri, 14 Apr 2017 23:02:42 -0700 (PDT) In-Reply-To: References: From: Kyle Evans Date: Sat, 15 Apr 2017 01:02:42 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Replacing libgnuregex To: CC: Pedro Giffuni , Ed Maste X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:129.130.18.151; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10009020)(39850400002)(39410400002)(39400400002)(39450400003)(39840400002)(39860400002)(2980300002)(438002)(24454002)(69234005)(377454003)(189002)(199003)(9896002)(61266001)(55446002)(512874002)(93516999)(59536001)(189998001)(6306002)(9686003)(966004)(84326002)(6246003)(6916009)(54906002)(236005)(2950100002)(229853002)(606005)(63696999)(53386004)(54356999)(86362001)(110136004)(2906002)(42186005)(90966002)(50986999)(98316002)(45336002)(88552002)(2351001)(46386002)(4326008)(76176999)(356003)(38730400002)(3480700004)(450100002)(7116003)(8576002)(221733001)(5660300001)(61726006)(8936002)(106466001)(305945005)(8676002)(7906003)(75432002)(55456009); DIR:OUT; SFP:1101; SCL:1; SRVR:SN1PR0501MB2047; H:ome-vm-smtp2.campus.ksu.edu; FPR:; SPF:Pass; MLV:sfv; MX:1; A:1; LANG:en; X-Microsoft-Exchange-Diagnostics: 1; BL2NAM02FT044; 1:NH6RnNj+T4wkRW0TOTpG0r124MwsOh87KShcjrdSTufLAfUgAa+IwVurvj6bgjGTNZAPi6f2NXiXujzoIdrGX8AG9INQwcMvuSs9MLdoINTEkRmr9aAegScpqxbcS93vlWYNjQ3f3ELC2XKT15GHqMbniRiiInvpXUdvawJrk0x7P/HQccoR0nN3WeFrbGye7zME4Qaue1KuOjtYYaOMUV8jARAI2E1QZEq86/kAQ2UsL3hhsHx+/UpkAD4+hvv/Hbux5cNXd/9Bs2Mx50tIq+yNoTFUXnyjJYKifKaAvCGp5wVpOH3O6rbdbQQd8/fRebMOrCCav9IKvP4WhTIc+89VW35FSnoFD6v9G9YHVzfEeUWlRJv35a/VjuodVGK9Mnr7x0/lqqS7XrGPZYgm/RsfWLeCTT8HYHp6bwFAQ8A2dSZ/VyfnYcIsfVi093PMGlKDbVYRoI6r+nHpqTwp3LaoB9JfdtIJA0VbYm9w05i+VNhl3n2EJJzABKeExeNMtwAyBA2XyTaDxMG7vH/6/JO685HkpGTmt/xvXeXmfMQ= X-MS-Office365-Filtering-Correlation-Id: 420d314f-af14-47ab-f3c9-08d483c51772 X-Microsoft-Antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001)(8251501002)(2017030254075)(201703131423075)(201703031133081); 
Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Apr 2017 06:03:13 -0000

On Fri, Apr 14, 2017 at 1:55 PM, Kyle Evans wrote:

> On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans wrote:
>
>>
>> On the other hand, I think I could fairly easily implement most of these
>> into libc/regex. Here's a summary of what this option entails adding to
>> libc/regex, from what I've found:
>>
>> * Empty subexpressions(*)
>> * Add missing quantifiers to BREs: \?, \+
>> * Add branching to BREs: \|
>> * Add backreferences (\1 through \9) to EREs
>> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
>> [[:space:]], and [^[:space:]] respectively
>> * Add word boundaries and anchors:
>> ** \b: word boundary
>> ** \B: not word boundary
>> ** \<: Start of word
>> ** \>: End of word
>> ** \`: Start of subject string
>> ** \': End of subject string
>>
>> (*) I didn't actually find anything explicitly stating this as a GNU
>> extension, but it's certainly not conformant to POSIX specifications to
>> use, it gets used a tiny bit in some ports, and we implement a workaround
>> in bsdgrep(1) for the simplest case of empty expressions ("") to match
>> everything and produce zero length matches.
>>
>> The main benefit of this is not having to maintain a completely separate
>> regex parser and the potential for inconsistencies that come along with it.
>> The downside is that that would seem to promote expressions that are not
>> strictly POSIX conformant. Is this a problem? Is this a problem worth
>> worrying about?
>>
>>
> FYI- A patch showing what the implementation for all of the above into
> libc/regex looks like [1]. Some cleanup is still in order and the test set
> is not exhaustive, but this should implement all of the GNU extensions and
> it's at least functional.
>
> It will break some things (like one of the tests, for instance) that
> relied on being able to escape an ordinary character (e.g. \b) and get an
> ordinary character. This is specified as producing undefined behavior [2],
> though, so I don't feel terrible about breaking it.
>
> If this seems desirable, I can work on cleaning it up and splitting it
> into more consumable bites for FreeBSD's libc.
>
> Thanks,
>
> Kyle Evans
>
> [1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
> [2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03
>

An amended version of this patch can be found here:
https://files.kyle-evans.net/freebsd/libc-gnuext-2.diff

This one introduces a REG_POSIX flag for regcomp(3) that disables the GNU
extensions in favor of a more POSIX-conformant implementation, along with an
amendment to regex.3 to document said flag.

Instead of removing the tests that don't fail like they should under GNU
extensions, I've restored them, added a 'P' flag to specify REG_POSIX, and
marked the failing tests with it to clearly denote that they require a
stricter implementation.

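For illustration only, here is a rough C sketch of how a caller might opt in to the stricter behavior described above. REG_POSIX is the flag proposed in the patch, not something POSIX or a stock regex.h is known to provide, so the sketch only uses it when the header happens to define it. The names STRICT_FLAGS and strict_match are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * REG_POSIX is assumed to come from the proposed libc patch. Where the
 * header does not define it, fall back to plain POSIX ERE flags.
 */
#ifdef REG_POSIX
#define STRICT_FLAGS (REG_EXTENDED | REG_NOSUB | REG_POSIX)
#else
#define STRICT_FLAGS (REG_EXTENDED | REG_NOSUB)
#endif

/*
 * Hypothetical helper: compile `pattern` as an ERE with the strict flags
 * and report whether it matches anywhere in `subject`. A pattern that
 * fails to compile is treated as "no match".
 */
bool strict_match(const char *pattern, const char *subject)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, STRICT_FLAGS) != 0)
        return false;
    matched = (regexec(&re, subject, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Patterns written with bracket classes do not depend on any extension, so something like strict_match("(^|[^[:alnum:]])foo([^[:alnum:]]|$)", "a foo b") can stand in for \bfoo\b under the \w mapping listed earlier in the thread.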
Thanks, Kyle Evans From owner-freebsd-hackers@freebsd.org Sat Apr 15 16:18:10 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 47CE4D3FF79 for ; Sat, 15 Apr 2017 16:18:10 +0000 (UTC) (envelope-from bapt@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2610:1c1:1:6074::16:84]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "freefall.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 26FF8E2F; Sat, 15 Apr 2017 16:18:10 +0000 (UTC) (envelope-from bapt@FreeBSD.org) Received: by freefall.freebsd.org (Postfix, from userid 1235) id 4EE3E732D; Sat, 15 Apr 2017 16:18:09 +0000 (UTC) Date: Sat, 15 Apr 2017 18:18:08 +0200 From: Baptiste Daroussin To: Kyle Evans Cc: freebsd-hackers@freebsd.org, Pedro Giffuni , Ed Maste Subject: Re: Replacing libgnuregex Message-ID: <20170415161808.rqcq44qcfyrrrrdg@ivaldir.net> References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="zty5vwucofg7xgsw" Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170306 (1.8.0) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Apr 2017 16:18:10 -0000 --zty5vwucofg7xgsw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Apr 15, 2017 at 01:02:42AM -0500, Kyle Evans wrote: > On Fri, Apr 14, 2017 at 1:55 PM, Kyle Evans wrote: >=20 > > On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans wrote: > > > >> > >> On the other hand, I think I could fairly easily implement most of the= se > >> into libc/regex. 
Here's a summary of what this option entails adding to
> >> libc/regex, from what I've found:
> >>
> >> * Empty subexpressions(*)
> >> * Add missing quantifiers to BREs: \?, \+
> >> * Add branching to BREs: \|
> >> * Add backreferences (\1 through \9) to EREs
> >> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
> >> [[:space:]], and [^[:space:]] respectively
> >> * Add word boundaries and anchors:
> >> ** \b: word boundary
> >> ** \B: not word boundary
> >> ** \<: Start of word
> >> ** \>: End of word
> >> ** \`: Start of subject string
> >> ** \': End of subject string
> >>
> >> (*) I didn't actually find anything explicitly stating this as a GNU
> >> extension, but it's certainly not conformant to POSIX specifications to
> >> use, it gets used a tiny bit in some ports, and we implement a workaround
> >> in bsdgrep(1) for the simplest case of empty expressions ("") to match
> >> everything and produce zero length matches.
> >>
> >> The main benefit of this is not having to maintain a completely separate
> >> regex parser and the potential for inconsistencies that come along with it.
> >> The downside is that that would seem to promote expressions that are not
> >> strictly POSIX conformant. Is this a problem? Is this a problem worth
> >> worrying about?
> >>
> >>
> > FYI- A patch showing what the implementation for all of the above into
> > libc/regex looks like [1]. Some cleanup is still in order and the test set
> > is not exhaustive, but this should implement all of the GNU extensions and
> > it's at least functional.
> >
> > It will break some things (like one of the tests, for instance) that
> > relied on being able to escape an ordinary character (e.g. \b) and get an
> > ordinary character. This is specified as producing undefined behavior [2],
> > though, so I don't feel terrible about breaking it.
> >
> > If this seems desirable, I can work on cleaning it up and splitting it
> > into more consumable bites for FreeBSD's libc.
> >
> > Thanks,
> >
> > Kyle Evans
> >
> > [1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
> > [2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03
> >
>
> An amended version of this patch can be found here:
> https://files.kyle-evans.net/freebsd/libc-gnuext-2.diff
>
> This one introduces a REG_POSIX flag for regcomp(3) that removes the GNU
> extension for a more POSIX conformant implementation along with an
> amendment to regex.3 to document said flag.
>
> Instead of removing the tests that don't fail like they should under GNU
> extensions, I've restored them and added a 'P' flag to specify REG_POSIX
> and marked the failing tests as such to clearly denote that they require a
> more strict implementation.
>
> Thanks,
>

Thanks for working on this.

Just to follow up on this: Have you tested the results with the AT&T testsuite
for regex?

You can find it at least in the dragonfly source tree: https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/abce74f49c2c19b069= 958a0b48de0a9987d14e35 Or online I don't remember where :) another approach would be to import libtre + extension in our libc (like it= was done on dragonfly - it was actually a freebsd project that stalled) Best regards, Bapt --zty5vwucofg7xgsw Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEgOTj3suS2urGXVU3Y4mL3PG3PloFAljyR74ACgkQY4mL3PG3 Plp6JA/9HEeUfT4DYLJ9OcHaPwi/5tf54S9iOZD8waD7MtDdydtK9Hghn93rDN6q 4Cxkm1ab0qXnYfFCJqwg2o5jHvmP5RG1a1EkW4OGe0/QUluvVM2bitr7v5BC1IhI Ngrd3xZebLA6ce5KloSnuFxUWrT46CYlcKPWCwCOsXoP+tCRmEYdy5+fnVHACwlO PJtR9xGysEJmow+ZWWL6FByHfui/5Wz5hlztD5T72f8/Y4xYpHQ+HisRrTmRm8TA sxNMHkmffXmuq9wJZY+Pz10ucGkQzS2LjWYfKzN7UcHhqfpLS3GA0II1wqF9rowa RxdDTOl1SsGh5DxEkqP/hepuX5TItLL95G6N7zBmB2m+6qcWVGTINKw1CMT8wVng GeGQElR/lM3qlE8C+jj0uq0RLm33d+7weQle4oiPUScKPf6/CGwDuntHkiU8oe2+ yn8LdBNHjuXQcPkmVz34IWEnAo45ZCTuyK8ebJifjPjZEn3cSVS1TG3HARdF3QKJ e/2pWrwXaA7KXXeW5wA3HamJlcBCIbQ6DKwrKEyJUfavsjp4qmJ/sbE3ok7cM9qY oGLTJsI7YI1KdDneFiL32zzDmPv0uMj8pLTLwzvmVvzKiWw13yBweA96YEbx1+pf TPLUOLeYZhaDG9kkyZCVW9ZtSRzfupKfpC49yhsS9TQg65vdAWk= =oRT7 -----END PGP SIGNATURE----- --zty5vwucofg7xgsw-- From owner-freebsd-hackers@freebsd.org Sat Apr 15 16:31:31 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 07468D3F517 for ; Sat, 15 Apr 2017 16:31:31 +0000 (UTC) (envelope-from kevans91@ksu.edu) Received: from NAM02-BL2-obe.outbound.protection.outlook.com (mail-bl2nam02on0089.outbound.protection.outlook.com [104.47.38.89]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 66F0BCED; Sat, 15 Apr 
2017 16:31:29 +0000 (UTC) (envelope-from kevans91@ksu.edu) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ksu.edu; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=2m1amc3RjQSvfwWW8cxJ5VTZsOVAJtVPBWklaQ+hxA4=; b=RRfRvxkskL1Ns+v2YDKj5W5ZQ3Sk/BF8YFNexmFn7Mi1HteSNO8iQ6w6RtSUG9Mlz3gY5dqXYIPgUJwUf/aJ+tcKWZdxY/gmW94Hi6qSxZJM9F1V4IW0cIlJpQyWGTKRUz3nriUw18LbMDU7C87oTdN94A4dBhyQhQUc1OI+J7M= Received: from DM2PR0501CA0031.namprd05.prod.outlook.com (10.162.29.169) by BY2PR0501MB2038.namprd05.prod.outlook.com (10.163.197.25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1047.6; Sat, 15 Apr 2017 16:31:27 +0000 Received: from BL2NAM02FT062.eop-nam02.prod.protection.outlook.com (2a01:111:f400:7e46::205) by DM2PR0501CA0031.outlook.office365.com (2a01:111:e400:5148::41) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1047.6 via Frontend Transport; Sat, 15 Apr 2017 16:31:27 +0000 Authentication-Results: spf=pass (sender IP is 129.130.18.151) smtp.mailfrom=ksu.edu; freebsd.org; dkim=none (message not signed) header.d=none;freebsd.org; dmarc=bestguesspass action=none header.from=ksu.edu; Received-SPF: Pass (protection.outlook.com: domain of ksu.edu designates 129.130.18.151 as permitted sender) receiver=protection.outlook.com; client-ip=129.130.18.151; helo=ome-vm-smtp1.campus.ksu.edu; Received: from ome-vm-smtp1.campus.ksu.edu (129.130.18.151) by BL2NAM02FT062.mail.protection.outlook.com (10.152.77.57) with Microsoft SMTP Server id 15.1.1019.14 via Frontend Transport; Sat, 15 Apr 2017 16:31:26 +0000 Received: from calypso.engg.ksu.edu (calypso.engg.ksu.edu [129.130.43.181]) by ome-vm-smtp1.campus.ksu.edu (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id v3FGVQTs018767; Sat, 15 Apr 2017 11:31:26 -0500 Received: by calypso.engg.ksu.edu (Postfix, from userid 110) id 28ADC248304; Sat, 15 Apr 2017 11:31:26 -0500 (CDT) Received: from 
mail-wm0-f45.google.com (mail-wm0-f45.google.com [74.125.82.45]) by calypso.engg.ksu.edu (Postfix) with ESMTPA id CBE6F248302; Sat, 15 Apr 2017 11:31:23 -0500 (CDT) Received: by mail-wm0-f45.google.com with SMTP id y18so4221895wmh.0; Sat, 15 Apr 2017 09:31:23 -0700 (PDT) X-Gm-Message-State: AN3rC/6FUVbkyNl+Z4OJmc3q8Z3survvtGnYDU+5ouUdSoQewhWi4xyR 0H1HXbZgfKRmVPn3H6dNG1y3R0eTLQ== X-Received: by 10.28.181.69 with SMTP id e66mr2909977wmf.33.1492273882105; Sat, 15 Apr 2017 09:31:22 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.39.134 with HTTP; Sat, 15 Apr 2017 09:31:01 -0700 (PDT) In-Reply-To: <20170415161808.rqcq44qcfyrrrrdg@ivaldir.net> References: <20170415161808.rqcq44qcfyrrrrdg@ivaldir.net> From: Kyle Evans Date: Sat, 15 Apr 2017 11:31:01 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Replacing libgnuregex To: Baptiste Daroussin CC: Ed Maste , Pedro Giffuni , X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:129.130.18.151; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10009020)(979002)(39410400002)(39400400002)(39850400002)(39860400002)(39840400002)(39450400003)(2980300002)(438002)(24454002)(199003)(377454003)(189002)(46386002)(38730400002)(93886004)(305945005)(110136004)(189998001)(50986999)(6306002)(54356999)(76176999)(93516999)(2906002)(86362001)(8676002)(63696999)(6246003)(221733001)(575784001)(8936002)(606005)(356003)(9686003)(3480700004)(7906003)(45336002)(8576002)(498394004)(84326002)(229853002)(90966002)(54906002)(75432002)(55446002)(6916009)(61726006)(7116003)(5660300001)(2950100002)(512874002)(54206008)(4326008)(9896002)(42186005)(61266001)(450100002)(236005)(106466001)(88552002)(55456009)(969003)(989001)(999001)(1009001)(1019001); DIR:OUT; SFP:1101; SCL:1; SRVR:BY2PR0501MB2038; H:ome-vm-smtp1.campus.ksu.edu; FPR:; SPF:Pass; MLV:ovrnspm; A:1; MX:1; PTR:ip-18-151.net.ksu.edu; LANG:en; X-Microsoft-Exchange-Diagnostics: 1; BL2NAM02FT062; 
X-OriginatorOrg: ksu.edu
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Apr 2017 16:31:26.6020 (UTC)
X-MS-Exchange-CrossTenant-Id: d9a2fa71-d67d-4cb6-b541-06ccaa8013fb
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=d9a2fa71-d67d-4cb6-b541-06ccaa8013fb; Ip=[129.130.18.151]; Helo=[ome-vm-smtp1.campus.ksu.edu]
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY2PR0501MB2038
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.23
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
X-List-Received-Date: Sat, 15 Apr 2017 16:31:31 -0000

On Apr 15, 2017 11:18 AM, "Baptiste Daroussin" wrote:

On Sat, Apr 15, 2017 at 01:02:42AM -0500, Kyle Evans wrote:
> An amended version of this patch can be found here:
> https://files.kyle-evans.net/freebsd/libc-gnuext-2.diff
>
> This one introduces a REG_POSIX flag for regcomp(3) that removes the GNU
> extension for a more POSIX conformant implementation along with an
> amendment to regex.3 to document said flag.
>
> Instead of removing the tests that don't fail like they should under GNU
> extensions, I've restored them and added a 'P' flag to specify REG_POSIX
> and marked the failing tests as such to clearly denote that they require a
> more strict implementation.
>
> Thanks,
>

Thanks for working on this. Just to follow up on this:
Have you tested the results with the AT&T testsuite for regex?

You can find it at least in the dragonfly source tree:
https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/abce74f49c2c19b069958a0b48de0a9987d14e35

Or online, I don't remember where :)

Another approach would be to import libtre + extension in our libc (like it
was done on dragonfly - it was actually a freebsd project that stalled)

Best regards,
Bapt


Yup, we also have a copy of the AT&T test suite in tree
(contrib/netbsd-tests/lib/libc/regex/data/att). It passed that and the other
NetBSD tests, and I also ran the NetBSD sed and the gsed test suites using a
script provided by pfg@ to ensure no trivial breakage.

Has TRE improved over the years? It seems like we had a version around 2011
or so for bsdgrep that was quite rough. I'm not sure if that was heavily
modified or just in its infancy.

I think in either case, we might consider throwing errors for the bogus
escape sequences (anything that's not \<, \>, and backrefs for BREs) as an
intermediate step to stop *that* behavior, because that's going to be
problematic for many approaches.