Date: Tue, 14 Mar 2017 00:58:15 -0700 From: Mark Millard <markmi@dsl-only.net> To: freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org> Cc: Andrew Turner <andrew@fubar.geek.nz> Subject: Re: amd64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) Message-ID: <16B3D614-62E1-4E58-B409-8DB9DBB35BCB@dsl-only.net> In-Reply-To: <A82D1406-DB53-42CE-A41C-D984C9F5A1C9@dsl-only.net> References: <01735A68-FED6-4E63-964F-0820FE5C446C@dsl-only.net> <A82D1406-DB53-42CE-A41C-D984C9F5A1C9@dsl-only.net>
next in thread | previous in thread | raw e-mail | index | archive | help
[Another correction I'm afraid --about alternative program variations this time.] On 2017-Mar-13, at 11:52 PM, Mark Millard <markmi@dsl-only.net> wrote: > I'm still at a loss about how to figure out what stages are messed > up. (Memory coherency? Some memory not swapped out? Bad data swapped > out? Wrong data swapped in?) >=20 > But at least I've found a much smaller/simpler example to demonstrate > some problem with in my Pine64+_ 2GB context. >=20 > The Pine64+ 2GB is the only amd64 context that I have access to. Someday I'll learn to type arm64 the first time instead of amd64. > The following program fails its check for data > having its expected byte pattern in dynamically > allocated memory after a fork/swap-out/swap-in > sequence. >=20 > I'll note that the program sleeps for 60s after > forking to give time to do something else to > cause the parent and child processes to swap > out (RES=3D0 as seen in top). The following about the extra test_check() was wrong. > Note the source code line: >=20 > // test_check(); // Adding this line prevents failure. >=20 > It seem that accessing the region contents before forking > and swapping avoids the problem. But there is a problem > if the region was only written-to before the fork/swap. This was because I'd carelessly moved some loop variables to globals in a way that depended on the initialization of the globals and the extra call changed those values. I've noted code adjustments below (3 lines). I get the failures with them as well. > Another point is the size of the region matters: <=3D 14K Bytes > fails and > 14K Bytes works for as much has I have tested. >=20 >=20 > # more swap_testing.c > // swap_testing.c >=20 > // Built via (c++ was clang++ 4.0 in my case): > // > // cc -g -std=3Dc11 -Wpedantic swap_testing.c > // -O0 and -O2 also gets the problem. >=20 > #include <unistd.h> // for fork(), sleep(.) > #include <sys/types.h> // for pid_t > #include <sys/wait.h> // for wait(.) >=20 > extern void test_setup(void); // Sets up the memory byte pattern. > extern void test_check(void); // Tests the memory byte pattern. >=20 > int main(void) > { > test_setup(); test_check(); // This test passes. >=20 > pid_t pid =3D fork(); > int wait_status =3D 0;; >=20 > if (0<pid) { wait(&wait_status); } >=20 > if (-1!=3Dwait_status && 0<=3Dpid) > { > if (0=3D=3Dpid) > { > sleep(60); >=20 > // During this manually force this process to > // swap out. I use something like: >=20 > // stress -m 1 --vm-bytes 1800M >=20 > // in another shell and ^C'ing it after top > // shows the swapped status desired. 1800M > // just happened to work on the Pine64+ 2GB > // that I was using. > } >=20 > test_check(); > } > } >=20 > // The memory and test code follows. >=20 > #include <stdbool.h> // for bool, true, false > #include <stddef.h> // for size_t, NULL > #include <stdlib.h> // for malloc(.), free(.) >=20 > #include <signal.h> // for raise(.), SIGABRT >=20 > #define region_size (14u*1024u) > // Bad dyn_region pattern, parent and child > // processes: > // 256u, 4u*1024u, 8u*1024u, 9u*1024u, > // 12u*1024u, 14u*1024u >=20 > // Works: > // 14u*1024u+1u, 15u*1024u, 16u*1024u, > // 32u*1024u, 256u*1024u*1024u >=20 > typedef volatile unsigned char value_type; >=20 > struct region_struct { value_type array[region_size]; }; > typedef struct region_struct region; >=20 > static region gbl_region; > static region * volatile dyn_region =3D NULL; >=20 > static value_type value(size_t v) { return (value_type)v; } >=20 > void test_setup(void) { > dyn_region =3D malloc(sizeof(region)); > if (!dyn_region) raise(SIGABRT); >=20 > for(size_t i=3D0u; i<region_size; i++) { > (*dyn_region).array[i] =3D gbl_region.array[i] =3D value(i); > } > } >=20 > static volatile bool gbl_failed =3D false; // Until potentially = disproved > static volatile size_t gbl_pos =3D 0u; >=20 > static volatile bool dyn_failed =3D false; // Until potentially = disproved > static volatile size_t dyn_pos =3D 0u; >=20 > void test_check(void) { gbl_pos =3D 0u; > while (!gbl_failed && gbl_pos<region_size) { > gbl_failed =3D (value(gbl_pos) !=3D gbl_region.array[gbl_pos]); > gbl_pos++; > } >=20 dyn_pos =3D 0u; > while (!dyn_failed && dyn_pos<region_size) { > dyn_failed =3D (value(dyn_pos) !=3D = (*dyn_region).array[dyn_pos]); > // Note: When the memory pattern fails this case is that > // records the failure. > dyn_pos++; > } >=20 > if (gbl_failed) raise(SIGABRT); > if (dyn_failed) raise(SIGABRT); // lldb reports this line for the = __raise call. > // when it fails (both parent and = child processes). > } I'm not bothering to redo the details below for the line number variations. > Other details from lldb (not using -O2 so things are > simpler, not presented in the order examined): >=20 > # lldb a.out -c /var/crash/a.out.11575.core > (lldb) target create "a.out" --core "/var/crash/a.out.11575.core" > Core file '/var/crash/a.out.11575.core' (aarch64) was loaded. > (lldb) bt > * thread #1, name =3D 'a.out', stop reason =3D signal SIGABRT > * frame #0: 0x0000000040113d38 libc.so.7`_thr_kill + 8 > frame #1: libc.so.7`__raise(s=3D6) at raise.c:52 > frame #2: a.out`test_check at swap_testing.c:103 > frame #3: a.out`main at swap_testing.c:42 > frame #4: 0x0000000000020184 a.out`__start + 364 > frame #5: ld-elf.so.1`.rtld_start at rtld_start.S:41 >=20 > (lldb) up 2 > frame #2: a.out`test_check at swap_testing.c:103 > 100 } > 101 =09 > 102 if (gbl_failed) raise(SIGABRT); > -> 103 if (dyn_failed) raise(SIGABRT); // lldb reports this = line for the __raise call. > 104 // when it fails (both = parent and child processes). > 105 } >=20 > (lldb) print dyn_pos > (size_t) $0 =3D 2 >=20 > (That is one after the failure position.) >=20 >=20 > (lldb) print dyn_region > (region *volatile) $3 =3D 0x0000000040616000 >=20 > (lldb) print *dyn_region > (region) $1 =3D { > array =3D { > [0] =3D '\0' > [1] =3D '\0' > [2] =3D '\0' > . . . (all '\0' bytes) . . . > [251] =3D '\0' > [252] =3D '\0' > [253] =3D '\0' > [254] =3D '\0' > [255] =3D '\0' > ... > } > } >=20 > (lldb) print gbl_region > (region) $2 =3D { > array =3D { > [0] =3D '\0' > [1] =3D '\x01' > [2] =3D '\x02' > . . . > [251] =3D '\xfb' > [252] =3D '\xfc' > [253] =3D '\xfd' > [254] =3D '\xfe' > [255] =3D '\xff' > ... > } > } >=20 > (lldb) disass -n main > a.out`main: > 0x2022c <+0>: sub sp, sp, #0x30 ; =3D0x30=20 > 0x20230 <+4>: stp x29, x30, [sp, #0x20] > 0x20234 <+8>: add x29, sp, #0x20 ; =3D0x20=20 > 0x20238 <+12>: stur wzr, [x29, #-0x4] > 0x2023c <+16>: bl 0x202b0 ; test_setup at = swap_testing.c:74 > 0x20240 <+20>: bl 0x20580 ; symbol stub for: = fork > 0x20244 <+24>: mov w8, wzr > 0x20248 <+28>: stur w0, [x29, #-0x8] > 0x2024c <+32>: stur wzr, [x29, #-0xc] > 0x20250 <+36>: ldur w0, [x29, #-0x8] > 0x20254 <+40>: cmp w8, w0 > 0x20258 <+44>: b.ge 0x20268 ; <+60> at = swap_testing.c > 0x2025c <+48>: sub x0, x29, #0xc ; =3D0xc=20 > 0x20260 <+52>: bl 0x20590 ; symbol stub for: = wait > 0x20264 <+56>: str w0, [sp, #0x10] > 0x20268 <+60>: mov w8, #-0x1 > 0x2026c <+64>: ldur w9, [x29, #-0xc] > 0x20270 <+68>: cmp w8, w9 > 0x20274 <+72>: b.eq 0x202a0 ; <+116> at = swap_testing.c:44 > 0x20278 <+76>: mov w8, wzr > 0x2027c <+80>: ldur w9, [x29, #-0x8] > 0x20280 <+84>: cmp w8, w9 > 0x20284 <+88>: b.gt 0x202a0 ; <+116> at = swap_testing.c:44 > 0x20288 <+92>: ldur w8, [x29, #-0x8] > 0x2028c <+96>: cbnz w8, 0x2029c ; <+112> at = swap_testing.c:42 > 0x20290 <+100>: orr w0, wzr, #0x3c > 0x20294 <+104>: bl 0x205a0 ; symbol stub for: = sleep > 0x20298 <+108>: str w0, [sp, #0xc] > 0x2029c <+112>: bl 0x20348 ; test_check at = swap_testing.c:89 > 0x202a0 <+116>: ldur w0, [x29, #-0x4] > 0x202a4 <+120>: ldp x29, x30, [sp, #0x20] > 0x202a8 <+124>: add sp, sp, #0x30 ; =3D0x30=20 > 0x202ac <+128>: ret =20 >=20 > (lldb) disass -n value > a.out`value: > 0x204cc <+0>: sub sp, sp, #0x10 ; =3D0x10=20 > 0x204d0 <+4>: str x0, [sp, #0x8] > 0x204d4 <+8>: ldrb w8, [sp, #0x8] > 0x204d8 <+12>: mov w1, w8 > 0x204dc <+16>: mov w0, w8 > 0x204e0 <+20>: str w1, [sp, #0x4] > 0x204e4 <+24>: add sp, sp, #0x10 ; =3D0x10=20 > 0x204e8 <+28>: ret =20 >=20 > (lldb) disass -n test_setup > a.out`test_setup: > 0x202b0 <+0>: sub sp, sp, #0x20 ; =3D0x20=20 > 0x202b4 <+4>: stp x29, x30, [sp, #0x10] > 0x202b8 <+8>: add x29, sp, #0x10 ; =3D0x10=20 > 0x202bc <+12>: orr x0, xzr, #0x3800 > 0x202c0 <+16>: bl 0x205b0 ; symbol stub for: = malloc > 0x202c4 <+20>: adrp x30, 48 > 0x202c8 <+24>: add x30, x30, #0x0 ; =3D0x0=20 > 0x202cc <+28>: str x0, [x30] > 0x202d0 <+32>: ldr x0, [x30] > 0x202d4 <+36>: cbnz x0, 0x202e4 ; <+52> at = swap_testing.c:78 > 0x202d8 <+40>: orr w0, wzr, #0x6 > 0x202dc <+44>: bl 0x205c0 ; symbol stub for: = raise > 0x202e0 <+48>: str w0, [sp, #0x4] > 0x202e4 <+52>: str xzr, [sp, #0x8] > 0x202e8 <+56>: orr x8, xzr, #0x3800 > 0x202ec <+60>: ldr x9, [sp, #0x8] > 0x202f0 <+64>: cmp x9, x8 > 0x202f4 <+68>: b.hs 0x2033c ; <+140> at = swap_testing.c:81 > 0x202f8 <+72>: ldr x0, [sp, #0x8] > 0x202fc <+76>: bl 0x204cc ; value at = swap_testing.c:72 > 0x20300 <+80>: adrp x30, 48 > 0x20304 <+84>: add x30, x30, #0x0 ; =3D0x0=20 > 0x20308 <+88>: adrp x8, 48 > 0x2030c <+92>: add x8, x8, #0x8 ; =3D0x8=20 > 0x20310 <+96>: ldr x9, [sp, #0x8] > 0x20314 <+100>: add x8, x8, x9 > 0x20318 <+104>: strb w0, [x8] > 0x2031c <+108>: ldr x8, [x30] > 0x20320 <+112>: ldr x9, [sp, #0x8] > 0x20324 <+116>: add x8, x8, x9 > 0x20328 <+120>: strb w0, [x8] > 0x2032c <+124>: ldr x8, [sp, #0x8] > 0x20330 <+128>: add x8, x8, #0x1 ; =3D0x1=20 > 0x20334 <+132>: str x8, [sp, #0x8] > 0x20338 <+136>: b 0x202e8 ; <+56> at = swap_testing.c > 0x2033c <+140>: ldp x29, x30, [sp, #0x10] > 0x20340 <+144>: add sp, sp, #0x20 ; =3D0x20=20 > 0x20344 <+148>: ret =20 >=20 > (lldb) disass -n test_check > a.out`test_check: > 0x20348 <+0>: sub sp, sp, #0x20 ; =3D0x20=20 > 0x2034c <+4>: stp x29, x30, [sp, #0x10] > 0x20350 <+8>: add x29, sp, #0x10 ; =3D0x10=20 > 0x20354 <+12>: b 0x20358 ; <+16> at = swap_testing.c > 0x20358 <+16>: mov w8, wzr > 0x2035c <+20>: adrp x9, 51 > 0x20360 <+24>: add x9, x9, #0x808 ; =3D0x808=20 > 0x20364 <+28>: ldrb w10, [x9] > 0x20368 <+32>: stur w8, [x29, #-0x4] > 0x2036c <+36>: tbnz w10, #0x0, 0x2038c ; <+68> at = swap_testing.c > 0x20370 <+40>: orr x8, xzr, #0x3800 > 0x20374 <+44>: adrp x9, 51 > 0x20378 <+48>: add x9, x9, #0x810 ; =3D0x810=20 > 0x2037c <+52>: ldr x9, [x9] > 0x20380 <+56>: cmp x9, x8 > 0x20384 <+60>: cset w10, lo > 0x20388 <+64>: stur w10, [x29, #-0x4] > 0x2038c <+68>: ldur w8, [x29, #-0x4] > 0x20390 <+72>: tbz w8, #0x0, 0x203ec ; <+164> at = swap_testing.c:95 > 0x20394 <+76>: adrp x8, 51 > 0x20398 <+80>: add x8, x8, #0x810 ; =3D0x810=20 > 0x2039c <+84>: ldr x0, [x8] > 0x203a0 <+88>: bl 0x204cc ; value at = swap_testing.c:72 > 0x203a4 <+92>: adrp x8, 51 > 0x203a8 <+96>: add x8, x8, #0x810 ; =3D0x810=20 > 0x203ac <+100>: adrp x30, 51 > 0x203b0 <+104>: add x30, x30, #0x808 ; =3D0x808=20 > 0x203b4 <+108>: adrp x9, 48 > 0x203b8 <+112>: add x9, x9, #0x8 ; =3D0x8=20 > 0x203bc <+116>: uxtb w0, w0 > 0x203c0 <+120>: ldr x10, [x8] > 0x203c4 <+124>: add x9, x9, x10 > 0x203c8 <+128>: ldrb w11, [x9] > 0x203cc <+132>: cmp w0, w11 > 0x203d0 <+136>: cset w11, ne > 0x203d4 <+140>: and w11, w11, #0x1 > 0x203d8 <+144>: strb w11, [x30] > 0x203dc <+148>: ldr x9, [x8] > 0x203e0 <+152>: add x9, x9, #0x1 ; =3D0x1=20 > 0x203e4 <+156>: str x9, [x8] > 0x203e8 <+160>: b 0x20358 ; <+16> at = swap_testing.c > 0x203ec <+164>: b 0x203f0 ; <+168> at = swap_testing.c > 0x203f0 <+168>: mov w8, wzr > 0x203f4 <+172>: adrp x9, 51 > 0x203f8 <+176>: add x9, x9, #0x818 ; =3D0x818=20 > 0x203fc <+180>: ldrb w10, [x9] > 0x20400 <+184>: str w8, [sp, #0x8] > 0x20404 <+188>: tbnz w10, #0x0, 0x20424 ; <+220> at = swap_testing.c > 0x20408 <+192>: orr x8, xzr, #0x3800 > 0x2040c <+196>: adrp x9, 51 > 0x20410 <+200>: add x9, x9, #0x820 ; =3D0x820=20 > 0x20414 <+204>: ldr x9, [x9] > 0x20418 <+208>: cmp x9, x8 > 0x2041c <+212>: cset w10, lo > 0x20420 <+216>: str w10, [sp, #0x8] > 0x20424 <+220>: ldr w8, [sp, #0x8] > 0x20428 <+224>: tbz w8, #0x0, 0x20488 ; <+320> at = swap_testing.c > 0x2042c <+228>: adrp x8, 51 > 0x20430 <+232>: add x8, x8, #0x820 ; =3D0x820=20 > 0x20434 <+236>: ldr x0, [x8] > 0x20438 <+240>: bl 0x204cc ; value at = swap_testing.c:72 > 0x2043c <+244>: adrp x8, 51 > 0x20440 <+248>: add x8, x8, #0x820 ; =3D0x820=20 > 0x20444 <+252>: adrp x30, 51 > 0x20448 <+256>: add x30, x30, #0x818 ; =3D0x818=20 > 0x2044c <+260>: adrp x9, 48 > 0x20450 <+264>: add x9, x9, #0x0 ; =3D0x0=20 > 0x20454 <+268>: uxtb w0, w0 > 0x20458 <+272>: ldr x9, [x9] > 0x2045c <+276>: ldr x10, [x8] > 0x20460 <+280>: add x9, x9, x10 > 0x20464 <+284>: ldrb w11, [x9] > 0x20468 <+288>: cmp w0, w11 > 0x2046c <+292>: cset w11, ne > 0x20470 <+296>: and w11, w11, #0x1 > 0x20474 <+300>: strb w11, [x30] > 0x20478 <+304>: ldr x9, [x8] > 0x2047c <+308>: add x9, x9, #0x1 ; =3D0x1=20 > 0x20480 <+312>: str x9, [x8] > 0x20484 <+316>: b 0x203f0 ; <+168> at = swap_testing.c > 0x20488 <+320>: adrp x8, 51 > 0x2048c <+324>: add x8, x8, #0x808 ; =3D0x808=20 > 0x20490 <+328>: ldrb w9, [x8] > 0x20494 <+332>: tbz w9, #0x0, 0x204a4 ; <+348> at = swap_testing.c > 0x20498 <+336>: orr w0, wzr, #0x6 > 0x2049c <+340>: bl 0x205c0 ; symbol stub for: = raise > 0x204a0 <+344>: str w0, [sp, #0x4] > 0x204a4 <+348>: adrp x8, 51 > 0x204a8 <+352>: add x8, x8, #0x818 ; =3D0x818=20 > 0x204ac <+356>: ldrb w9, [x8] > 0x204b0 <+360>: tbz w9, #0x0, 0x204c0 ; <+376> at = swap_testing.c:105 > 0x204b4 <+364>: orr w0, wzr, #0x6 > 0x204b8 <+368>: bl 0x205c0 ; symbol stub for: = raise > -> 0x204bc <+372>: str w0, [sp] > 0x204c0 <+376>: ldp x29, x30, [sp, #0x10] > 0x204c4 <+380>: add sp, sp, #0x20 ; =3D0x20=20 > 0x204c8 <+384>: ret =20 >=20 > # uname -apKU > FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r314638M arm64 = aarch64 1200023 1200023 >=20 > buildworld buildlkernel did not have MALLOC_PRODUCTION=3D defined. The = kernel is a > non-debug kernel. (Previous to these experiments my other corruption = examples > were not caught by a debug kernel. I'm not hopeful that this simpler = context > would either.) =3D=3D=3D Mark Millard markmi at dsl-only.net
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?16B3D614-62E1-4E58-B409-8DB9DBB35BCB>