Date: Tue, 1 Jan 2019 19:37:35 -0800 From: Mark Millard <marklmi@yahoo.com> To: freebsd-emulation@freebsd.org, ports-list freebsd <freebsd-ports@freebsd.org> Cc: freebsd-arm <freebsd-arm@freebsd.org> Subject: Re: qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 [problem possibly found] Message-ID: <B68A2CBC-7B81-47B9-A997-2A78C24F67FC@yahoo.com> In-Reply-To: <51834F45-1EB7-449F-A0EC-7F5EC7CBB12F@yahoo.com> References: <51834F45-1EB7-449F-A0EC-7F5EC7CBB12F@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
[It looks to me like the assembler code has some code moved out of the loop that it should not be (by intent). It appears that calling tcmpset_32 does not prevent code motion to before the call and the variable involved was not declared volatile.] On 2019-Jan-1, at 18:43, Mark Millard <marklmi at yahoo.com> wrote: > The below showed up for poudiere-devel bulk getting stuck using one = FreeBSD > cpu while building graphics/poppler-qt5 . This is not a kevent = hang-up, unlike > the last hang-up that I analyzed. I do not yet know how repeatable = this is > but the original hang-up and the one experiment the below is from. >=20 > =46rom top: >=20 > PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME = CPU COMMAND > 12789 root 4 52 0 166M 33M 0 uwait 6 36:06 = 97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E = cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0 >=20 > Note: The vast margority of the 36:06 has been stuck in the uwait loop = involved. >=20 > =46rom ps -auxd: >=20 > root 94075 0.0 0.0 12932 3552 1 S+ 10:42 = 0:01.21 | `-- sh -e /usr/local/share/poudriere/bulk.sh = -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5 > root 1944 0.0 0.0 12932 3540 1 I+ 10:42 = 0:00.00 | |-- sh -e /usr/local/share/poudriere/bulk.sh = -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5 > root 1957 0.0 0.0 12932 3556 1 I 10:42 = 0:00.04 | |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: = build_pkg (poppler-qt5-0.72.0) (sh) > root 12328 0.0 0.0 12932 3548 1 I 10:49 = 0:00.00 | | `-- sh: = poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) = (sh) > root 12329 0.0 0.0 10328 1756 1 IJ 10:49 = 0:00.01 | | `-- /usr/bin/make -C = /usr/ports/graphics/poppler-qt5 stage > root 12350 0.0 0.0 9860 1248 1 IJ 10:49 = 0:00.00 | | `-- /usr/bin/make -f Makefile = DESTDIR=3D/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install > root 12353 0.0 0.0 10236 1664 1 IJ 10:49 = 0:00.05 | | `-- /nxb-bin/usr/bin/make -f = CMakeFiles/Makefile2 qt5/all > root 12787 0.0 0.0 9856 1236 1 IJ 10:50 = 0:00.00 | | `-- /nxb-bin/usr/bin/make -f = qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make = qt5/test > root 12789 100.0 0.0 169868 33528 1 IJ 10:50 = 36:35.26 | | `-- /usr/local/bin/qemu-arm-static = /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/ > root 94423 0.0 0.0 12932 3484 1 S+ 10:42 = 0:12.91 | `-- sh -e /usr/local/share/poudriere/bulk.sh = -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5 >=20 >=20 > (gdb) attach 12789 > Attaching to process 12789 > Reading symbols from = /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qe= mu-arm-static...done. > [New LWP 101168 of process 12789] > [New LWP 101178 of process 12789] > [New LWP 101499 of process 12789] > [Switching to LWP 100304 of process 12789] > _umtx_op () at _umtx_op.S:3 > 3 RSYSCALL(_umtx_op) > (gdb) info threads > Id Target Id Frame=20 > * 1 LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3 > 2 LWP 101168 of process 12789 _umtx_op_err () at = /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > 3 LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3 > 4 LWP 101499 of process 12789 0x0000000060051c26 in = atomic_cmpset_int (dst=3D<optimized out>, expect=3D<optimized out>, = src=3D536870912) at /usr/include/machine/atomic.h:220 > (gdb) thread 4 > [Switching to thread 4 (LWP 101499 of process 12789)] > #0 0x0000000060051c26 in atomic_cmpset_int (dst=3D<optimized out>, = expect=3D<optimized out>, src=3D536870912) at = /usr/include/machine/atomic.h:220 > 220 ATOMIC_CMPSET(int); >=20 > (gdb) bt > #0 0x0000000060051c26 in atomic_cmpset_int (dst=3D<optimized out>, = expect=3D<optimized out>, src=3D536870912) at = /usr/include/machine/atomic.h:220 > #1 tcmpset_32 (addr=3D<optimized out>, a=3D<optimized out>, = b=3D536870912) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/freebsd/os-thread.c:178 > #2 freebsd_rw_unlock (target_addr=3D4108246528) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/freebsd/os-thread.c:1264 > #3 0x000000006004ab33 in do_freebsd__umtx_op (obj=3D<optimized out>, = op=3D536870912, val=3D<optimized out>, uaddr=3D<optimized out>, = target_time=3D<optimized out>) > at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/freebsd/os-thread.h:474 > #4 0x0000000060041b83 in do_freebsd_syscall (cpu_env=3D0x86159b118, = num=3D454, arg1=3D<optimized out>, arg2=3D<optimized out>, = arg3=3D<optimized out>, arg4=3D0, arg5=3D0, arg6=3D-184411592, = arg7=3D-199471616,=20 > arg8=3D-1622188640) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/syscall.c:1364 > #5 0x00000000600392f0 in target_cpu_loop (env=3D0x86159b118) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/arm/target_arch_cpu.h:207 > #6 0x0000000060038c99 in cpu_loop (env=3D0xf4dede80) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/main.c:121 > #7 0x0000000060050c1a in new_freebsd_thread_start (arg=3D<optimized = out>) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/freebsd/os-thread.c:152 > #8 0x00000000601ad5f6 in thread_start (curthread=3D0x861571300) at = /usr/src/lib/libthr/thread/thr_create.c:291 > #9 0x0000000000000000 in ?? () > Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000 >=20 > 181 #define ATOMIC_CMPSET(TYPE) \ > 182 static __inline int \ > 183 atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, = u_##TYPE src) \ > 184 { \ > (gdb)=20 > 185 u_char res; \ > 186 \ > 187 __asm __volatile( \ > 188 " " MPLOCKED " " \ > 189 " cmpxchg %3,%1 ; " \ > 190 " sete %0 ; " \ > 191 "# atomic_cmpset_" #TYPE " " \ > 192 : "=3Dq" (res), /* 0 */ \ > 193 "+m" (*dst), /* 1 */ \ > 194 "+a" (expect) /* 2 */ \ > (gdb)=20 > 195 : "r" (src) /* 3 */ \ > 196 : "memory", "cc"); \ > 197 return (res); \ > 198 } \ > . . . > 218 ATOMIC_CMPSET(char); > 219 ATOMIC_CMPSET(short); > 220 ATOMIC_CMPSET(int); > 221 ATOMIC_CMPSET(long); >=20 > (gdb) up > #1 tcmpset_32 (addr=3D<optimized out>, a=3D<optimized out>, = b=3D536870912) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/freebsd/os-thread.c:178 > 178 return atomic_cmpset_acq_32(addr, current, new); > (gdb) list > 173 static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b) > 174 { > 175 uint32_t current =3D tswap32(a); > 176 uint32_t new =3D tswap32(b); > 177=09 > 178 return atomic_cmpset_acq_32(addr, current, new); > 179 } >=20 > (gdb) up > #2 freebsd_rw_unlock (target_addr=3D4108246528) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/freebsd/os-thread.c:1264 > 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, = (state - 1))) { > (gdb) list > 1259 } > 1260 } > 1261 } else if (TARGET_URWLOCK_READER_COUNT(state) !=3D 0) { > 1262 /* decrement reader count */ > 1263 for (;;) { > 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, = (state - 1))) { > 1265 if (TARGET_URWLOCK_READER_COUNT(state) =3D=3D 0) = { > 1266 unlock_user_struct(target_urwlock, > 1267 target_addr, 1); > 1268 return -TARGET_EPERM; > (gdb)=20 > 1269 } > 1270 } else { > 1271 break; > 1272 } > 1273 } > 1274 } else { > 1275 unlock_user_struct(target_urwlock, target_addr, 1); > 1276 return -TARGET_EPERM; > 1277 } > 1278=09 >=20 > The above loop is where the code is stuck looping, using most of the = capacity of > one cpu (as FreeBSD classifies cpus). The value of state is stuck at: >=20 > (gdb) print state > $2 =3D 536870913 >=20 > I'll note that the loop is inside qemu-arm-atatic, not in the cmake = that it > is running. >=20 >=20 >=20 > To show a little more context before the assembler for the routine = that ends up > containing that loop: >=20 > (gdb) up > #3 0x000000006004ab33 in do_freebsd__umtx_op (obj=3D<optimized out>, = op=3D536870912, val=3D<optimized out>, uaddr=3D<optimized out>, = target_time=3D<optimized out>) > at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/freebsd/os-thread.h:474 > 474 ret =3D freebsd_rw_unlock(obj); > (gdb) list > 469 ret =3D freebsd_rw_wrlock(obj, val, 0, NULL); > 470 } > 471 break; > 472=09 > 473 case TARGET_UMTX_OP_RW_UNLOCK: > 474 ret =3D freebsd_rw_unlock(obj); > 475 break; > 476=09 > 477 #if defined(__FreeBSD_version) && __FreeBSD_version > 900000 > 478 #ifdef UMTX_OP_MUTEX_WAKE2 >=20 > (gdb) up > #4 0x0000000060041b83 in do_freebsd_syscall (cpu_env=3D0x86159b118, = num=3D454, arg1=3D<optimized out>, arg2=3D<optimized out>, = arg3=3D<optimized out>, arg4=3D0, arg5=3D0, arg6=3D-184411592, = arg7=3D-199471616,=20 > arg8=3D-1622188640) at = /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/b= sd-user/syscall.c:1364 > 1364 ret =3D do_freebsd__umtx_op(arg1, arg2, arg3, arg4, = arg5); > (gdb) list > 1359 ret =3D do_freebsd__umtx_unlock(arg1); > 1360 break; > 1361 #endif > 1362=09 > 1363 case TARGET_FREEBSD_NR__umtx_op: /* undocumented */ > 1364 ret =3D do_freebsd__umtx_op(arg1, arg2, arg3, arg4, = arg5); > 1365 break; > 1366=09 > 1367 /* > 1368 * ioctl(2) >=20 > As for the assembler level view for the routine containing the > loop (via inlining and such): >=20 > Dump of assembler code for function freebsd_rw_unlock: > 0x0000000060051bd0 <+0>: push %rbp > 0x0000000060051bd1 <+1>: mov %rsp,%rbp > 0x0000000060051bd4 <+4>: push %rbx > 0x0000000060051bd5 <+5>: push %rax > 0x0000000060051bd6 <+6>: mov %edi,%ebx > 0x0000000060051bd8 <+8>: mov $0x20,%esi > 0x0000000060051bdd <+13>: mov $0x3,%edx > 0x0000000060051be2 <+18>: callq 0x60036b70 <page_check_range> > 0x0000000060051be7 <+23>: mov %eax,%ecx > 0x0000000060051be9 <+25>: mov $0xfffffff2,%eax > 0x0000000060051bee <+30>: test %ecx,%ecx > 0x0000000060051bf0 <+32>: jne 0x60051c50 = <freebsd_rw_unlock+128> > 0x0000000060051bf2 <+34>: mov %ebx,%edi > 0x0000000060051bf4 <+36>: add 0x26a6355(%rip),%rdi # = 0x626f7f50 <guest_base> > 0x0000000060051bfb <+43>: je 0x60051c50 = <freebsd_rw_unlock+128> > 0x0000000060051bfd <+45>: mov (%rdi),%ecx > 0x0000000060051bff <+47>: mov 0x4(%rdi),%edx > 0x0000000060051c02 <+50>: test %ecx,%ecx > 0x0000000060051c04 <+52>: js 0x60051c30 <freebsd_rw_unlock+96> > 0x0000000060051c06 <+54>: test $0x1fffffff,%ecx > 0x0000000060051c0c <+60>: je 0x60051c4b = <freebsd_rw_unlock+123> > 0x0000000060051c0e <+62>: lea -0x1(%rcx),%esi > 0x0000000060051c11 <+65>: nopw %cs:0x0(%rax,%rax,1) > 0x0000000060051c1b <+75>: nopl 0x0(%rax,%rax,1) > NOTE: loop is below: > 0x0000000060051c20 <+80>: mov %ecx,%eax > =3D> 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi) > 0x0000000060051c26 <+86>: sete %bl > 0x0000000060051c29 <+89>: test %bl,%bl > 0x0000000060051c2b <+91>: je 0x60051c20 <freebsd_rw_unlock+80> > Note: loop is above > 0x0000000060051c2d <+93>: jmp 0x60051c57 = <freebsd_rw_unlock+135> > 0x0000000060051c2f <+95>: nop > 0x0000000060051c30 <+96>: mov %ecx,%esi > 0x0000000060051c32 <+98>: and $0x7fffffff,%esi > 0x0000000060051c38 <+104>: mov %ecx,%eax > 0x0000000060051c3a <+106>: lock cmpxchg %esi,(%rdi) > 0x0000000060051c3e <+110>: sete %bl > 0x0000000060051c41 <+113>: test %bl,%bl > 0x0000000060051c43 <+115>: jne 0x60051c57 = <freebsd_rw_unlock+135> > 0x0000000060051c45 <+117>: mov (%rdi),%ecx > 0x0000000060051c47 <+119>: test %ecx,%ecx > 0x0000000060051c49 <+121>: js 0x60051c30 <freebsd_rw_unlock+96> > 0x0000000060051c4b <+123>: mov $0xffffffff,%eax > 0x0000000060051c50 <+128>: add $0x8,%rsp > 0x0000000060051c54 <+132>: pop %rbx > 0x0000000060051c55 <+133>: pop %rbp > 0x0000000060051c56 <+134>: retq =20 > 0x0000000060051c57 <+135>: test $0x2,%dl > 0x0000000060051c5a <+138>: jne 0x60051c70 = <freebsd_rw_unlock+160> > 0x0000000060051c5c <+140>: test $0x40000000,%ecx > 0x0000000060051c62 <+146>: jne 0x60051c82 = <freebsd_rw_unlock+178> > 0x0000000060051c64 <+148>: xor %eax,%eax > 0x0000000060051c66 <+150>: test $0x20000000,%ecx > 0x0000000060051c6c <+156>: je 0x60051c50 = <freebsd_rw_unlock+128> > 0x0000000060051c6e <+158>: jmp 0x60051c89 = <freebsd_rw_unlock+185> > 0x0000000060051c70 <+160>: test $0x20000000,%ecx > 0x0000000060051c76 <+166>: jne 0x60051c89 = <freebsd_rw_unlock+185> > 0x0000000060051c78 <+168>: xor %eax,%eax > 0x0000000060051c7a <+170>: test $0x40000000,%ecx > 0x0000000060051c80 <+176>: je 0x60051c50 = <freebsd_rw_unlock+128> > 0x0000000060051c82 <+178>: mov $0x1,%edx > 0x0000000060051c87 <+183>: jmp 0x60051c8e = <freebsd_rw_unlock+190> > 0x0000000060051c89 <+185>: mov $0x7fffffff,%edx > 0x0000000060051c8e <+190>: mov $0x3,%esi > 0x0000000060051c93 <+195>: xor %ecx,%ecx > 0x0000000060051c95 <+197>: xor %r8d,%r8d > 0x0000000060051c98 <+200>: callq 0x601b5970 <_umtx_op> > 0x0000000060051c9d <+205>: mov %eax,%edi > 0x0000000060051c9f <+207>: add $0x8,%rsp > 0x0000000060051ca3 <+211>: pop %rbx > 0x0000000060051ca4 <+212>: pop %rbp > --Type <RET> for more, q to quit, c to continue without paging-- > 0x0000000060051ca5 <+213>: jmpq 0x60040a40 <get_errno> > End of assembler dump. >=20 > For just before the "lock cmpxchg %esi,(%rdi)" in the loop: >=20 > rax 0x20000001 536870913 > rbx 0xf4dede00 4108246528 > rcx 0x20000001 536870913 > rdx 0x2 2 > rsi 0x20000000 536870912 > rdi 0xf4dede80 4108246656 >=20 > (gdb) x 0xf4dede80 > 0xf4dede80: 0x40000001 >=20 > 0x0000000060051c20 <+80>: mov %ecx,%eax > =3D> 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi) > 0x0000000060051c26 <+86>: sete %bl > 0x0000000060051c29 <+89>: test %bl,%bl > 0x0000000060051c2b <+91>: je 0x60051c20 <freebsd_rw_unlock+80> >=20 >=20 > At this point I do not have interpretation of the details, not even > a comparison to the source code. In the source code there is: 1261 } else if (TARGET_URWLOCK_READER_COUNT(state) !=3D 0) { 1262 /* decrement reader count */ 1263 for (;;) { 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, = (state - 1))) { 1265 if (TARGET_URWLOCK_READER_COUNT(state) =3D=3D 0) = { 1266 unlock_user_struct(target_urwlock, 1267 target_addr, 1); 1268 return -TARGET_EPERM; (gdb)=20 1269 } 1270 } else { 1271 break; 1272 } 1273 } But in the assembler code it appears to me that the code: 1265 if (TARGET_URWLOCK_READER_COUNT(state) =3D=3D 0) = { 1266 unlock_user_struct(target_urwlock, 1267 target_addr, 1); 1268 return -TARGET_EPERM; (gdb)=20 1269 } was factored out of the loop, with the test moved to before the loop. The TARGET_URWLOCK_READER_COUNT use masks off based on 0x1fffffff . But the code for the conditional test is before the loop: 0x0000000060051c06 <+54>: test $0x1fffffff,%ecx 0x0000000060051c0c <+60>: je 0x60051c4b = <freebsd_rw_unlock+123> . . . NOTE: loop is below: 0x0000000060051c20 <+80>: mov %ecx,%eax =3D> 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi) 0x0000000060051c26 <+86>: sete %bl 0x0000000060051c29 <+89>: test %bl,%bl 0x0000000060051c2b <+91>: je 0x60051c20 <freebsd_rw_unlock+80> Note: loop is above I may try declaring state volatile in an attempt to prevent the code motion. (But what of other uses of tcmpset_32 controlling subordinate code in other places?) =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?B68A2CBC-7B81-47B9-A997-2A78C24F67FC>