From owner-freebsd-arm@freebsd.org Thu Jan 3 01:41:21 2019 Return-Path: Delivered-To: freebsd-arm@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2528C14388BE; Thu, 3 Jan 2019 01:41:21 +0000 (UTC) (envelope-from kevans@freebsd.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BCB5382FBB; Thu, 3 Jan 2019 01:41:20 +0000 (UTC) (envelope-from kevans@freebsd.org) Received: from mail-lj1-f177.google.com (mail-lj1-f177.google.com [209.85.208.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G3" (verified OK)) (Authenticated sender: kevans) by smtp.freebsd.org (Postfix) with ESMTPSA id 51D5311457; Thu, 3 Jan 2019 01:41:20 +0000 (UTC) (envelope-from kevans@freebsd.org) Received: by mail-lj1-f177.google.com with SMTP id v1-v6so28542784ljd.0; Wed, 02 Jan 2019 17:41:20 -0800 (PST) X-Gm-Message-State: AJcUukfWT9N8dMCGmqn38AmC6tRmV8i4GIuafRdMmA4d/YLGFM85+EG4 WSFa/2SHFx7sWEM6U0Pdg8lknY92v9e4FqF99bQ= X-Google-Smtp-Source: AFSGD/VJMe+L+fDMV1QOWtVKFImtMj9oK/7YxzR6l2A9eNAQEZzYbL+cfIIFZCkJvNdmsv3pbmPNnzFZBOYtZg4EhjM= X-Received: by 2002:a2e:b04f:: with SMTP id d15-v6mr28618253ljl.3.1546479678673; Wed, 02 Jan 2019 17:41:18 -0800 (PST) MIME-Version: 1.0 References: <51834F45-1EB7-449F-A0EC-7F5EC7CBB12F@yahoo.com> In-Reply-To: From: Kyle Evans Date: Wed, 2 Jan 2019 19:41:07 -0600 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 [a tested fix included] To: Mark Millard Cc: Sean Bruno , freebsd-emulation@freebsd.org, ports-list freebsd , freebsd-arm Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: BCB5382FBB X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.97 / 15.00]; local_wl_from(0.00)[freebsd.org]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0]; NEURAL_HAM_SHORT(-0.97)[-0.969,0]; ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US]; NEURAL_HAM_LONG(-1.00)[-0.998,0] X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Porting FreeBSD to ARM processors." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Jan 2019 01:41:21 -0000 On Wed, Jan 2, 2019 at 3:38 AM Mark Millard via freebsd-ports wrote: > > On 2019-Jan-1, at 18:43, Mark Millard wrote: > > > The below showed up for poudiere-devel bulk getting stuck using one FreeBSD > > cpu while building graphics/poppler-qt5 . This is not a kevent hang-up, unlike > > the last hang-up that I analyzed. I do not yet know how repeatable this is > > but the original hang-up and the one experiment the below is from. > > > > From top: > > > > PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND > > 12789 root 4 52 0 166M 33M 0 uwait 6 36:06 97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0 > > > > Note: The vast margority of the 36:06 has been stuck in the uwait loop involved. > > > > From ps -auxd: > > > > root 94075 0.0 0.0 12932 3552 1 S+ 10:42 0:01.21 | `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5 > > root 1944 0.0 0.0 12932 3540 1 I+ 10:42 0:00.00 | |-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5 > > root 1957 0.0 0.0 12932 3556 1 I 10:42 0:00.04 | |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh) > > root 12328 0.0 0.0 12932 3548 1 I 10:49 0:00.00 | | `-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh) > > root 12329 0.0 0.0 10328 1756 1 IJ 10:49 0:00.01 | | `-- /usr/bin/make -C /usr/ports/graphics/poppler-qt5 stage > > root 12350 0.0 0.0 9860 1248 1 IJ 10:49 0:00.00 | | `-- /usr/bin/make -f Makefile DESTDIR=/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install > > root 12353 0.0 0.0 10236 1664 1 IJ 10:49 0:00.05 | | `-- /nxb-bin/usr/bin/make -f CMakeFiles/Makefile2 qt5/all > > root 12787 0.0 0.0 9856 1236 1 IJ 10:50 0:00.00 | | `-- /nxb-bin/usr/bin/make -f qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make qt5/test > > root 12789 100.0 0.0 169868 33528 1 IJ 10:50 36:35.26 | | `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/ > > root 94423 0.0 0.0 12932 3484 1 S+ 10:42 0:12.91 | `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5 > > > > > > (gdb) attach 12789 > > Attaching to process 12789 > > Reading symbols from /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qemu-arm-static...done. > > [New LWP 101168 of process 12789] > > [New LWP 101178 of process 12789] > > [New LWP 101499 of process 12789] > > [Switching to LWP 100304 of process 12789] > > _umtx_op () at _umtx_op.S:3 > > 3 RSYSCALL(_umtx_op) > > (gdb) info threads > > Id Target Id Frame > > * 1 LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3 > > 2 LWP 101168 of process 12789 _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > > 3 LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3 > > 4 LWP 101499 of process 12789 0x0000000060051c26 in atomic_cmpset_int (dst=, expect=, src=536870912) at /usr/include/machine/atomic.h:220 > > (gdb) thread 4 > > [Switching to thread 4 (LWP 101499 of process 12789)] > > #0 0x0000000060051c26 in atomic_cmpset_int (dst=, expect=, src=536870912) at /usr/include/machine/atomic.h:220 > > 220 ATOMIC_CMPSET(int); > > > > (gdb) bt > > #0 0x0000000060051c26 in atomic_cmpset_int (dst=, expect=, src=536870912) at /usr/include/machine/atomic.h:220 > > #1 tcmpset_32 (addr=, a=, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178 > > #2 freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264 > > #3 0x000000006004ab33 in do_freebsd__umtx_op (obj=, op=536870912, val=, uaddr=, target_time=) > > at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474 > > #4 0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=, arg2=, arg3=, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616, > > arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364 > > #5 0x00000000600392f0 in target_cpu_loop (env=0x86159b118) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/arm/target_arch_cpu.h:207 > > #6 0x0000000060038c99 in cpu_loop (env=0xf4dede80) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/main.c:121 > > #7 0x0000000060050c1a in new_freebsd_thread_start (arg=) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:152 > > #8 0x00000000601ad5f6 in thread_start (curthread=0x861571300) at /usr/src/lib/libthr/thread/thr_create.c:291 > > #9 0x0000000000000000 in ?? () > > Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000 > > > > 181 #define ATOMIC_CMPSET(TYPE) \ > > 182 static __inline int \ > > 183 atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE src) \ > > 184 { \ > > (gdb) > > 185 u_char res; \ > > 186 \ > > 187 __asm __volatile( \ > > 188 " " MPLOCKED " " \ > > 189 " cmpxchg %3,%1 ; " \ > > 190 " sete %0 ; " \ > > 191 "# atomic_cmpset_" #TYPE " " \ > > 192 : "=q" (res), /* 0 */ \ > > 193 "+m" (*dst), /* 1 */ \ > > 194 "+a" (expect) /* 2 */ \ > > (gdb) > > 195 : "r" (src) /* 3 */ \ > > 196 : "memory", "cc"); \ > > 197 return (res); \ > > 198 } \ > > . . . > > 218 ATOMIC_CMPSET(char); > > 219 ATOMIC_CMPSET(short); > > 220 ATOMIC_CMPSET(int); > > 221 ATOMIC_CMPSET(long); > > > > (gdb) up > > #1 tcmpset_32 (addr=, a=, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178 > > 178 return atomic_cmpset_acq_32(addr, current, new); > > (gdb) list > > 173 static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b) > > 174 { > > 175 uint32_t current = tswap32(a); > > 176 uint32_t new = tswap32(b); > > 177 > > 178 return atomic_cmpset_acq_32(addr, current, new); > > 179 } > > > > (gdb) up > > #2 freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264 > > 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, (state - 1))) { > > (gdb) list > > 1259 } > > 1260 } > > 1261 } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) { > > 1262 /* decrement reader count */ > > 1263 for (;;) { > > 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, (state - 1))) { > > 1265 if (TARGET_URWLOCK_READER_COUNT(state) == 0) { > > 1266 unlock_user_struct(target_urwlock, > > 1267 target_addr, 1); > > 1268 return -TARGET_EPERM; > > (gdb) > > 1269 } > > 1270 } else { > > 1271 break; > > 1272 } > > 1273 } > > 1274 } else { > > 1275 unlock_user_struct(target_urwlock, target_addr, 1); > > 1276 return -TARGET_EPERM; > > 1277 } > > 1278 > > > > The above loop is where the code is stuck looping, using most of the capacity of > > one cpu (as FreeBSD classifies cpus). The value of state is stuck at: > > > > (gdb) print state > > $2 = 536870913 > > > > I'll note that the loop is inside qemu-arm-atatic, not in the cmake that it > > is running. > > > > > > > > To show a little more context before the assembler for the routine that ends up > > containing that loop: > > > > (gdb) up > > #3 0x000000006004ab33 in do_freebsd__umtx_op (obj=, op=536870912, val=, uaddr=, target_time=) > > at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474 > > 474 ret = freebsd_rw_unlock(obj); > > (gdb) list > > 469 ret = freebsd_rw_wrlock(obj, val, 0, NULL); > > 470 } > > 471 break; > > 472 > > 473 case TARGET_UMTX_OP_RW_UNLOCK: > > 474 ret = freebsd_rw_unlock(obj); > > 475 break; > > 476 > > 477 #if defined(__FreeBSD_version) && __FreeBSD_version > 900000 > > 478 #ifdef UMTX_OP_MUTEX_WAKE2 > > > > (gdb) up > > #4 0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=, arg2=, arg3=, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616, > > arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364 > > 1364 ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5); > > (gdb) list > > 1359 ret = do_freebsd__umtx_unlock(arg1); > > 1360 break; > > 1361 #endif > > 1362 > > 1363 case TARGET_FREEBSD_NR__umtx_op: /* undocumented */ > > 1364 ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5); > > 1365 break; > > 1366 > > 1367 /* > > 1368 * ioctl(2) > > > > As for the assembler level view for the routine containing the > > loop (via inlining and such): > > > > Dump of assembler code for function freebsd_rw_unlock: > > 0x0000000060051bd0 <+0>: push %rbp > > 0x0000000060051bd1 <+1>: mov %rsp,%rbp > > 0x0000000060051bd4 <+4>: push %rbx > > 0x0000000060051bd5 <+5>: push %rax > > 0x0000000060051bd6 <+6>: mov %edi,%ebx > > 0x0000000060051bd8 <+8>: mov $0x20,%esi > > 0x0000000060051bdd <+13>: mov $0x3,%edx > > 0x0000000060051be2 <+18>: callq 0x60036b70 > > 0x0000000060051be7 <+23>: mov %eax,%ecx > > 0x0000000060051be9 <+25>: mov $0xfffffff2,%eax > > 0x0000000060051bee <+30>: test %ecx,%ecx > > 0x0000000060051bf0 <+32>: jne 0x60051c50 > > 0x0000000060051bf2 <+34>: mov %ebx,%edi > > 0x0000000060051bf4 <+36>: add 0x26a6355(%rip),%rdi # 0x626f7f50 > > 0x0000000060051bfb <+43>: je 0x60051c50 > > 0x0000000060051bfd <+45>: mov (%rdi),%ecx > > 0x0000000060051bff <+47>: mov 0x4(%rdi),%edx > > 0x0000000060051c02 <+50>: test %ecx,%ecx > > 0x0000000060051c04 <+52>: js 0x60051c30 > > 0x0000000060051c06 <+54>: test $0x1fffffff,%ecx > > 0x0000000060051c0c <+60>: je 0x60051c4b > > 0x0000000060051c0e <+62>: lea -0x1(%rcx),%esi > > 0x0000000060051c11 <+65>: nopw %cs:0x0(%rax,%rax,1) > > 0x0000000060051c1b <+75>: nopl 0x0(%rax,%rax,1) > > NOTE: loop is below: > > 0x0000000060051c20 <+80>: mov %ecx,%eax > > => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi) > > 0x0000000060051c26 <+86>: sete %bl > > 0x0000000060051c29 <+89>: test %bl,%bl > > 0x0000000060051c2b <+91>: je 0x60051c20 > > Note: loop is above > > 0x0000000060051c2d <+93>: jmp 0x60051c57 > > 0x0000000060051c2f <+95>: nop > > 0x0000000060051c30 <+96>: mov %ecx,%esi > > 0x0000000060051c32 <+98>: and $0x7fffffff,%esi > > 0x0000000060051c38 <+104>: mov %ecx,%eax > > 0x0000000060051c3a <+106>: lock cmpxchg %esi,(%rdi) > > 0x0000000060051c3e <+110>: sete %bl > > 0x0000000060051c41 <+113>: test %bl,%bl > > 0x0000000060051c43 <+115>: jne 0x60051c57 > > 0x0000000060051c45 <+117>: mov (%rdi),%ecx > > 0x0000000060051c47 <+119>: test %ecx,%ecx > > 0x0000000060051c49 <+121>: js 0x60051c30 > > 0x0000000060051c4b <+123>: mov $0xffffffff,%eax > > 0x0000000060051c50 <+128>: add $0x8,%rsp > > 0x0000000060051c54 <+132>: pop %rbx > > 0x0000000060051c55 <+133>: pop %rbp > > 0x0000000060051c56 <+134>: retq > > 0x0000000060051c57 <+135>: test $0x2,%dl > > 0x0000000060051c5a <+138>: jne 0x60051c70 > > 0x0000000060051c5c <+140>: test $0x40000000,%ecx > > 0x0000000060051c62 <+146>: jne 0x60051c82 > > 0x0000000060051c64 <+148>: xor %eax,%eax > > 0x0000000060051c66 <+150>: test $0x20000000,%ecx > > 0x0000000060051c6c <+156>: je 0x60051c50 > > 0x0000000060051c6e <+158>: jmp 0x60051c89 > > 0x0000000060051c70 <+160>: test $0x20000000,%ecx > > 0x0000000060051c76 <+166>: jne 0x60051c89 > > 0x0000000060051c78 <+168>: xor %eax,%eax > > 0x0000000060051c7a <+170>: test $0x40000000,%ecx > > 0x0000000060051c80 <+176>: je 0x60051c50 > > 0x0000000060051c82 <+178>: mov $0x1,%edx > > 0x0000000060051c87 <+183>: jmp 0x60051c8e > > 0x0000000060051c89 <+185>: mov $0x7fffffff,%edx > > 0x0000000060051c8e <+190>: mov $0x3,%esi > > 0x0000000060051c93 <+195>: xor %ecx,%ecx > > 0x0000000060051c95 <+197>: xor %r8d,%r8d > > 0x0000000060051c98 <+200>: callq 0x601b5970 <_umtx_op> > > 0x0000000060051c9d <+205>: mov %eax,%edi > > 0x0000000060051c9f <+207>: add $0x8,%rsp > > 0x0000000060051ca3 <+211>: pop %rbx > > 0x0000000060051ca4 <+212>: pop %rbp > > --Type for more, q to quit, c to continue without paging-- > > 0x0000000060051ca5 <+213>: jmpq 0x60040a40 > > End of assembler dump. > > > > For just before the "lock cmpxchg %esi,(%rdi)" in the loop: > > > > rax 0x20000001 536870913 > > rbx 0xf4dede00 4108246528 > > rcx 0x20000001 536870913 > > rdx 0x2 2 > > rsi 0x20000000 536870912 > > rdi 0xf4dede80 4108246656 > > > > (gdb) x 0xf4dede80 > > 0xf4dede80: 0x40000001 > > > > 0x0000000060051c20 <+80>: mov %ecx,%eax > > => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi) > > 0x0000000060051c26 <+86>: sete %bl > > 0x0000000060051c29 <+89>: test %bl,%bl > > 0x0000000060051c2b <+91>: je 0x60051c20 > > > > > > At this point I do not have interpretation of the details, not even > > a comparison to the source code. > > > Note: The problem was observed in building ports head -r488859 's graphics/poppler-qt5 . > > In the below code I've added a missing statement: > > > 1261 } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) { > > 1262 /* decrement reader count */ > > 1263 for (;;) { > > 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, (state - 1))) { > __get_user(state, &target_urwlock->rw_state); > > 1265 if (TARGET_URWLOCK_READER_COUNT(state) == 0) { > > 1266 unlock_user_struct(target_urwlock, > > 1267 target_addr, 1); > > 1268 return -TARGET_EPERM; > > (gdb) > > 1269 } > > 1270 } else { > > 1271 break; > > 1272 } > > 1273 } > > The lack of the update to 'state' is why it was valid for the > compiler to eliminate the code for: > > > 1265 if (TARGET_URWLOCK_READER_COUNT(state) == 0) { > > 1266 unlock_user_struct(target_urwlock, > > 1267 target_addr, 1); > > 1268 return -TARGET_EPERM; > > (gdb) > > 1269 } > > So (without old line numbers): > > } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) { > /* decrement reader count */ > for (;;) { > if (!tcmpset_32(&target_urwlock->rw_state, state, (state - 1))) { > __get_user(state, &target_urwlock->rw_state); > if (TARGET_URWLOCK_READER_COUNT(state) == 0) { > unlock_user_struct(target_urwlock, > target_addr, 1); > return -TARGET_EPERM; > } > } else { > break; > } > } > > This follows the structure of other tcmpset_32 use in the source file. > > With this change poudriere-devel's bulk worked for graphics/poppler-qt5 > as a amd64->armv7 cross-build (FreeBSD head -r341836 based, under Hyper-V, > with 28 logical-processors assigned): > Ah, thanks for that! I think your analysis is correct, and I've created a pull request [1] for Sean. This should fix the apparent hangs reported by many across armv7/aarch64. [1] https://github.com/seanbruno/qemu-bsd-user/pull/72