Date: Thu, 30 Apr 2020 19:41:00 -0700 From: Mark Millard <marklmi@yahoo.com> To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org> Subject: Re: 32-bit powerpc head -r360311 has signal 11 process crashes with rendezvous_request involved (old PowerMac dual-socket, 1 core each) Message-ID: <BE0A8F26-419A-4113-8E81-5AA54D2FE5F4@yahoo.com> In-Reply-To: <D57AB314-48E8-4371-B392-3C591CBC0367@yahoo.com> References: <76D3A482-98BC-44C3-B84D-504A012CA8D8@yahoo.com> <D57AB314-48E8-4371-B392-3C591CBC0367@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
[After some time dhclient and sendmail did have the problem despite the debug-kernel being in use.] On 2020-Apr-29, at 21:29, Mark Millard <marklmi at yahoo.com> wrote: > [Notes on using the artifact.ci head -3260311 debug-kernel added.] >=20 > On 2020-Apr-29, at 02:27, Mark Millard <marklmi at yahoo.com> wrote: >=20 >> Since updating from head -r359427 based to head -r360311 base >> (my own non-debug builds), various things report segmentation >> faults. It appears that rendezvous_request may always be >> involved. I've not had time (yet) to deal with substituting a >> debug kernel from: >>=20 >> = https://artifact.ci.freebsd.org/snapshot/head/r360311/powerpc/powerpc/ >>=20 >> I expect to do so at some point. >>=20 >>=20 >> I give 2 examples below (mountd and rpcbind). Both involve >> rendezvous_request and svc_getreq_common . >>=20 >> First mountd : >>=20 >> # gdb mountd /mountd.core=20 >> GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD] >> Copyright (C) 2020 Free Software Foundation, Inc. >> License GPLv3+: GNU GPL version 3 or later = <http://gnu.org/licenses/gpl.html> >> . . . >> Reading symbols from mountd... >> Reading symbols from /usr/lib/debug//usr/sbin/mountd.debug... >> [New LWP 100105] >> Core was generated by `/usr/sbin/mountd -r'. >> Program terminated with signal SIGSEGV, Segmentation fault. >> #0 atomic_load_u (a=3D<optimized out>, mo=3D<optimized out>) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/atomic.h:70 >>=20 >> warning: Source file is more recent than executable. >> 70 JEMALLOC_GENERATE_INT_ATOMICS(unsigned, u, LG_SIZEOF_INT) >> (gdb) bt >> #0 atomic_load_u (a=3D<optimized out>, mo=3D<optimized out>) at = /usr/src/contrib/jemalloc/include/jemalloc/internal/atomic.h:70 >> #1 rtree_leaf_elm_szind_read (tsdn=3D<optimized out>, = rtree=3D<optimized out>, elm=3D<optimized out>, dependent=3D<optimized = out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/rtree.h:230 >> #2 rtree_szind_slab_read (tsdn=3D0x50094018, rtree=3D<optimized = out>, rtree_ctx=3D0x50094044, key=3D1869636193, dependent=3Dtrue, = r_szind=3D<optimized out>, r_slab=3D<optimized out>) >> at /usr/src/contrib/jemalloc/include/jemalloc/internal/rtree.h:504 >> #3 ifree (tsd=3D<optimized out>, ptr=3D0x6f706261, tcache=3D<optimized= out>, slow_path=3Dfalse) at jemalloc_jemalloc.c:2574 >> #4 __je_free_default (ptr=3D0x6f706261) at jemalloc_jemalloc.c:2775 >> #5 0x50235db0 in __free (ptr=3D0x6f706261) at = jemalloc_jemalloc.c:2852 >> #6 0x5026525c in freenetconfigent (netconfigp=3D0x50049170) at = /usr/src/lib/libc/rpc/getnetconfig.c:540 >> #7 0x50260d8c in __rpc_sockinfo2netid (sip=3D<optimized out>, = netid=3D<optimized out>) at /usr/src/lib/libc/rpc/rpc_generic.c:573 >> #8 0x502521f0 in makefd_xprt (fd=3D10, sendsize=3D9000, = recvsize=3D9000) at /usr/src/lib/libc/rpc/svc_vc.c:270 >> #9 0x50252fa4 in rendezvous_request (xprt=3D0x5007b120, = msg=3D<optimized out>) at /usr/src/lib/libc/rpc/svc_vc.c:315 >> #10 0x50254588 in svc_getreq_common (fd=3D<optimized out>) at = /usr/src/lib/libc/rpc/svc.c:640 >> #11 0x502543d0 in svc_getreqset (readfds=3D<optimized out>) at = /usr/src/lib/libc/rpc/svc.c:611 >> #12 0x1001434c in main (argc=3D<optimized out>, argv=3D0xffffde3c) at = /usr/src/usr.sbin/mountd/mountd.c:683 >>=20 >> (gdb) disass >> Dump of assembler code for function __je_free_default: >> 0x50235244 <+0>: mflr r0 >> 0x50235248 <+4>: stw r0,4(r1) >> 0x5023524c <+8>: stwu r1,-80(r1) >> 0x50235250 <+12>: stw r30,72(r1) >> 0x50235254 <+16>: stw r21,36(r1) >> 0x50235258 <+20>: stw r22,40(r1) >> 0x5023525c <+24>: stw r23,44(r1) >> 0x50235260 <+28>: stw r24,48(r1) >> 0x50235264 <+32>: stw r25,52(r1) >> 0x50235268 <+36>: stw r26,56(r1) >> 0x5023526c <+40>: stw r27,60(r1) >> 0x50235270 <+44>: stw r28,64(r1) >> 0x50235274 <+48>: stw r29,68(r1) >> 0x50235278 <+52>: bl 0x5023527c <__je_free_default+56> >> 0x5023527c <+56>: mr r28,r3 >> 0x50235280 <+60>: mflr r30 >> 0x50235284 <+64>: addis r30,r30,14 >> 0x50235288 <+68>: addi r30,r30,-20816 >> 0x5023528c <+72>: lwz r4,64(r30) >> 0x50235290 <+76>: lwz r5,6188(r30) >> 0x50235294 <+80>: lwz r4,0(r4) >> 0x50235298 <+84>: stw r4,32(r1) >> 0x5023529c <+88>: lbz r4,0(r5) >> 0x502352a0 <+92>: cmplwi r4,0 >> 0x502352a4 <+96>: bne 0x502353c4 <__je_free_default+384> >> 0x502352a8 <+100>: cmplwi r28,0 >> 0x502352ac <+104>: beq 0x50235378 <__je_free_default+308> >> 0x502352b0 <+108>: addi r3,r30,4332 >> 0x502352b4 <+112>: rlwinm r24,r28,0,0,9 >> 0x502352b8 <+116>: bl 0x502f06b4 = <00000000.plt_pic32.__tls_get_addr> >> 0x502352bc <+120>: rlwinm r26,r28,13,25,28 >> 0x502352c0 <+124>: mr r29,r3 >> 0x502352c4 <+128>: lbz r3,0(r3) >> 0x502352c8 <+132>: cmplwi r3,0 >> 0x502352cc <+136>: bne 0x502353fc <__je_free_default+440> >> 0x502352d0 <+140>: add r25,r29,r26 >> 0x502352d4 <+144>: lwz r3,44(r25) >> 0x502352d8 <+148>: addi r27,r29,44 >> 0x502352dc <+152>: cmplw r3,r24 >> 0x502352e0 <+156>: bne 0x50235578 <__je_free_default+820> >> 0x502352e4 <+160>: lwz r3,48(r25) >> 0x502352e8 <+164>: rlwinm r4,r28,20,22,31 >> 0x502352ec <+168>: mulli r4,r4,12 >> 0x502352f0 <+172>: add r3,r3,r4 >> =3D> 0x502352f4 <+176>: lwz r6,4(r3) >> 0x502352f8 <+180>: lwz r5,4368(r30) >> 0x502352fc <+184>: addi r26,r29,288 >> 0x50235300 <+188>: lbz r4,8(r3) >> 0x50235304 <+192>: rlwinm r3,r6,2,0,29 >> 0x50235308 <+196>: lwz r7,28(r29) >> 0x5023530c <+200>: lwzx r5,r5,r3 >> 0x50235310 <+204>: lwz r8,24(r29) >> 0x50235314 <+208>: andi. r4,r4,1 >> 0x50235318 <+212>: addc r4,r7,r5 >> 0x5023531c <+216>: addze r5,r8 >> 0x50235320 <+220>: stw r5,24(r29) >> 0x50235324 <+224>: stw r4,28(r29) >> 0x50235328 <+228>: ble 0x50235534 <__je_free_default+752> >> . . . >>=20 >> (gdb) info reg >> r0 0x50235c04 1344494596 >> r1 0xffffcfb0 4294954928 >> r2 0x5009b018 1342812184 >> r3 0x2448 9288 >> r4 0x500940ac 1342783660 >> r5 0x2448 9288 >> r6 0x500940cc 1342783692 >> r7 0x1 1 >> r8 0x0 0 >> r9 0x80808080 2155905152 >> r10 0xc 12 >> r11 0x502e3b50 1345207120 >> r12 0x500491a0 1342476704 >> r13 0x0 0 >> r14 0x1 1 >> r15 0x10040000 268697600 >> r16 0x0 0 >> r17 0x10040000 268697600 >> r18 0x2 2 >> r19 0x0 0 >> r20 0x1 1 >> r21 0x5007b164 1342681444 >> r22 0xffffd2dc 4294955740 >> r23 0x80 128 >> r24 0x6f400000 1866465280 >> r25 0x50094080 1342783616 >> r26 0x68 104 >> r27 0x50094044 1342783556 >> r28 0x6f706261 1869636193 >> r29 0x50094018 1342783512 >> r30 0x5031012c 1345388844 >> r31 0x10040000 268697600 >> pc 0x502352f4 0x502352f4 <__je_free_default+176> >> msr <unavailable> >> cr 0x242008a4 606079140 >> lr 0x50235c04 0x50235c04 = <__je_free_default+2496> >> ctr 0x0 0 >> xer 0x0 0 >> fpscr 0x0 0 >> vscr <unavailable> >> vrsave <unavailable> >>=20 >>=20 >> Then rpcbind : >>=20 >> # gdb rpcbind /rpcbind.core=20 >> GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD] >> Copyright (C) 2020 Free Software Foundation, Inc. >> License GPLv3+: GNU GPL version 3 or later = <http://gnu.org/licenses/gpl.html> >> . . . >> Reading symbols from rpcbind... >> Reading symbols from /usr/lib/debug//usr/sbin/rpcbind.debug... >> [New LWP 100098] >> Core was generated by `/usr/sbin/rpcbind'. >> Program terminated with signal SIGSEGV, Segmentation fault. >> #0 __xdrrec_setnonblock (xdrs=3D0x500472d8, maxrec=3D9000) at = /usr/src/lib/libc/xdr/xdr_rec.c:607 >> 607 rstrm->nonblock =3D TRUE; >> (gdb) bt >> #0 __xdrrec_setnonblock (xdrs=3D0x500472d8, maxrec=3D9000) at = /usr/src/lib/libc/xdr/xdr_rec.c:607 >> #1 0x502440d4 in rendezvous_request (xprt=3D<optimized out>, = msg=3D<optimized out>) at /usr/src/lib/libc/rpc/svc_vc.c:348 >> #2 0x50245588 in svc_getreq_common (fd=3D<optimized out>) at = /usr/src/lib/libc/rpc/svc.c:640 >> #3 0x5024580c in svc_getreq_poll (pfdp=3D<optimized out>, = pollretval=3D1) at /usr/src/lib/libc/rpc/svc.c:739 >> #4 0x10018568 in my_svc_run () at = /usr/src/usr.sbin/rpcbind/rpcb_svc_com.c:1167 >> #5 0x10014ad8 in main (argc=3D<optimized out>, argv=3D<optimized = out>) at /usr/src/usr.sbin/rpcbind/rpcbind.c:250 >>=20 >> (gdb) disass >> Dump of assembler code for function __xdrrec_setnonblock: >> 0x50250468 <+0>: lwz r5,12(r3) >> 0x5025046c <+4>: li r6,1 >> 0x50250470 <+8>: cmplwi r4,0 >> =3D> 0x50250474 <+12>: stw r6,64(r5) >> 0x50250478 <+16>: bne 0x50250480 <__xdrrec_setnonblock+24> >> 0x5025047c <+20>: lwz r4,60(r5) >> 0x50250480 <+24>: li r3,1 >> 0x50250484 <+28>: stw r4,92(r5) >> 0x50250488 <+32>: blr >> End of assembler dump. >>=20 >> (gdb) info reg >> r0 0x5c 92 >> r1 0xffffb400 4294947840 >> r2 0x500a1018 1342836760 >> r3 0x500472d8 1342468824 >> r4 0x2328 9000 >> r5 0x2020 8224 >> r6 0x1 1 >> r7 0xffffb364 4294947684 >> r8 0x500472f4 1342468852 >> r9 0x0 0 >> r10 0x20 32 >> r11 0x502d8ea0 1345162912 >> r12 0x24200880 606079104 >> r13 0x0 0 >> r14 0x0 0 >> r15 0xffffbc28 4294949928 >> r16 0x10002848 268445768 >> r17 0x10040000 268697600 >> r18 0x2 2 >> r19 0x0 0 >> r20 0x1 1 >> r21 0x5004c044 1342488644 >> r22 0xffffb63c 4294948412 >> r23 0x80 128 >> r24 0x50048010 1342472208 >> r25 0x14 20 >> r26 0xffffb630 4294948400 >> r27 0x500472d0 1342468816 >> r28 0xe 14 >> r29 0x50047220 1342468640 >> r30 0x5030112c 1345327404 >> r31 0x10040000 268697600 >> pc 0x50250474 0x50250474 = <__xdrrec_setnonblock+12> >> msr <unavailable> >> cr 0x44200080 1142947968 >> lr 0x502440d4 0x502440d4 = <rendezvous_request+584> >> ctr 0x502d8ea0 1345162912 >> xer 0x0 0 >> fpscr 0x0 0 >> vscr <unavailable> >> vrsave <unavailable> >>=20 >>=20 >> dhclient and sendmail have notices of signal 11's >> but I do not find any .core files around for them. >>=20 >> Prior to this upgrade I was having no such problems >> with the 32-bit powerpc PowerMac. >=20 > I substituted the debug-kernel from: >=20 > = https://artifact.ci.freebsd.org/snapshot/head/r360311/powerpc/powerpc/kern= el.txz >=20 > and with it there is no evidence so far of the > problem(s) occurring. Since the same world build is > in use in both contexts, it looks like the kernel > is what makes the difference for the problem(s). >=20 > With the debug-kernel avoiding the problem, I've yet > to figure out how to gather evidence. >=20 About 8.5 hours later than the boot completing it got a: kernel: pid 659 (dhclient), jid 0, uid 65: exited on signal 11 About 7 hours later it started getting sendmail messaged similar to: kernel: pid 3722 (sendmail), jid 0, uid 25: exited on signal 11 These seem to be 0.5 hours part. Between the 3rd and 4th is a lock order reversal notice that I've separately submitted to the lists (but that do not yet appear there, last I checked). =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BE0A8F26-419A-4113-8E81-5AA54D2FE5F4>