Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 30 Apr 2020 19:41:00 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: 32-bit powerpc head -r360311 has signal 11 process crashes with rendezvous_request involved (old PowerMac dual-socket, 1 core each)
Message-ID:  <BE0A8F26-419A-4113-8E81-5AA54D2FE5F4@yahoo.com>
In-Reply-To: <D57AB314-48E8-4371-B392-3C591CBC0367@yahoo.com>
References:  <76D3A482-98BC-44C3-B84D-504A012CA8D8@yahoo.com> <D57AB314-48E8-4371-B392-3C591CBC0367@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[After some time dhclient and sendmail did have the problem
despite the debug-kernel being in use.]

On 2020-Apr-29, at 21:29, Mark Millard <marklmi at yahoo.com> wrote:

> [Notes on using the artifact.ci head -3260311 debug-kernel added.]
>=20
> On 2020-Apr-29, at 02:27, Mark Millard <marklmi at yahoo.com> wrote:
>=20
>> Since updating from head -r359427 based to head -r360311 base
>> (my own non-debug builds), various things report segmentation
>> faults. It appears that rendezvous_request may always be
>> involved. I've not had time (yet) to deal with substituting a
>> debug kernel from:
>>=20
>> =
https://artifact.ci.freebsd.org/snapshot/head/r360311/powerpc/powerpc/
>>=20
>> I expect to do so at some point.
>>=20
>>=20
>> I give 2 examples below (mountd and rpcbind). Both involve
>> rendezvous_request and svc_getreq_common .
>>=20
>> First mountd :
>>=20
>> # gdb mountd /mountd.core=20
>> GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
>> Copyright (C) 2020 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later =
<http://gnu.org/licenses/gpl.html>;
>> . . .
>> Reading symbols from mountd...
>> Reading symbols from /usr/lib/debug//usr/sbin/mountd.debug...
>> [New LWP 100105]
>> Core was generated by `/usr/sbin/mountd -r'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  atomic_load_u (a=3D<optimized out>, mo=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/atomic.h:70
>>=20
>> warning: Source file is more recent than executable.
>> 70	JEMALLOC_GENERATE_INT_ATOMICS(unsigned, u, LG_SIZEOF_INT)
>> (gdb) bt
>> #0  atomic_load_u (a=3D<optimized out>, mo=3D<optimized out>) at =
/usr/src/contrib/jemalloc/include/jemalloc/internal/atomic.h:70
>> #1  rtree_leaf_elm_szind_read (tsdn=3D<optimized out>, =
rtree=3D<optimized out>, elm=3D<optimized out>, dependent=3D<optimized =
out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/rtree.h:230
>> #2  rtree_szind_slab_read (tsdn=3D0x50094018, rtree=3D<optimized =
out>, rtree_ctx=3D0x50094044, key=3D1869636193, dependent=3Dtrue, =
r_szind=3D<optimized out>, r_slab=3D<optimized out>)
>>   at /usr/src/contrib/jemalloc/include/jemalloc/internal/rtree.h:504
>> #3  ifree (tsd=3D<optimized out>, ptr=3D0x6f706261, tcache=3D<optimized=
 out>, slow_path=3Dfalse) at jemalloc_jemalloc.c:2574
>> #4  __je_free_default (ptr=3D0x6f706261) at jemalloc_jemalloc.c:2775
>> #5  0x50235db0 in __free (ptr=3D0x6f706261) at =
jemalloc_jemalloc.c:2852
>> #6  0x5026525c in freenetconfigent (netconfigp=3D0x50049170) at =
/usr/src/lib/libc/rpc/getnetconfig.c:540
>> #7  0x50260d8c in __rpc_sockinfo2netid (sip=3D<optimized out>, =
netid=3D<optimized out>) at /usr/src/lib/libc/rpc/rpc_generic.c:573
>> #8  0x502521f0 in makefd_xprt (fd=3D10, sendsize=3D9000, =
recvsize=3D9000) at /usr/src/lib/libc/rpc/svc_vc.c:270
>> #9  0x50252fa4 in rendezvous_request (xprt=3D0x5007b120, =
msg=3D<optimized out>) at /usr/src/lib/libc/rpc/svc_vc.c:315
>> #10 0x50254588 in svc_getreq_common (fd=3D<optimized out>) at =
/usr/src/lib/libc/rpc/svc.c:640
>> #11 0x502543d0 in svc_getreqset (readfds=3D<optimized out>) at =
/usr/src/lib/libc/rpc/svc.c:611
>> #12 0x1001434c in main (argc=3D<optimized out>, argv=3D0xffffde3c) at =
/usr/src/usr.sbin/mountd/mountd.c:683
>>=20
>> (gdb) disass
>> Dump of assembler code for function __je_free_default:
>>  0x50235244 <+0>:	mflr    r0
>>  0x50235248 <+4>:	stw     r0,4(r1)
>>  0x5023524c <+8>:	stwu    r1,-80(r1)
>>  0x50235250 <+12>:	stw     r30,72(r1)
>>  0x50235254 <+16>:	stw     r21,36(r1)
>>  0x50235258 <+20>:	stw     r22,40(r1)
>>  0x5023525c <+24>:	stw     r23,44(r1)
>>  0x50235260 <+28>:	stw     r24,48(r1)
>>  0x50235264 <+32>:	stw     r25,52(r1)
>>  0x50235268 <+36>:	stw     r26,56(r1)
>>  0x5023526c <+40>:	stw     r27,60(r1)
>>  0x50235270 <+44>:	stw     r28,64(r1)
>>  0x50235274 <+48>:	stw     r29,68(r1)
>>  0x50235278 <+52>:	bl      0x5023527c <__je_free_default+56>
>>  0x5023527c <+56>:	mr      r28,r3
>>  0x50235280 <+60>:	mflr    r30
>>  0x50235284 <+64>:	addis   r30,r30,14
>>  0x50235288 <+68>:	addi    r30,r30,-20816
>>  0x5023528c <+72>:	lwz     r4,64(r30)
>>  0x50235290 <+76>:	lwz     r5,6188(r30)
>>  0x50235294 <+80>:	lwz     r4,0(r4)
>>  0x50235298 <+84>:	stw     r4,32(r1)
>>  0x5023529c <+88>:	lbz     r4,0(r5)
>>  0x502352a0 <+92>:	cmplwi  r4,0
>>  0x502352a4 <+96>:	bne     0x502353c4 <__je_free_default+384>
>>  0x502352a8 <+100>:	cmplwi  r28,0
>>  0x502352ac <+104>:	beq     0x50235378 <__je_free_default+308>
>>  0x502352b0 <+108>:	addi    r3,r30,4332
>>  0x502352b4 <+112>:	rlwinm  r24,r28,0,0,9
>>  0x502352b8 <+116>:	bl      0x502f06b4 =
<00000000.plt_pic32.__tls_get_addr>
>>  0x502352bc <+120>:	rlwinm  r26,r28,13,25,28
>>  0x502352c0 <+124>:	mr      r29,r3
>>  0x502352c4 <+128>:	lbz     r3,0(r3)
>>  0x502352c8 <+132>:	cmplwi  r3,0
>>  0x502352cc <+136>:	bne     0x502353fc <__je_free_default+440>
>>  0x502352d0 <+140>:	add     r25,r29,r26
>>  0x502352d4 <+144>:	lwz     r3,44(r25)
>>  0x502352d8 <+148>:	addi    r27,r29,44
>>  0x502352dc <+152>:	cmplw   r3,r24
>>  0x502352e0 <+156>:	bne     0x50235578 <__je_free_default+820>
>>  0x502352e4 <+160>:	lwz     r3,48(r25)
>>  0x502352e8 <+164>:	rlwinm  r4,r28,20,22,31
>>  0x502352ec <+168>:	mulli   r4,r4,12
>>  0x502352f0 <+172>:	add     r3,r3,r4
>> =3D> 0x502352f4 <+176>:	lwz     r6,4(r3)
>>  0x502352f8 <+180>:	lwz     r5,4368(r30)
>>  0x502352fc <+184>:	addi    r26,r29,288
>>  0x50235300 <+188>:	lbz     r4,8(r3)
>>  0x50235304 <+192>:	rlwinm  r3,r6,2,0,29
>>  0x50235308 <+196>:	lwz     r7,28(r29)
>>  0x5023530c <+200>:	lwzx    r5,r5,r3
>>  0x50235310 <+204>:	lwz     r8,24(r29)
>>  0x50235314 <+208>:	andi.   r4,r4,1
>>  0x50235318 <+212>:	addc    r4,r7,r5
>>  0x5023531c <+216>:	addze   r5,r8
>>  0x50235320 <+220>:	stw     r5,24(r29)
>>  0x50235324 <+224>:	stw     r4,28(r29)
>>  0x50235328 <+228>:	ble     0x50235534 <__je_free_default+752>
>> . . .
>>=20
>> (gdb) info reg
>> r0             0x50235c04          1344494596
>> r1             0xffffcfb0          4294954928
>> r2             0x5009b018          1342812184
>> r3             0x2448              9288
>> r4             0x500940ac          1342783660
>> r5             0x2448              9288
>> r6             0x500940cc          1342783692
>> r7             0x1                 1
>> r8             0x0                 0
>> r9             0x80808080          2155905152
>> r10            0xc                 12
>> r11            0x502e3b50          1345207120
>> r12            0x500491a0          1342476704
>> r13            0x0                 0
>> r14            0x1                 1
>> r15            0x10040000          268697600
>> r16            0x0                 0
>> r17            0x10040000          268697600
>> r18            0x2                 2
>> r19            0x0                 0
>> r20            0x1                 1
>> r21            0x5007b164          1342681444
>> r22            0xffffd2dc          4294955740
>> r23            0x80                128
>> r24            0x6f400000          1866465280
>> r25            0x50094080          1342783616
>> r26            0x68                104
>> r27            0x50094044          1342783556
>> r28            0x6f706261          1869636193
>> r29            0x50094018          1342783512
>> r30            0x5031012c          1345388844
>> r31            0x10040000          268697600
>> pc             0x502352f4          0x502352f4 <__je_free_default+176>
>> msr            <unavailable>
>> cr             0x242008a4          606079140
>> lr             0x50235c04          0x50235c04 =
<__je_free_default+2496>
>> ctr            0x0                 0
>> xer            0x0                 0
>> fpscr          0x0                 0
>> vscr           <unavailable>
>> vrsave         <unavailable>
>>=20
>>=20
>> Then rpcbind :
>>=20
>> # gdb rpcbind /rpcbind.core=20
>> GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
>> Copyright (C) 2020 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later =
<http://gnu.org/licenses/gpl.html>;
>> . . .
>> Reading symbols from rpcbind...
>> Reading symbols from /usr/lib/debug//usr/sbin/rpcbind.debug...
>> [New LWP 100098]
>> Core was generated by `/usr/sbin/rpcbind'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  __xdrrec_setnonblock (xdrs=3D0x500472d8, maxrec=3D9000) at =
/usr/src/lib/libc/xdr/xdr_rec.c:607
>> 607		rstrm->nonblock =3D TRUE;
>> (gdb) bt
>> #0  __xdrrec_setnonblock (xdrs=3D0x500472d8, maxrec=3D9000) at =
/usr/src/lib/libc/xdr/xdr_rec.c:607
>> #1  0x502440d4 in rendezvous_request (xprt=3D<optimized out>, =
msg=3D<optimized out>) at /usr/src/lib/libc/rpc/svc_vc.c:348
>> #2  0x50245588 in svc_getreq_common (fd=3D<optimized out>) at =
/usr/src/lib/libc/rpc/svc.c:640
>> #3  0x5024580c in svc_getreq_poll (pfdp=3D<optimized out>, =
pollretval=3D1) at /usr/src/lib/libc/rpc/svc.c:739
>> #4  0x10018568 in my_svc_run () at =
/usr/src/usr.sbin/rpcbind/rpcb_svc_com.c:1167
>> #5  0x10014ad8 in main (argc=3D<optimized out>, argv=3D<optimized =
out>) at /usr/src/usr.sbin/rpcbind/rpcbind.c:250
>>=20
>> (gdb) disass
>> Dump of assembler code for function __xdrrec_setnonblock:
>>  0x50250468 <+0>:	lwz     r5,12(r3)
>>  0x5025046c <+4>:	li      r6,1
>>  0x50250470 <+8>:	cmplwi  r4,0
>> =3D> 0x50250474 <+12>:	stw     r6,64(r5)
>>  0x50250478 <+16>:	bne     0x50250480 <__xdrrec_setnonblock+24>
>>  0x5025047c <+20>:	lwz     r4,60(r5)
>>  0x50250480 <+24>:	li      r3,1
>>  0x50250484 <+28>:	stw     r4,92(r5)
>>  0x50250488 <+32>:	blr
>> End of assembler dump.
>>=20
>> (gdb) info reg
>> r0             0x5c                92
>> r1             0xffffb400          4294947840
>> r2             0x500a1018          1342836760
>> r3             0x500472d8          1342468824
>> r4             0x2328              9000
>> r5             0x2020              8224
>> r6             0x1                 1
>> r7             0xffffb364          4294947684
>> r8             0x500472f4          1342468852
>> r9             0x0                 0
>> r10            0x20                32
>> r11            0x502d8ea0          1345162912
>> r12            0x24200880          606079104
>> r13            0x0                 0
>> r14            0x0                 0
>> r15            0xffffbc28          4294949928
>> r16            0x10002848          268445768
>> r17            0x10040000          268697600
>> r18            0x2                 2
>> r19            0x0                 0
>> r20            0x1                 1
>> r21            0x5004c044          1342488644
>> r22            0xffffb63c          4294948412
>> r23            0x80                128
>> r24            0x50048010          1342472208
>> r25            0x14                20
>> r26            0xffffb630          4294948400
>> r27            0x500472d0          1342468816
>> r28            0xe                 14
>> r29            0x50047220          1342468640
>> r30            0x5030112c          1345327404
>> r31            0x10040000          268697600
>> pc             0x50250474          0x50250474 =
<__xdrrec_setnonblock+12>
>> msr            <unavailable>
>> cr             0x44200080          1142947968
>> lr             0x502440d4          0x502440d4 =
<rendezvous_request+584>
>> ctr            0x502d8ea0          1345162912
>> xer            0x0                 0
>> fpscr          0x0                 0
>> vscr           <unavailable>
>> vrsave         <unavailable>
>>=20
>>=20
>> dhclient and sendmail have notices of signal 11's
>> but I do not find any .core files around for them.
>>=20
>> Prior to this upgrade I was having no such problems
>> with the 32-bit powerpc PowerMac.
>=20
> I substituted the debug-kernel from:
>=20
> =
https://artifact.ci.freebsd.org/snapshot/head/r360311/powerpc/powerpc/kern=
el.txz
>=20
> and with it there is no evidence so far of the
> problem(s) occurring. Since the same world build is
> in use in both contexts, it looks like the kernel
> is what makes the difference for the problem(s).
>=20
> With the debug-kernel avoiding the problem, I've yet
> to figure out how to gather evidence.
>=20

About 8.5 hours later than the boot completing
it got a:

kernel: pid 659 (dhclient), jid 0, uid 65: exited on signal 11

About 7 hours later it started getting sendmail messaged
similar to:

kernel: pid 3722 (sendmail), jid 0, uid 25: exited on signal 11

These seem to be 0.5 hours part. Between the 3rd and 4th
is a lock order reversal notice that I've separately
submitted to the lists (but that do not yet appear there,
last I checked).

=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BE0A8F26-419A-4113-8E81-5AA54D2FE5F4>