Date:      Wed, 5 Jun 2019 23:29:49 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject:   Re: crash of 32-bit powerpc -r347549 kernel built via system-clang-8 (crash is while trying to mount the root file system) [debug kernel case: code generation error] [I was wrong]
Message-ID:  <EBB778D2-8005-4514-BC74-2F5B70EA677E@yahoo.com>
In-Reply-To: <DD895640-0487-45F5-9D88-C0CD3CD7CF9D@yahoo.com>
References:  <45D010BF-7654-43A6-8FF4-CCDEEF4004F6@yahoo.com> <4354EA25-69C2-4CAB-8273-62457333BD30@yahoo.com> <995DA649-9390-420B-AC95-FFD17079CDA9@yahoo.com> <DD895640-0487-45F5-9D88-C0CD3CD7CF9D@yahoo.com>

[I misanalysed the code. Sorry for the noise.]

On 2019-Jun-5, at 14:17, Mark Millard <marklmi at yahoo.com> wrote:

> [This is from my experiments with more modern toolchains than
> normally/officially used, specifically for 32-bit powerpc this
> time.]
>
> On 2019-Jun-5, at 01:35, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2019-Jun-3, at 19:40, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2019-Jun-3, at 17:24, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>>> I tried (cross) building a 32-bit powerpc kernel and world (non-debug)
>>>> with system-clang (on amd64) and use of devel/powerpc64-binutils . The
>>>> installed kernel panics trying to mount the root file system.
>>>>
>>>> FYI: Typed from picture of screen . . .
>>>>
>>>> Trying to mount root from ufs:/dev/ufs/FBSDG4Srootfs [rw,noatime]...
>>>> panic: getnewbuf_empty: Locked buf 0xd2800000 on free queue.
>>>> . . .
>>>> 0xd6919080: at kdb_backtrace+0x64
>>>> 0xd69190e0: at vpanic+0x200
>>>> 0xd6919150: at panic+0x50
>>>> 0xd6919190: at getnewbuf+0x594
>>>> 0xd69191f0: at getblkx+0x540
>>>> 0xd69192a0: at breadn_flags+0x90
>>>> 0xd69192f0: at ffs_use_bread+0x9c
>>>> 0xd6919330: at readsuper+0x68
>>>> 0xd6919370: at ffs_sbget+0xcc
>>>> 0xd69193c0: at ffs_mount+0x18b8
>>>> 0xd69194f0: at vfs_domount+0xa74
>>>> 0xd69196a0: at vfs_donmount+0x944
>>>> 0xd6919700: at kernel_mount+0x64
>>>> 0xd6919740: at parse_mount+0x52c
>>>> 0xd6919840: at vfs_mountroot+0x71c
>>>> 0xd69199b0: at start_init+0x44
>>>> 0xd6919a10: at fork_exit+0xcc
>>>> 0xd6919a40: at fork_trampoline+0xc
>>>> KDB: enter panic
>>>> [ thread pid 1 tid 100002 ]
>>>> Stopped at kdb_enter+0x74: addi r3,r0,0x0
>>>>
>>>> This reproduces with each boot attempt.
>>>>
>>>> Replacing the kernel with one built via gcc 4.2.1 and booting
>>>> the result does not panic.
>>>>
>>>>
>>>> FYI for the context of the panic call:
>>>>
>>>> /usr/src/sys/kern/vfs_bio.c :
>>>>
>>>> static struct buf *
>>>> buf_alloc(struct bufdomain *bd)
>>>> {
>>>>     struct buf *bp;
>>>>     int freebufs;
>>>>
>>>>     /*
>>>>      * We can only run out of bufs in the buf zone if the average buf
>>>>      * is less than BKVASIZE.  In this case the actual wait/block will
>>>>      * come from buf_reycle() failing to flush one of these small bufs.
>>>>      */
>>>>     bp = NULL;
>>>>     freebufs = atomic_fetchadd_int(&bd->bd_freebuffers, -1);
>>>>     if (freebufs > 0)
>>>>             bp = uma_zalloc(buf_zone, M_NOWAIT);
>>>>     if (bp == NULL) {
>>>>             atomic_add_int(&bd->bd_freebuffers, 1);
>>>>             bufspace_daemon_wakeup(bd);
>>>>             counter_u64_add(numbufallocfails, 1);
>>>>             return (NULL);
>>>>     }
>>>>     /*
>>>>      * Wake-up the bufspace daemon on transition below threshold.
>>>>      */
>>>>     if (freebufs == bd->bd_lofreebuffers)
>>>>             bufspace_daemon_wakeup(bd);
>>>>
>>>>     if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0)
>>>>             panic("getnewbuf_empty: Locked buf %p on free queue.", bp);
>>>
>>>
>>> I tried making a debug kernel build via system-clang-8. It
>>> reports differently but still during getnewbuf being active
>>> on the stack (again typed from a picture):
>>>
>>> Trying to mount root from ufs:/dev/ufs/FBSDG4Srootfs [rw,noatime]...
>>> . . . (ignore witness/diagnostic warnings) . . .
>>> panic: bq_remove: Locked buf 0xd2a00000 not on a queue.
>>> . . .
>>> 0xd6b7bfd0: at kdb_backtrace+0x64
>>> 0xd6b7c030: at vpanic+0x200
>>> 0xd6b7c0a0: at panic+0x50
>>> 0xd6b7c0e0: at bq_remove+0x1e0
>>> 0xd6b7c100: at buf_import+0x8c
>>> 0xd6b7c130: at uma_zalloc_arg+0x544
>>> 0xd6b7c190: at getnewbuf+0x380
>>> 0xd6b7c1f0: at getblkx+0x620
>>> 0xd6b7c290: at breadn_flags+0x90
>>> 0xd6b7c2e0: at ffs_use_bread+0xa8
>>> 0xd6b7c320: at readsuper+0x68
>>> 0xd6b7c360: at ffs_sbget+0xcc
>>> 0xd6b7c3b0: at ffs_mount+0xefc
>>> 0xd6b7c4e0: at vfs_domount+0xa754
>>> 0xd6b7c690: at vfs_donmount+0x78c
>>> 0xd6b7c6f0: at kernel_mount+0x7c
>>> 0xd6b7c730: at parse_mount+0x52c
>>> 0xd6b7c830: at vfs_mountroot+0x660
>>> 0xd6b7c9a0: at start_init+0x4c
>>> 0xd6b7ca10: at fork_exit+0xb0
>>> 0xd6b7ca40: at fork_trampoline+0xc
>>>
>>> /usr/src/sys/kern/vfs_bio.c :
>>>
>>> static void
>>> bq_remove(struct bufqueue *bq, struct buf *bp)
>>> {
>>>
>>>      CTR3(KTR_BUF, "bq_remove(%p) vp %p flags %X",
>>>          bp, bp->b_vp, bp->b_flags);
>>>      KASSERT(bp->b_qindex != QUEUE_NONE,
>>>          ("bq_remove: buffer %p not on a queue.", bp));
>>> . . .
>>>
>>> For reference:
>>>
>>> static int
>>> buf_import(void *arg, void **store, int cnt, int domain, int flags)
>>> {
>>>      struct buf *bp;
>>>      int i;
>>>
>>>      BQ_LOCK(&bqempty);
>>>      for (i = 0; i < cnt; i++) {
>>>              bp = TAILQ_FIRST(&bqempty.bq_queue);
>>>              if (bp == NULL)
>>>                      break;
>>>              bq_remove(&bqempty, bp);
>>>              store[i] = bp;
>>>      }
>>>      BQ_UNLOCK(&bqempty);
>>>
>>>      return (i);
>>> }
>>>
>>>
>>
>> I tried building the debug kernel with KTR for KTR_BUF.
>> Installing and booting the result did not panic. Manually
>> forcing getting to ddb> soon enough and doing "show ktr"
>> did show a bq_remove for 0xd2a00000 (and later activity).
>>
>> From the looks of the KTR_BUF CTRn's, this suggests to me
>> that the access to bp->b_qindex in bq_remove is racy in
>> some way vs. updates to the value.
>
> The code produced by clang for the debug kernel, KTR
> off in this case, for:
>
>      KASSERT(bp->b_qindex != QUEUE_NONE,
>          ("bq_remove: buffer %p not on a queue.", bp));
>
> is wrong [the 84(r29) accesses bp->b_qindex]:
>
> . . .
> 00618aa8 <bq_remove+0x34> lbz     r5,84(r29)
> 00618aac <bq_remove+0x38> cmplwi  r5,4
> 00618ab0 <bq_remove+0x3c> bgt-    00618c10 <bq_remove+0x19c>
> . . .
> 00618c10 <bq_remove+0x19c> lwz     r3,-32364(r30)
> 00618c14 <bq_remove+0x1a0> crclr   4*cr1+eq
> 00618c18 <bq_remove+0x1a4> mr      r4,r29
> 00618c1c <bq_remove+0x1a8> bl      00541ca0 <panic>
> . . .
>
> Comparing against 4 does not match any part of
> bq_remove. Comparison via gt would make sense for:

Wrong.

The comparison against 4 and the use of gt come from inlining
bufqueue(bp) in the KASSERT that follows. For reference (from the .i):

bufqueue(struct buf *bp)
{

 switch (bp->b_qindex) {
 case 0:

 case 4:
  return (((void *)0));
 case 1:
  return (&bqempty);
 case 2:
  return (&bufdomain(bp)->bd_dirtyq);
 case 3:
  return (&bufdomain(bp)->bd_subq[bp->b_subqueue]);
 default:
  break;
 }
 panic("bufqueue(%p): Unhandled type %d\n", bp, bp->b_qindex);
}

When bufqueue was inlined, the code generation placed the panic for
the first KASSERT into the handling of case 0 above.
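
As a rough illustration of why a dense switch over 0..4 can appear in
the disassembly as a single unsigned compare against 4 followed by
bgt, here is a minimal user-space sketch. The names (enum lookup,
lookup_switch, lookup_lowered, count_mismatches) are hypothetical
stand-ins and not from vfs_bio.c, the result codes replace the real
queue pointers and the panic call, and the exact lowering that clang
chose for the kernel is an assumption here:

```c
#include <stdint.h>

/* Hypothetical result codes standing in for queue pointers and panic(). */
enum lookup {
    LOOKUP_NULL,    /* cases 0 and 4 in the .i: return NULL */
    LOOKUP_EMPTY,   /* case 1: &bqempty */
    LOOKUP_DIRTY,   /* case 2: the domain's dirty queue */
    LOOKUP_SUBQ,    /* case 3: one of the domain's sub-queues */
    LOOKUP_PANIC    /* default: falls through to panic() */
};

/* Same shape as the preprocessed bufqueue(): dense cases 0..4 plus default. */
enum lookup
lookup_switch(uint8_t qindex)
{
    switch (qindex) {
    case 0:
    case 4:
        return (LOOKUP_NULL);
    case 1:
        return (LOOKUP_EMPTY);
    case 2:
        return (LOOKUP_DIRTY);
    case 3:
        return (LOOKUP_SUBQ);
    default:
        break;
    }
    return (LOOKUP_PANIC);
}

/*
 * Hand-written equivalent of one plausible lowering: a single unsigned
 * range check (compare with "cmplwi r5,4" then "bgt-"), followed by a
 * table lookup for the in-range values.
 */
enum lookup
lookup_lowered(uint8_t qindex)
{
    static const enum lookup table[5] = {
        LOOKUP_NULL, LOOKUP_EMPTY, LOOKUP_DIRTY, LOOKUP_SUBQ, LOOKUP_NULL
    };

    if (qindex > 4)
        return (LOOKUP_PANIC);
    return (table[qindex]);
}

/* Count the uint8_t values for which the two forms disagree. */
int
count_mismatches(void)
{
    int v, bad = 0;

    for (v = 0; v <= UINT8_MAX; v++) {
        if (lookup_switch((uint8_t)v) != lookup_lowered((uint8_t)v))
            bad++;
    }
    return (bad);
}
```

count_mismatches() returning 0 means the two forms agree for every
possible uint8_t value, so a "greater than 4" test on its own is
consistent with the range check of such a switch rather than with a
comparison against QUEUE_NONE.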

> /usr/src/sys/sys/buf.h: uint8_t         b_qindex;       /* (Q) buffer queue index */)
>
> if the comparison was against zero. It should
> have been:
>
> /usr/src/sys/kern/vfs_bio.c:#define QUEUE_NONE  0       /* on no queue */
>
>
> This is for a head -r347549 32-bit powerpc FreeBSD context,
> built with system clang (an amd64->powerpc cross build using
> devel/powerpc64-binutils ).



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



