Date: Wed, 5 Jun 2019 23:29:49 -0700
From: Mark Millard <marklmi@yahoo.com>
To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject: Re: crash of 32-bit powerpc -r347549 kernel built via system-clang-8 (crash is while trying to mount the root file system) [debug kernel case: code generation error] [I was wrong]
Message-ID: <EBB778D2-8005-4514-BC74-2F5B70EA677E@yahoo.com>
In-Reply-To: <DD895640-0487-45F5-9D88-C0CD3CD7CF9D@yahoo.com>
References: <45D010BF-7654-43A6-8FF4-CCDEEF4004F6@yahoo.com> <4354EA25-69C2-4CAB-8273-62457333BD30@yahoo.com> <995DA649-9390-420B-AC95-FFD17079CDA9@yahoo.com> <DD895640-0487-45F5-9D88-C0CD3CD7CF9D@yahoo.com>

[I misanalysed the code. Sorry for the noise.]

On 2019-Jun-5, at 14:17, Mark Millard <marklmi at yahoo.com> wrote:

> [This is from my experiments with more modern toolchains than
> normally/officially used, specifically for 32-bit powerpc this
> time.]
>
> On 2019-Jun-5, at 01:35, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2019-Jun-3, at 19:40, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2019-Jun-3, at 17:24, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>>> I tried (cross) building a 32-bit powerpc kernel and world (non-debug)
>>>> with system-clang (on amd64) and use of devel/powerpc64-binutils . The
>>>> installed kernel panics trying to mount the root file system.
>>>>
>>>> FYI: Typed from picture of screen . . .
>>>>
>>>> Trying to mount root from ufs:/dev/ufs/FBSDG4Srootfs [rw,noatime]...
>>>> panic: getnewbuf_empty: Locked buf 0xd2800000 on free queue.
>>>> . . .
>>>> 0xd6919080: at kdb_backtrace+0x64
>>>> 0xd69190e0: at vpanic+0x200
>>>> 0xd6919150: at panic+0x50
>>>> 0xd6919190: at getnewbuf+0x594
>>>> 0xd69191f0: at getblkx+0x540
>>>> 0xd69192a0: at breadn_flags+0x90
>>>> 0xd69192f0: at ffs_use_bread+0x9c
>>>> 0xd6919330: at readsuper+0x68
>>>> 0xd6919370: at ffs_sbget+0xcc
>>>> 0xd69193c0: at ffs_mount+0x18b8
>>>> 0xd69194f0: at vfs_domount+0xa74
>>>> 0xd69196a0: at vfs_donmount+0x944
>>>> 0xd6919700: at kernel_mount+0x64
>>>> 0xd6919740: at parse_mount+0x52c
>>>> 0xd6919840: at vfs_mountroot+0x71c
>>>> 0xd69199b0: at start_init+0x44
>>>> 0xd6919a10: at fork_exit+0xcc
>>>> 0xd6919a40: at fork_trampoline+0xc
>>>> KDB: enter panic
>>>> [ thread pid 1 tid 100002 ]
>>>> Stopped at kdb_enter+0x74: addi r3,r0,0x0
>>>>
>>>> This reproduces with each boot attempt.
>>>>
>>>> Replacing the kernel with one built via gcc 4.2.1 and booting
>>>> the result does not panic.
>>>>
>>>>
>>>> FYI for the context of the panic call:
>>>>
>>>> /usr/src/sys/kern/vfs_bio.c :
>>>>
>>>> static struct buf *
>>>> buf_alloc(struct bufdomain *bd)
>>>> {
>>>>     struct buf *bp;
>>>>     int freebufs;
>>>>
>>>>     /*
>>>>      * We can only run out of bufs in the buf zone if the average buf
>>>>      * is less than BKVASIZE. In this case the actual wait/block will
>>>>      * come from buf_reycle() failing to flush one of these small bufs.
>>>>      */
>>>>     bp = NULL;
>>>>     freebufs = atomic_fetchadd_int(&bd->bd_freebuffers, -1);
>>>>     if (freebufs > 0)
>>>>         bp = uma_zalloc(buf_zone, M_NOWAIT);
>>>>     if (bp == NULL) {
>>>>         atomic_add_int(&bd->bd_freebuffers, 1);
>>>>         bufspace_daemon_wakeup(bd);
>>>>         counter_u64_add(numbufallocfails, 1);
>>>>         return (NULL);
>>>>     }
>>>>     /*
>>>>      * Wake-up the bufspace daemon on transition below threshold.
>>>>      */
>>>>     if (freebufs == bd->bd_lofreebuffers)
>>>>         bufspace_daemon_wakeup(bd);
>>>>
>>>>     if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0)
>>>>         panic("getnewbuf_empty: Locked buf %p on free queue.", bp);
>>>
>>>
>>> I tried making a debug kernel build via system-clang-8. It
>>> reports differently but still during getnewbuf being active
>>> on the stack (again typed from a picture):
>>>
>>> Trying to mount root from ufs:/dev/ufs/FBSDG4Srootfs [rw,noatime]...
>>> . . . (ignore witness/diagnostic warnings) . . .
>>> panic: bq_remove: Locked buf 0xd2a00000 not on a queue.
>>> . . .
>>> 0xd6b7bfd0: at kdb_backtrace+0x64
>>> 0xd6b7c030: at vpanic+0x200
>>> 0xd6b7c0a0: at panic+0x50
>>> 0xd6b7c0e0: at bq_remove+0x1e0
>>> 0xd6b7c100: at buf_import+0x8c
>>> 0xd6b7c130: at uma_zalloc_arg+0x544
>>> 0xd6b7c190: at getnewbuf+0x380
>>> 0xd6b7c1f0: at getblkx+0x620
>>> 0xd6b7c290: at breadn_flags+0x90
>>> 0xd6b7c2e0: at ffs_use_bread+0xa8
>>> 0xd6b7c320: at readsuper+0x68
>>> 0xd6b7c360: at ffs_sbget+0xcc
>>> 0xd6b7c3b0: at ffs_mount+0xefc
>>> 0xd6b7c4e0: at vfs_domount+0xa754
>>> 0xd6b7c690: at vfs_donmount+0x78c
>>> 0xd6b7c6f0: at kernel_mount+0x7c
>>> 0xd6b7c730: at parse_mount+0x52c
>>> 0xd6b7c830: at vfs_mountroot+0x660
>>> 0xd6b7c9a0: at start_init+0x4c
>>> 0xd6b7ca10: at fork_exit+0xb0
>>> 0xd6b7ca40: at fork_trampoline+0xc
>>>
>>> /usr/src/sys/kern/vfs_bio.c :
>>>
>>> static void
>>> bq_remove(struct bufqueue *bq, struct buf *bp)
>>> {
>>>
>>>     CTR3(KTR_BUF, "bq_remove(%p) vp %p flags %X",
>>>         bp, bp->b_vp, bp->b_flags);
>>>     KASSERT(bp->b_qindex != QUEUE_NONE,
>>>         ("bq_remove: buffer %p not on a queue.", bp));
>>> . . .
>>>
>>> For reference:
>>>
>>> static int
>>> buf_import(void *arg, void **store, int cnt, int domain, int flags)
>>> {
>>>     struct buf *bp;
>>>     int i;
>>>
>>>     BQ_LOCK(&bqempty);
>>>     for (i = 0; i < cnt; i++) {
>>>         bp = TAILQ_FIRST(&bqempty.bq_queue);
>>>         if (bp == NULL)
>>>             break;
>>>         bq_remove(&bqempty, bp);
>>>         store[i] = bp;
>>>     }
>>>     BQ_UNLOCK(&bqempty);
>>>
>>>     return (i);
>>> }
>>>
>>>
>>
>> I tried building the debug kernel with KTR for KTR_BUF.
>> Installing and booting the result did not panic. Manually
>> forcing getting to ddb> soon enough and doing "show ktr"
>> did show a bq_remove for 0xd2a00000 (and later activity).
>>
>> From the looks of the KTR_BUF CTRn's, this suggests to me
>> that the access to bp->b_qindex in bq_remove is racy in
>> some way vs. updates to the value.
>
> The code produced by clang for the debug kernel, KTR
> off in this case, for:
>
>     KASSERT(bp->b_qindex != QUEUE_NONE,
>         ("bq_remove: buffer %p not on a queue.", bp));
>
> is wrong [the 84(r29) accesses bp->b_qindex]:
>
> . . .
> 00618aa8 <bq_remove+0x34> lbz r5,84(r29)
> 00618aac <bq_remove+0x38> cmplwi r5,4
> 00618ab0 <bq_remove+0x3c> bgt- 00618c10 <bq_remove+0x19c>
> . . .
> 00618c10 <bq_remove+0x19c> lwz r3,-32364(r30)
> 00618c14 <bq_remove+0x1a0> crclr 4*cr1+eq
> 00618c18 <bq_remove+0x1a4> mr r4,r29
> 00618c1c <bq_remove+0x1a8> bl 00541ca0 <panic>
> . . .
>
> Comparing against 4 does not match any part of
> bq_remove. Comparison via gt would make sense for:

Wrong. The 4 and gt use comes from inlining bufqueue(bp)
in the following KASSERT. For reference (from the .i):

bufqueue(struct buf *bp)
{
    switch (bp->b_qindex) {
    case 0:
    case 4:
        return (((void *)0));
    case 1:
        return (&bqempty);
    case 2:
        return (&bufdomain(bp)->bd_dirtyq);
    case 3:
        return (&bufdomain(bp)->bd_subq[bp->b_subqueue]);
    default:
        break;
    }
    panic("bufqueue(%p): Unhandled type %d\n", bp, bp->b_qindex);
}

When bufqueue was inlined, the code generation placed the
first KASSERT's panic in the case 0 path above.

> /usr/src/sys/sys/buf.h: uint8_t b_qindex; /* (Q) buffer queue index */)
>
> if the comparison was against zero. It should
> have been:
>
> /usr/src/sys/kern/vfs_bio.c:#define QUEUE_NONE 0 /* on no queue */
>
>
> This is for a head -r347549 32-bit powerpc FreeBSD context,
> built with system clang (an amd64->powerpc cross build using
> devel/powerpc64-binutils ).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
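
For illustration only (this is not code from the FreeBSD tree, and
not necessarily what clang emitted): a minimal user-space sketch of
how a check against zero followed by a switch over the values 0..4
can be merged into a single unsigned "greater than 4" test plus a
per-case dispatch, with the zero check living on the case 0 path.
The case values come from the .i excerpt above; the names outcome,
OUT_*, source_order, merged_form and forms_agree are hypothetical.

```c
#include <stdint.h>

/* Hypothetical labels for where control ends up. */
enum outcome {
    OUT_ASSERT_PANIC,    /* stands in for the first KASSERT firing */
    OUT_NULL,            /* bufqueue() would return NULL */
    OUT_EMPTY,           /* bufqueue() would return &bqempty */
    OUT_DIRTY,           /* bufqueue() would return the dirty queue */
    OUT_SUBQ,            /* bufqueue() would return a bd_subq entry */
    OUT_UNHANDLED_PANIC  /* stands in for bufqueue()'s own panic */
};

/* Source order: the != 0 check first, then the switch. */
static enum outcome
source_order(uint8_t qindex)
{
    if (qindex == 0)
        return (OUT_ASSERT_PANIC);
    switch (qindex) {
    case 0:
    case 4:
        return (OUT_NULL);
    case 1:
        return (OUT_EMPTY);
    case 2:
        return (OUT_DIRTY);
    case 3:
        return (OUT_SUBQ);
    default:
        break;
    }
    return (OUT_UNHANDLED_PANIC);
}

/*
 * Merged form (assumed shape): one unsigned range check, analogous
 * to "cmplwi r5,4 ; bgt-", then a table in which entry 0 carries the
 * first check's panic.
 */
static enum outcome
merged_form(uint8_t qindex)
{
    static const enum outcome table[5] = {
        OUT_ASSERT_PANIC, OUT_EMPTY, OUT_DIRTY, OUT_SUBQ, OUT_NULL
    };

    if (qindex > 4)
        return (OUT_UNHANDLED_PANIC);
    return (table[qindex]);
}

/* Returns 1 when both forms agree for every uint8_t value. */
static int
forms_agree(void)
{
    int q;

    for (q = 0; q < 256; q++)
        if (source_order((uint8_t)q) != merged_form((uint8_t)q))
            return (0);
    return (1);
}
```

Both forms behave identically for all 256 possible uint8_t values,
so a compare against 4 appearing right after the load of b_qindex
does not by itself indicate that the comparison against zero was
dropped.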