Date: Wed, 27 Nov 2019 12:45:59 -0800
From: Mark Millard <marklmi@yahoo.com>
To: freebsd-arm@freebsd.org
Subject: Re: After more than 59 hr 20 min of poudriere based port building, the Rock64 (4 GiByte) got a data_abort with a panic message that mentioned "vm_fault failed" (cmd in dma_done was NULL, making cmd->data fail)
Message-ID: <AEA74194-7652-40F1-A340-3DD59A250C3D@yahoo.com>
In-Reply-To: <F337577B-3ED5-4B72-AB02-2FB10FDB7600@yahoo.com>
References: <F337577B-3ED5-4B72-AB02-2FB10FDB7600.ref@yahoo.com> <F337577B-3ED5-4B72-AB02-2FB10FDB7600@yahoo.com>
On 2019-Nov-27, at 09:31, Mark Millard <marklmi at yahoo.com> wrote:

> The failure was while dwmmc_intr was active on the bus. It looks
> like the vm_fault failed address matches the elr value, which is
> near the lr value and near the "pc =" value listed for dwmmc_intr.
> (Back trace shown later.)

I should have mentioned that the system was running a non-debug
build (with symbols).

Looks like "cmd" was zero (NULL) in:

766 static int
767 dma_done(struct dwmmc_softc *sc, struct mmc_command *cmd)
768 {
769 	struct mmc_data *data;

771 	data = cmd->data;

0xffff00000078e51c <+648>:	ldr	x8, [x23, #40]

(That lines up with the trap frame below: x23, which holds cmd, is 0,
and far is 0x28, i.e. exactly the #40 offset of the data member used
by the faulting load.)

This is for the use of dma_done in dwmmc_intr that is shown below:

. . .
	cmd = sc->curcmd;
. . .
	/* Ack interrupts */
	WRITE4(sc, SDMMC_RINTSTS, reg);

	if (sc->use_pio) {
		if (reg & (SDMMC_INTMASK_RXDR|SDMMC_INTMASK_DTO)) {
			pio_read(sc, cmd);
		}
		if (reg & (SDMMC_INTMASK_TXDR|SDMMC_INTMASK_DTO)) {
			pio_write(sc, cmd);
		}
	} else {
		/* Now handle DMA interrupts */
		reg = READ4(sc, SDMMC_IDSTS);
		if (reg) {
			dprintf("dma intr 0x%08x\n", reg);
			if (reg & (SDMMC_IDINTEN_TI | SDMMC_IDINTEN_RI)) {
				WRITE4(sc, SDMMC_IDSTS,
				    (SDMMC_IDINTEN_TI | SDMMC_IDINTEN_RI));
				WRITE4(sc, SDMMC_IDSTS, SDMMC_IDINTEN_NI);
				dma_done(sc, cmd);
			}
		}
	}
. . .

Unfortunately, I did not get a dump. (An untested sketch of a possible
guard for this path is appended after the quoted report below.)

> This is a head -r355027 based context.
>
> This does not look easy to reproduce.
>
> I had poudriere running 4 jobs, each allowed to use 4 processes,
> so the bulk of the time the load average was between 8 and 17.
>
> The last top update (of my extended top) showed top never saw
> significant swap usage:
>
> Swap: 4608M Total, 22M Used, 4586M Free, 32M MaxObsUsed
>
> ("MaxObs" is short for "Maximum Observed".)
>
> It also showed (line wrapped by me):
>
> Mem: 196M Active, 1078M Inact, 4272K Laundry, 650M Wired, 264M Buf,
> 2035M Free, 2517M MaxObsActive, 805M MaxObsWired, 3219M MaxObs(Act+Wir)
>
> It showed as running:
>
> /usr/local/sbin/pkg-static create -r /wrkdirs/usr/ports/devel/llvm90/work/stage . . .
> (earlier llvm80 had completed fine)
>
> and 3 processes of the form:
>
> cpdup -i0 -x ref0?
>
> Those 3 seem to be for the 3 "Building"s listed below:
>
> [59:20:56] [02] [00:14:53] Finished devel/qt5-linguist | qt5-linguist-5.13.2: Success
> [59:20:57] [02] [00:00:00] Building deskutils/lumina-archiver | lumina-archiver-1.5.0
> [59:20:57] [03] [00:00:00] Building deskutils/lumina-calculator | lumina-calculator-1.5.0
> [59:20:57] [04] [00:00:00] Building x11/lumina-core | lumina-core-1.5.0
>
>
> The serial console's report was:
>
> Fatal data abort:
>   x0: fffffd0000b45b00
>   x1: ffff000040588000
>   x2: 8c
>   x3: 100
>   x4: ffff00004035caa0
>   x5: ffff00004035c7b0
>   x6: 0
>   x7: 1
>   x8: ffff000000758ebc
>   x9: ffff000000a33100
>  x10: fffffd0000a28678
>  x11: 0
>  x12: 9633b10b
>  x13: 2af8
>  x14: 2777
>  x15: 2af8
>  x16: 38
>  x17: 38
>  x18: ffff00004035c870
>  x19: fffffd0000a28600
>  x20: 8c
>  x21: fffffd0000b45e58
>  x22: ffff000000a4b000
>  x23: 0
>  x24: fffffd0000b45e10
>  x25: fffffd0000b89514
>  x26: fffffd0000b8f180
>  x27: fffffd0000b45e00
>  x28: ffff000000a4bd98
>  x29: ffff00004035c8b0
>   sp: ffff00004035c870
>   lr: ffff00000078e518
>  elr: ffff00000078e51c
> spsr: 145
>  far: 28
>  esr: 96000005
> panic: vm_fault failed: ffff00000078e51c
> cpuid = 2
> time = 1574872496
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
> 	 pc = 0xffff00000075ba9c  lr = 0xffff0000001066a8
> 	 sp = 0xffff00004035c270  fp = 0xffff00004035c480
>
> db_trace_self_wrapper() at vpanic+0x18c
> 	 pc = 0xffff0000001066a8  lr = 0xffff00000041903c
> 	 sp = 0xffff00004035c490  fp = 0xffff00004035c530
>
> vpanic() at panic+0x44
> 	 pc = 0xffff00000041903c  lr = 0xffff000000418eac
> 	 sp = 0xffff00004035c540  fp = 0xffff00004035c5c0
>
> panic() at data_abort+0x1e0
> 	 pc = 0xffff000000418eac  lr = 0xffff000000777d94
> 	 sp = 0xffff00004035c5d0  fp = 0xffff00004035c680
>
> data_abort() at do_el1h_sync+0x144
> 	 pc = 0xffff000000777d94  lr = 0xffff000000776fb0
> 	 sp = 0xffff00004035c690  fp = 0xffff00004035c6c0
>
> do_el1h_sync() at handle_el1h_sync+0x78
> 	 pc = 0xffff000000776fb0  lr = 0xffff00000075e078
> 	 sp = 0xffff00004035c6d0  fp = 0xffff00004035c7e0
>
> handle_el1h_sync() at dwmmc_intr+0x280
> 	 pc = 0xffff00000075e078  lr = 0xffff00000078e514
> 	 sp = 0xffff00004035c7f0  fp = 0xffff00004035c8b0
>
> dwmmc_intr() at ithread_loop+0x1f4
> 	 pc = 0xffff00000078e514  lr = 0xffff0000003db604
> 	 sp = 0xffff00004035c8c0  fp = 0xffff00004035c940
>
> ithread_loop() at fork_exit+0x90
> 	 pc = 0xffff0000003db604  lr = 0xffff0000003d7be4
> 	 sp = 0xffff00004035c950  fp = 0xffff00004035c980
>
> fork_exit() at fork_trampoline+0x10
> 	 pc = 0xffff0000003d7be4  lr = 0xffff000000776cec
> 	 sp = 0xffff00004035c990  fp = 0x0000000000000000
>
> KDB: enter: panic
> [ thread pid 12 tid 100038 ]
> Stopped at      dwmmc_intr+0x288:       ldr     x8, [x23, #40]
> db>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
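
Just to illustrate the shape of a guard (my own untested sketch, not a
proposed patch): if sc->curcmd really can be NULL by the time the
SDMMC_IDSTS completion interrupt is serviced, a check at the top of
dma_done (or at its call site in dwmmc_intr) would at least avoid the
dereference. The early return value and the dprintf are arbitrary
choices here, and whether quietly skipping the completion is the right
recovery is a separate question:

static int
dma_done(struct dwmmc_softc *sc, struct mmc_command *cmd)
{
	struct mmc_data *data;

	/* Sketch only: bail out instead of dereferencing a NULL cmd. */
	if (cmd == NULL) {
		dprintf("dma_done: no current command for DMA completion\n");
		return (0);
	}

	data = cmd->data;
. . .

A guard like that would only paper over whatever cleared curcmd (or
completed the request) before the DMA status was handled; it would turn
the panic into a logged oddity rather than explain the race.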