Date: Sun, 13 Nov 2022 14:48:02 -0800 From: Christopher Bowman <crb@chrisbowman.com> To: Mark Johnston <markj@freebsd.org> Cc: hackers@freebsd.org Subject: Re: I could use some help Message-ID: <5C07A058-977C-4F2E-8B41-01EBAB4FF24B@chrisbowman.com> In-Reply-To: <Y3FDfpKWKjyHaWAK@nuc> References: <ED4D59FC-DE1B-4591-AE51-5AA61B5647A6@chrisbowman.com> <Y3FDfpKWKjyHaWAK@nuc>
next in thread | previous in thread | raw e-mail | index | archive | help
> On Nov 13, 2022, at 11:20 AM, Mark Johnston <markj@freebsd.org> wrote: >=20 > On Wed, Nov 09, 2022 at 10:18:02PM -0800, Christopher Bowman wrote: >> OK I=E2=80=99m really confused and I could use some help: >>=20 >> 13.0 runs fine on my Xilinx Zynq based board (DIgilent Arty Z20). = However when I compile 13.1 release it doesn=E2=80=99t boot. The kernel = stops during boot as follows: >>=20 >> Using DTB from loaded file '/boot/dtb/zynq-artyz7.dtb'. >>=20 >>=20 >> Loading DTB overlays: 'artyz7_ssd_overlay.dtb' >>=20 >>=20 >> /boot/dtb/overlays/artyz7_ssd_overlay.dtb size=3D0x1a1 >>=20 >>=20 >> Kernel entry at 0x17a00200... >>=20 >>=20 >> Kernel args: (null) >>=20 >>=20 >> applying DTB overlay '/boot/dtb/overlays/artyz7_ssd_overlay.dtb' >>=20 >>=20 >> WARNING: Cannot find freebsd,dts-version property, cannot check DTB = compliance >> ---<<BOOT>>--- >> Copyright (c) 1992-2021 The FreeBSD Project. >> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, = 1994 >> .The Regents of the University of California. All rights reserved. >> FreeBSD is a registered trademark of The FreeBSD Foundation. >> FreeBSD 13.0-STABLE #22 n248064-ce9c3848ff3: Wed Nov 9 22:04:45 PST = 2022 >> crb@eclipse.ChrisBowman.com:/usr/obj/usr/src/arm.armv7/sys/ARTYZ7 = arm >> FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git = llvmorg-11.0.1-0-g43ff75f2c3fe) >> CPU: ARM Cortex-A9 r3p0 (ECO: 0x00000000) >> CPU Features:=20 >> Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk >> Optional instructions:=20 >> UMULL, SMULL, SIMD(ext) >> LoUU:2 LoC:2 LoUIS:2=20 >> Cache level 1: >> 32KB/32B 4-way data cache WB Read-Alloc Write-Alloc >> 32KB/32B 4-way instruction cache Read-Alloc >> real memory =3D 536866816 (511 MB) >> avail memory =3D 515162112 (491 MB) >> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs >> random: unblocking device. >> random: entropy device external interface >> ofwbus0: <Open Firmware Device Tree> >> simplebus0: <Flattened device tree simple bus> on ofwbus0 >> simplebus1: <Flattened device tree simple bus> on ofwbus0 >> l2cache0: <PL310 L2 cache controller> mem 0xf8f02000-0xf8f02fff irq 8 = on simplebus1 >> l2cache0: cannot allocate IRQ, not using interrupt >> l2cache0: Part number: 0x3, release: 0x8 >> l2cache0: L2 Cache enabled: 512KB/32B 8 ways >> gic0: <ARM Generic Interrupt Controller> mem = 0xf8f01000-0xf8f01fff,0xf8f00100-0xf8f001ff on simplebus1 >> gic0: pn 0x39, arch 0x1, rev 0x2, implementer 0x43b irqs 96 >> mp_tmr0: <ARM MPCore Timers> mem 0xf8f00200-0xf8f0021f irq 29 on = simplebus1 >> Timecounter "MPCore" frequency 50000000 Hz quality 800 >> mp_tmr1: <ARM MPCore Timers> mem 0xf8f00600-0xf8f0061f irq 36 on = simplebus1 >> Event timer "MPCore" frequency 50000000 Hz quality 1000 >> cpulist0: <Open Firmware CPU Group> on ofwbus0 >> cpu0: <Open Firmware CPU> on cpulist0 >> cpu1: <Open Firmware CPU> on cpulist0 >> uart0: <Cadence UART> mem 0xe0000000-0xe0000fff irq 9 on simplebus1 >> uart0: console (-1,n,8,1) >> zy7_qspi0: <Zynq Quad-SPI Flash Controller> mem 0xe000d000-0xe000dfff = irq 13 on simplebus1 >> zy7_qspi0: must have ref-clock property >> device_attach: zy7_qspi0 attach returned 6 >> cgem0: <Cadence CGEM Gigabit Ethernet Interface> mem = 0xe000b000-0xe000bfff irq 15 on simplebus1 >> miibus0: <MII bus> on cgem0 >> rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on = miibus0 >> rgephy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, = 1000baseT-FDX, 1000baseT-FDX-master, auto >> rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on = miibus0 >> rgephy1: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, = 1000baseT-FDX, 1000baseT-FDX-master, auto >> cgem0: Ethernet address: 56:99:3e:50:9a:8e >> sdhci_fdt0: <Zynq-7000 generic fdt SDHCI controller> mem = 0xe0100000-0xe0100fff irq 17 on simplebus1 >> sdhci_fdt0: 1 slot(s) allocated >> mmc0: <MMC/SD bus> on sdhci_fdt0 >> zy7_devcfg0: <Zynq devcfg block> mem 0xf8007000-0xf80070ff irq 28 on = simplebus1 >> Timecounters tick every 1.000 msec >> sdhci_fdt0-slot0: Controller timeout >> sdhci_fdt0-slot0: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D REGISTER = DUMP =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> sdhci_fdt0-slot0: Sys addr: 0x00060000 | Version: 0x00008901 >> sdhci_fdt0-slot0: Blk size: 0x00005008 | Blk cnt: 0x00000001 >> sdhci_fdt0-slot0: Argument: 0x00000000 | Trn mode: 0x00000013 >> sdhci_fdt0-slot0: Present: 0x01ff0202 | Host ctl: 0x00000001 >> sdhci_fdt0-slot0: Power: 0x0000000f | Blk gap: 0x00000000 >> sdhci_fdt0-slot0: Wake-up: 0x00000000 | Clock: 0x00004007 >> sdhci_fdt0-slot0: Timeout: 0x00000006 | Int stat: 0x00000001 >> sdhci_fdt0-slot0: Int enab: 0x01ff00fb | Sig enab: 0x01ff00fa >> sdhci_fdt0-slot0: AC12 err: 0x00000000 | Host ctl2:0x00000000 >> sdhci_fdt0-slot0: Caps: 0x69ec0080 | Caps2: 0x00000000 >> sdhci_fdt0-slot0: Max curr: 0x00000001 | ADMA err: 0x00000000 >> sdhci_fdt0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000 >> sdhci_fdt0-slot0: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= >> sdhci_fdt0-slot0: Controller timeout >> sdhci_fdt0-slot0: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D REGISTER = DUMP =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> sdhci_fdt0-slot0: Sys addr: 0x00000000 | Version: 0x00008901 >> sdhci_fdt0-slot0: Blk size: 0x00005008 | Blk cnt: 0x00000001 >> sdhci_fdt0-slot0: Argument: 0xaaaa0000 | Trn mode: 0x00000013 >> sdhci_fdt0-slot0: Present: 0x01ff0000 | Host ctl: 0x00000001 >> sdhci_fdt0-slot0: Power: 0x0000000f | Blk gap: 0x00000000 >> sdhci_fdt0-slot0: Wake-up: 0x00000000 | Clock: 0x00004007 >> sdhci_fdt0-slot0: Timeout: 0x00000006 | Int stat: 0x00000001 >> sdhci_fdt0-slot0: Int enab: 0x01ff00fb | Sig enab: 0x01ff00fa >> sdhci_fdt0-slot0: AC12 err: 0x00000000 | Host ctl2:0x00000000 >> sdhci_fdt0-slot0: Caps: 0x69ec0080 | Caps2: 0x00000000 >> sdhci_fdt0-slot0: Max curr: 0x00000001 | ADMA err: 0x00000000 >> sdhci_fdt0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000 >> sdhci_fdt0-slot0: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= >>=20 >> The controller register dumps continue for a while and then I get: >>=20 >> mmc0: CMD7 failed, RESULT: 1 >> Release APs >> Trying to mount root from ufs:/dev/mmcsd0s2a [rw]... >> mountroot: waiting for device /dev/mmcsd0s2a... >> Mounting from ufs:/dev/mmcsd0s2a failed with error 19. >> Trying to mount root from ufs:mmcsd0s2a []... >> mountroot: waiting for device mmcsd0s2a... >> Mounting from ufs:mmcsd0s2a failed with error 19. >>=20 >> Loader variables: >> vfs.root.mountfrom=3Dufs:/dev/mmcsd0s2a >> vfs.root.mountfrom.options=3Drw >>=20 >> Manual root filesystem specification: >> <fstype>:<device> [options] >> Mount <device> using filesystem <fstype> >> and with the specified (optional) option list. >>=20 >> eg. ufs:/dev/da0s1a >> zfs:zroot/ROOT/default >> cd9660:/dev/cd0 ro >> (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /) >>=20 >> ? List valid disk boot devices >> . Yield 1 second (for background tasks) >> <empty line> Abort manual input >>=20 >> mountroot>=20 >>=20 >> If I replace the kernel with a 13.0 kernel, it boots just fine. >>=20 >> So I go and do a git bisect between release/13.0.0 and release/13.1.0 >>=20 >> and I get the following: >>=20 >> git bisect start '--first-parent' >> # status: waiting for both good and bad commits >> # good: [ea31abc261ffc01b6ff5671bffb15cf910a07f4b] 13.0: update to = RELEASE >> git bisect good ea31abc261ffc01b6ff5671bffb15cf910a07f4b >> # status: waiting for bad commit, 1 good commit known >> # bad: [fc952ac2212b121aa6eefc273f5960ec3e0a466d] Update in = preparation of 13.1-RELEASE >> git bisect bad fc952ac2212b121aa6eefc273f5960ec3e0a466d >> # skip: [4c44dbde5491516eba8725dc51d39c1dcc817472] MFC jail: Handle a = parent jail when a child is added to it >> git bisect skip 4c44dbde5491516eba8725dc51d39c1dcc817472 >> # good: [476f87219f408343846254743c7189076be80c04] wpi: Fix a lock = leak in an error path in wpi_run() >> git bisect good 476f87219f408343846254743c7189076be80c04 >> # bad: [05bf7d68c56830e52dee14dc87c07d6716e8195e] aesni: Fix an = out-of-bounds read in AES_GCM_decrypt() >> git bisect bad 05bf7d68c56830e52dee14dc87c07d6716e8195e >> # good: [014ae00ef6edca2687d618e0bda138086a1e1230] date: Capitalize = seconds string in synopses >> git bisect good 014ae00ef6edca2687d618e0bda138086a1e1230 >> # bad: [08d995ca8f6f1008a10e4bf4d924824c040f842a] swapoff_one(): only = check free pages count manually turning swap off >> git bisect bad 08d995ca8f6f1008a10e4bf4d924824c040f842a >> # bad: [81b6dba1a08b031bdf7463c1704d27ae1e0daa0f] ktls: Fix assertion = for TLS 1.0 CBC when using non-zero starting seqno. >> git bisect bad 81b6dba1a08b031bdf7463c1704d27ae1e0daa0f >> # bad: [67efa8b29930f12dae2bf237fa7c2ce1dafbd6b1] net80211: add a = driver-private pointer to struct ieee80211_node >> git bisect bad 67efa8b29930f12dae2bf237fa7c2ce1dafbd6b1 >> # good: [109330155000bfec215ee39148254d2a0b628798] module(9): = Document that evhand can be NULL >> git bisect good 109330155000bfec215ee39148254d2a0b628798 >> # bad: [4c8e29637456bbbe709425f691f637914658009f] LinuxKPI: add = module_pci_driver() and pci_alloc_irq_vectors() >> git bisect bad 4c8e29637456bbbe709425f691f637914658009f >> # bad: [4a03ae8d17ddf3d3b57ca281000fd98e200b92cc] nfscl: Fix use = after free for forced dismount >> git bisect bad 4a03ae8d17ddf3d3b57ca281000fd98e200b92cc >> # bad: [de957de097857fabb69a59a9ba36276c5e735de5] bhyve: Fix the = WITH_BHYVE_SNAPSHOT build >> git bisect bad de957de097857fabb69a59a9ba36276c5e735de5 >> # bad: [5c2e6d9610f1b3f1d7c5d69b925212a7b1fd9391] hwpmc: initialize = arm64 counter/interrupt state >> git bisect bad 5c2e6d9610f1b3f1d7c5d69b925212a7b1fd9391 >> # bad: [ce9c3848ff369467749f59fd24f8b9f1241e725c] uma: Fix handling = of reserves in zone_import() >> git bisect bad ce9c3848ff369467749f59fd24f8b9f1241e725c >> # good: [d5ebaa6f8f850bb6f6273f01386832efcb295827] uma: Improve = M_USE_RESERVE handling in keg_fetch_slab() >> git bisect good d5ebaa6f8f850bb6f6273f01386832efcb295827 >> # first bad commit: [ce9c3848ff369467749f59fd24f8b9f1241e725c] uma: = Fix handling of reserves in zone_import() >>=20 >> If I do git log ce9c3848ff369467749f59fd24f8b9f1241e725c it does seem = that d5ebaa6f8f850bb6f6273f01386832efcb295827 is the previous commit and = that it works just fine but ce9c3848ff369467749f59fd24f8b9f1241e725c = doesn=E2=80=99t boot. It=E2=80=99s literally the same file system and = DTB, the only difference is the kernel installed. >>=20 >> What=E2=80=99s confusing to me is that looking at = ce9c3848ff369467749f59fd24f8b9f1241e725c I can=E2=80=99t see how that = commit would result in my kernel hanging while booting. >=20 > It's indeed a bit hard to see why commit > ce9c3848ff369467749f59fd24f8b9f1241e725c might cause an sdhci command > timeout. I suspect the commit itself is innocent, but happens to > uncover a problem elsewhere by changing the order in which page > allocations are distributed to different consumers. At the time of = the > command timeouts, the kernel is still running on a single CPU so the > order in which pages are allocated should be fairly deterministic and > thus consistent across multiple boots. >=20 > One thing you can try is to take a releng/13.1 tree and revert just > ce9c3848ff369467749f59fd24f8b9f1241e725c, and/or take releng/13.0 and > apply just that commit. It would be useful to know whether either of > the resulting kernels can boot successfully. >=20 > Could you also please post your kernel configuration (ARTYZ7) = somewhere, > together with a dmesg of a successful boot? Mark, If I checkout release/13.1.0 and revert the commit in question = then my kernel boots. I wasn=E2=80=99t able to checkout releng/13.1 as = I get the following: crb@eclipse:142> git checkout releng/13.1 M sys/vm/uma_core.c Already on 'releng/13.1' Your branch is up to date with 'origin/releng/13.1=E2=80=99. I think that=E2=80=99s just me being incompetent with git. I can add the commit to release/13.1.0 if you would still like My kernel is essentially identical to the in-tree ZEDBOARD config with the name changed. I can post it if truly desired. Please find the dmesg output for 13.1 release below. Thanks Christopher crb@artyz7:51> dmesg=20 Copyright (c) 1992-2021 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights = reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 13.1-RELEASE #1 n250148-fc952ac2212-dirty: Sun Nov 13 14:29:58 = PST 2022 crb@eclipse.ChrisBowman.com:/usr/obj/usr/src/arm.armv7/sys/ARTYZ7 = arm FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git = llvmorg-13.0.0-0-gd7b669b3a303) CPU: ARM Cortex-A9 r3p0 (ECO: 0x00000000) CPU Features:=20 Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk Optional instructions:=20 UMULL, SMULL, SIMD(ext) LoUU:2 LoC:2 LoUIS:2=20 Cache level 1: 32KB/32B 4-way data cache WB Read-Alloc Write-Alloc 32KB/32B 4-way instruction cache Read-Alloc real memory =3D 536866816 (511 MB) avail memory =3D 515440640 (491 MB) FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs random: unblocking device. random: entropy device external interface ofwbus0: <Open Firmware Device Tree> simplebus0: <Flattened device tree simple bus> on ofwbus0 simplebus1: <Flattened device tree simple bus> on ofwbus0 l2cache0: <PL310 L2 cache controller> mem 0xf8f02000-0xf8f02fff irq 8 on = simplebus1 l2cache0: cannot allocate IRQ, not using interrupt l2cache0: Part number: 0x3, release: 0x8 l2cache0: L2 Cache enabled: 512KB/32B 8 ways gic0: <ARM Generic Interrupt Controller> mem = 0xf8f01000-0xf8f01fff,0xf8f00100-0xf8f001ff on simplebus1 gic0: pn 0x39, arch 0x1, rev 0x2, implementer 0x43b irqs 96 mp_tmr0: <ARM MPCore Timers> mem 0xf8f00200-0xf8f0021f irq 29 on = simplebus1 Timecounter "MPCore" frequency 50000000 Hz quality 800 mp_tmr1: <ARM MPCore Timers> mem 0xf8f00600-0xf8f0061f irq 36 on = simplebus1 Event timer "MPCore" frequency 50000000 Hz quality 1000 cpulist0: <Open Firmware CPU Group> on ofwbus0 cpu0: <Open Firmware CPU> on cpulist0 cpu1: <Open Firmware CPU> on cpulist0 uart0: <Cadence UART> mem 0xe0000000-0xe0000fff irq 9 on simplebus1 uart0: console (-1,n,8,1) zy7_qspi0: <Zynq Quad-SPI Flash Controller> mem 0xe000d000-0xe000dfff = irq 13 on simplebus1 zy7_qspi0: must have ref-clock property device_attach: zy7_qspi0 attach returned 6 cgem0: <Cadence CGEM Gigabit Ethernet Interface> mem = 0xe000b000-0xe000bfff irq 15 on simplebus1 miibus0: <MII bus> on cgem0 rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on = miibus0 rgephy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, = 1000baseT-FDX, 1000baseT-FDX-master, auto rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on = miibus0 rgephy1: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, = 1000baseT-FDX, 1000baseT-FDX-master, auto cgem0: Ethernet address: 72:11:aa:7b:e3:52 sdhci_fdt0: <Zynq-7000 generic fdt SDHCI controller> mem = 0xe0100000-0xe0100fff irq 17 on simplebus1 sdhci_fdt0: 1 slot(s) allocated mmc0: <MMC/SD bus> on sdhci_fdt0 zy7_devcfg0: <Zynq devcfg block> mem 0xf8007000-0xf80070ff irq 28 on = simplebus1 Timecounters tick every 1.000 msec mmcsd0: 32GB <SDHC SH32G 8.0 SN 88212660 MFG 06/2020 by 3 WD> at mmc0 = 50.0MHz/4bit/65535-block Trying to mount root from ufs:/dev/mmcsd0s2a [rw]... Release APs Warning: no time-of-day clock registered, system time will not be set = accurately lo0: link state changed to UP cgem0: link state changed to DOWN cgem0: cgem_mediachange: could not set ref clk0 to 125000000. cgem0: link state changed to UP
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5C07A058-977C-4F2E-8B41-01EBAB4FF24B>