Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 13 Nov 2022 14:48:02 -0800
From:      Christopher Bowman <crb@chrisbowman.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        hackers@freebsd.org
Subject:   Re: I could use some help
Message-ID:  <5C07A058-977C-4F2E-8B41-01EBAB4FF24B@chrisbowman.com>
In-Reply-To: <Y3FDfpKWKjyHaWAK@nuc>
References:  <ED4D59FC-DE1B-4591-AE51-5AA61B5647A6@chrisbowman.com> <Y3FDfpKWKjyHaWAK@nuc>

next in thread | previous in thread | raw e-mail | index | archive | help


> On Nov 13, 2022, at 11:20 AM, Mark Johnston <markj@freebsd.org> wrote:
>=20
> On Wed, Nov 09, 2022 at 10:18:02PM -0800, Christopher Bowman wrote:
>> OK I=E2=80=99m really confused and I could use some help:
>>=20
>> 13.0 runs fine on my Xilinx Zynq based board (DIgilent Arty Z20).  =
However when I compile 13.1 release it doesn=E2=80=99t boot.  The kernel =
stops during boot as follows:
>>=20
>> Using DTB from loaded file '/boot/dtb/zynq-artyz7.dtb'.
>>=20
>>=20
>> Loading DTB overlays: 'artyz7_ssd_overlay.dtb'
>>=20
>>=20
>> /boot/dtb/overlays/artyz7_ssd_overlay.dtb size=3D0x1a1
>>=20
>>=20
>> Kernel entry at 0x17a00200...
>>=20
>>=20
>> Kernel args: (null)
>>=20
>>=20
>> applying DTB overlay '/boot/dtb/overlays/artyz7_ssd_overlay.dtb'
>>=20
>>=20
>> WARNING: Cannot find freebsd,dts-version property, cannot check DTB =
compliance
>> ---<<BOOT>>---
>> Copyright (c) 1992-2021 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, =
1994
>> .The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 13.0-STABLE #22 n248064-ce9c3848ff3: Wed Nov  9 22:04:45 PST =
2022
>>    crb@eclipse.ChrisBowman.com:/usr/obj/usr/src/arm.armv7/sys/ARTYZ7 =
arm
>> FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git =
llvmorg-11.0.1-0-g43ff75f2c3fe)
>> CPU: ARM Cortex-A9 r3p0 (ECO: 0x00000000)
>> CPU Features:=20
>>  Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk
>> Optional instructions:=20
>>  UMULL, SMULL, SIMD(ext)
>> LoUU:2 LoC:2 LoUIS:2=20
>> Cache level 1:
>> 32KB/32B 4-way data cache WB Read-Alloc Write-Alloc
>> 32KB/32B 4-way instruction cache Read-Alloc
>> real memory  =3D 536866816 (511 MB)
>> avail memory =3D 515162112 (491 MB)
>> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>> random: unblocking device.
>> random: entropy device external interface
>> ofwbus0: <Open Firmware Device Tree>
>> simplebus0: <Flattened device tree simple bus> on ofwbus0
>> simplebus1: <Flattened device tree simple bus> on ofwbus0
>> l2cache0: <PL310 L2 cache controller> mem 0xf8f02000-0xf8f02fff irq 8 =
on simplebus1
>> l2cache0: cannot allocate IRQ, not using interrupt
>> l2cache0: Part number: 0x3, release: 0x8
>> l2cache0: L2 Cache enabled: 512KB/32B 8 ways
>> gic0: <ARM Generic Interrupt Controller> mem =
0xf8f01000-0xf8f01fff,0xf8f00100-0xf8f001ff on simplebus1
>> gic0: pn 0x39, arch 0x1, rev 0x2, implementer 0x43b irqs 96
>> mp_tmr0: <ARM MPCore Timers> mem 0xf8f00200-0xf8f0021f irq 29 on =
simplebus1
>> Timecounter "MPCore" frequency 50000000 Hz quality 800
>> mp_tmr1: <ARM MPCore Timers> mem 0xf8f00600-0xf8f0061f irq 36 on =
simplebus1
>> Event timer "MPCore" frequency 50000000 Hz quality 1000
>> cpulist0: <Open Firmware CPU Group> on ofwbus0
>> cpu0: <Open Firmware CPU> on cpulist0
>> cpu1: <Open Firmware CPU> on cpulist0
>> uart0: <Cadence UART> mem 0xe0000000-0xe0000fff irq 9 on simplebus1
>> uart0: console (-1,n,8,1)
>> zy7_qspi0: <Zynq Quad-SPI Flash Controller> mem 0xe000d000-0xe000dfff =
irq 13 on simplebus1
>> zy7_qspi0: must have ref-clock property
>> device_attach: zy7_qspi0 attach returned 6
>> cgem0: <Cadence CGEM Gigabit Ethernet Interface> mem =
0xe000b000-0xe000bfff irq 15 on simplebus1
>> miibus0: <MII bus> on cgem0
>> rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on =
miibus0
>> rgephy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, =
1000baseT-FDX, 1000baseT-FDX-master, auto
>> rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on =
miibus0
>> rgephy1:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, =
1000baseT-FDX, 1000baseT-FDX-master, auto
>> cgem0: Ethernet address: 56:99:3e:50:9a:8e
>> sdhci_fdt0: <Zynq-7000 generic fdt SDHCI controller> mem =
0xe0100000-0xe0100fff irq 17 on simplebus1
>> sdhci_fdt0: 1 slot(s) allocated
>> mmc0: <MMC/SD bus> on sdhci_fdt0
>> zy7_devcfg0: <Zynq devcfg block> mem 0xf8007000-0xf80070ff irq 28 on =
simplebus1
>> Timecounters tick every 1.000 msec
>> sdhci_fdt0-slot0: Controller timeout
>> sdhci_fdt0-slot0: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D REGISTER =
DUMP =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> sdhci_fdt0-slot0: Sys addr: 0x00060000 | Version:  0x00008901
>> sdhci_fdt0-slot0: Blk size: 0x00005008 | Blk cnt:  0x00000001
>> sdhci_fdt0-slot0: Argument: 0x00000000 | Trn mode: 0x00000013
>> sdhci_fdt0-slot0: Present:  0x01ff0202 | Host ctl: 0x00000001
>> sdhci_fdt0-slot0: Power:    0x0000000f | Blk gap:  0x00000000
>> sdhci_fdt0-slot0: Wake-up:  0x00000000 | Clock:    0x00004007
>> sdhci_fdt0-slot0: Timeout:  0x00000006 | Int stat: 0x00000001
>> sdhci_fdt0-slot0: Int enab: 0x01ff00fb | Sig enab: 0x01ff00fa
>> sdhci_fdt0-slot0: AC12 err: 0x00000000 | Host ctl2:0x00000000
>> sdhci_fdt0-slot0: Caps:     0x69ec0080 | Caps2:    0x00000000
>> sdhci_fdt0-slot0: Max curr: 0x00000001 | ADMA err: 0x00000000
>> sdhci_fdt0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
>> sdhci_fdt0-slot0: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=

>> sdhci_fdt0-slot0: Controller timeout
>> sdhci_fdt0-slot0: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D REGISTER =
DUMP =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> sdhci_fdt0-slot0: Sys addr: 0x00000000 | Version:  0x00008901
>> sdhci_fdt0-slot0: Blk size: 0x00005008 | Blk cnt:  0x00000001
>> sdhci_fdt0-slot0: Argument: 0xaaaa0000 | Trn mode: 0x00000013
>> sdhci_fdt0-slot0: Present:  0x01ff0000 | Host ctl: 0x00000001
>> sdhci_fdt0-slot0: Power:    0x0000000f | Blk gap:  0x00000000
>> sdhci_fdt0-slot0: Wake-up:  0x00000000 | Clock:    0x00004007
>> sdhci_fdt0-slot0: Timeout:  0x00000006 | Int stat: 0x00000001
>> sdhci_fdt0-slot0: Int enab: 0x01ff00fb | Sig enab: 0x01ff00fa
>> sdhci_fdt0-slot0: AC12 err: 0x00000000 | Host ctl2:0x00000000
>> sdhci_fdt0-slot0: Caps:     0x69ec0080 | Caps2:    0x00000000
>> sdhci_fdt0-slot0: Max curr: 0x00000001 | ADMA err: 0x00000000
>> sdhci_fdt0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
>> sdhci_fdt0-slot0: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=

>>=20
>> The controller register dumps continue for a while and then I get:
>>=20
>> mmc0: CMD7 failed, RESULT: 1
>> Release APs
>> Trying to mount root from ufs:/dev/mmcsd0s2a [rw]...
>> mountroot: waiting for device /dev/mmcsd0s2a...
>> Mounting from ufs:/dev/mmcsd0s2a failed with error 19.
>> Trying to mount root from ufs:mmcsd0s2a []...
>> mountroot: waiting for device mmcsd0s2a...
>> Mounting from ufs:mmcsd0s2a failed with error 19.
>>=20
>> Loader variables:
>>  vfs.root.mountfrom=3Dufs:/dev/mmcsd0s2a
>>  vfs.root.mountfrom.options=3Drw
>>=20
>> Manual root filesystem specification:
>>  <fstype>:<device> [options]
>>      Mount <device> using filesystem <fstype>
>>      and with the specified (optional) option list.
>>=20
>>    eg. ufs:/dev/da0s1a
>>        zfs:zroot/ROOT/default
>>        cd9660:/dev/cd0 ro
>>          (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)
>>=20
>>  ?               List valid disk boot devices
>>  .               Yield 1 second (for background tasks)
>>  <empty line>    Abort manual input
>>=20
>> mountroot>=20
>>=20
>> If I replace the kernel with a 13.0 kernel, it boots just fine.
>>=20
>> So I go and do a git bisect between release/13.0.0 and release/13.1.0
>>=20
>> and I get the following:
>>=20
>> git bisect start '--first-parent'
>> # status: waiting for both good and bad commits
>> # good: [ea31abc261ffc01b6ff5671bffb15cf910a07f4b] 13.0: update to =
RELEASE
>> git bisect good ea31abc261ffc01b6ff5671bffb15cf910a07f4b
>> # status: waiting for bad commit, 1 good commit known
>> # bad: [fc952ac2212b121aa6eefc273f5960ec3e0a466d] Update in =
preparation of 13.1-RELEASE
>> git bisect bad fc952ac2212b121aa6eefc273f5960ec3e0a466d
>> # skip: [4c44dbde5491516eba8725dc51d39c1dcc817472] MFC jail: Handle a =
parent jail when a child is added to it
>> git bisect skip 4c44dbde5491516eba8725dc51d39c1dcc817472
>> # good: [476f87219f408343846254743c7189076be80c04] wpi: Fix a lock =
leak in an error path in wpi_run()
>> git bisect good 476f87219f408343846254743c7189076be80c04
>> # bad: [05bf7d68c56830e52dee14dc87c07d6716e8195e] aesni: Fix an =
out-of-bounds read in AES_GCM_decrypt()
>> git bisect bad 05bf7d68c56830e52dee14dc87c07d6716e8195e
>> # good: [014ae00ef6edca2687d618e0bda138086a1e1230] date: Capitalize =
seconds string in synopses
>> git bisect good 014ae00ef6edca2687d618e0bda138086a1e1230
>> # bad: [08d995ca8f6f1008a10e4bf4d924824c040f842a] swapoff_one(): only =
check free pages count manually turning swap off
>> git bisect bad 08d995ca8f6f1008a10e4bf4d924824c040f842a
>> # bad: [81b6dba1a08b031bdf7463c1704d27ae1e0daa0f] ktls: Fix assertion =
for TLS 1.0 CBC when using non-zero starting seqno.
>> git bisect bad 81b6dba1a08b031bdf7463c1704d27ae1e0daa0f
>> # bad: [67efa8b29930f12dae2bf237fa7c2ce1dafbd6b1] net80211: add a =
driver-private pointer to struct ieee80211_node
>> git bisect bad 67efa8b29930f12dae2bf237fa7c2ce1dafbd6b1
>> # good: [109330155000bfec215ee39148254d2a0b628798] module(9): =
Document that evhand can be NULL
>> git bisect good 109330155000bfec215ee39148254d2a0b628798
>> # bad: [4c8e29637456bbbe709425f691f637914658009f] LinuxKPI: add =
module_pci_driver() and pci_alloc_irq_vectors()
>> git bisect bad 4c8e29637456bbbe709425f691f637914658009f
>> # bad: [4a03ae8d17ddf3d3b57ca281000fd98e200b92cc] nfscl: Fix use =
after free for forced dismount
>> git bisect bad 4a03ae8d17ddf3d3b57ca281000fd98e200b92cc
>> # bad: [de957de097857fabb69a59a9ba36276c5e735de5] bhyve: Fix the =
WITH_BHYVE_SNAPSHOT build
>> git bisect bad de957de097857fabb69a59a9ba36276c5e735de5
>> # bad: [5c2e6d9610f1b3f1d7c5d69b925212a7b1fd9391] hwpmc: initialize =
arm64 counter/interrupt state
>> git bisect bad 5c2e6d9610f1b3f1d7c5d69b925212a7b1fd9391
>> # bad: [ce9c3848ff369467749f59fd24f8b9f1241e725c] uma: Fix handling =
of reserves in zone_import()
>> git bisect bad ce9c3848ff369467749f59fd24f8b9f1241e725c
>> # good: [d5ebaa6f8f850bb6f6273f01386832efcb295827] uma: Improve =
M_USE_RESERVE handling in keg_fetch_slab()
>> git bisect good d5ebaa6f8f850bb6f6273f01386832efcb295827
>> # first bad commit: [ce9c3848ff369467749f59fd24f8b9f1241e725c] uma: =
Fix handling of reserves in zone_import()
>>=20
>> If I do git log ce9c3848ff369467749f59fd24f8b9f1241e725c it does seem =
that d5ebaa6f8f850bb6f6273f01386832efcb295827 is the previous commit and =
that it works just fine but ce9c3848ff369467749f59fd24f8b9f1241e725c =
doesn=E2=80=99t boot.  It=E2=80=99s literally the same file system and =
DTB, the only difference is the kernel installed.
>>=20
>> What=E2=80=99s confusing to me is that looking at =
ce9c3848ff369467749f59fd24f8b9f1241e725c I can=E2=80=99t see how that =
commit would result in my kernel hanging while booting.
>=20
> It's indeed a bit hard to see why commit
> ce9c3848ff369467749f59fd24f8b9f1241e725c might cause an sdhci command
> timeout.  I suspect the commit itself is innocent, but happens to
> uncover a problem elsewhere by changing the order in which page
> allocations are distributed to different consumers.  At the time of =
the
> command timeouts, the kernel is still running on a single CPU so the
> order in which pages are allocated should be fairly deterministic and
> thus consistent across multiple boots.
>=20
> One thing you can try is to take a releng/13.1 tree and revert just
> ce9c3848ff369467749f59fd24f8b9f1241e725c, and/or take releng/13.0 and
> apply just that commit.  It would be useful to know whether either of
> the resulting kernels can boot successfully.
>=20
> Could you also please post your kernel configuration (ARTYZ7) =
somewhere,
> together with a dmesg of a successful boot?

Mark,
	If I checkout release/13.1.0 and revert the commit in question =
then my kernel boots.  I wasn=E2=80=99t able to checkout releng/13.1 as =
I get the following:

crb@eclipse:142> git checkout releng/13.1
M	sys/vm/uma_core.c
Already on 'releng/13.1'
Your branch is up to date with 'origin/releng/13.1=E2=80=99.

I think that=E2=80=99s just me being incompetent with git.

I can add the commit to release/13.1.0 if you would still like

My kernel is essentially identical to the in-tree ZEDBOARD config
with the name changed.  I can post it if truly desired.

Please find the dmesg output for 13.1 release below.
Thanks
Christopher


crb@artyz7:51> dmesg=20

Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights =
reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.1-RELEASE #1 n250148-fc952ac2212-dirty: Sun Nov 13 14:29:58 =
PST 2022
    crb@eclipse.ChrisBowman.com:/usr/obj/usr/src/arm.armv7/sys/ARTYZ7 =
arm
FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git =
llvmorg-13.0.0-0-gd7b669b3a303)
CPU: ARM Cortex-A9 r3p0 (ECO: 0x00000000)
CPU Features:=20
  Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk
Optional instructions:=20
  UMULL, SMULL, SIMD(ext)
LoUU:2 LoC:2 LoUIS:2=20
Cache level 1:
 32KB/32B 4-way data cache WB Read-Alloc Write-Alloc
 32KB/32B 4-way instruction cache Read-Alloc
real memory  =3D 536866816 (511 MB)
avail memory =3D 515440640 (491 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
random: unblocking device.
random: entropy device external interface
ofwbus0: <Open Firmware Device Tree>
simplebus0: <Flattened device tree simple bus> on ofwbus0
simplebus1: <Flattened device tree simple bus> on ofwbus0
l2cache0: <PL310 L2 cache controller> mem 0xf8f02000-0xf8f02fff irq 8 on =
simplebus1
l2cache0: cannot allocate IRQ, not using interrupt
l2cache0: Part number: 0x3, release: 0x8
l2cache0: L2 Cache enabled: 512KB/32B 8 ways
gic0: <ARM Generic Interrupt Controller> mem =
0xf8f01000-0xf8f01fff,0xf8f00100-0xf8f001ff on simplebus1
gic0: pn 0x39, arch 0x1, rev 0x2, implementer 0x43b irqs 96
mp_tmr0: <ARM MPCore Timers> mem 0xf8f00200-0xf8f0021f irq 29 on =
simplebus1
Timecounter "MPCore" frequency 50000000 Hz quality 800
mp_tmr1: <ARM MPCore Timers> mem 0xf8f00600-0xf8f0061f irq 36 on =
simplebus1
Event timer "MPCore" frequency 50000000 Hz quality 1000
cpulist0: <Open Firmware CPU Group> on ofwbus0
cpu0: <Open Firmware CPU> on cpulist0
cpu1: <Open Firmware CPU> on cpulist0
uart0: <Cadence UART> mem 0xe0000000-0xe0000fff irq 9 on simplebus1
uart0: console (-1,n,8,1)
zy7_qspi0: <Zynq Quad-SPI Flash Controller> mem 0xe000d000-0xe000dfff =
irq 13 on simplebus1
zy7_qspi0: must have ref-clock property
device_attach: zy7_qspi0 attach returned 6
cgem0: <Cadence CGEM Gigabit Ethernet Interface> mem =
0xe000b000-0xe000bfff irq 15 on simplebus1
miibus0: <MII bus> on cgem0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on =
miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, =
1000baseT-FDX, 1000baseT-FDX-master, auto
rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on =
miibus0
rgephy1:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, =
1000baseT-FDX, 1000baseT-FDX-master, auto
cgem0: Ethernet address: 72:11:aa:7b:e3:52
sdhci_fdt0: <Zynq-7000 generic fdt SDHCI controller> mem =
0xe0100000-0xe0100fff irq 17 on simplebus1
sdhci_fdt0: 1 slot(s) allocated
mmc0: <MMC/SD bus> on sdhci_fdt0
zy7_devcfg0: <Zynq devcfg block> mem 0xf8007000-0xf80070ff irq 28 on =
simplebus1
Timecounters tick every 1.000 msec
mmcsd0: 32GB <SDHC SH32G 8.0 SN 88212660 MFG 06/2020 by 3 WD> at mmc0 =
50.0MHz/4bit/65535-block
Trying to mount root from ufs:/dev/mmcsd0s2a [rw]...
Release APs
Warning: no time-of-day clock registered, system time will not be set =
accurately
lo0: link state changed to UP
cgem0: link state changed to DOWN
cgem0: cgem_mediachange: could not set ref clk0 to 125000000.
cgem0: link state changed to UP





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5C07A058-977C-4F2E-8B41-01EBAB4FF24B>