Date: Wed, 5 Jul 2023 23:42:08 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Current FreeBSD <freebsd-current@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: For snapshot builds: armv7 chroot on aarch64 has kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin hung up [in getpid?], unkillable, prevents reboot
Message-ID: <C3952A9A-E21B-41C1-9BB1-68D189083F5D@yahoo.com>
In-Reply-To: <7A41DED4-876F-4270-A980-549A4832B39A@yahoo.com>
References: <7A41DED4-876F-4270-A980-549A4832B39A@yahoo.com>

On Jun 25, 2023, at 17:16, Mark Millard <marklmi@yahoo.com> wrote:

> Using the likes of:
>
> FreeBSD-14.0-CURRENT-arm64-aarch64-ROCK64-20230622-b95d2237af40-263748.img
> and:
> FreeBSD-14.0-CURRENT-arm-armv7-GENERICSD-20230622-b95d2237af40-263748.img
>
> I have shown the following behavior after setting up storage
> media based on them. (This was a test that my builds were not
> odd for the issue.)
>
> Boot the aarch64 media and log in. (Note: I logged in
> as root.)
>
> mount the armv7 media (-noatime is just my habit)
> and then put it to use:
>
> # mount -onoatime /dev/da1s2a /mnt
>
> # chroot /mnt/
>
> # kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin
> sys/kern/kern_copyin:kern_copyin ->
>
> On the serial console:
>
> # ps -xu
> USER   PID   %CPU %MEM   VSZ  RSS TT STAT STARTED      TIME COMMAND
> root    11 1498.4  0.0     0  256 -  RNL    23:24 542:52.92 [idle]
> root  1174  100.0  0.0     0   16 -  Rs     23:37   0:00.00 /usr/tests/sys/kern/kern_copyin -vunprivileged-user=tests -r/tmp/kyua.9YUttj/2/result.atf kern_copyin
> root     0    0.0  0.0     0 1616 -  DLs    23:24   0:00.50 [kernel]
> root     1    0.0  0.0 11704 1288 -  ILs    23:24   0:00.02 /sbin/init
> root     2    0.0  0.0     0  256 -  WL     23:24   0:00.26 [clock]
> root     3    0.0  0.0     0  272 -  DL     23:24   0:00.00 [crypto]
> root     4    0.0  0.0     0   80 -  DL     23:24   0:00.95 [cam]
> root     5    0.0  0.0     0   16 -  DL     23:24   0:00.00 [busdma]
> root     6    0.0  0.0     0   16 -  DL     23:24   0:00.03 [rand_harvestq]
> root     7    0.0  0.0     0   48 -  DL     23:24   0:00.06 [pagedaemon]
> root     8    0.0  0.0     0   16 -  DL     23:24   0:00.00 [vmdaemon]
> root     9    0.0  0.0     0  160 -  DL     23:24   0:00.38 [bufdaemon]
> root    10    0.0  0.0     0   16 -  DL     23:24   0:00.00 [audit]
> root    12    0.0  0.0     0  880 -  WL     23:24   0:11.81 [intr]
> root    13    0.0  0.0     0   48 -  DL     23:24   0:00.04 [geom]
> root    14    0.0  0.0     0   16 -  DL     23:24   0:00.00 [sequencer 00]
> root    15    0.0  0.0     0  160 -  DL     23:24   0:06.42 [usb]
> root    16    0.0  0.0     0   16 -  DL     23:24   0:00.10 [acpi_thermal]
> root    17    0.0  0.0     0   16 -  DL     23:24   0:00.00 [acpi_cooling0]
> root    18    0.0  0.0     0   16 -  DL     23:24   0:00.04 [syncer]
> root    19    0.0  0.0     0   16 -  DL     23:24   0:00.00 [vnlru]
> root   671    0.0  0.0 13260 2600 -  Is     23:25   0:00.00 dhclient: system.syslog (dhclient)
> root   674    0.0  0.0 13260 2752 -  Is     23:25   0:00.00 dhclient: dpni0 [priv] (dhclient)
> root   761    0.0  0.0 14572 3972 -  Ss     23:25   0:00.02 /sbin/devd
> root   964    0.0  0.0 12832 2764 -  Is     23:25   0:00.02 /usr/sbin/syslogd -s
> root  1033    0.0  0.0 13012 2604 -  Ss     23:25   0:00.01 /usr/sbin/cron -s
> root  1058    0.0  0.0 21052 8308 -  Is     23:25   0:00.01 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
> root  1078    0.0  0.0 21288 9304 -  Is     23:26   0:00.09 sshd: root@pts/0 (sshd)
> root  1175    0.0  0.0 21288 9496 -  Is     23:37   0:00.04 sshd: root@pts/1 (sshd)
> root  1074    0.0  0.0 13380 3008 u0 Is     23:25   0:00.01 login [pam] (login)
> root  1075    0.0  0.0 13460 3292 u0 S      23:25   0:00.02 -sh (sh)
> root  1233    0.0  0.0 13588 3016 u0 R+     00:00   0:00.00 ps -xu
> root  1081    0.0  0.0 13460 3328 0  Is     23:26   0:00.02 -sh (sh)
> root  1170    0.0  0.0  5788 2884 0  I      23:36   0:00.02 /bin/sh -i
> root  1172    0.0  0.0 10408 7192 0  I+     23:37   0:00.30 kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin
> root  1178    0.0  0.0 13460 3320 1  Is+    23:38   0:00.01 -sh (sh)
>
> 1174 is stuck, even if one waits for 30min+.
> kill and kill -9 will not kill 1174.
>
> "shutdown -r now" hangs before the reboot happens
> and reports: "some processes would not die".
>
> An interesting property is that ps and top disagree
> about 1174 CPU usage: ps 100%, top 0%. But top also
> indicates 1174 always has CPU0 "STATE". (Across
> tests CPUn varies but within a test it has
> a fixed n.)
>
> I have also seen ps "STAT" being RXs.
>
> The following is from my earlier activity with my own
> builds involved, here 1119, not the 1174 from above.
> truss reports as the last thing for the stuck process
> as "getpid()".
>
> . . .
> 1119: 0.588983953 fstatat(AT_FDCWD,"/usr/tests/sys/kern/kern_copyin",{ mode=-r-xr-xr-x ,inode=111756,size=9776,blksize=10240 },AT_SYMLINK_NOFOLLOW) = 0 (0x0)
> 1119: 0.589065030 mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),-1,0x0) = 1074188288 (0x4006d000)
> 1119: 0.589227544 openat(AT_FDCWD,"/tmp/kyua.aBQv6E/2/result.atf",O_WRONLY|O_CREAT|O_TRUNC,0644) = 3 (0x3)
> 1119: 0.589276503 getpid() = 1119 (0x45f)
>
>
>
> For reference, from inside an armv7 chroot session
> before doing such a test:
>
> # uname -apKU
> FreeBSD generic 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263748-b95d2237af40: Thu Jun 22 11:10:50 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm armv7 1400090 1400090

I've replicated the same sort of hangup based on:

aarch64 (booted):

# uname -apKU
FreeBSD CA72-16Gp-ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #0 n263893-0631830a7a3c-dirty: Wed Jul 5 13:54:15 PDT 2023 root@CA72-16Gp-ZFS:/usr/obj/BUILDs/alt-main-CA72-nodbg-clang-alt/usr/alt-main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400092 1400092

armv7 (as seen in a chroot use):

# uname -apKU
FreeBSD CA72-16Gp-ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #0 n263893-0631830a7a3c-dirty: Wed Jul 5 13:54:15 PDT 2023 root@CA72-16Gp-ZFS:/usr/obj/BUILDs/alt-main-CA72-nodbg-clang-alt/usr/alt-main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm armv7 1400092 1400092

===
Mark Millard
marklmi at yahoo.com