Date:      Sat, 12 Feb 2022 16:50:21 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Pi3 answers ssh only if outbound ping is running on -current
Message-ID:  <F5FC2E50-C1E8-4E84-B4D4-DC7BFC3C9C14@yahoo.com>
In-Reply-To: <34F5C092-7C35-4CBB-9CC3-99E373D1F785@yahoo.com>
References:  <20220212185618.GA37391@www.zefox.net> <34F5C092-7C35-4CBB-9CC3-99E373D1F785@yahoo.com>

On 2022-Feb-12, at 13:32, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Feb-12, at 10:56, bob prohaska <fbsd@www.zefox.net> wrote:
>
>> For a few weeks now a Pi3 running -current will not respond to
>> an incoming ssh connection unless an outbound ping process is running.
>>
>> Once the outbound ping is started via the serial console, incoming
>> ssh connections are answered normally. Uname -a reports
>> FreeBSD www.zefox.org 14.0-CURRENT FreeBSD 14.0-CURRENT #10 main-n253073-6db44b0158c: Sat Feb 12 04:30:21 PST 2022     bob@www.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC  arm64
>>
>> A Pi4 running -current of a few days ago exhibits no such problems.
>>
>> Another Pi3 running stable/13 has been behaving in the same way.
>>
>> Both Pi3s successfully set time via ntp on reboot and will
>> very briefly (one or two minutes) prompt for an ssh password,
>> but no further progress is made and the login attempt times out.
>> If the ssh login is attempted a second time, not even a password
>> prompt comes back.
>>
>> Ping times (to an adjacent machine on the same subnet) are
>> 64 bytes from 50.1.20.26: icmp_seq=2 ttl=64 time=0.978 ms
>> 64 bytes from 50.1.20.26: icmp_seq=3 ttl=64 time=0.967 ms
>> 64 bytes from 50.1.20.26: icmp_seq=4 ttl=64 time=1.088 ms
>> 64 bytes from 50.1.20.26: icmp_seq=5 ttl=64 time=0.983 ms
>> 64 bytes from 50.1.20.26: icmp_seq=6 ttl=64 time=1.007 ms
>> 64 bytes from 50.1.20.26: icmp_seq=7 ttl=64 time=1.075 ms
>> 64 bytes from 50.1.20.26: icmp_seq=8 ttl=64 time=1.020 ms
>> 64 bytes from 50.1.20.26: icmp_seq=9 ttl=64 time=1.044 ms
>> 64 bytes from 50.1.20.26: icmp_seq=10 ttl=64 time=1.026 ms
>> 64 bytes from 50.1.20.26: icmp_seq=11 ttl=64 time=0.908 ms
>>
>> That might be considered slow, but the correspondent machine
>> is only a Pi2 running
>> FreeBSD www.zefox.com 14.0-CURRENT FreeBSD 14.0-CURRENT #3 main-71d2d5adfe: Tue Dec 21 00:23:51 PST 2021     bob@www.zefox.com:/usr/obj/usr/freebsd-src/arm.armv7/sys/GENERIC  arm
>>
>> If the outbound ping is started, an incoming ssh connection
>> established, and the outbound ping subsequently stopped, the running
>> ssh connection silently freezes; no disconnect, but no response, not
>> even echo. Some tens of seconds later, all inputs were responded to.
>> Tried a second time, the stoppage recurred; restarting the outbound
>> ping eventually restored responsiveness.
>>
>> With the outbound ping stopped, an inbound ssh attempt silently failed:
>>
>> bob@raspberrypi:~ $ ssh -vvv 50.1.20.28
>> OpenSSH_7.9p1 Raspbian-10+deb10u2+rpt1, OpenSSL 1.1.1d  10 Sep 2019
>> debug1: Reading configuration data /etc/ssh/ssh_config
>> debug1: /etc/ssh/ssh_config line 19: Applying options for *
>> debug2: resolve_canonicalize: hostname 50.1.20.28 is address
>> debug2: ssh_connect_direct
>> debug1: Connecting to 50.1.20.28 [50.1.20.28] port 22.
>> [enter key echoed]
>> debug1: connect to address 50.1.20.28 port 22: Connection timed out
>> ssh: connect to host 50.1.20.28 port 22: Connection timed out
>> bob@raspberrypi:~ $
>>
>> Thanks for reading and any insights. If I've omitted useful
>> details or tests please indicate.
>>
>
> You have made multiple reports to the arm list for this issue
> without anyone having managed to help. This report does have
> more comparative context, which might help someone help.
>
> It may be time to try other lists like freebsd-net and,
> possibly, freebsd-hackers or freebsd-stable or
> freebsd-current.
>
> However, the best thing no matter where you go would be
> to (approximately) bisect toward the back-to-back FreeBSD
> version-pair on, say, stable/13 at which the problem
> goes from not-there to happening. ( stable/13 changes
> more slowly and so has fewer versions to deal with. Also its
> KBI may grow but is constrained to otherwise be more
> stable [ relative to releng/13.0 ]. So you are less
> likely to run into version compatibility problems
> for the below suggestion.)
>
> I'd recommend using kernel and world materials from:
>
> https://artifact.ci.freebsd.org/snapshot/stable-13/?C=M&O=D
>
> on a separate microsd card updated from a normal context,
> avoiding builds. Remember that older stable/13 worlds can
> run on newer kernels generally. So you might only need to
> update the kernel after getting an initial, somewhat older
> context in place. (It is not obvious if it is a kernel-only
> problem or not.) If it is a kernel problem, you might be
> able to put down a releng/13.0 world and never change it
> during the approximate bisect activity.
>
> For what https://artifact.ci.freebsd.org/snapshot/ has
> available, this avoids having to build the versions.
> It also allows checking if your builds are behaving
> differently than the official snapshots do.
>
> https://artifact.ci.freebsd.org/snapshot/ may not be able
> to get you to the back-to-back FreeBSD version-pair: the
> range might be wider. Sometimes the wider range is enough
> by inspection of the types of commits in the range. So
> I'd report whatever range you find without having done
> any builds.
>
> I'll note that I have no problem with connecting via ssh
> to a RPi3B running my build of (line split for readability):
>
> # uname -apKU
> FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #28
> main-n252475-e76c0108990b-dirty: Sat Jan 15 23:39:27 PST 2022
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA53
> arm64 aarch64 1400047 1400047
>
> I have no stable/13 context set up for a RPi3B, only
> stable/13's that have an untuned ZFS context. Still,
> I wonder if that might operate well enough to test
> the issue, despite the 1 GiByte of RAM limitation. I
> may test that later today.

Other than needing to put in place my u-boot.bin build
that has usb_pgood_delay=2000 built-in, I had no trouble
with booting and ssh'ing in to (line split for
readability):

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #25
stable/13-n249004-a5f698599560-dirty: Sun Jan 16 15:07:11 PST 2022
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1300524 1300524

# ~/fbsd-based-on-what-commit.sh -C /usr/13S-src/
branch: stable/13
merge-base: a5f69859956049b5153b0e1b67f8f4a99622dc6f
merge-base: CommitDate: 2022-01-15 12:55:32 +0000
a5f698599560 (HEAD -> stable/13, freebsd/stable/13) Ignore debugger-injected signals left after detaching
n249004 (--first-parent --count for merge-base)

SIDE NOTE
After the above, my patched top reports:

Mem: 32504Ki Active, 214888Ki Inact, 393248Ki Wired, 40960B Buf, 321468Ki Free, 75516Ki MaxObsActive, 394108Ki MaxObsWired, 469624Ki MaxObs(Act+Wir+Lndry)
ARC: 316408Ki Total, 201575Ki MFU, 111090Ki MRU, 143360B Anon, 1024Ki Header, 2551Ki Other
     259140Ki Compressed, 346379Ki Uncompressed, 1.34:1 Ratio
Swap: 3584Mi Total, 3584Mi Free, 75516Ki MaxObs(Act+Lndry+SwapUsed), 469624Ki MaxObs(Act+Wir+Lndry+SwapUsed)

So it is not an environment I'd want to do buildworld buildkernel on.

But it looks to be usable for less memory intensive activities.
END SIDE NOTE

So I've looked and found (from today):

https://artifact.ci.freebsd.org/snapshot/stable-13/371633ece3ae88e3b3d7a028c372d4ac4f72b503/arm64/aarch64/kernel.txz

and downloaded it. Then I decided to try it with my
normal boot media, leaving world as it is. So:

# ls -Tld /boot/ker*
drwxr-xr-x  2 root  wheel  680 Jan 16 16:49:24 2022 /boot/kernel
drwxr-xr-x  2 root  wheel  680 Jan  4 23:08:57 2022 /boot/kernel.old

# mv /boot/kernel /boot/kernorm

# tar -xpf kernel.txz -C /

# ls -Tld /boot/ker*
drwxr-xr-x  2 root  wheel  679 Feb 12 11:14:27 2022 /boot/kernel
drwxr-xr-x  2 root  wheel  680 Jan  4 23:08:57 2022 /boot/kernel.old
drwxr-xr-x  2 root  wheel  680 Jan 16 16:49:24 2022 /boot/kernorm

(I chose not to replace the system's debug information, which
is not stored under /boot/ but in with the world files. So I did
not download or install kernel-dbg.txz.)
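
As a sketch of the rename-then-extract steps above, assuming a
POSIX sh: the function name swap_kernel and its three arguments
are made up for illustration and are not part of FreeBSD. It only
wraps the mv and tar commands already shown, with a check so that
an earlier saved kernel is not overwritten.

```shell
#!/bin/sh
# Sketch only: keep the current kernel under another name, then
# unpack a snapshot kernel.txz so a fresh boot/kernel/ is created.
# swap_kernel and its arguments are hypothetical names.
swap_kernel() {
    root=$1      # filesystem root to operate on, e.g. / or /mnt
    txz=$2       # path to the downloaded kernel.txz
    keep=$3      # name for the saved kernel directory, e.g. kernorm

    # Refuse to overwrite an earlier saved kernel.
    if [ -e "$root/boot/$keep" ]; then
        echo "refusing: $root/boot/$keep already exists" >&2
        return 1
    fi
    mv "$root/boot/kernel" "$root/boot/$keep" || return 1
    # The archive is extracted relative to the root, as with
    # "tar -xpf kernel.txz -C /" above.
    tar -xpf "$txz" -C "$root"
}
```

For the case above that would be: swap_kernel / kernel.txz kernorm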

So now a reboot with loader defaults (for that boot environment
in my context) will use the kernel that I got from:

https://artifact.ci.freebsd.org/snapshot/stable-13/. . .

[Hmm. Looks like the u-boot.bin is not sufficient to be sure
that shutdown -r now will boot the RPi3B. From power-on it seems
to boot so far. I might need another built-in setting added
(or more) in order to allow the RPi3B to shutdown -r now well
for the USB3 NVMe based SSD media that I'm using.]

Still no trouble connecting and logging-in via ssh. For
reference (line split for readability):

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #0
371633e: Sat Feb 12 19:06:49 UTC 2022
root@FreeBSD-stable-13-aarch64-build.jail.ci.FreeBSD.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC
arm64 aarch64 1300525 1300524

(I do not have matching source at this point.)


Recommended experiment . . .

Since I have a context working based on the kernel in:

https://artifact.ci.freebsd.org/snapshot/stable-13/371633ece3ae88e3b3d7a028c372d4ac4f72b503/arm64/aarch64/kernel.txz

I recommend that you try that exact same kernel in your
stable/13 context. I recommend renaming the existing
/boot/kernel before expanding the kernel.txz into / and
so causing a new /boot/kernel/ to be filled in.
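
The snapshot URL has a visible layout: the branch name with "/"
turned into "-", then the full commit hash, then arm64/aarch64 and
the file name. A small sketch for building such URLs when trying
other snapshots follows. The function name snapshot_url is
hypothetical, and the layout is assumed only from the URL in this
message, not from any documented interface.

```shell
#!/bin/sh
# Sketch only: build an artifact.ci.freebsd.org snapshot URL from
# its parts. snapshot_url is a made-up helper name; the path layout
# is inferred from the kernel.txz URL quoted in this message.
snapshot_url() {
    # $1 branch (stable/13), $2 full commit hash,
    # $3 target (arm64), $4 target arch (aarch64), $5 file name
    branch_dir=$(printf '%s' "$1" | tr '/' '-')
    printf 'https://artifact.ci.freebsd.org/snapshot/%s/%s/%s/%s/%s\n' \
        "$branch_dir" "$2" "$3" "$4" "$5"
}

# Example use on FreeBSD (not run here):
#   fetch "$(snapshot_url stable/13 \
#       371633ece3ae88e3b3d7a028c372d4ac4f72b503 arm64 aarch64 kernel.txz)"
```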

If that makes things work after rebooting, then your
kernel can be blamed. (More investigation would then be needed
to learn what is going on in your kernel build.)

But if the above does not make things work, that points
to investigating alternate worlds from:

https://artifact.ci.freebsd.org/snapshot/stable-13/. . .

That is a messier context. I only do that with media that
I can delete everything on, such as an independent microsd
card: chflags -R noschg /mnt/ ; rm -fr /mnt/ ; various
tar -xpf ???.txz -C /mnt/ commands --while not booted from
the microsd card. Repeat for each snapshot tried.
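
A sketch of that wipe-and-repopulate cycle, assuming a POSIX sh and
that the spare media is already mounted. The name repopulate_media
is hypothetical. Unlike the rm -fr above, the sketch empties the
mount point with find rather than removing the directory itself,
and it refuses to act on / as a guard against typos. The chflags
step is skipped on systems that do not have chflags.

```shell
#!/bin/sh
# Sketch only: empty a mounted spare media and extract snapshot
# archives onto it. repopulate_media is a made-up helper name.
repopulate_media() {
    mnt=$1; shift    # mount point of the spare media, e.g. /mnt
    case "$mnt" in
        ""|/) echo "refusing to wipe '$mnt'" >&2; return 1 ;;
    esac
    # Clear the system-immutable flag so the old files can be removed.
    if command -v chflags >/dev/null 2>&1; then
        chflags -R noschg "$mnt"
    fi
    # Delete everything below the mount point, dot-files included,
    # while leaving the mount point directory in place.
    find "$mnt" -mindepth 1 -delete || return 1
    # Extract each archive given on the command line.
    for archive in "$@"; do
        tar -xpf "$archive" -C "$mnt" || return 1
    done
}
```

An example call, run while not booted from that media, might be:
repopulate_media /mnt base.txz kernel.txz (with whichever .txz
files from the snapshot are wanted).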

There is a bias to the world not being newer than the
kernel. But since stable/13 's 371633ece3ae seems to
work in my context, you might be able to hold the kernel
invariant and just try different world versions in this
messier context.
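
One way to check the "world not newer than kernel" bias on a booted
system is to compare the two numbers that uname -K (kernel) and
uname -U (userland) report, such as the 1300525 1300524 pair shown
earlier. A minimal sketch, with the function name world_not_newer
being hypothetical:

```shell
#!/bin/sh
# Sketch only: succeed when the userland version number is not
# newer than the kernel version number. world_not_newer is a
# made-up helper name.
world_not_newer() {
    kernel_ver=$1    # e.g. output of: uname -K
    world_ver=$2     # e.g. output of: uname -U
    [ "$world_ver" -le "$kernel_ver" ]
}

# Example use on FreeBSD (not run here):
#   world_not_newer "$(uname -K)" "$(uname -U)" || echo "world is newer"
```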

Also: You might be able to find:

https://artifact.ci.freebsd.org/snapshot/stable-13/. . .

materials for the specific builds that you have been
working with and do comparison/contrast with the
behavior of your builds that had issues.


Note: The above does not consider other networking
configuration issues --that might not even be on RPi*
devices. I'm not networking literate overall.

===
Mark Millard
marklmi at yahoo.com



