Date:      Sun, 21 Oct 2018 19:55:45 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Konstantin Belousov <kib@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
Message-ID:  <EDBFFACB-8582-4B16-AC1A-63F8C86C9BA4@yahoo.com>
In-Reply-To: <ACBB38EF-9A6A-40E5-AB6C-EEB9E292A919@yahoo.com>
References:  <79973E2B-F5C4-4E7C-B92B-1C8D4441C7D1@yahoo.com> <ACBB38EF-9A6A-40E5-AB6C-EEB9E292A919@yahoo.com>

[I built based on WITHOUT_ZFS= for other reasons. But,
after installing the build, Hyper-V based boots are
working.]

On 2018-Oct-20, at 2:09 AM, Mark Millard <marklmi at yahoo.com> wrote:

> On 2018-Oct-20, at 1:39 AM, Mark Millard <marklmi at yahoo.com> wrote:
>
>> I attempted to jump from head -r334014 to -r339076
>> on a threadripper 1950X board and the boot fails.
>> This is both native booting and under Hyper-V,
>> same machine and root file system in both cases.
>
> I did my investigation under Hyper-V after seeing
> a boot failure native.
>
> Looks like the native failure is even earlier,
> before db> is even possible, possibly during
> early loader activity.
>
> So this report is really for running under
> Hyper-V: -r338804 boots and -r338810 does
> not. By contrast -r334804 does not boot native.
> (But I've little information for that context.)
>
> Sorry for the confusion. I rushed the report
> in hopes of getting to sleep. It was not to be.
>
>> It fails just after the FreeBSD/SMP lines,
>> reporting "kernel trap 9 with interrupts disabled".
>>
>> It fails in pmap_force_invalidate_cache_range at
>> a clflushl (%rax) instruction that produces a
>> "Fatal trap 9: general protection fault while
>> in kernel mode". cpuid = 0; apic id = 00
>>
>> I used kernel.txz files from:
>>
>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>>
>> to narrow the range of kernel builds for working -> failing
>> and got:
>>
>> -r338804 boots fine
>> (no amd64 kernel builds between to try)
>> -r338810+ fails (any that I tried, anyway)
>>
>> In that range is -r338807 :
>>
>> QUOTE
>> Author: kib
>> Date: Wed Sep 19 19:35:02 2018
>> New Revision: 338807
>> URL: https://svnweb.freebsd.org/changeset/base/338807
>>
>> Log:
>> Convert x86 cache invalidation functions to ifuncs.
>>
>> This simplifies the runtime logic and reduces the number of
>> runtime-constant branches.
>>
>> Reviewed by:	alc, markj
>> Sponsored by:	The FreeBSD Foundation
>> Approved by:	re (gjb)
>> Differential revision:	https://reviews.freebsd.org/D16736
>>
>> Modified:
>> head/sys/amd64/amd64/pmap.c
>> head/sys/amd64/include/pmap.h
>> head/sys/dev/drm2/drm_os_freebsd.c
>> head/sys/dev/drm2/i915/intel_ringbuffer.c
>> head/sys/i386/i386/pmap.c
>> head/sys/i386/i386/vm_machdep.c
>> head/sys/i386/include/pmap.h
>> head/sys/x86/iommu/intel_utils.c
>> END QUOTE
>>
>> There do seem to be changes associated with
>> clflush(...) use. Looking at:
>>
>> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>>
>> it appears that pmap_force_invalidate_cache_range has not
>> changed since -r338807.
>>
>> It seems that -r338806 and -r338810 would be unlikely
>> contributors.
>

I went after my native-boot loader problem first because I
could switch kernels via the loader for booting FreeBSD under
Hyper-V. Switching loaders is more of a problem.

In order to avoid the loader-time crash I switched to building
and installing based on WITHOUT_ZFS= . I've had no active use of
ZFS in years. (The old official-build loaders that worked were
non-ZFS ones.)
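
A minimal sketch of how a WITHOUT_ZFS= build could be set up, assuming
the stock /etc/src.conf mechanism and the usual make targets rather than
whatever build scripts were actually used for this report. The helper
name enable_without_zfs is hypothetical and only for illustration:

```shell
# Hypothetical helper: add the WITHOUT_ZFS knob to a src.conf file once.
# src.conf knobs only need to be defined, so the value is left empty.
enable_without_zfs() {
    conf="${1:-/etc/src.conf}"   # assumed default location of src.conf
    grep -q '^WITHOUT_ZFS=' "$conf" 2>/dev/null || echo 'WITHOUT_ZFS=' >> "$conf"
}

# Typical follow-up from /usr/src (assumed standard sequence, not run here):
#   make buildworld buildkernel
#   make installkernel
#   make installworld
```

Calling the helper twice leaves a single WITHOUT_ZFS= line in the file,
so it is safe to re-run before each rebuild.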

This took care of the native-boot loader crash and, to my
surprise, also the Hyper-V-boot kernel-time crash.

My private builds now boot the 1950X in both contexts just
fine.

During my early investigation I did pick up specific changes
from after -r339076 that seemed to be tied to Ryzen and such.
(They made no difference to the boot problems at the time
but I saw no reason to remove them.)

# uname -apKU
FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun Oct 21 16:44:25 PDT 2018     markmi@FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG  amd64 amd64 1200084 1200084


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



