Date: Mon, 22 Oct 2018 14:07:31 +0300
From: Toomas Soome <tsoome@me.com>
To: Mark Millard <marklmi@yahoo.com>
Cc: Konstantin Belousov <kib@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>, Warner Losh <imp@bsdimp.com>
Subject: Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
Message-ID: <9AEF5EB3-C393-44D1-9BD4-D0E59FE97CCE@me.com>
In-Reply-To: <085BCA2B-4451-406C-9CEE-57D8B8008201@yahoo.com>
References: <79973E2B-F5C4-4E7C-B92B-1C8D4441C7D1@yahoo.com>
 <ACBB38EF-9A6A-40E5-AB6C-EEB9E292A919@yahoo.com>
 <EDBFFACB-8582-4B16-AC1A-63F8C86C9BA4@yahoo.com>
 <CANCZdfo=uqLn16r0FShz=WEv3Z34LbmC1gqzKabwfr3gEUXsJg@mail.gmail.com>
 <CANCZdfoHg8=FfuJchyPJ9qBDZBkR_7nYTWPiQedZkW4Cs1pR5A@mail.gmail.com>
 <3CA4C94F-A062-44FE-B507-948A6F88C83D@me.com>
 <085BCA2B-4451-406C-9CEE-57D8B8008201@yahoo.com>

> On 22 Oct 2018, at 13:58, Mark Millard <marklmi@yahoo.com> wrote:
>
> On 2018-Oct-22, at 2:27 AM, Toomas Soome <tsoome at me.com> wrote:
>>
>>> On 22 Oct 2018, at 06:30, Warner Losh <imp@bsdimp.com> wrote:
>>>
>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh <imp@bsdimp.com> wrote:
>>>
>>>>
>>>>
>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>>>> freebsd-stable@freebsd.org> wrote:
>>>>
>>>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>>>>> after installing the build, Hyper-V based boots are
>>>>> working.]
>>>>>
>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard <marklmi at yahoo.com> wrote:
>>>>>
>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard <marklmi at yahoo.com> wrote:
>>>>>>
>>>>>>> I attempted to jump from head -r334014 to -r339076
>>>>>>> on a threadripper 1950X board and the boot fails.
>>>>>>> This is both native booting and under Hyper-V,
>>>>>>> same machine and root file system in both cases.
>>>>>>
>>>>>> I did my investigation under Hyper-V after seeing
>>>>>> a boot failure native.
>>>>>>
>>>>>> Looks like the native failure is even earlier,
>>>>>> before db> is even possible, possibly during
>>>>>> early loader activity.
>>>>>>
>>>>>> So this report is really for running under
>>>>>> Hyper-V: -r338804 boots and -r338810 does
>>>>>> not. By contrast -r334804 does not boot native.
>>>>>> (But I've little information for that context.)
>>>>>>
>>>>>> Sorry for the confusion. I rushed the report
>>>>>> in hopes of getting to sleep. It was not to be.
>>>>>>
>>>>>>> It fails just after the FreeBSD/SMP lines,
>>>>>>> reporting "kernel trap 9 with interrupts disabled".
>>>>>>>
>>>>>>> It fails in pmap_force_invalidate_cache_range at
>>>>>>> a clflush (%rax) instruction that produces a
>>>>>>> "Fatal trap 9: general protection fault while
>>>>>>> in kernel mode". cpuid=0 apic id= 00
>>>>>>>
>>>>>>> I used kernel.txz files from:
>>>>>>>
>>>>>>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>>>>>>>
>>>>>>> to narrow the range of kernel builds for working -> failing
>>>>>>> and got:
>>>>>>>
>>>>>>> -r338804 boots fine
>>>>>>> (no amd64 kernel builds between to try)
>>>>>>> -r338810+ fails (any that I tried, anyway)
>>>>>>>
>>>>>>> In that range is -r338807 :
>>>>>>>
>>>>>>> QUOTE
>>>>>>> Author: kib
>>>>>>> Date: Wed Sep 19 19:35:02 2018
>>>>>>> New Revision: 338807
>>>>>>> URL:
>>>>>>> https://svnweb.freebsd.org/changeset/base/338807
>>>>>>>
>>>>>>>
>>>>>>> Log:
>>>>>>> Convert x86 cache invalidation functions to ifuncs.
>>>>>>>
>>>>>>> This simplifies the runtime logic and reduces the number of
>>>>>>> runtime-constant branches.
>>>>>>>
>>>>>>> Reviewed by: alc, markj
>>>>>>> Sponsored by: The FreeBSD Foundation
>>>>>>> Approved by: re (gjb)
>>>>>>> Differential revision:
>>>>>>> https://reviews.freebsd.org/D16736
>>>>>>>
>>>>>>> Modified:
>>>>>>> head/sys/amd64/amd64/pmap.c
>>>>>>> head/sys/amd64/include/pmap.h
>>>>>>> head/sys/dev/drm2/drm_os_freebsd.c
>>>>>>> head/sys/dev/drm2/i915/intel_ringbuffer.c
>>>>>>> head/sys/i386/i386/pmap.c
>>>>>>> head/sys/i386/i386/vm_machdep.c
>>>>>>> head/sys/i386/include/pmap.h
>>>>>>> head/sys/x86/iommu/intel_utils.c
>>>>>>> END QUOTE
>>>>>>>
>>>>>>> There do seem to be changes associated with
>>>>>>> clflush(...) use. Looking at:
>>>>>>>
>>>>>>>
>>>>> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>>>>>>>
>>>>>>> it appears that pmap_force_invalidate_cache_range has not
>>>>>>> changed since -r338807.
>>>>>>>
>>>>>>> It seems that -r338806 and -r338810 would be unlikely
>>>>>>> contributors.
>>>>>>
>>>>>
>>>>> I went after my native-boot loader problem first because I
>>>>> could switch kernels via the loader for booting FreeBSD under
>>>>> Hyper-V.
>>>>> Switching loaders is more of a problem.
>>>>>
>>>>> In order to avoid the loader-time crash I switched to building
>>>>> and installing based on WITHOUT_ZFS= . I've had no active use of
>>>>> ZFS in years. (The old official-build loaders that worked were
>>>>> non-ZFS ones.)
>>>>>
>>>>> This took care of the native-boot loader-crash --and, to my
>>>>> surprise, also the Hyper-V-boot kernel-time crash.
>>>>>
>>>>> My private builds now boot the 1950X in both contexts just
>>>>> fine.
>>>>>
>>>>> During my early investigation I did pick up specific changes
>>>>> from after -r339076 that seemed to be tied to Ryzen and such.
>>>>> (They made no difference to the boot problems at the time
>>>>> but I saw no reason to remove them.)
>>>>>
>>>>> # uname -apKU
>>>>> FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun
>>>>> Oct 21 16:44:25 PDT 2018 markmi@FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG
>>>>> amd64 amd64 1200084 1200084
>>>>
>>>>
>>> (stupid gmail)
>>>
>>> The phrase "no active use" bothers me. What does that mean? Are there any
>>> ZFS pools or any disks that any whiff of ZFSish thing on it at all?
>>> Clearly, there's something in the zfs boot loader that's freaking out by
>>> something on your system, but absent that information I can't help you.
>>>
>>
>> It would help to get output from loader lsdev -v command.
>
> That turned out to be very interesting: The non-ZFS loader
> crashes during the listing, during disk8, which shows a
> x0 instead of a x512.
>

Yes, that's the root cause there. The non-zfs loader does only *read* the boot disk, that's why the issue was not revealed there.

It would help to identify the sector size for that disk, at least from OS, so we can compare with what we can get from INT13.
I have a pretty good idea what to look for there, but I am afraid we need to run a few tests with you to understand why that disk is reporting sector size 0 there.

rgds,
toomas

> Hand transcribed from pictures:
>
> OK lsdev -v
> disk devices
> disk0: BIOS drive C (937703088 x 512):
>     disk0p1: FreeBSD boot 512K
>     disk0p2: FreeBSD UFS 356G
>     disk0p3: FreeBSD swap 15G
>     disk0p4: FreeBSD swap 76G
> disk1: BIOS drive D (16514064 x 512):
>     disk1s1: Linux 2048KB
>     disk1s2: Unknown 952GB
> disk2: BIOS drive E (16514064 x 512):
>     disk2p1: Unknown 128MB
> disk3: BIOS drive F (16514064 x 512):
>     disk3p1: Unknown 128MB
> disk4: BIOS drive G (16434495 x 512):
>     disk2p1: Unknown 128MB
>     disk4p2: DOS/Windows 1716GB
> disk5: BIOS drive H (16434495 x 512):
>     disk5p1: FreeBSD boot 512K
>     disk5p2: FreeBSD UFS 176G
>     disk5p3: FreeBSD swap 193G
>     disk5p4: FreeBSD swap 15G
> disk6: BIOS drive I (16434495 x 512):
>     disk6p1: Unknown 499MB
>     disk6p2: EFI 99MB
>     disk6p3: Unknown 16MB
>     disk6p4: DOS/Windows 886G
> disk7: BIOS drive H (16434495 x 512):
>     disk7p1: FreeBSD boot 512K
>     disk7p2: FreeBSD UFS 953G
> disk8: BIOS drive K (262144 x 0):
>
> int=00000000 err=00000000 efl=00010246 eip=000286bd
> eax=00000000 ebx=72b50430 ecx=00000000 edx=00000000
> esi=00000000 edi=00092080 ebp=00091eec esp=00091ea8
> cs=002b ds=0033 es=0033 fs=0033 gs=0033 ss=0033
> cs:eip=f7 f1 89 c1 85 d2 0f 85-d8 01 00 00 6a 05 58 85
>        f6 0f 88 75 01 00 00 89-cb c1 fb 1f 89 ca 03 55
> ss:esp=09 00 00 00 00 00 00 00-0a 00 00 00 02 00 00 00
>        00 00 00 00 00 00 00 00-78 1f 09 00 33 45 04 00
> BTX halted
>
> I expect that "disk8" is what gpart show -p
> from a native boot showed as:
>
> =>  1  60062499  da1    MBR       (29G)
>     1        31         - free -  (16K)
>    32  60062468  da1s1  fat32lba  (29G)
>
> (That gpart show -p output is in another of the
> list messages.)
>
>> Also if you could test boot loader with UEFI - for example get to loader prompt via usb/cd boot and then get the same lsdev -v output.
>
> Still true given the above crash? Or, going the
> other way, should "drive8" be left as it is in
> order to be sure to do this test with the drive
> present?
>
> If I do this test later, it will take a bit to
> get media to do it with. (It is about 4AM in the
> morning and I've yet to get to sleep.)
>
> Note: I've never tried a UEFI based boot of FreeBSD
> on this machine (but the Windows 10 Pro x64 is EFI
> based). The only FreeBSD context using a EFI partition
> to boot that I have used is on an arm aarch64
> Cortex-A57 system.
>
>> I would be interested to see the sector size information and if the UEFI loader does also have issues.
>
> Understood.
>
>> If it does, I’d like to see the outputs from commands:
>
>> zpool status
>> zpool import
>
> Independent of the UEFI test . . .
>
> I do have a -r331924 head version on another one
> of the devices and can native-boot that. It still
> has its ZFS software (but a default loader without
> ZFS).
>
> Trying from that context, hand transcribed:
>
> # zpool status
> ZFS filesystem version: 5
> ZFS storage pool version: features support (5000)
> no pools available
> # zpool import
> #
>
> [That was based on the old (default) loader being
> a non-ZFS one.]
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
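
For illustration only: the register dump quoted above shows int=0 with ecx=0, and the first bytes at cs:eip (f7 f1) decode as a divide by %ecx. That is consistent with a division by the sector size of 0 reported for disk8. A minimal sketch of a defensive check follows. It is not the loader's actual code; the names sane_sector_size, byte_offset_to_sector and FALLBACK_SECTOR_SIZE are hypothetical, and the 512-byte fallback is an assumption taken from the other disks in the listing.

```c
#include <stdint.h>

/* Hypothetical fallback; 512 matches every other disk in the lsdev listing. */
#define FALLBACK_SECTOR_SIZE 512u

/*
 * Return a sector size that is safe to divide by.
 * Only non-zero powers of two are accepted; anything else
 * (including the 0 seen for disk8) is replaced by the fallback.
 */
static uint32_t
sane_sector_size(uint32_t reported)
{
	if (reported == 0 || (reported & (reported - 1)) != 0)
		return (FALLBACK_SECTOR_SIZE);
	return (reported);
}

/* Convert a byte offset to a sector number without risking a divide by zero. */
static uint64_t
byte_offset_to_sector(uint64_t offset, uint32_t reported_sector_size)
{
	return (offset / sane_sector_size(reported_sector_size));
}
```

Whether falling back to 512 is the right policy, or whether such a disk should be skipped entirely, depends on why the BIOS reports 0 in the first place, which is what the tests requested above are meant to find out.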