Date:      Mon, 22 Oct 2018 14:07:31 +0300
From:      Toomas Soome <tsoome@me.com>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        Konstantin Belousov <kib@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>, Warner Losh <imp@bsdimp.com>
Subject:   Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
Message-ID:  <9AEF5EB3-C393-44D1-9BD4-D0E59FE97CCE@me.com>
In-Reply-To: <085BCA2B-4451-406C-9CEE-57D8B8008201@yahoo.com>
References:  <79973E2B-F5C4-4E7C-B92B-1C8D4441C7D1@yahoo.com> <ACBB38EF-9A6A-40E5-AB6C-EEB9E292A919@yahoo.com> <EDBFFACB-8582-4B16-AC1A-63F8C86C9BA4@yahoo.com> <CANCZdfo=uqLn16r0FShz=WEv3Z34LbmC1gqzKabwfr3gEUXsJg@mail.gmail.com> <CANCZdfoHg8=FfuJchyPJ9qBDZBkR_7nYTWPiQedZkW4Cs1pR5A@mail.gmail.com> <3CA4C94F-A062-44FE-B507-948A6F88C83D@me.com> <085BCA2B-4451-406C-9CEE-57D8B8008201@yahoo.com>



> On 22 Oct 2018, at 13:58, Mark Millard <marklmi@yahoo.com> wrote:
>
> On 2018-Oct-22, at 2:27 AM, Toomas Soome <tsoome at me.com> wrote:
>>
>>> On 22 Oct 2018, at 06:30, Warner Losh <imp@bsdimp.com> wrote:
>>>
>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh <imp@bsdimp.com> wrote:
>>>
>>>>
>>>>
>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>>>> freebsd-stable@freebsd.org> wrote:
>>>>
>>>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>>>>> after installing the build, Hyper-V based boots are
>>>>> working.]
>>>>>
>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard <marklmi at yahoo.com> wrote:
>>>>>
>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard <marklmi at yahoo.com> wrote:
>>>>>>
>>>>>>> I attempted to jump from head -r334014 to -r339076
>>>>>>> on a threadripper 1950X board and the boot fails.
>>>>>>> This is both native booting and under Hyper-V,
>>>>>>> same machine and root file system in both cases.
>>>>>>
>>>>>> I did my investigation under Hyper-V after seeing
>>>>>> a boot failure native.
>>>>>>
>>>>>> Looks like the native failure is even earlier,
>>>>>> before db> is even possible, possibly during
>>>>>> early loader activity.
>>>>>>
>>>>>> So this report is really for running under
>>>>>> Hyper-V: -r338804 boots and -r338810 does
>>>>>> not. By contrast -r334804 does not boot native.
>>>>>> (But I've little information for that context.)
>>>>>>
>>>>>> Sorry for the confusion. I rushed the report
>>>>>> in hopes of getting to sleep. It was not to be.
>>>>>>
>>>>>>> It fails just after the FreeBSD/SMP lines,
>>>>>>> reporting "kernel trap 9 with interrupts disabled".
>>>>>>>
>>>>>>> It fails in pmap_force_invalidate_cache_range at
>>>>>>> a clflushl (%rax) instruction that produces a
>>>>>>> "Fatal trap 9: general protection fault while
>>>>>>> in kernel mode". cpuid=0 apic id= 00
>>>>>>>
>>>>>>> I used kernel.txz files from:
>>>>>>>
>>>>>>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>>>>>>>
>>>>>>> to narrow the range of kernel builds for working -> failing
>>>>>>> and got:
>>>>>>>
>>>>>>> -r338804 boots fine
>>>>>>> (no amd64 kernel builds between to try)
>>>>>>> -r338810+ fails (any that I tried, anyway)
>>>>>>>
>>>>>>> In that range is -r338807 :
>>>>>>>
>>>>>>> QUOTE
>>>>>>> Author: kib
>>>>>>> Date: Wed Sep 19 19:35:02 2018
>>>>>>> New Revision: 338807
>>>>>>> URL:
>>>>>>> https://svnweb.freebsd.org/changeset/base/338807
>>>>>>>
>>>>>>>
>>>>>>> Log:
>>>>>>> Convert x86 cache invalidation functions to ifuncs.
>>>>>>>
>>>>>>> This simplifies the runtime logic and reduces the number of
>>>>>>> runtime-constant branches.
>>>>>>>
>>>>>>> Reviewed by: alc, markj
>>>>>>> Sponsored by:        The FreeBSD Foundation
>>>>>>> Approved by: re (gjb)
>>>>>>> Differential revision:
>>>>>>> https://reviews.freebsd.org/D16736
>>>>>>>
>>>>>>> Modified:
>>>>>>> head/sys/amd64/amd64/pmap.c
>>>>>>> head/sys/amd64/include/pmap.h
>>>>>>> head/sys/dev/drm2/drm_os_freebsd.c
>>>>>>> head/sys/dev/drm2/i915/intel_ringbuffer.c
>>>>>>> head/sys/i386/i386/pmap.c
>>>>>>> head/sys/i386/i386/vm_machdep.c
>>>>>>> head/sys/i386/include/pmap.h
>>>>>>> head/sys/x86/iommu/intel_utils.c
>>>>>>> END QUOTE
>>>>>>>
>>>>>>> There do seem to be changes associated with
>>>>>>> clflush(...) use. Looking at:
>>>>>>>
>>>>>>> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>>>>>>>
>>>>>>> it appears that pmap_force_invalidate_cache_range has not
>>>>>>> changed since -r338807.
>>>>>>>
>>>>>>> It seems that -r338806 and -r338810 would be unlikely
>>>>>>> contributors.
>>>>>>
>>>>>
>>>>> I went after my native-boot loader problem first because I
>>>>> could switch kernels via the loader for booting FreeBSD under
>>>>> Hyper-V. Switching loaders is more of a problem.
>>>>>
>>>>> In order to avoid the loader-time crash I switched to building and
>>>>> installing based on WITHOUT_ZFS= . I've had no active use of
>>>>> ZFS in years. (The old official-build loaders that worked were
>>>>> non-ZFS ones.)
>>>>>
>>>>> This took care of the native-boot loader-crash --and, to my
>>>>> surprise, also the Hyper-V-boot kernel-time crash.
>>>>>
>>>>> My private builds now boot the 1950X in both contexts just
>>>>> fine.
>>>>>
>>>>> During my early investigation I did pick up specific changes
>>>>> from after -r339076 that seemed to be tied to Ryzen and such.
>>>>> (They made no difference to the boot problems at the time
>>>>> but I saw no reason to remove them.)
>>>>>
>>>>> # uname -apKU
>>>>> FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun
>>>>> Oct 21 16:44:25 PDT 2018     markmi@FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG
>>>>> amd64 amd64 1200084 1200084
>>>>
>>>>
>>> (stupid gmail)
>>>
>>> The phrase "no active use" bothers me. What does that mean? Are there any
>>> ZFS pools or any disks that have any whiff of a ZFSish thing on them at all?
>>> Clearly, there's something in the zfs boot loader that's freaking out over
>>> something on your system, but absent that information I can't help you.
>>>
>>
>> It would help to get the output of the loader's lsdev -v command.
>
> That turned out to be very interesting: The non-ZFS loader
> crashes during the listing, during disk8, which shows a
> x0 instead of a x512.
>

Yes, that's the root cause there. The non-ZFS loader only *reads* the
boot disk, which is why the issue was not revealed there.

It would help to identify the sector size for that disk, at least from
the OS, so we can compare it with what we get from INT13.
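
As a rough illustration of that check, the sketch below scans the
"diskN: BIOS drive X (sectors x size):" lines of an lsdev -v transcript
and reports any drive whose sector size comes back as 0. The regular
expression and the names parse_lsdev_line and suspicious_disks are
assumptions made up for this example, based only on the hand-transcribed
output quoted in this message; this is not loader code.

```python
import re

# Assumed line shape, taken from the hand-transcribed lsdev -v output in
# this thread, e.g. "disk8: BIOS drive K (262144 x 0):"
DISK_LINE = re.compile(
    r"^(?P<name>disk\d+): BIOS drive (?P<letter>[A-Z]) "
    r"\((?P<sectors>\d+) x (?P<size>\d+)\):$"
)


def parse_lsdev_line(line):
    """Return (name, sectors, sector_size) for a disk line, else None."""
    match = DISK_LINE.match(line.strip())
    if match is None:
        return None
    return match["name"], int(match["sectors"]), int(match["size"])


def suspicious_disks(lines):
    """Names of disks whose reported sector size is zero."""
    found = []
    for line in lines:
        parsed = parse_lsdev_line(line)
        if parsed is not None and parsed[2] == 0:
            found.append(parsed[0])
    return found


if __name__ == "__main__":
    transcript = [
        "disk0: BIOS drive C (937703088 x 512):",
        "disk0p1: FreeBSD boot 512K",
        "disk8: BIOS drive K (262144 x 0):",
    ]
    print(suspicious_disks(transcript))
```

Partition lines such as "disk0p1: ..." do not match the pattern and are
skipped, so only whole-disk entries are examined.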

I have a pretty good idea what to look for there, but I am afraid we
need to run a few tests with you to understand why that disk is
reporting a sector size of 0.
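
For context, the BTX register dump quoted below shows int=00000000 (the
x86 divide error), ecx=00000000, and the bytes f7 f1 at cs:eip. The
sketch below decodes that two-byte opcode as "div %ecx", which would be
consistent with the loader dividing by the zero sector size; that last
link is an inference, not something confirmed in this thread. The helper
name decode_f7 is made up for illustration, assumes 32-bit operand size,
and handles only the register-direct unary forms of opcode F7.

```python
# Opcode 0xF7 is an x86 "group 3" opcode: bits 5..3 of the ModRM byte
# select the operation, bits 2..0 the register when mod == 3.
# Only the unary forms (/2 .. /7) are listed; test (/0) takes an
# immediate operand and is left out of this sketch.
GROUP3_UNARY = {2: "not", 3: "neg", 4: "mul", 5: "imul", 6: "div", 7: "idiv"}
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_f7(code):
    """Decode a register-direct F7 /2../7 instruction from code bytes."""
    opcode, modrm = code[0], code[1]
    if opcode != 0xF7:
        raise ValueError("not an F7 group 3 opcode")
    if modrm >> 6 != 3:
        raise ValueError("only the register-direct form is handled here")
    operation = (modrm >> 3) & 7
    if operation not in GROUP3_UNARY:
        raise ValueError("operation not covered by this sketch")
    return f"{GROUP3_UNARY[operation]} %{REG32[modrm & 7]}"


if __name__ == "__main__":
    # First two bytes at cs:eip in the BTX dump from this thread.
    print(decode_f7(bytes.fromhex("f7f1")))
```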

rgds,
toomas


> Hand transcribed from pictures:
>
> OK lsdev -v
> disk devices
> disk0: BIOS drive C (937703088 x 512):
> disk0p1: FreeBSD boot 512K
> disk0p2: FreeBSD UFS  356G
> disk0p3: FreeBSD swap 15G
> disk0p4: FreeBSD swap 76G
> disk1: BIOS drive D (16514064 x 512):
> disk1s1: Linux   2048KB
> disk1s2: Unknown 952GB
> disk2: BIOS drive E (16514064 x 512):
> disk2p1: Unknown 128MB
> disk3: BIOS drive F (16514064 x 512):
> disk3p1: Unknown 128MB
> disk4: BIOS drive G (16434495 x 512):
> disk2p1: Unknown     128MB
> disk4p2: DOS/Windows 1716GB
> disk5: BIOS drive H (16434495 x 512):
> disk5p1: FreeBSD boot 512K
> disk5p2: FreeBSD UFS  176G
> disk5p3: FreeBSD swap 193G
> disk5p4: FreeBSD swap 15G
> disk6: BIOS drive I (16434495 x 512):
> disk6p1: Unknown     499MB
> disk6p2: EFI         99MB
> disk6p3: Unknown     16MB
> disk6p4: DOS/Windows 886G
> disk7: BIOS drive H (16434495 x 512):
> disk7p1: FreeBSD boot 512K
> disk7p2: FreeBSD UFS  953G
> disk8: BIOS drive K (262144 x 0):
>
> int=00000000  err=00000000  efl=00010246  eip=000286bd
> eax=00000000  ebx=72b50430  ecx=00000000  edx=00000000
> esi=00000000  edi=00092080  ebp=00091eec  esp=00091ea8
> cs=002b  ds=0033  es=0033    fs=0033  gs=0033  ss=0033
> cs:eip=f7 f1 89 c1 85 d2 0f 85-d8 01 00 00 6a 05 58 85
>       f6 0f 88 75 01 00 00 89-cb c1 fb 1f 89 ca 03 55
> ss:esp=09 00 00 00 00 00 00 00-0a 00 00 00 02 00 00 00
>       00 00 00 00 00 00 00 00-78 1f 09 00 33 45 04 00
> BTX halted
>
> I expect that "disk8" is what gpart show -p
> from a native boot showed as:
>
> =>       1  60062499    da1  MBR  (29G)
>        1        31         - free -  (16K)
>       32  60062468  da1s1  fat32lba  (29G)
>
> (That gpart show -p output is in another of the
> list messages.)
>
>> Also if you could test the boot loader with UEFI - for example, get to the
>> loader prompt via usb/cd boot and then get the same lsdev -v output.
>
> Still true given the above crash? Or, going the
> other way, should "drive8" be left as it is in
> order to be sure to do this test with the drive
> present?
>
> If I do this test later, it will take a bit to
> get media to do it with. (It is about 4AM in the
> morning and I've yet to get to sleep.)
>
> Note: I've never tried a UEFI based boot of FreeBSD
> on this machine (but the Windows 10 Pro x64 is EFI
> based). The only FreeBSD context using an EFI partition
> to boot that I have used is on an arm aarch64
> Cortex-A57 system.
>
>> I would be interested to see the sector size information and whether the
>> UEFI loader also has issues.
>
> Understood.
>
>> If it does, I’d like to see the output of these commands:
>
>> zpool status
>> zpool import
>
> Independent of the UEFI test . . .
>
> I do have a -r331924 head version on another one
> of the devices and can native-boot that. It still
> has its ZFS software (but a default loader without
> ZFS).
>
> Trying from that context, hand transcribed:
>
> # zpool status
> ZFS filesystem version: 5
> ZFS storage pool version: features support (5000)
> no pools available
> # zpool import
> #
>
> [That was based on the old (default) loader being
> a non-ZFS one.]
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)



