Date: Tue, 8 Dec 2020 19:10:26 +0100
From: Alban Hertroys <haramrae@gmail.com>
To: John Kennedy <warlock@phouka.net>
Cc: freebsd-current@freebsd.org
Subject: Re: KLD zfs.ko: depends on kernel - not available or version mismatch
Message-ID: <2B044A92-500F-4121-85DB-D486865C75B5@gmail.com>
In-Reply-To: <X8%2BebiETBdNpbRXt@phouka1.phouka.net>
References: <42AC7323-5AD6-401D-9A7D-F1D962EE5717@gmail.com> <X8%2BebiETBdNpbRXt@phouka1.phouka.net>

> On 8 Dec 2020, at 16:40, John Kennedy <warlock@phouka.net> wrote:
>
> On Tue, Dec 08, 2020 at 08:56:25AM +0100, Alban Hertroys wrote:
>> This seems to have gotten lost in the moderate queue, but after a week I am no closer to a solution, so here’s a resend:
>>
>> I’ve been trying to get a fresh world running (for the eventual purpose of running amdgpu against my recent graphics adapter), but I run into trouble with core loadable kernel modules, such as zfs.ko from the subject. It also happens with other modules that I tried randomly, for example, geom_mirror.ko.
>>
>> I updated to the latest current using svn up in /usr/src, then:
>>     make clean
>>     make buildworld kernel -j12
>>     shutdown -r now
>>
>>     boot to single user mode
>>
>>     kldload zfs
>
> I'm not sure you've provided enough information for a one-shot armchair
> diagnosis, but some things seem factually wrong.  For example, my normal
> rebuild procedure is:
>
>     cd /usr/src && make buildworld && make buildkernel
>     make installkernel
>     shutdown -r now
>
>     cd /usr/src && mergemaster -pi
>     make installworld
>     mergemaster -Fi
>     make -DBATCH_DELETE_OLD_FILES delete-old

Aha! So that’s how to prevent having to press ‘y’ for every deprecated file!

>     shutdown -r now
>
>     cd /usr/src && make -DBATCH_DELETE_OLD_FILES delete-old-libs
>
> (I'm on a desktop system here.  You haven't described your setup.)

This is also a desktop system.

> You didn't say that you've installed the new kernel, which at least starts
> you down the road towards a driver/kernel mismatch.  You presumably have a
> non-ZFS boot+root.

I’m fairly sure I did, actually.
Last time I checked, "make buildworld buildkernel" was equivalent to "make buildworld && make buildkernel", while "make kernel" is a shorthand for "make buildkernel && make installkernel".

So, unless I’m mistaken, "make buildworld kernel" should be equivalent to your first two lines.

Nevertheless, I retried without these assumptions, the result was the same. I forgot to "make delete-old" though, I rarely remember to do that…

> Did you mess around with the ZFS from ports (ZoL -> ZoF)
> at some point so you're not using the kernel's ZFS drivers?  What ZFS
> entries do you have in /etc/loader.conf, /etc/rc.conf, and some of the
> varients that may get dragged in?  (see rc.conf(5) for possibilities)

Nope, stock modules here.

> At the bottom of your email, you say / is UFS and /usr is ZFS, but I guess we
> have the extra fun of wondering what is under /usr on your /?  If you have a
> pre-ZFS /usr that is populated by something now presumably very old (because
> all the new, current stuff went onto ZFS /usr, now unavailable).

There is no populated directory /usr on the UFS file-system. This install was created on a fresh NVME disk based on an existing install on a spinning platter. The install happened with /usr mounted at the ZFS file-system.

I had to copy over several files from /etc and /usr/local/etc and re-installed the most important packages. This was admittedly a bit messy, it is possible that I forgot to copy something over.

(Originally my intention was to dd the contents of the spinning disk over, but apparently that disk has a few wonky sectors, dd failed after a few device timeouts)

I did sort-of manage to fix things, but recent kernels keep causing the same issue:

I noticed that uname -a said I was at revision 366335, while I had the source tree up-to-date.
For a test, I reverted back to that revision and went through:
    make buildworld
    make buildkernel

Which broke on /usr/local/sys/drm-current-kmod, which it turned out I had installed through pkg. There have been changes to the linux_kpi shortly after the above revision - probably what broke compatibility between HEAD and r366335.

After removing that pkg, the kernel built and installed, world installed fine too and I have a working system again, with kernel and world in sync.

So I tried again to move to HEAD:
    cd /usr/src
    svn up
    make buildworld -j12
    make buildkernel -j12
    make installkernel
    shutdown -r now
    <single user mode>
    mount -u /
    zpool import -Nf system (my /usr FS)

KLD zfs.ko: depends on kernel - not available or version mismatch
linker_load_file: /boot/kernel/zfs.ko - unsupported file type

>> Which results in dmesg messages:
>>
>> KLD zfs.ko: depends on kernel - not available or version mismatch
>> linker_load_file: /boot/kernel/zfs.ko - unsupported file type
>> KLD zfs.ko: depends on kernel - not available or version mismatch
>> linker_load_file: /boot/kernel/zfs.ko - unsupported file type
>> KLD zfs.ko: depends on kernel - not available or version mismatch
>> linker_load_file: /boot/kernel/zfs.ko - unsupported file type
>> KLD zfs.ko: depends on kernel - not available or version mismatch
>> linker_load_file: /boot/kernel/zfs.ko - unsupported file type
>
> Be sure to check out /var/log/messages for extra issues.  For example, with
> the bug I mentioned below, I couldn't load my nvidia driver and that manifested
> as one driver having issues because it depended on another, which had the root
> of the problem.

I forgot to look there. If I find anything suspicious there, I’ll let you know. That system doesn’t have a convenient mail client yet, so for now it’s copying output to files and scp-ing that to the Mac.

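Since the loader messages repeat, a small filter along these lines might make the copied-over log output easier to read. This is only a sketch: the function name kld_errors is made up, and the two patterns are just the messages quoted in this thread, so anything worded differently would slip through.

```shell
#!/bin/sh
# Sketch only: kld_errors is a hypothetical helper name, and the two patterns
# below come from the messages quoted in this thread, not from a complete list.

# Print each distinct KLD/linker error found in a log file with a repeat
# count, ignoring whatever timestamp or host prefix precedes the message.
kld_errors() {
    sed -n -E \
        -e 's/^.*(KLD [^ ]+: depends on .*)$/\1/p' \
        -e 's/^.*(linker_load_file: .*)$/\1/p' "$1" |
        sort | uniq -c
}
```

Something like `kld_errors /var/log/messages > kld-errors.txt` would then leave one line per distinct message in a file that can be scp-ed over.
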
>> I can load the zfs kernel module from kernel.old just fine:
>>
>> ZFS filesystem version: 5
>> ZFS storage pool version: features support (5000)
>
> I kicked my more bleeding-edge system over from 12.2-rel (r366954) up into
> 13.0-current (r367044, 1300123) on 2020/10/26.  OpenZFS kicked in 2020/8/24?
> I think the CFT was ~2018/8/21, not sure when we had the OpenZFS ports.
> Current bumps the ABI version pretty frequently so I'd think you'd have
> tripped across versioning issues a long time ago if you had some drivers not
> being rebuilt.

Having a conflict between kernel and world was what I was expecting too, but I can’t figure out what got me into that situation. For all I know, they should be in sync now, especially after I reverted the tree back to rev 366335 and made world again (according to the method above).

>
>> This happens with any kernel module I’ve tried, such as geom_mirror and amdgpu (from ports/graphics/drm-current-kmod - the latter causes a kernel panic with kernel.old BTW).
>>
>> I’ve gone back as far as Oct 7 (before changes to kern/elf_load_obj.c off the top of my head), looked at mailing list archives and forums etc, all to no avail.
>>
>> I have / on UFS+J and /usr on ZFS and nothing in /etc/src.conf. I had /etc/malloc.conf with the recommended symlink from UPDATING, but the same happens with that moved out of the way. Nothing seems to help.
>>
>> Do I need to go back further to get into a usable state or is there something else I should be doing?
>
> With very few exceptions (bug 250897, 2020/11/6), I've found 13-current
> bootable since 10/26 (up through my current system, 13.0 r368388 (2020/12/6).
> You obviously need to make sure that an extra drivers you add in are compiled
> against the kernel, but ZFS is typically one of those.

I think we covered that.

Thanks for the help and the pointers, but unfortunately the mystery remains.

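For anyone who wants to check the kernel/source question on their own system, one rough comparison is the __FreeBSD_version in the source tree (the 1300123-style number mentioned above) against what the running kernel reports. The following is a minimal sketch under my own assumptions: both function names are invented here, and I am assuming the value sits on a plain "#define __FreeBSD_version <number>" line in sys/sys/param.h. Matching numbers do not prove that kernel and modules came from the same build, they only rule out the most obvious skew.

```shell
#!/bin/sh
# Sketch only: src_osreldate and compare_osreldate are hypothetical names,
# and the header is assumed to contain "#define __FreeBSD_version <number>".

# Print the __FreeBSD_version defined in a param.h-style header file.
src_osreldate() {
    awk '$1 == "#define" && $2 == "__FreeBSD_version" { print $3; exit }' "$1"
}

# Compare the source tree's version ($1) with the running kernel's ($2).
compare_osreldate() {
    if [ "$1" -eq "$2" ]; then
        echo "in sync: $1"
    else
        echo "mismatch: source tree $1, running kernel $2"
    fi
}
```

On FreeBSD this could be called as `compare_osreldate "$(src_osreldate /usr/src/sys/sys/param.h)" "$(uname -K)"`, where uname -K prints the running kernel's version number.
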
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.