Date: Sat, 20 Dec 2025 23:31:00 +0100
From: A FreeBSD User <freebsd@walstatt-de.de>
To: Warner Losh <imp@bsdimp.com>
Cc: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: CURRENT: havock: elf_load_section: truncated ELF file
Message-ID: <20251220233127.2ad04793@thor.sb211.local>
In-Reply-To: <CANCZdfo7SeJkYOO9eun%2Bfz-0yY5MW0OJ1G%2B--Ysnb3H2dR8qAA@mail.gmail.com>
References: <20251220141124.1606aa7c@thor.sb211.local> <CANCZdfo7SeJkYOO9eun%2Bfz-0yY5MW0OJ1G%2B--Ysnb3H2dR8qAA@mail.gmail.com>
On the day of the Lord, Sat, 20 Dec 2025 08:10:59 -0700,
Warner Losh <imp@bsdimp.com> wrote:

> On Sat, Dec 20, 2025 at 6:12 AM A FreeBSD User <freebsd@walstatt-de.de>
> wrote:
>
> > Hello,
> >
> > recently a small server running recent CURRENT, with a UFS-based system
> > SSD (NVMe) and a data graveyard based on RAID level 5 with ZFS (attached
> > to a Fujitsu HBA controller), got corrupted because of "losing" a drive.
> > This time the system reported TWO drives as removed from a RAID level 5,
> > which is like a death sentence.
> >
> > I guess this is fallout from the recently changed timing parameters of
> > the CAM infrastructure (I can't find any notes on this in man cam, so I
> > feel lost).
> >
>
> Unlikely, but you can set this in the boot loader:
> kern.cam.tur_timeout=60
> kern.cam.inquiry_timeout=60
> kern.cam.modesense_timeout=60

I'll check, thanks. Are these OIDs documented somewhere, so they are at hand
just in case? I searched the recent cam manpage ...

>
> and see if that works. You should see new errors on boot if this is the
> issue. Can you share a dmesg?
>
> I kinda doubt they'd cause the issues that you've had. If disks are gone,
> then there'd be different errors to what you are seeing, I'd think.
>
> To recover, your best bet is to use a USB stick from one of the releases or
> snapshots.

In earlier times, when "make installkernel" and/or "make installworld"
crashed midair, some binaries in the installed tree were corrupted. Since I
run CURRENT, which has a tough pace at the moment, the USB image I boot
should be close to the CURRENT built via "make world" ... I assume. I did so
and had some problems with the new pkg concept ... (working offline is a
problem with the install-blob.txz ...)
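As a rough sketch of how the three tunables quoted above could be made persistent: the helper below appends a name="value" line to a loader.conf-style file unless a line for that name is already there. The function name `set_tunable` and the idea of scripting this at all are assumptions for illustration. Only the tunable names and the value 60 come from Warner's mail, and the units are not stated there.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): add NAME="VALUE" to FILE
# unless FILE already contains a line starting with NAME=.
# Note: dots in NAME act as regex wildcards in grep, which is good
# enough for a sketch but not a strict match.
set_tunable() {
    st_file=$1
    st_name=$2
    st_value=$3
    if grep -q "^${st_name}=" "$st_file" 2>/dev/null; then
        return 0    # already set, leave the existing line alone
    fi
    printf '%s="%s"\n' "$st_name" "$st_value" >> "$st_file"
}
```

Called as `set_tunable /boot/loader.conf kern.cam.tur_timeout 60` (and likewise for the other two names), this should leave a line such as `kern.cam.tur_timeout="60"` in the file. After a reboot, `kenv kern.cam.tur_timeout` can be used to check that the loader passed the value on.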
>
> Warner
>
>
> > A very disastrous side effect of this crash was the inability to reboot
> > the box (CURRENT pre-16.0-CURRENT #11 master-n282659-7f39d05b67ae: Sat
> > Dec 20 09:35:32 CET 2025 amd64; the runtime system was from the 16th or
> > 17th of December).
> > After several tens of minutes I had to hard-reboot the box, with obvious
> > data loss on the system SSD. And here my problems start to turn into a
> > mess.
> >
> > After the first reboot I performed an fsck -fy, rebooted, and witnessed
> > that jails didn't come up anymore and sshd didn't work. So I installed
> > the CURRENT that had already been compiled from /usr/src prior to the
> > crash, which is "master-n282659-7f39d05b67ae" (the same as on the sibling
> > box, which is running great by the way; it has a different CPU and a
> > smaller RAID, but also a UFS-based system SSD and the same HBA). So
> > CURRENT seems to operate in general on similar hardware.
> >
> > After the second reboot with the old kernel the box in question went
> > into the debugger. Rebooting in single user mode and performing fsck -fy
> > revealed a lot of repairs on the first partitions, /, /var, /usr. After
> > a reboot I realized that most services are now broken: jails do not
> > start, sshd doesn't start, and the whole system goes into multiuser but
> > seems to have serious problems.
> >
> > uname -a remains empty
> > cd /usr/src; make buildworld returns immediately empty, no further action
> > service ldconfig start also returns completely empty on the console
> >
> > Several onboard/base tools simply return nothing.
> >
> > Trying "/rescue/sh" (the install date indicates 20th of December, so it
> > is the latest) seems to give me the first indication that something has
> > gone terribly wrong, as does /rescue/vi (to edit the loader settings to
> > change to boot.old):
> >
> > elf_load_section: truncated ELF file
> > Abort trap
> >
> > /boot/kernel, /lib, /usr/lib, /bin and /sbin seem to be intact (as far
> > as I can check, all timestamps are 20th Dec 2025, 9:48 UTC).
> >
> > Well, since this is not the first time I ran into some problems using
> > CURRENT, the outage due to two lost ZFS drives after the recent changes
> > seems worth noting here.
> >
>
> Can you provide error messages at boot for this? You talk about fsck and
> about ZFS, so I'm a little confused as to your setup.

No need to be confused: the CURRENT box crashed/froze after two of five HDDs
were reported as "removed" from a RAIDZ pool. The box hung forever. The OS
resides on an SSD with UFS. After more than 30 min I had to switch the box
off and on physically. So the UFS filesystem took a hit (journaling didn't
fix it). ZFS "healed" after a reboot and a check of the HDDs. The UFS SSD
didn't ...

I have now spent a while bringing everything back. The boot device is now
ZFS, too, and therefore obviously slower but somehow safe.

The only issue I have now is a crash on reboot. While rebooting and killing
jails, the box drops into the kernel debugger ... Somehow I need to copy the
picture I made from the box, since the machine isn't connected to the net at
the moment ...

>
> Warner
>
>
> > The other question would be how to fix this: one strategy would be to
> > boot from an official image on a flash drive and try to perform a "make
> > installkernel installworld". Maybe there is another way suggested by
> > what I described above ...
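One quick way to look for damage of the kind behind "elf_load_section: truncated ELF file" is to list zero-length files in the system directories, since files that were being rewritten when a box is power-cycled can end up empty. This is only a sketch under that assumption: a truncated binary need not be empty, so a clean result proves nothing, and the function name `find_empty_files` is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print every zero-length regular file below the
# given directories. Directories that do not exist are skipped.
find_empty_files() {
    for fe_dir in "$@"; do
        [ -d "$fe_dir" ] || continue
        find "$fe_dir" -type f -size 0 -print
    done
}
```

A call such as `find_empty_files /rescue /bin /sbin /lib /usr/lib` would cover the directories mentioned in this thread. Any file it prints is a candidate for reinstalling from a known-good build or install image.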
> >
> > Thanks in advance,
> >
> > oh
> >
> > --
> > A FreeBSD user
> >

-- 
A FreeBSD user
