Date: Sat, 20 Dec 2025 23:11:37 -0700
From: Warner Losh <imp@bsdimp.com>
To: A FreeBSD User <freebsd@walstatt-de.de>
Cc: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: CURRENT: havock: elf_load_section: truncated ELF file
Message-ID: <CANCZdfqUJe7TwTQh%2B5rUU34mYNGWr=4EsmCnb_%2BhWrKG57QrcA@mail.gmail.com>
In-Reply-To: <20251220233127.2ad04793@thor.sb211.local>
References: <20251220141124.1606aa7c@thor.sb211.local>
 <CANCZdfo7SeJkYOO9eun%2Bfz-0yY5MW0OJ1G%2B--Ysnb3H2dR8qAA@mail.gmail.com>
 <20251220233127.2ad04793@thor.sb211.local>
[-- Attachment #1 --]
On Sat, Dec 20, 2025 at 3:31 PM A FreeBSD User <freebsd@walstatt-de.de>
wrote:
> On the day of the Lord, Sat, 20 Dec 2025 08:10:59 -0700,
> Warner Losh <imp@bsdimp.com> wrote:
>
> > On Sat, Dec 20, 2025 at 6:12 AM A FreeBSD User <freebsd@walstatt-de.de>
> > wrote:
> >
> > > Hello,
> > >
> > > recently a small server running recent CURRENT, with a UFS-based system
> > > SSD (NVMe) and a data graveyard on ZFS RAID level 5 (attached to a
> > > Fujitsu HBA controller), got corrupted because of "losing" a drive. This
> > > time the system reported TWO drives as removed from a RAID level 5,
> > > which is like a death sentence.
> > >
> > > I guess this is fallout from the recently changed time parameters in the
> > > CAM infrastructure (I can't find any notes on this in man cam, so I feel
> > > lost).
> > >
> >
> > Unlikely, but you can set this in the boot loader:
> > kern.cam.tur_timeout=60
> > kern.cam.inquiry_timeout=60
> > kern.cam.modesense_timeout=60
>
> I'll check, thanks. Are these OIDs documented somewhere, so I have them at
> hand just in case? I searched the recent cam manpage ...
>
scsi.4:
SYSCTL VARIABLES
     The following variables are available as both sysctl(8) variables and
     loader(8) tunables:

     kern.cam.cam_srch_hi
         Search above LUN 7 for SCSI3 and greater devices.

     kern.cam.tur_timeout
         Timeout, in ms, for the initial TESTUNITREADY command we send to the
         devices during their initial probing. Defaults to 1s. FreeBSD 15
         and earlier set this to 60s.

     kern.cam.inquiry_timeout
         Timeout, in ms, for the initial INQUIRY command we send to the
         devices during their initial probing. Defaults to 1s. FreeBSD 15
         and earlier set this to 60s.

     kern.cam.reportluns_timeout
         Timeout, in ms, for the initial REPORTLUNS command we send to the
         devices during their initial probing. Defaults to 50s.

     kern.cam.modesense_timeout
         Timeout, in ms, for the initial MODESENSE command we send to the
         devices during their initial probing. Defaults to 1s. FreeBSD 15
         and earlier set this to 60s.
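
One thing to watch with the units: the excerpt gives these timeouts in ms, so
if the aim is to get back the 60s that FreeBSD 15 and earlier used, the value
would presumably be 60000. A minimal sketch, assuming the tunables take a
plain integer number of milliseconds as the excerpt states. to_ms is a
hypothetical helper, not part of FreeBSD, and the output is meant to be
reviewed before it goes into /boot/loader.conf:

```shell
#!/bin/sh
# Hypothetical helper (not a FreeBSD tool): convert whole seconds to the
# millisecond value that the scsi.4 excerpt says these tunables use.
to_ms() {
    echo $(( $1 * 1000 ))
}

# Print loader.conf-style lines for the three tunables whose default the
# excerpt says changed from 60s to 1s.
for oid in tur_timeout inquiry_timeout modesense_timeout; do
    printf 'kern.cam.%s="%s"\n' "$oid" "$(to_ms 60)"
done
```

kern.cam.reportluns_timeout is left out on purpose, since the excerpt does not
say its default changed.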
> >
> > and see if that works. You should see new errors on boot if this is the
> > issue. Can you share a dmesg?
> >
> > I kinda doubt they'd cause the issues that you've had. If disks are gone,
> > then there'd be different errors to what you are seeing, I'd think.
> >
> > To recover, your best bet is to use a USB stick from one of the releases
> > or snapshots.
>
> In earlier times, when "make installkernel" and/or "make installworld"
> crashed in mid-air, some binaries in the installed tree were corrupted.
> Since I run CURRENT, which moves at a fast pace at the moment, the USB
> image I boot should be close to the CURRENT built via "make world" ...
> I assume. I did so and had some problems with the new pkg concept ...
> (working offline is a problem with the install-blob.txz ...)
>
Yuck. Sorry that was a source of trouble for you.
> >
> > Warner
> >
> >
> > > A very disastrous side effect of this crash was the inability to reboot
> > > the box (CURRENT pre-16.0-CURRENT #11 master-n282659-7f39d05b67ae: Sat
> > > Dec 20 09:35:32 CET 2025 amd64; the running system was from the 16th or
> > > 17th of December). After several tens of minutes I had to hard-reboot
> > > the box, with obvious data loss on the system SSD. And here my problems
> > > start to turn into a mess.
> > >
> > > After the first reboot I ran fsck -fy, rebooted, and witnessed that
> > > jails didn't come up anymore and sshd didn't work. So I installed the
> > > CURRENT that had already been compiled in /usr/src before the crash,
> > > which is "master-n282659-7f39d05b67ae" (the same as on the sibling box,
> > > which is running great by the way, though it has a different CPU and a
> > > smaller RAID, but also a UFS-based system SSD and the same HBA). So
> > > CURRENT seems to operate in general on similar hardware.
> > >
> > > After the second reboot with the old kernel the box in question went
> > > into the debugger. Rebooting in single-user mode and running fsck -fy
> > > revealed a lot of repairs on the first partitions, /, /var, /usr. After
> > > a reboot I realized that most services are now broken: jails do not
> > > start, sshd doesn't start, and the whole system goes multiuser but
> > > seems to have serious problems.
> > >
> > > uname -a prints nothing
> > > cd /usr/src; make buildworld returns immediately with no output and no
> > > further action
> > > service ldconfig start also returns with no output at all on the console
> > >
> > > Several onboard/base tools simply return nothing.
> > >
> > > Trying "/rescue/sh" (install date indicates 20th of December, so it is
> > > the latest) or even /rescue/vi (to edit loader to change to boot.old)
> > > seems to give me the first indication that something has gone terribly
> > > wrong:
> > >
> > > elf_load_section: truncated ELF file
> > > Abort trap
> > >
> > > Checking /boot/kernel, /lib, /usr/lib, /bin and /sbin, they seem to be
> > > intact (as far as I can check; all timestamps are 20th Dec 2025, 9:48
> > > UTC).
> > >
> > > Well, since this is not the first time I have run into problems using
> > > CURRENT, the outage due to two lost ZFS drives after the recent changes
> > > seems worth noting here.
> > >
> >
> > Can you provide error messages at boot for this? You talk about fsck and
> > about ZFS, so I'm a little confused as to your setup.
>
> No need to be confused: CURRENT crashed/froze after two of five HDDs were
> reported as "removed" from a RAIDZ pool. The box hung forever.
>
> The OS resides on an SSD with UFS. After > 30 min I had to power-cycle the
> box physically. So the UFS filesystem took a hit (journaling didn't fix
> it). ZFS "healed" after reboot and checking the HDDs. The UFS SSD didn't ...
>
> I spent a while bringing everything back. The boot device is now ZFS, too,
> and therefore obviously slower but somewhat safe.
>
> The only issue I have now is a crash on reboot. While rebooting and
> killing jails, the box drops into the kernel debugger ...
>
> Somehow I need to copy the picture I took of the box, since the machine
> isn't connected to the net at the moment ...
>
> >
> > Warner
> >
> >
> > > The other question would be how to fix it: one strategy would be to boot
> > > from an official image on a flash drive and try to perform a "make
> > > installkernel installworld". Maybe there is another way, suggested by
> > > what I described above ...
> > >
> >
> >
> >
> >
> > > Thanks in advance,
> > >
> > > oh
> > >
> > >
> > > --
> > >
> > > A FreeBSD user
> > >
>
>
>
> --
>
> A FreeBSD user
>
