Date: Thu, 31 Oct 2024 19:48:53 -0600
From: Warner Losh <imp@bsdimp.com>
To: Justin Hibbits <jhibbits@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: Direct dumped kernel cores
Message-ID: <CANCZdfph7NzmuGgJN6vXEzzz2cyAGM28kPH2Mzj%2Bxk5Li=37eQ@mail.gmail.com>
In-Reply-To: <20241031211151.795eba3e@ralga.knownspace>
References: <20241031182354.14fa48aa@ralga.knownspace> <CANCZdfrobB-ZM3aMmD%2BAsjud3%2BM-_kkMB3SqTpaKTxtmY1x3Yg@mail.gmail.com> <20241031211151.795eba3e@ralga.knownspace>

On Thu, Oct 31, 2024, 7:11 PM Justin Hibbits <jhibbits@freebsd.org> wrote:

> On Thu, 31 Oct 2024 16:32:51 -0600
> Warner Losh <imp@bsdimp.com> wrote:
>
> > On Thu, Oct 31, 2024 at 4:24 PM Justin Hibbits <jhibbits@freebsd.org>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > At Juniper we've been using a so-called 'rescue' kernel for dumping
> > > vmcores directly to the filesystem after a panic.  We're now
> > > contributing this feature, implemented by Klara Systems, to
> > > FreeBSD, and looking for feedback.  I posted a review
> > > at https://reviews.freebsd.org/D47358 for anyone interested.
> > >
> > > Interesting bits to keep in mind:
> > > * It requires a 2-stage build process, one to build the rescue
> > >   kernel, the other to build the main kernel, which embeds the rescue
> > >   kernel inside its image.  This might need some further work.
> > > * Thus far it's been implemented for amd64 and arm64, once proven
> > >   out, other architectures (powerpc64/le, riscv64) can follow suit.
> > > * Kernel environment bits to pass down to the rescue kernel are
> > >   prefixed `debug.rescue.`, for instance
> > >   `debug.rescue.vfs.root.mountfrom`.
> > >
> >
> > First off, this is kinda cool. I've wanted this occasionally when my
> > swap partition is too small (though in my case, it was easy enough to
> > add another drive to the system that was panicking and dump to that).
> >
> > I do have a question: I'm curious why you didn't follow the Linux
> > lead of having
> > a kexec_load(2) system call to load the 'rescue kernel' to make this
> > more generic.
> > That would make the leap to having full kexec support (eg
> > reboot(CMD_KEXEC) a lot easier to implement.
> >
> > Warner
>
> One problem with trying to kexec_load() a rescue kernel is that the
> rescue kernel needs its own memory to work with, a contiguous block, so
> needs to be loaded early, or at least reserved early.  Without its
> reserved memory it would be stomping over the 'host' kernel's
> memory.  That said, I do like that direction, and it's definitely worth
> exploring.
>

That's exactly what kexec_load does. When the crash happens, the current
kernel constructs a new memory map and passes that to the preloaded crash
kernel so it knows what memory can safely be used, plus the info needed to do
the crash dump.

For the replacement kernel, the reboot copies a miniloader that copies the
kernel to the load address, tears the CPU down to the warm reset state, and
jumps to the trampoline used to start the kernel.

Loader.kboot writes that trampoline, creates the EFI-like metadata and a
memory map, and then calls reboot to boot into the new kernel.

Warner

> - Justin
>
> > >
> > > There are many more details in the review summary.
> > >
> > > We'd love to get feedback from anyone interested.
> > >
> > > Thanks,
> > > Justin Hibbits
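
A minimal sketch of the crash-path memory map described in the reply above: when the panic happens, the running kernel hands the preloaded crash kernel a map saying which memory it may use. The types and the `build_crash_map` function are hypothetical names made up for illustration; they are not FreeBSD or Linux structures, and the sketch assumes a single contiguous reserved region set aside early, as Justin describes. Host RAM outside the reservation is marked preserved, since that is the state to be dumped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not FreeBSD or Linux structures. */
enum mem_type { MEM_USABLE, MEM_PRESERVED };

struct mem_range {
	uint64_t start;		/* inclusive physical address */
	uint64_t end;		/* exclusive physical address */
	enum mem_type type;
};

static uint64_t u64_min(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t u64_max(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Append [start, end) to out. Empty pieces are skipped. Returns 0 when out is full. */
static int
emit(struct mem_range *out, size_t nout, size_t *n,
    uint64_t start, uint64_t end, enum mem_type type)
{
	if (start >= end)
		return 1;
	if (*n == nout)
		return 0;
	out[*n] = (struct mem_range){ start, end, type };
	(*n)++;
	return 1;
}

/*
 * Build the map for the crash kernel from the host's RAM ranges.
 * RAM inside [rsv_start, rsv_end) is usable by the crash kernel;
 * everything else is preserved so it can be written to the dump.
 * Returns the number of entries written, or 0 if out is too small.
 */
size_t
build_crash_map(const struct mem_range *host, size_t nhost,
    uint64_t rsv_start, uint64_t rsv_end,
    struct mem_range *out, size_t nout)
{
	size_t n = 0;

	assert(host != NULL && out != NULL && rsv_start < rsv_end);
	for (size_t i = 0; i < nhost; i++) {
		uint64_t s = host[i].start, e = host[i].end;

		/* Split each host range into: below, inside, above the reservation. */
		if (!emit(out, nout, &n, s, u64_min(e, rsv_start), MEM_PRESERVED) ||
		    !emit(out, nout, &n, u64_max(s, rsv_start),
		        u64_min(e, rsv_end), MEM_USABLE) ||
		    !emit(out, nout, &n, u64_max(s, rsv_end), e, MEM_PRESERVED))
			return 0;
	}
	return n;
}
```

With host RAM at [0x1000, 0x9000) and [0x10000, 0x20000) and a reservation at [0x4000, 0x6000), the first range is split into three pieces and the second is passed through as preserved. A real implementation would also have to carry the dump metadata the reply mentions, which this sketch leaves out.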