From: Warner Losh
Date: Thu, 31 Oct 2024 19:48:53 -0600
Subject: Re: Direct dumped kernel cores
To: Justin Hibbits
Cc: FreeBSD Hackers, freebsd-arch@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Thu, Oct 31, 2024, 7:11 PM Justin Hibbits wrote:

> On Thu, 31 Oct 2024 16:32:51 -0600
> Warner Losh wrote:
>
> > On Thu, Oct 31, 2024 at 4:24 PM Justin Hibbits wrote:
> >
> > > Hi everyone,
> > >
> > > At Juniper we've been using a so-called 'rescue' kernel for dumping
> > > vmcores directly to the filesystem after a panic. We're now
> > > contributing this feature, implemented by Klara Systems, to
> > > FreeBSD, and looking for feedback. I posted a review
> > > at https://reviews.freebsd.org/D47358 for anyone interested.
> > >
> > > Interesting bits to keep in mind:
> > > * It requires a two-stage build process: one stage builds the
> > >   rescue kernel, the other builds the main kernel, which embeds the
> > >   rescue kernel inside its image. This might need some further work.
> > > * Thus far it's been implemented for amd64 and arm64; once proven
> > >   out, other architectures (powerpc64/le, riscv64) can follow suit.
> > > * Kernel environment bits to pass down to the rescue kernel are
> > >   prefixed `debug.rescue.`, for instance
> > >   `debug.rescue.vfs.root.mountfrom`.
> >
> > First off, this is kinda cool. I've wanted this occasionally when my
> > swap partition is too small (though in my case, it was easy enough to
> > add another drive to the system that was panicking and dump to that).
> >
> > I do have a question: I'm curious why you didn't follow the Linux
> > lead of having a kexec_load(2) system call to load the 'rescue
> > kernel' to make this more generic. That would make the leap to
> > having full kexec support (e.g. reboot(CMD_KEXEC)) a lot easier to
> > implement.
> >
> > Warner
>
> One problem with trying to kexec_load() a rescue kernel is that the
> rescue kernel needs its own memory to work with, a contiguous block,
> so it needs to be loaded early, or at least reserved early. Without
> its reserved memory it would be stomping over the 'host' kernel's
> memory. That said, I do like that direction, and it's definitely
> worth exploring.

That's exactly what kexec_load does.
When the crash happens, the current kernel constructs a new memory map
and passes it to the preloaded crash kernel, so the crash kernel knows
what memory can safely be used, plus the info needed to do the crash
dump.

For the replacement kernel, the reboot copies a mini-loader that copies
the kernel to the load address, tears the CPU down to the warm-reset
state, and jumps to the trampoline used to start the kernel.
loader.kboot writes that trampoline, creates the EFI-like metadata and
a memory map, and then calls reboot to boot into the new kernel.

Warner

- Justin

> >
> > > There are many more details in the review summary.
> > >
> > > We'd love to get feedback from anyone interested.
> > >
> > > Thanks,
> > > Justin Hibbits