From nobody Thu Feb 8 18:20:23 2024 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TW4zS0Gv9z5B8hF for ; Thu, 8 Feb 2024 18:20:36 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4TW4zR3qcBz4h2T for ; Thu, 8 Feb 2024 18:20:35 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-ed1-x52c.google.com with SMTP id 4fb4d7f45d1cf-5601eb97b29so328105a12.0 for ; Thu, 08 Feb 2024 10:20:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20230601.gappssmtp.com; s=20230601; t=1707416434; x=1708021234; darn=freebsd.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=PnjMT1LtGANngJhwhkLg6HqTeGvyYSAnqMlvXFstoeY=; b=JQj7UfLYOfv6aefy87SBnOZrwiQeZ/uYQuDXmekbeA/jPA5w3zlO1MrmCC7MPTSZBj +fZJ2iLguMZ8vZaUcUigb5EpshgL/ndzWNagVkzHzXTFppN9meBGn/5DjxeSN3P7hqjH ieCEEX7Xi+TEzFGT0vOa3KpIQ+DFrCTea5SkwCz0xe5KM9n5GLepAXZrUIOzoOXNqCz4 TRb/8Nbvj9/uFJgE07E/HKs7dOciYGupAsYhEYCNdBQNPXjEQ/BE5LEmZgw+6Qw+RsPZ NFVaMnIgl7dFJGM+uLrpuwGhfO/KBu0WVR9tEEwRlotuCsGfSFxa5Dkc3j99KmE79Iis 7Tng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707416434; x=1708021234; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=PnjMT1LtGANngJhwhkLg6HqTeGvyYSAnqMlvXFstoeY=; b=w7WMnNrsj4y+vguq9t8UaIgzCRWWiZeXheHmOOECGxHyyuDk7dcU+3CX6Xw+kJuUok wWFlmSTkHzsbgopSyVmm1k+T1cHakeCgdLsjPW9M1Goh6Z0IVyWcEB3n6QPEBBmB2P1E 4pous73aCmn96ehmIdUc2MSu/xSYO5dRnJcgkkTYlyqLQMmA5E2XWLXr4eWvINyV+SdS 58rnWolyeN7fx0RFWC05uET4w8emO+gI/0q8kjGkwJ7BHPticjnVCLFffYuTmkymqbGn DikCfD6WAAJ9nV3rkxaGDOdcfQXLnMyeBErw1T7lnhVk7susKUPi3QYwUqx380ofjyLH zIrQ== X-Gm-Message-State: AOJu0Yw1qlPA7pg9Qm0OGUhROS1ZJL091OZ4c2j69VzBlro41/t2nQ7F Ny9IuP/W4qhFReKfKn2t52Parusuxm+zy8WRZS+oAF6nl4VsH0A8XqUgykHdcQQLAg2n0sdRY5T QTAGryUjgQaY8yMDS3gU6Nl9jD/b5JxJNGVVxFw== X-Google-Smtp-Source: AGHT+IGGVRKIVotgQ/dIHfmlVgcyj+KJXjG/+CSaapwYMe7tYXw1YMTFFG+9x7wav7EQHaVSjLugJ+zOiw2DxcRHsyE= X-Received: by 2002:a50:d795:0:b0:560:bb98:6b7d with SMTP id w21-20020a50d795000000b00560bb986b7dmr2785152edi.10.1707416434317; Thu, 08 Feb 2024 10:20:34 -0800 (PST) List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 References: <72def5a9-ffcc-4dcc-9b85-875ba7f46539@FreeBSD.org> <175dce9b-ee44-434c-b6b2-20717a04f6aa@FreeBSD.org> In-Reply-To: <175dce9b-ee44-434c-b6b2-20717a04f6aa@FreeBSD.org> From: Warner Losh Date: Thu, 8 Feb 2024 11:20:23 -0700 Message-ID: Subject: Re: git: e4ab361e5394 - main - fix poweroff regression from 9cdf326b4f by delaying shutdown_halt To: John Baldwin Cc: Andriy Gapon , src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Content-Type: multipart/alternative; boundary="0000000000000d2c090610e2de7c" X-Rspamd-Queue-Id: 4TW4zR3qcBz4h2T X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] --0000000000000d2c090610e2de7c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hey John, On Thu, Feb 8, 2024 at 10:52=E2=80=AFAM John Baldwin wrot= e: > On 2/6/24 2:13 AM, Andriy Gapon wrote: > > On 06/02/2024 11:41, Andriy Gapon wrote: > >> The branch main has been updated by avg: > >> > >> URL: > https://cgit.FreeBSD.org/src/commit/?id=3De4ab361e53945a6c3e9d68c5e5ffc11= de40a35f2 > >> > >> commit e4ab361e53945a6c3e9d68c5e5ffc11de40a35f2 > >> Author: Andriy Gapon > >> AuthorDate: 2024-02-06 08:55:13 +0000 > >> Commit: Andriy Gapon > >> CommitDate: 2024-02-06 08:55:13 +0000 > >> > >> fix poweroff regression from 9cdf326b4f by delaying shutdown_hal= t > >> > >> The regression affected ACPI-based systems without EFI poweroff > support > >> (including VMs). > >> > >> The key reason for the regression is that I overlooked that > poweroff is > >> requested by RB_POWEROFF | RB_HALT combination of flags. In my > opinion, > >> that command is a bit bipolar, but since we've been doing that > forever, > >> then so be it. Because of that flag combination, the order of > >> shutdown_final handlers that check for either flag does matter. > >> > >> Some additional complexity comes from platform-specific > shutdown_final > >> handlers that aim to handle multiple reboot options at once. > E.g., > >> acpi_shutdown_final handles both poweroff and reboot / reset. A= s > >> explained in 9cdf326b4f, such a handler must run after > shutdown_panic to > >> give it a chance. But as the change revealed, the handler must > also run > >> before shutdown_halt, so that the system can actually power off > before > >> entering the halt limbo. > >> > >> Previously, shutdown_panic and shutdown_halt had the same > priority which > >> appears to be incompatible with handlers that can do both > poweroff and > >> reset. > > > > I want to add that having many handlers with priorities expressed like > > SHUTDOWN_PRI_LAST =C2=B1 N while some of those handlers have implicit > > inter-dependencies (interactions, interference) also does not help to > see a > > clear picture. > > > > Perhaps it would be better to handle all (reasonable) RB flag > combinations > > centrally in kern_reboot and then dispatch events like shutdown_reset, > > shutdown_poweroff, etc. Handlers for those events would have a single > and > > simple job of performing that one action (perhaps failing and letting > another > > handler try). > > I think having separate eventhandlers for shutdown, reset, and poweroff > seems > sensible. It also permits a given driver to use different priorities > (maybe it > wants to be first for poweroff but last for reset, etc.) > I'd come to this conclusion as well. The handlers shouldn't even look at the flags IMHO. We can create a hierarchy of power cycle > reset > power off > halt with power unchanged easily enough, and call the handlers in that order, letting individual drivers duke it out. > > Also, I would split reboot howto into command and flag portions, so tha= t > only > > one command can be specified at a time. E.g., I would consider > RB_AUTOBOOT > > ("RB_REBOOT"), RB_POWEROFF, RB_HALT to be distinct commands. Then, > flags like > > RB_NOSYNC or RB_DUMP could be optional flags. > > > > As an aside, some flags documented for reboot(2) do not seem to have > much to do > > with reboot. E.g., RB_DFLTROOT affects how a system boots up, but not > how the > > system goes for a reboot. Not surprisingly, that option is not handled > by > > anything kicked off with reboot(2). > > Maybe, it would make more sense if we had fast reboot support and the > running > > kernel could instruct the next kernel directly. But, it's still a bit > weird > > that flags like RB_POWEROFF and RB_DFLTROOT belong in the same domain > and can be > > set together. > > I would suggest deprecating flags that are no-ops. In modern systems if > you > want to control the next boot you do it via other means (nextboot, > efibootmgr, > etc.) and reboot(2) is not a good API for that. > Part of the problem is that they aren't NO-OPs. We use the same howto flags in the early boot that we use for reboot. There the flags mean something. This is passed in by the boot loader, and in this case, still does something. This dates as near as I can tell, to the VAX and other early Unix machines being able to pass a word (and maybe a little more) from one kernel to the next, a feature that's fallen out of fashion. > It might be hard to fully cleanup some of the hackiness here, but if you > can > at least isolate the flag weirdness handling in kern_reboot by having the > more > specific eventhandlers then that might fix most of the ugliness. > Yea, I think we should isolate the drivers from looking at 'howto' and have separate handlers for the following cases: power cycle, power off, reset and halt. I agree that some of the features that were hung on this word should be tor= n down and only done via boot next or possibly from the boot loader -> kernel handoff only. Now, what we do with the 'reboot' system call? It seems like we should mayb= e rework it in some way? Warner --0000000000000d2c090610e2de7c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hey John,

On Thu, Feb 8, 2024 at 10:52= =E2=80=AFAM John Baldwin <jhb@freebsd= .org> wrote:
On 2/6/24 2:13 AM, Andriy Gapon wrote:
> On 06/02/2024 11:41, Andriy Gapon wrote:
>> The branch main has been updated by avg:
>>
>> URL: http= s://cgit.FreeBSD.org/src/commit/?id=3De4ab361e53945a6c3e9d68c5e5ffc11de40a3= 5f2
>>
>> commit e4ab361e53945a6c3e9d68c5e5ffc11de40a35f2
>> Author:=C2=A0 =C2=A0 =C2=A0Andriy Gapon <avg@FreeBSD.org> >> AuthorDate: 2024-02-06 08:55:13 +0000
>> Commit:=C2=A0 =C2=A0 =C2=A0Andriy Gapon <avg@FreeBSD.org> >> CommitDate: 2024-02-06 08:55:13 +0000
>>
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0fix poweroff regression from 9cdf326b4f = by delaying shutdown_halt
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0The regression affected ACPI-based syste= ms without EFI poweroff support
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0(including VMs).
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0The key reason for the regression is tha= t I overlooked that poweroff is
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0requested by RB_POWEROFF | RB_HALT combi= nation of flags.=C2=A0 In my opinion,
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0that command is a bit bipolar, but since= we've been doing that forever,
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0then so be it.=C2=A0 Because of that fla= g combination, the order of
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0shutdown_final handlers that check for e= ither flag does matter.
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0Some additional complexity comes from pl= atform-specific shutdown_final
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0handlers that aim to handle multiple reb= oot options at once.=C2=A0 E.g.,
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0acpi_shutdown_final handles both powerof= f and reboot / reset.=C2=A0 As
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0explained in 9cdf326b4f, such a handler = must run after shutdown_panic to
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0give it a chance.=C2=A0 But as the chang= e revealed, the handler must also run
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0before shutdown_halt, so that the system= can actually power off before
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0entering the halt limbo.
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0Previously, shutdown_panic and shutdown_= halt had the same priority which
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0appears to be incompatible with handlers= that can do both poweroff and
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0reset.
>
> I want to add that having many handlers with priorities expressed like=
> SHUTDOWN_PRI_LAST =C2=B1 N while some of those handlers have implicit<= br> > inter-dependencies (interactions, interference) also does not help to = see a
> clear picture.
>
> Perhaps it would be better to handle all (reasonable) RB flag combinat= ions
> centrally in kern_reboot and then dispatch events like shutdown_reset,=
> shutdown_poweroff, etc.=C2=A0 Handlers for those events would have a s= ingle and
> simple job of performing that one action (perhaps failing and letting = another
> handler try).

I think having separate eventhandlers for shutdown, reset, and poweroff see= ms
sensible.=C2=A0 It also permits a given driver to use different priorities = (maybe it
wants to be first for poweroff but last for reset, etc.)

I'd come to this conclusion as well. The handlers sho= uldn't even look at the flags
IMHO. We can create a hierarchy= of power cycle > reset > power off > halt with
power un= changed easily enough, and call the handlers in that order, letting individ= ual
drivers duke it out.=C2=A0
=C2=A0
> Also, I would split reboot howto into command and flag portions, so th= at only
> one command can be specified at a time.=C2=A0 E.g., I would consider R= B_AUTOBOOT
> ("RB_REBOOT"), RB_POWEROFF, RB_HALT to be distinct commands.= =C2=A0 Then, flags like
> RB_NOSYNC or RB_DUMP could be optional flags.
>
> As an aside, some flags documented for reboot(2) do not seem to have m= uch to do
> with reboot.=C2=A0 E.g., RB_DFLTROOT affects how a system boots up, bu= t not how the
> system goes for a reboot.=C2=A0 Not surprisingly, that option is not h= andled by
> anything kicked off with reboot(2).
> Maybe, it would make more sense if we had fast reboot support and the = running
> kernel could instruct the next kernel directly.=C2=A0 But, it's st= ill a bit weird
> that flags like RB_POWEROFF and RB_DFLTROOT belong in the same domain = and can be
> set together.

I would suggest deprecating flags that are no-ops.=C2=A0 In modern systems = if you
want to control the next boot you do it via other means (nextboot, efibootm= gr,
etc.) and reboot(2) is not a good API for that.

Part of the problem is that they aren't NO-OPs. We use the sam= e howto flags
in the early boot that we use for reboot. There the= flags mean something. This
is passed in by the boot loader, and = in this case, still does something. This dates
as near as I can t= ell, to the VAX and other early Unix machines being able to pass
= a word (and maybe a little more) from one kernel to the next, a feature tha= t's
fallen out of fashion.
=C2=A0
It might be hard to fully cleanup some of the hackiness here, but if you ca= n
at least isolate the flag weirdness handling in kern_reboot by having the m= ore
specific eventhandlers then that might fix most of the ugliness.

Yea, I think we should isolate the drivers from l= ooking at 'howto' and have
separate handlers for the foll= owing cases: power cycle, power off, reset and halt.=C2=A0
I agre= e that some of the features that were hung on this word should be torn
down and only done via boot next or possibly from the boot loader -&g= t; kernel
handoff only.

Now, what we do = with the 'reboot' system call? It seems like we should maybe
<= div>rework it in some way?

Warner
--0000000000000d2c090610e2de7c--