Date:      Sat, 9 Feb 2019 16:53:17 -0800
From:      Conrad Meyer <cem@freebsd.org>
To:        Cy Schubert <Cy.Schubert@cschubert.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: nosh init system
Message-ID:  <CAG6CVpWXOA6r_aJcefxQBu2QZxprf1ZpDoTb4eb2JSwWsE2m%2Bg@mail.gmail.com>
In-Reply-To: <201902092334.x19NYtZe036559@slippy.cwsent.com>
References:  <yaneurabeya@gmail.com> <CF8D2DCD-F63A-4E79-9CBC-CD45D5D596DD@gmail.com> <201902092334.x19NYtZe036559@slippy.cwsent.com>

Hi Cy,

On Sat, Feb 9, 2019 at 3:35 PM Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> I don't see what's so "incredibly fragile" about rc(8). That's not to
> say there aren't better solutions, like SMF.

Maybe "incredibly" as a choice of adjective is inappropriate.  I think
we (you, me, and ngie@) can all agree it is somewhat fragile, and
there are things SMF/systemd/nosh get right that rc(8) does not
(today).  Anyway, your next paragraph goes on to be a good start at
describing some of rc's fragility.  :-)

> Where rc(8) falls down is any port or a customer's (user of FreeBSD) rc
> script could fail hosing the boot or worse hosing the system*. Where a
> solution like SMF solves the problem is that should a service which
> other services depend on fail, only that branch of the startup tree
> would fail.

Right; that's a great example.

> In that scenario, if a service fails but sshd start, a
> sysadmin would still be able to login remotely to resolve the problem.
> So in this regard rc(8) is at a disadvantage.
>
> We could address the above paragraph by starting sshd earlier during
> boot thereby allowing the opportunity to fix remotely.

I don't think that is really sufficient without substantially
modifying init+rc to be closer to something like systemd or SMF,
anyway.  And then we'd rather just have something like SMF :-).

As soon as *any* rc service fails to start (signal, non-zero exit,
stop_boot), rc(8) exits non-zero, causing init(8) to go to single
user.  All service state is thrown away with rc(8) exit, but any rc.d
"services" that managed to start before boot failed are not
terminated.  Even if an admin manages to log in and fix the
configuration, re-starting rc(8) restarts the runcom process from
scratch, as if nothing had already been done, without first stopping
anything that was already running.  The only safe, reproducible way to
re-start rc(8) is to fully reboot the system.

E.g., the major pain point we run into repeatedly with restarted boot
is that cleanvar / cleartmp run again.  This breaks the ld-elf.so.hints
cache (affecting anything linking /usr/local libraries; hope your admin
is running base sshd and not openssh-portable!) as well as wiping out
/var/run pid files (breaking "already running?" rc pid checks).  As a
result, services get double-started.
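
To make the double-start sequence concrete, here is a toy Python model.
It is only a sketch: Host, run_rc, and the service names are made up for
illustration and do not mirror how rc(8) or rc.subr are implemented.
The only behavior it encodes is what is described above: cleanup wipes
pid files but not processes, and the "already running?" check looks
only at pid files.

```python
# Toy model only: names and structure are illustrative, not rc(8) internals.

class Host:
    def __init__(self):
        self.pidfiles = {}    # stands in for /var/run/*.pid
        self.processes = []   # (name, pid) for every daemon still alive
        self.next_pid = 100

    def cleanvar(self):
        # Early-boot cleanup: wipes pid files, leaves running daemons alone.
        self.pidfiles.clear()

    def start(self, name, broken=False):
        # The "already running?" check consults only the pid file.
        if name in self.pidfiles:
            return "already running"
        if broken:
            raise RuntimeError(name + " failed to start")
        pid = self.next_pid
        self.next_pid += 1
        self.processes.append((name, pid))
        self.pidfiles[name] = pid
        return "started"


def run_rc(host, services, broken=()):
    """One pass over the boot sequence; returns False on the first failure."""
    host.cleanvar()
    for name in services:
        try:
            host.start(name, broken=name in broken)
        except RuntimeError:
            return False  # the pass gives up; earlier daemons keep running
    return True


host = Host()
order = ["syslogd", "sshd", "myapp", "cron"]
run_rc(host, order, broken={"myapp"})  # first pass fails at myapp
run_rc(host, order)                    # config fixed, rc run again
names = [name for name, _ in host.processes]
print(names.count("sshd"), names.count("cron"))
```

In this model syslogd and sshd end up with two live processes each,
while cron (which never started in the first pass) has one.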

Cleanvar could maybe be improved to avoid this problem (e.g., we could
coordinate with the kernel to set a per-boot, persisted flag that
cleanvar has completed, even if rc(8) exits), but the broad class of
problems would remain (rc.d autostart is stateful, but any partial
failure destroys all state).
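
As a rough illustration of the per-boot flag idea only, here is a
Python sketch.  Everything in it is an assumption: cleanvar_once, the
flag file, and the boot_id strings are hypothetical, and how a real
boot identifier would be obtained from the kernel is left open.  It
addresses the cleanvar re-run and nothing else.

```python
import os
import tempfile


def cleanvar_once(run_dir, flag_path, boot_id):
    """Wipe files in run_dir unless flag_path already records boot_id.

    boot_id is a hypothetical per-boot identifier; flag_path must live
    outside run_dir so the cleanup does not erase its own marker.
    """
    try:
        with open(flag_path) as f:
            if f.read().strip() == boot_id:
                return False  # already cleaned this boot; keep pid files
    except FileNotFoundError:
        pass
    for entry in os.listdir(run_dir):
        path = os.path.join(run_dir, entry)
        if os.path.isfile(path):
            os.remove(path)
    with open(flag_path, "w") as f:
        f.write(boot_id)
    return True


with tempfile.TemporaryDirectory() as top:
    run_dir = os.path.join(top, "run")
    os.mkdir(run_dir)
    flag = os.path.join(top, "cleanvar.done")
    pidfile = os.path.join(run_dir, "sshd.pid")

    first = cleanvar_once(run_dir, flag, "boot-A")   # first rc pass: cleans
    open(pidfile, "w").close()                        # sshd writes its pid file
    second = cleanvar_once(run_dir, flag, "boot-A")  # rc re-run: skipped
    survived = os.path.exists(pidfile)
    third = cleanvar_once(run_dir, flag, "boot-B")   # next boot: cleans again
    gone = not os.path.exists(pidfile)
```

With the pid file preserved across the re-run, an "already running?"
check would still see sshd; the rest of rc's in-memory state is lost
regardless.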

Best regards,
Conrad


