Date:      Wed, 02 Feb 2022 03:46:21 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 261671] rc script fails to start gssd on 12.3
Message-ID:  <bug-261671-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261671

            Bug ID: 261671
           Summary: rc script fails to start gssd on 12.3
           Product: Base System
           Version: 12.3-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: conf
          Assignee: bugs@FreeBSD.org
          Reporter: bugs.freebsd@scourger.nl

Created attachment 231515
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=231515&action=edit
Patch with a workaround.

On FreeBSD 12.3, gssd fails to start on boot.

## Environment

I installed a clean FreeBSD 12.3 system with minimal configuration changes.
It mounts a few NFSv4 filesystems using Kerberos for authentication. Users and
groups are stored in LDAP. A very minimal set of packages is installed to
provide the functionality (see attached pkg.txt).
NFS mounts are specified in /etc/fstab with (among others) the "late" flag set.
Contents of /etc/rc.conf are included as an attachment.
The system uses boot environments with subordinate filesystems as shown below
(currently only one BE):

# zfs list -r -o name,mountpoint,canmount,mounted fenrir/ROOT
NAME                           MOUNTPOINT  CANMOUNT  MOUNTED
fenrir/ROOT                    none              on       no
fenrir/ROOT/default            none          noauto      yes
fenrir/ROOT/default/usr        /usr          noauto      yes
fenrir/ROOT/default/usr/local  /usr/local    noauto      yes
fenrir/ROOT/default/var        /var          noauto      yes

After configuration of the system, I tested my setup by starting the daemons
and invoking "mount -a -l", and the NFS filesystems got mounted successfully.
Then came the moment of the first reboot, where I was confronted with an
interrupted boot process at the "mountlate" stage (asking to go into single
user mode or proceed to multi-user).

I have used virtually the same setup on earlier hosts without problems since
the 10.X era (including the FreeBSD 12.2 system I'm writing this on). For good
measure, I also tried to upgrade an existing 12.2 install to 12.3 in a boot
environment without subordinate datasets. This resulted in the same error
condition.

## Problem description

During boot, gssd(8) fails to start properly on FreeBSD 12.3. Any "late" NFSv4
filesystems in /etc/fstab fail to mount during boot.

The console shows an error message when it tries to start gssd, as shown in the
following snippet:
  Starting file system checks:
  Mounting local filesystems:.
  /etc/rc: WARNING: run_rc_command: cannot run /usr/sbin/gssd
  ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg
  32-bit compatibility ldconfig path: /usr/lib32

The same configuration works fine on FreeBSD 12.2. It appears that the culprit
is a change in the ordering of rc files.
On FreeBSD 12.3, the 'gssd' script gets wedged between 'zfsbe' and 'zfs' (see
the attached rcorder-12.3.orig).
On 12.2, gssd is started much later in the boot process (well after NETWORKING;
see attached rcorder-12.2.orig).
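
One way to check the ordering claim above on a given system is to pipe the
output of rcorder(8) through a small helper. The following is a minimal sketch
and not part of the attached files; the function name comes_before is
hypothetical. It assumes one rc script path per line and succeeds only if the
first named script is listed before the second.

```shell
#!/bin/sh
# comes_before FIRST SECOND
# Reads an rcorder-style listing (one path per line) on stdin and exits 0
# only if both names occur and FIRST is listed before SECOND.
comes_before() {
    awk -v first="$1" -v second="$2" '
        { name = $0; sub(/.*\//, "", name) }   # strip the directory part
        name == first  && !pf { pf = NR }      # remember first occurrence
        name == second && !ps { ps = NR }
        END { exit !(pf && ps && pf < ps) }
    '
}

# Example (on a FreeBSD host):
#   rcorder /etc/rc.d/* 2>/dev/null | comes_before NETWORKING gssd \
#       && echo "gssd is ordered after NETWORKING" \
#       || echo "gssd is NOT ordered after NETWORKING"
```

The same helper can be run against a saved listing, for example
"comes_before zfsbe gssd < rcorder-12.3.orig".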

As a test, I made a minor change to the gssd script to see if the rc ordering
was indeed the problem. Adding NETWORKING to the REQUIRE line seems to be
sufficient to fix the booting problem. I also added "BEFORE:  mountcritremote"
to make sure gssd doesn't start too late on diskless clients (though I haven't
tested diskless). See the attached gssd.patch for the exact changes that I
made. The patch changes the startup order to the one listed in
rcorder-12.3.fixed.
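
The kind of change described above can be sketched as a filter over the
rcorder header comments of an rc script. This is not the attached gssd.patch:
the function name add_rc_dependency is hypothetical, and the sketch makes no
assumption about what the existing REQUIRE line in /etc/rc.d/gssd contains.

```shell
#!/bin/sh
# add_rc_dependency REQ BEFORE
# Filter: appends REQ to the first "# REQUIRE:" line of an rc script read from
# stdin and emits a "# BEFORE:" line naming BEFORE directly after it.
# All other lines pass through unchanged.
add_rc_dependency() {
    awk -v req="$1" -v before="$2" '
        /^# REQUIRE:/ && !patched {
            print $0 " " req
            print "# BEFORE:  " before
            patched = 1
            next
        }
        { print }
    '
}

# Example: write a modified copy without touching the original script.
#   add_rc_dependency NETWORKING mountcritremote < /etc/rc.d/gssd > /tmp/gssd.new
```

Running "rcorder /etc/rc.d/*" with the modified copy in place of the original
would then show whether gssd moves to the expected position.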

To test the hypothesis that rc ordering is indeed the issue, I tried four
test cases:
Case 1: default /etc/rc.d/gssd, no NFS filesystems in /etc/fstab
  The system boots without obvious issues, but gssd is not running.
  Trying to mount an NFSv4 filesystem immediately returns "Permission denied".
  If you start gssd manually, mounting NFSv4 works.

Case 2: default /etc/rc.d/gssd, NFS filesystems in /etc/fstab
  gssd doesn't start during boot, as in case 1.
  The boot process is interrupted during the "mountlate" stage, when it tries
  to mount the NFS filesystems.
  If you choose to proceed into multi-user mode, you'll have to manually cancel
  further mount attempts during boot.
  Once in multi-user mode, depending on how quickly/often Ctrl-C was pressed to
  abort "mountlate", 0 or more instances of gssd are running (I've observed 1
  and 2).
  Even if only 1 instance of gssd is running, it is not possible to mount NFSv4
  filesystems. A manual mount hangs in the "[rpccon]" state before timing out
  with a "Permission denied" error:
    root@fenrir:~ # mount /net/cerberus/incoming/
    load: 0.01  cmd: mount_nfs 48471 [rpccon] 0.86r 0.00u 0.00s 0% 8080k
    load: 0.01  cmd: mount_nfs 48471 [rpccon] 1.88r 0.00u 0.00s 0% 8080k
    load: 0.01  cmd: mount_nfs 48471 [rpccon] 2.99r 0.00u 0.00s 0% 8080k
    mount_nfs: nmount: /net/cerberus/incoming: Permission denied
  After killing all gssd instances and running "service gssd restart", mounting
  the filesystems is possible.

Case 3: modified /etc/rc.d/gssd, no NFS filesystems in /etc/fstab
  The system boots without issue, gssd is running and NFSv4 filesystems can be
  mounted manually.

Case 4: modified /etc/rc.d/gssd, NFS filesystems in /etc/fstab
  The system boots as expected, gssd is running and filesystems are
  mounted automatically.

These results seem to confirm that the problem stems from an attempt to start
gssd too early.

Note that I haven't tested this with NFSv3 or non-Kerberized NFSv4, so it is
possible that those work fine.

## How to reproduce

Do a fresh installation of FreeBSD 12.3, and perform the minimal required
configuration for gssd. Running "service gssd start" should successfully launch
the daemon.
Reboot, and observe that gssd hasn't started.

## Solution

A simple fix would be to change the REQUIRE line in the gssd rc file. But that
might just be patchwork that hides the actual problem.

It is unclear to me why the rc ordering is so different between 12.2 and 12.3;
as far as I can see there haven't been any big changes to any of the files in
/etc/rc.d. However, one of the few rc scripts that changed is in fact gssd (see
review D27203). Ironically, that commit doesn't seem to cause the problem.
Using the 12.2 version of the gssd rc script on FreeBSD 12.3 still causes a
startup failure.

In any case, there are huge differences when comparing the output of "rcorder
/etc/rc.d/*" between 12.2 and 12.3, while the contents of files in /etc/rc.d
are almost exactly the same. At this point, my guess is that something has
changed in the behaviour of rcorder(8) itself. I can't say if that is intended,
or a bug.
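
To narrow down which scripts moved, two saved rcorder listings can be compared
position by position. The sketch below is only an illustration: the helper
names rc_position and compare_positions are hypothetical, and it assumes the
saved listings contain one script path per line, as rcorder(8) prints them.

```shell
#!/bin/sh
# rc_position NAME FILE
# Prints the 1-based line number of script NAME in a saved rcorder listing,
# or "absent" if NAME is not listed.
rc_position() {
    awk -v name="$1" '
        { n = $0; sub(/.*\//, "", n) }          # strip the directory part
        n == name { print NR; found = 1; exit }
        END { if (!found) print "absent" }
    ' "$2"
}

# compare_positions OLD NEW NAME...
# Prints one line "NAME old-position new-position" per NAME.
compare_positions() {
    old=$1
    new=$2
    shift 2
    for name in "$@"; do
        printf '%s %s %s\n' "$name" \
            "$(rc_position "$name" "$old")" "$(rc_position "$name" "$new")"
    done
}

# Example, using the attachment names from this report:
#   compare_positions rcorder-12.2.orig rcorder-12.3.orig \
#       zfsbe zfs gssd NETWORKING mountlate
```

A large jump in position for a script whose own file did not change would
point at rcorder(8) or at a dependency of that script rather than at the
script itself.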
