Date: Wed, 02 Feb 2022 03:46:21 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 261671] rc script fails to start gssd on 12.3
Message-ID: <bug-261671-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261671

    Bug ID:    261671
    Summary:   rc script fails to start gssd on 12.3
    Product:   Base System
    Version:   12.3-STABLE
    Hardware:  amd64
    OS:        Any
    Status:    New
    Severity:  Affects Some People
    Priority:  ---
    Component: conf
    Assignee:  bugs@FreeBSD.org
    Reporter:  bugs.freebsd@scourger.nl

Created attachment 231515
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=231515&action=edit
Patch with a workaround.

On FreeBSD 12.3, gssd fails to start on boot.

## Environment

I installed a clean FreeBSD 12.3 system with minimal configuration changes. It mounts a few NFSv4 filesystems using Kerberos for authentication. Users and groups are stored in LDAP. A very minimal set of packages is installed to provide the functionality (see attached pkg.txt).

NFS mounts are specified in /etc/fstab with (among others) the "late" flag set. The contents of /etc/rc.conf are included as an attachment.

The system uses boot environments with subordinate filesystems as shown below (currently only one BE):

    # zfs list -r -o name,mountpoint,canmount,mounted fenrir/ROOT
    NAME                           MOUNTPOINT  CANMOUNT  MOUNTED
    fenrir/ROOT                    none        on        no
    fenrir/ROOT/default            none        noauto    yes
    fenrir/ROOT/default/usr        /usr        noauto    yes
    fenrir/ROOT/default/usr/local  /usr/local  noauto    yes
    fenrir/ROOT/default/var        /var        noauto    yes

After configuring the system, I tested my setup by starting the daemons and invoking "mount -a -l", and the NFS filesystems were mounted successfully. On the first reboot, however, the boot process was interrupted at the "mountlate" stage (asking whether to go into single-user mode or proceed to multi-user).

I have used virtually the same setup on earlier hosts without problems since the 10.X era (including the FreeBSD 12.2 system I'm writing this on). For good measure, I also tried upgrading an existing 12.2 install to 12.3 in a boot environment without subordinate datasets. This resulted in the same error condition.
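For orientation, a Kerberized NFSv4 client of the kind described above typically enables at least the following knobs in /etc/rc.conf. This is only an illustrative sketch and an assumption on my part; it is not the content of the attached rc.conf, which may set more or different variables.

```shell
# Illustrative /etc/rc.conf fragment (assumption, NOT the attached file).
# rc.conf is read as sh syntax, so these are plain variable assignments.
nfs_client_enable="YES"   # NFS client support
nfsuserd_enable="YES"     # NFSv4 user/group name mapping
gssd_enable="YES"         # gssd(8), needed for Kerberos (sec=krb5*) mounts
```

With gssd_enable set, /etc/rc.d/gssd is expected to start the daemon during boot, which is the step that fails in this report.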
## Problem description

During boot, gssd(8) fails to start properly on FreeBSD 12.3. Any "late" NFSv4 filesystem in /etc/fstab fails to mount during boot.

The console shows an error message when it tries to start gssd, as shown in the following snippet:

    Starting file system checks:
    Mounting local filesystems:.
    /etc/rc: WARNING: run_rc_command: cannot run /usr/sbin/gssd
    ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg
    32-bit compatibility ldconfig path: /usr/lib32

The same configuration works fine on FreeBSD 12.2. It appears that the culprit is a change in the ordering of rc files.

On FreeBSD 12.3, the 'gssd' script gets wedged between 'zfsbe' and 'zfs' (see the attached rcorder-12.3.orig). On 12.2, gssd is started much later in the boot process (well after NETWORKING; see the attached rcorder-12.2.orig).

As a test, I made a minor change to the gssd script to see whether the rc ordering was indeed the problem. Adding NETWORKING to the REQUIRE line seems to be sufficient to fix the boot problem. I also added "BEFORE: mountcritremote" to make sure gssd doesn't start too late on diskless clients (though I haven't tested diskless). See the attached gssd.patch for the exact changes that I made. The patch changes the startup order to the one listed in rcorder-12.3.fixed.

To test the hypothesis that rc ordering is indeed the issue, I ran four test cases:

Case 1: default /etc/rc.d/gssd, no NFS filesystems in /etc/fstab
  The system boots without obvious issues, but gssd is not running. Trying to mount an NFSv4 filesystem immediately returns "Permission denied". If you start gssd manually, mounting NFSv4 works.

Case 2: default /etc/rc.d/gssd, NFS filesystems in /etc/fstab
  gssd doesn't start during boot, as in case 1. The boot process is interrupted during the "mountlate" stage, when it tries to mount the NFS filesystems.
  If you choose to proceed into multi-user mode, you'll have to manually cancel further mount attempts during boot. Once in multi-user mode, depending on how quickly and how often Ctrl-C was pressed to abort "mountlate", zero or more instances of gssd are running (I've observed 1 and 2). Even if only one instance of gssd is running, it is not possible to mount NFSv4 filesystems. A manual mount hangs in the "[rpccon]" state before timing out with a "Permission denied" error:

    root@fenrir:~ # mount /net/cerberus/incoming/
    load: 0.01 cmd: mount_nfs 48471 [rpccon] 0.86r 0.00u 0.00s 0% 8080k
    load: 0.01 cmd: mount_nfs 48471 [rpccon] 1.88r 0.00u 0.00s 0% 8080k
    load: 0.01 cmd: mount_nfs 48471 [rpccon] 2.99r 0.00u 0.00s 0% 8080k
    mount_nfs: nmount: /net/cerberus/incoming: Permission denied

  After killing all gssd instances and running "service gssd restart", mounting the filesystems is possible.

Case 3: modified /etc/rc.d/gssd, no NFS filesystems in /etc/fstab
  The system boots without issue, gssd is running, and NFSv4 filesystems can be mounted manually.

Case 4: modified /etc/rc.d/gssd, NFS filesystems in /etc/fstab
  The system boots as expected, gssd is running, and the filesystems are mounted automatically.

These results seem to confirm that the problem stems from an attempt to start gssd too early. Note that I haven't tested this with NFSv3 or non-Kerberized NFSv4, so it is possible that those work fine.

## How to reproduce

Do a fresh installation of FreeBSD 12.3, and perform the minimal required configuration for gssd. Running "service gssd start" should successfully launch the daemon. Reboot, and observe that gssd hasn't started.

## Solution

A simple fix would be to change the REQUIRE line in the gssd rc file. But that might just be patchwork that hides the actual problem.

It is unclear to me why the rc ordering is so different between 12.2 and 12.3; as far as I can see there haven't been any big changes to any of the files in /etc/rc.d.
However, one of the few rc scripts that changed is in fact gssd (see review D27203). Ironically, that commit doesn't seem to cause the problem. Using the 12.2 version of the gssd rc script on FreeBSD 12.3 still causes a startup failure.

In any case, there are huge differences when comparing the output of "rcorder /etc/rc.d/*" between 12.2 and 12.3, while the contents of the files in /etc/rc.d are almost exactly the same. At this point, my guess is that something has changed in the behaviour of rcorder(8) itself. I can't say whether that is intended or a bug.
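The ordering comparison described in this report can be checked mechanically. The following is a minimal sketch, assuming the output of "rcorder /etc/rc.d/*" has been saved to a file with one script path per line (as in the attached rcorder-*.orig files). The function name starts_after is hypothetical and not part of the base system.

```shell
#!/bin/sh
# starts_after FILE LATER EARLIER   (hypothetical helper)
# Succeeds (exit 0) if the rc script named LATER is listed after the one
# named EARLIER in FILE, where FILE holds saved "rcorder /etc/rc.d/*"
# output, one path per line. Fails if either name is missing.
starts_after() {
    awk -v later="$2" -v earlier="$3" '
        {
            # Reduce each path to its last component, e.g. /etc/rc.d/zfs -> zfs,
            # so that "zfs" does not accidentally match "zfsbe".
            n = split($0, parts, "/")
            name = parts[n]
        }
        name == earlier { e = NR }
        name == later   { l = NR }
        END {
            ok = (e && l && l > e)
            exit ok ? 0 : 1
        }
    ' "$1"
}

# Example use on a FreeBSD host (not run here):
#   rcorder /etc/rc.d/* > /tmp/rcorder.txt
#   starts_after /tmp/rcorder.txt gssd NETWORKING || echo "gssd is ordered too early"
```

On an affected 12.3 system, the check for gssd against NETWORKING would be expected to fail with the stock script and to succeed once NETWORKING is added to the REQUIRE line.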