Date: Wed, 27 Apr 2016 20:51:51 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 209112] /usr/sbin/jail jails fail to launch with possible race when jails mount common dir with nullfs
Message-ID: <bug-209112-8@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209112

Bug ID: 209112
Summary: /usr/sbin/jail jails fail to launch with possible race when jails mount common dir with nullfs
Product: Base System
Version: 10.3-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: bin
Assignee: freebsd-bugs@FreeBSD.org
Reporter: agifford@infowest.com

On a host with multiple jails (configured using /etc/jail.conf) that each mount a common directory read-only inside the jail, some jails randomly and silently fail to launch because the nullfs mount fails (also silently). This also occurs on FreeBSD 10.2 and possibly earlier.

DETAILS:

I've got a FreeBSD 10.3 host with three jails that use nullfs to mount a common read-only base system. On reboot, only one or two of the three start, and I cannot predict which ones. The first jail listed in /etc/jail.conf usually launches just fine, but subsequent ones fail (and I cannot predict which ones will succeed or fail). There are NO logs indicating the reason for failure, either on the main system or in the jails' individual console log files.
To track down the problem, I added some debugging logging to the /etc/rc.d/jail script, and some exec.prestart/exec.poststart lines to my jail.conf configuration:

/etc/jail.conf:

    jail1 {
        host.hostname = "jail1.example.org";
        path = "/usr/local/jail/jail1";
        ip4.addr = 127.0.0.11;
        mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail1/basejail nullfs ro 0 0";
        exec.consolelog = "/var/log/jail_${host.hostname}.log";
        exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
        exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }
    jail2 {
        host.hostname = "jail2.example.org";
        path = "/usr/local/jail/jail2";
        ip4.addr = 127.0.0.12;
        mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail nullfs ro 0 0";
        exec.consolelog = "/var/log/jail_${host.hostname}.log";
        exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
        exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }
    jail3 {
        host.hostname = "jail3.example.org";
        path = "/usr/local/jail/jail3";
        ip4.addr = 127.0.0.11;
        mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail nullfs ro 0 0";
        exec.consolelog = "/var/log/jail_${host.hostname}.log";
        exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
        exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }

In the /etc/rc.d/jail script, in the jail_start() function, in the _ALL case statement subsection, I added the following to capture the output stored in the $_tmp file on error/failure:

    echo "DEBUG: Contents of '$_tmp' are:" >> /tmp/DEBUG
    cat $_tmp >> /tmp/DEBUG
    echo "DEBUG: END OF '$_tmp' CONTENTS" >> /tmp/DEBUG

I rebooted the FreeBSD 10.3 system. Only a SINGLE jail started, the first one.
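As an aside on the debugging step above: one way to check whether the failures depend on jails being created together is to create each jail with its own jail(8) invocation, in order, and send any error output to syslog instead of discarding it. The shell sketch that follows assumes the jails are defined in the default /etc/jail.conf; the function names start_jails_in_order, run_jail and log_failure are hypothetical and are not part of /etc/rc.d/jail. Whether this avoids the nullfs errors depends on the suspected race, which this report has not confirmed.

```shell
# Hedged sketch, not part of /etc/rc.d/jail: start_jails_in_order,
# run_jail and log_failure are hypothetical helper names.

# Create one jail by name; jail(8) reads /etc/jail.conf unless -f is given.
run_jail() {
	/usr/sbin/jail -c "$1"
}

# Send a failure message to syslog via logger(1) so it is not lost.
log_failure() {
	logger -t jail_start "$1"
}

# Create each named jail with its own jail(8) invocation, in the order given.
# A failure is logged and the loop continues with the next jail.
# Returns 0 if every jail started, 1 otherwise.
start_jails_in_order() {
	_status=0
	for _name in "$@"; do
		# Capture stdout and stderr (e.g. mount_nullfs errors) for logging.
		if _out=$(run_jail "$_name" 2>&1); then
			echo "started: $_name"
		else
			log_failure "$_name: $_out"
			_status=1
		fi
	done
	return "$_status"
}
```

Usage, as root on the host: `start_jails_in_order jail1 jail2 jail3`. A failure is logged and the loop moves on, so one broken jail does not prevent the remaining jails from being attempted.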
The output of /tmp/DEBUG showed me:

    PRESTART_jail1
    POSTSTART_jail1
    DEBUG: Contents of '/tmp/jail.hyLntGie' are:
    mount_nullfs: /usr/local/jail/jail2/basejail: Operation not supported by device
    mount_nullfs: /usr/local/jail/jail3/basejail: Operation not supported by device
    jail: jail2: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail: failed
    jail: jail3: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail: failed
    jail1: created
    DEBUG: jail_start(): END OF '/tmp/jail.hyLntGie' CONTENTS

Ah ha! The nullfs mounting failed!

BUG #1: Apparently the /usr/sbin/jail command must be launching jails in parallel, and the parallel mounting of the common directory may be contending for some file system resource. And unfortunately the failure was SILENT! No logs!

BUG #2: The /etc/rc.d/jail script is NOT LOGGING the failure information!

WORKAROUND FOR THE INTERIM: I can force /usr/sbin/jail to launch my jails sequentially by adding a "depend =" line to each jail's section in /etc/jail.conf, like:

    jail2 {
        ...
        depend = jail1;
        ...
    }
    jail3 {
        ...
        depend = jail2;
        ...
    }

This strikes me as a very brittle workaround: if one jail fails to launch for some other reason, all subsequent jails would fail too.

The best solution would be to eliminate whatever resource contention is going on here.

Google searches revealed a jail_parallel_start=NO rc.conf variable, but it appeared to relate to the /etc/rc.d/jail script doing things in parallel. In this bug, it is the /usr/sbin/jail command, executing as a single process, that is likely doing things in parallel (or perhaps sequentially, but quickly enough that there is still some resource contention in the nullfs mounting) unless the depend= settings are included.

Thanks for your help!

Aaron out.

-- 
You are receiving this mail because:
You are the assignee for the bug.