Date: Wed, 27 Apr 2016 20:51:51 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 209112] /usr/sbin/jail jails fail to launch with possible race when jails mount common dir with nullfs
Message-ID: <bug-209112-8@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209112

            Bug ID: 209112
           Summary: /usr/sbin/jail jails fail to launch with possible race
                    when jails mount common dir with nullfs
           Product: Base System
           Version: 10.3-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: agifford@infowest.com

On a host with multiple jails (configured using /etc/jail.conf) that each mount a common directory read-only with nullfs, some jails will randomly fail to launch because the nullfs mount fails, and nothing is logged about the failure. This also occurs on FreeBSD 10.2 and possibly earlier.

DETAILS:

I've got a FreeBSD 10.3 host with three jails that use nullfs to mount a common read-only base system. On reboot, only one or two of the three will start. The first jail listed in /etc/jail.conf will usually launch just fine, but subsequent ones fail, and I cannot predict which ones will succeed or fail.

There are NO logs indicating the reason for failure on the main system, nor in the jails' individual console log files.

To track down the problem, I added debug logging to the /etc/rc.d/jail script, and some exec.prestart/exec.poststart lines to my jail.conf configuration:

/etc/jail.conf:

    jail1 {
        host.hostname = "jail1.example.org";
        path = "/usr/local/jail/jail1";
        ip4.addr = 127.0.0.11;
        mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail1/basejail nullfs ro 0 0";
        exec.consolelog = "/var/log/jail_${host.hostname}.log";
        exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
        exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }
    jail2 {
        host.hostname = "jail2.example.org";
        path = "/usr/local/jail/jail2";
        ip4.addr = 127.0.0.12;
        mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail nullfs ro 0 0";
        exec.consolelog = "/var/log/jail_${host.hostname}.log";
        exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
        exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }
    jail3 {
        host.hostname = "jail3.example.org";
        path = "/usr/local/jail/jail3";
        ip4.addr = 127.0.0.11;
        mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail nullfs ro 0 0";
        exec.consolelog = "/var/log/jail_${host.hostname}.log";
        exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
        exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }

In the /etc/rc.d/jail script, in the _ALL case of the jail_start() function, I added the following to capture the output stored in the $_tmp file on error/failure:

    echo "DEBUG: Contents of '$_tmp' are:" >> /tmp/DEBUG
    cat $_tmp >> /tmp/DEBUG
    echo "DEBUG: END OF '$_tmp' CONTENTS" >> /tmp/DEBUG

I rebooted the FreeBSD 10.3 system. Only a SINGLE jail started, the first one.

The output of /tmp/DEBUG showed me:

    PRESTART_jail1
    POSTSTART_jail1
    DEBUG: Contents of '/tmp/jail.hyLntGie' are:
    mount_nullfs: /usr/local/jail/jail2/basejail: Operation not supported by device
    mount_nullfs: /usr/local/jail/jail3/basejail: Operation not supported by device
    jail: jail2: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail: failed
    jail: jail3: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail: failed
    jail1: created
    DEBUG: jail_start(): END OF '/tmp/jail.hyLntGie' CONTENTS

Ah ha! The nullfs mounting failed!

BUG #1: Apparently the /usr/sbin/jail command attempts to launch jails in parallel, and the parallel mounts of the common directory may be contending for some file system resource. And unfortunately the failure was SILENT! No logs!

BUG #2: The /etc/rc.d/jail script is NOT LOGGING the failure information!

WORKAROUND FOR THE INTERIM:

I can force /usr/sbin/jail to launch my jails sequentially by adding a "depend =" line to each jail's section in /etc/jail.conf, like:

    jail2 {
        ...
        depend = jail1;
        ...
    }
    jail3 {
        ...
        depend = jail2;
        ...
    }

This strikes me as a very brittle workaround: if one jail fails to launch for some other reason, all subsequent jails in the chain will fail too. The best solution would be to eliminate whatever resource contention is going on here.

Google searches turned up a jail_parallel_start=NO rc.conf variable, but that appears to control whether the /etc/rc.d/jail script itself does things in parallel. In this bug, it is the /usr/sbin/jail command, executing as a single process, that is likely doing things in parallel (or perhaps sequentially, but quickly enough that the nullfs mounts still contend for some resource) unless the depend= settings are included.

Thanks for your help!

Aaron out.
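
A possible alternative to the depend= chain, sketched here under the assumption (suspected above but not confirmed) that the failure only occurs when a single jail(8) process sets up several jails at once: start each jail with its own "jail -c <name>" invocation, one after another, and pass any error output to logger(1) so it is not lost. The function name start_jails_sequentially and the JAIL_CMD and LOG_CMD variables are hypothetical names made up for this sketch; they are not rc.conf knobs or part of /etc/rc.d/jail.

```shell
#!/bin/sh
# Hypothetical helper, not part of the base system.
# JAIL_CMD and LOG_CMD are illustrative overrides so the commands can be
# swapped out (for example while experimenting on a non-production host).
: "${JAIL_CMD:=/usr/sbin/jail}"
: "${LOG_CMD:=logger -t jail_seq -p daemon.err}"

# Start each named jail in its own jail(8) invocation, in the order given.
# A failure of one jail is reported and does not stop the remaining jails.
start_jails_sequentially() {
    failed=0
    for name in "$@"; do
        # Capture stdout and stderr so mount errors are preserved.
        if out=$($JAIL_CMD -c "$name" 2>&1); then
            echo "started: $name"
        else
            failed=$((failed + 1))
            echo "FAILED: $name"
            # Forward every captured line to the log command.
            printf '%s\n' "$out" | while IFS= read -r line; do
                $LOG_CMD "$name: $line"
            done
        fi
    done
    # Exit status is 0 only if every jail started.
    [ "$failed" -eq 0 ]
}
```

It would be called as "start_jails_sequentially jail1 jail2 jail3". Unlike the depend= chain, a jail that fails for an unrelated reason does not prevent the later ones from starting. Whether separate invocations actually avoid the nullfs error has not been verified here.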
