From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 209112] /usr/sbin/jail jails fail to launch with possible race when jails mount common dir with nullfs
Date: Wed, 27 Apr 2016 20:51:51 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209112

            Bug ID: 209112
           Summary: /usr/sbin/jail jails fail to launch with possible race
                    when jails mount common dir with nullfs
           Product: Base System
           Version: 10.3-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: agifford@infowest.com

On a host with multiple jails (configured using /etc/jail.conf) that mount a
common directory read-only inside the jail, some jails will randomly and
silently fail to launch because the nullfs mount fails. This also occurs on
FreeBSD 10.2 and possibly earlier.

DETAILS:

I've got a FreeBSD 10.3 host with three jails that use nullfs to mount a
common read-only base system. On reboot, only one or two of the three will
start, and I cannot predict which ones. The first jail listed in
/etc/jail.conf will usually launch just fine, but subsequent ones fail, and
I cannot predict which of them will succeed. There are NO logs indicating
the reason for the failure on the main system, nor in the jails' individual
console log files.

To track down the problem, I added some debug logging to the
/etc/rc.d/jail script, and some exec.prestart/exec.poststart lines to my
jail.conf configuration:

/etc/jail.conf:

    jail1 {
      host.hostname = "jail1.example.org";
      path = "/usr/local/jail/jail1";
      ip4.addr = 127.0.0.11;
      mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail1/basejail nullfs ro 0 0";
      exec.consolelog = "/var/log/jail_${host.hostname}.log";
      exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
      exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }

    jail2 {
      host.hostname = "jail2.example.org";
      path = "/usr/local/jail/jail2";
      ip4.addr = 127.0.0.12;
      mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail nullfs ro 0 0";
      exec.consolelog = "/var/log/jail_${host.hostname}.log";
      exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
      exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }

    jail3 {
      host.hostname = "jail3.example.org";
      path = "/usr/local/jail/jail3";
      ip4.addr = 127.0.0.11;
      mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail nullfs ro 0 0";
      exec.consolelog = "/var/log/jail_${host.hostname}.log";
      exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
      exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
    }

In the /etc/rc.d/jail script, in the jail_start() function, in the _ALL
branch of the case statement, I added the following to capture the output
stored in the $_tmp file on error/failure:

    echo "DEBUG: Contents of '$_tmp' are:" >> /tmp/DEBUG
    cat $_tmp >> /tmp/DEBUG
    echo "DEBUG: END OF '$_tmp' CONTENTS" >> /tmp/DEBUG

I rebooted the FreeBSD 10.3 system. Only a SINGLE jail started, the first
one.
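
For anyone reproducing this, the three debug lines above can be wrapped in a
small helper. This is only a sketch: the function name dump_capture and its
two arguments are hypothetical and are not part of the stock /etc/rc.d/jail
script. It does the same thing as the echo/cat lines, appending a capture
file to a debug log between two marker lines, and notes an unreadable file
instead of skipping it.

```shell
# Hypothetical helper (not in the stock /etc/rc.d/jail): append the contents
# of a capture file to a debug log, framed by begin/end marker lines.
#   $1 = capture file (for example "$_tmp")
#   $2 = debug log (defaults to /tmp/DEBUG, the file used in this report)
dump_capture() {
    _file=$1
    _log=${2:-/tmp/DEBUG}
    {
        echo "DEBUG: Contents of '$_file' are:"
        # Record a missing or unreadable capture file rather than ignoring it.
        cat "$_file" 2>/dev/null || echo "DEBUG: (could not read '$_file')"
        echo "DEBUG: END OF '$_file' CONTENTS"
    } >> "$_log"
}
```

Inside jail_start() it would be called as: dump_capture "$_tmp"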

The output of /tmp/DEBUG showed me:

    PRESTART_jail1
    POSTSTART_jail1
    DEBUG: Contents of '/tmp/jail.hyLntGie' are:
    mount_nullfs: /usr/local/jail/jail2/basejail: Operation not supported by device
    mount_nullfs: /usr/local/jail/jail3/basejail: Operation not supported by device
    jail: jail2: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail: failed
    jail: jail3: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail: failed
    jail1: created
    DEBUG: jail_start(): END OF '/tmp/jail.hyLntGie' CONTENTS

Ah ha! The nullfs mounting failed!

BUG #1: Apparently the /usr/sbin/jail command attempts to launch jails in
parallel, and the parallel mounts of the common directory may be contending
for some file system resource. Unfortunately, the failure was SILENT! No
logs!

BUG #2: The /etc/rc.d/jail script is NOT LOGGING the failure information!

WORKAROUND FOR THE INTERIM:

I can force /usr/sbin/jail to launch my jails sequentially by adding a
"depend =" line to each jail's section in /etc/jail.conf, like:

    jail2 {
      ...
      depend = jail1;
      ...
    }

    jail3 {
      ...
      depend = jail2;
      ...
    }

This strikes me as a very brittle workaround: if one jail fails to launch
for some other reason, all subsequent jails would fail too.

The best solution would be to eliminate whatever resource contention is
going on here.

Google searches revealed a jail_parallel_start=NO rc.conf variable, but that
appeared to be related to the /etc/rc.d/jail script doing things in
parallel. In this bug, it is the /usr/sbin/jail command, executing as a
single process, that is likely doing things in parallel (or perhaps
sequentially, but quickly enough that there is still some resource
contention in the nullfs mounting) unless the depend= settings are included.

Thanks for your help!

Aaron out.
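
Because the failures only appear in the captured jail(8) output, a quick way
to see which jails need attention is to filter that capture. The sketch
below assumes failure lines have the shape "jail: NAME: COMMAND: failed", as
in the /tmp/DEBUG output above; the function name failed_jails is
hypothetical, and any other message format is simply ignored.

```shell
# Hypothetical helper: read captured jail(8) output on stdin and print the
# name of each jail that has at least one command reported as failed.
# Assumes lines of the form "jail: NAME: COMMAND: failed"; lines such as
# "mount_nullfs: ..." or "jail1: created" do not match and are skipped.
failed_jails() {
    sed -n 's/^jail: \([^:]*\): .*: failed$/\1/p' | sort -u
}
```

Run as: failed_jails < /tmp/DEBUG
For the capture shown in this report it prints jail2 and jail3, one per
line.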