Date:      Wed, 27 Apr 2016 20:51:51 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 209112] /usr/sbin/jail jails fail to launch with possible race when jails mount common dir with nullfs
Message-ID:  <bug-209112-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209112

            Bug ID: 209112
           Summary: /usr/sbin/jail jails fail to launch with possible race
                    when jails mount common dir with nullfs
           Product: Base System
           Version: 10.3-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: agifford@infowest.com

On a host with multiple jails (configured using /etc/jail.conf) that each mount
a common directory read-only via nullfs, some jails will randomly and silently
fail to launch because the nullfs mount fails (also silently).  This also
occurs on FreeBSD 10.2 and possibly earlier.

DETAILS:

I've got a FreeBSD 10.3 host with three jails that use nullfs to mount a common
read-only base system. On reboot, only one or two of the three will start. The
first jail listed in /etc/jail.conf will usually launch just fine, but
subsequent ones fail, and I cannot predict which ones will succeed or fail.
There are NO logs indicating the reason for failure on the main system, nor in
the jails' individual console log files.
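Since nothing is logged, a quick check after boot helps to see which jails are missing. The following is only a sketch: report_missing_jails and the JLS_CMD override are hypothetical names, and it assumes that "jls -j name" exits non-zero when the named jail is not running.

```shell
#!/bin/sh
# Sketch only: list the jails that are expected but not running.
# report_missing_jails is a hypothetical helper. JLS_CMD defaults to jls(8)
# and can be overridden (e.g. with a stub) for testing.
: "${JLS_CMD:=/usr/sbin/jls}"

report_missing_jails() {
    _missing=""
    for _j in "$@"; do
        # Assumption: jls -j exits non-zero when no such jail is running.
        if ! $JLS_CMD -j "$_j" >/dev/null 2>&1; then
            _missing="${_missing:+$_missing }$_j"
        fi
    done
    [ -z "$_missing" ] && return 0
    echo "not running: $_missing"
    return 1
}
```

For the configuration in this report it would be called as: report_missing_jails jail1 jail2 jail3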

To track down the problem, I added some debugging logging into the
/etc/rc.d/jail script, and some exec.prestart/exec.poststart lines to my
jail.conf configuration:

/etc/jail.conf:

jail1 {
  host.hostname  = "jail1.example.org";
  path  = "/usr/local/jail/jail1";
  ip4.addr  = 127.0.0.11;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail1/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}

jail2 {
  host.hostname  = "jail2.example.org";
  path  = "/usr/local/jail/jail2";
  ip4.addr  = 127.0.0.12;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}

jail3 {
  host.hostname  = "jail3.example.org";
  path  = "/usr/local/jail/jail3";
  ip4.addr  = 127.0.0.11;
  mount  = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail nullfs ro 0 0";
  exec.consolelog = "/var/log/jail_${host.hostname}.log";
  exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
  exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}


In the /etc/rc.d/jail script, in the _ALL case of the jail_start() function, I
added the following lines to capture the output stored in the $_tmp file on
error/failure:

echo  "DEBUG: Contents of '$_tmp' are:" >> /tmp/DEBUG
cat "$_tmp" >> /tmp/DEBUG
echo  "DEBUG: END OF '$_tmp' CONTENTS" >> /tmp/DEBUG

I rebooted the FreeBSD 10.3 system.  Only a SINGLE jail started, the first one.
The output in /tmp/DEBUG showed me:

PRESTART_jail1
POSTSTART_jail1
DEBUG: Contents of '/tmp/jail.hyLntGie' are:
mount_nullfs: /usr/local/jail/jail2/basejail: Operation not supported by device
mount_nullfs: /usr/local/jail/jail3/basejail: Operation not supported by device
jail: jail2: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail: failed
jail: jail3: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail: failed
jail1: created
DEBUG: jail_start(): END OF '/tmp/jail.hyLntGie' CONTENTS

Ah ha!  The nullfs mounting failed!

BUG #1: Apparently the /usr/sbin/jail command attempts to launch jails in
parallel, and the parallel mounts of the common directory appear to contend
for some file system resource.

And unfortunately the failure was SILENT!  No logs!


BUG #2: The /etc/rc.d/jail script is NOT LOGGING the failure information!
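To illustrate the kind of logging I have in mind, here is a minimal sketch. It is not a patch to /etc/rc.d/jail: log_jail_failure and the LOG_CMD override are hypothetical names, and it assumes the captured jail(8) output is still available in a file such as $_tmp.

```shell
#!/bin/sh
# Sketch only: send the captured output of a failed jail(8) run to syslog.
# log_jail_failure is a hypothetical helper, not part of /etc/rc.d/jail.
# LOG_CMD defaults to logger(1); override it to test without syslog.
: "${LOG_CMD:=logger -p daemon.err -t jail}"

log_jail_failure() {
    _file=$1
    # Nothing to report if the capture file is missing or empty.
    [ -s "$_file" ] || return 0
    # Emit one log message per captured line, including a final line
    # that lacks a trailing newline.
    while IFS= read -r _line || [ -n "$_line" ]; do
        $LOG_CMD "$_line"
    done < "$_file"
}
```

In the failure branch of jail_start() this could be invoked as: log_jail_failure "$_tmp"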


WORKAROUND FOR THE INTERIM:

I can force /usr/sbin/jail to launch my jails sequentially by adding a
"depend =" line to each jail's section in /etc/jail.conf, like:

jail2 {
  ...
  depend = jail1;
  ...
}

jail3 {
  ...
  depend = jail2;
  ...
}


This strikes me as a very brittle work-around.  And if one jail fails to launch
for some other reason, all subsequent jails would fail.
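A less brittle alternative might be to start the jails one at a time from a small script, so that one failure does not block the rest. The following is only a sketch and is untested against this bug: start_jails_sequentially, JAIL_CMD and JAIL_CONF are hypothetical names, and it relies on "jail -f conf_file -c name" creating only the named jail.

```shell
#!/bin/sh
# Sketch only: start jails one at a time and keep going after a failure.
# start_jails_sequentially is a hypothetical helper. JAIL_CMD defaults to
# jail(8) and can be overridden (e.g. with a stub) for testing.
: "${JAIL_CMD:=/usr/sbin/jail}"
: "${JAIL_CONF:=/etc/jail.conf}"

start_jails_sequentially() {
    _failed=0
    for _j in "$@"; do
        # Create only the named jail from the configuration file.
        if ! $JAIL_CMD -f "$JAIL_CONF" -c "$_j"; then
            echo "jail $_j failed to start" >&2
            _failed=$((_failed + 1))
        fi
    done
    # Return success only if every jail started.
    [ "$_failed" -eq 0 ]
}
```

Unlike the chained depend lines, a failure of jail2 here is reported but jail3 is still attempted. Whether sequential starts alone avoid the nullfs failure is an assumption.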

The best solution would be to eliminate whatever resource contention is going
on here.

Google searches revealed a jail_parallel_start=NO rc.conf variable, but that
appears to control whether the /etc/rc.d/jail script does things in parallel.
In this bug, it is the /usr/sbin/jail command, executing as a single process,
that is likely doing things in parallel (or perhaps sequentially, but quickly
enough that there is still some resource contention in the nullfs mounting)
unless the depend= settings are included.

Thanks for your help!

Aaron out.

-- 
You are receiving this mail because:
You are the assignee for the bug.


