Date: Sat, 8 Nov 2008 17:19:55 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Scott Burns <scott@bqinternet.com>
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS panic in zone_dataset_visible
Message-ID: <20081108161955.GA2340@garage.freebsd.pl>
In-Reply-To: <48D7D212.7090908@bqinternet.com>
References: <48D4E974.2020008@bqinternet.com> <48D7D212.7090908@bqinternet.com>
On Mon, Sep 22, 2008 at 01:12:50PM -0400, Scott Burns wrote:
> Scott Burns wrote:
> >Hello,
> >
> >I am running several servers using Pawel's July 27 ZFS patchset, applied
> >against 8-current source from the same day. I have seen a similar panic
> >on two different servers:
> ...
> >Stopped at _mtx_lock_flags+0x15: lock cmpxchgq %rsi,0x18(%rdi)
> >db> bt
> >Tracing pid 95276 tid 100432 td 0xffffff010b3cc000
> >_mtx_lock_flags() at _mtx_lock_flags+0x15
> >zone_dataset_visible() at zone_dataset_visible+0x94
> >zfs_mount() at zfs_mount+0x3e5
> ...
>
> With a bit of testing, I found that this panic is easily reproducible.
> Simply try to list the contents of a snapshot from within a jail, as
> long as the snapshot isn't already mounted, and the system panics. If I
> mount the snapshot from outside of the jail first, and then list it
> inside the jail, it does not panic.
>
> I spent a bit of time debugging this weekend. Trying to list an
> unmounted snapshot triggers a zfs_mount() for the snapshot, which calls
> zone_dataset_visible() to determine if the snapshot should be visible in
> the current zone. When it is run outside of a jail, it returns true
> early on because INGLOBALZONE(curproc) is true, otherwise it takes
> another code path.
>
> The panic is happening after that check, at mtx_lock(&pr->cr_mtx),
> because (pr = curthread->td_ucred->cr_prison) is NULL. Interestingly,
> it's not NULL if zone_dataset_visible() is triggered by a "zfs list"
> command, but it is NULL if zone_dataset_visible() is called from
> zfs_mount().
>
> As a temporary workaround, I modified my copy of
> cddl/compat/opensolaris/kern/opensolaris_zone.c to have
> zone_dataset_visible() return true if it is being called for a snapshot.
> I modified it as below:
>
> -if (INGLOBALZONE(curproc))
> +if (INGLOBALZONE(curproc) || strchr(dataset, '@'))
>
> This is obviously not ideal, since it allows the manipulation of the
> snapshot from another jail if the caller knows that it exists. Since I
> am the only one with root access to any of the jails, I am not concerned
> with that. "zfs list" continues to behave normally.
>
> I will continue looking at this, but since my main goal of working
> around the panic has been taken care of, I am not sure how long my
> attention span will last. If the cause of
> curthread->td_ucred->cr_prison being NULL under these conditions is
> obvious to anyone, please let me know.

Thanks for the report. The problem is that we have an ugly hack to allow
regular users to mount snapshots automatically. It works by changing
td_ucred to kcred for the VFS_MOUNT() call. This makes p_ucred point at
the jailed cred and td_ucred point at the unjailed cred in
zone_dataset_visible(), which is confusing of course. The old
INGLOBALZONE(curproc) check looks at p_ucred and sees a jail, so the code
goes on to take cr_prison from td_ucred, which is NULL for kcred.

I fixed it by reimplementing the INGLOBALZONE() macro to take a thread,
not a process.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
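For readers following the thread, the credential mismatch described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel code: the struct layouts are reduced to the fields named in the messages, the two macros INGLOBALZONE_PROC() and INGLOBALZONE_THREAD() and the function names are hypothetical, and the point where the kernel would lock the prison mutex is replaced by a return value so the example does not crash.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed, not the real layouts). */
struct prison { int pr_id; };
struct ucred  { struct prison *cr_prison; };  /* NULL models an unjailed cred such as kcred */
struct proc   { struct ucred *p_ucred; };
struct thread { struct ucred *td_ucred; struct proc *td_proc; };

static int jailed(const struct ucred *cr) { return cr->cr_prison != NULL; }

/* Hypothetical names: the check keyed on the process vs. on the thread. */
#define INGLOBALZONE_PROC(p)    (!jailed((p)->p_ucred))
#define INGLOBALZONE_THREAD(td) (!jailed((td)->td_ucred))

enum { WOULD_PANIC = -1, NOT_VISIBLE = 0, VISIBLE = 1 };

/* Model of the visibility check; use_thread_check selects which macro guards it. */
static int check_visible(const struct thread *td, int use_thread_check)
{
    assert(td != NULL && td->td_proc != NULL);
    int global = use_thread_check ? INGLOBALZONE_THREAD(td)
                                  : INGLOBALZONE_PROC(td->td_proc);
    if (global)
        return VISIBLE;

    const struct prison *pr = td->td_ucred->cr_prison;
    if (pr == NULL)
        return WOULD_PANIC;   /* the kernel would lock a mutex inside *pr here */
    return NOT_VISIBLE;       /* the real code would search the jail's datasets */
}

/* The automount case: a jailed process whose thread temporarily runs with kcred. */
int automount_case(int use_thread_check)
{
    struct prison jail = { 1 };
    struct ucred kcred = { NULL };
    struct ucred jailcred = { &jail };
    struct proc p = { &jailcred };
    struct thread td = { &kcred, &p };   /* td_ucred swapped to kcred for the mount */
    return check_visible(&td, use_thread_check);
}
```

In this model automount_case(0) reaches the NULL prison pointer, which corresponds to the reported panic, while automount_case(1) returns early because the guard and the later cr_prison lookup both consult td_ucred.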