Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 25 Jan 2014 16:54:59 +0100
From:      =?utf-8?Q?Hans_Petter_Selasky?= <hans.petter.selasky@bitfrost.no>
To:        =?utf-8?Q?freebsd-fs=40freebsd=2Eorg?= <freebsd-fs@freebsd.org>
Subject:   FW: lock order reversals w/ backtrace
Message-ID:  <zarafa.52e3de53.0da5.7253d5c71778a401@mail.lockless.no>

next in thread | raw e-mail | index | archive | help
FYI - can anyone comment=3F=0D=0A=0D=0A--HPS=0D=0A=20=0D=0A=20=0D=0A-----=
Original message-----=0D=0A> From:Thomas Hoffmann <trh411@gmail.com <mail=
to:trh411@gmail.com> >=0D=0A> Sent: Saturday 25th January 2014 15:55=0D=0A=
> To: freebsd-current <freebsd-current@freebsd.org <mailto:freebsd-curren=
t@freebsd.org> >=0D=0A> Cc: Hans Petter Selasky <hans.petter.selasky@bitf=
rost.no <mailto:hans.petter.selasky@bitfrost.no> >=0D=0A> Subject: Re: lo=
ck order reversals w/ backtrace=0D=0A>=20=0D=0A> On Fri, Jan 24, 2014 at =
11:47 PM, Thomas Hoffmann <trh411@gmail.com <mailto:trh411@gmail.com> > w=
rote:=0D=0A>=20=0D=0A> > On Fri, Jan 24, 2014 at 9:52 AM, Thomas Hoffmann=
 <trh411@gmail.com <mailto:trh411@gmail.com> > wrote:=0D=0A> >=0D=0A> >> =
On Thu, Jan 23, 2014 at 11:49 AM, Hans Petter Selasky <=0D=0A> >> hans.pe=
tter.selasky@bitfrost.no <mailto:hans.petter.selasky@bitfrost.no> > wrote=
:=0D=0A> >>=0D=0A> >>>  Hi,=0D=0A> >>>=0D=0A> >>> Can you see if you can =
snap some keywords of the backtraces, like usb_xxx usbdx_xxx cam scsi or =
something like that.=0D=0A> >>>=0D=0A> >>> Else I believe there are some =
sysctl options to prevent the final reboot somehow so that you can write =
down the messages.=0D=0A> >>>=0D=0A> >>>=0D=0A> >>> --HPS=0D=0A> >>>=0D=0A=
> >>>=0D=0A> >>> -----Original message-----=0D=0A> >>> > From:Thomas Hoff=
mann <trh411@gmail.com <mailto:trh411@gmail.com> >=0D=0A> >>> > Sent: Thu=
rsday 23rd January 2014 17:15=0D=0A> >>>=0D=0A> >>> > To: freebsd-current=
 <freebsd-current@freebsd.org <mailto:freebsd-current@freebsd.org> >=0D=0A=
> >>> > Subject: lock order reversals w/ backtrace=0D=0A> >>> >=0D=0A> >>=
> > A few days ago I started running 11.0-CURRENT at r260971 for the firs=
t time.=0D=0A> >>>=0D=0A> >>>=0D=0A> >>> >=0D=0A> >>> > The last couple o=
f times I shutdown my system I noticed 2 or 3 short "lock=0D=0A> >>> > or=
der reversal" messages with accompanying backtraces scroll by. Do these=0D=
=0A> >>> > messages represent a problem that I should report or can I ign=
ore them as=0D=0A> >>>=0D=0A> >>>=0D=0A> >>> > debug in nature=3F If I sh=
ould report them, how or where do these messages=0D=0A> >>> > get logged=3F=
 I can find no reference to them in syslog or anywhere else upon=0D=0A> >=
>> > my subsequent reboot.=0D=0A> >>> >=0D=0A> >>> > I also had a couple =
of these messages pop up the other day while=0D=0A> >>>=0D=0A> >>>=0D=0A>=
 >>> > mounting/umounting USB thumb drives. I did not think anything of t=
hem at=0D=0A> >>> > the time as the mounts/umounts completed successfully=
=2E=0D=0A> >>> >=0D=0A> >>> > Please advise. Thanks.=0D=0A> >>> >=0D=0A> =
>>> > -Tom=0D=0A> >>>=0D=0A> >>>  I managed to snap a photo of my screen =
during shutdown.=0D=0A> >> Here is the full text of the first of two lock=
 order reversals I got last=0D=0A> >> night during system shutdown, both =
of which are zfs-related to (it=0D=0A> >> appears=3F) unmounts. A few lin=
es got chopped as they were out-of-frame when=0D=0A> >> I took the photo:=
=0D=0A> >>=0D=0A> >> shutting down local daemons:=0D=0A> >> lock order re=
versal:=0D=0A> >>   1st 0xfffff801194f67c8 zfs (zfs) @ /usr/src/sys/kern/=
vfs_mount.c:1237=0D=0A> >>   2nd 0xfffff801194f6420 syncer (syncer) @=0D=0A=
> >> /usr/src/sys/kern/vfs_subr.c:22[..chopped...]=0D=0A> >> KDB: stack b=
acktrace:=0D=0A> >> db_trace_self_wrapper() at db_trace_delf_wrapper+0x2b=
/frame=0D=0A> >> 0xfffffe01ac78[...chopped...]=0D=0A> >> kdb_backtrace() =
at kdb_backtrace+0x39/frame 0xfffffe01ac784650=0D=0A> >> witness_checkord=
er() at witness_checkorder+0xd23/frame 0xfffffe01ac7846e0=0D=0A> >> __loc=
kmgr_args() __lockmgr_args+0x878/frame 0xfffffe01ac784810=0D=0A> >> vop_s=
tdlock() at vop_stdlock+0x3c/frame 0xfffffe01ac784830=0D=0A> >> VOP_LOCK1=
_APV() at VOP_LOCK1_APV+0xf5/frame 0xfffffe01ac784860=0D=0A> >> _vn_lock(=
) at _vn_lock+0xab/frame 0xfffffe01ac7848d0=0D=0A> >> vputx() at vputx+0x=
240/frame 0xfffffe01ac784930=0D=0A> >> dounmount at dounmount+0x327/frame=
 0xfffffe01ac7849b0=0D=0A> >> sys_unmount() at sys_unmount+0x356/frame 0x=
fffffe01ac784ac0=0D=0A> >> amd64_syscall() at amd64_syscall+0x265/frame 0=
xfffffe01ac784bf0=0D=0A> >> Xfast_syscall() at Xfast_syscall+0xfb/frame 0=
xfffffe01ac784bf0=0D=0A> >> --- syscall (22, FreeBSD ELF64, sys_unmount, =
rip =3D 0x80191c72a,=0D=0A> >> rsp[...chopped...]=0D=0A> >>=0D=0A> >> I h=
ave a zpool on an external USB HDD that I export at shutdown via=0D=0A> >=
> rc.shutdown.local. Don't know if that is contributing to this or not. W=
hen=0D=0A> >> I get a chance to bring down the system I will manually exp=
ort it before=0D=0A> >> shutdown to see if that makes any difference.=0D=0A=
> >>=0D=0A> >=0D=0A> > Was able to capture the full text of the lock orde=
r reversal I've been=0D=0A> > experiencing at shutdown. Turns out to be a=
ssociated with the export of one=0D=0A> > of my zpools that is hosted on =
an external USB HDD. Normally I export the=0D=0A> > zpool from rc.shutdow=
n.local, but tonight I exported it manually before I=0D=0A> > shutdown an=
d got the following.=0D=0A> >=0D=0A> > lock order reversal:=0D=0A> > : 1s=
t 0xfffff8011952b5f0 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1237=0D=0A=
> > : 2nd 0xfffff8011952b068 syncer (syncer) @=0D=0A> > /usr/src/sys/kern=
/vfs_subr.c:2212=0D=0A> > KDB: stack backtrace:=0D=0A> > db_trace_self_wr=
apper() at db_trace_self_wrapper+0x2b/frame=0D=0A> > 0xfffffe01add6e5a0=0D=
=0A> > kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe01add6e650=0D=0A=
> > witness_checkorder() at witness_checkorder+0xd23/frame 0xfffffe01add6=
e6e0=0D=0A> > __lockmgr_args() at __lockmgr_args+0x878/frame 0xfffffe01ad=
d6e810=0D=0A> > vop_stdlock() at vop_stdlock+0x3c/frame 0xfffffe01add6e83=
0=0D=0A> > VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xfffffe01add6e860=
=0D=0A> > _vn_lock() at _vn_lock+0xab/frame 0xfffffe01add6e8d0=0D=0A> > v=
putx() at vputx+0x240/frame 0xfffffe01add6e930=0D=0A> > dounmount() at do=
unmount+0x327/frame 0xfffffe01add6e9b0=0D=0A> > sys_unmount() at sys_unmo=
unt+0x356/frame 0xfffffe01add6eae0=0D=0A> > amd64_syscall() at amd64_sysc=
all+0x265/frame 0xfffffe01add6ebf0=0D=0A> > Xfast_syscall() at Xfast_sysc=
all+0xfb/frame 0xfffffe01add6ebf0=0D=0A> > --- syscall (22, FreeBSD ELF64=
, sys_unmount), rip =3D 0x80191c72a, rsp =3D=0D=0A> > 0x7fffffffc4c8, rbp=
 =3D 0x7fffffffc960 ---=0D=0A> >=0D=0A> > When imported and mounted, the =
zpool reports no errors and I have not=0D=0A> > experienced any anomalies=
 reading or writing to the zpool. Should I be=0D=0A> > concerned=3F=0D=0A=
> >=0D=0A> > Hmm, seems like I'm the only one interested in this.=0D=0A>=20=
=0D=0A> I think I have resolved the problem with the zpool on the externa=
l USB HDD.=0D=0A> I copied off the data, destroyed the datasets and zpool=
, recreated the=0D=0A> zpool and datasets and restored the data. I have e=
xecuted several exports=0D=0A> on the zpool and have received no "lock or=
der reversal" messages.=0D=0A>=20=0D=0A> I am still getting lock order re=
versal messages for my bootpool and zroot=0D=0A> zpools at shutdown. Recr=
eating bootpool would be relatively easy, but if=0D=0A> I'm going to recr=
eate zroot, I might as well do a reinistall, am I am not=0D=0A> prepared =
to do at this time. So I'm still in search of an alternative way=0D=0A> t=
o resolve this issue. My zpools all show "no known data errors", scrub=0D=
=0A> cleanly, zpool upgrade reports that all zpools support all features,=
 and=0D=0A> I've not experienced any other issues related to my zpools.=0D=
=0A>=20=0D=0A> Based on the above lock order reversal data, the error is =
in this block of=0D=0A> code (, specifically /usr/src/sys/kern/vfs_mount.=
c:1237)=0D=0A>=20=0D=0A>  if ((coveredvp =3D mp->mnt_vnodecovered) !=3D N=
ULL) {=0D=0A>                 mnt_gen_r =3D mp->mnt_gen;=0D=0A>          =
       VI_LOCK(coveredvp);=0D=0A>                 vholdl(coveredvp);=0D=0A=
>                 vn_lock(coveredvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_RET=
RY);=0D=0A>    /* LINE 1237 */=0D=0A>                 vdrop(coveredvp);=0D=
=0A>                 /*=0D=0A>                  * Check for mp being unmo=
unted while waiting for the=0D=0A>                  * covered vnode lock.=
=0D=0A>                  */=0D=0A>                 if (coveredvp->v_mount=
edhere !=3D mp ||=0D=0A>                     coveredvp->v_mountedhere->mn=
t_gen !=3D mnt_gen_r) {=0D=0A>                         VOP_UNLOCK(covered=
vp, 0);=0D=0A>                         return (EBUSY);=0D=0A>            =
     }=0D=0A> }=0D=0A>=20=0D=0A>=20=0D=0A> -Tom=0D=0A> __________________=
_____________________________=0D=0A> freebsd-current@freebsd.org <mailto:=
freebsd-current@freebsd.org>  mailing list=0D=0A> http://lists.freebsd.or=
g/mailman/listinfo/freebsd-current <http://lists.freebsd.org/mailman/list=
info/freebsd-current>=20=0D=0A> To unsubscribe, send any mail to "freebsd=
-current-unsubscribe@freebsd.org <mailto:freebsd-current-unsubscribe@free=
bsd.org> "=0D=0A>=20=0D=0A=0D=0A
From owner-freebsd-fs@FreeBSD.ORG  Sat Jan 25 16:28:19 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 68EDA7F4;
 Sat, 25 Jan 2014 16:28:19 +0000 (UTC)
Received: from tensor.andric.com (unknown
 [IPv6:2001:7b8:3a7:1:2d0:b7ff:fea0:8c26])
 (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 131141E7F;
 Sat, 25 Jan 2014 16:28:19 +0000 (UTC)
Received: from [IPv6:2001:7b8:3a7::719e:bbe8:ba8a:34e0] (unknown
 [IPv6:2001:7b8:3a7:0:719e:bbe8:ba8a:34e0])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id E54C85C44;
 Sat, 25 Jan 2014 17:28:12 +0100 (CET)
Subject: Re: BUG: possible NULL pointer dereference in nfs server
Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\))
Content-Type: multipart/signed;
 boundary="Apple-Mail=_70801F41-810B-4C7F-9BA3-A9451B541D04";
 protocol="application/pgp-signature"; micalg=pgp-sha1
X-Pgp-Agent: GPGMail 2.1 (6062eb4)
From: Dimitry Andric <dim@FreeBSD.org>
In-Reply-To: <1577222508.16050114.1390610315654.JavaMail.root@uoguelph.ca>
Date: Sat, 25 Jan 2014 17:27:58 +0100
Message-Id: <97D8A7EA-12F3-4EFC-A933-BD8528B6306A@FreeBSD.org>
References: <1577222508.16050114.1390610315654.JavaMail.root@uoguelph.ca>
To: Rick Macklem <rmacklem@uoguelph.ca>
X-Mailer: Apple Mail (2.1827)
Cc: Roman Divacky <rdivacky@freebsd.org>, fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>;
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 25 Jan 2014 16:28:19 -0000


--Apple-Mail=_70801F41-810B-4C7F-9BA3-A9451B541D04
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 25 Jan 2014, at 01:38, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Roman Divacky wrote:
>> Hi,
>>=20
>> In nfs_nfsdstate.c:nfsrv_lockctrl() we call
>>=20
>> getlckret =3D nfsrv_getlockfh(vp, new_stp->ls_flags, NULL, &nfh, p);
>>=20
>> then in nfsrv_getlockfh() we, based on the value of flags, might
>> dereference the NULL pointer:
>>=20
>>=20
>> nfsrv_getlockfh(vnode_t vp, u_short flags,
>>    struct nfslockfile **new_lfpp, fhandle_t *nfhp, NFSPROC_T *p)
>>=20
>>=20
>>        if (flags & NFSLCK_OPEN) {
>>                new_lfp =3D *new_lfpp;
>>                fhp =3D &new_lfp->lf_fh;
>>=20
>>=20
> I took a look and don't see a problem. NFSLCK_OPEN is only set for =
Opens
> { nfsrvd_open() } and it never calls nfsrv_lockctrl(). =
nfsrv_lockctrl() is
> always called with other flags in new_stp->ls_flags set, but never =
NFSLCK_OPEN,
> because nfsrv_lockctrl() is handling byte range lock cases after a =
file has been
> opened { which means the "else" case for this "if" always applies.

The point is that the compiler cannot always know this, when it inlines
nfsrv_getlockfh().  In the first invocation of nfsrv_getlockfh(), there
is no problem:

   712  APPLESTATIC void
   713  nfsrv_dumplocks(vnode_t vp, struct nfsd_dumplocks *ldumpp, int =
maxcnt,
   714      NFSPROC_T *p)
   715  {
...
   726          ret =3D nfsrv_getlockfh(vp, 0, NULL, &nfh, p);

Here both the 'flags' and 'new_lfpp' arguments are zero, so the part in
nfsrv_getlockfh() where it writes to a NULL pointer can never be
executed, therefore it can be optimized away safely.

In the next invocation, the state of the 'flags' argument is not known
to the compiler, however:

  1369  tryagain:
  1370          if (new_stp->ls_flags & NFSLCK_LOCK)
  1371                  MALLOC(other_lop, struct nfslock *, sizeof =
(struct nfslock),
  1372                      M_NFSDLOCK, M_WAITOK);
...
  1387          getlckret =3D nfsrv_getlockfh(vp, new_stp->ls_flags, =
NULL, &nfh, p);

If it inlines this, the result looks approximately like:

     1	{
     2		fhandle_t *fhp =3D NULL;
     3		struct nfslockfile *new_lfp;
     4		int error;
     5
     6		if (new_stp->ls_flags & NFSLCK_OPEN) {
     7			new_lfp =3D *NULL;
     8			fhp =3D &new_lfp->lf_fh;
     9		} else if (&nfh) {
    10			fhp =3D &nfh;
    11		} else {
    12			panic("nfsrv_getlockfh");
    13		}
    14		error =3D nfsvno_getfh(vp, fhp, p);
    15		NFSEXITCODE(error);
    16		getlckret =3D error;
    17	}

The code in line 7 is the problematic part.  Since this is undefined,
the compiler inserts a trap instruction here.  I think the problem Roman
encountered is that on sparc64, there is no equivalent to x86's ud2
instruction, so it inserts a call to abort() instead, and that function
is not available in kernel-land.


...
> Sorry, I'm not a compiler guy, so I don't know why a compiler would
> generate a trap instruction, but since new_lfpp is never NULL when
> this is executed, I don't see a problem.
>=20
> If others feel that this needs to be re-coded, please let me know what
> you think the code should look like? (A test for non-NULL with a =
panic()
> before it is used?)
>=20
> Is a trap instruction that never gets executed a problem?

It's better to avoid undefined behavior in any case.  Just add a NULL
check, that should be sufficient.

-Dimitry


--Apple-Mail=_70801F41-810B-4C7F-9BA3-A9451B541D04
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)

iEYEARECAAYFAlLj5hkACgkQsF6jCi4glqO2hACeKUFQagxvdOrYIgc3YqdmERwq
VbIAnR5Td7N+Dqjj4BAiAN8nVTmqkW2A
=9r0K
-----END PGP SIGNATURE-----

--Apple-Mail=_70801F41-810B-4C7F-9BA3-A9451B541D04--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?zarafa.52e3de53.0da5.7253d5c71778a401>