From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 17:56:10 2011
Date: Wed, 27 Jul 2011 13:56:08 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: John Baldwin
Cc: Attilio Rao, freebsd-fs@freebsd.org
Message-ID: <1847245041.1083168.1311789368340.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201107270840.57104.jhb@freebsd.org>
Subject: Re: Does msdosfs_readdir() require an exclusively locked vnode

John Baldwin wrote:
> On Tuesday, July 26, 2011 9:56:37 pm Attilio Rao wrote:
> > 2011/7/26 Kostik Belousov:
> > > On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote:
> > >> Kostik Belousov wrote:
> > >> > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote:
> > >> > > Hi,
> > >> > >
> > >> > > Currently both NFS servers set the vnode lock LK_SHARED, and
> > >> > > so do the local syscalls (at least, that's how it looks by
> > >> > > inspection).
> > >> > >
> > >> > > Peter Holm just posted me this panic, where a test for an
> > >> > > exclusive vnode lock fails in msdosfs_readdir():
> > >> > >
> > >> > > KDB: stack backtrace:
> > >> > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) at db_trace_self_wrapper+0x26
> > >> > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at kdb_backtrace+0x2a
> > >> > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23
> > >> > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at assert_vop_elocked+0x55
> > >> > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45
> > >> > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at msdosfs_readdir+0x528
> > >> > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at VOP_READDIR_APV+0xc5
> > >> > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at nfsrvd_readdir+0x38e
> > >> > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at nfsrvd_dorpc+0x1f79
> > >> > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at nfssvc_program+0x40f
> > >> > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) at svc_run_internal+0x952
> > >> > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at svc_thread_start+0x10
> > >> > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8
> > >> > > fork_trampoline() at fork_trampoline+0x8
> > >> > > --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 ---
> > >> > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be
> > >> > > KDB: enter: lock violation
> > >> > >
> > >> > > So, does anyone know if msdosfs_readdir() really requires an
> > >> > > LK_EXCLUSIVE locked vnode, or is the ASSERT_VOP_ELOCKED() in
> > >> > > pcbmap() too strong?
> > >> >
> > >> > Yes, msdosfs currently requires all vnode locks to be
> > >> > exclusive. One of the reasons is that each denode (the
> > >> > msdosfs-private vnode data) carries the FAT entries cache, and
> > >> > this cache is updated even by operations that do not modify the
> > >> > vnode from the VFS point of view.
> > >> >
> > >> > The locking regime is enforced by getnewvnode() initializing
> > >> > the vnode lock with the LK_NOSHARE flag, and by the msdosfs
> > >> > code not calling VN_LOCK_ASHARE() on the newly instantiated
> > >> > vnode.
> > >> >
> > >> > My question is, was the vnode in question locked at all?
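(For illustration, a minimal sketch of the opt-in described above. The
"examplefs" names and the surrounding error handling are invented for
the example; getnewvnode(), VN_LOCK_ASHARE(), and the LK_NOSHARE
behaviour are as described in the thread.)

	struct vnode *vp;
	int error;

	/*
	 * getnewvnode() initializes the vnode lock with LK_NOSHARE, so
	 * every new vnode starts out exclusive-only.
	 */
	error = getnewvnode("examplefs", mp, &examplefs_vnodeops, &vp);
	if (error != 0)
		return (error);

	/*
	 * A file system that can tolerate shared vnode locks clears the
	 * flag explicitly.  msdosfs deliberately skips this call, because
	 * even nominally read-only operations such as VOP_READDIR()
	 * update the per-denode FAT cache and so need LK_EXCLUSIVE.
	 */
	VN_LOCK_ASHARE(vp);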
> > >> I think the problem is that I do a LK_DOWNGRADE. From a quick
> > >> look at __lockmgr_args(), it doesn't check LK_NOSHARE for a
> > >> LK_DOWNGRADE.
> > >>
> > >> Maybe __lockmgr_args() should have something like:
> > >>     if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE))
> > >>         return (0); /* noop */
> > >> after the
> > >>     if (op == LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE))
> > >>         op = LK_EXCLUSIVE;
> > >> lines?
> > >
> > > The RELENG_7 lockmgr does not check the NOSHARE flag on
> > > downgrade, but I agree with the essence of your proposal.
> >
> > As long as the difference in semantics from the old lockmgr is
> > correctly called out in the documentation (and possibly in
> > comments), I'm fine with this change.
>
> I think it is a bug in the LK_NOSHARE implementation if the old
> lockmgr() didn't silently nop downgrade requests when LK_NOSHARE was
> set. :) We should definitely fix it to ignore downgrades for
> LK_NOSHARE.
>
By the way, I think that __lockmgr_args() in -current doesn't check for
LK_NOSHARE on a LK_DOWNGRADE either; that is what pho@ was testing when
he found the problem.

At this point, I believe that the new NFS server (for which I have a
patch, now being tested by pho@, that avoids the LK_DOWNGRADE) is the
only place that is broken. (compute_cn_lkflags() only sets LK_SHARED if
MNT_LOOKUP_SHARED is set, and the only LK_DOWNGRADE I see is in
vfs_cache.c, when cn_lkflags == LK_SHARED. The rest are in file systems
that handle LK_SHARED locked vnodes, from what I can see at a glance.)

So it isn't a difference between the old and current behaviour, just a
suggestion that adding a check to __lockmgr_args() might be a nice
safety belt for the future, since __lockmgr_args() already checks for
the LK_SHARED case.

rick, who will get the fix for the new NFS server to re@ soon
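(For reference, a sketch of how the two checks discussed above would sit
together near the top of __lockmgr_args() in sys/kern/kern_lock.c. The
surrounding lines paraphrase the real code rather than quoting it
exactly; only the second test is the proposed addition.)

	op = (flags & LK_TYPE_MASK);

	/* Existing behaviour: LK_NOSHARE turns shared requests exclusive. */
	if (op == LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE))
		op = LK_EXCLUSIVE;

	/*
	 * Proposed: an LK_NOSHARE lock is only ever held exclusive, so
	 * treat a downgrade as a successful no-op.  The lock simply stays
	 * exclusive, which satisfies assertions such as the
	 * ASSERT_VOP_ELOCKED() in pcbmap().
	 */
	if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE))
		return (0);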