From owner-freebsd-hackers  Wed Jan 21 14:37:20 1998
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199801212236.PAA09564@usr09.primenet.com>
Subject: Re: Locking on disk slice I/O--yes, no or how?
To: grog@lemis.com (Greg Lehey)
Date: Wed, 21 Jan 1998 22:36:44 +0000 (GMT)
Cc: hackers@FreeBSD.ORG
In-Reply-To: <19980121185627.21744@lemis.com> from "Greg Lehey" at Jan 21, 98 06:56:27 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk

> I'm currently trying to perform low-level I/O to disk slices in a
> driver.  I've read section 9 of the manual, which tells me that all
> reads and writes should be protected with a VOP_LOCK/VOP_UNLOCK pair.
> I've tried this, and get a panic: "lockmgr: locking against myself"

Yick.  Someone's trying to use the lockmgr for finer grained SMP
locking.  That'll never work... it must have snuck in when I wasn't
looking.


> The following code leads to a panic:
> 
>     VOP_LOCK (devvp, LK_EXCLUSIVE, curproc);		    /* lock the vnode */
>     error = VOP_WRITE (devvp, &uio, IO_NODELOCKED, FSCRED); /* write the header */
>     VOP_UNLOCK (devvp, 0, curproc);			    /* and unlock it again */

In general, you are trying to do I/O in a driver, which holds another
lock.  The panic is an artifact of someone using the lockmgr as if
it actually worked the way you'd expect an SMP lock manager to work.

As you've discovered, it doesn't.

The panic is because you are attempting to obtain the lock while you
hold a mutually exclusive lock at the same time.
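
To make the failure mode concrete, here is a minimal userland sketch of
an exclusive lock that notices when its own holder asks for it again.
It is a model, not the lockmgr code: every name in it (model_lock,
ML_SELF, and so on) is invented for illustration, and where the kernel
would panic or sleep, the model just returns a code.

```c
#include <assert.h>

/* Hypothetical userland model; none of these names exist in the kernel. */
enum { ML_OK = 0, ML_BUSY = 1, ML_SELF = 2 };

struct model_lock {
	int held;	/* nonzero while exclusively held */
	int owner;	/* pid-like id of the holder; valid only if held */
};

void
model_lock_init(struct model_lock *lk)
{
	lk->held = 0;
	lk->owner = -1;
}

/*
 * Try to take the lock exclusively on behalf of `pid'.
 * ML_SELF stands in for the "locking against myself" panic;
 * ML_BUSY stands in for the case where a real lock would sleep.
 */
int
model_lock_excl(struct model_lock *lk, int pid)
{
	if (lk->held) {
		if (lk->owner == pid)
			return (ML_SELF);	/* holder asked again */
		return (ML_BUSY);		/* someone else holds it */
	}
	lk->held = 1;
	lk->owner = pid;
	return (ML_OK);
}

void
model_unlock(struct model_lock *lk, int pid)
{
	assert(lk->held && lk->owner == pid);	/* only the holder may unlock */
	lk->held = 0;
	lk->owner = -1;
}
```

In this model, a second model_lock_excl() call from the same pid is the
case you hit: the lock is held, and the holder is you.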

Technically, I believe you should be able to do file I/O from a
device not involved in file I/O.  Practically, the VFS is not
really organized correctly to allow you to do this (or a lot of
other things).

Since you already hold a mutually exclusive lock, you technically
don't need to do the VOP_LOCK to protect the vnode from reentry.
This assumes your UIO is in sysspace, not userspace, so the
uiomove won't cause paging; you can't page in the kernel except
in rare circumstances, and if you went to sleep without really
holding the lock, it would be bad.


> Looking at the stack frame for vop_stdlock, it looks as if I'm trying
> to use my devvp->v_data as lock data to lock against my own vnode.  I
> can't make sense of this (except that, under these circumstances, I
> can understand the panic).  I have a number of questions:
> 
> 1.  What is the lock manager trying to do here?  What's the content of
>     devvp->v_data?

It's a pointer to the in core per FS data.  Given the contents of
the structure you showed, it's a pointer to an FFS inode, so you are
actually calling ufs_lock in ufs/ufs/ufs_vnops.c.

If the vnodes were handled differently, there wouldn't be a VOP_LOCK,
per se, in each FS implementation.  But alas, they are not.

Unfortunately, the lockmgr() code is called to apply the lock onto
a lock list pointer that is in the in-core inode; look at the UFS
code for details (yes, I know that spec_lock doesn't do anything;
you are not using the descriptor vector from miscfs/specfs/spec_vnops.c
like you think -- you are using the one in ufs/ffs/ffs_vnops.c).
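
A rough userland model of that dispatch, in case it helps: the lock
routine you end up in is chosen by the operations vector hanging off
the vnode, not by what kind of thing you think the vnode is.  The
types and names below (mvnode, m_ffs_specops, M_VOP_LOCK, ...) are
stand-ins I made up for the sketch; only the idea of a per-vnode
vector plus a v_data pointer comes from the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of vnode operation dispatch; not kernel code. */
enum m_which { M_UFS_LOCK = 1, M_SPEC_LOCK = 2 };

struct mvnode;

struct mvnodeops {
	enum m_which (*vop_lock)(struct mvnode *);
};

struct mvnode {
	const struct mvnodeops *v_op;	/* which FS's entry points to use */
	void *v_data;			/* per-FS private data (e.g. an inode) */
};

/* Stand-in for the UFS lock routine: would lock via the in-core inode. */
enum m_which
m_ufs_lock(struct mvnode *vp)
{
	assert(vp->v_data != NULL);	/* needs the per-FS data to lock on */
	return (M_UFS_LOCK);
}

/* Stand-in for the specfs lock routine: does nothing. */
enum m_which
m_spec_lock(struct mvnode *vp)
{
	(void)vp;
	return (M_SPEC_LOCK);
}

/* Vector for a device node that lives on an FFS file system. */
const struct mvnodeops m_ffs_specops = { m_ufs_lock };
/* Vector you might have expected instead. */
const struct mvnodeops m_spec_ops = { m_spec_lock };

/* Dispatch through whatever vector the vnode carries. */
enum m_which
M_VOP_LOCK(struct mvnode *vp)
{
	return (vp->v_op->vop_lock(vp));
}
```

A vnode carrying m_ffs_specops lands in the UFS stand-in even though it
names a device, which is the situation described above.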


> 2.  Why do I need to lock at all?  According to the 4.4BSD book, it's
>     "advisory", and then only for directory searches and such.

You need to lock in case you are doing something to the vnode that
causes you to sleep waiting for resources (pages, etc.), and while you
are asleep, someone else wants to come in and do something to the same
vnode while you are in the middle of what you are doing.

The panic occurs because your curproc is the process holding the other
lock you are attempting to assert against, and the IN_RECURSE flag
is not set (you can't set it because you can't get at the inode).
That code is all a kludge anyway, since it should *always* be a counting
lock, not just sometimes.  Terrible implementation, there...
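
For comparison, this is roughly what I mean by a counting lock, again
as a userland sketch with invented names (count_lock, CL_BUSY, ...)
rather than anything taken from the kernel: the holder may re-enter
as often as it likes, and the lock is only free again once every
acquire has been matched by a release.

```c
#include <assert.h>

/* Hypothetical counting (recursive) lock model; not kernel code. */
enum { CL_OK = 0, CL_BUSY = 1 };

struct count_lock {
	int owner;	/* pid-like id of the holder, -1 if free */
	int depth;	/* number of outstanding acquires by the holder */
};

void
count_lock_init(struct count_lock *lk)
{
	lk->owner = -1;
	lk->depth = 0;
}

/*
 * Acquire on behalf of `pid'.  Re-acquiring as the current holder
 * always succeeds and just bumps the count.
 */
int
count_lock_acquire(struct count_lock *lk, int pid)
{
	if (lk->depth > 0 && lk->owner != pid)
		return (CL_BUSY);	/* a real lock would sleep here */
	lk->owner = pid;
	lk->depth++;
	return (CL_OK);
}

/* Release one level; returns the remaining depth. */
int
count_lock_release(struct count_lock *lk, int pid)
{
	assert(lk->depth > 0 && lk->owner == pid);
	if (--lk->depth == 0)
		lk->owner = -1;		/* fully released */
	return (lk->depth);
}
```

With this shape there is no flag to remember to set; recursion by the
holder is simply the normal case.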

> 3.  Are the parameters correct?  It looks as if I shouldn't specify
>     curproc in the calls, but I can't find any documentation that
>     tells me what to put there, nor indeed what use the whole thing
>     is.

"According to the source code, this is correct behaviour".

Since your device is operating on behalf of a process, you should
provide that process to the lock.

In reality, it's a kludge and a half to use the lockmgr the way it
is being used by the VOP_LOCK right now; the lockmgr code is for
advisory locking, and overloading it with another VOP entry point
is just asking for exactly this type of collision, even when you
aren't doing something strange, like I/O from a driver.

You can:

	cd /sys/kern
	grep VOP_LOCK * | more

But I suspect you won't find a call made on behalf of anything but
the current process (which is used to manufacture the lock collision
domain).


> 4.  This device is /dev/rsd1e.  Should I even be calling
>     ufs_vnoperatespec?

Probably not.  Probably you should be calling bwrite(); this is
highly dependent on what exactly your driver is, though.  You can
see that tty_tty.c (the tty driver) calls VOP_WRITE.  It really
depends on whether your device is a real device or a pseudo-device,
and on whether you got into the driver through a system call or as the
result of an interrupt.  If you got in via an interrupt, you should
probably not be calling VOP_WRITE.

But if you call VOP_WRITE, then it's going to go through the FFS
(and therefore the UFS common code).


> 5.  Am I doing anything else obviously wrong?

Well, there is a lot of kernel code in this area that is obviously
wrong.  Probably, if you are going to call VOP_WRITE, then you
should call LEASE_CHECK to notify NFS, or any additional user
space opportunistic locking interface that you're doing something;
many of the places in the kernel where people call VOP_WRITE or
VOP_READ directly actually fail to call this, though.  This
doesn't really apply to things that can't be exported, though,
so if there's no chance you will be given a vp for a file, then
it's not a problem.  If you are going to operate on general devices,
though, you may find that vnconfig'ed devices might qualify.  The
Samba maintainer has expressed an interest in a user space
interface to an opportunistic locking API -- it makes sense,
given that both Samba and NFS support opportunistic locking, but
neither gets notified of the other's lock breaks (NFS calls this
lease breaking).  Technically, the code in kern/kern_ktrace.c is
bad sample code for calling VOP_WRITE, as is the code in tty_tty.c,
as is the code in vn_rdwr() in vfs_vnops.c, though vn_write is OK.
Luckily, this is the code the system calls use.  Unluckily, it's
in a struct fileops, from when the Heidemann code was pounded into
4.4; this is a legacy for the pipe and socket code which weren't
modernized when everything else was.

I'd really suggest examining what you are doing to see if the
way you are doing it is the right way.  Then look at the code
in the functions starting with vn_ in kern/vfs_vnops.c; at
least that code does the lease checking, etc.  You might even
want to be calling vn_write instead of VOP_WRITE; it depends
again on how you obtained your vnode.
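
The difference between the two call styles can be sketched in userland
like this.  It is only a model: the names (wr_trace, wr_direct,
wr_wrapped) are invented, and the exact order of steps in the wrapper
is my assumption about what a vn_-style function does, so check it
against kern/vfs_vnops.c before relying on it.

```c
#include <assert.h>

/* Hypothetical model of "direct write" vs. "write through a wrapper". */
enum wr_step { WR_LEASE = 1, WR_LOCK, WR_WRITE, WR_UNLOCK };

#define WR_MAXSTEPS 8

struct wr_trace {
	enum wr_step steps[WR_MAXSTEPS];	/* steps performed, in order */
	int n;					/* number of steps recorded */
};

/* Record one step in the trace. */
void
wr_note(struct wr_trace *t, enum wr_step s)
{
	assert(t->n < WR_MAXSTEPS);
	t->steps[t->n++] = s;
}

/* Direct call: only the write happens; nobody is told about it. */
void
wr_direct(struct wr_trace *t)
{
	wr_note(t, WR_WRITE);
}

/*
 * Wrapper call (assumed ordering): notify lease holders, take the
 * vnode lock, write, then drop the lock.
 */
void
wr_wrapped(struct wr_trace *t)
{
	wr_note(t, WR_LEASE);
	wr_note(t, WR_LOCK);
	wr_note(t, WR_WRITE);
	wr_note(t, WR_UNLOCK);
}
```

The point of the wrapper is that the caller cannot forget the lease
notification or the lock; the direct call leaves both up to you.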


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.