From owner-freebsd-hackers Wed Jan 21 14:37:20 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id OAA01341 for hackers-outgoing; Wed, 21 Jan 1998 14:37:20 -0800 (PST) (envelope-from owner-freebsd-hackers@FreeBSD.ORG) Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id OAA01315 for ; Wed, 21 Jan 1998 14:36:56 -0800 (PST) (envelope-from tlambert@usr09.primenet.com) Received: (from daemon@localhost) by smtp01.primenet.com (8.8.8/8.8.8) id PAA14951; Wed, 21 Jan 1998 15:36:55 -0700 (MST) Received: from usr09.primenet.com(206.165.6.209) via SMTP by smtp01.primenet.com, id smtpd014924; Wed Jan 21 15:36:48 1998 Received: (from tlambert@localhost) by usr09.primenet.com (8.8.5/8.8.5) id PAA09564; Wed, 21 Jan 1998 15:36:44 -0700 (MST) From: Terry Lambert Message-Id: <199801212236.PAA09564@usr09.primenet.com> Subject: Re: Locking on disk slice I/O--yes, no or how? To: grog@lemis.com (Greg Lehey) Date: Wed, 21 Jan 1998 22:36:44 +0000 (GMT) Cc: hackers@FreeBSD.ORG In-Reply-To: <19980121185627.21744@lemis.com> from "Greg Lehey" at Jan 21, 98 06:56:27 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk > I'm currently trying to perform low-level I/O to disk slices in a > driver. I've read section 9 of the manual, which tells me that all > reads and writes should be protected with a VOP_LOCK/VOP_UNLOCK pair. > I've tried this, and get a panic: "lockmgr: locking against myself" Yick. Someone's trying to use the lockmgr for finer grained SMP locking. That'll never work... it must have snuck in when I wasn't looking. > The following code leads to a panic: > > VOP_LOCK (devvp, LK_EXCLUSIVE, curproc); /* lock the vnode */ > error = VOP_WRITE (devvp, &uio, IO_NODELOCKED, FSCRED); /* write the header */ > VOP_UNLOCK (devvp, 0, curproc); /* and unlock it again */ In general, you are trying to do I/O in a driver, which holds another lock. The panic is an artifact of someone using the lockmgr as if it actually worked the way you'd expect an SMP lock manager to work. As you've discovered, it doesn't. The panic is because you are attempting to obtain the lock while you hold a mutually exclusive lock at the same time. Technically, I believe you should be able to do file I/O from a device not involved in file I/O. Practically, the VFS is not really organized correctly to allow you to do this (or a lot of other things. Since you already hold a mutually exclusive lock, you technically don't need to do the VOP_LOCK to protect the vnode from reentry (this assumes your UIO is in sysspace, not userspace, so the uiomove won't cause paging; you can't page in the kernel except in rare circumstances, and if you went to sleep without really holding the lock, it would be bad. > Looking at the stack frame for vop_stdlock, it looks as if I'm trying > to use my devvp->v_data as lock data to lock against my own vnode. I > can't make sense of this (except that, under these circumstances, I > can understand the panic). I have a number of questions: > > 1. What is the lock manager trying to do here? What's the content of > devvp->v_data? It's a pointer to the in core per FS data. Given the contents of the structure you showed, it's a pointer to an FFS inode, so you are actually calling ufs_lock in ufs/ufs/ufs_vnops.c. If the vnodes were handled differently, there wouldn't be a VOP_LOCK, per se, in each FS implementation. But alas, they are not. Unfortunately, the lockmgr() code is called to apply the lock onto a lock list pointer that is in the in-core inode; look at the UFS code for details (yes, I know that spec_lock doesn't do anything; you are not using the descriptor vector from miscfs/specfs/spec_vnops.c linke you think -- you are using the one in ufs/ffs/ffs_vnops.c). > 2. Why do I need to lock at all? According to the 4.4BSD book, it's > "advisory", and then only for directory searches and such. You need to lock in case you are doing something to the vnode that causes you to sleep waiting for resources (pages, etc.), and while you are asleep, someone else wants to come in and do something to the same vnode while you are in the middle of what you are doing. The panic occurs because your curproc is the process holding the other lock you are attempting to assert against, and the IN_RECURSE flag is not set (you can't set it because you can't get at the inode). That code is all a kludge anyway, since it should *always* be a counting lock, not just sometimes. Terrible implementation, there... > 3. Are the parameters correct? It looks as if I shouldn't specify > curproc in the calls, but I can't find any documentation that > tells me what to put there, nor indeed what use the whole thing > is. "According to the source code, this is correct behaviour". Since your device is operating on behalf of a process, you should provide that process to the lock. In reality, it's a kludge and a half to use the lockmgr the way it is being used by the VOP_LOCK right now; the lockmgr code is for advisory locking, and overloading it with another VOP entry point is just asking for exactly this type of collision, even when you aren't doing something strange, like I/O from a driver. You can: cd /sys/kern grep VOP_LOCK * | more But I suspect you won't find one not for the current process (which is used to manufacture the lock collision domain). > 4. This device is /dev/rsd1e. Should I even be calling > ufs_vnoperatespec? Probably not. Probably you should be calling bwrite(); this is highly dependent on what exactly your driver is, though. You can see that tty_tty.c (the tty driver) calls VOP_WRITE. It really depends on if your device is a real device or a pseudo-device, and whether you got into the driver through a system call or as the result of an interrupt. If you got in via an interrupt, you should probably not be calling VOP_WRITE. But if you call VOP_WRITE, then it's going to go through the FFS (and therefore the UFS common code). > 5. Am I doing anything else obviously wrong? Well, there is a lot of kernel code in this area that is obviously wrong. Probably, if you are going to call VOP_WRITE, then you should call LEASE_CHECK to notify NFS, or any additional user space opportunistic locking interface that you're doing something; many of the places in the kernel where people call VOP_WRITE or VOP_READ directly actually fail to call this, though. This doesn't really apply to things that can't be exported, though, so if there's no chance you will be given a vp for a file, then it's not a problem. If you are going to operate on general devices, though, you may find that vnconfig'ed devices might qualify. The Samba maintainer has expressed an interest in a user space interface to an opportunistic locking API -- it makes sense, given that both Samba and NFS support opportunistic locking, but neither gets notified of the others lock breaks (NFS calls this lease breaking). Technically, the code in kern/kern_ktrace.c is bad sample code for calling VOP_WRITE, as is the code in tty_tty.c, as is the code in vn_rdwr() in vfs_vnops.c, though vn_write is OK. Luckily, this is the code the system calls use. Unluckily, it's in a struct fileops, from when the Heideman code was pounded in 4.4; this is a legacy for the pipe and socket code which weren't modernized when everything else was. I'd really suggest examining what you are doing to see if the way you are doing it is the right way. Then look at the code in the functions starting with vn_ in kern/vfs_vnops.c; at least that code does the lease checking, etc.. You might even want to be calling vn_write instead of VOP_WRITE; it depends again on how you obtained your vnode. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.