From owner-freebsd-hackers@FreeBSD.ORG Fri Aug 5 23:39:19 2005 Return-Path: X-Original-To: freebsd-hackers@freebsd.org Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1F32F16A41F; Fri, 5 Aug 2005 23:39:19 +0000 (GMT) (envelope-from hselasky@c2i.net) Received: from swip.net (mailfe11.swipnet.se [212.247.155.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 254D743D45; Fri, 5 Aug 2005 23:39:17 +0000 (GMT) (envelope-from hselasky@c2i.net) X-T2-Posting-ID: gvlK0tOCzrqh9CPROFOFPw== Received: from mp-217-203-29.daxnet.no ([193.217.203.29] verified) by mailfe11.swip.net (CommuniGate Pro SMTP 4.3.4) with ESMTP id 4205728; Sat, 06 Aug 2005 01:39:14 +0200 From: Hans Petter Selasky To: freebsd-hackers@freebsd.org Date: Sat, 6 Aug 2005 01:39:55 +0200 User-Agent: KMail/1.7 References: <200508030023.04748.hselasky@c2i.net> <200508042253.34165.hselasky@c2i.net> <200508051329.14767.jhb@FreeBSD.org> In-Reply-To: <200508051329.14767.jhb@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200508060139.57143.hselasky@c2i.net> Cc: Subject: Re: How to do proper locking X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: hselasky@c2i.net List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Aug 2005 23:39:19 -0000 On Friday 05 August 2005 19:29, John Baldwin wrote: > On Thursday 04 August 2005 04:53 pm, Hans Petter Selasky wrote: > > On Thursday 04 August 2005 20:15, John Baldwin wrote: > > > On Thursday 04 August 2005 12:50 pm, Hans Petter Selasky wrote: > > > > On Thursday 04 August 2005 15:53, John Baldwin wrote: > > > > > On Thursday 04 August 2005 07:40 am, Hans Petter Selasky wrote: > > > > > > On Wednesday 03 August 2005 19:21, John Baldwin wrote: > > > > > > > On Tuesday 02 August 2005 06:23 pm, Hans Petter Selasky wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > I am looking for a safe way to access structures that can be > > > > > > > > freed. The solution I am looking for must not: > > > > > > > > > > > > > > > > - hinder scaleability > > > > > > > > - lead to use of a single lock > > > > > > > > - lead to lock order reversal > > > > > > > > > This is only going to work if you are detaching. But consider the > > following: > > > > FOO_LOCK(sc); > > callout_stop(); > > callout_start(); > > FOO_UNLOCK(sc); > > > > Your solution is only spinning halfway around. What I say is that you > > need some more parameters passed to callbacks, so that one can check if > > it has been cancelled before proceeding. > > Err, there is no callout_start(), but callout_reset() will just reschedule > a callout if it's already pending. The consumer of the callout can > maintain his own side state that he can check in his callout to handle a > state change in between scheduling the callout and the callout running. Yes, right, but I want this state variable checking to be centralized. On the spark I can think of the following two situations where the state must always be checked: - before calling a callback (like d_read / d_write / d_ioctl / timeouts / usbd_callback ... ) - after returning from sleep > > Now consider my proposal. There is no more need for > > dev_refthread/dev_relthread. Instead the callback, "d_read()" will do the > > checking. This consist of: > > > > d_read() > > { > > mtx_lock(args->mtx); > > > > if(args->refcount_copy == atomic_read(args->refcount_p)) > > { > > /* valid */ > > } > > > > mtx_unlock(args->mtx); > > > > return; > > } > > > > Per "devfs_read_f()" one has got to lock/unlock one time. It involves two > > refcount reads and no refcount writes. > > > > Well, isn't this 200 percent faster? > > Maybe, but another thing you need to consider is "maintainenance" overhead. > Device drivers, especially, should be the simplest parts of the kernel to > implement because we want to minimize the potential for screw up in that > area. Having to trust that the same code is going to be duplicated 40 or > 50 times without any errors versus having it done once in a centralized > place where it breaks everyone if it is broken is just insane. Otherwise, > we might as well go write the whole kernel in assembly so we can tweak > every last bit out of it. :) Yes, you are right, but the problem is, that for most callback systems in the kernel, there is no mechanism that will pre-lock some custom mutex before calling the callback. I am not speaking about adding lines to existing code, but to add one extra parameter to the setup functions, where the mutex that should be locked before calling the callback(s) can be specified. If it is NULL, Giant will be used. The setup functions I have in mind are for example: "make_dev()", "bus_setup_intr()", "callout_reset()" ... and in general all callback systems that look like these. OLD: make_dev(devsw, minor, uid, gid, perms, fmt...) NEW: make_dev(devsw, mtx, minor, uid, gid, perms, fmt...) This increment only refcount system that I want, will be setup automatically by the setup functions, and code is saved in the device drivers. Destruction functions will increment the refcount and so invalidate the old copies of the refcounts, all centralized. Here is my idea of how this can be implemented for the filesystem mounted on /dev : Struct cdev will have a new field of type callback_args: struct cdev { ... struct callback_args si_callback_args; ... }; Maybe we can trim down the callback_args structure to something like this: And maybe we can pick a better name for it. struct callback_args { struct mtx *mtx; u_int32_t *p_refcount; u_int32_t refcount; }; To update the kernel you make a simple script that searches for "make_dev()" and adds one NULL parameters after the first parameter. Look over all the subsitutions to catch any exceptions and that is it. Update the manual. We don't have to rewrite any drivers at all. Keep the code like it is, no changes are needed at all!!! Then one has to update "devfs_read_f()": devfs_read_f() { This first part should be done once, and the result should be stored in some file descriptor that is owned by the thread and needs no locking. mtx_lock(global); struct callback_args cba = dev->si_callback_args; mtx_unlock(global); This is the second part, and as a good rule, the caller is doing the locking. There is only one problem and that is that the callback can call tsleep/msleep. So what we do here is that we extend "struct thread" to contain another field, "struct callback_args *td_cba_p", which we use to pass a hint to tsleep/msleep, about that we are holding a lock: mtx_lock(cba.mtx); if(cba.refcount_copy == atomic_read(cba.refcount_p)) { /* valid */ curthread->td_cba_p = &cba; /* pass hint to next tsleep/msleep */ error = xxx->d_read(); if(curthread->td_cba_p == NULL) { return error; /* lock already exited */ } curthread->td_cba_p = NULL; } else { error = ENXIO; } mtx_unlock(cba.mtx); return error; } Then we have to update msleep(): int new_msleep() { struct callback_args *args = curthread->td_cba_p; int error; if(args) { /* we are holding a lock that we need to exit */ error = old_msleep(... &args->mtx ...); /* here we can save a lot of code by checking * that things are still present like when called, * before returning ! */ if(args->refcount_copy == atomic_read(args->refcount_p)) { // EGONE is a new error enum, that serves this // special case only // error = EGONE; } } else { error = old_msleep(); } return error; } Another consequence of this is that our device driver does not have to contain a single line about "mtx_lock()"/"mtx_unlock()". And then one is saved the worries about having to exit a lock before returning. my_read() { int error; error = tsleep() Look at this. Here we can handle signal interruption, tsleep timeout and device removal in a single check. if(error) return error; return 0; } Then we can also call uiomove() without haveing to worry about exiting any locks. And if anything goes wrong, copyout() will return EGONE, and the function is no longer accessing any freed memory. Holding a lock while calling uiomove() should be allowed, because the lock is preventing the memory uiomove() is writing to/from, from beeing freed. Then one can actually call any function that can sleep, without worrying about exiting any locks. The only thing is that we have to make a note that these functions can exit the currently held lock or maybe we should specify currently active "callback_args". So this is just backwards: mtx_unlock(); ptr = malloc(); mtx_lock(); In 99 percent of the cases malloc will never sleep, and if the call takes some time, we pass a hint to malloc and to this unlock-/lock-ing centralized. That makes the code cleaner. Now how about this: ptr = malloc(); if(ptr == NULL) { /* no memory or EGONE */ if(check_refcount(curthread) == OK) { we can still access any softc } else { just return, nothing more to do } } But in the general case I assume that one can just return immediately if one didn't get the memory that was needed, assuming that the destructor will clean up the softc, hence we are exiting in the middle of a program, with perhaps loose ends. If the malloc() routine finds out that it has to do a lengthy memory search, then it can exit the callback arguments pointed to by "curthread->td_cba_p" and enter those arguments again, checking the refcount, before returning. This way we will never see a single line of mtx_lock/mtx_unlock in the code, in most cases. Well, is it not better to have locking centralized, than decentralized: The functions that sleep must exit/enter the callback args by default. So what do you think about this way of allowing us to hold locks while calling functions that can sleep? Assuming there are "n" callbacks in the kernel. Then there is "2*n" too many lock/unlock lines in the kernel, than strictly necessary. That is a rough calculation. The number might be a little lower, hence not all callbacks need locking. What do you think about having this "check some value before proceeding" also when returning from sleep, centralized? --HPS