From owner-svn-src-head@FreeBSD.ORG Fri Jan 16 08:03:40 2015 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7381DB5; Fri, 16 Jan 2015 08:03:40 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C4890A39; Fri, 16 Jan 2015 08:03:39 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t0G83XxQ004003 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 16 Jan 2015 10:03:33 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t0G83XxQ004003 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t0G83WPm004002; Fri, 16 Jan 2015 10:03:32 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 16 Jan 2015 10:03:32 +0200 From: Konstantin Belousov To: Hans Petter Selasky Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern Message-ID: <20150116080332.GE42409@kib.kiev.ua> References: <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org> <20150115115123.GA42409@kib.kiev.ua> <54B7AF2F.3080802@selasky.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <54B7AF2F.3080802@selasky.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Jan 2015 08:03:40 -0000 On Thu, Jan 15, 2015 at 01:14:39PM +0100, Hans Petter Selasky wrote: > On 01/15/15 12:51, Konstantin Belousov wrote: > > On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote: > >> > >> I see no leakage in that case either! > > Because these cases come through destroy_dev(). > > > >> > >> Are there more cases which I don't see? > > You are breaking existig devfs KPI by your hack. You introduce yet another > > reference on the device, which is not supposed to be there. > > Hi Konstantin, > > I need a non-sleeping way to say a character device is no longer > supposed to be used and be able to re-use the device name right away > creating a new device. I guess you got that. Yes, I got it. The devfs design is not suitable for this, and your hack is the good witness of the fact. My opinion is that you should have tried to handle the issue at the driver level, instead of making this devfs issue. I.e., if you already have cdev node with the correct name, driver should have replaced the private data to point to new device. This would also close a window where /dev node is non-existent or operate erronously. > > > > > If some code calls delist_dev(), it could be said that it is a contract > > of the new function that destroy_dev() must be called eventually on > > the cdev. Then, the reference could be said to be shared-owned by > > delist_dev() and destroy_dev(). But, for arbitrary devfs user this new > > reference is unacceptable and breaks interface. > > delist_dev() changes no references. It can be called multiple times > even, also inside destroy_devl(). Also I think that the > "destroy_dev_sched_cbl()" function should call delist_dev() first so > that we don't have a time from when the "destroy_dev_sched_cbl()" > function is called where the device entry still exists in devfs mounts > until the final destroy_devl() is done by a taskqueue. You do not understand my point. I object against imposing one additional global reference on all cdevs just to cope with the delist hack. See the patch at the end of the message. WRT destroy_dev_sched_cb() calling delist_dev(), even after calling delist_dev(), the node still exists in the /dev. It is only removed after populate loop is run sometime later. dev_sched() KPI is inheritly racy, drivers must handle the races for other reasons. > > >>>> In the case of direct free through #1, the reference count is ignored > >>>> and it doesn't matter if it is one or zero. Only in the case of > >>>> destruction through destroy_dev() it matters. > >>>> > >>>> Like the comment says in destroy_devl(): > >>>> > >>>> /* Avoid race with dev_rel() */ > >>>> > >>>> The problem is that the "cdev->si_refcount" is zero when the initial > >>>> devfs_create() is called. Then one ref is made. When we clear the > >>>> CDP_ACTIVE flag in devfs_destroy() it instructs a !parallel! running > >>>> process to destroy all the FS related structures and the reference count > >>>> goes back to zero when the "cdp" is removed from the "cdevp_list". Then > >>>> the cdev is freed too early. This happens because destroy_devl() is > >>>> dropping the dev_lock() to sleep waiting for pending references. > >>> Basically, this is very good explanation why your delist hack is wrong, > >>> for one of the reason. Another reason is explained below. > >>> You are trying to cover it with additional reference, but this is wrong > >>> as well. > >>> > >>>> > >>>> Do you see something else? > >>> > >>> I think that what you are trying to do with the CDP_ACTIVE hack is doomed > >>> anyway, because you are allowing for devfs directory to have two entries > >>> with the same name, until the populate loop cleans up the inactive one. > >>> In the meantime, any access to the directory operates on random entry. > >> > >> The entry will not be random, because upon an open() call to a character > >> device, I believe the devfs_lookup() function will be called, which > >> always populate the devfs tree at first by calls to > >> devfs_populate_xxx(). Any delisted devices which don't have the > >> "CDP_ACTIVE" bit set, will never be seen by any open. > > Entry can be random, since after the populate loop is ran, your code in > > other thread could start and create duplicate entry. There is a window > > in the lookup where both directory vnode lock and mount point sx locks > > are dropped. So running the populate does not guarantee anything. > > If there is such a race, it is already there! My patch changes nothing > in that area: > > Thread1: > Calls destroy_dev() and clears CDP_ACTIVE, after dev_unlock() it goes > waiting for some refs for xxx milliseconds. > Thread2: > Tries to create create a new character device having the same name like > the one in thread1. Device name duplication check is missed because > CDP_ACTIVE is cleared. Still thread1 is waiting. > Thread3: > Tries to open character device created by thread2 while thread1 is still > waiting for some ref held by a userspace app to go away. > > This can happen already before my patches! What do you think? Possibly. > > > > >> > >> Regarding leftover filedescriptors which still access the old "cdev" > >> this is not a problem, and these will be closed when the si_refcount > >> goes to zero after the destroy_devl() call. > >> > >>> > >>> The checks for existent names in make_dev() are performed for the reason, > >>> and you makes the rounds to effectively ignore it. > >>> > >> > >> These checks are still correct and don't conflict with my patch from > >> what I can see. Else the existing destroy_devl() would also be broken > >> even before my patch with regard to the "random" selection of character > >> devices at open() from userspace. > > > > The checks are done to avoid duplicate names. Your patch makes these > > checks ineffective (i.e. broken). > > At what level do you mean duplicate names, I don't get this fully? At > the directory level (DE nodes)? Or inside the list of character devices > (LIST_XXX)? It does not matter, dup at either one directory level, or dup of full names in the global list are equivalent (bad) things. > > > Let me summarize: > > - the extra reference on arbitrary cdev should be eliminated. The > > delist_dev_locked() may add the ref and set some CDP_ flag to > > indicate to destroy_dev() that it should do dev_rel(). > > It is possible to do this. I thought about this before doing my patch, > but decided to try to avoid adding yet another cdev flag. > > > - the existence of the duplicated entries should be either eliminated > > (I am not sure it is possible with your code), or we must ensure > > that only one name with CDP_ACTIVE flag set and given name exists. > > Then, all lookup code must be audited to take CDP_ACTIVE into account > > when accessing names. I see at least devfs_find() and devfs_mknod() > > which must be changed. I did not performed full audit. > > I will check this path out aswell. > It seems it is simpler for me to try to clean up after the commit. The patch was only lightly tested. I post the patch for discussion, not for you to committing it. I will expedite the change into HEAD after the consensus on it is made and adequate testing is performed. diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c index 294bd62..6620aef 100644 --- a/sys/fs/devfs/devfs_devs.c +++ b/sys/fs/devfs/devfs_devs.c @@ -137,12 +137,6 @@ devfs_alloc(int flags) vfs_timestamp(&ts); cdev->si_atime = cdev->si_mtime = cdev->si_ctime = ts; cdev->si_cred = NULL; - /* - * Avoid race with dev_rel() by setting the initial - * reference count to 1. This last reference is taken - * by the destroy_dev() function. - */ - cdev->si_refcount = 1; return (cdev); } @@ -192,6 +186,16 @@ devfs_find(struct devfs_dirent *dd, const char *name, int namelen, int type) continue; if (type != 0 && type != de->de_dirent->d_type) continue; + + /* + * The race with finding non-active name is not + * completely closed by the check, but it is similar + * to the devfs_allocv() in making it unlikely enough. + */ + if (de->de_dirent->d_type == DT_CHR && + (de->de_cdp->cdp_flags & CDP_ACTIVE) == 0) + continue; + if (bcmp(name, de->de_dirent->d_name, namelen) != 0) continue; break; diff --git a/sys/fs/devfs/devfs_int.h b/sys/fs/devfs/devfs_int.h index ce55416..6c57109 100644 --- a/sys/fs/devfs/devfs_int.h +++ b/sys/fs/devfs/devfs_int.h @@ -56,6 +56,7 @@ struct cdev_priv { u_int cdp_flags; #define CDP_ACTIVE (1 << 0) #define CDP_SCHED_DTR (1 << 1) +#define CDP_UNREF_DTR (1 << 2) u_int cdp_inuse; u_int cdp_maxdirent; diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c index 9153588..570f710 100644 --- a/sys/fs/devfs/devfs_vnops.c +++ b/sys/fs/devfs/devfs_vnops.c @@ -1045,6 +1045,9 @@ devfs_mknod(struct vop_mknod_args *ap) TAILQ_FOREACH(de, &dd->de_dlist, de_list) { if (cnp->cn_namelen != de->de_dirent->d_namlen) continue; + if (de->de_dirent->d_type == DT_CHR && + (de->de_cdp->cdp_flags & CDP_ACTIVE) == 0) + continue; if (bcmp(cnp->cn_nameptr, de->de_dirent->d_name, de->de_dirent->d_namlen) != 0) continue; diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c index bcd6fb9..79c8fea 100644 --- a/sys/kern/kern_conf.c +++ b/sys/kern/kern_conf.c @@ -116,6 +116,8 @@ dev_free_devlocked(struct cdev *cdev) mtx_assert(&devmtx, MA_OWNED); cdp = cdev2priv(cdev); + KASSERT((cdp->cdp_flags & CDP_UNREF_DTR) == 0, + ("destroy_dev() was not called after delist_dev(%p)", cdev)); TAILQ_INSERT_HEAD(&cdevp_free_list, cdp, cdp_list); } @@ -1035,6 +1037,7 @@ destroy_devl(struct cdev *dev) { struct cdevsw *csw; struct cdev_privdata *p; + struct cdev_priv *cdp; mtx_assert(&devmtx, MA_OWNED); KASSERT(dev->si_flags & SI_NAMED, @@ -1043,7 +1046,18 @@ destroy_devl(struct cdev *dev) ("WARNING: Driver mistake: destroy_dev on eternal %d\n", dev2unit(dev))); - devfs_destroy(dev); + cdp = cdev2priv(dev); + if ((cdp->cdp_flags & CDP_UNREF_DTR) == 0) { + /* + * Avoid race with dev_rel(), e.g. from the populate + * loop. If CDP_UNREF_DTR flag is set, the reference + * to be dropped at the end of destroy_devl() was + * already taken by delist_dev_locked(). + */ + dev_refl(dev); + + devfs_destroy(dev); + } /* Remove name marking */ dev->si_flags &= ~SI_NAMED; @@ -1103,19 +1117,27 @@ destroy_devl(struct cdev *dev) } } dev->si_flags &= ~SI_ALIAS; - dev->si_refcount--; /* Avoid race with dev_rel() */ + cdp->cdp_flags &= ~CDP_UNREF_DTR; + dev->si_refcount--; - if (dev->si_refcount > 0) { + if (dev->si_refcount > 0) LIST_INSERT_HEAD(&dead_cdevsw.d_devs, dev, si_list); - } else { + else dev_free_devlocked(dev); - } } static void delist_dev_locked(struct cdev *dev) { + struct cdev_priv *cdp; struct cdev *child; + + mtx_assert(&devmtx, MA_OWNED); + cdp = cdev2priv(dev); + if ((cdp->cdp_flags & CDP_UNREF_DTR) != 0) + return; + cdp->cdp_flags |= CDP_UNREF_DTR; + dev_refl(dev); devfs_destroy(dev); LIST_FOREACH(child, &dev->si_children, si_siblings) delist_dev_locked(child); @@ -1124,6 +1146,7 @@ delist_dev_locked(struct cdev *dev) void delist_dev(struct cdev *dev) { + dev_lock(); delist_dev_locked(dev); dev_unlock();