From owner-svn-src-head@FreeBSD.ORG Sun Jan 18 11:32:01 2015 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 88B40514; Sun, 18 Jan 2015 11:32:01 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2872EE4F; Sun, 18 Jan 2015 11:32:00 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t0IBVprB089472 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 18 Jan 2015 13:31:51 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t0IBVprB089472 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t0IBVp81089471; Sun, 18 Jan 2015 13:31:51 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 18 Jan 2015 13:31:51 +0200 From: Konstantin Belousov To: Hans Petter Selasky Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern Message-ID: <20150118113150.GQ42409@kib.kiev.ua> References: <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org> <20150115115123.GA42409@kib.kiev.ua> <54B7AF2F.3080802@selasky.org> <20150116080332.GE42409@kib.kiev.ua> <54B8D31B.9030805@selasky.org> <20150118105410.GN42409@kib.kiev.ua> <54BB942A.5070200@selasky.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <54BB942A.5070200@selasky.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 18 Jan 2015 11:32:01 -0000 On Sun, Jan 18, 2015 at 12:08:26PM +0100, Hans Petter Selasky wrote: > Hi Konstantin, > > On 01/18/15 11:54, Konstantin Belousov wrote: > > On Fri, Jan 16, 2015 at 10:00:11AM +0100, Hans Petter Selasky wrote: > >> Hi Konstantin, > >> > >> On 01/16/15 09:03, Konstantin Belousov wrote: > >>> My opinion is that you should have tried to handle the issue at the driver > >>> level, instead of making this devfs issue. I.e., if you already have > >>> cdev node with the correct name, driver should have replaced the private > >>> data to point to new device. > >> > >> I think this way cannot be implemented in a clean way, because of > >> locking order reversal. And if you try to avoid the LOR you end up with > >> the race. Chess mate sort of ;-) Let me explain: > >> > >> Thread 1: > >> usb_sx_lock(); > >> cdev = Look in freelist for existing device(); > >> else > >> cdev = make_dev(); > >> usb_sx_unlock(); > >> > >> Thread 2: > >> usb_sx_lock(); > >> put cdev on freelist > >> usb_sx_unlock(); > >> > >> Thread 3: > >> usb_sx_lock(); > >> cdev = remove first entry in freelist > >> usb_sx_unlock(); > >> > >> /* > >> * XXX because USB needs to call destroy_dev() unlocked we > >> * are racing with Thread 1 again > >> */ > >> destroy_dev(cdev); > > I do not understand the 'freelist' part. > > > > You have some usb (or whatever other kind) device, which is represented by > > some data structure. This data structure references cdev. In reverse, > > cdev points to this data structure by si_drv1 or similar driver-owned > > member of struct cdev. > > > > The structures naming usb devices have some mapping to/from usb bus addresses. > > When device is destroyed or created due to external plug event, there is > > some mechanism that ensures mapping consistence. It could be lock, it > > could be single-threaded process which discover the bus, or something > > else, I do not know. > > > > While the structure notes that device with some address goes out and comes > > in, it can look up the corresponding cdev and just change the si_drv1 pointer > > to point to the new device. > > What about events for "devd" that a new character devices is present in > the system or being destroyed for user-space applications? It is up to the driver to decide. If wanted, it could post the pair of destroy/create itself (e.g., if there is some state which require the userspace to reset), or it could decide not to. The later is reasonable if the destroy/create is believed to be caused by a glitch rather than the actual replug. > > > > > I find it very strange that you need sleepless operation which can > > _both_ destroy and create cdev. At least one operation of creation or > > destruction must sleep, at least in the current devfs design. It could > > be made sleepless, by making VFS visible part even less connected to the > > cdev part, but this is not how it (can) currently work. > > I think it would be good practice that all resources needed for creating > a character device can be allocated prior to registration. That means we > first should allocate all resources needed, but not register as a single > function. > > For example: > > Once make_dev() has returned, we can get an "d_open" callback. But > "si_drv1/2" is always set by drivers _after_ that "make_dev()" has > returned! This causes a race we could simply avoid by splitting the > make_dev() like I suggest. Instead of putting more logic and code inside > the drivers to handle the race! I wanted to tunnel the si_drv values through the make_dev(). I.e. there could appear yet another variant of make_dev() which takes the initialization parameters for the si_* driver vars. I just did not have time/motivation to do the pass. > > >> > >>> > >>> This would also close a window where /dev node is non-existent or operate > >>> erronously. > >> > >> I'm not saying I plan so, but I think "cdevs" at some point need to > >> understand mutexes and locks. That means like with other API's in the > >> kernel we can associate a lock with the "cdev", and this lock is then > >> used to ensure an atomic shutdown of the system in a non-blocking > >> fashion. In my past experience multithreaded APIs should be high level > >> implemented like this: > >> > >> NON-BLOCKING methods: > >> lock(); ** > >> xxx_start(); > >> xxx_stop(); > >> unlock(); > >> > >> BLOCKING methods: > >> setup(); // init > >> unsetup(); // drain > >> > >> Any callbacks should always be called locked ** > >> > >> In devfs there was no non-blocking stop before I added the delist_dev() > >> function. > >> > >>> > >>> You do not understand my point. > >>> > >>> I object against imposing one additional global reference on all cdevs > >>> just to cope with the delist hack. See the patch at the end of the message. > >> > >> It's fine by me. > >>> > >>> WRT destroy_dev_sched_cb() calling delist_dev(), even after calling > >>> delist_dev(), the node still exists in the /dev. It is only removed > >>> after populate loop is run sometime later. dev_sched() KPI is inheritly > >>> racy, drivers must handle the races for other reasons. > >> > >> The populate loop is all running under the dev_lock() from what I can > >> see and make_dev() is also keeping the same lock when inserting and > >> removing new cdevs. The populate loop should always give a consistent > > No, populate loop is not run under dev_mtx. > > Are you sure: Yes, I am sure. I explained it in the next sentences which you break from the first one in the citation. > > > static int > > devfs_populate_loop(struct devfs_mount *dm, int cleanup) > > { > > struct cdev_priv *cdp; > > struct devfs_dirent *de; > > struct devfs_dirent *dd, *dt; > > struct cdev *pdev; > > int de_flags, depth, j; > > char *q, *s; > > > > sx_assert(&dm->dm_lock, SX_XLOCKED); > > dev_lock(); > ^^^^ what is this ? > > TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) { > > > It would not be able to > > allocate memory, even in M_NOWAIT fashion. The dev_mtx is after the > > vm and vnode locks. Only iteration over the cdevp_list is protected by > > dev_mtx, which is dropped right after something which require an > > action, is found on the list. Populate() needs some way to avoid > > reading inconsistent data from the cdev layer, but there is not > > guarantee that we are always up to date. ^^^^^^^^^ > > > >> view of the character devices available, and I don't see how "cdev" > >> structures without the CDP_ACTIVE flag can appear with recently created > >> ones, even if the name is the same. > > It cannot simply because it is not synchronized with the device > > creation/destruction. > > > > > Lookup (or any other VFS-level code) is not protected by dev_mtx, it > > is only dm lock which ensures that populate loop is not run in parallel, > > modifying direntries. > > See comment above. > > > > >> > >> That system functions can still call into the dangling read/write/ioctl > >> functions is another story, and that is why I tell, that in order to > >> simplify this teardown, we possibly should associate a client selectable > >> lock with each CDEV, for teardown purposes. Like done for several years > >> in the callout and USB APIs and possibly many other places. > > There is d_purge method which provides the tear-down semantic for > > drivers which needs it. Only tty uses the method AFAIK. > > > > Devfs ensures that no client code is active when cdev is removed for real. > > --HPS