From owner-svn-src-head@FreeBSD.ORG  Sun Jan 18 11:32:01 2015
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 88B40514;
 Sun, 18 Jan 2015 11:32:01 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 2872EE4F;
 Sun, 18 Jan 2015 11:32:00 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t0IBVprB089472
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sun, 18 Jan 2015 13:31:51 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t0IBVprB089472
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id t0IBVp81089471;
 Sun, 18 Jan 2015 13:31:51 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sun, 18 Jan 2015 13:31:51 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Hans Petter Selasky <hps@selasky.org>
Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern
Message-ID: <20150118113150.GQ42409@kib.kiev.ua>
References: <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org>
 <20150115093841.GX42409@kib.kiev.ua> <54B79B25.3070707@selasky.org>
 <20150115115123.GA42409@kib.kiev.ua> <54B7AF2F.3080802@selasky.org>
 <20150116080332.GE42409@kib.kiev.ua> <54B8D31B.9030805@selasky.org>
 <20150118105410.GN42409@kib.kiev.ua> <54BB942A.5070200@selasky.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <54BB942A.5070200@selasky.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
 <svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head/>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 18 Jan 2015 11:32:01 -0000

On Sun, Jan 18, 2015 at 12:08:26PM +0100, Hans Petter Selasky wrote:
> Hi Konstantin,
> 
> On 01/18/15 11:54, Konstantin Belousov wrote:
> > On Fri, Jan 16, 2015 at 10:00:11AM +0100, Hans Petter Selasky wrote:
> >> Hi Konstantin,
> >>
> >> On 01/16/15 09:03, Konstantin Belousov wrote:
> >>> My opinion is that you should have tried to handle the issue at the driver
> >>> level, instead of making this devfs issue.  I.e., if you already have
> >>> cdev node with the correct name, driver should have replaced the private
> >>> data to point to new device.
> >>
> >> I think this way cannot be implemented in a clean way, because of
> >> locking order reversal. And if you try to avoid the LOR you end up with
> >> the race. Chess mate sort of ;-)  Let me explain:
> >>
> >> Thread 1:
> >> usb_sx_lock();
> >>     cdev = Look in freelist for existing device();
> >> else
> >>     cdev = make_dev();
> >> usb_sx_unlock();
> >>
> >> Thread 2:
> >> usb_sx_lock();
> >> put cdev on freelist
> >> usb_sx_unlock();
> >>
> >> Thread 3:
> >> usb_sx_lock();
> >> cdev = remove first entry in freelist
> >> usb_sx_unlock();
> >>
> >> /*
> >>    * XXX because USB needs to call destroy_dev() unlocked we
> >>    * are racing with Thread 1 again
> >>    */
> >> destroy_dev(cdev);
> > I do not understand the 'freelist' part.
> >
> > You have some usb (or whatever other kind) device, which is represented by
> > some data structure.  This data structure references cdev.  In reverse,
> > cdev points to this data structure by si_drv1 or similar driver-owned
> > member of struct cdev.
> >
> > The structures naming usb devices have some mapping to/from usb bus addresses.
> > When device is destroyed or created due to external plug event, there is
> > some mechanism that ensures mapping consistence.  It could be lock, it
> > could be single-threaded process which discover the bus, or something
> > else, I do not know.
> >
> > While the structure notes that device with some address goes out and comes
> > in, it can look up the corresponding cdev and just change the si_drv1 pointer
> > to point to the new device.
> 
> What about events for "devd" that a new character devices is present in 
> the system or being destroyed for user-space applications?
It is up to the driver to decide.  If wanted, it could post the pair of
destroy/create itself (e.g., if there is some state which require
the userspace to reset), or it could decide not to.  The later is
reasonable if the destroy/create is believed to be caused by a glitch
rather than the actual replug.

> 
> >
> > I find it very strange that you need sleepless operation which can
> > _both_ destroy and create cdev. At least one operation of creation or
> > destruction must sleep, at least in the current devfs design. It could
> > be made sleepless, by making VFS visible part even less connected to the
> > cdev part, but this is not how it (can) currently work.
> 
> I think it would be good practice that all resources needed for creating 
> a character device can be allocated prior to registration. That means we 
> first should allocate all resources needed, but not register as a single 
> function.
> 
> For example:
> 
> Once make_dev() has returned, we can get an "d_open" callback. But 
> "si_drv1/2" is always set by drivers _after_ that "make_dev()" has 
> returned! This causes a race we could simply avoid by splitting the 
> make_dev() like I suggest. Instead of putting more logic and code inside 
> the drivers to handle the race!
I wanted to tunnel the si_drv values through the make_dev().
I.e. there could appear yet another variant of make_dev() which takes
the initialization parameters for the si_* driver vars.

I just did not have time/motivation to do the pass.
> 
> >>
> >>>
> >>> This would also close a window where /dev node is non-existent or operate
> >>> erronously.
> >>
> >> I'm not saying I plan so, but I think "cdevs" at some point need to
> >> understand mutexes and locks. That means like with other API's in the
> >> kernel we can associate a lock with the "cdev", and this lock is then
> >> used to ensure an atomic shutdown of the system in a non-blocking
> >> fashion. In my past experience multithreaded APIs should be high level
> >> implemented like this:
> >>
> >> NON-BLOCKING methods:
> >> lock(); **
> >> xxx_start();
> >> xxx_stop();
> >> unlock();
> >>
> >> BLOCKING methods:
> >> setup(); // init
> >> unsetup(); // drain
> >>
> >> Any callbacks should always be called locked **
> >>
> >> In devfs there was no non-blocking stop before I added the delist_dev()
> >> function.
> >>
> >>>
> >>> You do not understand my point.
> >>>
> >>> I object against imposing one additional global reference on all cdevs
> >>> just to cope with the delist hack.  See the patch at the end of the message.
> >>
> >> It's fine by me.
> >>>
> >>> WRT destroy_dev_sched_cb() calling delist_dev(), even after calling
> >>> delist_dev(), the node still exists in the /dev. It is only removed
> >>> after populate loop is run sometime later. dev_sched() KPI is inheritly
> >>> racy, drivers must handle the races for other reasons.
> >>
> >> The populate loop is all running under the dev_lock() from what I can
> >> see and make_dev() is also keeping the same lock when inserting and
> >> removing new cdevs. The populate loop should always give a consistent
> > No, populate loop is not run under dev_mtx.
> 
> Are you sure:
Yes, I am sure.  I explained it in the next sentences which you break from
the first one in the citation.

> 
> > static int
> > devfs_populate_loop(struct devfs_mount *dm, int cleanup)
> > {
> >         struct cdev_priv *cdp;
> >         struct devfs_dirent *de;
> >         struct devfs_dirent *dd, *dt;
> >         struct cdev *pdev;
> >         int de_flags, depth, j;
> >         char *q, *s;
> >
> >         sx_assert(&dm->dm_lock, SX_XLOCKED);
> >         dev_lock();
>            ^^^^ what is this ?
> >         TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) {
> 
> > It would not be able to
> > allocate memory, even in M_NOWAIT fashion.  The dev_mtx is after the
> > vm and vnode locks.  Only iteration over the cdevp_list is protected by
> > dev_mtx, which is dropped right after something which require an
> > action, is found on the list. Populate() needs some way to avoid
> > reading inconsistent data from the cdev layer, but there is not
> > guarantee that we are always up to date.
^^^^^^^^^

> >
> >> view of the character devices available, and I don't see how "cdev"
> >> structures without the CDP_ACTIVE flag can appear with recently created
> >> ones, even if the name is the same.
> > It cannot simply because it is not synchronized with the device
> > creation/destruction.
> 
> 
> 
> > Lookup (or any other VFS-level code) is not protected by dev_mtx, it
> > is only dm lock which ensures that populate loop is not run in parallel,
> > modifying direntries.
> 
> See comment above.
> 
> >
> >>
> >> That system functions can still call into the dangling read/write/ioctl
> >> functions is another story, and that is why I tell, that in order to
> >> simplify this teardown, we possibly should associate a client selectable
> >> lock with each CDEV, for teardown purposes. Like done for several years
> >> in the callout and USB APIs and possibly many other places.
> > There is d_purge method which provides the tear-down semantic for
> > drivers which needs it.  Only tty uses the method AFAIK.
> >
> > Devfs ensures that no client code is active when cdev is removed for real.
> 
> --HPS