Date: Sun, 10 Mar 2019 11:47:58 +0200 From: Konstantin Belousov <kostikbel@gmail.com> To: Hans Petter Selasky <hps@selasky.org> Cc: Warner Losh <imp@bsdimp.com>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, "O'Connor, Daniel" <darius@dons.net.au> Subject: Re: USB stack getting confused Message-ID: <20190310094758.GP2492@kib.kiev.ua> In-Reply-To: <fd5038a4-406b-6e4b-bb52-b567b1954ad1@selasky.org> References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au> <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org> <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org> <20190309162640.GN2492@kib.kiev.ua> <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com> <20190309192330.GO2492@kib.kiev.ua> <fd5038a4-406b-6e4b-bb52-b567b1954ad1@selasky.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Mar 09, 2019 at 10:35:28PM +0100, Hans Petter Selasky wrote: > On 3/9/19 8:23 PM, Konstantin Belousov wrote: > > On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote: > >> > >> Is there a form of destroy_dev() that does a revoke on all open instances? > >> Eg, this is gone, you can't use it anymore, and all further attempts to use > >> the device will generate an error, but in the mean time we destroy the > >> device and let the detach routine get on with life. waiting may make sense > >> when you are merely unloading the driver (and getting to the detach routine > >> that way), but when the device is gone, I've come around to the point of > >> view that we should just destroy it w/o waiting for closes and anybody that > >> touches it afterwards gets an error and has to cope with the error. But > >> even in the unload case, we maybe we shouldn't get to the detach routine > >> unless we're forcing and/or the detach routine just returns EBUSY since the > >> only one that knows what dev_t's are associated with the device_t is the > >> driver itself. > > You are asking very basic questions about devfs there. > > > > destroy_dev(9) waits for two things: > > - that all threads left the cdevsw methods for the given device; > > - that all cdevpriv destructors finished running. > > Hi, > > > To facilitate waking up threads potentially sleeping inside the cdevsw > > methods, drivers might implement d_purge method which must weed out sleeping > > threads from inside the code in the bound time. > > USB will purge all callers before calling destroy_dev(). This is not the > problem. > > > After that we return from destroy_dev(9) and guarantee that no new calls > > into cdevsw is done for this device. devfs magic consumes the fo_ and > > VOP_ calls and does not allow them to reach into the driver. > > When I designed the current USB devfs it was important to me to keep > open() and close() calls balanced to avoid situations where an open call > may setup some resource and then close(), which free this resource > again, never gets called. destroy_dev(9) makes no such guarantee, and I > think that is a failure of destroy_dev(9). That's when I started using > the cdev's destructor callback function. Lets correct the terminology first. Are you referring to the d_open/d_close pairing ? Without D_TRACKCLOSE, d_close() is only called on the last close of the device. With D_TRACKCLOSE, devfs _tries_ to call d_close each time it sees the VOP_CLOSE() operation from VFS, but due to way VFS works VOP_CLOSE() could be missed. Also, d_open vs d_close are not synchronized, so a driver might get call to d_open in parallel to last d_close. What do you mean by cdev destructor callback function ? Do you mean callback from destroy_dev_cb(), or do you actually reference the destructors from devfs_set_cdevpriv(9) ? If the later, then destroy_dev() guarantees that all cdevpriv destructors for all file descriptors opened against the destroyed cdev are finished before destroy_dev() returns. In other words, if you use cdevpriv, you can remove the drain for your refcount and everything should just work. > > > So what usb does there is actively defeating existing mechanism by > > keeping internal refcount on opens and refusing to call destroy_dev() > > until the count goes to zero > > The FreeBSD USB stack also is used in environments w/o DEVFS and need > own refcounts. I completely disagree with use of code sharing as excuse for FreeBSD bugs. > > > (I did not read the usb code, but I believe > > that I am not too wrong). > >Would usb core just destroy_dev() when the > > physical device goes away, then at worst the existing file descriptors > > opened against the lost devices would become dead (not same dead as > > terminals after revoke(2), but very similar). > > Yes, I can do that if destroy_dev() ensures that d_close is called for > all open file handles once and only once before it returns. I think this > is where the problem comes from. See above. For d_close it is impossible, for cdevpriv dtr it is already there by design. > > > > > If the problem is due to keeping some instance data for the opened device, > > then cdevpriv might be the better fit (at least the KPI was designed > > to be) than blocking destroy until all users are gone. > > > > The USB stack does not use MMAP, so this is not a problem. I do not follow, why does it matter ? On Sat, Mar 09, 2019 at 10:40:02PM +0100, Hans Petter Selasky wrote: > On 3/9/19 8:28 PM, Rozhuk Ivan wrote: > > On Sat, 9 Mar 2019 18:26:40 +0200 > > Konstantin Belousov <kostikbel@gmail.com> wrote: > > > >> In fact I saw something similar with apcupsd and either usb/com > >> adapters or native usb control card for APC UPSes. For reasons I do > >> not understand, these devices are often disconnected. For older > >> versions of apcupsd, it required restart for newly reattached device > >> to be recreated in /dev. Sometimes it hangs whole usb stack. > >> > >> Newer apcupsd seems to open /dev/ugen only for the duration of the > >> query, which makes the erratic behaviour is much less likely, but > >> could still cause breakage when device disappear while apcupsd has it > >> opened. > >> > > > > Same problem with usb sound cards. > > I try to fix it, but fail with dsp, only mixer can be fixed with small code change. > > https://reviews.freebsd.org/D11140 > > > > Hi, > > How will these apps detect that they need to open the new /dev/mixer node? > > I mean, after hang is fixed, mixer app will still try to query the old > file handle forever? Userspace gets either ENXIO or EIO from syscalls. For polls, it gets POLLIN | POLLHUP immediately.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190310094758.GP2492>