Date:      Sat, 9 Mar 2019 21:23:30 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Warner Losh <imp@bsdimp.com>
Cc:        Hans Petter Selasky <hps@selasky.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, "O'Connor, Daniel" <darius@dons.net.au>
Subject:   Re: USB stack getting confused
Message-ID:  <20190309192330.GO2492@kib.kiev.ua>
In-Reply-To: <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com>
References:  <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au> <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org> <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org> <20190309162640.GN2492@kib.kiev.ua> <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com>

On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote:
> On Sat, Mar 9, 2019 at 11:25 AM Konstantin Belousov <kostikbel@gmail.com>
> wrote:
> 
> > On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote:
> > > On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> > > > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> > > >>
> > > >>
> > > >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <hps@selasky.org> wrote:
> > > >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> > > >>>> My program normally runs continually doing acquisitions of data for
> > > >>>> N seconds, doing some checks and restarting. After a while (~30 1 minute
> > > >>>> acquisitions or ~8 30 minute ones) my program can't 'see' the device (it
> > > >>>> uses libusb10) any more (it reconnects each acquisition for $REASONS). Also
> > > >>>> pretty weirdly usbconfig can't see it either(!).
> > > >>>
> > > >>> What is printed in dmesg? Maybe the device has a problem.
> > > >>
> > > >> There is nothing in dmesg - no disconnect / reconnect etc.
> > > >>
> > > >> If I hold the user space process in gdb 'forever' (eg over night)
> > > >> usbconfig doesn't see the device, but the moment I quit the user space
> > > >> process it can be seen again.
> > > >
> > > > Does it mean that the file descriptor opened for ugen has a chance to
> > > > be closed ?
> > >
> > > The USB stack will wait for all FDs to be closed during detach also via
> > > destroy_dev().
> > So my guess was correct.  Do you agree that this behaviour is wrong ?
> >
> > In fact I saw something similar with apcupsd and either usb/com adapters
> > or native usb control card for APC UPSes.  For reasons I do not understand,
> > these devices are often disconnected.  For older versions of apcupsd,
> > it required restart for newly reattached device to be recreated in /dev.
> > Sometimes it hangs whole usb stack.
> >
> > Newer apcupsd seems to open /dev/ugen only for the duration of the query,
> > which makes the erratic behaviour much less likely, but it could still
> > cause breakage when the device disappears while apcupsd has it opened.
> >
> 
> Is there a form of destroy_dev() that does a revoke on all open instances?
> Eg, this is gone, you can't use it anymore, and all further attempts to use
> the device will generate an error, but in the mean time we destroy the
> device and let the detach routine get on with life. Waiting may make sense
> when you are merely unloading the driver (and getting to the detach routine
> that way), but when the device is gone, I've come around to the point of
> view that we should just destroy it w/o waiting for closes and anybody that
> touches it afterwards gets an error and has to cope with the error. But
> even in the unload case, maybe we shouldn't get to the detach routine
> unless we're forcing and/or the detach routine just returns EBUSY since the
> only one that knows what dev_t's are associated with the device_t is the
> driver itself.
You are asking very basic questions about devfs there.

destroy_dev(9) waits for two things:
- that all threads have left the cdevsw methods for the given device;
- that all cdevpriv destructors have finished running.
To facilitate waking up threads potentially sleeping inside the cdevsw
methods, drivers may implement the d_purge method, which must evict
sleeping threads from the driver code in bounded time.

After that we return from destroy_dev(9) and guarantee that no new calls
into cdevsw are made for this device.  devfs magic consumes the fo_ and
VOP_ calls and does not allow them to reach into the driver.
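
The waiting and the "no new calls" guarantee described above can be modelled
in userspace.  This is only a sketch under stated assumptions, not the devfs
code: struct model_dev and the model_dev_*() functions are hypothetical names,
the purge callback merely plays the role of d_purge, and ENXIO is assumed as
the error a caller sees once the device is gone.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a cdev; not the kernel structure. */
struct model_dev {
	pthread_mutex_t	lock;
	pthread_cond_t	drained;
	int		threads_inside;	/* threads now inside a "cdevsw method" */
	bool		destroyed;	/* set once, at the start of destroy */
	void		(*purge)(struct model_dev *);	/* role of d_purge */
};

static void
model_dev_init(struct model_dev *d, void (*purge)(struct model_dev *))
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->drained, NULL);
	d->threads_inside = 0;
	d->destroyed = false;
	d->purge = purge;
}

/* Gate in front of every driver method: refuses entry after destroy. */
static int
model_dev_enter(struct model_dev *d)
{
	int error = 0;

	pthread_mutex_lock(&d->lock);
	if (d->destroyed)
		error = ENXIO;		/* assumed error code for a dead device */
	else
		d->threads_inside++;
	pthread_mutex_unlock(&d->lock);
	return (error);
}

/* Called when a thread returns from a driver method. */
static void
model_dev_leave(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	assert(d->threads_inside > 0);
	if (--d->threads_inside == 0)
		pthread_cond_broadcast(&d->drained);
	pthread_mutex_unlock(&d->lock);
}

/* Model of the destroy step: block new entries, purge, then drain. */
static void
model_dev_destroy(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->destroyed = true;		/* no new entries from now on */
	pthread_mutex_unlock(&d->lock);

	if (d->purge != NULL)
		d->purge(d);		/* driver hook: wake its sleepers */

	pthread_mutex_lock(&d->lock);
	while (d->threads_inside > 0)
		pthread_cond_wait(&d->drained, &d->lock);
	pthread_mutex_unlock(&d->lock);
}
```

In this model the destroy step never looks at how many descriptors are open;
it only waits for threads that are inside a method at that moment, which is
why a purge hook that wakes sleepers in bounded time is enough to let it
finish.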

So what usb does there is actively defeating the existing mechanism by
keeping an internal refcount on opens and refusing to call destroy_dev()
until the count goes to zero (I did not read the usb code, but I believe
that I am not too wrong).  If the usb core just called destroy_dev() when
the physical device goes away, then at worst the existing file descriptors
opened against the lost devices would become dead (not the same dead as
terminals after revoke(2), but very similar).

If the problem is due to keeping some instance data for the opened device,
then cdevpriv might be a better fit (at least the KPI was designed
to be) than blocking destroy until all users are gone.
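
The idea of tying instance data to each open, with a destructor, instead of
holding the device until the last close, can be sketched in userspace as
well.  This is a hypothetical model and not the cdevpriv KPI itself: struct
open_priv, struct priv_dev and the priv_*() functions are invented for
illustration, and the model assumes that a destructor runs exactly once, at
close or at device destruction, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-open record, loosely playing the role of cdevpriv data. */
struct open_priv {
	void			*data;
	void			(*dtor)(void *);
	struct open_priv	*next;
	int			torn_down;	/* destructor already ran */
};

/* Hypothetical device: just the list of records attached to it. */
struct priv_dev {
	struct open_priv	*opens;
};

/* Example destructor: counts how many times it has been run. */
static void
count_dtor(void *data)
{
	++*(int *)data;
}

/* "open": attach instance data and its destructor to the device. */
static struct open_priv *
priv_open(struct priv_dev *d, void *data, void (*dtor)(void *))
{
	struct open_priv *p;

	assert(dtor != NULL);
	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return (NULL);
	p->data = data;
	p->dtor = dtor;
	p->next = d->opens;
	d->opens = p;
	return (p);
}

/* Run the destructor at most once per record. */
static void
priv_teardown(struct open_priv *p)
{
	if (!p->torn_down) {
		p->torn_down = 1;
		p->dtor(p->data);
	}
}

/* "close": unlink the record, run its destructor if still pending, free. */
static void
priv_close(struct priv_dev *d, struct open_priv *p)
{
	struct open_priv **pp;

	for (pp = &d->opens; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next;
			break;
		}
	}
	priv_teardown(p);
	free(p);
}

/* Device gone: run every pending destructor now, without waiting for close. */
static void
priv_dev_destroy(struct priv_dev *d)
{
	struct open_priv *p;

	for (p = d->opens; p != NULL; p = p->next)
		priv_teardown(p);
}
```

With this shape the destroy step never has to wait for userspace to close
anything: the instance data is released when the device goes away, and a
later close of a dead descriptor only frees the bookkeeping record.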


