Date: Sat, 15 Mar 2008 21:48:09 +0200 From: Kostik Belousov <kostikbel@gmail.com> To: Ed Schouten <ed@80386.nl> Cc: FreeBSD Arch <arch@freebsd.org> Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads? Message-ID: <20080315194809.GN10374@deviant.kiev.zoral.com.ua> In-Reply-To: <20080316015903.N39516@delplex.bde.org> References: <20080315124008.GF80576@hoeg.nl> <20080316015903.N39516@delplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--h/ohfBjN02kAJu/T Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Mar 16, 2008 at 03:55:18AM +1100, Bruce Evans wrote: > On Sat, 15 Mar 2008, Ed Schouten wrote: >=20 > >The last couple of days I'm seeing some strange things in my mpsafetty > >branch related to terminal revocation. > > > >In my current TTY design, I hold a count (t_ldisccnt) of the amount of > >threads that are sleeping in the line discipline. I need to store such a > >count, because it's not possible to change line disciplines while some > >threads are still blocked inside the discipline. This means that when > >d_close() is called on a TTY, t_ldisccnt should always be 0. There > >cannot be any threads stuck inside the line discipline when there aren't > >any descriptors referencing it. > > > >Unfortunately, this isn't entirely true with the current VFS/devfs > >design. When vgone() is called, a VOP_CLOSE() is performed , which means > >there could be a dozen threads still stuck inside a device driver, but > >the close routine is already called to clean up stuff. There are a > >*real* lot of drivers that blindly clean up their stuff in the d_close() > >routine, expecting that the device is completely unused. This can > >easily be demonstrated by revoking a bpf device, while running tcpdump. >=20 > Yes, most drivers are broken here, but the problem is rarely noticed > because revoke() isn't normally applied to any devices except ttys. > Even ordinary close() can cause problems when a thread is sleeping > in device open, but this too is only common for ttys (for callin and > callout devices). >=20 > The tty driver is about the only driver that handles this problem > almost correctly. It uses a generation count. All tty drivers are > supposed to sleep using only ttysleep(). ttysleep() checks the > generation count and returns ERESTART if the generation is new. All > tty drivers should consider this error to be fatal and propagate it > up to the syscall level where the syscall is restarted. This tends > to happen naturally, but some places (in device close IIRC), the driver > ignores the error and does more i/o (to finish cleaning up in close > -- close and open can easily pass each other and clobber each others > state when this happens). More I/O also tends to occur if a revoke() > happens when a thread is blocked but not sleeping. Then ttysleep() > isn't in sight, so the thread has no idea that the generation count > changed. Giant locking limits this problem. >=20 > >To be honest, I'm not completely sure how to solve this issue, though I > >know it should at least do something similar to this: > > > >- The device driver should have a seperate routine (d_revoke) to wake > > up any blocked threads, to make sure they leave the device driver > > properly. >=20 > Something is needed to signal blocked but non-sleeping threads. I > think the wakeup for ttys now normally occurs as a side effect of > flushing i/o. revoke() normally calls ttyclose() which calls ttyldclose() > which normally calls ttylclose() which flushes i/o which wakes up > threads waiting on the i/o. I don't see how ttylclose() can work right > in the usual !FNONBLOCK case. Maybe revoke() sets FNONBLOCK. The > generation count stuff doesn't help here because the flush is done > before incrementing the generation count. >=20 > There is an obvious race here for threads doing i/o instead of waiting > for it. These muse be blocked (by Giant now for ttys, or by your > MPSAFE locking). They will run when revoke() releases the lock and > find the i/o flushed and maybe the generation count incrememented, but > they normally won't check these states and will just blunder on doing > more i/o. It would be painful to check these states every time the > lock is aquired, but this seems to be necessary. Magic Giant locking > makes the places where the lock is acquired hard see. >=20 > >- Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the > > vnode into deadfs, with the exception of the close() routine. Maybe > > it's better to add a new function to do this, vrevoke(). > > > >This means that when a revoke() call is performed, all blocked threads > >are woken up, will leave the driver, to find out their terminal has been > >revoked. Further system calls will fail, because the vnode is in deadfs, > >but when the processes close the descriptor, the device driver can still > >clean up everything. >=20 > I think vfs already moves the vnode to deadfs. It doesn't do anything > to synchronize with threads running in device drivers. The forced > last-close() should complete synchronously as part of revoke(). Then > other threads leave the device driver asynchronously, hopefully not > much later. Then if the generation count stuff is working right, the > syscall is restarted, but now file descriptors point to deadfs so the > syscall normally fails. I think the async completion is OK provided > it is done right (don't delay it indefinitely, and don't do more > i/o on completion). It doesn't seem to be useful to make revoke() > wait for the completions. >=20 > I don't think it would work well to move everything except d_close to > deadfs. >=20 > Other problems near here: > - neither vfs nor drivers currently know how many threads are in a > driver. vfs uses vp->v_rdev->si_usecount, but this doesn't quite work This is provided by si_threadcount. See the dev(vn)_refthread and it usage in the devfs vnops and fops. > since it doesn't count threads sleeping in open. Maybe ones excuting > last close too -- this would be more of a problem. revoke() just > uses vcount(), which just acquires the device locks and returns > si_usecount after releasing the device lock. (I don't understand > this locking -- what stops the count changing after the lock is > released, or if it cannot changed then why acquire the lock?) This > can result in revoke() not calling device close when it should. > Drivers can obviously keep count of their activities using large > code. I can't see any way for vfs to keep count short of asking > drivers for their counts. > - there can be any number of threads in device open and close concurrentl= y, > even without the complications for revoke(). The most problematic > cases happen when last-close blocks, as is common for ttys waiting > for output to drain (since no one cares about their output actually > working and ensures draining it using tcdrain() -- normal losing > programs finish up with something like printf(); exit(); and depend > on the close() in exit(); blocking to drain the output). Then new > opens are allowed, and this is useful for doing ioctls() to unblocked > blocked closes. If the new open or fcntl sets non-blocking mode, then > the last-close for the new open may pass the blocked last close. If > the new mode is blocking, then the last-close for the new open may block > too. The number of threads in last-close is thus unlimited. A thunder= ing > herd of them tends to stomp on each other when they are all unblocked at > the same time. >=20 > The connections of this with revoke() are: > - it takes vfs's not counting of all threads in the device driver to > allow the useful behaviour of opens while a close is blocked and > the necessary behaviour of last-close while another last-close is > executing (drivers should be aware of this possibility and merge > the closes, but don't). > - I think revoke() sets FNONBLOCK somewhere. Thus it tends to unblock > any thread waiting in last-close for output to drain. >=20 > Less problematic cases occur when opens block. ttyopen() understands > this possibility and handles it almost right using its t_wopeners > count. ttyopen() uses various sleeps where it should use ttysleep() > or check the generation count itself; this results in it looping > internally instead of restarting the syscall, which is only a small > error since for open() alone, restarting the syscall would call back > to the same non-dead device open except in unusual cases where there > was a signal and syscalls are not restarted, or the device name went > away. There is still a problem with the vfs usage counting -- in > one case involving callin and callout devices whose details I forget, > last-close is not called when it needs to be called to wake up all > the threads sleeping in open so that they can enter a new state. The device driver already could provide the d_purge method that is intended to safely drain all threads that are now in the driver. See the kern_conf.c:destroy_dev() for the usage. Also, please note that the drivers cannot call destroy_dev() from the d_close method due to selflock with si_threadcount. The livelock is caused by the fix for the problem identical to the problem you described, with substitution s/ldisc/cdev/. The destroy_dev_sched() function is provided to execute destroy_dev() from another context. Alternatively to what was proposed regarding vrevoke(), you could use the similar lifecycle management for the ldisc, if suitable. --h/ohfBjN02kAJu/T Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkfcJ/gACgkQC3+MBN1Mb4jN+ACfXT7H0LrUGepI7fnS51azFdte pSYAnR9PIGY9M/yezNxRpxph+od4d5Up =u9ra -----END PGP SIGNATURE----- --h/ohfBjN02kAJu/T--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080315194809.GN10374>