From owner-freebsd-arch@FreeBSD.ORG Sat Mar 15 21:06:00 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4320D106566C for ; Sat, 15 Mar 2008 21:06:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id B845E8FC1A for ; Sat, 15 Mar 2008 21:05:59 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=skuns.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1JacxM-0004Q2-W0 for arch@freebsd.org; Sat, 15 Mar 2008 22:26:19 +0200 Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by skuns.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m2FJmMWB091804 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 15 Mar 2008 21:48:22 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m2FJm9oL062454; Sat, 15 Mar 2008 21:48:09 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m2FJm9C1062453; Sat, 15 Mar 2008 21:48:09 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 15 Mar 2008 21:48:09 +0200 From: Kostik Belousov To: Ed Schouten Message-ID: <20080315194809.GN10374@deviant.kiev.zoral.com.ua> References: <20080315124008.GF80576@hoeg.nl> <20080316015903.N39516@delplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="h/ohfBjN02kAJu/T" Content-Disposition: inline In-Reply-To: <20080316015903.N39516@delplex.bde.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: ClamAV version 0.91.2, clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.4 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on skuns.kiev.zoral.com.ua X-Scanner-Signature: 46978630fe90eedb12f62e56ce2f8684 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 2421 [Mar 14 2008] X-SpamTest-Info: helo_type=3 X-SpamTest-Method: none X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0278], KAS30/Release Cc: FreeBSD Arch Subject: Re: vgone() calling VOP_CLOSE() -> blocked threads? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Mar 2008 21:06:00 -0000 --h/ohfBjN02kAJu/T Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Mar 16, 2008 at 03:55:18AM +1100, Bruce Evans wrote: > On Sat, 15 Mar 2008, Ed Schouten wrote: >=20 > >The last couple of days I'm seeing some strange things in my mpsafetty > >branch related to terminal revocation. > > > >In my current TTY design, I hold a count (t_ldisccnt) of the amount of > >threads that are sleeping in the line discipline. I need to store such a > >count, because it's not possible to change line disciplines while some > >threads are still blocked inside the discipline. This means that when > >d_close() is called on a TTY, t_ldisccnt should always be 0. There > >cannot be any threads stuck inside the line discipline when there aren't > >any descriptors referencing it. > > > >Unfortunately, this isn't entirely true with the current VFS/devfs > >design. When vgone() is called, a VOP_CLOSE() is performed , which means > >there could be a dozen threads still stuck inside a device driver, but > >the close routine is already called to clean up stuff. There are a > >*real* lot of drivers that blindly clean up their stuff in the d_close() > >routine, expecting that the device is completely unused. This can > >easily be demonstrated by revoking a bpf device, while running tcpdump. >=20 > Yes, most drivers are broken here, but the problem is rarely noticed > because revoke() isn't normally applied to any devices except ttys. > Even ordinary close() can cause problems when a thread is sleeping > in device open, but this too is only common for ttys (for callin and > callout devices). >=20 > The tty driver is about the only driver that handles this problem > almost correctly. It uses a generation count. All tty drivers are > supposed to sleep using only ttysleep(). ttysleep() checks the > generation count and returns ERESTART if the generation is new. All > tty drivers should consider this error to be fatal and propagate it > up to the syscall level where the syscall is restarted. This tends > to happen naturally, but some places (in device close IIRC), the driver > ignores the error and does more i/o (to finish cleaning up in close > -- close and open can easily pass each other and clobber each others > state when this happens). More I/O also tends to occur if a revoke() > happens when a thread is blocked but not sleeping. Then ttysleep() > isn't in sight, so the thread has no idea that the generation count > changed. Giant locking limits this problem. >=20 > >To be honest, I'm not completely sure how to solve this issue, though I > >know it should at least do something similar to this: > > > >- The device driver should have a seperate routine (d_revoke) to wake > > up any blocked threads, to make sure they leave the device driver > > properly. >=20 > Something is needed to signal blocked but non-sleeping threads. I > think the wakeup for ttys now normally occurs as a side effect of > flushing i/o. revoke() normally calls ttyclose() which calls ttyldclose() > which normally calls ttylclose() which flushes i/o which wakes up > threads waiting on the i/o. I don't see how ttylclose() can work right > in the usual !FNONBLOCK case. Maybe revoke() sets FNONBLOCK. The > generation count stuff doesn't help here because the flush is done > before incrementing the generation count. >=20 > There is an obvious race here for threads doing i/o instead of waiting > for it. These muse be blocked (by Giant now for ttys, or by your > MPSAFE locking). They will run when revoke() releases the lock and > find the i/o flushed and maybe the generation count incrememented, but > they normally won't check these states and will just blunder on doing > more i/o. It would be painful to check these states every time the > lock is aquired, but this seems to be necessary. Magic Giant locking > makes the places where the lock is acquired hard see. >=20 > >- Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the > > vnode into deadfs, with the exception of the close() routine. Maybe > > it's better to add a new function to do this, vrevoke(). > > > >This means that when a revoke() call is performed, all blocked threads > >are woken up, will leave the driver, to find out their terminal has been > >revoked. Further system calls will fail, because the vnode is in deadfs, > >but when the processes close the descriptor, the device driver can still > >clean up everything. >=20 > I think vfs already moves the vnode to deadfs. It doesn't do anything > to synchronize with threads running in device drivers. The forced > last-close() should complete synchronously as part of revoke(). Then > other threads leave the device driver asynchronously, hopefully not > much later. Then if the generation count stuff is working right, the > syscall is restarted, but now file descriptors point to deadfs so the > syscall normally fails. I think the async completion is OK provided > it is done right (don't delay it indefinitely, and don't do more > i/o on completion). It doesn't seem to be useful to make revoke() > wait for the completions. >=20 > I don't think it would work well to move everything except d_close to > deadfs. >=20 > Other problems near here: > - neither vfs nor drivers currently know how many threads are in a > driver. vfs uses vp->v_rdev->si_usecount, but this doesn't quite work This is provided by si_threadcount. See the dev(vn)_refthread and it usage in the devfs vnops and fops. > since it doesn't count threads sleeping in open. Maybe ones excuting > last close too -- this would be more of a problem. revoke() just > uses vcount(), which just acquires the device locks and returns > si_usecount after releasing the device lock. (I don't understand > this locking -- what stops the count changing after the lock is > released, or if it cannot changed then why acquire the lock?) This > can result in revoke() not calling device close when it should. > Drivers can obviously keep count of their activities using large > code. I can't see any way for vfs to keep count short of asking > drivers for their counts. > - there can be any number of threads in device open and close concurrentl= y, > even without the complications for revoke(). The most problematic > cases happen when last-close blocks, as is common for ttys waiting > for output to drain (since no one cares about their output actually > working and ensures draining it using tcdrain() -- normal losing > programs finish up with something like printf(); exit(); and depend > on the close() in exit(); blocking to drain the output). Then new > opens are allowed, and this is useful for doing ioctls() to unblocked > blocked closes. If the new open or fcntl sets non-blocking mode, then > the last-close for the new open may pass the blocked last close. If > the new mode is blocking, then the last-close for the new open may block > too. The number of threads in last-close is thus unlimited. A thunder= ing > herd of them tends to stomp on each other when they are all unblocked at > the same time. >=20 > The connections of this with revoke() are: > - it takes vfs's not counting of all threads in the device driver to > allow the useful behaviour of opens while a close is blocked and > the necessary behaviour of last-close while another last-close is > executing (drivers should be aware of this possibility and merge > the closes, but don't). > - I think revoke() sets FNONBLOCK somewhere. Thus it tends to unblock > any thread waiting in last-close for output to drain. >=20 > Less problematic cases occur when opens block. ttyopen() understands > this possibility and handles it almost right using its t_wopeners > count. ttyopen() uses various sleeps where it should use ttysleep() > or check the generation count itself; this results in it looping > internally instead of restarting the syscall, which is only a small > error since for open() alone, restarting the syscall would call back > to the same non-dead device open except in unusual cases where there > was a signal and syscalls are not restarted, or the device name went > away. There is still a problem with the vfs usage counting -- in > one case involving callin and callout devices whose details I forget, > last-close is not called when it needs to be called to wake up all > the threads sleeping in open so that they can enter a new state. The device driver already could provide the d_purge method that is intended to safely drain all threads that are now in the driver. See the kern_conf.c:destroy_dev() for the usage. Also, please note that the drivers cannot call destroy_dev() from the d_close method due to selflock with si_threadcount. The livelock is caused by the fix for the problem identical to the problem you described, with substitution s/ldisc/cdev/. The destroy_dev_sched() function is provided to execute destroy_dev() from another context. Alternatively to what was proposed regarding vrevoke(), you could use the similar lifecycle management for the ldisc, if suitable. --h/ohfBjN02kAJu/T Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkfcJ/gACgkQC3+MBN1Mb4jN+ACfXT7H0LrUGepI7fnS51azFdte pSYAnR9PIGY9M/yezNxRpxph+od4d5Up =u9ra -----END PGP SIGNATURE----- --h/ohfBjN02kAJu/T--