FreeBSD Mail Archives

Date:      Sat, 5 Oct 2019 07:05:55 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Kyle Evans <kevans@freebsd.org>, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r353103 - head/sys/net
Message-ID:  <20191005055823.S2708@besplex.bde.org>
In-Reply-To: <ece67d32-2624-4e06-08a6-5d67aa4a2e03@FreeBSD.org>
References:  <201910041343.x94Dh7Zo078270@repo.freebsd.org> <ece67d32-2624-4e06-08a6-5d67aa4a2e03@FreeBSD.org>

index | next in thread | previous in thread | raw e-mail

On Fri, 4 Oct 2019, John Baldwin wrote:

> On 10/4/19 6:43 AM, Kyle Evans wrote:
>> Author: kevans
>> Date: Fri Oct  4 13:43:07 2019
>> New Revision: 353103
>> URL: https://svnweb.freebsd.org/changeset/base/353103
>>
>> Log:
>>   tuntap(4): loosen up tunclose restrictions
>>
>>   Realistically, this cannot work. We don't allow the tun to be opened twice,
>>   so it must be done via fd passing, fork, dup, some mechanism like these.
>>   Applications demonstrably do not enforce strict ordering when they're
>>   handing off tun devices, so the parent closing before the child will easily
>>   leave the tun/tap device in a bad state where it can't be destroyed and a
>>   confused user because they did nothing wrong.
>>
>>   Concede that we can't leave the tun/tap device in this kind of state because
>>   of software not playing the TUNSIFPID game, but it is still good to find and
>>   fix this kind of thing to keep ifconfig(8) up-to-date and help ensure good
>>   discipline in tun handling.
>
> Why are you using d_close for last close anyway?  It's not really reliable compared
> to using cdevpriv and a cdevpriv dtor.

Correct last-close close is necessarily very complicated.  Here is the
ttydev_close() part of mine for the tty driver.  It uses cdevpriv to a
fault.

One fault from this is that since tracking closes actually works, it
breaks mmap() since cdevpriv is associated with open files, but mmap()
is supposed to work on closed files and even for open files cdevpriv
is not available in vm_fault().  So my fixes make vm_fault() on ttys
always fail, since tty_mmap() always fails on entry since cdevpriv is
invalid.

XX Index: tty.c
XX ===================================================================
XX --- tty.c	(revision 332488)
XX +++ tty.c	(working copy)

I only have this patch for an old version of FreeBSD.

XX @@ -368,37 +693,157 @@
XX      struct thread *td __unused)
XX  {
XX  	struct tty *tp = dev->si_drv1;
XX +	int *ocountp;
XX +	int error;
XX 
XX  	tty_lock(tp);
XX +	tty_trace(T_C_START, "start", dev, fflag, NO_E);
XX +	if (tp->t_dev == NULL) {
XX +		tty_trace(T_C_DONE | 0x2000 | T_V, "done (no t_dev)", dev,
XX +		    fflag, ENXIO);
XX +		tty_unlock(tp);
XX +		return (ENXIO);
XX +	}
XX +	tp->t_acount++;
XX +	ocountp = ttydev_ocountp(dev);
XX +	if (*ocountp <= 0) {
XX +		tty_trace(0x4000 | T_C_DONE, "done (already closed)", dev,
XX +		    fflag, 0);
XX +		if (fflag & FREVOKE) {
XX +			(*ttydev_devgenp(dev))++;
XX +			tty_wakeupall(tp);
XX +		}
XX +		ttydev_exit(tp);
XX +		return (0);
XX +	}
XX +	if ((fflag & FREVOKE) == 0) {
XX +		if (tty_devgen() != tty_filegen()) {

This is the main use of cdevpriv.  A file generation is stored in cdevpriv
and returned by tty_filegen() (return -1 if cdevpriv is invalid).  A device
generation is one of 4 generation counts stored in the tty struct and is
returned by tty_devgen().  When these are not equal, it means a revoke in
the past or an even more complicated condition.

XX +			tty_trace(T_C_DONE | 0x8000 | T_I,
XX +			    "done (revoked before starting)", dev, fflag, 0);
XX +			ttydev_exit(tp);
XX +			return (0);
XX +		}
XX +		if (*ocountp > 1) {
XX +			(*ocountp)--;
XX +			tty_trace(0x10000 | T_C_DONE, "done (non-last)",
XX +			    dev, fflag, 0);
XX +			ttydev_exit(tp);
XX +			return (0);
XX +		}
XX +		if (!tty_gone(tp)) {
XX +			tty_trace(0x20000, "tty_drain", dev, fflag, NO_E);
XX +			/*
XX +			 * This used to be broken by forcibly clearing
XX +			 * TF_STOPPED earlier, and even asserted that.
XX +			 * Expect more "correct" hangs now.  The special
XX +			 * timeout amelioriates this.  Only software flow
XX +			 * control was cleared anyway.
XX +			 */
XX +			error = tty_drain(tp, 1);
XX +			tty_trace(0x40000, "tty_drain done", dev, fflag, error);
XX +			/*
XX +			 * Do exactly the same as before draining.
XX +			 * Revoking must be checked for explicitly,
XX +			 * since tty_drain() returns ENXIO if both
XX +			 * revoked and gone, and ERESTART can mean
XX +			 * EINTR.
XX +			 *
XX +			 * Ignore errors in draining, since they are
XX +			 * nonstandard except for EINTR and even EINTR
XX +			 * is too hard for callers to handle.
XX +			 */
XX +			if (tty_devgen() != tty_filegen()) {
XX +				tty_trace(0x80000 | T_C_DONE | T_I,
XX +				    "done (revoked during draining)", dev,
XX +				    fflag, 0);
XX +				ttydev_exit(tp);
XX +				return (0);
XX +			}
XX +			if (*ocountp <= 0) {
XX +				tty_trace(0x100000 | T_C_DONE | T_A,
XX +				    "done (closed during draining)", dev,
XX +				    fflag, 0);
XX +				ttydev_exit(tp);
XX +				return (0);
XX +			}
XX +			if (*ocountp > 1) {
XX +				(*ocountp)--;
XX +				tty_trace(0x200000 | T_C_DONE | T_I,
XX +				    "done (non-last after draining)",
XX +				    dev, fflag, 0);
XX +				ttydev_exit(tp);
XX +				return (0);
XX +			}
XX +		}
XX +	}

Draining in a non-broken tty driver is very complicated.

open() must be able to pass last-close(), with one thread blocked
draining in last-close(); while it is blocked, any number of new
O_NONBLOCK opens and ioctls must be allowed to control the blockage.

close() can pass last-close().  closes for the new opens must work,
perhaps by treating them as non-last.

Revokes are like forced last-close() on all open fd's for the device.

The physical device can also go away while draining it.  This looks
much like a revoke.

All these things can happen during draining because the tty lock is
released during draining and no other tty-related locks are held.
-current breaks draining in last-close() and many things in open()
using the big fat lock (state flag) TF_OPENCLOSE.

Any device driver that supports revokes and blocking in last-close() must
have complications like the above.

This is too much to replicate in even 2 drivers.  Perhaps it can be
done at a higher level.  Start with revoke().  revoke() and devices
going should work for all cdevs and are relatively simple -- just force
everything closed.  But first you have to know where everything is, so
as to wait or defer the full close until no threads are in the driver.
In the above, ttydev_exit() usually only unlocks and counts things,
while ttydev_leave() usually completes a last-close() but it defers
freeing all resources if some thread is in the driver.

XX 
XX  	/*
XX -	 * Don't actually close the device if it is being used as the
XX -	 * console.
XX +	 * We now have a completing last close or a completing revoke
XX +	 * on an open device (but see below about sharing the tty
XX +	 * between cdevs -- the close or revoke may still be partial).
XX  	 */
XX  	MPASS((tp->t_flags & (TF_OPENED_CONS | TF_OPENED_IN)) == 0 ||
XX  	    (tp->t_flags & TF_OPENED_OUT) == 0);
XX  	if (dev == dev_console)
XX  		tp->t_flags &= ~TF_OPENED_CONS;
XX +	else if (TTY_CALLOUT(tp, dev))
XX +		tp->t_flags &= ~TF_OPENED_OUT;
XX  	else
XX -		tp->t_flags &= ~(TF_OPENED_IN|TF_OPENED_OUT);
XX +		tp->t_flags &= ~TF_OPENED_IN;
XX +	*ocountp = 0;
XX 
XX -	if (tp->t_flags & TF_OPENED) {
XX -		tty_unlock(tp);
XX +	/*
XX +	 * If revoking, increment the device generation.  It is critical
XX +	 * that the this is per-device and not per-tty.  Threads in the
XX +	 * driver using or trying to open the revoked dev will wake up
XX +	 * and restart the syscall a few times before their fd is moved
XX +	 * to deadfs, and threads in the driver using or trying to open
XX +	 * the non-revoked devs will wake up but only restart their loop
XX +	 * in the driver.
XX +	 *
XX +	 * If revoking, flush the shared i/o queues for security.
XX +	 *
XX +	 * Always wake up everything for simplicity.
XX +	 */
XX +	if (fflag & FREVOKE) {
XX +		tty_flush(tp, FREAD | FWRITE);
XX +		(*ttydev_devgenp(dev))++;
XX +	}
XX +	tty_wakeupall(tp);
XX +
XX +	/*
XX +	 * Increment the general purpose generation.  This is used to
XX +	 * distinguish the wakeup events of new-open and last-close from
XX +	 * DTR rises.  The device generation is used to distinguish the
XX +	 * wakeup event of a revoke.
XX +	 */
XX +	tp->t_gpgen++;
XX +
XX +	/*
XX +	 * The tty struct is shared between CONS and IN at the same
XX +	 * time and between these and OUT at different times.  So when
XX +	 * CONS or IN was closed, the other one may be left open.  Don't
XX +	 * do a complete close if it is.
XX +	 *
XX +	 * In the partial close case, the unclosed part is unaffected
XX +	 * except for a revoke its i/o was flushed.  The closure
XX +	 * operation for the closed part did very little except for a
XX +	 * revoke it flushed i/o.  Otherwise, it mainly waited for
XX +	 * (shared) output to drain and then adjusted the open flags and
XX +	 * counts for the closed part.  The wait for output to drain was
XX +	 * more unbounded than usual since it doesn't take new open
XX +	 * passing a last close to give another writer.
XX +	 */
XX +	if (tty_opened(tp)) {
XX +		tty_trace(0x800000 | T_C_DONE, "done (shared dev open)", dev,
XX +		    fflag, 0);
XX +		ttydev_exit(tp);
XX  		return (0);
XX  	}
XX 
XX -	/* If revoking, flush output now to avoid draining it later. */
XX -	if (fflag & FREVOKE)
XX -		tty_flush(tp, FWRITE);
XX -
XX  	tp->t_flags &= ~TF_EXCLUDE;
XX 
XX -	/* Properly wake up threads that are stuck - revoke(). */
XX -	tp->t_revokecnt++;
XX -	tty_wakeup(tp, FREAD|FWRITE);
XX -	cv_broadcast(&tp->t_bgwait);
XX -	cv_broadcast(&tp->t_dcdwait);
XX -
XX +	tty_trace(T_C_DONE, "done (really)", dev, fflag, 0);
XX  	ttydev_leave(tp);
XX 
XX  	return (0);

Bruce

home | help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20191005055823.S2708>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation