Date: Sat, 5 Oct 2019 07:05:55 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: John Baldwin <jhb@freebsd.org> Cc: Kyle Evans <kevans@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r353103 - head/sys/net Message-ID: <20191005055823.S2708@besplex.bde.org> In-Reply-To: <ece67d32-2624-4e06-08a6-5d67aa4a2e03@FreeBSD.org> References: <201910041343.x94Dh7Zo078270@repo.freebsd.org> <ece67d32-2624-4e06-08a6-5d67aa4a2e03@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 4 Oct 2019, John Baldwin wrote: > On 10/4/19 6:43 AM, Kyle Evans wrote: >> Author: kevans >> Date: Fri Oct 4 13:43:07 2019 >> New Revision: 353103 >> URL: https://svnweb.freebsd.org/changeset/base/353103 >> >> Log: >> tuntap(4): loosen up tunclose restrictions >> >> Realistically, this cannot work. We don't allow the tun to be opened twice, >> so it must be done via fd passing, fork, dup, some mechanism like these. >> Applications demonstrably do not enforce strict ordering when they're >> handing off tun devices, so the parent closing before the child will easily >> leave the tun/tap device in a bad state where it can't be destroyed and a >> confused user because they did nothing wrong. >> >> Concede that we can't leave the tun/tap device in this kind of state because >> of software not playing the TUNSIFPID game, but it is still good to find and >> fix this kind of thing to keep ifconfig(8) up-to-date and help ensure good >> discipline in tun handling. > > Why are you using d_close for last close anyway? It's not really reliable compared > to using cdevpriv and a cdevpriv dtor. Correct last-close close is necessarily very complicated. Here is the ttydev_close() part of mine for the tty driver. It uses cdevpriv to a fault. One fault from this is that since tracking closes actually works, it breaks mmap() since cdevpriv is associated with open files, but mmap() is supposed to work on closed files and even for open files cdevpriv is not available in vm_fault(). So my fixes make vm_fault() on ttys always fail, since tty_mmap() always fails on entry since cdevpriv is invalid. XX Index: tty.c XX =================================================================== XX --- tty.c (revision 332488) XX +++ tty.c (working copy) I only have this patch for an old version of FreeBSD. XX @@ -368,37 +693,157 @@ XX struct thread *td __unused) XX { XX struct tty *tp = dev->si_drv1; XX + int *ocountp; XX + int error; XX XX tty_lock(tp); XX + tty_trace(T_C_START, "start", dev, fflag, NO_E); XX + if (tp->t_dev == NULL) { XX + tty_trace(T_C_DONE | 0x2000 | T_V, "done (no t_dev)", dev, XX + fflag, ENXIO); XX + tty_unlock(tp); XX + return (ENXIO); XX + } XX + tp->t_acount++; XX + ocountp = ttydev_ocountp(dev); XX + if (*ocountp <= 0) { XX + tty_trace(0x4000 | T_C_DONE, "done (already closed)", dev, XX + fflag, 0); XX + if (fflag & FREVOKE) { XX + (*ttydev_devgenp(dev))++; XX + tty_wakeupall(tp); XX + } XX + ttydev_exit(tp); XX + return (0); XX + } XX + if ((fflag & FREVOKE) == 0) { XX + if (tty_devgen() != tty_filegen()) { This is the main use of cdevpriv. A file generation is stored in cdevpriv and returned by tty_filegen() (return -1 if cdevpriv is invalid). A device generation is one of 4 generation counts stored in the tty struct and is returned by tty_devgen(). When these are not equal, it means a revoke in the past or an even more complicated condition. XX + tty_trace(T_C_DONE | 0x8000 | T_I, XX + "done (revoked before starting)", dev, fflag, 0); XX + ttydev_exit(tp); XX + return (0); XX + } XX + if (*ocountp > 1) { XX + (*ocountp)--; XX + tty_trace(0x10000 | T_C_DONE, "done (non-last)", XX + dev, fflag, 0); XX + ttydev_exit(tp); XX + return (0); XX + } XX + if (!tty_gone(tp)) { XX + tty_trace(0x20000, "tty_drain", dev, fflag, NO_E); XX + /* XX + * This used to be broken by forcibly clearing XX + * TF_STOPPED earlier, and even asserted that. XX + * Expect more "correct" hangs now. The special XX + * timeout amelioriates this. Only software flow XX + * control was cleared anyway. XX + */ XX + error = tty_drain(tp, 1); XX + tty_trace(0x40000, "tty_drain done", dev, fflag, error); XX + /* XX + * Do exactly the same as before draining. XX + * Revoking must be checked for explicitly, XX + * since tty_drain() returns ENXIO if both XX + * revoked and gone, and ERESTART can mean XX + * EINTR. XX + * XX + * Ignore errors in draining, since they are XX + * nonstandard except for EINTR and even EINTR XX + * is too hard for callers to handle. XX + */ XX + if (tty_devgen() != tty_filegen()) { XX + tty_trace(0x80000 | T_C_DONE | T_I, XX + "done (revoked during draining)", dev, XX + fflag, 0); XX + ttydev_exit(tp); XX + return (0); XX + } XX + if (*ocountp <= 0) { XX + tty_trace(0x100000 | T_C_DONE | T_A, XX + "done (closed during draining)", dev, XX + fflag, 0); XX + ttydev_exit(tp); XX + return (0); XX + } XX + if (*ocountp > 1) { XX + (*ocountp)--; XX + tty_trace(0x200000 | T_C_DONE | T_I, XX + "done (non-last after draining)", XX + dev, fflag, 0); XX + ttydev_exit(tp); XX + return (0); XX + } XX + } XX + } Draining in a non-broken tty driver is very complicated. open() must be able to pass last-close(), with one thread blocked draining in last-close(); while it is blocked, any number of new O_NONBLOCK opens and ioctls must be allowed to control the blockage. close() can pass last-close(). closes for the new opens must work, perhaps by treating them as non-last. Revokes are like forced last-close() on all open fd's for the device. The physical device can also go away while draining it. This looks much like a revoke. All these things can happen during draining because the tty lock is released during draining and no other tty-related locks are held. -current breaks draining in last-close() and many things in open() using the big fat lock (state flag) TF_OPENCLOSE. Any device driver that supports revokes and blocking in last-close() must have complications like the above. This is too much to replicate in even 2 drivers. Perhaps it can be done at a higher level. Start with revoke(). revoke() and devices going should work for all cdevs and are relatively simple -- just force everything closed. But first you have to know where everything is, so as to wait or defer the full close until no threads are in the driver. In the above, ttydev_exit() usually only unlocks and counts things, while ttydev_leave() usually completes a last-close() but it defers freeing all resources if some thread is in the driver. XX XX /* XX - * Don't actually close the device if it is being used as the XX - * console. XX + * We now have a completing last close or a completing revoke XX + * on an open device (but see below about sharing the tty XX + * between cdevs -- the close or revoke may still be partial). XX */ XX MPASS((tp->t_flags & (TF_OPENED_CONS | TF_OPENED_IN)) == 0 || XX (tp->t_flags & TF_OPENED_OUT) == 0); XX if (dev == dev_console) XX tp->t_flags &= ~TF_OPENED_CONS; XX + else if (TTY_CALLOUT(tp, dev)) XX + tp->t_flags &= ~TF_OPENED_OUT; XX else XX - tp->t_flags &= ~(TF_OPENED_IN|TF_OPENED_OUT); XX + tp->t_flags &= ~TF_OPENED_IN; XX + *ocountp = 0; XX XX - if (tp->t_flags & TF_OPENED) { XX - tty_unlock(tp); XX + /* XX + * If revoking, increment the device generation. It is critical XX + * that the this is per-device and not per-tty. Threads in the XX + * driver using or trying to open the revoked dev will wake up XX + * and restart the syscall a few times before their fd is moved XX + * to deadfs, and threads in the driver using or trying to open XX + * the non-revoked devs will wake up but only restart their loop XX + * in the driver. XX + * XX + * If revoking, flush the shared i/o queues for security. XX + * XX + * Always wake up everything for simplicity. XX + */ XX + if (fflag & FREVOKE) { XX + tty_flush(tp, FREAD | FWRITE); XX + (*ttydev_devgenp(dev))++; XX + } XX + tty_wakeupall(tp); XX + XX + /* XX + * Increment the general purpose generation. This is used to XX + * distinguish the wakeup events of new-open and last-close from XX + * DTR rises. The device generation is used to distinguish the XX + * wakeup event of a revoke. XX + */ XX + tp->t_gpgen++; XX + XX + /* XX + * The tty struct is shared between CONS and IN at the same XX + * time and between these and OUT at different times. So when XX + * CONS or IN was closed, the other one may be left open. Don't XX + * do a complete close if it is. XX + * XX + * In the partial close case, the unclosed part is unaffected XX + * except for a revoke its i/o was flushed. The closure XX + * operation for the closed part did very little except for a XX + * revoke it flushed i/o. Otherwise, it mainly waited for XX + * (shared) output to drain and then adjusted the open flags and XX + * counts for the closed part. The wait for output to drain was XX + * more unbounded than usual since it doesn't take new open XX + * passing a last close to give another writer. XX + */ XX + if (tty_opened(tp)) { XX + tty_trace(0x800000 | T_C_DONE, "done (shared dev open)", dev, XX + fflag, 0); XX + ttydev_exit(tp); XX return (0); XX } XX XX - /* If revoking, flush output now to avoid draining it later. */ XX - if (fflag & FREVOKE) XX - tty_flush(tp, FWRITE); XX - XX tp->t_flags &= ~TF_EXCLUDE; XX XX - /* Properly wake up threads that are stuck - revoke(). */ XX - tp->t_revokecnt++; XX - tty_wakeup(tp, FREAD|FWRITE); XX - cv_broadcast(&tp->t_bgwait); XX - cv_broadcast(&tp->t_dcdwait); XX - XX + tty_trace(T_C_DONE, "done (really)", dev, fflag, 0); XX ttydev_leave(tp); XX XX return (0); Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20191005055823.S2708>