Date: Thu, 28 Apr 2005 14:13:28 +1000
From: Stephen McKay <smckay@internode.on.net>
To: freebsd-stable@freebsd.org
Cc: Stephen McKay <smckay@internode.on.net>
Subject: Re: nvi dying with "Resource temporarily unavailable" [SOLVED]
Message-ID: <200504280413.j3S4DS3k007921@dungeon.home>
In-Reply-To: <200308250044.h7P0iHch006248@dungeon.home> from Stephen McKay at "Mon, 25 Aug 2003 10:44:17 +1000"
References: <200308071230.h77CUMgj003099@dungeon.home> <3F327F44.9080909@mac.com> <200308080057.h780vDdx005361@dungeon.home> <20030809103630.Q59623@carver.gumbysoft.com> <200308250044.h7P0iHch006248@dungeon.home>
This is resurrecting an old thread, but I'd like the answer to be found
in searches, so here goes:

On Monday, 25th August 2003, Stephen McKay wrote:

>On Saturday, 9th August 2003, Doug White wrote:
>
>>On Fri, 8 Aug 2003, Stephen McKay wrote:
>>
>>> >Stephen McKay wrote:
>>> >> Since I upgraded to FreeBSD 4.8 (from 4.5) I've noticed occasional failures
>>> >> of nvi.  It will suddenly die as a key is pressed, emitting:
>>> >>
>>> >> Error: input: Resource temporarily unavailable
>>
>>We went round and round on irc about this a few weeks back. We pinned it
>>down to a bad error check in nvi. Unfortunately the fix was non-obvious.
>>There's a read() that needs to check for EAGAIN and loop back around on
>>the read. If someone wants to take a crack at this, the offending read()
>>is at common/cl_read.c line 266.
>
>You almost had me convinced until I got this extraordinary result:
>
>$ cat
>cat: stdin: Resource temporarily unavailable
>$
>
>Never seen the like in all my born days.
>
>I'm running zsh 3.1.9 on FreeBSD 4.8-RELEASE.  /bin/cat is a simple program.
>If it isn't working properly, there's a fault in zsh or the kernel.  I have
>not upgraded zsh since July 2000.  That makes it a bug in the kernel.  What
>else could it be?  At a stretch perhaps a bug in libc.  Nothing else comes
>to mind.

The fault lies in libc_r.  I don't yet know how to fix libc_r, or even if
it will ever be fixed, but I have installed a protective mechanism on my
system that I am willing to live with.

One of the things that libc_r does is a trick to allow I/O in one thread
to not block other threads: it sets all file descriptors to nonblocking,
including 0, 1 and 2.  Descriptors 0, 1 and 2 normally refer to your tty
(usually a pty nowadays) unless you go to the trouble of redirecting them.

A further twist is that these descriptors are not the result of reopening
your tty, but come from dup(), and hence share the underlying file flags
with all other processes in that session, including your shell and any
processes it starts before or after the one that uses libc_r.  Setting
nonblocking mode on a shared descriptor like this affects *all* processes
using it.  In other words, your shell, nvi, cat, and indeed all other
programs on that tty now have a nonblocking descriptor for it.  Having
stdin or stdout suddenly become nonblocking causes many programs to fail
mysteriously.

In short, just running a program linked against libc_r in the background
can cause other programs to fail.  This is clearly unacceptable.

It has taken me quite some years to track this down and it has almost
made me lose faith in FreeBSD.  (Why would anyone use an OS that fails
randomly?)  It's especially illuminating (from a programming point of
view) that the root cause is in a subsystem I've never used and hence
never examined all those times I went looking for the problem.  It's
other programmers using libc_r in more and more programs (some of which
I use without even knowing it) that have caused this slow degradation of
my FreeBSD experience.  How many other bugs like this are hidden in the
ever increasing complexity of FreeBSD (or indeed any other software)?
Unexpected interactions are everywhere and we should work hard to
minimise them!

OK, enough of the rambling philosophy: How can this be prevented?

As described in this posting:

http://lists.freebsd.org/mailman/htdig/freebsd-hackers/2005-January/009742.html

I have added code to my 4.11 kernel to prevent background processes from
setting O_NONBLOCK on ttys.  I've been running with this for over 3 months
and in that time have had no unexpected nvi exits or other weirdness.
I believe this is a cure.

I also believe that no process can reasonably expect to set O_NONBLOCK on
its tty when in the background and hence I think this should be added to
-current.

But the side effect of the cure is that you cannot start a threaded
program in the background without redirecting stdin, stdout and stderr
elsewhere.  I accept this as a cost of fixing the problem.  You may not
be so generous.  If so, perhaps you can think of a way of fixing libc_r
directly.  Personally, I'd be happy enough to prevent the damage (by
banning background O_NONBLOCK on ttys) while waiting for libc_r to die a
natural death as the other threading libraries in 5.x and 6.x take over.

Stephen.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200504280413.j3S4DS3k007921>