Date: Mon, 24 Jul 2006 16:57:57 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: Jo Rhett <jrhett@svcolo.com> Cc: "Rick C. Petty" <rick-freebsd@kiwi-computer.com>, freebsd-hardware@FreeBSD.org Subject: Re: device busy -- no locks? Message-ID: <20060724154856.I58894@delplex.bde.org> In-Reply-To: <20060721004731.GC8868@svcolo.com> References: <20060721000018.GA99237@svcolo.com> <20060721001607.GA64376@megan.kiwi-computer.com> <20060721004731.GC8868@svcolo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 20 Jul 2006, Jo Rhett wrote: > Thanks for the super-quick reply! Responses are inline... > > On Thu, Jul 20, 2006 at 07:16:07PM -0500, Rick C. Petty wrote: >>> root@scapa 47# fstat /dev/ttyd0 >>> USER CMD PID FD MOUNT INUM MODE SZ|DV R/W NAME >> >> What about "fstat /dev/cuad0" ? Anyway, I've found that fstat is useless, >> try using sysutils/lsof instead. This is related to a longstanding (design) bug in vfs (first-open/last-close semantics). vfs counts devices as being open when it calls the device open routine, despite devices not actually being open until the device open routine returns successfully, which may happen much later (or not at all in the case of failure, but this doesn't cause any additional problems). This causes the device close routine to not be called in some cases where it should be. For bidirectional serial devices, not calling the device close routine results in "callin" devices that are sleeping in open being treated the same as "callin" devices that have successfully completed the open. The former shouldn't give EBUSY for opens of the corresponding "callout" device, but the latter should and do. FreeBSD-1 has a hack to vfs to work around the bug, but the hack was lost in FreeBSD-2. I still use the hack locally. Startting with about FreeBSD-4, there is a D_TRACKCLOSE device flag that can be used to fix the problem less hackishly (but still not in the right way, since it requires individual drivers to do generic things). I haven't got around to using it to fix sio even locally. D_TRACKCLOSE is mostly unused and mostly used bogusly when it is used. The bug rarely causes problems since it is only activated by doing something like the following: thread1: "open" /dev/ttyd0. Actually, block in open waiting for carrier. thread2: open /dev/ttyd0 using O_NONBLOCK to prevent blocking. thread2: perhaps actually use /dev/ttyd0 thread2: "close" /dev/ttyd0. Actually, don't complete the close due to the bogus vfs close. thread1: remain blocked in open through all the above. thread3: try to open /dev/cuaa^Hd0. Get EBUSY because the non-open by thread1 is seen as an open. Starting with about FreeBSD-5, there may be additional problems from races. First-open/last-close semantics basically require opens to be synchronous. Sleeping in open for serial device drivers gives large race windows in which to open may race open/close of the same device in other threads. Prempting the kernel gives small race windows. In practice, Giant locking limits problems. For serial drivers, open/close should still be Giant- locked since the whole tty subsystem is still Giant-locked. (Note that all vfs locks are dropped before calling device open/close. The bogus vfs count provides some psuedo-locking.) Rearrangement of serial drivers in -current may have enlarged the bug, but I can't see any enlargement except that from more serial drivers now supporting bidirectional devices. > Sorry, yes. Same results. And if lsof shows things that fstat doesn't, > then this is a bug in FreeBSD. > > But anyway, > root@scapa 63# lsof /dev/cuad0 > root@scapa 64# lsof /dev/ttyd0 > > Nada. I think fstat and lsof can't see threads sleeping in open since the open hasn't really completed -- the open has completed enough to confuse vfs but not for vfs to report its confusion to userland. It should be possible to see threads sleeping in open using "ps -lax | grep ttydcd" ("ttydcd" is the string for -current; the string for sio used to be "siodcd". Grep for "tty" and "dcd" too). This won't distinguish between threads sleeping normally in open (ttyopen) and ones that are in a bogus state due to a missing close. > Also note that this system is pretty bone stock. Standard install, plus > mysql and apache. Nothing else would be using the port. It's something > that left it locked, and really only "login" could be the culprit. Quite likely, but login doesn't use O_NONBLOCK so I don't know how it could trigger the bug. Maybe nopise on DCD cound do it. The easiest way to trigger the bug is "stty -f /dev/ttyd0" while there is a login blocked in open on ttyd0. >>> No locks? No processes using it. Okay, this is uncool. >>> And yet "ktrace tip com1" and "kdump -f ktrace.out" clearly show: >>> >>> 50461 tip CALL open(0x8059030,0x6,0) >>> 50461 tip NAMI "/dev/cuad0" >>> 50461 tip RET open -1 errno 16 Device busy >> >> This isn't very useful. A ktrace on the process that's locking the file >> would be. :-P > > See above. I can't find it. :-( You might need to start ktracing very early to locate the original problematic open/open/close sequence. >>> NOTE: at this time I am suspecting that CD is being misread (it's not >>> present - I have a break out box on the line) and that this problem is >>> somehow tied to that. This problem appears at random after login has >>> exerted itself on the system. I've disabled the getty on ttyd0 and login >>> has timed out, but it continues to show "device busy". >> >> How did you disable the getty? Was this prior to or after a restart? It >> sounds like /etc/ttys is maybe running a process on it. You need to >> "killall -HUP init" after changing /etc/ttys. But you probably already >> know that. > > Yes, I change "on" to "off" in /etc/ttys and "kill -1 1" :-) Killing all processes sleeping in serial device open unwedges the port for the bug that I know about (provided the close doesn't hang). This and making the open succeed by raising DCD in hardware are the only ways that I know of to unwedge the port once the open gets stuck. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060724154856.I58894>