Date:      Mon, 24 Jul 2006 16:57:57 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Jo Rhett <jrhett@svcolo.com>
Cc:        "Rick C. Petty" <rick-freebsd@kiwi-computer.com>, freebsd-hardware@FreeBSD.org
Subject:   Re: device busy -- no locks?
Message-ID:  <20060724154856.I58894@delplex.bde.org>
In-Reply-To: <20060721004731.GC8868@svcolo.com>
References:  <20060721000018.GA99237@svcolo.com> <20060721001607.GA64376@megan.kiwi-computer.com> <20060721004731.GC8868@svcolo.com>

On Thu, 20 Jul 2006, Jo Rhett wrote:

> Thanks for the super-quick reply!  Responses are inline...
>
> On Thu, Jul 20, 2006 at 07:16:07PM -0500, Rick C. Petty wrote:
>>> root@scapa 47# fstat /dev/ttyd0
>>> USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W NAME
>>
>> What about "fstat /dev/cuad0" ?  Anyway, I've found that fstat is useless,
>> try using sysutils/lsof instead.

This is related to a longstanding (design) bug in vfs (first-open/last-close
semantics).  vfs counts devices as being open when it calls the device
open routine, despite devices not actually being open until the device
open routine returns successfully, which may happen much later (or not
at all in the case of failure, but this doesn't cause any additional
problems).  This causes the device close routine to not be called in
some cases where it should be.  For bidirectional serial devices, not
calling the device close routine results in "callin" devices that are
sleeping in open being treated the same as "callin" devices that have
successfully completed the open.  The former shouldn't give EBUSY for
opens of the corresponding "callout" device, but the latter should and
do.

FreeBSD-1 has a hack to vfs to work around the bug, but the hack was
lost in FreeBSD-2.  I still use the hack locally.  Starting with about
FreeBSD-4, there is a D_TRACKCLOSE device flag that can be used to fix
the problem less hackishly (but still not in the right way, since it
requires individual drivers to do generic things).  I haven't got around
to using it to fix sio even locally.  D_TRACKCLOSE is mostly unused and
mostly used bogusly when it is used.

The bug rarely causes problems since it is only activated by doing
something like the following:

     thread1: "open" /dev/ttyd0.  Actually, block in open waiting for carrier.
     thread2: open /dev/ttyd0 using O_NONBLOCK to prevent blocking.
     thread2: perhaps actually use /dev/ttyd0
     thread2: "close" /dev/ttyd0.  Actually, don't complete the close due to
 	     the bogus vfs close.
     thread1: remain blocked in open through all the above.
     thread3: try to open /dev/cuaa^Hd0.  Get EBUSY because the non-open by
 	     thread1 is seen as an open.

Starting with about FreeBSD-5, there may be additional problems from races.
First-open/last-close semantics basically require opens to be synchronous.
Sleeping in open for serial device drivers gives large race windows in
which an open may race with an open/close of the same device in other
threads.  Preempting the kernel gives small race windows.  In practice,
Giant locking limits problems.  For serial drivers, open/close should
still be Giant-locked since the whole tty subsystem is still Giant-locked.
(Note that all vfs locks are dropped before calling device open/close.
The bogus vfs count provides some pseudo-locking.)

Rearrangement of serial drivers in -current may have enlarged the bug,
but I can't see any enlargement except that from more serial drivers
now supporting bidirectional devices.

> Sorry, yes. Same results.  And if lsof shows things that fstat doesn't,
> then this is a bug in FreeBSD.
>
> But anyway,
> root@scapa 63# lsof /dev/cuad0
> root@scapa 64# lsof /dev/ttyd0
>
> Nada.

I think fstat and lsof can't see threads sleeping in open since the open
hasn't really completed -- the open has completed enough to confuse vfs
but not enough for vfs to report its confusion to userland.  It should be possible
to see threads sleeping in open using "ps -lax | grep ttydcd" ("ttydcd" is
the string for -current; the string for sio used to be "siodcd".  Grep for
"tty" and "dcd" too).  This won't distinguish between threads sleeping
normally in open (ttyopen) and ones that are in a bogus state due to a
missing close.

> Also note that this system is pretty bone stock.  Standard install, plus
> mysql and apache.  Nothing else would be using the port.  It's something
> that left it locked, and really only "login" could be the culprit.

Quite likely, but login doesn't use O_NONBLOCK so I don't know how it
could trigger the bug.  Maybe noise on DCD could do it.  The easiest
way to trigger the bug is "stty -f /dev/ttyd0" while there is a login
blocked in open on ttyd0.

>>> No locks? No processes using it.  Okay, this is uncool.
>>> And yet "ktrace tip com1" and "kdump -f ktrace.out" clearly show:
>>>
>>>  50461 tip      CALL  open(0x8059030,0x6,0)
>>>  50461 tip      NAMI  "/dev/cuad0"
>>>  50461 tip      RET   open -1 errno 16 Device busy
>>
>> This isn't very useful.  A ktrace on the process that's locking the file
>> would be.  :-P
>
> See above.  I can't find it. :-(

You might need to start ktracing very early to locate the original
problematic open/open/close sequence.

>>> NOTE: at this time I am suspecting that CD is being misread (it's not
>>> present - I have a break out box on the line) and that this problem is
>>> somehow tied to that.  This problem appears at random after login has
>>> exerted itself on the system.  I've disabled the getty on ttyd0 and login
>>> has timed out, but it continues to show "device busy".
>>
>> How did you disable the getty?  Was this prior to or after a restart?  It
>> sounds like /etc/ttys is maybe running a process on it.  You need to
>> "killall -HUP init" after changing /etc/ttys.  But you probably already
>> know that.
>
> Yes, I change "on" to "off" in /etc/ttys and "kill -1 1" :-)

Killing all processes sleeping in serial device open unwedges the port for
the bug that I know about (provided the close doesn't hang).  This and
making the open succeed by raising DCD in hardware are the only ways that
I know of to unwedge the port once the open gets stuck.

Bruce


