Date:      Tue, 4 Jul 2023 11:25:10 -0700
From:      alan somers <asomers@gmail.com>
To:        Alan Somers <asomers@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Should close() release locks atomically?
Message-ID:  <CAOtMX2jg4%2B1m%2BnZ80FrgZ5h0h_pZ4eda879C21-oZ4oZwUMzmA@mail.gmail.com>
In-Reply-To: <CAOtMX2j1JRUjcYkUcZj-r=UUSdzB5Fk8_R1ihVH31BRQwPHa2g@mail.gmail.com>
References:  <CAOtMX2jjKyj5JNkEXh7_UsEQLkuhpfmybht7gDwQR64BQzAXrQ@mail.gmail.com> <ZJX6c1LcDU97E7z8@kib.kiev.ua> <CAOtMX2jRkyv%2Bs21%2Bdcx16GjiEuVrF_c_X=%2B5r02hMLTrwxZ=Pw@mail.gmail.com> <ZJYFGa6oOVQxOqEk@kib.kiev.ua> <CAOtMX2iqaC3YUAPtxjLHPjujJUYuYX98YyhhFv7Jy5cb-QfvBg@mail.gmail.com> <CAOtMX2j1JRUjcYkUcZj-r=UUSdzB5Fk8_R1ihVH31BRQwPHa2g@mail.gmail.com>


On Sat, Jun 24, 2023 at 8:29 AM Alan Somers <asomers@freebsd.org> wrote:
>
> On Fri, Jun 23, 2023 at 1:53 PM Alan Somers <asomers@freebsd.org> wrote:
> >
> > On Fri, Jun 23, 2023 at 1:48 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Fri, Jun 23, 2023 at 01:11:34PM -0700, Alan Somers wrote:
> > > > On Fri, Jun 23, 2023 at 1:03 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jun 23, 2023 at 12:00:36PM -0700, Alan Somers wrote:
> > > > > > The close() syscall automatically releases locks.  Should it do so
> > > > > > atomically or is a delay permitted?  I can't find anything in our man
> > > > > > pages or in the Open Group specification that says either way.
> > > > > >
> > > > > > The distinction matters when using O_NONBLOCK.  For example:
> > > > > >
> > > > > > fd = open(..., O_DIRECT | O_EXLOCK | O_NONBLOCK); //succeeds
> > > > > > // do some I/O
> > > > > > close(fd);
> > > > > > fd = open(..., O_DIRECT | O_EXLOCK | O_NONBLOCK); //fails with EAGAIN!
> > > > > >
> > > > > > I see this error frequently on a heavily loaded system.  It isn't a
> > > > > > typical thread race though; ktrace shows that only one thread tries to
> > > > > > open the file in question.  From the ktrace, I can see that the final
> > > > > > open() comes immediately after the close(), with no intervening
> > > > > > syscalls from that thread.  It seems that close() doesn't release the
> > > > > > lock right away.  I wouldn't notice if I weren't using O_NONBLOCK.
> > > > > >
> > > > > > Should this be considered a bug?  If so I could try to come up with a
> > > > > > minimal test case.  But it's somewhat academic, since I plan to
> > > > > > refactor the code in a way that will eliminate the duplicate open().
> > > > > What type of object is behind fd?  O_NONBLOCK affects open itself.
> > > > > We release the flock after the object's close method runs, but before close(2) returns.
> > > >
> > > > This is a plain file on ZFS.
> > >
> > > Can you write a self-contained example, and check the same issue e.g. on
> > > tmpfs?
> >
> > I just reproduced it on tmpfs.  A minimal test case will take some more time...
>
> I'm afraid that I haven't been successful in creating a minimal test
> case.  My original test case, while it reliably reproduces the
> problem, is huge.  I'm sorry, but I think I'm going to declare ENOTIME
> and get back to the aforementioned refactoring.

I've finally succeeded in writing a minimal test case.  The critical
piece I was missing before was that other threads were forking in the
background.  Even though the file is opened with O_CLOEXEC, the child
process briefly keeps it locked.  However, the file ought to get
unlocked whenever _either_ the parent calls close() or the child
calls fdcloseexec.  So I don't understand how it could fail to get
unlocked.  I've posted the test case to Bugzilla.  Let's move the
discussion there.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272367

-Alan


