Date: Tue, 4 Jul 2023 11:25:10 -0700
From: alan somers <asomers@gmail.com>
To: Alan Somers <asomers@freebsd.org>
Cc: Konstantin Belousov <kostikbel@gmail.com>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Should close() release locks atomically?
Message-ID: <CAOtMX2jg4%2B1m%2BnZ80FrgZ5h0h_pZ4eda879C21-oZ4oZwUMzmA@mail.gmail.com>
In-Reply-To: <CAOtMX2j1JRUjcYkUcZj-r=UUSdzB5Fk8_R1ihVH31BRQwPHa2g@mail.gmail.com>
References: <CAOtMX2jjKyj5JNkEXh7_UsEQLkuhpfmybht7gDwQR64BQzAXrQ@mail.gmail.com>
 <ZJX6c1LcDU97E7z8@kib.kiev.ua>
 <CAOtMX2jRkyv%2Bs21%2Bdcx16GjiEuVrF_c_X=%2B5r02hMLTrwxZ=Pw@mail.gmail.com>
 <ZJYFGa6oOVQxOqEk@kib.kiev.ua>
 <CAOtMX2iqaC3YUAPtxjLHPjujJUYuYX98YyhhFv7Jy5cb-QfvBg@mail.gmail.com>
 <CAOtMX2j1JRUjcYkUcZj-r=UUSdzB5Fk8_R1ihVH31BRQwPHa2g@mail.gmail.com>

On Sat, Jun 24, 2023 at 8:29 AM Alan Somers <asomers@freebsd.org> wrote:
>
> On Fri, Jun 23, 2023 at 1:53 PM Alan Somers <asomers@freebsd.org> wrote:
> >
> > On Fri, Jun 23, 2023 at 1:48 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Fri, Jun 23, 2023 at 01:11:34PM -0700, Alan Somers wrote:
> > > > On Fri, Jun 23, 2023 at 1:03 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jun 23, 2023 at 12:00:36PM -0700, Alan Somers wrote:
> > > > > > The close() syscall automatically releases locks. Should it do so
> > > > > > atomically or is a delay permitted? I can't find anything in our man
> > > > > > pages or the open group specification that says.
> > > > > >
> > > > > > The distinction matters when using O_NONBLOCK. For example:
> > > > > >
> > > > > > fd = open(..., O_DIRECT | O_EXLOCK | O_NONBLOCK); //succeeds
> > > > > > // do some I/O
> > > > > > close(fd);
> > > > > > fd = open(..., O_DIRECT | O_EXLOCK | O_NONBLOCK); //fails with EAGAIN!
> > > > > >
> > > > > > I see this error frequently on a heavily loaded system. It isn't a
> > > > > > typical thread race though; ktrace shows that only one thread tries to
> > > > > > open the file in question. From the ktrace, I can see that the final
> > > > > > open() comes immediately after the close(), with no intervening
> > > > > > syscalls from that thread. It seems that close() doesn't release the
> > > > > > lock right away. I wouldn't notice if I weren't using O_NONBLOCK.
> > > > > >
> > > > > > Should this be considered a bug? If so I could try to come up with a
> > > > > > minimal test case. But it's somewhat academic, since I plan to
> > > > > > refactor the code in a way that will eliminate the duplicate open().
> > > > > What type of the object is behind fd? O_NONBLOCK affects open itself.
> > > > > We release flock after object close method, but before close(2) returns.
> > > >
> > > > This is a plain file on ZFS.
> > >
> > > Can you write a self-contained example, and check the same issue e.g. on
> > > tmpfs?
> >
> > I just reproduced it on tmpfs. A minimal test case will take some more time...
>
> I'm afraid that I haven't been successful in creating a minimal test
> case. My original test case, while it reliably reproduces the
> problem, is huge. I'm sorry, but I think I'm going to declare ENOTIME
> and get back to the aforementioned refactoring.

I've finally succeeded in writing a minimal test case. The critical
piece I was missing before was that other threads were forking in the
background. Even though the file is opened O_CLOEXEC, the child
process briefly keeps it locked. However, the file ought to get
unlocked whenever _either_ that parent calls close() or the child
calls fdcloseexec. So I don't understand how it could fail to get
unlocked. I've posted the test case to Bugzilla. Let's move
discussion there.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272367
-Alan
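
For readers following along, here is a rough sketch of the mechanism described above. It is not the test case posted to Bugzilla. It assumes that a forked child which still holds an inherited copy of the descriptor keeps the flock-style lock alive after the parent's close(). To stay portable it takes the lock with flock(LOCK_EX | LOCK_NB) instead of O_EXLOCK | O_NONBLOCK, and it parks the child on a pipe instead of relying on a racing fork/exec. The function name relock_after_close() and its parameters are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Locks `path`, forks a child that inherits the descriptor, closes the
 * parent's copy, then tries to lock the file again through a fresh open().
 *   *held_errno  - errno of the relock attempt while the child is alive
 *                  (0 if the relock succeeded)
 *   *freed_errno - same, after the child has exited
 * Returns 0 if the experiment ran, -1 on a setup failure.
 */
int
relock_after_close(const char *path, int *held_errno, int *freed_errno)
{
    int gate[2], fd, fd2, status;
    pid_t pid;
    char c;

    assert(held_errno != NULL && freed_errno != NULL);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return (-1);
    /* Stand-in for O_EXLOCK | O_NONBLOCK: same flock(2)-style lock. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0 || pipe(gate) != 0) {
        close(fd);
        return (-1);
    }

    pid = fork();
    if (pid < 0) {
        close(gate[0]);
        close(gate[1]);
        close(fd);
        return (-1);
    }
    if (pid == 0) {
        /* Child: keeps its inherited copy of fd until the gate closes. */
        close(gate[1]);
        if (read(gate[0], &c, 1) < 0)
            _exit(1);
        _exit(0);       /* exiting drops the last reference to the lock */
    }
    close(gate[0]);

    close(fd);          /* parent's copy is gone, the child's is not */

    fd2 = open(path, O_RDWR);
    if (fd2 < 0) {
        close(gate[1]);
        waitpid(pid, &status, 0);
        return (-1);
    }
    *held_errno = (flock(fd2, LOCK_EX | LOCK_NB) == 0) ? 0 : errno;

    close(gate[1]);     /* let the child run to exit */
    waitpid(pid, &status, 0);

    *freed_errno = (flock(fd2, LOCK_EX | LOCK_NB) == 0) ? 0 : errno;
    close(fd2);
    return (0);
}
```

Under the stated assumption, the first relock attempt should fail with EWOULDBLOCK (the same value as EAGAIN on FreeBSD and Linux) and the second should succeed once the child is gone. This only shows the expected window in which a child holds the lock; it does not explain why the lock would stay held after both the parent's close() and the child's close-on-exec processing, which is the open question in the bug report.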