Date: Thu, 9 Apr 2015 08:04:31 -0500 From: Guy Helmer <guy.helmer@gmail.com> To: Jilles Tjoelker <jilles@stack.nl> Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org> Subject: Re: lockf() vs. flock() -- lockf() not locking? Message-ID: <8F0FB344-3858-462F-A67D-97805379514C@gmail.com> In-Reply-To: <20150407221525.GA99106@stack.nl> References: <3950D855-0F4E-49E0-A388-4C7ED102B68B@gmail.com> <20150407221525.GA99106@stack.nl>
next in thread | previous in thread | raw e-mail | index | archive | help
> On Apr 7, 2015, at 5:15 PM, Jilles Tjoelker <jilles@stack.nl> wrote: >=20 > On Mon, Apr 06, 2015 at 04:18:09PM -0500, Guy Helmer wrote: >> Recently an application I use switched from using flock() for = advisory >> file locking to lockf() in the code that protects against concurrent >> writes to a file that is being shared and updated by multiple >> processes (not threads in a single process). The code seems reliable = =E2=80=94 >> a lock manager class opens the file & obtains the lock, then the >> read/update method opens the file using a separate file descriptor & >> reads/writes the file, flushes & closes the second file descriptor, >> and then destroys the lock manager object which unlocks the file & >> closes the first file descriptor. >=20 >> Surprisingly this simple change seems to have made the code = unreliable >> by allowing concurrent writers to the file and corrupting its >> contents: >=20 >> - if (flock(fd, LOCK_EX) !=3D 0) >> + if (lockf(fd, F_LOCK, 0) !=3D 0) >> throw std::runtime_error("Failed to get a lock of " + = filename); >=20 >> . . . >> if (fd !=3D -1) { >> - flock(fd, LOCK_EX); >> + lockf(fd, F_ULOCK, 0); >> close(fd); >> fd =3D -1; >> } >=20 >> =46rom my reading of the lockf(3) man page and reviewing the >> implementation in lib/libc/gen/lockf.c, and corresponding code in >> sys/kern/kern_descrip.c, it appears the lockf() call should be >> successfully obtaining an advisory lock over the whole file like a >> successful flock() did. However, I have a stress test that quickly >> corrupts the target file using the lockf() implementation, and the >> test fails to cause corruption using the flock() implementation. = I=E2=80=99ve >> instrumented the code, and it's clear that multiple processes are >> simultaneously in the block of code after the =E2=80=9Clockf(fd, = F_LOCK, 0)=E2=80=9D >> line. >=20 >> Am I missing something obvious? Any ideas? >=20 > With lockf/fcntl locks, the close of the second file descriptor = actually > already unlocks the file. If there is another close and open in there, > it would explain your problem. Both the lockf(3) and the fcntl(2) man > pages mention these strange semantics, but only fcntl(2) clearly warns > about them. With flock locks, opening the file another time will not > cause problems. >=20 > The second thing that will not work with lockf/fcntl locks is having a > child process inherit them. >=20 > Changing flock() to lockf() seems like a bad idea, particularly in a > reusable "lock manager" class, since it is then harder to see what > operations must be avoided to avoid losing the lock. >=20 > There is a proposal in the Austin Group for the next version of POSIX = to > add a form of file lock that allows both range locking and proper > (flock-style) semantics. Thank you! I had forgotten the text in the fcntl(2) page about = fcntl/lockf semantics (haven=E2=80=99t read that for a while). I have = verified the method in question uses the lock manager to lock the file, = opens the file to read the contents, closes the file [thus loosing the = lockf lock], does some work, and then opens & writes the file to save = the updates. Regards, Guy
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?8F0FB344-3858-462F-A67D-97805379514C>