Date:      Wed, 8 Apr 2015 00:15:25 +0200
From:      Jilles Tjoelker <jilles@stack.nl>
To:        Guy Helmer <guy.helmer@gmail.com>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: lockf() vs. flock() -- lockf() not locking?
Message-ID:  <20150407221525.GA99106@stack.nl>
In-Reply-To: <3950D855-0F4E-49E0-A388-4C7ED102B68B@gmail.com>
References:  <3950D855-0F4E-49E0-A388-4C7ED102B68B@gmail.com>

On Mon, Apr 06, 2015 at 04:18:09PM -0500, Guy Helmer wrote:
> Recently an application I use switched from using flock() for advisory
> file locking to lockf() in the code that protects against concurrent
> writes to a file that is being shared and updated by multiple
> processes (not threads in a single process). The code seems reliable —
> a lock manager class opens the file & obtains the lock, then the
> read/update method opens the file using a separate file descriptor &
> reads/writes the file, flushes & closes the second file descriptor,
> and then destroys the lock manager object which unlocks the file &
> closes the first file descriptor.

> Surprisingly this simple change seems to have made the code unreliable
> by allowing concurrent writers to the file and corrupting its
> contents:

> -    if (flock(fd, LOCK_EX) != 0)
> +    if (lockf(fd, F_LOCK, 0) != 0)
>          throw std::runtime_error("Failed to get a lock of " + filename);

> . . .
>      if (fd != -1) {
> -        flock(fd, LOCK_EX);
> +        lockf(fd, F_ULOCK, 0);
>          close(fd);
>          fd = -1;
>      }

> From my reading of the lockf(3) man page and my review of the
> implementation in lib/libc/gen/lockf.c and the corresponding code in
> sys/kern/kern_descrip.c, it appears the lockf() call should be
> successfully obtaining an advisory lock over the whole file like a
> successful flock() did. However, I have a stress test that quickly
> corrupts the target file using the lockf() implementation, and the
> test fails to cause corruption using the flock() implementation. I’ve
> instrumented the code, and it's clear that multiple processes are
> simultaneously in the block of code after the “lockf(fd, F_LOCK, 0)”
> line.

> Am I missing something obvious? Any ideas?

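With lockf/fcntl locks, closing the second file descriptor already
unlocks the file. These locks belong to the process and the file, not to
a descriptor, so closing any descriptor for the file releases all of the
process's locks on it. If the locked section opens and closes the file
again like this, that would explain your problem. Both the lockf(3) and
the fcntl(2) man pages mention these strange semantics, but only
fcntl(2) clearly warns about them. With flock locks, which are attached
to the open file description, opening the file another time will not
cause problems.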

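A second thing that will not work with lockf/fcntl locks is having a
child process inherit them: a child created with fork() does not inherit
the parent's fcntl locks, whereas it shares a flock lock through the
inherited descriptor.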

Changing flock() to lockf() seems like a bad idea, particularly in a
reusable "lock manager" class, since it then becomes harder to see which
operations must be avoided to keep the lock.

There is a proposal in the Austin Group for the next version of POSIX to
add a form of file lock that allows both range locking and proper
(flock-style) semantics.

-- 
Jilles Tjoelker


