Subject: Re: lockf() vs. flock() -- lockf() not locking?
From: Guy Helmer
Date: Thu, 9 Apr 2015 08:04:31 -0500
To: Jilles Tjoelker
Cc: FreeBSD Hackers

> On Apr 7, 2015, at 5:15 PM, Jilles Tjoelker wrote:
>
> On Mon, Apr 06, 2015 at 04:18:09PM -0500, Guy Helmer wrote:
>> Recently an application I use switched from using flock() for advisory
>> file locking to lockf() in the code that protects against concurrent
>> writes to a file that is being shared and updated by multiple
>> processes (not threads in a single process). The code seems reliable --
>> a lock manager class opens the file & obtains the lock, then the
>> read/update method opens the file using a separate file descriptor &
>> reads/writes the file, flushes & closes the second file descriptor,
>> and then destroys the lock manager object which unlocks the file &
>> closes the first file descriptor.
>
>> Surprisingly this simple change seems to have made the code unreliable
>> by allowing concurrent writers to the file and corrupting its
>> contents:
>
>> -    if (flock(fd, LOCK_EX) != 0)
>> +    if (lockf(fd, F_LOCK, 0) != 0)
>>          throw std::runtime_error("Failed to get a lock of " + filename);
>
>> . . .
>>      if (fd != -1) {
>> -        flock(fd, LOCK_EX);
>> +        lockf(fd, F_ULOCK, 0);
>>          close(fd);
>>          fd = -1;
>>      }
>
>> From my reading of the lockf(3) man page and reviewing the
>> implementation in lib/libc/gen/lockf.c, and corresponding code in
>> sys/kern/kern_descrip.c, it appears the lockf() call should be
>> successfully obtaining an advisory lock over the whole file like a
>> successful flock() did. However, I have a stress test that quickly
>> corrupts the target file using the lockf() implementation, and the
>> test fails to cause corruption using the flock() implementation. I’ve
>> instrumented the code, and it's clear that multiple processes are
>> simultaneously in the block of code after the “lockf(fd, F_LOCK, 0)”
>> line.
>
>> Am I missing something obvious? Any ideas?
>
> With lockf/fcntl locks, the close of the second file descriptor actually
> already unlocks the file. If there is another close and open in there,
> it would explain your problem. Both the lockf(3) and the fcntl(2) man
> pages mention these strange semantics, but only fcntl(2) clearly warns
> about them. With flock locks, opening the file another time will not
> cause problems.
>
> The second thing that will not work with lockf/fcntl locks is having a
> child process inherit them.
>
> Changing flock() to lockf() seems like a bad idea, particularly in a
> reusable "lock manager" class, since it is then harder to see what
> operations must be avoided to avoid losing the lock.
>
> There is a proposal in the Austin Group for the next version of POSIX to
> add a form of file lock that allows both range locking and proper
> (flock-style) semantics.

Thank you! I had forgotten the text in the fcntl(2) page about
fcntl/lockf semantics (haven’t read that for a while).
I have verified the method in question uses the lock manager to lock the
file, opens the file to read the contents, closes the file [thus losing
the lockf lock], does some work, and then opens & writes the file to
save the updates.

Regards,
Guy
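
For anyone who wants to reproduce the behaviour discussed in this thread,
here is a minimal sketch. It is not the application's code: the helper
name lock_survives_second_close(), the /tmp scratch path, and the use of
a forked child as the "other process" are all assumptions made for
illustration. It takes a whole-file lock on one descriptor, opens and
closes a second descriptor for the same file, and then lets a separate
process try a non-blocking lock to see whether the original lock is
still in place.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>
#include <stdexcept>

// Hypothetical helper, not from the application discussed in the thread.
// Returns true if a second process is still locked out after the lock
// holder opens and closes another descriptor for the same file.
bool lock_survives_second_close(bool use_lockf) {
    char path[] = "/tmp/lockdemo.XXXXXX";   // assumed scratch location
    int fd = mkstemp(path);                 // first descriptor: takes the lock
    if (fd == -1)
        throw std::runtime_error("mkstemp failed");

    int rc = use_lockf ? lockf(fd, F_LOCK, 0) : flock(fd, LOCK_EX);
    if (rc != 0)
        throw std::runtime_error("Failed to get a lock");

    // Mimic the read/update method: separate descriptor, then close it.
    int fd2 = open(path, O_RDWR);
    if (fd2 == -1)
        throw std::runtime_error("second open failed");
    close(fd2);   // for lockf()/fcntl() locks this also drops the lock taken via fd

    pid_t pid = fork();
    if (pid == -1)
        throw std::runtime_error("fork failed");
    if (pid == 0) {
        // Child: a separate process probing with a non-blocking request
        // of the same lock type on its own freshly opened descriptor.
        int probe = open(path, O_RDWR);
        if (probe == -1)
            _exit(2);
        int got = use_lockf ? lockf(probe, F_TLOCK, 0)
                            : flock(probe, LOCK_EX | LOCK_NB);
        _exit(got == 0 ? 0 : 1);   // 0: lock was free, 1: still held by parent
    }

    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        throw std::runtime_error("waitpid failed");
    assert(WIFEXITED(status));     // the probe child always leaves via _exit()
    close(fd);
    unlink(path);
    return WEXITSTATUS(status) == 1;
}
```

Called with false (flock) the helper should return true, and with true
(lockf) it should return false, which matches the explanation quoted
above: the close of the second descriptor releases a lockf/fcntl lock
but leaves a flock lock alone. The probe deliberately uses the same
lock type as the holder, since whether flock and fcntl locks conflict
with each other differs between systems.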