From owner-freebsd-bugs@FreeBSD.ORG Sun Mar 11 11:20:10 2012
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E7343106564A for ; Sun, 11 Mar 2012 11:20:10 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C11C98FC12 for ; Sun, 11 Mar 2012 11:20:10 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q2BBKArU009011 for ; Sun, 11 Mar 2012 11:20:10 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q2BBKAd3009010; Sun, 11 Mar 2012 11:20:10 GMT (envelope-from gnats)
Resent-Date: Sun, 11 Mar 2012 11:20:10 GMT
Resent-Message-Id: <201203111120.q2BBKAd3009010@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Joel Ray Holveck
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ABB271065672 for ; Sun, 11 Mar 2012 11:13:34 +0000 (UTC) (envelope-from joelh@thor.piquan.org)
Received: from thor.piquan.org (unknown [IPv6:2001:470:1f05:1741:201:2ff:fe8b:103e]) by mx1.freebsd.org (Postfix) with ESMTP id 59C2B8FC1A for ; Sun, 11 Mar 2012 11:13:34 +0000 (UTC)
Received: from thor.piquan.org (localhost [127.0.0.1]) by thor.piquan.org (8.14.5/8.14.5) with ESMTP id q2BBDXIx024870 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 11 Mar 2012 04:13:34 -0700 (PDT) (envelope-from joelh@thor.piquan.org)
Received: (from joelh@localhost) by thor.piquan.org (8.14.5/8.14.5/Submit) id q2BBDX0V024869; Sun, 11 Mar 2012 04:13:33 -0700 (PDT) (envelope-from joelh)
Message-Id:
<201203111113.q2BBDX0V024869@thor.piquan.org>
Date: Sun, 11 Mar 2012 04:13:33 -0700 (PDT)
From: Joel Ray Holveck
To: FreeBSD-gnats-submit@FreeBSD.org
X-Send-Pr-Version: 3.113
Cc: David Wolfskill
Subject: kern/165927: msync reports success after a failed pager flush
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Joel Ray Holveck
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 11 Mar 2012 11:20:11 -0000

>Number: 165927
>Category: kern
>Synopsis: msync reports success after a failed pager flush
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Mar 11 11:20:10 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator: Joel Ray Holveck
>Release: FreeBSD 8.3-PRERELEASE i386
>Organization: Juniper Networks, Inc.
>Environment:
System: FreeBSD thor.piquan.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #2: Sat Feb 25 15:52:16 PST 2012 root@thor.piquan.org:/usr/obj/usr/src/sys/THOR i386
>Description:
When a process is writing to an mmap-backed file, under certain common circumstances, changes to data might not be properly flushed. Nevertheless, msync may report success.

The bug is most easily demonstrated using NFS, so much of this description refers to NFS-based errors. However, these are only examples; the bug can apply to many other filesystems as well.

If a process has an NFS-backed file mmapped in and dirties the data, there are several common circumstances under which it might not be properly flushed. The bug in kern/165923 is one situation, in which the backing file is written with the wrong uid, leading to a return of NFSERR_ACCES. Another client might delete the file, making the server return NFSERR_STALE.

Formerly (e.g., in 8.2-RELEASE), this would cause the client's VM subsystem to go into an infinite loop: the client would attempt to flush to the server, the server would return an error, the client would leave the pages on the dirty list but still need to flush them, and so on ad infinitum.

In r223054 (on stable/8; MFC r222586), this behavior was changed: the VM system marks the pages as clean to avoid this type of loop. However, this comes with its own set of problems.

As an example, consider the case where a process is gathering data into an mmap-backed datastore. The process gathers some data into the datastore. While this is happening, another client changes the ownership or mode of the file. Next, the syncer daemon attempts to flush the datastore, but since the flush fails, the pages are marked as clean. The data-gathering process later runs msync, and since the pages are "clean" (according to the client's VM system), msync returns success. However, the data has never been written to disk.
>How-To-Repeat:
See the program in the "How-To-Repeat" section of kern/165923.

If kern/165923 has not yet been fixed, then that program will demonstrate the bug by itself using the instructions in that PR: note that the pages are not written, but msync returns success.

Alternatively, the program can still demonstrate the bug, but with more effort. Make sure that both WAIT_FOR_SYNC and DO_MMAP are turned on. As the client program sleeps during the WAIT_FOR_SYNC interval, on the server run "chflags uchg backing-store". (A chmod won't be sufficient on a FreeBSD 8.2 server, but might be on others.) Be quick; you have to do this before the client's syncer flushes the file, which will happen within 0-30 seconds. (If kern/165923 has not been fixed, then you don't have to hurry; the syncer can't save the file.) Either wait for the sleep to return, or press ^C (which will stop the sleep and continue with the call to msync).
Observe (using "od -X" or similar) that the file's contents will not have changed, but the msync succeeded.

This indicates that msync(2) is a necessary, but NOT sufficient, way for a process to verify that mapped files are flushed. The idea that it's necessary is contrary to the documentation in the msync(2) and mmap(2) man pages. The fact that it's not sufficient is contrary to POSIX's assertion that msync may be used "for synchronized I/O data integrity completion" (and more explicit verbiage in the informative sections; cf. ), which is the subject of the present PR.
>Fix:
The VM system currently (as of r222586) marks pages that cannot be written as clean. Instead, the VM object should be made unavailable (unmapped, set VM_PROT_NONE, or similar), so that later memory accesses raise a SIGSEGV and msyncs return EINVAL (or ENOMEM according to POSIX.1-2008). While this means that the program will almost certainly exit with an error, that is appropriate, since its write did fail. (This is also similar to what happens if a swap drive fails.)

This bug is most visible in conjunction with kern/165923, since that bug causes the sort of failure that triggers the bug currently under discussion. However, they are independent. As described in How-To-Repeat, an analogous situation can arise with NFSERR_STALE.
>Release-Note:
>Audit-Trail:
>Unformatted:
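The pattern that the Description and How-To-Repeat sections depend on (dirty a shared mapping, call msync, trust the return value) can be sketched roughly as follows. This is not the program from kern/165923; the helper names write_via_mmap and file_matches are hypothetical and exist only for illustration. The sketch assumes a POSIX system and a file that may be truncated to the length written. Note that on NFS a read-back from the same client may be served from that client's cache, so file_matches passing there is not proof that the data reached the server.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `path` shared and writable, copy `len` bytes
 * from `data` into the mapping, and flush with msync(MS_SYNC).
 * Returns 0 if msync reported success, -1 (errno set) otherwise.
 * The file is resized to `len` bytes, which is an assumption of this sketch.
 */
int
write_via_mmap(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    /* The mapping must not extend past the end of the file. */
    if (ftruncate(fd, (off_t)len) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    memcpy(map, data, len);            /* dirty the pages */
    int rc = msync(map, len, MS_SYNC); /* the call whose result is in question */
    int saved = errno;

    munmap(map, len);
    close(fd);
    errno = saved;
    return rc;
}

/*
 * Hypothetical helper: re-read `path` with read(2) and compare it with
 * `expect`. Returns 1 on match, 0 on mismatch or short file, -1 on error.
 */
int
file_matches(const char *path, const void *expect, size_t len)
{
    unsigned char buf[4096];
    const unsigned char *want = expect;
    size_t off = 0;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    while (off < len) {
        size_t chunk = len - off < sizeof(buf) ? len - off : sizeof(buf);
        ssize_t n = read(fd, buf, chunk);
        if (n <= 0 || memcmp(buf, want + off, (size_t)n) != 0) {
            close(fd);
            return n < 0 ? -1 : 0;
        }
        off += (size_t)n;
    }
    close(fd);
    return 1;
}
```

A caller would treat a 0 return from write_via_mmap as "data is durable"; the scenario in this PR is one where that return value is 0 even though the server-side file is unchanged, which is why the report inspects the file separately.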