Date: Fri, 15 Jun 2012 11:34:48 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov
Cc: freebsd-fs@freebsd.org, Pavlo
Message-ID: <1116727909.1836239.1339774488001.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120614122456.GZ2337@deviant.kiev.zoral.com.ua>
Subject: Re: mmap() incoherency on hi I/O load (FS is zfs)

Kostik wrote:
> On Thu, Jun 14, 2012 at 07:32:36AM -0400, Rick Macklem wrote:
> > Pavlo wrote:
> > > There is a case where some parts of files that are mapped and
> > > then modified get corrupted. By corrupted I mean some of the data
> > > is fine (the part that was written using write()/pwrite()), but
> > > some looks like it never existed. The data sat in buffers for a
> > > while, during which several processes (with access properly
> > > synchronised, of course) used the shared pages and saw it; after
> > > some time, those processes found it was gone. Only the part
> > > written with pwrite() was still there; everything written via
> > > mmap() was zero.
> > >
> > > As I said, this occurs under high I/O load, when 4+ background
> > > processes are indexing a huge amount of data. I also want to note
> > > that it never occurred in the life of our project while we used
> > > mmap() under the same I/O stress conditions, as long as we mapped
> > > either a whole file or just a part (the header) starting at the
> > > beginning of the file. The first time we used mappings of
> > > individual pages, just to save RAM, this popped up.
> > >
> > > The fix for this problem is to msync() before any munmap(). But
> > > the man page says:
> > >
> > > The msync() system call is usually not needed since BSD
> > > implements a coherent file system buffer cache. However, it may
> > > be used to associate dirty VM pages with file system buffers and
> > > thus cause them to be flushed to physical media sooner rather
> > > than later.
> > >
> > > Any thoughts? Thanks.
> > >
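[A minimal, self-contained sketch of the workaround Pavlo describes:
map a single page of a file (the access pattern that triggered the
problem) and msync() the dirty page before munmap(). The file name,
offset, and record contents are illustrative assumptions, and error
handling is abbreviated.]

#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    off_t off = 4 * pagesz;     /* mmap offsets must be page-aligned */

    /* "data.idx" is a placeholder; it must extend past off + pagesz. */
    int fd = open("data.idx", O_RDWR);
    if (fd == -1)
        err(1, "open");

    /* Map one page of the file, shared so stores reach the file. */
    char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
        MAP_SHARED, fd, off);
    if (p == MAP_FAILED)
        err(1, "mmap");

    memcpy(p, "record", 6);     /* dirty the shared page */

    /*
     * The workaround: synchronously flush the dirty page before
     * unmapping, rather than relying on the coherent buffer cache
     * the msync(2) man page describes.
     */
    if (msync(p, pagesz, MS_SYNC) == -1)
        err(1, "msync");
    if (munmap(p, pagesz) == -1)
        err(1, "munmap");
    close(fd);
    return (0);
}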
> > With a recent kernel from head, I am seeing dirty mmap'd pages
> > being written quite late for the NFSv4 client, even after the NFS
> > client's VOP_RECLAIM() has been called, it seems. I didn't observe
> > this behaviour in a kernel from head in March. (I don't know enough
> > about the vm/mmap area to know whether this is correct behaviour or
> > not.)
> >
> > I thought I'd mention this, since you didn't say how recent a
> > kernel you were running, and thought it might be caused by the same
> > change?
> Can you please comment more on this?
> How is this possible at all?
>
> Could you please show at least a backtrace for the moment when a
> write request is made for a page that belongs to an already reclaimed
> vnode?
After some off-list discussion, it was determined that my problem was
doing nfsrpc_close() before vnode_destroy_object() in the NFSv4
client's VOP_RECLAIM(). This is an NFSv4-specific bug and wouldn't be
related to the above issue.

Sorry about the noise, rick
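[The problem described above implies the fix is to reverse that
ordering: destroy the VM object before sending the NFSv4 Close. The
sketch below is purely an illustration of that corrected sequence,
built from the two function names mentioned in the message; the
argument lists are abbreviated assumptions, and the real reclaim
routine does considerably more work.]

/*
 * Illustrative sketch only, not the actual FreeBSD source.
 */
static int
nfs_reclaim(struct vop_reclaim_args *ap)
{
    struct vnode *vp = ap->a_vp;

    /*
     * Destroy the VM object first, so that any dirty mmap'd pages
     * are flushed back while the NFSv4 Open state is still valid.
     */
    vnode_destroy_object(vp);

    /*
     * Only then do the NFSv4 Close. With the old ordering (Close
     * first), dirty pages could still be written out after the
     * Open had already been closed.
     */
    nfsrpc_close(vp);

    return (0);
}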