From: John Baldwin
To: Kostik Belousov
Cc: freebsd-fs@freebsd.org, alc@freebsd.org, fs@freebsd.org, pho@freebsd.org
Date: Mon, 21 Jun 2010 16:15:22 -0400
Subject: Re: Tmpfs elimination of double-copy
Message-Id: <201006211615.22758.jhb@freebsd.org>
In-Reply-To: <20100621184928.GI13238@deviant.kiev.zoral.com.ua>
References: <20100621125825.GG13238@deviant.kiev.zoral.com.ua> <201006211030.55327.jhb@freebsd.org> <20100621184928.GI13238@deviant.kiev.zoral.com.ua>

On Monday 21 June 2010 2:49:28 pm Kostik Belousov wrote:
> On Mon, Jun 21, 2010 at 10:30:55AM -0400, John Baldwin wrote:
> > On Monday 21 June 2010 8:58:25 am Kostik Belousov wrote:
> > > Hi,
> > > Below is the patch that eliminates the second copy of the data kept by
> > > tmpfs in case a file is mapped. Also, it removes potential deadlocks due
> > > to tmpfs doing copyin/out while the page is busy. It is possible that the
> > > patch also fixes the known issue with sendfile(2) of a tmpfs file, but I
> > > did not verify this.
> > >
> > > Patch essentially consists of three parts:
> > > - move of vm_object's vnp_size from the type-discriminated union to the
> > >   vm_object proper;
> > > - making vm not choke when the vm object held in struct vnode's v_object
> > >   is a default or swap object instead of a vnode object;
> > > - use of the swap object that keeps data for a tmpfs VREG file also as
> > >   v_object.
> > >
> > > Peter Holm helped me with the patch; apparently we survive fsx and
> > > stress2.
> >
> > Why did you have to move vnp_size out of the union? Is tmpfs using a
> > non-OBJT_VNODE object to hold file data?
> Tmpfs uses an OBJT_SWAP object to keep the data pages for the files.
> Current code allocates another object of type OBJT_VNODE, assigned
> to vp->v_object, to satisfy the VM interface for mapping the file, using
> vnode_create_vobject. The objects do not share the pages (I do not think
> this can be easily achieved without serious changes to VM). Thus most,
> if not all, of the data is present in two sets of pages.
>
> When such a file is written to, tmpfs copies the user buffer both to the
> swap object and to the v_object.
>
> The patch I posted assigns the swap object to vp->v_object. I had to
> make a small change to vm_mmap_vnode() to not allocate the vnode pager
> and to not increment the vnode use counter when v_object is the swap
> object.
>
> vnp_size has to be provided on the object layer because our swap
> object is used to e.g. mmap the executables from tmpfs, and image
> activation code relies on vnp_size instead of the slower VOP_GETATTR().
> I think this route is easier than converting all vnp_size users to
> VOP_GETATTR just for tmpfs's benefit.

Ok, thanks for the expanded explanation. :)  It seems a shame to have to
move vnp_size out of the pager-specific data.  Maybe add a comment in
vm_object.h to say that vnp_size is used by multiple object types, which
is why it can't be vnode-specific anymore?

-- 
John Baldwin
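The layout change and the mmap-path change discussed in this thread can be illustrated with a small userland model. This is a sketch under stated assumptions, not the patch itself: every name below (struct obj_model, struct vnode_model, OBJ_SWAP, OBJ_VNODE, file_size, map_file_object(), image_size()) is hypothetical and only loosely mirrors the roles of vm_object, vnp_size, vm_mmap_vnode() and image activation in FreeBSD. What it shows is that once the size lives outside the pager-specific union, it stays meaningful when a swap object is installed as a vnode's v_object, and the mapping path can skip the vnode pager and the vnode use-count bump for such objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's OBJT_DEFAULT/OBJT_SWAP/OBJT_VNODE. */
enum obj_type { OBJ_DEFAULT, OBJ_SWAP, OBJ_VNODE };

/* Heavily simplified model of a VM object; NOT the real struct vm_object. */
struct obj_model {
	enum obj_type type;
	/*
	 * File size in bytes.  Kept outside the pager-specific union
	 * because it is read for vnode objects and also for swap objects
	 * installed as a tmpfs vnode's v_object (the kind of vm_object.h
	 * comment suggested in the thread).
	 */
	uint64_t file_size;
	union {
		struct { void *vnode; } vnp;	/* vnode-pager-only data */
		struct { int nblocks; } swp;	/* swap-pager-only data */
	} un_pager;
};

/* Minimal model of a vnode: its VM object and a use count. */
struct vnode_model {
	struct obj_model *v_object;
	int usecount;
};

/*
 * Model of the mmap-path decision: a vnode-type object goes through
 * the (modeled) vnode pager path, which takes a vnode use reference;
 * any other object installed as v_object (e.g. tmpfs's swap object)
 * is returned directly and the use count is left alone.
 */
static struct obj_model *
map_file_object(struct vnode_model *vp, bool *used_vnode_pager)
{
	struct obj_model *obj;

	assert(vp != NULL && vp->v_object != NULL);
	obj = vp->v_object;
	if (obj->type == OBJ_VNODE) {
		vp->usecount++;
		*used_vnode_pager = true;
	} else {
		*used_vnode_pager = false;
	}
	return (obj);
}

/* Model of image activation reading the cached size, not a getattr call. */
static uint64_t
image_size(const struct obj_model *obj)
{
	return (obj->file_size);
}
```

In this model, mapping a vnode whose v_object is a swap object returns that object directly and leaves the use count untouched, while a vnode-type object takes the extra reference; in both cases the size is read from the same field, regardless of which union member is live.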