Date:      Fri, 18 Jan 2008 21:23:41 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Jan Harkes <jaharkes@cs.cmu.edu>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Coda on FreeBSD problem reports?
Message-ID:  <20080118211556.T46437@fledge.watson.org>
In-Reply-To: <20080118210621.GF7898@cs.cmu.edu>
References:  <18CC5A4A2AC36D7FF57615EE@ganymede.hub.org> <478AF6BC.8050604@highperformance.net> <20080114142124.Y55696@fledge.watson.org> <20080116085630.GA32361@pappardelle.tekno.chalmers.se> <20080117080359.U51764@fledge.watson.org> <20080118073445.GA30721@pappardelle.tekno.chalmers.se> <20080118095652.GC30721@pappardelle.tekno.chalmers.se> <20080118103952.D18977@fledge.watson.org> <20080118210621.GF7898@cs.cmu.edu>

On Fri, 18 Jan 2008, Jan Harkes wrote:

> On Fri, Jan 18, 2008 at 11:10:26AM +0000, Robert Watson wrote:
>> This is likely a VM interaction involving either an improperly managed or
>> unmanaged VM object for Coda vnodes.
>
> That sounds right. I haven't looked at vobjects and how they are managed,
> didn't even know FreeBSD had these.
>
> It sounds a lot like the i_mapping/address_space in the Linux kernel and if 
> these are even slightly similar, we would want to share the vobject between 
> the Coda vnode and the cache/container vnode.

In FreeBSD, as in Mach, from which the FreeBSD VM was derived, a VM object is
what holds cached pages for a file.  VM objects are managed by a pager, and in
the case of vnode-backed VM objects, this is the vnode pager
(src/sys/vm/vnode_pager.c).  When a memory mapping is created, the VM object
is referenced.  Whenever a page needs to be filled, the vnode pager loads it
using VOP_READ(), and when it gets bored (e.g., msync, memory pressure), it
writes dirty pages back out using VOP_WRITE().  Due to the magic of a merged
VM/buffer cache, it's actually the same memory as used in the buffer cache, so
if you do write() on the file, it is visible to mmap(), and vice versa for a
write via the memory mapping.  Vnodes can float around without VM objects, but
they can't be mapped without one, so normally we set up a VM object on open(),
and then don't GC the VM object until the vnode references hit zero and the
vnode falls out of memory.
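
To see the write()/mmap() coherence described above from userland, something
like the following sketch can be used.  It is only an illustration: the
function name and the scratch path passed to it are made up for this example
and are not part of Coda or the attached patch, and it assumes a system with a
unified VM/buffer cache (POSIX by itself does not promise this behaviour
without msync()).

```c
#define _XOPEN_SOURCE 700	/* for pread() and ftruncate() under strict modes */

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: checks that write() and a MAP_SHARED mapping see the
 * same pages.  Returns 1 if both directions are coherent, 0 if not, and -1
 * if a system call fails.  'path' is a scratch file that is removed again.
 */
int
check_mmap_write_coherence(const char *path)
{
	char buf[5];
	char *map;
	int fd, seen_by_map, seen_by_read;
	int result = -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	if (ftruncate(fd, 4096) != 0)
		goto out;
	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* write() on the descriptor: the mapping should see the new bytes. */
	if (write(fd, "hello", 5) != 5)
		goto unmap;
	seen_by_map = (memcmp(map, "hello", 5) == 0);

	/* Store via the mapping, no msync(): pread() should see it too. */
	memcpy(map, "HELLO", 5);
	if (pread(fd, buf, 5, 0) != 5)
		goto unmap;
	seen_by_read = (memcmp(buf, "HELLO", 5) == 0);

	result = (seen_by_map && seen_by_read);
unmap:
	munmap(map, 4096);
out:
	close(fd);
	unlink(path);
	return (result);
}
```

Calling check_mmap_write_coherence() on a scratch file from a small main()
should return 1 where the cache is merged.  If the Coda vnode and the cache
vnode each had their own VM object, this is the kind of check that could fail
when one side goes through the Coda file and the other through the container.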

>> loosely guess the former if cache vnodes are reused between Coda vnodes.
>
> Cache vnodes are reused, but under very specific conditions, and for other 
> reasons we are going to switch to unlinking / recreating them.

This sounds like a generally good and safe idea.

>> However, sharing makes more sense in other ways, as it means there won't be 
>> data cache coherency problems between the Coda and cache VM objects if both 
>> are written to simultaneously (or even not simultaneously, given that when
>> there's little memory pressure, pages hang around for a long time).
>
> We never write simultaneously because of the session semantics + whole file
> caching. When we get an open, the Coda-using application is blocked until we
> know that all data has been copied to the cache file before we hand the
> reference to the cache file back to the kernel. But we don't actually sync
> the dirty pages to disk, so if the Coda vnode uses its own vobject it would
> miss the few dirty pages that are still associated with the cache vnode's
> vobject. It is also a huge performance benefit for a lot of short-lived
> files which are unlinked before their dirty pages have even hit the disk.
>
> So sharing these definitely seems like the cleaner solution.

Two things to be aware of:

(1) If the VM object is the cache vnode's, then when a page is read from or
     written to disk, it will bypass the Coda VOPs and go directly to the
     cache VOPs, since the cache vnode VM object uses the cache vnode's vnode
     operation vector.

(2) Be aware that memory mappings can persist beyond close() -- i.e., you can
     open() a file, mmap() it, and then close() it.  This means that writes can
     happen "later", and since it's hitting the cache vnode operations rather
     than the Coda ones, you won't get an explicit notification.
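
Point (2) can be reproduced from userland with a sketch along these lines.
The function name and the scratch path are hypothetical, chosen only for the
example.  It stores through a mapping after the descriptor has been closed
and then reopens the file to confirm that the late write landed.

```c
#define _XOPEN_SOURCE 700	/* for ftruncate() under strict modes */

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes through a mapping after close().  Returns 1
 * if the late write reached the file, 0 if it did not, and -1 if a system
 * call fails.  'path' is a scratch file that is removed again.
 */
int
write_after_close(const char *path)
{
	char buf[4];
	char *map;
	int fd, ok;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	if (ftruncate(fd, 4096) != 0) {
		close(fd);
		unlink(path);
		return (-1);
	}
	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping outlives the descriptor */
	if (map == MAP_FAILED) {
		unlink(path);
		return (-1);
	}

	memcpy(map, "late", 4);	/* this write happens after close() */
	if (msync(map, 4096, MS_SYNC) != 0) {
		munmap(map, 4096);
		unlink(path);
		return (-1);
	}
	munmap(map, 4096);

	/* Reopen and confirm that the post-close store is in the file. */
	fd = open(path, O_RDONLY);
	if (fd < 0) {
		unlink(path);
		return (-1);
	}
	ok = (read(fd, buf, 4) == 4 && memcmp(buf, "late", 4) == 0);
	close(fd);
	unlink(path);
	return (ok);
}
```

On an ordinary local file system this should return 1.  With a shared VM
object, the same sequence on a Coda file would presumably dirty the container
file after the close() has already been seen, which is the case to watch for.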

I've not tested it, but the attached patch may do something like what you 
want.  I have some reservations about this approach, though, due to the above 
concerns.

Robert N M Watson
Computer Laboratory
University of Cambridge

Index: coda_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/coda/coda_vnops.c,v
retrieving revision 1.78
diff -u -r1.78 coda_vnops.c
--- coda_vnops.c	13 Jan 2008 14:44:02 -0000	1.78
+++ coda_vnops.c	17 Jan 2008 15:22:12 -0000
@@ -244,6 +244,8 @@
      if (error) {
      	printf("coda_open: VOP_OPEN on container failed %d\n", error);
  	return (error);
+    } else {
+	(*vpp)->v_object = vp->v_object;
      }
  /* grab (above) does this when it calls newvnode unless it's in the cache*/

@@ -747,6 +749,8 @@

      CODADEBUG(CODA_INACTIVE, myprintf(("in inactive, %s, vfsp %p\n",
  				  coda_f2s(&cp->c_fid), vp->v_mount));)
+
+    vp->v_object = NULL;

      /* If an array has been allocated to hold the symlink, deallocate it */
      if ((coda_symlink_cache) && (VALID_SYMLINK(cp))) {
@@ -1552,7 +1556,7 @@
      cache_purge(vp);
      coda_free(VTOC(vp));
      vp->v_data = NULL;
-    vnode_destroy_vobject(vp);
+    vp->v_object = NULL;
      return (0);
  }



