From owner-freebsd-fs Mon Sep 11 14:55:45 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131])
	by hub.freebsd.org (Postfix) with ESMTP id 4B8B437B43C;
	Mon, 11 Sep 2000 14:55:40 -0700 (PDT)
Received: (from daemon@localhost)
	by smtp01.primenet.com (8.9.3/8.9.3) id OAA16115;
	Mon, 11 Sep 2000 14:54:59 -0700 (MST)
Received: from usr09.primenet.com(206.165.6.209)
	via SMTP by smtp01.primenet.com, id smtpdAAAoMaOCF;
	Mon Sep 11 14:54:54 2000
Received: (from tlambert@localhost)
	by usr09.primenet.com (8.8.5/8.8.5) id OAA18763;
	Mon, 11 Sep 2000 14:55:27 -0700 (MST)
From: Terry Lambert
Message-Id: <200009112155.OAA18763@usr09.primenet.com>
Subject: Re: CFR: nullfs, vm_objects and locks... (patch)
To: bp@butya.kz (Boris Popov)
Date: Mon, 11 Sep 2000 21:55:27 +0000 (GMT)
Cc: freebsd-fs@FreeBSD.ORG, dillon@FreeBSD.ORG, semenu@FreeBSD.ORG, tegge@FreeBSD.ORG
In-Reply-To: from "Boris Popov" at Sep 05, 2000 06:02:19 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> The last few days I've spent trying to make nullfs really functional
> and stable.  There are many issues with the current nullfs code, but
> below I'll try to outline the most annoying ones.
>
> The first one is an inability to handle the mmap() operation.  This
> comes from the VM/vnode_pager design, where each vm_object is
> associated with a single vnode and vice versa.  Looking at the
> problem in general, one may note that stackable filesystems may
> either have a separate vm_object per layer or have none at all.
> Since nullfs essentially maps its vnodes to the underlying
> filesystem, it is reasonable to map all operations to the underlying
> vnode.
I had a similar approach, which uses only one additional call:

	struct vnode *VOP_FINALVP(struct vnode *vp);

When called on a vnode, it returns the real backing object, instead of
a higher level shadow in a stack.  Upper level vnodes do not have
backing store associated with them.

My approach, and the one you have put forward, are both flawed if you
try to move beyond the simple case of a 1:1 correspondence between
stacking layers and underlying objects.  That is, if we have anything
more complex than a page in the final disk image equalling a page in a
process address space, then there is a need for intermediate backing
object(s).

The most obvious case for this would be a compressing stacking layer,
where the backing pages and the process address space pages are
algorithmically related, but not identical.  Similar cases are
metadata stuffing (say you take the first 1k of the file for an
intermediate layer, to enable access control lists, etc.),
cryptographic stacks, and transformational stacks (for example, an NFS
client that transparently maps ISO 8859-1 files into 16 bit Unicode
data).

It seems to me that a hybrid approach is required, with explicit
coherency calls between layers, at least for the non-correspondence
cases, and with something like your approach (or mine) as an
optimization for the simple case.  What this means is putting some of
the pre-unified VM and buffer cache synchronization points back into
the VFS consumer layers: the system call layer and the NFS client
layer.

The simplest approach to resolving this is to provide a pager that
implements VOP_{GET|PUT}PAGES using the read and write primitives;
this would be used in intermediate layers which have their own backing
objects in buffer cache/swap, but no backing object in an on-disk file
system.
> P.S.  Two hours ago Sheldon Hearn told me that Tor Egge and Semen
> Ustimenko worked together on the nullfs problem, but since the
> discussion was private I didn't know anything about it, and I
> probably stepped on their toes with my recent cleanup commit :(

The code which I have seen on this subject works using explicit
coherency synchronization between backing objects.  Unlike the
approach in your patches, there is a duplicate backing object.

It was my understanding that there was a cache coherency issue for
devices that may be mounted after having a null layer stacked on them;
specifically, the devices are vnodes, and have their own vm_object_t
associated with them, and thus their own pages.

From playing around with the patches Tor Egge had provided, I was able
to demonstrate coherency failures in a number of circumstances, and it
was not at all clear to me that msync() and fsync() would operate as
expected.  I was able to cause a number of supposedly "synchronized"
file systems to fail, one catastrophically (doing a shutdown of a
system with a nullfs mounted over /dev, with an FS named /A mounted on
a device visible through the nullfs), when it spammed my root
partition (not the /A partition!).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message