From owner-freebsd-fs@FreeBSD.ORG Tue Mar 9 13:42:40 2004
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 3422116A4DB
    for ; Tue, 9 Mar 2004 13:42:40 -0800 (PST)
Received: from mxsf11.cluster1.charter.net (mxsf11.cluster1.charter.net [209.225.28.211])
    by mx1.FreeBSD.org (Postfix) with ESMTP id A24FC43D31
    for ; Tue, 9 Mar 2004 13:42:39 -0800 (PST)
    (envelope-from ups@tree.com)
Received: from stups.com ([209.187.143.11])i29La7Dx091585;
    Tue, 9 Mar 2004 16:36:07 -0500 (EST)
    (envelope-from ups@tree.com)
Received: from tree.com (localhost [127.0.0.1])
    by stups.com (8.9.3/8.9.3) with ESMTP id QAA05388;
    Tue, 9 Mar 2004 16:36:07 -0500
Message-Id: <200403092136.QAA05388@stups.com>
X-Mailer: exmh version 2.0.2
To: rick@snowhite.cis.uoguelph.ca
In-Reply-To: Message from rick@snowhite.cis.uoguelph.ca <200403052250.RAA68715@snowhite.cis.uoguelph.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 09 Mar 2004 16:36:07 -0500
From: Stephan Uphoff
cc: freebsd-fs@freebsd.org
Subject: Re: newnfsd's stuck on "ufs"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 09 Mar 2004 21:42:40 -0000

Hi,

This sounds familiar.
Somewhere your NFSv4 server forgets to unlock a vnode.
Since ufs uses a recursive lock, this does not block the thread
that forgot the unlock.

I solved similar problems (netbsd, home made lock manager and file system)
by adding "No vnode lock held" assertions to the nfs server loop.

Let me know if I can help - I am really interested in seeing NFSv4 in FreeBSD.

    Stephan

> I'm working away at porting my NFSv4 server to FreeBSD5.2. It goes along
> ok for a while, but when doing several ops concurrently, most of the nfsd
> threads end up stuck sleeping on "ufs" as shown by the attached "ps axl".
> (The amusing part is that, once all but one thread is stuck, the last
> thread works fine. In other words, the "fix" is to only run one newnfsd:-)
>
> Anybody happen to know off the top of head, what I've screwed up?
>
> Thanks for any hints, rick
> --- ps axl of newnfsd ---
> 0  523  522  0   4  0  1192  736 nfsd  I   ??  0:05.34 newnfsd: ser
> 0  524  522  0  -4  0  1192  736 ufs   D   ??  0:00.02 newnfsd: ser
> 0  525  522  0  -4  0  1192  736 ufs   D   ??  0:00.02 newnfsd: ser
> 0  526  522  0  -4  0  1192  736 ufs   D   ??  0:00.02 newnfsd: ser
> 0  527  522  0  -4  0  1192  736 ufs   D   ??  0:00.02 newnfsd: ser
> 0  528  522  0  -4  0  1192  736 ufs   D   ??  0:00.02 newnfsd: ser
> 0  529  522  0  -4  0  1192  736 ufs   D   ??  0:00.02 newnfsd: ser
> 0  530  522  0  -4  0  1192  736 ufs   D   ??  0:00.02 newnfsd: ser
> 0  531  522  0  -4  0  1192  736 ufs   D   ??  0:00.02 newnfsd: ser
> 0  532  522  0   4  0  1192  736 nfsd  I   ??  0:00.01 newnfsd: ser
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 9 14:06:24 2004
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 67E1316A4CE
    for ; Tue, 9 Mar 2004 14:06:24 -0800 (PST)
Received: from citi.umich.edu (citi.umich.edu [141.211.133.111])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 451D643D2F
    for ; Tue, 9 Mar 2004 14:06:24 -0800 (PST)
    (envelope-from rees@citi.umich.edu)
Received: from citi.umich.edu (dumaguete.citi.umich.edu [141.211.133.51])
    by citi.umich.edu (Postfix) with ESMTP id 853182095F;
    Tue, 9 Mar 2004 17:06:23 -0500 (EST)
To: openafs-devel@openafs.org, freebsd-fs@freebsd.org
From: Jim Rees
Date: Tue, 09 Mar 2004 17:06:23 -0500
Sender: rees@citi.umich.edu
Message-Id: <20040309220623.853182095F@citi.umich.edu>
Subject: OpenAFS for FreeBSD 5.2 patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version:
2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Mar 2004 22:06:24 -0000 Here is a patch to OpenAFS that makes the FreeBSD 5.2 client work. It also breaks the 4.x client. I will commit this to OpenAFS if I can get it to work on 4.x. Garrett Wollman did all the work, I just cleaned it up a bit and fixed it so it would still build (but not necessarily run) on OpenBSD and 4.x. http://www.citi.umich.edu/u/rees/fbsd.diff From owner-freebsd-fs@FreeBSD.ORG Tue Mar 9 16:26:46 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8535416A4CE for ; Tue, 9 Mar 2004 16:26:46 -0800 (PST) Received: from smtp3.server.rpi.edu (smtp3.server.rpi.edu [128.113.2.3]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2F62143D1F for ; Tue, 9 Mar 2004 16:26:46 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp3.server.rpi.edu (8.12.8/8.12.8) with ESMTP id i2A0QjqG028739; Tue, 9 Mar 2004 19:26:45 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <20040309220623.853182095F@citi.umich.edu> References: <20040309220623.853182095F@citi.umich.edu> Date: Tue, 9 Mar 2004 19:26:44 -0500 To: Jim Rees , openafs-devel@openafs.org, freebsd-fs@freebsd.org From: Garance A Drosihn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: CanIt (www . canit . ca) Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 00:26:46 -0000 At 5:06 PM -0500 3/9/04, Jim Rees wrote: >Here is a patch to OpenAFS that makes the FreeBSD 5.2 client work. >It also breaks the 4.x client. 
I will commit this to OpenAFS if I
>can get it to work on 4.x.
>
>Garrett Wollman did all the work, I just cleaned it up a bit and
>fixed it so it would still build (but not necessarily run) on
>OpenBSD and 4.x.
>
>http://www.citi.umich.edu/u/rees/fbsd.diff

I downloaded the latest snapshot of the openafs repository, applied
this patch, and:

    export MAKE=/usr/local/bin/gmake
    ./configure --with-afs-sysname=i386_fbsd_50 --enable-transarc-paths
    gmake
    gmake install

I then added a few config files in /usr/vice/etc that I'm used to
seeing from installations on other platforms.  Is there an /etc/rc-type
file for what is supposed to happen next?

I copied /usr/vice/etc/libafs.ko to /boot/kernel, and typed:

    kldload libafs.ko
    /usr/local/sbin/afsd -stat 1200 -dcache 800 -daemons 3 -volumes 70

and the system panicked on me right after the AFS cache scan:

Versionstring:
FreeBSD 5.2-CURRENT #0: Sun Mar 7 00:53:31 EST 2004
    root@santropez.netel.rpi.edu:/usr/obj/usr/src/sys/Dual-Athlon2k
Panicstring:
mtx_lock() of spin mutex (null) @ /usr/src/sys/kern/vfs_subr.c:19

This is on a dual-CPU system.  Might there be problems with that, or
did I just screw up the AFS startup?

-- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-fs@FreeBSD.ORG Tue Mar 9 18:25:56 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A24B616A4CE for ; Tue, 9 Mar 2004 18:25:56 -0800 (PST) Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by mx1.FreeBSD.org (Postfix) with ESMTP id 83F8443D39 for ; Tue, 9 Mar 2004 18:25:56 -0800 (PST) (envelope-from rees@citi.umich.edu) Received: from citi.umich.edu (dsl093-001-248.det1.dsl.speakeasy.net [66.93.1.248]) by citi.umich.edu (Postfix) with ESMTP id 7097420EEB; Tue, 9 Mar 2004 21:25:55 -0500 (EST) To: Garance A Drosihn From: Jim Rees In-Reply-To: Garance A Drosihn, Tue, 09 Mar 2004 19:26:44 EST Date: Tue, 09 Mar 2004 21:25:55 -0500 Sender: rees@citi.umich.edu Message-Id: <20040310022555.7097420EEB@citi.umich.edu> cc: freebsd-fs@freebsd.org cc: openafs-devel@openafs.org Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 02:25:56 -0000 The very first thing you should do is rebuild with the correct sysname. You can't run a 5.0 client on 5.2. Also, I see your kernel was not built in /usr/src/sys. You might have to use the --with-bsd-kernel-headers flag to configure. However, I wouldn't be at all suprised if there were mp problems. It's never been tested. If you haven't already, try building a kernel with WITNESS and DDB. There is not yet an install procedure. I've been using the package builder from src/packaging/OpenBSD but it needs a few tweaks. I'll be working on this when I can. Thank you for the prompt feedback and please keep me informed. 
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 9 18:40:25 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0C15516A4CE for ; Tue, 9 Mar 2004 18:40:25 -0800 (PST) Received: from smtp3.server.rpi.edu (smtp3.server.rpi.edu [128.113.2.3]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8381443D2D for ; Tue, 9 Mar 2004 18:40:24 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp3.server.rpi.edu (8.12.8/8.12.8) with ESMTP id i2A2eNqG018010; Tue, 9 Mar 2004 21:40:24 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <20040310022555.7097420EEB@citi.umich.edu> References: <20040310022555.7097420EEB@citi.umich.edu> Date: Tue, 9 Mar 2004 21:40:22 -0500 To: Jim Rees From: Garance A Drosihn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: CanIt (www . canit . ca) cc: freebsd-fs@freebsd.org cc: openafs-devel@openafs.org Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 02:40:25 -0000 At 9:25 PM -0500 3/9/04, Jim Rees wrote: >The very first thing you should do is rebuild with the correct >sysname. You can't run a 5.0 client on 5.2. Oops. I went with i386_fbsd_50 because that was most-likely one listed in the README file. >Also, I see your kernel was not built in /usr/src/sys. You might >have to use the --with-bsd-kernel-headers flag to configure. I don't understand this comment. Everything for my kernel is under /usr/src/sys... Or do you mean I have to build openafs under /usr/src/sys? >Thank you for the prompt feedback and please keep me informed. I'm trying another build with the right sysname right now... 
(with the source in /usr/src/sys/openafs) -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-fs@FreeBSD.ORG Tue Mar 9 19:08:27 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7BAD316A4CE for ; Tue, 9 Mar 2004 19:08:27 -0800 (PST) Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by mx1.FreeBSD.org (Postfix) with ESMTP id 37C9743D31 for ; Tue, 9 Mar 2004 19:08:27 -0800 (PST) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: from khavrinen.lcs.mit.edu (localhost.nic.fr [IPv6:::1]) by khavrinen.lcs.mit.edu (8.12.9/8.12.9) with ESMTP id i2A38PDa002424 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK CN=khavrinen.lcs.mit.edu issuer=SSL+20Client+20CA); Tue, 9 Mar 2004 22:08:26 -0500 (EST) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.12.9/8.12.9/Submit) id i2A38PYS002421; Tue, 9 Mar 2004 22:08:25 -0500 (EST) (envelope-from wollman) Date: Tue, 9 Mar 2004 22:08:25 -0500 (EST) From: Garrett Wollman Message-Id: <200403100308.i2A38PYS002421@khavrinen.lcs.mit.edu> To: Garance A Drosihn In-Reply-To: References: <20040310022555.7097420EEB@citi.umich.edu> X-Spam-Score: -19.8 () IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES X-Scanned-By: MIMEDefang 2.37 cc: freebsd-fs@freebsd.org cc: openafs-devel@openafs.org Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 03:08:27 -0000 < said: > I don't understand this comment. Everything for my kernel is > under /usr/src/sys... 
Or do you mean I have to build openafs > under /usr/src/sys? Actually, the real issue (which the patches don't address at all) is that you absolutely must build the kernel parts of OpenAFS (or any kernel module, for that matter) against the correct kernel option headers for the kernel you plan to use. The Makefile simply assumes that you are using GENERIC (or are sufficiently similar to GENERIC as makes no difference). If you dig into the Makefile you will see an explicit reference to sys/${arch}/compile/GENERIC; you need to update that to point to wherever your kernel was compiled. -GAWollman From owner-freebsd-fs@FreeBSD.ORG Tue Mar 9 21:03:41 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7885816A4CE for ; Tue, 9 Mar 2004 21:03:41 -0800 (PST) Received: from smtp4.server.rpi.edu (smtp4.server.rpi.edu [128.113.2.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 15C2343D39 for ; Tue, 9 Mar 2004 21:03:41 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp4.server.rpi.edu (8.12.8/8.12.8) with ESMTP id i2A53eqg032466; Wed, 10 Mar 2004 00:03:40 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <200403100308.i2A38PYS002421@khavrinen.lcs.mit.edu> References: <20040310022555.7097420EEB@citi.umich.edu> <200403100308.i2A38PYS002421@khavrinen.lcs.mit.edu> Date: Wed, 10 Mar 2004 00:03:38 -0500 To: Garrett Wollman From: Garance A Drosihn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: CanIt (www . canit . 
ca) cc: freebsd-fs@freebsd.org cc: openafs-devel@openafs.org Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 05:03:41 -0000 At 10:08 PM -0500 3/9/04, Garrett Wollman wrote: >< said: > >> I don't understand this comment. Everything for my kernel is >> under /usr/src/sys... Or do you mean I have to build openafs >> under /usr/src/sys? > >Actually, the real issue (which the patches don't address at all) is >that you absolutely must build the kernel parts of OpenAFS (or any >kernel module, for that matter) against the correct kernel option >headers for the kernel you plan to use. The Makefile simply assumes >that you are using GENERIC (or are sufficiently similar to GENERIC >as makes no difference). If you dig into the Makefile you will see an >explicit reference to sys/${arch}/compile/GENERIC; you need to update >that to point to wherever your kernel was compiled. Well, if it was doing that, then who knows *what* set of options I was compiling it with the last time! Well, I had just gone through a buildworld/buildkernel cycle, and I have nothing under /usr/src/sys/i386/compile . It looks like my opt_global.h file is sitting at: /usr/obj/usr/src/sys/Dual-Athlon2k/opt_global.h And I have a 'cvs checkout' of the present openafs source sitting at /usr/cvs/openafs. So, this time I did: cd /usr/src/sys rm -Rf openafs cp -rp /usr/cvs/openafs . 
cd openafs export MAKE=/usr/local/bin/gmake patch -p0 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D1A3D16A4CE for ; Tue, 9 Mar 2004 21:15:03 -0800 (PST) Received: from smtp3.server.rpi.edu (smtp3.server.rpi.edu [128.113.2.3]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8936943D2D for ; Tue, 9 Mar 2004 21:15:03 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp3.server.rpi.edu (8.12.8/8.12.8) with ESMTP id i2A5F2qG008978; Wed, 10 Mar 2004 00:15:02 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: References: <20040310022555.7097420EEB@citi.umich.edu> <200403100308.i2A38PYS002421@khavrinen.lcs.mit.edu> Date: Wed, 10 Mar 2004 00:15:01 -0500 To: Garrett Wollman From: Garance A Drosihn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: CanIt (www . canit . ca) cc: freebsd-fs@freebsd.org cc: openafs-devel@openafs.org Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 05:15:03 -0000 At 12:03 AM -0500 3/10/04, Garance A Drosihn wrote: > ...lots of stuff... > kldload libafs.ko > /usr/local/sbin/afsd -stat 1200 -dcache 800 -daemons 3 -volumes 70 > >And it seemed to come up okay! I then did a: > cd /afs > ls -ltr >and it paniced. But I think that's because the CellServDB file >that I'm using does not match the root.cell for rpi.edu. I'll >do some more testing. The CellServDB I used was from my Mac, and only lists a few cells. I rebooted, and this time I started afsd with: /usr/local/sbin/afsd -stat 1200 -dcache 800 -daemons 3 \ -volumes 70 -dynroot -afsdb and it seems to be working correctly. 
I can 'klog' to my RPI userid, and then poke around all my private files in AFS @rpi. I was also able to 'umount' /afs correctly. It's encouraging to see it get this far! Thanks for the extra tips. -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-fs@FreeBSD.ORG Tue Mar 9 21:24:23 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EE06C16A4CE for ; Tue, 9 Mar 2004 21:24:23 -0800 (PST) Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by mx1.FreeBSD.org (Postfix) with ESMTP id A955F43D2D for ; Tue, 9 Mar 2004 21:24:23 -0800 (PST) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: from khavrinen.lcs.mit.edu (localhost.nic.fr [IPv6:::1]) by khavrinen.lcs.mit.edu (8.12.9/8.12.9) with ESMTP id i2A5OMDa002991 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK CN=khavrinen.lcs.mit.edu issuer=SSL+20Client+20CA); Wed, 10 Mar 2004 00:24:22 -0500 (EST) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.12.9/8.12.9/Submit) id i2A5OM9M002988; Wed, 10 Mar 2004 00:24:22 -0500 (EST) (envelope-from wollman) Date: Wed, 10 Mar 2004 00:24:22 -0500 (EST) From: Garrett Wollman Message-Id: <200403100524.i2A5OM9M002988@khavrinen.lcs.mit.edu> To: Garance A Drosihn In-Reply-To: References: <20040310022555.7097420EEB@citi.umich.edu> <200403100308.i2A38PYS002421@khavrinen.lcs.mit.edu> X-Spam-Score: -19.8 () IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES X-Scanned-By: MIMEDefang 2.37 cc: freebsd-fs@freebsd.org cc: openafs-devel@openafs.org Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 05:24:24 -0000 < said: > and it seems to be working correctly. I can 'klog' to my RPI > userid, and then poke around all my private files in AFS @rpi. > I was also able to 'umount' /afs correctly. It's encouraging > to see it get this far! Thanks for the extra tips. Once you get enough activity to start to recycle vnodes (and AFS vcache entries) it will probably fall over pretty fast. Adding WITNESS and DEBUG_VFS_LOCKS may make the bugs more obvious. -GAWollman From owner-freebsd-fs@FreeBSD.ORG Wed Mar 10 00:25:04 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A62AC16A4CE for ; Wed, 10 Mar 2004 00:25:04 -0800 (PST) Received: from habarber.pdc.kth.se (p38.kthopen.kth.se [130.237.5.38]) by mx1.FreeBSD.org (Postfix) with ESMTP id 245B843D3F for ; Wed, 10 Mar 2004 00:24:59 -0800 (PST) (envelope-from haba@pdc.kth.se) Received: from localhost (localhost [127.0.0.1]) by habarber.pdc.kth.se (8.12.10/8.12.10) with ESMTP id i2A8P0Pq024593; Wed, 10 Mar 2004 09:25:00 +0100 Date: Wed, 10 Mar 2004 09:24:59 +0100 (MET) Message-Id: <20040310.092459.122314167.haba@pdc.kth.se> To: wollman@khavrinen.lcs.mit.edu From: Harald Barth In-Reply-To: <200403100524.i2A5OM9M002988@khavrinen.lcs.mit.edu> References: <200403100524.i2A5OM9M002988@khavrinen.lcs.mit.edu> X-Mailer: Mew version 4.0.62 on Emacs 21.3.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: freebsd-fs@freebsd.org cc: openafs-devel@openafs.org Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 08:25:04 -0000 > Once you get enough activity to start to recycle vnodes (and 
AFS > vcache entries) it will probably fall over pretty fast. Adding > WITNESS and DEBUG_VFS_LOCKS may make the bugs more obvious. run-tests -fast -all from arla will probably find some of the bugs. Harald. From owner-freebsd-fs@FreeBSD.ORG Wed Mar 10 06:51:27 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 83D1216A4CE for ; Wed, 10 Mar 2004 06:51:27 -0800 (PST) Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by mx1.FreeBSD.org (Postfix) with ESMTP id 675F643D41 for ; Wed, 10 Mar 2004 06:51:27 -0800 (PST) (envelope-from rees@citi.umich.edu) Received: from citi.umich.edu (dsl093-001-248.det1.dsl.speakeasy.net [66.93.1.248]) by citi.umich.edu (Postfix) with ESMTP id 76DE520820; Wed, 10 Mar 2004 09:51:26 -0500 (EST) To: freebsd-fs@freebsd.org, openafs-devel@openafs.org From: Jim Rees In-Reply-To: Garrett Wollman, Tue, 09 Mar 2004 22:08:25 EST Date: Wed, 10 Mar 2004 09:51:27 -0500 Sender: rees@citi.umich.edu Message-Id: <20040310145126.76DE520820@citi.umich.edu> Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 14:51:27 -0000 I suppose in the short term we need yet another configure (or make) option to specify the location of the kernel build. But I really detest this solution. I would much rather be able to build a single module that would work with any kernel built from a given set of sources. 
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 10 15:21:58 2004 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 27BCB16A4CE for ; Wed, 10 Mar 2004 15:21:58 -0800 (PST) Received: from cliffclavin.cs.rpi.edu (cliffclavin.cs.rpi.edu [128.213.1.9]) by mx1.FreeBSD.org (Postfix) with ESMTP id AB56843D2F for ; Wed, 10 Mar 2004 15:21:57 -0800 (PST) (envelope-from crossd@cs.rpi.edu) Received: from 128.213.50.12 (kiki.cs.rpi.edu [128.213.50.12]) i2ANLqn7093313 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 10 Mar 2004 18:21:52 -0500 (EST) From: "David E. Cross" To: wronkm@cs.rpi.edu, crossd@cs.rpi.edu, moorthy@cs.rpi.edu, freebsd-fs@freebsd.org Content-Type: text/plain Message-Id: <1078960907.4345.20.camel@kiki.cs.rpi.edu> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.3 Date: 10 Mar 2004 18:21:52 -0500 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.37 Subject: JUFS update, and questions. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Mar 2004 23:21:58 -0000 Journaled UFS Technology Description As many are aware we have been keenly interested in Journaling for the UFS filesystem. This is intended to bring people up to date on design decisions that we have made, progress, and to solicit help for problems that we are facing. In the design of this system we consulted many different implementations of journaled filesystems, including ext3fs, reiser, XFS, and JFS. We also received an implementation of an incomplete but highly functional journaled UFS implementation. From these we have attempted to construct a "best-of-breed" solution. 
From our review we selected methods based on those used by JFS and XFS,
due to their relative simplicity and performance and their similarity to
the journaled UFS implementation that we have.  A brief description of
this is as follows:

There exists on disk, in the root of each filesystem, a file called
.journal.  Upon r/w mount this file is verified to have the following
characteristics:

1) mode: -r--------
2) user: 0, group: 0
3) flags: noschg
4) all blocks allocated (no sparse blocks, no frags) (1)
5) That the journal is empty, and that the first entry is a
   checkpoint.  (empty meaning NOT null, but there are no operations
   that need to be committed)

The system then saves the vnode/inode reference to this file and has a
hook in chflags that prevents modification to that vnode/inode during
operation.  The code prevents r/w mounting of the filesystem unless the
above conditions are met.

The format of the journal is roughly as follows:
Each block (FS blocksize) has this format.

Block {
    Header {
        Magic Number
        Version
        Transaction ID of this block
        Last transaction ID committed
        Length of Header
        # of transactions in block
        Options Field
        Checksum
    }
    Transaction {
        Opcode
        operand
    } (repeat for number in header)
}

In addition to this on-disk representation the system maintains an
in-core journal.  The in-core journal provides a buffer mechanism such
that each operation does not force a sync write.  The format of the
in-core journal is roughly as follows:

journal {
    current transaction ID
    Last Transaction committed ID
    first in-core entry
    last in-core entry
    first on-disk entry
    last on-disk entry
    mutex-pointer
    buffer
}

Every operation then has its information placed in the buffer.  When
the buffer becomes full it is flushed to disk; when the on-disk journal
is full it is read back and committed.  Periodically, during periods of
light disk IO, a heartbeat kernel process will force commits of all
buffered data, on disk and in core.

One of the opcodes defined is the NOP.
Its format is:

Opcode   Operand
0x0000   length(16bit), data(arbitrary)

Aside from debugging, this is used as a checkpoint function: after a
commit the journal will write a blank journal entry out stating that
this is transaction "N", and transaction "N-1" was the last committed.
This is also done on umount.

Journaling will be a mount option, and has so far been defined as
MNT_JOURNAL 0x00800000 (2); this flag will trigger the checks mentioned
at the beginning.

The kernel will _not_ replay the journal in the event of an unclean
mount; this will be handled by fsck, for at least the following
situation: handling moves between the journaled and non-journaled
options, due to either (lack of) specifying mount flags, or different
compiled options.  For example: Admin mounts /usr/home with
"-o journal", system crashes, system comes back up and /etc/fstab has
not been updated to include the "journal" flag, admin later realizes
this and remounts /usr/home with the appropriate flag.  If fsck did not
handle the journal syncing then the FS would be "repaired" by fsck on
the reboot after the crash, and the kernel would then attempt to
re-repair the data from the journal log and be referencing a
potentially MUCH older version of the filesystem database. (3)

fsck will also ensure that the journal file meets the requirements
listed; specifically, it will update the journal file itself to include
the checkpoint if needed.

fsck's operation in brief will be as follows:

1) Scan the journal file for the highest numbered transaction ID.
2) Read in the number of the last completed transaction from that block.
3) Rescan the journal for the lowest transaction ID after that one.
4) Begin replaying in order until the highest transaction ID is reached.
5) Write the checkpoint transaction and mark the filesystem clean.

Unmounting of the filesystem will include a full commit of the journal
(in-core and on-disk), and a write of the checkpoint opcode to the
first journal block.
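
As a rough illustration only, steps 1 to 3 of the fsck procedure above
could be sketched in C as below.  This is not the JUFS code: the struct
jblock_hdr layout and the replay_range() name are hypothetical, the
header is cut down to the two fields the scan needs, and the sketch
assumes transaction IDs never wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one journal block header. */
struct jblock_hdr {
    uint32_t tid;            /* transaction ID of this block */
    uint32_t last_committed; /* last transaction ID committed */
};

/*
 * Pick the range of transaction IDs that fsck would replay.
 * Returns 1 and fills *first and *last when some block is newer than
 * the last committed transaction, 0 otherwise.
 * Assumption: transaction IDs do not wrap around.
 */
static int
replay_range(const struct jblock_hdr *blk, size_t n,
    uint32_t *first, uint32_t *last)
{
    size_t i, hi = 0;
    uint32_t lo = 0;
    int found = 0;

    assert(blk != NULL && n > 0);

    /* Step 1: find the block with the highest transaction ID. */
    for (i = 1; i < n; i++)
        if (blk[i].tid > blk[hi].tid)
            hi = i;

    /*
     * Step 2: blk[hi].last_committed is the last completed transaction.
     * Step 3: find the lowest transaction ID after that one.
     */
    for (i = 0; i < n; i++) {
        if (blk[i].tid <= blk[hi].last_committed)
            continue;
        if (!found || blk[i].tid < lo) {
            lo = blk[i].tid;
            found = 1;
        }
    }
    if (!found)
        return 0;

    /* Step 4 would replay every block with *first <= tid <= *last. */
    *first = lo;
    *last = blk[hi].tid;
    return 1;
}
```

Steps 4 and 5 (the replay itself and writing the checkpoint) depend on
the opcode set and are left out here.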

Given the nature of what we are doing (and how), it's incompatible to
mount a filesystem both journaled and softdep-ed.  Our code will
prevent an admin/user from trying to do both at once with a deny
message; it will not just silently fail.

Issues that we are having now include how and when to increment the
transaction ID.  The transaction IDs are used to group operations
together such that related operations are completed together, and to
guarantee replay-safeness.  For example a rename(2) is a combination of
a link and an unlink.  So it works something like this:

TID=5
rename(2) call made
TID++
    link   (opcode tagged with TID 6)
    unlink (opcode tagged with TID 6)
TID=6

Later, when this is flushed to disk the system will make sure that all
opcodes with the same TID are written, and not split across blocks.
The TID in the header of the block will be the TID of the last opcode
in that block, so that the block then becomes a super-transaction of
all of them (potentially thousands of smaller transactions).

An unlink would be similar to this (assuming no processes holding the
file open, and a link count of 1):

TID=6
unlink(2) call made
TID++
    unlink
    inode update (link_cnt--)
    inode update (free)
    truncate
TID=7

Assuming a flush to disk now, the block would have the following:

Header { TID = 7 , count=6, lastTID=5 }
opcodes {
    link
    unlink          <--- these were the rename(2)
    unlink
    inode update
    inode update
    truncate
}

This block could then be safely replayed multiple times.  (Think of a
crash where this had been committed but the checkpoint not written;
fsck would then replay this since it could not know that it was already
done.)

These examples are relatively easy; what we are running into problems
with are things that bypass the VFS layer.  An example is mmapping of a
sparse file: a write access to the middle of the file could trigger a
large number of updates.  Inode changes, direct block allocations,
indirect block allocations, and fragment promotions.
In this situation, and in our model, how and where would we increment
the transaction ID?

Notes:
(1) I do not know how to actually do this within the kernel, pointers
    here would be appreciated.
(2) This currently conflicts with MNT_IGNORE.  Is this a problem?  What
    should we use?
(3) There is another problem here, files that were held open when the
    system crashed.  They could have a reference count of zero, but
    still have allocated data.  It seems that an fsck would still be
    required to walk the inode tables and put these files "somewhere",
    or just free the blocks they were using.  Can anyone think of a
    better way to do this?

--
David E. Cross

From owner-freebsd-fs@FreeBSD.ORG Wed Mar 10 15:54:50 2004
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 4AFE216A4CE
    for ; Wed, 10 Mar 2004 15:54:50 -0800 (PST)
Received: from sccrmhc13.comcast.net (sccrmhc13.comcast.net [204.127.202.64])
    by mx1.FreeBSD.org (Postfix) with ESMTP id EC7C443D1F
    for ; Wed, 10 Mar 2004 15:54:47 -0800 (PST)
    (envelope-from julian@elischer.org)
Received: from interjet.elischer.org ([24.7.73.28])
    by comcast.net (sccrmhc13) with ESMTP id <200403102354460160070kkde>;
    Wed, 10 Mar 2004 23:54:47 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
    by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id PAA81657;
    Wed, 10 Mar 2004 15:55:55 -0800 (PST)
Date: Wed, 10 Mar 2004 15:55:54 -0800 (PST)
From: Julian Elischer
To: "David E. Cross"
In-Reply-To: <1078960907.4345.20.camel@kiki.cs.rpi.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@freebsd.org
cc: moorthy@cs.rpi.edu
cc: wronkm@cs.rpi.edu
Subject: Re: JUFS update, and questions.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 10 Mar 2004 23:54:50 -0000

On 10 Mar 2004, David E. Cross wrote:

> Journaled UFS Technology Description
>
[... much good stuff deleted]

Not requests for features, just requests as to whether you have considered these..

Does it have the ability to keep the journal on separate media?

I have sometimes seen the ability to have a separate journal disk used to good effect. (Not a system filesystem, but a journalled database.)

Having a separate journal file/disk elsewhere can speed things up by reducing seeks (and has other resilience advantages). I have also seen double logging and remote logging... each of which of course has advantages and disadvantages..

Remote logging allows the log to be "replayed" in real time at the remote site, leading to an instantaneously correct remote backup/mirror of the local disk. (Of course it cannot be safely accessed except with special safety requirements, e.g. the ability to shoot an open vnode if the inode under it is rewritten.)

I notice also that you store pre/post stuff and wonder if this can be used in conjunction with soft-updates' need to sometimes roll back things on the disk?
julian

From owner-freebsd-fs@FreeBSD.ORG Wed Mar 10 16:16:14 2004
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4122516A4CE for ; Wed, 10 Mar 2004 16:16:14 -0800 (PST)
Received: from cliffclavin.cs.rpi.edu (cliffclavin.cs.rpi.edu [128.213.1.9]) by mx1.FreeBSD.org (Postfix) with ESMTP id DAFC543D1D for ; Wed, 10 Mar 2004 16:16:13 -0800 (PST) (envelope-from crossd@cs.rpi.edu)
Received: from 128.213.50.12 (kiki.cs.rpi.edu [128.213.50.12]) i2B0G9n7094426 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 10 Mar 2004 19:16:09 -0500 (EST)
From: "David E. Cross"
To: Julian Elischer
In-Reply-To:
References:
Content-Type: text/plain
Message-Id: <1078964168.4345.27.camel@kiki.cs.rpi.edu>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.3
Date: 10 Mar 2004 19:16:09 -0500
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.37
cc: freebsd-fs@freebsd.org
cc: moorthy@cs.rpi.edu
cc: wronkm@cs.rpi.edu
Subject: Re: JUFS update, and questions.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 11 Mar 2004 00:16:14 -0000

On Wed, 2004-03-10 at 18:55, Julian Elischer wrote:
> On 10 Mar 2004, David E. Cross wrote:
>
> > Journaled UFS Technology Description
> >
> [... much good stuff deleted]
>
> Not requests for features, just requests as to whether you have
> considered these..
>
> Does it have the ability to keep the journal on a separate media?
>
> I have sometimes seen the ability to have a separate journal disk used
> to good effect. (not a system filesystem, but a journalled database).
>
> Having a separate journal file/disk elsewhere can speed things up by
> reducing seeks (and other resilience advantages).
> I have also seen double logging and remote logging...
each of which of
> course has advantages and disadvantages..
>
> Remote logging allows the log to be "replayed" at real time in the
> remote site, leading to an instantaneously correct remote
> backup/mirror of the local disk. (of course it can not be safely
> accessed except with special safety requirements.. (e.g. ability
> to shoot an open vnode if the inode under it is rewritten)
>
>
> I notice also that you store pre/post stuff and wonder if this can be
> used in conjunction with soft-update's need to sometimes roll-back
> things on the disk?

It has been thought about, and certainly the design would make it trivial to add at a later point. The system does all of the work through vnodes and struct bufs, so they could be backed by "anything" in the future (well, within reason). But for right now it's beyond the scope of the project.

I am not familiar enough with softupdates and how it functions to even begin to comment on what is or is not possible. We've taken the assumption that they are in no way compatible with each other, and one _or_ the other will be in use, but not both. Once we have some beta-ish code that we can distribute, I am sure you can disable the checks (they will just be at mount time) and see what happens.

--
David E.
Cross

From owner-freebsd-fs@FreeBSD.ORG Wed Mar 10 19:27:35 2004
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 14CC916A4CE for ; Wed, 10 Mar 2004 19:27:35 -0800 (PST)
Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by mx1.FreeBSD.org (Postfix) with ESMTP id E74B143D1D for ; Wed, 10 Mar 2004 19:27:34 -0800 (PST) (envelope-from rees@citi.umich.edu)
Received: from citi.umich.edu (dsl093-001-248.det1.dsl.speakeasy.net [66.93.1.248]) by citi.umich.edu (Postfix) with ESMTP id 09652207DF; Wed, 10 Mar 2004 22:27:34 -0500 (EST)
To: openafs-devel@openafs.org, freebsd-fs@freebsd.org
From: Jim Rees
In-Reply-To: Garance A Drosihn, Tue, 09 Mar 2004 21:40:22 EST
Date: Wed, 10 Mar 2004 22:27:34 -0500
Sender: rees@citi.umich.edu
Message-Id: <20040311032734.09652207DF@citi.umich.edu>
Subject: Re: [OpenAFS-devel] OpenAFS for FreeBSD 5.2 patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 11 Mar 2004 03:27:35 -0000

The FreeBSD 5.x OpenAFS client is now in the OpenAFS source repository. I added a --with-bsd-kernel-build flag to configure, but I'm not happy about it. I'd still like to find a way to eliminate it.

I broke the OpenBSD client, but I'll fix that tomorrow.

From owner-freebsd-fs@FreeBSD.ORG Wed Mar 10 22:19:16 2004
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 950CE16A4CE for ; Wed, 10 Mar 2004 22:19:16 -0800 (PST)
Received: from mx.nsu.ru (mx.nsu.ru [212.192.164.5]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1F5BC43D46 for ; Wed, 10 Mar 2004 22:19:16 -0800 (PST) (envelope-from fjoe@iclub.nsu.ru)
Received: from iclub.nsu.ru ([193.124.215.97] ident=root) by mx.nsu.ru with esmtp (Exim 4.30) id 1B1Jaz-0004vY-OA; Thu, 11 Mar 2004 12:23:05 +0600
Received: from iclub.nsu.ru (fjoe@localhost [127.0.0.1]) by iclub.nsu.ru (8.12.11/8.12.11) with ESMTP id i2B6J9cb035310; Thu, 11 Mar 2004 12:19:09 +0600 (NS) (envelope-from fjoe@iclub.nsu.ru)
Received: (from fjoe@localhost) by iclub.nsu.ru (8.12.11/8.12.11/Submit) id i2B6J6CW035308; Thu, 11 Mar 2004 12:19:07 +0600 (NS) (envelope-from fjoe)
Date: Thu, 11 Mar 2004 12:19:06 +0600
From: Max Khon
To: "David E. Cross"
Message-ID: <20040311061906.GA35178@iclub.nsu.ru>
References: <1078964168.4345.27.camel@kiki.cs.rpi.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1078964168.4345.27.camel@kiki.cs.rpi.edu>
User-Agent: Mutt/1.4.1i
cc: freebsd-fs@freebsd.org
cc: moorthy@cs.rpi.edu
cc: wronkm@cs.rpi.edu
cc: Julian Elischer
Subject: Re: JUFS update, and questions.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 11 Mar 2004 06:19:16 -0000

Hello!

On Wed, Mar 10, 2004 at 07:16:09PM -0500, David E. Cross wrote:

> > > Journaled UFS Technology Description
> > >
> > [... much good stuff deleted]
> >
> > Not requests for features, just requests as to whether you have
> > considered these..
> >
> > Does it have the ability to keep the journal on a separate media?
> >
> > I have sometimes seen the ability to have a separate journal disk used
> > to good effect. (not a system filesystem, but a journalled database).
> >
> > Having a separate journal file/disk elsewhere can speed things up by
> > reducing seeks (and other resilience advantages).
> > I have also seen double logging and remote logging... each of which of
> > course has advantages and disadvantages..
> >
> > Remote logging allows the log to be "replayed" at real time in the
> > remote site, leading to an instantaneously correct remote
> > backup/mirror of the local disk. (of course it can not be safely
> > accessed except with special safety requirements.. (e.g. ability
> > to shoot an open vnode if the inode under it is rewritten)
> >
> >
> > I notice also that you store pre/post stuff and wonder if this can be
> > used in conjunction with soft-update's need to sometimes roll-back
> > things on the disk?
>
> It has been thought about, and certainly the design would make it
> trivial at a later point to add. The system does all of the work
> through vnodes and struct bufs, so they could be backed by "anything" in
> the future (well, within reason). But for right now it's beyond the
> scope of the project.

Do you plan to implement data journalling?

Regards,
/fjoe

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 12 13:05:16 2004
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C215916A4CE for ; Fri, 12 Mar 2004 13:05:16 -0800 (PST)
Received: from cliffclavin.cs.rpi.edu (cliffclavin.cs.rpi.edu [128.213.1.9]) by mx1.FreeBSD.org (Postfix) with ESMTP id 650F943D1F for ; Fri, 12 Mar 2004 13:05:16 -0800 (PST) (envelope-from crossd@cs.rpi.edu)
Received: from 128.213.50.12 (kiki.cs.rpi.edu [128.213.50.12]) i2CL5En7076216 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 12 Mar 2004 16:05:15 -0500 (EST)
From: "David E.
Cross"
To: Manuel Petit , freebsd-fs@freebsd.org
In-Reply-To: <7909D8D3-7322-11D8-B222-000A95CE04DA@freston.org>
References: <1078960907.4345.20.camel@kiki.cs.rpi.edu> <7909D8D3-7322-11D8-B222-000A95CE04DA@freston.org>
Content-Type: text/plain
Message-Id: <1079125513.4345.82.camel@kiki.cs.rpi.edu>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.3
Date: 12 Mar 2004 16:05:14 -0500
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.37
Subject: Re: JUFS update, and questions.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 12 Mar 2004 21:05:17 -0000

On Thu, 2004-03-11 at 01:08, Manuel Petit wrote:
> I assume that you are taking care that the blocks freed by truncate are
> not recycled until the transaction is written to the journal.

Yes, though the exact mechanism of this is still being worked out. The idea is to mark the blocks "dirty" with the TID of the operation that freed them; upon their reallocation, the journal is flushed to that TID before the call is allowed to return.

>
> Additionally regarding the volume not being rw-mountable if the journal
> is not empty... I like much better the way it was done in BFS: the
> filesystem replayed the journal on a rw-mount. Also you seem to avoid
> the topic of ro-mounting; for ro-mounting it is reasonable to avoid
> mounting if the journal is not empty (since the journal cannot be
> replayed)... or it could be done like the BeOS bootstrapper did: it
> built a block relocation table that redirected blocks to the contents
> of the journal (the bootstrapper did not attempt to replay the log...
> but just to walk the fs for finding the kernel)
>

If the FS is not allowed to be mounted-ro until the journal is replayed then "/" could never be journaled.
I also like the idea of allowing the admin to "safely" look at the filesystem before potentially damaging operations take place, even if only to do a dump/tar of the data someplace else. By "ignoring" the ro-mount I had intended to treat it the same way it is treated now by the kernel: to allow it. This makes it roughly equivalent to the RO mount of an async FS that went boom, only it's fully recoverable when the fsck runs. There may be issues here on the root fs, and a sync journal option may be needed. I can think of a few ways to accomplish this; the easiest would be to keep re-writing the same journal block, appending data onto it until it's full, then move on.

> >
> > (3) There is another problem here, files that were held open when the
> > system crashed. They could have a reference count of zero, but
> > still have allocated data. It seems that an fsck would still be
> > required to walk the inode tables and put these files "somewhere",
> > or just free the blocks they were using. Can anyone think of a
> > better way to do this?
>
> Yes. On unlink if the reference count is 0 relink it to a ghost
> directory that gets purged on mount. The file also gets purged when it
> is finally closed... it is a bit hacky since the file is linked to that
> directory while keeping its reference count to 0; but on close you know
> that if reference count is zero it is linked to the ghost directory and
> unlinking from it can be handled specially.

This was a potential idea that I had as well; the problem is the case of a full filesystem. Consider a filesystem that is 100% full (and what better time to delete files than at 100% full). To delete the file you then need to allocate a block to link the file to this phantom directory (even with a pre-allocated structure, consider the potential need to grow it). The idea then becomes to just pre-allocate something that is the maximum possible size... and then isn't that just equivalent to an inode table?
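The inode-table walk proposed in note (3), quoted above, can be sketched as a single pass that looks for inodes which are still allocated but have no remaining links. This is only an illustrative model in Python, not fsck code: the `Inode` fields and the `find_orphans` helper are hypothetical names, and real on-disk inodes carry far more state.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    number: int
    link_count: int  # directory references to this inode
    allocated: bool  # inode (and its blocks) still marked in use


def find_orphans(inode_table):
    """Return inode numbers that were unlinked while open before a crash."""
    return [ino.number for ino in inode_table
            if ino.allocated and ino.link_count == 0]


table = [
    Inode(number=2, link_count=1, allocated=True),   # ordinary live file
    Inode(number=3, link_count=0, allocated=True),   # unlinked but held open
    Inode(number=4, link_count=0, allocated=False),  # free inode
]
print(find_orphans(table))  # [3]
```

Each orphan found this way could then either have its blocks freed or be put "somewhere", as the note suggests.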
At this point I do not see just walking the inode table as much of a problem; it's _very_ quick to do, since inodes are just 256 bytes each (UFS2) and we are only looking for the case of refcount=0 and free!=0.

What do other people think?

--
David E. Cross

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 12 13:05:48 2004
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 90F7F16A4CE for ; Fri, 12 Mar 2004 13:05:48 -0800 (PST)
Received: from cliffclavin.cs.rpi.edu (cliffclavin.cs.rpi.edu [128.213.1.9]) by mx1.FreeBSD.org (Postfix) with ESMTP id 45EF443D31 for ; Fri, 12 Mar 2004 13:05:48 -0800 (PST) (envelope-from crossd@cs.rpi.edu)
Received: from 128.213.50.12 (kiki.cs.rpi.edu [128.213.50.12]) i2CL5ln7076235 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Fri, 12 Mar 2004 16:05:47 -0500 (EST)
From: "David E. Cross"
To: freebsd-fs@freebsd.org
Content-Type: multipart/mixed; boundary="=-fhhiQwkC/bVpGhwSB2AH"
Message-Id: <1079125546.4345.84.camel@kiki.cs.rpi.edu>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.3
Date: 12 Mar 2004 16:05:47 -0500
X-Scanned-By: MIMEDefang 2.37
Subject: [Fwd: Re: JUFS update, and questions.]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 12 Mar 2004 21:05:48 -0000

--=-fhhiQwkC/bVpGhwSB2AH
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

--=-fhhiQwkC/bVpGhwSB2AH
Content-Disposition: inline
Content-Description: Forwarded message - Re: JUFS update, and questions.
Content-Type: message/rfc822

Subject: Re: JUFS update, and questions.
From: "David E.
Cross"
To: Max Khon
In-Reply-To: <20040311061906.GA35178@iclub.nsu.ru>
References: <1078964168.4345.27.camel@kiki.cs.rpi.edu> <20040311061906.GA35178@iclub.nsu.ru>
Content-Type: text/plain
Message-Id: <1079122939.4345.41.camel@kiki.cs.rpi.edu>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.3
Date: 12 Mar 2004 15:22:19 -0500
Content-Transfer-Encoding: 7bit

On Thu, 2004-03-11 at 01:19, Max Khon wrote:
> Do you plan to implement data journalling?
>
> Regards,

Not in the initial release. There is an opcode reserved; however, there are a few issues that I am not able to address currently.

1) The journal is based on disk blocks (filesystem blocks, to be specific), so a write of a complete block of data won't fit in a single FS-block-sized journal block. It should be trivial to relax this limitation.

2) The other is the case of a write(v)(2) that is larger than the journal itself. In this case the only thing I can think of is to force-flush the entire journal and make the entire operation sync at that point.... but... this is something to be handled after the initial release (though perhaps relaxing the per-block restriction would be a good thing to do "now").

Note that data journaling is not a panacea for data integrity. The system does not know what writes need to be grouped together. (Consider a database operation that writes the new record and then the index: those are 2 separate writes, and there is no way for the kernel to know they need to be completed together atomically. The database application needs its own method of transactions to guarantee integrity; the only thing the journal could provide is the notion that writes occur in order.)

--
David E. Cross

--=-fhhiQwkC/bVpGhwSB2AH--