Date: Sat, 17 Apr 1999 11:44:52 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199904171844.LAA75452@apollo.backplane.com>
To: David Greenman
Cc: hackers@freebsd.org
Subject: Directories not VMIO cached at all!

I've been playing with my new large-memory box and, in particular,
looking at disk I/O.

When I scan enough directories (e.g. a megabyte worth of directories
on a 1 GB machine) and then scan them again, the data is re-fetched
from disk.

I was under the impression that large directories were B_VMIO'd, while
small ones were B_MALLOC'd in the buffer cache. But I've examined the
code, and I do not believe directories are being B_VMIO'd at all! This
means that the device blocks backing directories must fit within the
malloc-only portion of the buffer cache.

I think it would be worthwhile to VMIO directories as well as regular
files. In fact, I think it would be worthwhile to VMIO directories
whether they are large or small: small directories will wind up in the
namei cache, allowing the VMIO backing to be freed up. Plus it is
possible that a system might have thousands of small directories, and
hey, we might as well use available memory to cache the ones for which
the namei lookup misses (like news-related dirs).

Right now, the buffer cache appears to limit itself to 8 MBytes or so,
and the maximum malloc space limits itself to only 400K!
Rather absurdly small for a directory cache, I think, yet I also
believe that increasing the size of the buffer cache may be detrimental
due to the amount of I/O it can bind up. The buffer cache is really
designed to deal with I/O and has the side effect of caching block
mappings. It isn't really designed to cache things long term.

If we VMIO directories, the buffer cache becomes truly transparent
(i.e. everything related to a filesystem is VMIO'd and thus backed by
the VM cache).

If I understand the problem correctly, doing this should be as simple
as allowing VDIR vnode types to be object-backed, as per the patch
below. I would appreciate it if DG could look into this.

I've applied the patch below to my 1G machine and it seems to work: my
disk I/O has gone pretty much to zero. I think the performance benefits
are there. The only question is whether losing malloc-caching for small
directories is a win or a loss with regard to namei hits allowing
VMIO-backed small directories to be freed from the VM cache.
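
For anyone who wants to reproduce the rescan behaviour described above,
here is a minimal user-space sketch, not part of the patch. It walks a
directory tree and reports how long the walk took, so that two passes
over the same tree can be compared. The function names scan_tree and
timed_scan are hypothetical, and the sketch assumes a POSIX system.
Note that it also calls lstat() on every entry, so inode reads are
included in the timing, not just directory blocks.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: recursively read every directory under `path`.
 * Returns the number of entries seen (excluding "." and "..").
 * Unreadable or missing directories simply count as zero entries.
 */
long scan_tree(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	long count = 0;

	if (dir == NULL)
		return 0;

	while ((de = readdir(dir)) != NULL) {
		char child[4096];
		struct stat st;
		int len;

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		count++;

		/* Skip names that would not fit in the path buffer. */
		len = snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
		if (len < 0 || (size_t)len >= sizeof(child))
			continue;

		/* lstat() so that symlinks to directories are not followed. */
		if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
			count += scan_tree(child);
	}
	closedir(dir);
	return count;
}

/*
 * Hypothetical helper: run one scan and return the elapsed wall-clock
 * time in seconds; the entry count is stored in *entries.
 */
double timed_scan(const char *path, long *entries)
{
	struct timespec t0, t1;

	assert(path != NULL && entries != NULL);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	*entries = scan_tree(path);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling timed_scan() twice on the same large tree from a small main()
and comparing the two results gives a rough indication of whether the
second pass was served from memory. If the second pass takes about as
long as the first, the directory data was probably re-fetched from
disk.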
					-Matt

Index: kern/vfs_subr.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.189
diff -u -r1.189 vfs_subr.c
--- vfs_subr.c	1999/03/12 02:24:56	1.189
+++ vfs_subr.c	1999/04/17 18:35:56
@@ -2574,12 +2574,12 @@
 	vm_object_t object;
 	int error = 0;
 
-	if ((vp->v_type != VREG) && (vp->v_type != VBLK))
+	if ((vp->v_type != VREG) && (vp->v_type != VBLK) && (vp->v_type != VDIR))
 		return 0;
 
 retry:
 	if ((object = vp->v_object) == NULL) {
-		if (vp->v_type == VREG) {
+		if (vp->v_type == VREG || vp->v_type == VDIR) {
 			if ((error = VOP_GETATTR(vp, &vat, cred, p)) != 0)
 				goto retn;
 			object = vnode_pager_alloc(vp, vat.va_size, 0, 0);
Index: kern/vfs_lookup.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_lookup.c,v
retrieving revision 1.33
diff -u -r1.33 vfs_lookup.c
--- vfs_lookup.c	1999/01/28 00:57:47	1.33
+++ vfs_lookup.c	1999/04/17 18:35:56
@@ -163,7 +163,7 @@
 	else
 		cnp->cn_flags |= HASBUF;
 
-	if (ndp->ni_vp && ndp->ni_vp->v_type == VREG &&
+	if (ndp->ni_vp && (ndp->ni_vp->v_type == VREG || ndp->ni_vp->v_type == VDIR) &&
 		(cnp->cn_nameiop != DELETE) &&
 		((cnp->cn_flags & (NOOBJ|LOCKLEAF)) == LOCKLEAF))
 
@@ -687,7 +687,7 @@
 	if (!wantparent)
 		vrele(dvp);
 
-	if (dp->v_type == VREG &&
+	if ((dp->v_type == VREG || dp->v_type == VDIR) &&
 		((cnp->cn_flags & (NOOBJ|LOCKLEAF)) == LOCKLEAF))
 		vfs_object_create(dp, cnp->cn_proc, cnp->cn_cred);
 
Index: kern/vfs_syscalls.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.121
diff -u -r1.121 vfs_syscalls.c
--- vfs_syscalls.c	1999/03/23 14:26:40	1.121
+++ vfs_syscalls.c	1999/04/17 18:35:58
@@ -1012,7 +1012,7 @@
 		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, p);
 		fp->f_flag |= FHASLOCK;
 	}
-	if ((vp->v_type == VREG) && (vp->v_object == NULL))
+	if ((vp->v_type == VREG || vp->v_type == VDIR) && (vp->v_object == NULL))
 		vfs_object_create(vp, p, p->p_ucred);
 	VOP_UNLOCK(vp, 0, p);
 	p->p_retval[0] = indx;
Index: kern/vfs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_vnops.c,v
retrieving revision 1.65
diff -u -r1.65 vfs_vnops.c
--- vfs_vnops.c	1999/04/04 21:41:17	1.65
+++ vfs_vnops.c	1999/04/17 18:35:58
@@ -171,7 +171,7 @@
 	/*
 	 * Make sure that a VM object is created for VMIO support.
 	 */
-	if (vp->v_type == VREG) {
+	if (vp->v_type == VREG || vp->v_type == VDIR) {
 		if ((error = vfs_object_create(vp, p, cred)) != 0)
 			goto bad;
 	}
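
The effect of the type checks changed in the patch above can be
summarised with a small user-space model. This is only an illustrative
sketch under stated assumptions: the enum and the function
model_gets_vm_object are hypothetical stand-ins, not kernel code, and
they capture nothing beyond the v_type test at the top of
vfs_object_create().

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the vnode types named in the patch.
 * MODEL_VOTHER represents any other vnode type.
 */
enum model_vtype {
	MODEL_VREG,
	MODEL_VBLK,
	MODEL_VDIR,
	MODEL_VOTHER
};

/*
 * Models only the early-return test in vfs_object_create():
 * returns true if a vnode of type `t` is allowed to get a VM object.
 * `patched` selects between the original test and the patched one.
 */
bool model_gets_vm_object(enum model_vtype t, bool patched)
{
	if (t == MODEL_VREG || t == MODEL_VBLK)
		return true;
	/* The patch adds VDIR to the set of object-backed types. */
	if (patched && t == MODEL_VDIR)
		return true;
	return false;
}
```

In this model, the only input whose answer differs between the
unpatched and patched cases is the directory type; regular files and
block devices are eligible either way, and every other type stays
ineligible.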