Date: Fri, 24 Jul 1998 11:58:12 -0400 (EDT)
From: Luoqi Chen
Message-Id: <199807241558.LAA21864@lor.watermarkgroup.com>
To: bde@zeta.org.au, green@zone.baldcom.net, jkh@time.cdrom.com, luoqi@watermarkgroup.com
Subject: Re: vn subsystem
Cc: bright@hotjobs.com, freebsd-current@FreeBSD.ORG, joelh@gnu.org

> >I took a look at this problem, I found there're some bugs in VMIO code
> >when dealing with buf at a non-page-aligned blkno, e.g. reading one page
> >size of data at block 1 from a block device, as Brian Feldman's core dump
> >shows, since the buf does not start at a page boundary, it should span
> >two pages, yet only one page is allocated in the current code, and
> >subsequent write to the 2nd page would result in a fault. I took a shot
> >at fixing this problem, resulted in the patch below. Would any knowledgeable
> >person please take a look at the patch? I've found no ill effect so far
>
> I don't think the bug can be fixed at this level. The size of a B_VMIO
> buffer is supposed to be a multiple of PAGE_SIZE. Smaller buffers are
> supposed to be malloced. msdosfs_mount() only gets as far as having
> misaligned blkno's because of incomplete cleanup from a previous (usually
> failed) mount.
> (IIRC, vp->v_object (where vp is the vnode for the block
> device) is not cleared even when all references to vp go away, and this
> somehow causes use of a stale block size.)
>
From my understanding of the code, a buf size that is a multiple of DEV_BSIZE
but not of PAGE_SIZE is supported through the valid and dirty bitmaps in the
vm_page structure. That's why VMIO for a block device is possible. BTW, VMIO
bufs cannot be malloced: there is a check in allocbuf() that panics when it
sees one.

msdosfs_mount() was actually the victim of a failed FFS mount. The FFS mount
enables VMIO on the block device, and the effect is permanent even when the
mount fails. MSDOSFS needs non-page-aligned block bufs: for one, the FAT
starts at block 1, and in fact it was reading the FAT blocks that killed
msdosfs_mount(). Normally MSDOSFS operates on a non-VMIO block device.

> I think the correct fix is to get rid of the stale v_object and improve
> the block size guessing (don't guess).
>
> I'm not sure what the deblocking stuff in allocbuf() is for. Is it only
> for NFS? FFS with its >= 4K block size never goes near any of the
> complications there.

I don't know what the initial intention for the deblocking stuff was. It may
well have been designed just for NFS, but it makes handling non-page-aligned
bufs possible, so why not take advantage of that :) And for NFS' sake, we
want to have these bugs fixed. The code that handles aligning the buffer
cache with its vm pages is well localized in a couple of functions in
vfs_bio.c, and I am fairly confident that I understand it well. I hope people
could try out the patch (of course, I will correct the overflow problems, I'm
completely clueless about all these different sized integers :).

-lq
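
To make the block-1 case above concrete, here is a minimal stand-alone sketch
of the arithmetic. It is not the patch and not kernel code. DEV_BSIZE = 512
and PAGE_SIZE = 4096 are assumed values, and the helpers pages_spanned() and
chunk_mask() are made-up names for illustration. The mask only mimics the idea
of one bit per DEV_BSIZE chunk and is not the actual vm_page representation.

```c
#include <assert.h>

#define DEV_BSIZE 512L   /* assumed device block size in bytes */
#define PAGE_SIZE 4096L  /* assumed page size in bytes */

/*
 * Hypothetical helper: number of pages touched by a buffer of `size`
 * bytes that starts at device block `blkno`.
 */
long
pages_spanned(long blkno, long size)
{
	long start, end;

	assert(blkno >= 0 && size > 0);
	start = blkno * DEV_BSIZE;	/* byte offset of the first byte */
	end = start + size;		/* byte offset one past the last byte */
	return ((end - 1) / PAGE_SIZE - start / PAGE_SIZE + 1);
}

/*
 * Hypothetical helper: one bit per DEV_BSIZE chunk of a single page,
 * set for every chunk overlapped by the byte range [off, off + len)
 * inside that page.
 */
unsigned
chunk_mask(long off, long len)
{
	unsigned mask = 0;
	long i, first, last;

	assert(off >= 0 && len > 0 && off + len <= PAGE_SIZE);
	first = off / DEV_BSIZE;
	last = (off + len - 1) / DEV_BSIZE;
	for (i = first; i <= last; i++)
		mask |= 1u << i;
	return (mask);
}
```

With these assumed sizes, pages_spanned(0, PAGE_SIZE) is 1 while
pages_spanned(1, PAGE_SIZE) is 2. For the buf at block 1, the first page
holds chunks 1 through 7 (chunk_mask(512, 3584) is 0xfe) and the second page
holds only chunk 0 (chunk_mask(0, 512) is 0x01).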