From owner-freebsd-current  Fri Jul 24 09:00:52 1998
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id JAA02919
          for freebsd-current-outgoing; Fri, 24 Jul 1998 09:00:52 -0700 (PDT)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from lor.watermarkgroup.com (lor.watermarkgroup.com [207.202.73.33])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id JAA02810
          for <freebsd-current@FreeBSD.ORG>; Fri, 24 Jul 1998 09:00:29 -0700 (PDT)
          (envelope-from luoqi@watermarkgroup.com)
Received: (from luoqi@localhost)
	by lor.watermarkgroup.com (8.8.8/8.8.8) id LAA21864;
	Fri, 24 Jul 1998 11:58:12 -0400 (EDT)
	(envelope-from luoqi)
Date: Fri, 24 Jul 1998 11:58:12 -0400 (EDT)
From: Luoqi Chen <luoqi@watermarkgroup.com>
Message-Id: <199807241558.LAA21864@lor.watermarkgroup.com>
To: bde@zeta.org.au, green@zone.baldcom.net, jkh@time.cdrom.com,
        luoqi@watermarkgroup.com
Subject: Re: vn subsystem
Cc: bright@hotjobs.com, freebsd-current@FreeBSD.ORG, joelh@gnu.org
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> >I took a look at this problem, I found there're some bugs in VMIO code
> >when dealing with buf at a non-page-aligned blkno, e.g. reading one page
> >size of data at block 1 from a block device, as Brian Feldman's core dump
> >shows, since the buf does not start at a page boundary, it should span
> >two pages, yet only one page is allocated in the current code, and
> >subsequent write to the 2nd page would result in a fault. I took a shot
> >at fixing this problem, resulted in the patch below. Would any knowledgeable
> >person please take a look at the patch? I've found no ill effect so far
> 
> I don't think the bug can be fixed at this level.  The size of a B_VMIO
> buffer is supposed to be a multiple of PAGE_SIZE.  Smaller buffers are
> supposed to be malloced.  msdosfs_mount() only gets as far as having
> misaligned blkno's because of incomplete cleanup from a previous (usually
> failed) mount.  (IIRC, vp->v_object (where vp is the vnode for the block
> device) is not cleared even when all references to vp go away, and this
> somehow causes use of a stale block size.)
> 
As I understand the code, buffer sizes that are a multiple of DEV_BSIZE but
not of PAGE_SIZE are supported through the valid and dirty bitmaps in the
vm_page structure. That's why VMIO for a block device is possible. BTW, VMIO
bufs cannot be malloced; there is a check in allocbuf() that panics when it
sees one.
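
To illustrate the bitmap idea, here is a minimal sketch, not the kernel's
actual code. It assumes a 4096-byte page and a 512-byte DEV_BSIZE, so a page
has eight chunks and one bit per chunk. The names SK_PAGE_SIZE, SK_DEV_BSIZE
and sk_chunk_bits() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define SK_PAGE_SIZE 4096u
#define SK_DEV_BSIZE 512u

/*
 * Hypothetical helper: return a bitmap with one bit per DEV_BSIZE chunk of
 * a page, with a bit set for every chunk touched by [off, off + len).
 */
static uint32_t
sk_chunk_bits(unsigned off, unsigned len)
{
	unsigned first, last;

	/* The range must lie within a single page. */
	assert(off + len <= SK_PAGE_SIZE);
	if (len == 0)
		return (0);
	first = off / SK_DEV_BSIZE;
	last = (off + len - 1) / SK_DEV_BSIZE;
	/* Set bits first..last inclusive. */
	return ((2u << last) - (1u << first));
}
```

With these assumed sizes, a page-sized buf starting at block 1 marks chunks
1 through 7 of its first page and only chunk 0 of its second page, which is
how a page can be partially valid or partially dirty.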

msdosfs_mount() was actually the victim of a failed FFS mount. An FFS mount
enables VMIO on the block device, and the effect is permanent even when
the mount fails. MSDOSFS needs non-page-aligned block bufs: for one thing,
the FAT starts at block 1, and in fact it was reading the FAT blocks that
killed msdosfs_mount(). Normally MSDOSFS operates on a non-VMIO block device.
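
The arithmetic behind the two-page case can be sketched as follows. This is
only an illustration under assumed sizes (4096-byte pages, 512-byte
DEV_BSIZE), and sk_pages_spanned() is a hypothetical helper, not a function
from vfs_bio.c.

```c
#include <stdint.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define SK_PAGE_SIZE 4096u
#define SK_DEV_BSIZE 512u

/*
 * Hypothetical helper: number of pages covered by a buffer of `size` bytes
 * that starts at device block `blkno`.  The byte offset is kept in 64 bits
 * so that large block numbers do not overflow.
 */
static unsigned
sk_pages_spanned(uint64_t blkno, unsigned size)
{
	uint64_t start, end;

	if (size == 0)
		return (0);
	start = blkno * SK_DEV_BSIZE;	/* first byte of the buffer */
	end = start + size;		/* one past the last byte */
	/* Index past the last page touched, minus index of the first page. */
	return ((unsigned)((end + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE -
	    start / SK_PAGE_SIZE));
}
```

Under these assumptions, sk_pages_spanned(1, 4096) is 2, while the same size
at block 0 or block 8 needs only one page.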

> I think the correct fix is to get rid of the stale v_object and improve
> the block size guessing (don't guess).
> 
> I'm not sure what the deblocking stuff in allocbuf() is for.  Is it only
> for NFS?  FFS with its >= 4K block size never goes near any of the
> complications there.

I don't know what the original intention of the deblocking stuff was. It
may well have been designed just for NFS, but it makes handling of
non-page-aligned bufs possible, so why not take advantage of that :) And for
NFS's sake, we want to have these bugs fixed. The portion of the code that
handles aligning the buffer cache with its vm pages is well localized in a
couple of functions in vfs_bio.c, and I am fairly confident that I understand
the code well. I hope people can try out the patch (of course, I will correct
the overflow problems; I'm completely clueless about all these different-sized
integers :).

-lq
