From owner-freebsd-hackers Mon Jan 4 20:45:51 1999
Return-Path:
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id UAA26754
          for freebsd-hackers-outgoing; Mon, 4 Jan 1999 20:45:51 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA26747
          for ; Mon, 4 Jan 1999 20:45:50 -0800 (PST)
          (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
          by apollo.backplane.com (8.9.1/8.9.1) id UAA91309;
          Mon, 4 Jan 1999 20:45:23 -0800 (PST)
          (envelope-from dillon)
Date: Mon, 4 Jan 1999 20:45:23 -0800 (PST)
From: Matthew Dillon
Message-Id: <199901050445.UAA91309@apollo.backplane.com>
To: Terry Lambert
Cc: dg@root.com, hackers@FreeBSD.ORG
Subject: Re: vfs_bio / struct buf
References: <199901050428.VAA06606@usr05.primenet.com>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> :so couldn't really be replaced by something that has 512 byte granularity
:> :without losing some performance. Granted, applications that show this
:> :behavior are probably broken, but that's another issue.
:>
:>     Ah.  Hmmm.  I see the problem... the buf's need some sort of native block
:>     size and NFS doesn't really have a native block size.
:
:Not to contradict David, but I was under the impression that
:the reason for this code was not necessarily the read-before-write
:avoidance on small, unaligned regions, but was actually for the
:avoidance on aligned block sized or multiple of block size regions
:being written.  The theory being that if you wrote a fragment of an
:NFS buffer size and did this several times that you could just write
:it and not read at all.  Mostly for database stuff, if I recall
:correctly.
:
:There's actually a byte field that's unused as far as I can tell to
:allow page granularity down to PAGE_SIZE/8 to be bitmapped for
:validity within a given page, for similar reasons.

    It isn't unused!
    The valid and dirty bits are definitely used (and have a DEV_BSIZE
    granularity).  For lots of things.  For example, the MSDOS filesystem.

    The problem is that DEV_BSIZE is the finest granularity that a vm_page_t
    can have.  The validoff/validend/dirtyoff/dirtyend stuff was thrown into
    the bp because DEV_BSIZE'd granularity isn't good enough for NFS when you
    might be reading or writing just a few bytes.

    Read-before-write isn't the real problem, though the optimization
    certainly fixes that.  The real problem is when you have multiple
    machines doing an lseek()/write() on the same file.  The write()
    granularity must be correct or the machines will screw each other up
    even though they aren't writing to the same byte ranges (but are
    writing to the same block).

:I went looking at this code when I had an MSDOS FS that used
:1K blocks, but was not aligned on an even 1K boundary from the
:start of the device (odd cylinder size on the physical disk),
:which meant that every 4th 1K block spanned a page boundary
:(with obvious performance degradation during random access).
:
:
:                                        Terry Lambert
:                                        terry@lambert.org

    heh.  That should be fixed now with Luoqi's commits.  The bp system now
    understands DEV_BSIZE'd alignment properly in (hopefully) all cases.  As
    long as it is at least 512-byte aligned it should work.  I wouldn't worry
    about performance degradation there too much - it's all just mapping
    already-cached pages into bp's, but I haven't looked at it with a
    microscope so I can't say that for sure.

                                                -Matt

:---
:Any opinions in this posting are my own and not those of my present
:or previous employers.
:
:To Unsubscribe: send mail to majordomo@FreeBSD.org
:with "unsubscribe freebsd-hackers" in the body of the message
:

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet
                    Communications & God knows what else.
    (Please include original email in any response)
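
To make the two granularities discussed in this message concrete, here is a minimal sketch in C. It is not the FreeBSD implementation: every name with an sk_ prefix is hypothetical, a 4096-byte page and a 512-byte DEV_BSIZE are assumed, and the rule that a disjoint write must be flushed separately is a design choice of this sketch. The first helper shows that a per-page bitmask can only describe whole 512-byte chunks; the second tracks a dirty region by byte offsets, in the spirit of the dirtyoff/dirtyend fields mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define SK_DEV_BSIZE 512u   /* assumed sector size (DEV_BSIZE) */
#define SK_PAGE_SIZE 4096u  /* assumed page size: 8 sectors, 8 bits per page */

/*
 * Hypothetical helper: bitmask of the sectors touched by the byte range
 * [off, off + len) inside one page.  One bit per SK_DEV_BSIZE chunk, so
 * even a 1-byte range marks a whole 512-byte sector.
 */
static uint8_t
sk_page_bits(unsigned off, unsigned len)
{
    unsigned first, last, nbits;

    assert(off < SK_PAGE_SIZE && len <= SK_PAGE_SIZE - off);
    if (len == 0)
        return (0);
    first = off / SK_DEV_BSIZE;
    last = (off + len - 1) / SK_DEV_BSIZE;
    nbits = last - first + 1;
    return ((uint8_t)(((1u << nbits) - 1u) << first));
}

/* Hypothetical buffer header that tracks its dirty region in bytes. */
struct sk_buf {
    int dirtyoff;   /* first dirty byte */
    int dirtyend;   /* one past the last dirty byte; equal to dirtyoff if clean */
};

/*
 * Record a write of len bytes at off.  Returns 0 if the range was merged
 * into the dirty region, or -1 if it neither overlaps nor touches the
 * existing region.  In that case merging would mark bytes dirty that were
 * never written, so the caller is expected to write out the old region
 * first (an assumption of this sketch).
 */
static int
sk_mark_dirty(struct sk_buf *bp, int off, int len)
{
    int end = off + len;

    assert(off >= 0);
    if (len <= 0)
        return (0);
    if (bp->dirtyoff == bp->dirtyend) {     /* buffer was clean */
        bp->dirtyoff = off;
        bp->dirtyend = end;
        return (0);
    }
    if (off > bp->dirtyend || end < bp->dirtyoff)
        return (-1);                        /* would leave a hole */
    if (off < bp->dirtyoff)
        bp->dirtyoff = off;
    if (end > bp->dirtyend)
        bp->dirtyend = end;
    return (0);
}
```

With these definitions, sk_page_bits(100, 10) sets only bit 0, so a 10-byte write is recorded as the whole first sector, while sk_mark_dirty() on a clean buffer records exactly bytes 100 through 109. The second form is what lets two machines write different byte ranges of the same block without overwriting each other's data.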