Date: Tue, 23 Mar 2010 16:16:01 +0200
From: Andriy Gapon <avg@freebsd.org>
To: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Cc: Bruce Evans <bde@zeta.org.au>
Subject: on st_blksize value
Message-ID: <4BA8CD21.3000803@freebsd.org>

First, what I am proposing:

--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -790,11 +790,11 @@ vn_stat(vp, sb, active_cred, file_cred, td)
 	 *   to file"
 	 * Default to PAGE_SIZE after much discussion.
 	 * XXX: min(PAGE_SIZE, vp->v_bufobj.bo_bsize) may be more correct.
 	 */
 
-	sb->st_blksize = PAGE_SIZE;
+	sb->st_blksize = max(PAGE_SIZE, vap->va_blocksize);
 
 	sb->st_flags = vap->va_flags;
 	if (priv_check(td, PRIV_VFS_GENERATION))
 		sb->st_gen = 0;
 	else

Explanation:
1. IMO it is not nice that we totally ignore the va_blocksize value that can be
set by a filesystem. This takes away flexibility. That va_blocksize value might
really turn out to be optimal given the filesystem implementation.
2. As st_blksize is currently always PAGE_SIZE, it is playing safe to not use
any smaller value. In some cases this might not be optimal (which I personally
doubt), but at least nothing should get broken.

One practical benefit can be with ZFS: if a filesystem has recordsize >
PAGE_SIZE (e.g. the default 128K) and it has checksums or compression enabled,
then (over-)writing in blocks smaller than recordsize would require reading a
whole record first. And some applications do use st_blksize as a hint (just for
the record: some others use f_iosize instead, and yet others use a hardcoded
value). BTW, some torrent-like applications can serve as a good example of
applications that overwrite chunks of existing files.

Additionally, here's a little bit of history that explains the PAGE_SIZE ("much
discussion") comment in vn_stat. It seems that the comment may be misleading
nowadays. It was introduced in r89784 and at that time it applied only to the
case of non-VREG and non-vn_isdisk vnodes. Then, almost 3 years later, in
revision 136966 the code for VREG vnodes and vn_isdisk vnodes was dropped, the
XXX comment was introduced, and we ended up with the current state of matters.

BTW, I am not sure about the XXX comment either. Using bo_bsize may be a nice
shortcut, but it would also take away some flexibility.
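
To make the ZFS point concrete, here is a rough userland sketch in C. It is only an illustration under stated assumptions, not code from any real application: io_chunk_size() and partial_records() are hypothetical helpers, and the model simply assumes that any record which is only partially overwritten has to be read first, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: use st_blksize as the I/O chunk size hint,
 * falling back to a caller-supplied value if the hint is unusable.
 */
size_t
io_chunk_size(const struct stat *sb, size_t fallback)
{

	assert(fallback > 0);
	if (sb->st_blksize <= 0)
		return (fallback);
	return ((size_t)sb->st_blksize);
}

/*
 * Hypothetical model: count the records that an overwrite of
 * [off, off + len) covers only partially.  Under the assumption stated
 * above, each such record has to be read before it can be rewritten.
 */
unsigned
partial_records(off_t off, size_t len, size_t recsize)
{
	uintmax_t start, end;
	int head, tail;

	assert(off >= 0 && recsize > 0);
	if (len == 0)
		return (0);
	start = (uintmax_t)off;
	end = start + len;
	head = (start % recsize != 0);	/* write starts inside a record */
	tail = (end % recsize != 0);	/* write ends inside a record */
	if (start / recsize == (end - 1) / recsize)
		return (head || tail);	/* only one record is touched */
	return (head + tail);
}
```

With recsize = 131072, a 4096-byte write at offset 0 gives partial_records() == 1, while an aligned 131072-byte write gives 0. An application that sizes its writes with io_chunk_size() would avoid the extra reads in this model only if st_blksize reported the record size rather than PAGE_SIZE.
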
Filesystems can already set bo_bsize and va_blocksize to the same value, but
there could be special cases where they need not be the same.

Thanks a lot for opinions and suggestions!

P.S.
Yes, I have read the following interesting thread _completely_:
http://lists.freebsd.org/pipermail/freebsd-fs/2007-May/003155.html
And this one too:
http://freebsd.monkey.org/freebsd-fs/200810/msg00059.html
Unfortunately, the discussions didn't result in any action.

-- 
Andriy Gapon