From owner-freebsd-hackers@FreeBSD.ORG  Tue Feb 12 13:11:35 2008
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2517116A469;
	Tue, 12 Feb 2008 13:11:35 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail18.syd.optusnet.com.au (mail18.syd.optusnet.com.au
	[211.29.132.199])
	by mx1.freebsd.org (Postfix) with ESMTP id 9F37813C448;
	Tue, 12 Feb 2008 13:11:34 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au
	(c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213])
	by mail18.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m1CDBSUA008800
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 13 Feb 2008 00:11:32 +1100
Date: Wed, 13 Feb 2008 00:11:28 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
In-Reply-To: <4299.1202816854@critter.freebsd.dk>
Message-ID: <20080212234018.O92415@delplex.bde.org>
References: <4299.1202816854@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Tue, 12 Feb 2008 13:45:56 +0000
Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org,
	Pav Lucistnik <pav@freebsd.org>, Andriy Gapon <avg@icyb.net.ua>,
	Bruce Evans <bde@zeta.org.au>
Subject: Re: fs/udf: vm pages "overlap" while reading large dir 
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 12 Feb 2008 13:11:35 -0000

On Tue, 12 Feb 2008, Poul-Henning Kamp wrote:

> In message <47B181F2.2070808@icyb.net.ua>, Andriy Gapon writes:
>
>> 2.3. this code passes to bread blkno that is calculated as 4*sector,
>> where sector is a number of a physical 2048-byte sector. [**]
>> [**] - I think that this is a requirement of buffcache system, because
>> internally it performs many calculations that seem to assume that block
>> size is always 512.
>
> Yes, the buf-cache works in 512 bytes units throughout.

I missed the dbtob() conversions in vfs_bio.c when I replied previously
So blkno cannot be a cookie; it must be for a 512-block.  So how did
the cases where bsize != DEV_BSIZE in getblk() ever work?

>> 3.1. for a fresh buf getlbk would assign the following:
>> bsize = bo->bo_bsize;
>> offset = blkno * bsize;
>
> That's clearly wrong.

If units were always 512-blocks, then anything except bsize = DEV_BSIZE
would be clearly wrong.  Things aren't that simple (but probably should
be).  Even RELENG_3 has bsize = f_iosize (possibly != 512) for non-disks.
That seems to include nfs(client).  In fact, nfs_getcacheblk() does
weird scaling which seems to be mainly to compensate for for weird scaling
here.  It calls getblk() with a bn arg that seems to be f_iosize units.
Then at then end, for the VREG case, it sets bp->b_blkno to this bn
scaled to normal DEV_BSIZE units.  bp->b_blkno seems to have DEV_BSIZE
units for all uses of it in nfs.

> We need to assert that the blkno is aligned to the start of a sector
> and use the 512 byte units, so I guess it would be:
>
>    offset = dbtob(blkno);
>    KASSERT(!(offset & (bsize - 1)), ("suitable diagnostic"));

Barely worth checking.

The current bug has nothing to do with this.  The offset is just wrong
at this point, after using a scale factor that is inconsistent with the
units of blkno, for all (?) disk (and other?) file systems whose sector
size isn't 512.

Bruce