Date: Mon, 6 Jul 2009 00:11:51 +1000 (EST)
From: Bruce Evans
To: Alexander Motin
Cc: gary.jennejohn@freenet.de, freebsd-arch@FreeBSD.org
Subject: Re: DFLTPHYS vs MAXPHYS
Message-ID: <20090705223126.I42918@delplex.bde.org>
In-Reply-To: <4A50667F.7080608@FreeBSD.org>

On Sun, 5 Jul 2009, Alexander Motin wrote:

> Gary Jennejohn wrote:
>> On Sat, 04 Jul 2009 22:14:53 +0300
>> Alexander Motin wrote:
>>
>>> Can somebody explain me a difference between DFLTPHYS and MAXPHYS
>>> constants? As I understand, the last one is a maximal amount of memory,
>>> that can be mapped to the kernel, or passed to the hardware drivers.
>>> But why then DFLTPHYS is used in so many places and what does it mean?
>>
>> There's a pretty good comment on these in /sys/conf/NOTES.
>
> But it does not explains why.

DFLTPHYS is the default -- the size to be used when the correct size is
not known.  However, this is mostly broken:

- the correct size should always be known at a low level.  You have to
  know the maximum size for a device to know that this size is larger
  than the default, else using the default size won't work.  Also, you
  have to know that the default size is a multiple of the minimum size.
  Both of these are usually true accidentally, so things sort of work.

- the default size is defaulted inconsistently.  Geom hides the device
  maximum i/o size (d_maxsize, which is normally either 64K or DFLTPHYS,
  which happen to be the same) from the top level of devices.  It
  reblocks if necessary so that sizes up to si_iosize_max (which is
  always MAXPHYS) work, so it is difficult to see the low-level size, or
  to use an i/o size that is a multiple of the device maximum i/o size
  if the latter is not a divisor of MAXPHYS.  This means that
  hard-coding MAXPHYS would work best in most places above the driver
  level, but most places have a mess of buggy layering (mnt_iosize_max
  is supposed to default to DFLTPHYS and then be changed to
  si_iosize_max when the latter is known, but some file systems forget
  to do this).

>>> Isn't it a time to review their values for increasing? 64KB looks funny,
>>> comparing to modern memory sizes and data rates. It just increases
>>> interrupt rates, but I don't think it really need to be so small to
>>> improve interactivity now.

64K is large enough to bust modern L1 caches and old L2 caches.  Make
the size bigger to bust modern L2 caches too.  Interrupt rates don't
matter when you are transferring 64K per interrupt.

>> I wonder whether all drivers can correctly handle larger values for
>> DFLTPHYS.
Most can't, since their hardware can't.  They can fake it (ata used to),
but this is worse than pointless for most drivers, since geom already
reblocks for disk devices, and reblocking would be wrong for devices
like tapes.

> There are always will be drivers/devices with limitations. They should just
> be able to report that limitations to system. This is possible with GEOM, but
> it doesn't looks tuned well for all providers. There are many places, when
> DFLTPHYS used just with hope that it will work. IMHO if driver unable to
> adapt to any defined DFLTPHYS value, it should not use it, but instead should
> announce some specific value that it really supports.

cam scsi devices seem to be the only important ones that still hard-code
d_maxsize to DFLTPHYS.

Strangely, pre-cam scsi had the beginnings (or remnants) of more
sophisticated i/o size limiting.  FreeBSD-1 has an xxminphys() function
for every scsi device.  I think it was supposed to be possible to ask
any device for any i/o size, and minphys was used for reblocking at a
low level.  minphys was only implemented for scsi drivers and wasn't
part of physio() as in Net/2 (?).  For the aha1542 driver, minphys was:

% void
% ahaminphys(bp)
% 	struct buf *bp;
% {
% /*	aha seems to explode with 17 segs (64k may require 17 segs) */
% /*	on old boards so use a max of 16 segs if you have problems here */
% 	if (bp->b_bcount > ((AHA_NSEG - 1) * PAGESIZ)) {
% 		bp->b_bcount = ((AHA_NSEG - 1) * PAGESIZ);
% 	}
% }

FreeBSD-1 doesn't have DFLTPHYS, and barely uses MAXPHYS.  MAXPHYS was
64K.  I think MAXBSIZE = 64K limited most transfers.  However, physio()
uses a buffer of size 256K, larger than it does today!, so apparently
device drivers were responsible for lots of reblocking.  In the wd
driver, the reblocking consisted of doing one 512-byte block at a time
(I think it didn't even do multiple sectors per interrupt then).

Bruce
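The mnt_iosize_max layering described above (default to DFLTPHYS, then switch to si_iosize_max once the device is known) can be sketched as below.  This is a minimal illustration, not the real vfs code: struct mount_sketch, mount_init() and mount_attach_dev() are hypothetical names, and the DFLTPHYS/MAXPHYS values (64K and 128K) are assumed to match sys/param.h of this era.  The point it shows is that a file system which skips the second step silently keeps the DFLTPHYS default.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; check sys/param.h for the tree you are using. */
#define DFLTPHYS	(64 * 1024)
#define MAXPHYS		(128 * 1024)

/* Hypothetical stand-in for the one struct mount field that matters here. */
struct mount_sketch {
	size_t	mnt_iosize_max;
};

/* Step 1: generic mount code starts with the default. */
static void
mount_init(struct mount_sketch *mp)
{
	assert(mp != NULL);
	mp->mnt_iosize_max = DFLTPHYS;
}

/*
 * Step 2: once the file system knows its device, it should replace the
 * default with the device's si_iosize_max (0 meaning "unknown"), capped
 * at MAXPHYS.  A file system that forgets this step keeps DFLTPHYS.
 */
static void
mount_attach_dev(struct mount_sketch *mp, size_t si_iosize_max)
{
	assert(mp != NULL);
	if (si_iosize_max != 0)
		mp->mnt_iosize_max = si_iosize_max;
	if (mp->mnt_iosize_max > MAXPHYS)
		mp->mnt_iosize_max = MAXPHYS;
}
```

With a device reporting si_iosize_max = 128K, the mount ends up at 128K; with an unknown (0) size it stays at DFLTPHYS.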
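The reblocking that geom (and, earlier, minphys-style functions) does can be sketched as splitting one request into pieces no larger than the device maximum (d_maxsize).  This is a hedged illustration only: struct xfer, clamp_to_device() and reblock() are hypothetical names, and real geom splits struct bio requests with more bookkeeping than this.  It also shows the multiple/divisor problem: a 128K request against a 48K device maximum becomes pieces of 48K, 48K and a short 32K.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical piece descriptor; illustrative only, not the real
 * struct buf or struct bio layout.
 */
struct xfer {
	size_t	offset;		/* byte offset of this piece within the request */
	size_t	count;		/* bytes in this piece */
};

/* minphys-style clamp: shrink a piece to what the device can take. */
static size_t
clamp_to_device(size_t count, size_t dev_maxsize)
{
	return (count > dev_maxsize ? dev_maxsize : count);
}

/*
 * Split a request of `total` bytes into pieces of at most `dev_maxsize`
 * bytes, writing up to `max_pieces` descriptors into `out`.  Returns the
 * number of pieces, or 0 if `total` is 0 or `out` is too small.
 */
static size_t
reblock(size_t total, size_t dev_maxsize, struct xfer *out, size_t max_pieces)
{
	size_t done = 0, n = 0;

	assert(dev_maxsize > 0);
	while (done < total) {
		if (n == max_pieces)
			return (0);
		out[n].offset = done;
		out[n].count = clamp_to_device(total - done, dev_maxsize);
		done += out[n].count;
		n++;
	}
	return (n);
}
```

When the device maximum divides the request size (128K into 64K pieces), every piece is full-sized; when it does not, the last piece is short, which is the case the upper layers cannot easily see.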
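The "64k may require 17 segs" comment in ahaminphys() comes from page-crossing arithmetic: a buffer that does not start on a page boundary touches one more page than its length alone suggests.  A minimal sketch, assuming 4K pages and one DMA segment per page (physically contiguous pages that could share a segment are ignored); PAGE_SZ, pages_spanned() and max_xfer_for_segs() are illustrative names, not part of the aha driver.  The (nseg - 1) * page bound appears to be the same shape as the (AHA_NSEG - 1) * PAGESIZ clamp quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; PAGESIZ's value is not shown in the quoted code. */
#define PAGE_SZ	4096UL

/*
 * Number of pages (hence DMA segments, under the one-segment-per-page
 * assumption) touched by a buffer of `len` bytes starting at address
 * `addr`.
 */
static size_t
pages_spanned(unsigned long addr, size_t len)
{
	if (len == 0)
		return (0);
	return (((addr % PAGE_SZ) + len + PAGE_SZ - 1) / PAGE_SZ);
}

/*
 * Largest whole-page transfer size that fits in `nseg` segments at any
 * starting alignment: one segment is reserved for the page crossing.
 */
static size_t
max_xfer_for_segs(size_t nseg)
{
	assert(nseg > 0);
	return ((nseg - 1) * PAGE_SZ);
}
```

For example, 64K starting on a page boundary spans 16 pages, but the same 64K starting 512 bytes into a page spans 17, so a controller limited to 16 segments can only be guaranteed 60K.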