Date:      Mon, 5 Jun 2017 12:02:53 -0400
From:      "Kenneth D. Merry" <ken@FreeBSD.ORG>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        Tomoaki AOKI <junchoon@dec.sakura.ne.jp>, freebsd-current@freebsd.org
Subject:   Re: Time to increase MAXPHYS?
Message-ID:  <20170605160253.GA17376@mithlond.kdm.org>
In-Reply-To: <15e42fd1-055d-28f6-5e24-1448e16954a9@selasky.org>
References:  <0100015c6fc1167c-6e139920-60d9-4ce3-9f59-15520276aebb-000000@email.amazonses.com> <972dbd34-b5b3-c363-721e-c6e48806e2cd@elischer.org> <3719c729-9434-3121-cf52-393a4453d0b2@freebsd.org> <CANCZdfrkc1ERKnJr4JzHpePmU%2BrN5JOgAVePCShPHLDCAE19=w@mail.gmail.com> <CANCZdfpD3G8gR=C2_AekM6VeJ6dzKOnP820OOoF1M_eS0MfJ3g@mail.gmail.com> <20170604163948.eb5f74ce2a233b8f204ba671@dec.sakura.ne.jp> <15e42fd1-055d-28f6-5e24-1448e16954a9@selasky.org>

On Sun, Jun 04, 2017 at 09:52:36 +0200, Hans Petter Selasky wrote:
> On 06/04/17 09:39, Tomoaki AOKI wrote:
> > Hi
> > 
> > One possibility would be to make it MD build-time OPTIONS,
> > defaulting 1M on regular systems and 128k on smaller systems.
> > 
> > Of course I guess making it a tunable (or sysctl) would be best,
> > though.
> > 
> 
> Hi,
> 
> A tunable sysctl would be fine, but beware that commonly used firmware 
> out there produced in the millions might hang in a non-recoverable way 
> if you exceed their "internal limits". Conditionally lowering this 
> definition is fine, but increasing it needs to be carefully verified.
> 
> For example many USB devices are only tested with OS'es like Windows and 
> MacOS and if these have any kind of limitation on the SCSI transfer 
> sizes, it is very likely many devices out there do not support any 
> larger transfer sizes either.

I agree that I'd like to see a tunable.  We've been using a MAXPHYS value
slightly larger than 1MB at Spectra for years with no problems, but then
again, we're only running on newer hardware.

If we keep DFLTPHYS the same (64K) or come up with another constant that is
defined to 64K, the way the da(4) and sa(4) drivers handle things will keep most
older controllers working properly.  Here is what da(4) does:

	if (cpi.maxio == 0)
		softc->maxio = DFLTPHYS;        /* traditional default */
	else if (cpi.maxio > MAXPHYS)
		softc->maxio = MAXPHYS;         /* for safety */
	else
		softc->maxio = cpi.maxio;
	softc->disk->d_maxsize = softc->maxio;

cpi is the XPT_PATH_INQ CCB.  The maxio field was added later, so older,
unmodified drivers that haven't set the maxio field default to a 64K I/O
size.
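
As a rough standalone sketch of that clamp (outside the kernel), the
following shows the three cases side by side.  The EX_DFLTPHYS and
EX_MAXPHYS constants and the clamp_maxio() name are made up for
illustration; they are not kernel identifiers, and the 1MB value is only an
example setting, not the stock default.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real constants come from the kernel. */
#define EX_DFLTPHYS	(64u * 1024u)		/* traditional 64K default */
#define EX_MAXPHYS	(1024u * 1024u)		/* example: MAXPHYS bumped to 1MB */

/*
 * Hypothetical helper mirroring the da(4) logic quoted above.
 * cpi_maxio == 0 means the controller driver never filled in maxio.
 */
static uint32_t
clamp_maxio(uint32_t cpi_maxio, uint32_t dfltphys, uint32_t maxphys)
{
	assert(dfltphys <= maxphys);

	if (cpi_maxio == 0)
		return (dfltphys);	/* older, unmodified driver */
	if (cpi_maxio > maxphys)
		return (maxphys);	/* controller can do more than physio */
	return (cpi_maxio);		/* controller is the limiting factor */
}
```

With these example values, an unmodified driver (maxio of 0) ends up at
64K regardless of MAXPHYS, which is why raising MAXPHYS alone should not
push larger transfers at controllers whose drivers never set maxio.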

Drivers for some of the more common SAS and FC hardware set maxio to a
value that is correct for the hardware.  (e.g. mpt(4), mps(4), mpr(4),
and isp(4) all set it correctly.)

As Warner pointed out, the way ahci(4) works is that it sets its maximum
I/O size to MAXPHYS.  The question is, does all AHCI hardware support
arbitrary transfer sizes?  Is there a way to figure out what the hardware
supports?  If not, we should probably default it to 128K instead of
MAXPHYS.

Tape drives are another related issue.  Tape block sizes up to 1MB are
pretty common.  LTFS allows for blocksizes up to 1MB.  You can't currently
read a tape with a 1MB blocksize on FreeBSD without bumping MAXPHYS and
having a controller and tape drive that can handle the larger blocksize.

The sa(4) driver has the same logic as the da(4) driver for limiting
transfer sizes to the smaller of MAXPHYS and cpi.maxio.

The sa(4) driver gives the user some tools for figuring things out:

{sm4u-1-mgmt:/root:!:1} mt status -v
Drive: sa0: <IBM ULTRIUM-HH5 G9N1> Serial Number: 101500520A
---------------------------------
Mode      Density              Blocksize      bpi      Compression
Current:  0x58:LTO-5           variable       384607   enabled (0x1)
---------------------------------
Current Driver State: at rest.
---------------------------------
Partition:   0      Calc File Number:   0     Calc Record Number: 0
Residual:    0  Reported File Number:   0 Reported Record Number: 0
Flags: BOP
---------------------------------
Tape I/O parameters:
  Maximum I/O size allowed by driver and controller (maxio): 1048576 bytes
  Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
  Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
  Minimum block size supported by tape drive and media (min_blk): 1 bytes
  Block granularity supported by tape drive and media (blk_gran): 0 bytes
  Maximum possible I/O size (max_effective_iosize): 1048576 bytes

On this particular FreeBSD/head machine, I have MAXPHYS set to 1MB.  The
controller (isp(4)) supports ~5MB I/O sizes and the drive (IBM LTO-5)
supports ~8MB I/O, but MAXPHYS is set to 1MB, so that is the limit.
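
To make the arithmetic behind those numbers explicit, here is a small
sketch.  It assumes the effective I/O size is simply the smallest of
MAXPHYS, the controller limit, and the drive's maximum block size, which
matches the output above; it ignores block granularity and the
cpi.maxio == 0 case, and ex_effective_iosize() is a made-up name, not a
function in sa(4).

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two 32-bit sizes. */
static uint32_t
ex_min_u32(uint32_t a, uint32_t b)
{
	return (a < b ? a : b);
}

/*
 * Hypothetical helper: the largest single tape I/O, taken as the minimum
 * of the three limits reported by "mt status -v".
 */
static uint32_t
ex_effective_iosize(uint32_t maxphys, uint32_t cpi_maxio, uint32_t max_blk)
{
	uint32_t maxio;

	assert(maxphys > 0 && cpi_maxio > 0 && max_blk > 0);

	/* Driver/controller limit (the "maxio" line in the output). */
	maxio = ex_min_u32(maxphys, cpi_maxio);
	/* The drive and media may be more restrictive still. */
	return (ex_min_u32(maxio, max_blk));
}
```

Plugging in the values above (MAXPHYS 1048576, cpi_maxio 5197824, max_blk
8388608) gives 1048576.  With a 128K MAXPHYS the same hardware would be
limited to 131072 bytes, too small for a 1MB tape block, and with an 8MB
MAXPHYS the controller's 5197824 bytes would become the limit.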

I have considered changing the sa(4) driver to not use physio(9), and
instead use a custom allocator to allow reading and writing tapes with
blocksizes up to what the hardware (combination of tape drive and
controller) allows.  I haven't gotten around to it yet, because bumping
MAXPHYS works well enough in most cases.  It also has a nice side effect of
allowing unmapped I/O.

The pass(4) driver limits I/O sizes in the same way as the da(4) and sa(4)
drivers for CCBs sent via the blocking (CAMIOCOMMAND) ioctl, but for CCBs
sent via the asynchronous API, the only limit is the controller (cpi.maxio)
limit.  The latter is because the buffers for the asynchronous interface
are malloced.  If it were possible to send arbitrary sized, unmapped S/G
lists, then we could convert the asynchronous pass(4) interface to do
unmapped I/O.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG


