Date: Mon, 5 Jun 2017 12:02:53 -0400
From: "Kenneth D. Merry" <ken@FreeBSD.ORG>
To: Hans Petter Selasky <hps@selasky.org>
Cc: Tomoaki AOKI <junchoon@dec.sakura.ne.jp>, freebsd-current@freebsd.org
Subject: Re: Time to increase MAXPHYS?
Message-ID: <20170605160253.GA17376@mithlond.kdm.org>
In-Reply-To: <15e42fd1-055d-28f6-5e24-1448e16954a9@selasky.org>
References: <0100015c6fc1167c-6e139920-60d9-4ce3-9f59-15520276aebb-000000@email.amazonses.com>
 <972dbd34-b5b3-c363-721e-c6e48806e2cd@elischer.org>
 <3719c729-9434-3121-cf52-393a4453d0b2@freebsd.org>
 <CANCZdfrkc1ERKnJr4JzHpePmU+rN5JOgAVePCShPHLDCAE19=w@mail.gmail.com>
 <CANCZdfpD3G8gR=C2_AekM6VeJ6dzKOnP820OOoF1M_eS0MfJ3g@mail.gmail.com>
 <20170604163948.eb5f74ce2a233b8f204ba671@dec.sakura.ne.jp>
 <15e42fd1-055d-28f6-5e24-1448e16954a9@selasky.org>

On Sun, Jun 04, 2017 at 09:52:36 +0200, Hans Petter Selasky wrote:
> On 06/04/17 09:39, Tomoaki AOKI wrote:
> > Hi
> >
> > One possibility would be to make it MD build-time OPTIONS,
> > defaulting 1M on regular systems and 128k on smaller systems.
> >
> > Of course I guess making it a tunable (or sysctl) would be best,
> > though.
> >
>
> Hi,
>
> A tunable sysctl would be fine, but beware that commonly used firmware
> out there produced in the millions might hang in a non-recoverable way
> if you exceed their "internal limits". Conditionally lowering this
> definition is fine, but increasing it needs to be carefully verified.
>
> For example many USB devices are only tested with OS'es like Windows and
> MacOS and if these have any kind of limitation on the SCSI transfer
> sizes, it is very likely many devices out there do not support any
> larger transfer sizes either.

I agree that I'd like to see a tunable.

We've been using a MAXPHYS value slightly larger than 1MB at Spectra for
years with no problems, but then again, we're only running on newer
hardware.

If we keep DFLTPHYS the same (64K) or come up with another constant that
is defined to 64K, the way the da(4) and sa(4) drivers handle things will
keep most older controllers working properly.  Here is what da(4) does:

	if (cpi.maxio == 0)
		softc->maxio = DFLTPHYS;	/* traditional default */
	else if (cpi.maxio > MAXPHYS)
		softc->maxio = MAXPHYS;		/* for safety */
	else
		softc->maxio = cpi.maxio;
	softc->disk->d_maxsize = softc->maxio;

cpi is the XPT_PATH_INQ CCB.  The maxio field was added later, so older,
unmodified drivers that haven't set the maxio field default to a 64K I/O
size.

Drivers for some of the more common SAS and FC hardware set maxio to a
value that is correct for the hardware.  (e.g. mpt(4), mps(4), mpr(4),
and isp(4) all set it correctly.)

As Warner pointed out, the way ahci(4) works is that it sets its maximum
I/O size to MAXPHYS.  The question is, does all AHCI hardware support
arbitrary transfer sizes?
Is there a way to figure out what the hardware supports?  If not, we
should probably default it to 128K instead of MAXPHYS.

Tape drives are another related issue.  Tape block sizes up to 1MB are
pretty common.  LTFS allows for blocksizes up to 1MB.  You can't
currently read a tape with a 1MB blocksize on FreeBSD without bumping
MAXPHYS and having a controller and tape drive that can handle the larger
blocksize.  The sa(4) driver has the same logic as the da(4) driver for
limiting transfer sizes to the smaller of MAXPHYS and cpi.maxio.

The sa(4) driver gives the user some tools for figuring things out:

{sm4u-1-mgmt:/root:!:1} mt status -v
Drive: sa0: <IBM ULTRIUM-HH5 G9N1> Serial Number: 101500520A
---------------------------------
Mode      Density              Blocksize      bpi      Compression
Current:  0x58:LTO-5           variable       384607   enabled (0x1)
---------------------------------
Current Driver State: at rest.
---------------------------------
Partition:   0      Calc File Number:   0     Calc Record Number: 0
Residual:    0  Reported File Number:   0 Reported Record Number: 0
Flags: BOP
---------------------------------
Tape I/O parameters:
  Maximum I/O size allowed by driver and controller (maxio): 1048576 bytes
  Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
  Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
  Minimum block size supported by tape drive and media (min_blk): 1 bytes
  Block granularity supported by tape drive and media (blk_gran): 0 bytes
  Maximum possible I/O size (max_effective_iosize): 1048576 bytes

On this particular FreeBSD/head machine, I have MAXPHYS set to 1MB.  The
controller (isp(4)) supports ~5MB I/O sizes and the drive (IBM LTO-5)
supports ~8MB I/O, but MAXPHYS is set to 1MB, so that is the limit.

I have considered changing the sa(4) driver to not use physio(9), and
instead use a custom allocator to allow reading and writing tapes with
blocksizes up to what the hardware (combination of tape drive and
controller) allows.
I haven't gotten around to it yet, because bumping MAXPHYS works well
enough in most cases.  It also has a nice side effect of allowing
unmapped I/O.

The pass(4) driver limits I/O sizes in the same way as the da(4) and
sa(4) drivers for CCBs sent via the blocking (CAMIOCOMMAND) ioctl, but
for CCBs sent via the asynchronous API, the only limit is the controller
(cpi.maxio) limit.  The latter is because the buffers for the
asynchronous interface are malloced.  If it were possible to send
arbitrary sized, unmapped S/G lists, then we could convert the
asynchronous pass(4) interface to do unmapped I/O.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
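
For anyone who wants to experiment with the limits discussed above, the
maxio selection quoted from da(4) can be restated as a standalone
function.  This is only a minimal sketch, not the driver code: the names
clamp_maxio and SKETCH_DFLTPHYS and the maxphys parameter are
hypothetical, introduced here for illustration.  The kernel uses the
DFLTPHYS and MAXPHYS constants directly.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's DFLTPHYS (64K, per the message). */
#define SKETCH_DFLTPHYS ((uint32_t)64 * 1024)

/*
 * Pick the largest I/O size a peripheral driver should issue, following
 * the three cases in the da(4) fragment.  maxphys is passed in as a
 * parameter (an assumption of this sketch) rather than read from a
 * compile-time constant, so different values can be tried.
 */
static uint32_t
clamp_maxio(uint32_t cpi_maxio, uint32_t maxphys)
{
	if (cpi_maxio == 0)
		return (SKETCH_DFLTPHYS);	/* controller did not set maxio */
	if (cpi_maxio > maxphys)
		return (maxphys);		/* never exceed MAXPHYS */
	return (cpi_maxio);			/* controller limit is smaller */
}
```

With maxphys set to 1MB (1048576), a controller reporting 5197824 bytes
yields 1048576, which is consistent with the maxio and cpi_maxio values
in the mt status output above.  A controller that reports 0 falls back
to the 64K default.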