From: "Pokala, Ravi"
To: John Baldwin, freebsd-hackers@freebsd.org
Subject: Re: IO chunking
Date: Tue, 12 Aug 2014 20:59:57 +0000
In-Reply-To: <201408121257.18814.jhb@freebsd.org>

-----Original Message-----
From: John Baldwin
Date: Tuesday, August 12, 2014 at 9:57 AM
To: "freebsd-hackers@freebsd.org"
Cc: Ravi Pokala
Subject: Re: IO chunking

>On Tuesday, August 12, 2014 2:56:26 am
>Pokala, Ravi wrote:
>> Hi folks,
>>
>> I'm doing moderately-large block IO (16KB - 1MB) directly against drive
>> devices (i.e. /dev/adaX), and I see that `iostat -d adaX' reports a
>> transaction size of at most 128KB. I believe this is because transactions
>> are limited to at most MAXPHYS bytes (128KB), and requests larger than
>> that are broken into smaller chunks; is that correct? If so, where does
>> that chunking happen? In low-level GEOM code (geom_io.c, geom_dev.c)? In
>> CAM? In the drive device driver? In VFS?
>
>Note that you can increase MAXPHYS (though you will want to ensure your
>storage controller drivers correctly report their maximum supported size
>and don't just hardcode MAXPHYS).

Yeah. For a semi-related issue, we're planning on upping MAXPHYS to 256KB.
That still artificially limits the transaction size, but it's a bit better.

>The limit appears to be throughout the stack, though largely enforced
>at the top (e.g. in physio() before entering GEOM or the b_pages[] array
>in struct buf).

Looking...

sys/kern/kern_physio.c:

	/* Don't exceed drivers iosize limit */
	if (bp->b_bcount > dev->si_iosize_max)
		bp->b_bcount = dev->si_iosize_max;

Yeah, that's probably what it is. It looks like geom_dev.c::g_dev_taste()
sets si_iosize_max to MAXPHYS, and nothing in the ATA layer changes it.

>Certainly I've seen folks run with MAXPHYS of 512k, but check your
>drivers.

Yeah. We've been running w/ 256KB in another branch for a while, so it
looks like everything we use is okay.

Thanks,

Ravi

>--
>John Baldwin