From owner-freebsd-scsi@FreeBSD.ORG Mon Dec 29 09:22:21 2014
Date: Mon, 29 Dec 2014 14:52:12 +0530
From: Shivaram Upadhyayula
To: freebsd-scsi@freebsd.org
Subject: Tape block size greater than MAXPHYS
List-Id: SCSI subsystem

Hi,

It seems that currently any tape reads/writes greater than MAXPHYS
will fail. For example

    cpi->maxio = 256 * 1024; /* Controller max io size 256K */

root@quadstorvtl # dd if=/dev/zero of=/dev/sa0 bs=256k count=1
sa0.0: request size=262144 > si_iosize_max=131072; cannot split request
sa0.0: request size=262144 > MAXPHYS=131072; cannot split request
dd: /dev/sa0: File too large
1+0 records in
0+0 records out
0 bytes transferred in 0.000390 secs (0 bytes/sec)

The first limitation comes from sys/cam/scsi/scsi_sa.c:saregister

    /*
     * If maxio isn't set, we fall back to DFLTPHYS. Otherwise we take
     * the smaller of cpi.maxio or MAXPHYS.
     */
    if (cpi.maxio == 0)
        softc->maxio = DFLTPHYS;
    else if (cpi.maxio > MAXPHYS)
        softc->maxio = MAXPHYS;
    else
        softc->maxio = cpi.maxio;

softc limits maxio to MAXPHYS even if the controller supports a higher
maxio value. I tried removing the limitation, which then led me to
the actual reason for the limitation in
sys/kern/kern_physio.c:physio

    /*
     * If the driver does not want I/O to be split, that means that we
     * need to reject any requests that will not fit into one buffer.
     */
    if (dev->si_flags & SI_NOSPLIT &&
        (uio->uio_resid > dev->si_iosize_max || uio->uio_resid > MAXPHYS ||
        uio->uio_iovcnt > 1)) {

To maintain consistency of the block numbers, SI_NOSPLIT has to be set,
but then, to issue the entire request in a single bio, the request size
will be limited to MAXPHYS.

Would it be correct to assume that the only way to increase the tape
block size for writes/reads is to increase MAXPHYS and recompile the
kernel?
(As of now on FreeBSD 10.1)

Regards,
Shivaram
--
QUADStor Open Source Storage Virtualization : Thin Provisioning, Data
Deduplication, VAAI, High Availability, Virtual Tape Library
http://www.quadstor.com

From owner-freebsd-scsi@FreeBSD.ORG Tue Dec 30 06:58:24 2014
Date: Mon, 29 Dec 2014 23:31:29 -0700
From: "Kenneth D. Merry"
To: Shivaram Upadhyayula
Cc: freebsd-scsi@freebsd.org
Subject: Re: Tape block size greater than MAXPHYS
Message-ID: <20141230063129.GA77314@mithlond.kdm.org>

On Mon, Dec 29, 2014 at 14:52:12 +0530, Shivaram Upadhyayula wrote:
> Hi,
>
> It seems that currently any tape reads/writes greater than MAXPHYS
> will fail. For example
>
> cpi->maxio = 256 * 1024; /* Controller max io size 256K */
>
> root@quadstorvtl # dd if=/dev/zero of=/dev/sa0 bs=256k count=1
> sa0.0: request size=262144 > si_iosize_max=131072; cannot split request
> sa0.0: request size=262144 > MAXPHYS=131072; cannot split request
> dd: /dev/sa0: File too large
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 0.000390 secs (0 bytes/sec)
>
> The first limitation comes from sys/cam/scsi/scsi_sa.c:saregister
> /*
>  * If maxio isn't set, we fall back to DFLTPHYS. Otherwise we take
>  * the smaller of cpi.maxio or MAXPHYS.
>  */
> if (cpi.maxio == 0)
>     softc->maxio = DFLTPHYS;
> else if (cpi.maxio > MAXPHYS)
>     softc->maxio = MAXPHYS;
> else
>     softc->maxio = cpi.maxio;
>
> softc limits maxio to MAXPHYS even if the controller supports a higher
> maxio value.
> I tried removing the limitation, which then led me to
> the actual reason for the limitation in
> sys/kern/kern_physio.c:physio
>
> /*
>  * If the driver does not want I/O to be split, that means that we
>  * need to reject any requests that will not fit into one buffer.
>  */
> if (dev->si_flags & SI_NOSPLIT &&
>     (uio->uio_resid > dev->si_iosize_max || uio->uio_resid > MAXPHYS ||
>     uio->uio_iovcnt > 1)) {
>
> To maintain consistency of the block numbers, SI_NOSPLIT has to be set,
> but then, to issue the entire request in a single bio, the request size
> will be limited to MAXPHYS.
>
> Would it be correct to assume that the only way to increase the tape
> block size for writes/reads is to increase MAXPHYS and recompile the
> kernel? (As of now on FreeBSD 10.1)

Your analysis is correct.

The reason I added the SI_NOSPLIT code (and set the flag in the sa(4)
driver) is that the previous situation was bad from the standpoint of a
tape drive user. You could write to a tape with a large blocksize, but
that isn't what would actually make it onto the tape. You wouldn't know
exactly what size blocks were making it onto the tape; that would depend
on the size and alignment of the incoming buffers.

Now at least the application has a clear understanding of what is written
to tape.

One problem that was there before the SI_NOSPLIT changes and is still
present is that we can't by default read tapes with a large blocksize (e.g.
1MB). Increasing MAXPHYS will certainly fix it (assuming your controller
sets the maxio field in the path inquiry CCB to something sufficiently
large). I have considered adding a custom read/write routine to the sa(4)
driver that would essentially take the best available path given the
requested block size and the constraints imposed by the controller and
MAXPHYS.

The logic would be something like:
 - If the I/O is <= MAXPHYS (including alignment constraints) and the
   controller supports it, do unmapped I/O.
 - Otherwise, allocate buffers from a sa(4)-specific UMA zone and copy in
   and out. This would allow for doing I/O up to the controller's limit,
   without regard for MAXPHYS. On modern machines, this would also usually
   be faster than mapping the memory in and out of the kernel, because you
   avoid the extra TLB shootdowns.

Ideally we'll get a scheme in place to allow doing unmapped S/G lists at
some point. But we don't have that yet.

I have some code with logic similar to the above scenario for the pass(4)
driver asynchronous mode that has been in my queue to upstream for about
a year. I also have a very large set of tape driver improvements that I've
been working on (off and on) for about a year and a half. I haven't done
the custom read/write routine yet, but I may do it if I have some time.

By the way, the mps(4) and mpr(4) drivers can do I/O larger than 256KB.
That limit is somewhat arbitrary. Perhaps Steve (CCed) can take a look at
what we need to do to calculate the true limit (which would be based on
the page size of the machine and the maximum number of S/G lists the
controller can handle) so we can pass back a more accurate number.

The isp(4) driver I/O limit is accurate. If you try to use it with a
modern tape drive, you'll likely run into some FC-Tape related bugs. I
need to upstream those fixes too.
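
As a rough illustration of the limit calculation mentioned above for mps(4)
and mpr(4): assuming each S/G entry maps at most one page, and allowing for
a buffer that is not page-aligned and therefore touches one extra page, the
largest transfer that can always be mapped is (entries - 1) * page size.
The helper name sg_max_io and the entry counts in the note below are made
up for illustration. This is not code from either driver, and a real
controller may impose further constraints.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from mps(4) or mpr(4).
 *
 * Returns the largest transfer size that is guaranteed to fit in
 * max_sg_entries scatter/gather entries, assuming one page per entry.
 * One entry is held in reserve because a buffer that does not start
 * on a page boundary spans one more page than its length suggests.
 */
static size_t
sg_max_io(size_t max_sg_entries, size_t page_size)
{
	assert(page_size > 0);

	/* With fewer than two entries the formula yields nothing usable. */
	if (max_sg_entries < 2)
		return (0);

	return ((max_sg_entries - 1) * page_size);
}
```

Under these assumptions, with 4 KB pages, 65 entries would give 256 KB and
257 entries would give 1 MB.
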
Ken
--
Kenneth Merry
ken@FreeBSD.ORG

From owner-freebsd-scsi@FreeBSD.ORG Tue Dec 30 16:39:11 2014
In-Reply-To: <20141230063129.GA77314@mithlond.kdm.org>
Date: Tue, 30 Dec 2014 22:09:02 +0530
Subject: Re: Tape block size greater than MAXPHYS
From: Shivaram Upadhyayula
To: "Kenneth D. Merry"
Cc: freebsd-scsi@freebsd.org

On Tue, Dec 30, 2014 at 12:01 PM, Kenneth D. Merry wrote:
> On Mon, Dec 29, 2014 at 14:52:12 +0530, Shivaram Upadhyayula wrote:
<...>
> One problem that was there before the SI_NOSPLIT changes and is still
> present is that we can't by default read tapes with a large blocksize (e.g.
> 1MB). Increasing MAXPHYS will certainly fix it (assuming your controller
> sets the maxio field in the path inquiry CCB to something sufficiently
> large). I have considered adding a custom read/write routine to the sa(4)
> driver that would essentially take the best available path given the
> requested block size and the constraints imposed by the controller and
> MAXPHYS.
>
> The logic would be something like:
> - If the I/O is <= MAXPHYS (including alignment constraints) and the
>   controller supports it, do unmapped I/O.
> - Otherwise, allocate buffers from a sa(4)-specific UMA zone and copy in
>   and out. This would allow for doing I/O up to the controller's limit,
>   without regard for MAXPHYS. On modern machines, this would also usually
>   be faster than mapping the memory in and out of the kernel, because you
>   avoid the extra TLB shootdowns.
>
> Ideally we'll get a scheme in place to allow doing unmapped S/G lists at
> some point. But we don't have that yet.
>
> I have some code with logic similar to the above scenario for the pass(4)
> driver asynchronous mode that has been in my queue to upstream for about
> a year.

Where can the passthrough driver patch be accessed?

Thanks,
Shivaram
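
For readers following the thread, here is a minimal userland sketch of the
two pieces of logic discussed: the SI_NOSPLIT test from physio() that
rejects the 256 KB dd, and the path selection proposed for sa(4). All
names (SKETCH_MAXPHYS, nosplit_rejects, sa_choose_path, enum sa_io_path)
are hypothetical and none of this is kernel code. Alignment constraints are
ignored, and rejecting requests above the controller limit is an
assumption, since the thread only says I/O would be possible up to that
limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stock MAXPHYS value reported in the dd output (131072 bytes). */
#define SKETCH_MAXPHYS	((size_t)128 * 1024)

enum sa_io_path {
	SA_PATH_UNMAPPED,	/* request fits: unmapped I/O */
	SA_PATH_COPY,		/* bounce through driver-owned buffers */
	SA_PATH_REJECT		/* larger than the controller can take */
};

/*
 * Model of the SI_NOSPLIT condition quoted from physio(): a request
 * that cannot be split must fit in one buffer and use one iovec.
 */
static bool
nosplit_rejects(size_t resid, size_t si_iosize_max, int iovcnt)
{
	return (resid > si_iosize_max || resid > SKETCH_MAXPHYS ||
	    iovcnt > 1);
}

/*
 * Sketch of the proposed two-step selection. ctrl_maxio is the
 * effective controller limit; the caller is assumed to have applied
 * any fallback for an unset limit already.
 */
static enum sa_io_path
sa_choose_path(size_t len, size_t ctrl_maxio, bool ctrl_unmapped_ok)
{
	assert(ctrl_maxio > 0);

	if (len > ctrl_maxio)
		return (SA_PATH_REJECT);	/* assumption, see lead-in */
	if (len <= SKETCH_MAXPHYS && ctrl_unmapped_ok)
		return (SA_PATH_UNMAPPED);
	return (SA_PATH_COPY);
}
```

In this model the 256 KB write from the first message is rejected by
nosplit_rejects() when si_iosize_max is 128 KB, while sa_choose_path()
would route the same request to the copy path on a controller whose limit
is 256 KB.
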