Date: Fri, 10 Dec 2010 18:26:21 +0300
From: Lev Serebryakov <lev@serebryakov.spb.ru>
To: Alexander Motin
Cc: freebsd-hackers@freebsd.org
Subject: Re: Where are userland read/write requests larger than MAXPHYS split?

Hello, Alexander.
You wrote on 10 December 2010 at 17:45:20:

>> I'm digging through the GEOM/IO code and cannot find the place where
>> requests from userland to read more than MAXPHYS bytes are split into
>> several "struct bio"s.
>> It seems that these child requests are issued one by one, not in
>> parallel. Am I right? Why? It breaks parallelism when the underlying
>> GEOM can process several requests simultaneously.

> AFAIK, requests from userland are first broken into MAXPHYS-sized pieces
> by physio() before entering GEOM. Requests are indeed serialized there, I
> suppose to limit the KVA a thread can consume, but IMHO it could be
> reconsidered.
It is a good idea -- maybe with a GEOM flag to opt in? For example, the
stripe/raid3/raid5 GEOM classes could process a series of reads much
faster in parallel than sequentially, whenever userland wants to read
blocks bigger than the stripe size. And a small stripe size is a bad idea
because of the high fixed cost per transaction.

Right now, when an application reads files on RAID5 with big blocks (say,
read() is called with a 1MB buffer), the RAID5 geom sees 128KB read
requests, one by one. With a stripe size of 128KB it performs like a
single disk :(

I could add a pre-read for full-stripe reads, but that is not a generic
solution. Sending all the BIOs derived from one (logical/userland)
read/write request without waiting for each to complete would be the
generic solution (see the sketches below).

> One more split happens (when needed) in the geom_disk module, to honor
> the disk driver's maximal I/O size. There is no serialization there.
> Most ATA/SATA drivers in 8-STABLE support I/O up to at least
> min(512K, MAXPHYS) - 128K by default. Many SCSI drivers are still
> limited by DFLTPHYS - 64K.
Yep, that is what I saw in my investigation.

--
// Black Lion AKA Lev Serebryakov
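
P.S. To make the serialization point concrete, here is a from-memory,
heavily simplified sketch of the loop physio() runs in
sys/kern/kern_physio.c -- NOT the literal code. Page wiring, the
si_iosize_max clamp, multi-iovec handling and bio reinitialization are all
omitted, and physio_like_loop() is just a made-up name. The point it shows
is that each piece is at most MAXPHYS bytes and biowait() is called before
the next piece is issued, which is where the one-by-one behaviour comes
from:

/*
 * Heavily simplified sketch of physio()'s serial loop (illustrative only).
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/conf.h>
#include <sys/uio.h>

static int
physio_like_loop(struct cdev *dev, struct uio *uio, struct bio *bp)
{
	int error = 0;

	while (uio->uio_resid > 0 && error == 0) {
		/* One piece, never larger than MAXPHYS. */
		bp->bio_cmd = (uio->uio_rw == UIO_READ) ? BIO_READ : BIO_WRITE;
		bp->bio_dev = dev;
		bp->bio_offset = uio->uio_offset;
		bp->bio_length = MIN(uio->uio_resid, MAXPHYS);
		bp->bio_data = uio->uio_iov->iov_base; /* user buffer; mapping omitted */

		dev->si_devsw->d_strategy(bp);
		/* Wait for THIS piece before the next one goes out. */
		error = biowait(bp, "physio");

		uio->uio_offset += bp->bio_completed;
		uio->uio_resid -= bp->bio_completed;
	}
	return (error);
}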
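
And here is a sketch of what I mean by issuing the child BIOs of one
logical request without waiting in between. This is not code from any
existing class; it just follows the usual GEOM pattern
(g_clone_bio()/g_io_request()/g_std_done()), and my_stripesize and
my_consumer_for() are placeholders a real class would have to provide:

/*
 * Hypothetical GEOM start routine: split one incoming bio into
 * per-stripe children and issue them all before waiting.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <geom/geom.h>

static off_t my_stripesize = 128 * 1024;		/* placeholder */
static struct g_consumer *my_consumer_for(off_t offset);	/* placeholder */

static void
my_start(struct bio *bp)
{
	struct bio_queue_head queue;
	struct bio *cbp;
	off_t off, len, resid;
	char *data;

	bioq_init(&queue);
	off = bp->bio_offset;
	resid = bp->bio_length;
	data = bp->bio_data;

	/* Phase 1: carve the request into stripe-aligned children. */
	while (resid > 0) {
		len = MIN(resid, my_stripesize - (off % my_stripesize));
		cbp = g_clone_bio(bp);		/* child keeps a link to bp */
		if (cbp == NULL) {
			/* Nothing has been issued yet, so just bail out. */
			while ((cbp = bioq_takefirst(&queue)) != NULL)
				g_destroy_bio(cbp);
			g_io_deliver(bp, ENOMEM);
			return;
		}
		cbp->bio_offset = off;
		cbp->bio_length = len;
		cbp->bio_data = data;
		cbp->bio_done = g_std_done;	/* aggregate completions into bp */
		bioq_insert_tail(&queue, cbp);
		off += len;
		data += len;
		resid -= len;
	}

	/* Phase 2: fire all children without waiting in between. */
	while ((cbp = bioq_takefirst(&queue)) != NULL)
		g_io_request(cbp, my_consumer_for(cbp->bio_offset));

	/* g_std_done() delivers bp once every child has completed. */
}

The clone-everything-first, fire-everything-second shape means a request
is never failed while some of its children are already in flight, and the
consumers below get all the pieces at once, so they can overlap their
work.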