From owner-freebsd-hackers@FreeBSD.ORG Sat Jul 5 16:18:21 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 83AAFD1F; Sat, 5 Jul 2014 16:18:21 +0000 (UTC) Received: from SMTP02.CITRIX.COM (smtp02.citrix.com [66.165.176.63]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (Client CN "mail.citrix.com", Issuer "Cybertrust Public SureServer SV CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6DC402734; Sat, 5 Jul 2014 16:18:19 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.01,608,1400025600"; d="scan'208";a="150149047" Received: from accessns.citrite.net (HELO FTLPEX01CL02.citrite.net) ([10.9.154.239]) by FTLPIPO02.CITRIX.COM with ESMTP; 05 Jul 2014 16:18:11 +0000 Received: from [IPv6:::1] (10.80.16.47) by smtprelay.citrix.com (10.13.107.79) with Microsoft SMTP Server id 14.3.181.6; Sat, 5 Jul 2014 12:18:10 -0400 Message-ID: <53B8253F.5060403@citrix.com> Date: Sat, 5 Jul 2014 18:18:07 +0200 From: =?ISO-8859-1?Q?Roger_Pau_Monn=E9?= User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: Strange IO performance with UFS References: <53B691EA.3070108@citrix.com> <53B69C73.7090806@citrix.com> <20140705001938.54a3873dd698080d93d840e2@systemdatarecorder.org> <53B7C616.1000702@citrix.com> <20140705095831.GO93733@kib.kiev.ua> <53B7D4DF.40301@citrix.com> <20140705112448.GQ93733@kib.kiev.ua> In-Reply-To: <20140705112448.GQ93733@kib.kiev.ua> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-DLP: MIA1 Cc: freebsd-fs@freebsd.org, Stefan Parvu , FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Jul 2014 16:18:21 -0000 On 05/07/14 13:24, Konstantin Belousov wrote: > On Sat, Jul 05, 2014 at 12:35:11PM +0200, Roger Pau Monn? wrote: >> On 05/07/14 11:58, Konstantin Belousov wrote: >>> On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monn? >>> wrote: >>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3 >>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3 >>>> kernel`g_io_request+0x384 kernel`ufs_strategy+0x8a >>>> kernel`VOP_STRATEGY_APV+0xf5 kernel`bufstrategy+0x46 >>>> kernel`cluster_read+0x5e6 kernel`ffs_balloc_ufs2+0x1be2 >>>> kernel`ffs_write+0x310 kernel`VOP_WRITE_APV+0x166 >>>> kernel`vn_write+0x2eb kernel`vn_io_fault_doio+0x22 >>>> kernel`vn_io_fault1+0x78 kernel`vn_io_fault+0x173 >>>> kernel`dofilewrite+0x85 kernel`kern_writev+0x65 >>>> kernel`sys_write+0x63 >>>> >>>> This can also be seen by running iostat in parallel with the >>>> fio workload: >>>> >>>> device r/s w/s kr/s kw/s qlen svc_t %b ada0 >>>> 243.3 233.7 31053.3 29919.1 31 57.4 100 >>>> >>>> This clearly shows that even when I was doing a sequential >>>> write (the fio workload shown above), the disk was actually >>>> reading more data than writing it, which makes no sense, and >>>> all the reads come from the path trace shown above. >>> >>> The backtrace above means that the BA_CLRBUF was specified for >>> UFS_BALLOC(). In turns, this occurs when the write size is >>> less than the UFS block size. UFS has to read the block to >>> ensure that partial write does not corrupt the rest of the >>> buffer. >> >> Thanks for the clarification, that makes sense. I'm not opening >> the file with O_DIRECT, so shouldn't the write be cached in >> memory and flushed to disk when we have the full block? It's a >> sequential write, so the whole block is going to be rewritten >> very soon. >> >>> >>> You can get the block size for file with stat(2), st_blksize >>> field of the struct stat, or using statfs(2), field f_iosize of >>> struct statfs, or just looking at the dumpfs output for your >>> filesystem, the bsize value. For modern UFS typical value is >>> 32KB. >> >> Yes, block size is 32KB, checked with dumpfs. I've changed the >> block size in fio to 32k and then I get the expected results in >> iostat and fio: >> >> extended device statistics device r/s w/s kr/s kw/s >> qlen svc_t %b ada0 1.0 658.2 31.1 84245.1 58 108.4 >> 101 extended device statistics device r/s w/s kr/s >> kw/s qlen svc_t %b ada0 0.0 689.8 0.0 88291.4 54 >> 112.1 99 extended device statistics device r/s w/s kr/s >> kw/s qlen svc_t %b ada0 1.0 593.3 30.6 75936.9 80 >> 111.7 97 >> >> write: io=10240MB, bw=81704KB/s, iops=2553, runt=128339msec > > The current code in ffs_write() only avoids read before write when > write covers complete block. I think we can somewhat loose the > test to also avoid read when we are at EOF and write covers > completely the valid portion of the last block. > > This leaves the unwritten portion of the block with the garbage. I > believe that it is not harmful, since the only way for usermode to > access that garbage is through the mmap(2). The > vnode_generic_getpages() zeroes out parts of the page which are > after EOF. > > Try this, almost completely untested: Doesn't seem to help much, I'm still seeing the same issue. I'm sampling iostat every 1s, and here's the output form the start of the 4k block fio workload: extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 349.5 0.0 44612.3 48 88.0 52 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 655.4 0.0 83773.6 76 99.8 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 699.2 0.0 89493.1 59 109.4 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 628.1 0.0 80392.6 55 114.8 98 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 655.7 0.0 83799.6 79 98.4 102 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 701.4 0.0 89782.0 80 105.5 97 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 697.9 0.0 89331.6 78 112.0 103 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 714.1 0.0 91408.7 77 110.3 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 724.0 0.0 92675.0 67 112.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 700.4 0.0 89646.6 49 102.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 686.4 0.0 87857.2 78 110.0 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 702.0 0.0 89851.6 80 112.9 97 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 736.3 0.0 94246.4 67 110.1 103 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 624.6 0.0 79950.0 48 115.7 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 704.0 0.0 90118.4 77 106.1 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 714.6 0.0 91470.0 80 103.6 99 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 710.4 0.0 90926.1 80 111.1 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 655.3 0.0 83882.1 70 115.8 99 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 539.8 0.0 69094.5 80 121.2 101 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 711.6 0.0 91087.6 79 107.9 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 705.5 0.0 90304.5 81 111.3 97 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 727.3 0.0 93092.8 81 108.9 102 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 699.5 0.0 89296.4 55 109.0 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 689.0 0.0 88066.1 78 96.6 101 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 738.3 0.0 94496.1 56 109.1 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 615.4 0.0 78770.0 80 112.3 98 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 707.3 0.0 90529.8 86 105.7 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 704.3 0.0 89333.9 67 98.3 99 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 641.3 0.0 82081.5 80 112.3 101 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 701.6 0.0 89747.9 51 101.1 101 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 693.0 0.0 88702.1 80 103.6 97 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 632.7 0.0 80991.8 80 112.0 99 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 709.0 0.0 90748.2 80 107.5 102 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 715.0 0.0 91523.0 80 104.7 101 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 650.1 0.0 83210.5 56 110.9 101 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 682.2 0.0 87319.1 57 107.9 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 719.0 0.0 92032.6 80 103.6 99 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 624.3 0.0 79905.8 80 110.5 97 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 696.5 0.0 89151.7 80 109.9 103 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 664.2 0.0 85017.6 77 109.9 101 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 681.7 0.0 87254.0 80 107.5 98 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 668.5 0.0 85569.3 57 109.9 99 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 682.3 0.0 87329.0 53 110.8 102 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 0.0 643.9 0.0 82420.9 77 104.8 101 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 107.5 457.1 13701.7 58471.3 57 106.0 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 220.9 253.9 28281.4 32498.9 54 108.8 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 290.6 277.9 37198.8 35576.1 65 94.3 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 309.3 267.9 39590.7 34295.9 80 89.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 213.6 302.0 27212.7 38562.0 24 93.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 232.1 224.3 29712.5 28339.8 31 117.4 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 262.9 249.4 33654.0 31928.1 47 81.4 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 232.2 229.2 29721.6 29340.5 50 78.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 222.8 229.4 28430.0 29362.7 42 85.9 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 231.5 246.5 29628.8 31555.9 6 72.9 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 261.7 256.8 33498.7 32769.1 33 83.9 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 262.7 260.7 33628.3 33279.4 35 85.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 234.0 249.1 29867.9 31883.1 18 90.9 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 252.1 239.8 32263.0 30581.4 32 91.2 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 241.5 257.5 30917.0 32961.1 16 69.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 257.9 243.5 33011.9 31164.2 32 86.8 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 237.5 235.6 30311.2 30046.9 31 67.4 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 290.4 213.1 37172.8 27277.0 79 65.3 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 216.4 284.3 27703.7 36392.5 42 95.4 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 223.8 248.2 28645.1 31774.4 16 69.4 89 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 294.0 217.7 37544.4 27864.2 64 68.0 110 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 210.7 245.6 26966.6 31439.8 59 107.4 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 228.5 265.2 29246.6 33940.5 10 99.2 98 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 279.1 218.4 35727.2 27955.0 52 71.9 102 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 232.3 293.4 29607.9 37521.4 14 93.2 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 299.5 236.6 38340.2 30288.8 79 69.7 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 216.3 268.9 27686.3 34417.3 4 90.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 285.8 261.0 36585.3 33409.5 53 84.6 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 228.5 232.5 29059.7 29661.1 48 74.3 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 242.7 262.4 31060.0 33588.2 27 69.9 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 248.2 252.2 31766.1 32149.3 8 78.9 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 267.9 230.2 34288.6 29462.8 62 68.5 100 extended device statistics device r/s w/s kr/s kw/s qlen svc_t %b ada0 238.0 266.2 30375.8 34075.6 0 95.4 100 As can be seen from the log above, at first the workload runs fine, and the disk is only performing writes, but at some point (in this case around 40% of completion) it starts performing this read-before-write dance that completely screws up performance. Roger.