From: Borja Marcos <borjam@sarenet.es>
To: freebsd-fs@freebsd.org
Date: Thu, 3 Jan 2019 11:34:25 +0100
Subject: Interesting: ZFS scrub prefetch hurting sequential scrub performance?
Message-Id: <8ECF7513-9DFB-46EF-86BA-DB717D713792@sarenet.es>

Hi,

I have noticed that my scrubs have become painfully slow. I am wondering whether I have just hit some worst case, or whether there is some interaction between the ZFS sequential scrub and scrub prefetch. I don't recall seeing this behavior before the sequential scrub code was committed.

Did I hit some worst case, or should scrub prefetch be disabled with the new sequential scrub code?

# zpool status
  pool: pool
 state: ONLINE
  scan: scrub in progress since Sat Dec 29 03:56:02 2018
        133G scanned at 309K/s, 129G issued at 300K/s, 619G total
        0 repaired, 20.80% done, no estimated completion time

When this happened last month I rebooted the server and restarted the scrub, and everything went better.

The first graph shows the disk I/O bandwidth history for the last week.
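The graphs are built from periodically sampled disk statistics. If anyone wants to watch the same numbers live while a scrub runs, something along these lines should do (flags quoted from memory, so double check them):

# gstat -I 5s
# zpool iostat -v pool 10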
When the scrub started, disk I/O "busy percent" reached almost 100%. Curiously, the transfer rates looked rather healthy, at around 10 MBps of read activity.

At first I suspected a misbehaving disk slowing down the whole process with retries, but all the disks show a similar service time pattern. One is attached for reference.

Looking at the rest of the stats for hints of misbehavior, I saw arcstats_prefetch_metadata misses rising to about 2000 per second, with arcstats_l2_misses following the same pattern.

Could it be prefetch spending a lot of time writing to the L2ARC, only to have the data evicted due to misses?

I have tried disabling scrub prefetch (vfs.zfs.no_scrub_prefetch=1) and, voila, everything picked up speed. Now with zpool iostat I see bursts of 100+ MBps of read activity and proper scrub progress. Disk busy percent has gone down to around 50% and the cache stats have become much better. It turns out that most of the I/O activity was just pointless writes to the L2ARC.

Now, the hardware configuration.

The server has only 8 GB of memory, with a maximum configured ARC size of 4 GB. It has an LSI2008 card with IR firmware. I didn't bother to cross flash, but I am not using the RAID facilities anyway; it is just configured as a plain HBA.

mps0: port 0x9000-0x90ff mem 0xdfff0000-0xdfffffff,0xdff80000-0xdffbffff irq 17 at device 0.0 numa-domain 0 on pci4
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 185c

# zpool status
  pool: pool
 state: ONLINE
  scan: scrub in progress since Sat Dec 29 03:56:02 2018
        323G scanned at 742K/s, 274G issued at 632K/s, 619G total
        0 repaired, 44.32% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da12    ONLINE       0     0     0
            da13    ONLINE       0     0     0
            da14    ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da15    ONLINE       0     0     0
            da3     ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            da10    ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
        logs
          da11p2    ONLINE       0     0     0
        cache
          da11p3    ONLINE       0     0     0

errors: No known data errors

Yes, both the ZIL and the L2ARC are on the same disk (an SSD). I know it's not optimal, but I guess it's better than the high latency of conventional disks.

# camcontrol devlist
  at scbus6 target 11 lun 0 (pass0,da0)
  at scbus6 target 15 lun 0 (pass1,da1)
  at scbus6 target 17 lun 0 (pass2,da2)
  at scbus6 target 18 lun 0 (pass3,da3)
  at scbus6 target 20 lun 0 (pass4,da4)
  at scbus6 target 21 lun 0 (pass5,da5)
  at scbus6 target 22 lun 0 (pass6,da6)
  at scbus6 target 23 lun 0 (pass7,da7)
  at scbus6 target 24 lun 0 (pass8,da8)
  at scbus6 target 25 lun 0 (pass9,da9)
  at scbus6 target 26 lun 0 (pass10,da10)
  at scbus6 target 27 lun 0 (ses0,pass11)
  at scbus6 target 28 lun 0 (pass12,da11)
  at scbus6 target 29 lun 0 (pass13,da12)
  at scbus6 target 30 lun 0 (pass14,da13)
  at scbus6 target 32 lun 0 (pass15,da14)
  at scbus6 target 33 lun 0 (pass16,da15)
  at scbus13 target 0 lun 0 (pass17,cd0)

Hope the attachments reach the list; otherwise I will mail them to anyone interested.

Cheers,

Borja.
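P.S. For anyone who wants to try the same workaround: the sysctl can be flipped while the scrub is already running, and /etc/sysctl.conf makes it persistent across reboots. The ARC cap is the usual loader tunable. Roughly like this (syntax from memory, so double check before relying on it):

# sysctl vfs.zfs.no_scrub_prefetch=1
# echo 'vfs.zfs.no_scrub_prefetch=1' >> /etc/sysctl.conf

and the 4 GB ARC limit in /boot/loader.conf:

vfs.zfs.arc_max="4G"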