From: Daniel Kalchev <daniel@digsys.bg>
To: freebsd-fs@freebsd.org
Date: Mon, 20 Aug 2012 14:50:05 +0300
Message-ID: <5032246D.5060903@digsys.bg>
References: <5023D9B7.20001@digsys.bg>
Subject: Re: ZFS scrub CPU bound?

On 20.08.12 13:44, Ivan Voras wrote:
> On 09/08/2012 17:39, Daniel Kalchev wrote:
>
>> CPU:  0.4% user,  0.0% nice, 51.4% system,  0.8% interrupt, 47.3% idle
>> Mem: 2171M Active, 1541M Inact, 5407M Wired, 25M Cache, 416K Buf, 22G Free
>> Swap: 8192M Total, 8192M Free
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME     WCPU COMMAND
>>    11 root       32 155 ki31     0K   512K CPU31  31 190.2H 1818.46% idle
>>     0 root      268  -8    0     0K  4288K -       0 141:25 1261.33% kernel
>>     4 root        4  -8    -     0K    80K CPU30  30   9:13   95.17% zfskern
>>    13 root        3  -8    -     0K    48K -       4   5:37   42.97% geom
>>    12 root       66 -84    -     0K  1056K WAIT    0   6:19   30.86% intr
>> [...]
>> It seems that zfskern will top out at 100%. This is a 32-core system, and
>> as you see, scrub at 600MB/sec is able to eat 16 cores (from 2x 2.2 GHz
>
> If you hit "H", top will show threads separately, so you will see
> exactly which threads do the work (you might need to reduce your font
> so that all of them fit on the screen :) ).
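The same per-thread view can also be captured non-interactively; a sketch
using standard FreeBSD top(1) flags:

    # -S shows system (kernel) processes, -H expands them into
    # per-thread rows, -b prints plain text, -d 1 exits after one screen
    top -b -S -H -d 1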
I have large screen(s) :) Here is the top of top:

last pid:  4779;  load averages: 11.98,  4.15,  1.70   up 0+01:46:30  14:40:49
898 processes: 49 running, 785 sleeping, 64 waiting
CPU:  1.2% user,  0.0% nice, 52.1% system,  0.6% interrupt, 46.1% idle
Mem: 1942M Active, 1140M Inact, 3257M Wired, 4240K Cache, 400K Buf, 25G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    4 root      -8    -     0K    80K CPU4    4   1:35 100.00% zfskern{txg_thread_enter}
   11 root     155 ki31     0K   512K RUN    13 104:57  57.08% idle{idle: cpu13}
   11 root     155 ki31     0K   512K RUN    14 104:54  55.76% idle{idle: cpu14}
   11 root     155 ki31     0K   512K CPU8    8 104:55  54.59% idle{idle: cpu8}
   11 root     155 ki31     0K   512K RUN     9 105:01  54.20% idle{idle: cpu9}
   11 root     155 ki31     0K   512K CPU30  30 104:46  53.66% idle{idle: cpu30}
   11 root     155 ki31     0K   512K RUN    19 104:52  53.27% idle{idle: cpu19}
   11 root     155 ki31     0K   512K CPU15  15 104:45  52.98% idle{idle: cpu15}
   11 root     155 ki31     0K   512K CPU17  17 104:52  52.69% idle{idle: cpu17}
   11 root     155 ki31     0K   512K RUN    20 104:51  52.69% idle{idle: cpu20}
   11 root     155 ki31     0K   512K CPU12  12 104:54  52.29% idle{idle: cpu12}
   11 root     155 ki31     0K   512K CPU25  25 104:48  52.29% idle{idle: cpu25}
   11 root     155 ki31     0K   512K CPU24  24 104:51  52.10% idle{idle: cpu24}
   11 root     155 ki31     0K   512K CPU26  26 104:46  50.98% idle{idle: cpu26}
   11 root     155 ki31     0K   512K CPU11  11 104:53  50.68% idle{idle: cpu11}
   11 root     155 ki31     0K   512K RUN    31 104:48  50.49% idle{idle: cpu31}
   11 root     155 ki31     0K   512K CPU10  10 104:57  50.10% idle{idle: cpu10}
   11 root     155 ki31     0K   512K CPU1    1 104:53  50.00% idle{idle: cpu1}
   11 root     155 ki31     0K   512K RUN    29 104:48  49.76% idle{idle: cpu29}
    0 root     -16    0     0K  4288K -      11   0:42  49.66% kernel{zio_read_intr_20}
   11 root     155 ki31     0K   512K CPU3    3 104:57  49.56% idle{idle: cpu3}
    0 root     -16    0     0K  4288K -       6   0:42  49.56% kernel{zio_read_intr_15}
    0 root     -16    0     0K  4288K -      21   0:42  49.27% kernel{zio_read_intr_3}
   11 root     155 ki31     0K   512K CPU28  28 104:24  49.17% idle{idle: cpu28}
    0 root     -16    0     0K  4288K -      24   0:42  49.17% kernel{zio_read_intr_29}
    0 root     -16    0     0K  4288K CPU17  17   0:42  49.07% kernel{zio_read_intr_1}
   11 root     155 ki31     0K   512K RUN    27 104:46  48.97% idle{idle: cpu27}
    0 root     -16    0     0K  4288K CPU12  12   0:42  48.97% kernel{zio_read_intr_2}
    0 root     -16    0     0K  4288K -      25   0:42  48.78% kernel{zio_read_intr_7}
    0 root     -16    0     0K  4288K CPU14  14   0:42  48.78% kernel{zio_read_intr_11}
    0 root     -16    0     0K  4288K CPU19  19   0:42  48.78% kernel{zio_read_intr_28}
    0 root     -16    0     0K  4288K -      12   0:42  48.78% kernel{zio_read_intr_6}
    0 root     -16    0     0K  4288K -      29   0:42  48.78% kernel{zio_read_intr_18}
   11 root     155 ki31     0K   512K RUN     0 105:10  48.68% idle{idle: cpu0}
   11 root     155 ki31     0K   512K RUN     6 104:51  48.58% idle{idle: cpu6}
    0 root     -16    0     0K  4288K -      16   0:42  48.58% kernel{zio_read_intr_19}
    0 root     -16    0     0K  4288K CPU21  10   0:42  48.58% kernel{zio_read_intr_0}
    0 root     -16    0     0K  4288K CPU20  20   0:42  48.58% kernel{zio_read_intr_8}
    0 root     -16    0     0K  4288K CPU13  13   0:41  48.49% kernel{zio_read_intr_12}
    0 root     -16    0     0K  4288K -       5   0:42  48.39% kernel{zio_read_intr_5}
    0 root     -16    0     0K  4288K -      22   0:42  48.39% kernel{zio_read_intr_17}
    0 root     -16    0     0K  4288K CPU31  31   0:42  48.29% kernel{zio_read_intr_16}
    0 root     -16    0     0K  4288K -      17   0:42  48.29% kernel{zio_read_intr_4}
    0 root     -16    0     0K  4288K CPU0    0   0:42  48.29% kernel{zio_read_intr_13}
   11 root     155 ki31     0K   512K CPU22  22 104:41  48.19% idle{idle: cpu22}
    0 root     -16    0     0K  4288K -      25   0:42  48.19% kernel{zio_read_intr_31}
    0 root     -16    0     0K  4288K -      26   0:42  48.19% kernel{zio_read_intr_25}
    0 root     -16    0     0K  4288K CPU15  29   0:42  48.19% kernel{zio_read_intr_22}
    0 root     -16    0     0K  4288K -      10   0:42  48.10% kernel{zio_read_intr_14}
    0 root     -16    0     0K  4288K -       8   0:42  48.10% kernel{zio_read_intr_21}
    0 root     -16    0     0K  4288K -      15   0:42  48.10% kernel{zio_read_intr_26}
    0 root     -16    0     0K  4288K -      16   0:42  48.00% kernel{zio_read_intr_9}
    0 root     -16    0     0K  4288K -      18   0:42  48.00% kernel{zio_read_intr_30}
    0 root     -16    0     0K  4288K CPU9    9   0:41  48.00% kernel{zio_read_intr_27}
   11 root     155 ki31     0K   512K RUN    21 104:48  47.66% idle{idle: cpu21}
    0 root     -16    0     0K  4288K -      29   0:42  47.66% kernel{zio_read_intr_23}
    0 root     -16    0     0K  4288K -      29   0:42  47.66% kernel{zio_read_intr_10}
    0 root     -16    0     0K  4288K CPU23  23   0:42  47.56% kernel{zio_read_intr_24}
   11 root     155 ki31     0K   512K CPU18  18 104:52  47.36% idle{idle: cpu18}
   11 root     155 ki31     0K   512K CPU5    5 104:53  47.27% idle{idle: cpu5}
   11 root     155 ki31     0K   512K CPU16  16 104:50  47.17% idle{idle: cpu16}
   11 root     155 ki31     0K   512K RUN    23 104:44  47.17% idle{idle: cpu23}
   11 root     155 ki31     0K   512K CPU7    7 104:56  45.46% idle{idle: cpu7}
   11 root     155 ki31     0K   512K RUN     4 104:50  45.36% idle{idle: cpu4}
   11 root     155 ki31     0K   512K CPU2    2 104:55  42.29% idle{idle: cpu2}
   13 root      -8    -     0K    48K -       7   0:42  18.65% geom{g_down}
   12 root     -68    -     0K  1056K CPU30  30   0:32  16.26% intr{swi2: cambio}
   13 root      -8    -     0K    48K CPU28  28   0:32  14.99% geom{g_up}
   12 root     -88    -     0K  1056K CPU12  12   0:21   9.28% intr{irq256: mps0}
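By the way, all the zio_read_intr_* workers above are threads of the
kernel process (pid 0), so their count can also be read off without top;
a sketch, assuming the taskq thread names match this output:

    # count the ZIO read-interrupt taskq threads inside pid 0 ("kernel")
    procstat -t 0 | grep -c zio_read_intr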
Is it possible that the large number of IOPS is causing this bottleneck?
Too many zfskern state changes or something like that? gstat says:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| cd0
    6   9030   9024 149343    0.5      3    128    0.1   90.4| da0
    5   8740   8734 148807    0.5      3    128    0.1   89.9| da1
    5   8935   8929 149328    0.5      3    128    0.1   90.3| da2
    6   8649   8643 148753    0.5      3    128    0.1   90.1| da3
    0      0      0      0    0.0      0      0    0.0    0.0| da0p1
    0      0      0      0    0.0      0      0    0.0    0.0| da0p2
    6   9030   9024 149343    0.5      3    128    0.2   91.0| da0p3
    0      0      0      0    0.0      0      0    0.0    0.0| da1p1
    0      0      0      0    0.0      0      0    0.0    0.0| da1p2
    7   8739   8733 148803    0.5      3    128    0.2   90.5| da1p3
    0      0      0      0    0.0      0      0    0.0    0.0| da2p1
    0      0      0      0    0.0      0      0    0.0    0.0| da2p2
    7   8935   8929 149328    0.5      3    128    0.2   90.8| da2p3
    0      0      0      0    0.0      0      0    0.0    0.0| da3p1
    0      0      0      0    0.0      0      0    0.0    0.0| da3p2
    6   8649   8643 148753    0.5      3    128    0.2   90.7| da3p3
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/072fa016-dbf3-11e1-9417-0025908065d8
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/swap0
    0      0      0      0    0.0      0      0    0.0    0.0| md0
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/0739a7b0-dbf3-11e1-9417-0025908065d8
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/swap1
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/07446e04-dbf3-11e1-9417-0025908065d8
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/swap2
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/074ee993-dbf3-11e1-9417-0025908065d8
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/swap3
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/swap0-1
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/swap2-3

>> Opteron 6274). There is high load on geom as well... but geom does go
>> over 100% CPU, so I suppose it scales.
>
> Actually, GEOM itself is a bottleneck with SSDs and not really scalable
> in its current version. If you hit H, you could see that e.g. the g_down
> thread is sometimes pinned to 100% CPU or something similar. If all GEOM
> threads in your case are individually below 100%, then you didn't hit
> the GEOM bottleneck yet.

Ok, but apparently GEOM is not the bottleneck in my case.

Daniel
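P.S. A quick way to watch for the GEOM saturation Ivan describes; a
sketch, where the process-count argument (200) and the thread names are
assumptions for this particular box:

    # in batch mode top writes plain text, so the GEOM worker threads
    # can be filtered out; g_up or g_down pinned near 100% of a single
    # core would mean GEOM itself is the choke point
    top -b -S -H -d 1 200 | egrep 'g_(up|down|event)'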