From: "Steven Hartland"
To: "Nicolas Rachinsky", "freebsd-fs"
Subject: Re: slowdown of zfs (tx->tx)
Date: Fri, 11 Jan 2013 13:58:26 -0000

----- Original Message -----
From: "Nicolas Rachinsky"
To: "freebsd-fs"
Sent: Friday, January 11, 2013 11:11 AM
Subject: Re: slowdown of zfs (tx->tx)

>* Nicolas Rachinsky [2013-01-10 20:39 +0100]:
>> after replacing one of the controllers, all problems seem to have
>> disappeared. Thank you very much for your advice!
>
> Now the problem is back.
>
> After changing the controller, there were no more timeouts logged.
>
> No UDMA_CRC_Error_Count changed.
>
> While the problem exists, top almost all the time shows:
>
> last pid: 46322; load averages: 0.90, 1.03, 0.98 up 0+11:07:55 08:28:41
> 39 processes: 1 running, 38 sleeping
> CPU: 0.0% user, 0.0% nice, 50.1% system, 0.0% interrupt, 49.9% idle
> Mem: 10M Active, 33M Inact, 7612M Wired, 23M Cache, 827M Buf, 234M Free
> Swap: 16G Total, 13M Used, 16G Free
>
>   PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>   926 root       1  44    0 28020K  2404K select  1   2:23  0.29% snmpd
> 41642 user1      1  44    0  5828K   204K tx->tx  0  20:53  0.00% rsync
> 41641 user1      1  44    0 29952K  3976K select  1  13:39  0.00% ssh
> 41640 user1      1  44    0  5828K   140K select  1   0:20  0.00% rsync
> 90399 user2      1  44    0 14020K   872K tx->tx  0   0:16  0.00% rsync
>   956 root       1  44    0 11808K   708K select  1   0:02  0.00% ntpd
>  1051 root       1  44    0  8356K   640K kqread  0   0:00  0.00% master
> 25713 root       1  44    0 38108K  3596K select  1   0:00  0.00% sshd
>   875 root       1  44    0  6920K   572K select  1   0:00  0.00% syslogd
>  1066 root       1  44    0  7976K   564K nanslp  1   0:00  0.00% cron
>  1058 postfix    1  44    0  8356K   792K kqread  1   0:00  0.00% qmgr
>   705 root       1  44    0  5248K   120K select  1   0:00  0.00% devd
> 25715 root       1  44    0 10248K  2828K pause   1   0:00  0.00% csh
>  1062 root       1  44    0 26176K   952K select  1   0:00  0.00% sshd
> 90401 user2      1  44    0 14020K   768K select  1   0:00  0.00% rsync
> 90400 user2      1  44    0 23808K   892K select  1   0:00  0.00% ssh
> 90372 user2      1  59    0  8344K   124K wait    0   0:00  0.00% sh
> 41619 user1      1  76    0  8344K    40K wait    1   0:00  0.00% sh
> 46322 root       1  44    0  9372K  1800K CPU1    1   0:00  0.00% top
> 89384 root       1  44    0  8344K   712K wait    0   0:00  0.00% sh
> 37854 root       1  45    0  8360K   472K piperd  1   0:00  0.00% sendmail
> 45382 postfix    1  44    0  8360K  1324K kqread  1   0:00  0.00% pickup
> 41608 root       1  76    0  8344K   440K wait    0   0:00  0.00% sh
> 25768 root       1  52    0 13440K  1716K nanslp  0   0:00  0.00% smartd
> 33599 root       1  50    0  8344K   452K wait    1   0:00  0.00% sh
> 33597 root       1  52    0  8344K   440K wait    1   0:00  0.00% sh
> 37855 root       1  44    0  8360K   468K piperd  0   0:00  0.00% postdrop
> 33591 root       1  44    0  7976K   524K piperd  1   0:00  0.00% cron
> 33595 root       1  46    0  8344K   436K wait    1   0:00  0.00% sh
> 33594 root       1  44    0  8344K   436K wait    1   0:00  0.00% sh
> 33592 root       1  45    0  7976K   524K piperd  1   0:00  0.00% cron
>  1106 root       1  76    0  6916K   352K ttyin   1   0:00  0.00% getty
>  1111 root       1  76    0  6916K   352K ttyin   1   0:00  0.00% getty
>  1107 root       1  76    0  6916K   352K ttyin   0   0:00  0.00% getty
>  1108 root       1  76    0  6916K   352K ttyin   0   0:00  0.00% getty
>  1112 root       1  76    0  6916K   352K ttyin   0   0:00  0.00% getty
>  1109 root       1  76    0  6916K   352K ttyin   1   0:00  0.00% getty
>  1113 root       1  76    0  6916K   352K ttyin   0   0:00  0.00% getty
>  1110 root       1  76    0  6916K   352K ttyin   0   0:00  0.00% getty
>
> The result of
> sh -c "while :;do gstat -I 5s -b ;done" > gstat.txt & iostat -d -x -w 5 > iostat.txt & zpool iostat -v 5 > zpool.txt &
> is available via
> http://flummi.dauerreden.de/20130111/zpool.txt
> http://flummi.dauerreden.de/20130111/gstat.txt
> http://flummi.dauerreden.de/20130111/iostat.txt

TBH it looks like you're just saturating your disks with the number of IOPS you're doing.

Regards
Steve
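One way to check the saturation diagnosis against the collected iostat.txt is the %b column of FreeBSD's "iostat -x" output, which reports the percentage of time each device had outstanding I/O. Below is a minimal sh/awk sketch, not something from the thread: the function name busy_disks and the default 90% threshold are illustrative assumptions. It finds the %b column from each "device" header line instead of hard-coding its position, because the other extended-statistics columns differ between FreeBSD releases.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): print each device whose %b
# (percent of time busy) in FreeBSD "iostat -x" output is >= a threshold.
# Reads iostat output on stdin; $1 is the threshold, default 90 (arbitrary).
busy_disks() {
    awk -v thr="${1:-90}" '
        # Header line: find which column holds %b, since the other
        # columns vary between FreeBSD releases.
        $1 == "device" {
            col = 0
            for (i = 1; i <= NF; i++)
                if ($i == "%b") col = i
            next
        }
        # Data line: rows before the first header, and rows too short to
        # contain %b (such as the "extended device statistics" banner),
        # are ignored. Matching rows print "device %b".
        col > 0 && NF >= col && $col + 0 >= thr + 0 {
            print $1, $col
        }
    '
}

# Example, using the iostat.txt collected in the thread:
#   busy_disks 90 < iostat.txt
```

A disk pinned near 100% in most samples while the rsync processes sit in tx->tx would be consistent with saturation, though %b alone does not say whether the disk has reached its IOPS limit; the r/s and w/s columns on the same row give the actual operation rate.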