Date: Fri, 25 Sep 2015 21:35:22 +0000 (UTC) From: Pallav Bose <pallav_bose@yahoo.com> To: "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org> Subject: Interrupt storm and poor disk performance | mfi(4) driver | FreeBSD 8 | Dell PERC H730 Message-ID: <472489221.917644.1443216922050.JavaMail.yahoo@mail.yahoo.com>
Hello,

I have a Dell PowerEdge R430 server with a PERC H730 RAID controller. I'm trying to get FreeBSD 8 to install and run on this server. At this time, I have a patched version of the mfi(4) driver which attaches to the controller. I'm aware of mrsas(4), but since I have scripts that use mfiutil(8), I'd like to continue using the mfi(4) driver.

A simple dd test shows SSD performance to be very poor:

# dd if=/dev/mfid0 of=/dev/null bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 27.978784 secs (38377001 bytes/sec)

top -PHS shows a lot of CPU time being used by the swi6 s/w interrupt handler:

last pid: 81270;  load averages:  0.01,  0.05,  0.05    up 0+05:34:20  15:45:51
302 processes: 7 running, 278 sleeping, 17 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system, 52.6% interrupt, 47.4% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 4:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 5:  0.0% user,  0.0% nice,  0.0% system,  0.7% interrupt, 99.3% idle
Mem: 48M Active, 4044K Inact, 997M Wired, 7144K Cache, 1248K Buf, 30G Free
Swap:

  PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   10 root        171 ki31     0K   192K CPU5    5 319:51 100.00% {idle: cpu5}
   10 root        171 ki31     0K   192K CPU2    2 293:32  94.58% {idle: cpu2}
   10 root        171 ki31     0K   192K CPU3    3 298:46  93.65% {idle: cpu3}
   10 root        171 ki31     0K   192K CPU4    4 278:55  92.58% {idle: cpu4}
   10 root        171 ki31     0K   192K CPU1    1 289:36  92.19% {idle: cpu1}
   10 root        171 ki31     0K   192K RUN     0 293:17  85.99% {idle: cpu0}
   11 root        -24    -     0K   544K WAIT    2 173:40  47.27% {swi6: task queue}
   11 root        -64    -     0K   544K WAIT    5  11:50   0.00% {irq256: mfi0}
   11 root        -32    -     0K   544K WAIT    1   6:26   0.00% {swi4: clock}

The interrupt rate in case of irq256:mfi0 is very high, in spite of there being no disk activity.

# vmstat -i
interrupt                          total       rate
irq4: uart0                          257          0
irq9: acpi0                            1          0
irq18: ehci0 ehci1                 71739          3
cpu0: timer                     40226355       1998
irq256: mfi0                     3642472        180
irq257: bge0                       34922          1
cpu3: timer                     40229128       1998
cpu5: timer                     40228959       1998
cpu4: timer                     40229014       1998
cpu1: timer                     40228629       1998
cpu2: timer                     40223967       1998
Total                          245115443      12175

Procstat output:

# procstat -kk 11       # PID 11 taken from output of top
  PID    TID COMM             TDNAME           KSTACK
   11 100008 intr             swi3: vm
   11 100009 intr             swi1: netisr 0   mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100010 intr             swi4: clock      mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100011 intr             swi4: clock      mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100012 intr             swi4: clock      mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100013 intr             swi4: clock      mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100014 intr             swi4: clock      mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100015 intr             swi4: clock      mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100021 intr             swi5: +
   11 100023 intr             swi6: Giant task mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100024 intr             swi6: task queue mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100027 intr             swi2: cambio     mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100032 intr             irq9: acpi0      mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100033 intr             irq256: mfi0     mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100034 intr             irq18: ehci0 ehc mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100039 intr             swi0: uart uart  mi_switch+0x205 ithread_loop+0x1bf fork_exit+0x112 fork_trampoline+0xe
   11 100040 intr             irq1: atkbd0

# kldload dtraceall
# dtrace -n 'profile:::profile-276hz { @pc[stack()]=count(); }'
dtrace: description 'profile:::profile-276hz ' matched 1 probe

The above dtrace script is supposed to print all the stack traces seen during the sampling period. The following stack trace occurs a large number of times:

              kernel`DELAY+0x64
              kernel`bus_dmamap_load+0x3a9
              kernel`mfi_mapcmd+0x4f
              kernel`mfi_startio+0x65
              kernel`mfi_wait_command+0x9c
              kernel`mfi_tbolt_sync_map_info+0xb4
              kernel`mfi_handle_map_sync+0x39
              kernel`taskqueue_run+0x91
              kernel`intr_event_execute_handlers+0x66
              kernel`ithread_loop+0x8e
              kernel`fork_exit+0x112
              kernel`0xffffffff8050624e

Can someone help me debug this problem? It's likely that the mfi(4) driver I currently have access to doesn't have all the necessary patches.

Thank you.
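As a sanity check on the figures above, here is a minimal Python sketch (not part of the original diagnostics). It recomputes the dd throughput and checks the vmstat -i numbers against the uptime reported by top. It assumes, as I understand vmstat(8), that the "rate" column is the total divided by uptime and truncated to an integer, so it is an average since boot rather than a current rate. The helper names and the sample totals passed to interval_rate are hypothetical.

```python
# Figures copied from the dd, top and vmstat -i output in this message.
DD_BYTES = 1073741824
DD_SECONDS = 27.978784
TOP_UPTIME_SECONDS = 5 * 3600 + 34 * 60 + 20  # "up 0+05:34:20"


def throughput(total_bytes, seconds):
    """Average throughput in bytes per second."""
    return total_bytes / seconds


def uptime_bounds(total, rate):
    """Uptimes (seconds) consistent with rate == floor(total / uptime).

    ASSUMPTION: vmstat -i derives its rate column this way.
    """
    return total / (rate + 1), total / rate


def interval_rate(total_before, total_after, seconds):
    """Interrupts per second between two samples of the 'total' column."""
    return (total_after - total_before) / seconds


dd_rate = throughput(DD_BYTES, DD_SECONDS)
print(f"dd: {dd_rate:.0f} bytes/sec = {dd_rate / 2**20:.1f} MiB/s")

mfi_lo, mfi_hi = uptime_bounds(3642472, 180)        # irq256: mfi0
timer_lo, timer_hi = uptime_bounds(40226355, 1998)  # cpu0: timer
print(f"uptime implied by mfi0 row:  {mfi_lo:.0f}..{mfi_hi:.0f} s")
print(f"uptime implied by timer row: {timer_lo:.0f}..{timer_hi:.0f} s")
print(f"uptime reported by top:      {TOP_UPTIME_SECONDS} s")

# Hypothetical samples: two readings of mfi0's total taken 5 seconds apart.
print(f"current rate: {interval_rate(1000, 1900, 5):.0f} interrupts/s")
```

If that assumption holds, both rows imply an uptime slightly longer than the one top printed, which fits vmstat -i having been run a little later. It also means the 180/s figure does not by itself show that interrupts are still arriving while the disk is idle; diffing two readings of the total column, as interval_rate does, would show the current rate.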
Regards,
Pallav

Date: Fri, 25 Sep 2015 18:32:20 -0400 From: Quartz <quartz@sneakertech.com> To: freebsd-questions@freebsd.org Subject: Re: ZFS ready drives WAS: zfs performance degradation Message-ID: <5605CB74.2020908@sneakertech.com> In-Reply-To: <106217D9-F3DB-4DB5-822E-098041B5BC6F@kraus-haus.org>

> once you take
> the 5 year warranty in account.

This assumes that the company in question will actually honor the warranty. It's been our experience that they usually don't. I can't count the number of times a drive manufacturer has pulled a fast one on a warranty replacement. WD is especially bad about this: they send back a cheaper drive than the original, or a bottom-bin refurbished drive with twice the runtime/wear of the original that dies after a month (which mysteriously doesn't qualify for a warranty itself), or randomly insist we have to do a pay-and-reimburse replacement, then lose the records and never reimburse us.

I understand that this is all anecdotal, but personally I don't find warranties worth the paper they're written on, and I no longer assume I'll get a functional replacement. Long term it's cheaper to just buy a new drive outright than to waste employee time arguing over the phone for days. I buy exclusively based on ratings and reliability reports now.