From owner-freebsd-fs@FreeBSD.ORG Sat Jan 26 05:50:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6A753C24 for ; Sat, 26 Jan 2013 05:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 57F439D for ; Sat, 26 Jan 2013 05:50:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r0Q5o1sH079535 for ; Sat, 26 Jan 2013 05:50:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r0Q5o1YW079534; Sat, 26 Jan 2013 05:50:01 GMT (envelope-from gnats) Date: Sat, 26 Jan 2013 05:50:01 GMT Message-Id: <201301260550.r0Q5o1YW079534@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Harry Coin Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Harry Coin List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 26 Jan 2013 05:50:01 -0000 The following reply was made to PR kern/169480; it has been noted by GNATS. From: Harry Coin To: bug-followup@FreeBSD.org, levent.serinol@mynet.com, jdc@koitsu.org Cc: Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O Date: Fri, 25 Jan 2013 23:49:23 -0600 I appreciate your effort very much. One change from last time was I set failmode=continue from 'wait'. There were no errors in zpool or other logs. re 1,2: switching to bs=64k and /dev/zero as the input source, the dd completes and the ls -l during the process causes no problems. Transfer speed reported by dd of 183qMBytes/sec. re 3: the gstat -I500ms shows each of the raid partitions hanging in at about top speed of 50-ish MBps during the above dd. All disks perform within a couple % of one another. re 4: we'll stick with gstat. re 5: I see we agree about that. Nothing happened during the scrub other than the scrub. 14Mbps scrub speed is 1/6th of the minimum I'd expect. re 6: Tuning Is Evil. On the other hand, Crashing is Eviler. So I tried Evil Tuning, with poor results as noted upstream. So, perhaps tuning is indeed not only evil but a black hole for time as well. Switching back to the 512 byte writes, I notice on gstat the writes are zero for 4 secs or so, then a burst of activity (all partitions within a few % of one another), all quiet, repeat. Then I throw in the ls -l... and... it all works with no problems. Trying the dd with the big blocksize and /dev/urandom.... and: gstat looks bursty as in the 512 byte writes. ls -l causes no changes, works. Trying the original /dev/urandom and bs=512 and ... the gstat pattern does the usual bursty thing, ls -l works well. Thinking it's about the gstat... Doing top -S -s 1 .. nothing. nas4free:~# vmstat -i interrupt total rate irq1: atkbd0 168 0 irq18: uhci2 1136 0 irq20: hpet0 620136 328 irq21: uhci0 ehci0 7764 4 irq256: hdac0 37 0 irq257: ahci0 1090 0 irq258: em0 5625 2 irq259: ahci1 849746 450 Total 1485702 787 So weird. It's not crashing. So, I tried to move a video from one dataset to another using mv. Within moments all the writes gstat shows have stopped, while there's a few K / s reads. And there it sits. Here's top last pid: 3932; load averages: 0.02, 0.04, 0.13 up 0+00:24:46 05:03:53 249 processes: 3 running, 229 sleeping, 17 waiting CPU: 0.0% user, 0.0% nice, 1.2% system, 0.0% interrupt, 98.8% idle Mem: 14M Active, 19M Inact, 953M Wired, 8256K Buf, 2246M Free Swap: PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND 11 root 155 ki31 0K 32K CPU1 1 22:02 100.00% idle{idle: cpu1} 11 root 155 ki31 0K 32K RUN 0 22:06 98.49% idle{idle: cpu0} 13 root -8 - 0K 48K - 1 0:06 0.10% geom{g_up} 5 root -8 - 0K 80K zio->i 0 0:05 0.10% zfskern{txg_thread_enter} 0 root -16 0 0K 2624K sched 1 2:30 0.00% kernel{swapper} 0 root -16 0 0K 2624K - 0 0:30 0.00% kernel{zio_read_intr_1} 0 root -16 0 0K 2624K - 0 0:30 0.00% kernel{zio_read_intr_0} 0 root -16 0 0K 2624K - 1 0:10 0.00% kernel{zio_write_issue_} 0 root -16 0 0K 2624K - 1 0:10 0.00% kernel{zio_write_issue_} 12 root -88 - 0K 272K WAIT 1 0:08 0.00% intr{irq259: ahci1} 0 root -16 0 0K 2624K - 1 0:07 0.00% kernel{zio_write_intr_2} 0 root -16 0 0K 2624K - 1 0:07 0.00% kernel{zio_write_intr_3} 0 root -16 0 0K 2624K - 1 0:07 0.00% kernel{zio_write_intr_6} 0 root -16 0 0K 2624K - 1 0:07 0.00% kernel{zio_write_intr_1} 0 root -16 0 0K 2624K - 0 0:07 0.00% kernel{zio_write_intr_0} 0 root -16 0 0K 2624K - 0 0:07 0.00% kernel{zio_write_intr_4} 0 root -16 0 0K 2624K - 1 0:07 0.00% kernel{zio_write_intr_7} 0 root -16 0 0K 2624K - 1 0:07 0.00% kernel{zio_write_intr_5} 13 root -8 - 0K 48K - 1 0:07 0.00% geom{g_down} 3919 root 20 0 9916K 1636K tx->tx 1 0:07 0.00% mv 20 root 16 - 0K 16K syncer 0 0:03 0.00% syncer 12 root -60 - 0K 272K WAIT 0 0:01 0.00% intr{swi4: clock} 12 root -52 - 0K 272K WAIT 1 0:01 0.00% intr{swi6: Giant task} 14 root -16 - 0K 16K - 0 0:01 0.00% yarrow 5 root -8 - 0K 80K arc_re 0 0:01 0.00% zfskern{arc_reclaim_thre} 12 root -84 - 0K 272K WAIT 1 0:01 0.00% intr{irq1: atkbd0} 0 root -16 0 0K 2624K - 1 0:00 0.00% kernel{zio_write_issue_} 0 root -8 0 0K 2624K - 0 0:00 0.00% kernel{zil_clean} 0 root -16 0 0K 2624K - 1 0:00 0.00% kernel{zio_write_issue_} Set failmode back to wait. Rebooted from the livecd and now repeating the tests: And they work properly. So, with gstat running I once again try to move a video file from one dataset to another. I watch a bunch of reads happen, the a burst of writes, and that goes on for a couple of minutes. Then the writes stop and the reads continue at 3Kbps according to gstat. The total memory still has 2.5G free. There is no writing to the zpool. I notice that one of the video files has moved, and the writing stopped when it was trying to open the next one I'm guessing. I can sucessfully ^C out of the mv, and notice a constant low speed read going on, 80-100KBps. I try restarting the move, no change to the read pattern, no writes. So, no memory explosion, ^C works, but I can't write, and gstat reports a steady read low pace read for no obvious reason, just as it was when the mv process was active. So, a huger puzzle.