From owner-freebsd-stable@FreeBSD.ORG Sun Jul 11 22:28:20 2010
Date: Sun, 11 Jul 2010 15:28:18 -0700
From: nukunuku@sbcglobal.net (Richard Lee)
To: Jeremy Chadwick
Cc: freebsd-stable@freebsd.org, Richard Lee
Subject: Re: Serious zfs slowdown when mixed with another file system (ufs/msdosfs/etc.).
Message-ID: <20100711222818.GA37207@catsspat.iriomote>
In-Reply-To: <20100711214546.GA81873@icarus.home.lan>
References: <20100711182511.GA21063@soda.CSUA.Berkeley.EDU>
 <20100711204757.GA81084@icarus.home.lan>
 <20100711211213.GA36377@catsspat.iriomote>
 <20100711214546.GA81873@icarus.home.lan>

On Sun, Jul 11, 2010 at 02:45:46PM -0700, Jeremy Chadwick wrote:
> On Sun, Jul 11, 2010 at 02:12:13PM -0700, Richard Lee wrote:
> > On Sun, Jul 11, 2010 at 01:47:57PM -0700, Jeremy Chadwick wrote:
> > > On Sun, Jul 11, 2010 at 11:25:12AM -0700, Richard Lee wrote:
> > > > This is on a clean FreeBSD 8.1 RC2, amd64, with 4GB memory.
> > > >
> > > > The closest I found by Googling was this:
> > > > http://forums.freebsd.org/showthread.php?t=9935
> > > >
> > > > It talks about all kinds of little tweaks, but in the end the only
> > > > thing that actually works is the stupid 1-line perl code that forces
> > > > the kernel to free the memory allocated to the (non-zfs) disk cache,
> > > > which is the "Inact"ive memory in "top."
> > > >
> > > > I have a 4-disk raidz pool, but that's unlikely to matter.
> > > >
> > > > Try to copy large files from a non-zfs disk to a zfs disk.  FreeBSD
> > > > will cache the data read from the non-zfs disk in memory, and free
> > > > memory will go down.  This is as expected, obviously.
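[Inline note, to make "copy large files from a non-zfs disk to a zfs
disk" concrete: the sort of thing I run is below.  The device name,
mount point, and paths are only examples (adjust for your own disks);
the point is a big sequential read from a non-zfs file system into the
pool, with gstat running in another terminal.

    # example only: external UFS drive on da0, pool mounted at /uchuu
    mkdir -p /mnt/usb
    mount /dev/da0s1a /mnt/usb
    cp /mnt/usb/some-large-file /uchuu/incoming/

    # in a second terminal, watch per-disk throughput
    gstat
]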
> > > >
> > > > Once there's very little free memory, one would expect whatever is
> > > > more important to kick out the cached data (Inact) and make memory
> > > > available.
> > > >
> > > > But when almost all of the memory is taken by the disk cache (of the
> > > > non-zfs file system), the ZFS disks start thrashing like mad and the
> > > > write throughput drops to single-digit MB/second.
> > > >
> > > > I believe it should be extremely easy to duplicate.  Just plug in a
> > > > big USB drive formatted in UFS (msdosfs will likely do the same), and
> > > > copy large files from that USB drive to the zfs pool.
> > > >
> > > > Right after a clean boot, gstat will show something like 20+MB/s
> > > > movement from the USB device (da*), and occasional bursts of activity
> > > > on the zpool devices at a very high rate.  Once free memory is
> > > > exhausted, the zpool devices will change to constant low-speed
> > > > activity, with the disks thrashing about constantly.
> > > >
> > > > I tried enabling/disabling prefetch, messing with vnode counts,
> > > > zfs.vdev.min/max_pending, etc.  The only thing that works is that
> > > > stupid perl 1-liner (perl -e '$x="x"x1500000000'), which returns the
> > > > activity to that seen right after a clean boot.  It doesn't last very
> > > > long, though, as the disk cache again consumes all the memory.
> > > >
> > > > Copying files between zfs devices doesn't seem to affect anything.
> > > >
> > > > I understand the zfs subsystem has its own memory/cache management.
> > > > Can a zfs expert please comment on this?
> > > >
> > > > And is there a way to force the kernel to not cache non-zfs disk data?
> > >
> > > I believe you may be describing two separate issues:
> > >
> > > 1) ZFS using a lot of memory but not freeing it as you expect
> > > 2) Lack of disk I/O scheduler
> > >
> > > For (1), try this in /boot/loader.conf and reboot:
> > >
> > > # Disable UMA (uma(9)) for ZFS; amd64 was moved to exclusively use UMA
> > > # on 2010/05/24.
> > > # http://lists.freebsd.org/pipermail/freebsd-stable/2010-June/057162.html
> > > vfs.zfs.zio.use_uma="0"
> > >
> > > For (2), you may try gsched_rr:
> > >
> > > http://svnweb.freebsd.org/viewvc/base/releng/8.1/sys/geom/sched/README?view=markup
> > >
> > > --
> > > | Jeremy Chadwick                                 jdc@parodius.com |
> > > | Parodius Networking                     http://www.parodius.com/ |
> > > | UNIX Systems Administrator                Mountain View, CA, USA |
> > > | Making life hard for others since 1977.          PGP: 4BD6C0CB   |
> >
> > vfs.zfs.zio.use_uma is already 0.  It looks to be the default, as I
> > never touched it.
>
> Okay, just checking, because the default did change at one point, as the
> link in my /boot/loader.conf denotes.  Here's further confirmation (same
> thread), the first confirming on i386, the second confirming on amd64:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-June/057168.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-June/057239.html
>
> > And in my case, Wired memory is stable at around 1GB.  It's the Inact
> > memory that takes off, but only if reading from a non-zfs file system.
> > Without other file systems, I can keep moving files around and see no
> > adverse slowdown.  I can also scp huge files from another system into
> > the zfs machine, and it doesn't affect memory usage (as reported by
> > top), nor does it affect performance.
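(For the record, the "already 0" above is literally just

    sysctl vfs.zfs.zio.use_uma    # prints "vfs.zfs.zio.use_uma: 0" here

and the Wired/Inact figures are from top's memory line.  If anyone wants
to look at the ARC size directly, kstat.zfs.misc.arcstats.size reports
it in bytes.)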
>
> Let me get this straight:
>
> The system has ZFS enabled (kernel module loaded), with a 4-disk raidz1
> pool defined and used in the past (Wired being @ 1GB, due to ARC).  The
> same system also has UFS2 filesystems.  The ZFS pool vdevs consist of
> their own dedicated disks, and the UFS2 filesystems also have their own
> disk (which appears to be USB-based).

Yes, correct.  I have:

  ad4 (an old 200GB SATA UFS2 main system drive)
  ad8, ad10, ad12, ad14 (1TB SATA drives), part of raidz1 and nothing else

da0 is an external USB disk (1TB), but I don't think it's related to USB.

Status looks like this:

 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        uchuu       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0

errors: No known data errors

> When any sort of read I/O is done on the UFS2 filesystems, Inact
> skyrockets, and as a result this impacts performance of ZFS.
>
> If this is correct: can you remove USB from the picture and confirm the
> problem still happens?  This is the first I've heard of the UFS caching
> mechanism "spiraling out of control".

To isolate away any USB involvement, I did the following.

Without any USB drive attached at all, I copied a large 7GB file from the
zfs pool to the system drive (internal ad4, UFS2).  This alone caused the
Inact memory to top out, since the kernel caches whatever goes to/from
the normal file system.  Despite Inact memory usage topping out, I didn't
notice any slowdown in copying *from* zfs to the UFS drive (ad4), but I'm
not 100% sure.  It certainly wasn't obvious whether there was any effect.
Maybe zfs reads aren't as badly affected.

Now, I copied that large file from ad4 back to the zpool (somewhere other
than where the original file was, of course), and this *was* noticeably
affected.  It started out similarly: ad4 reading near its max platter
speed (40-50MB/s), and the zfs pool doing bursts of writes at higher
bandwidth.  This didn't last very long, though, possibly because memory
was already fully consumed (or close to it).  The ad4 read then slowed to
below 20MB/s, and the zfs writes became constant and slower, too, instead
of the quick, bursty write behavior.  Note I was watching this with gstat.
It wasn't as slow as USB drive -> zfs, but that may just be due to USB
overhead.

While this was happening, I ran that perl code to force the kernel to
give up some memory, and it went back to the speedy behavior, again until
the UFS caching took all the memory.

It's as if the kernel doesn't know to throw away Inact pages based on its
own internal activity (zfs activity), even though a user process asking
for memory makes it throw them out in an instant.  But that's not a
qualified statement, of course.  Just thinking out loud.

--rich
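P.S. For anyone who would rather watch numbers scroll by than stare at
top while reproducing this, something along these lines should do it
(untested as typed; the vm counters are page counts, so multiply by
hw.pagesize for bytes, while the ARC size is already in bytes):

    #!/bin/sh
    # Poll free/inactive/wired page counts and the ZFS ARC size once a second.
    while true; do
        sysctl -n vm.stats.vm.v_free_count \
                  vm.stats.vm.v_inactive_count \
                  vm.stats.vm.v_wire_count \
                  kstat.zfs.misc.arcstats.size
        echo ----
        sleep 1
    done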