Date: Tue, 27 Apr 2010 21:22:06 +0200
From: Anselm Strauss <amsibamsi@gmail.com>
To: Dan Naumov <dan.naumov@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: ZFS scheduling
Message-ID: <4BD7395E.2030300@gmail.com>
In-Reply-To: <y2qcf9b1ee01004251503jb4791869i9a812fade17a0558@mail.gmail.com>
References: <y2qcf9b1ee01004251503jb4791869i9a812fade17a0558@mail.gmail.com>
On 04/26/10 00:03, Dan Naumov wrote:
>> Hi,
>>
>> I noticed that my system gets very slow when I'm doing some simple but
>> intense ZFS operations. For example, I move about 20 gigabytes of data
>> from one dataset to another on the same pool, which is a RAIDZ of 3 500
>> GB SATA disks. The operation itself runs fast, but meanwhile other
>> things get really slow. E.g. opening an application takes 5 times as
>> long as before. Also, simple operations like 'ls' stall for some
>> seconds, which they never did before. It already changed a lot when I
>> switched from RAIDZ to a mirror with only 2 disks. Memory and CPU don't
>> seem to be the issue: I have a quad-core CPU and 8 GB RAM.
>>
>> I can't get rid of the idea that this has something to do with
>> scheduling. The system is absolutely stable and fast. Somehow small I/O
>> operations on ZFS seem to have a very hard time getting through when
>> other, bigger ones are running. Maybe this has something to do with
>> tuning?
>>
>> I know my system information is very incomplete, and there could be a
>> lot of causes. But does anybody know if this could be an issue with ZFS
>> itself?
>
> Hello
>
> As you do mention, your system information is indeed very incomplete,
> making your problem rather hard to diagnose :)
>
> Scheduling, in the traditional sense, is unlikely to be the cause of
> your problems, but here are a few things you could look into:
>
> The first one is obviously the pool layout: heavy-duty writing on a pool
> consisting of a single raidz vdev is slow (slower than writing to a
> mirror, as you already discovered), period. Such is the nature of raidz.
> Additionally, your problem is magnified by the fact that you have reads
> competing with writes, since you are reading (I assume) from the same
> pool.
> One approach to alleviating the problem would be to utilize a pool
> consisting of 2 or more raidz vdevs in a stripe, like this:
>
>   pool
>     raidz
>       disc1
>       disc2
>       disc3
>     raidz
>       disc4
>       disc5
>       disc6
>
> The second potential cause of your issues is the system wrongly
> guesstimating your optimal TXG commit size. ZFS works in such a fashion
> that it commits data to disk in chunks. How big a chunk it writes at a
> time it tries to optimize by evaluating your pool's I/O bandwidth over
> time and the available RAM. The TXG commits happen at an interval of
> 5-30 seconds. The worst-case scenario is that if the system misguesses
> the optimal TXG size, then under heavy write load it keeps deferring the
> commit for up to the 30-second timeout, and when it hits the cap, it
> frantically commits it ALL at once. This can, and most likely will,
> completely starve your read I/O on the pool for as long as the drives
> choke while committing the TXG.
>
> If you are on 8.0-RELEASE, you could try playing with the
> vfs.zfs.txg.timeout= variable in /boot/loader.conf; generally sane
> values are 5-30, with 30 being the default. You could also try adjusting
> vfs.zfs.vdev.max_pending= down from the default of 35 to a lower value
> and see if that helps. AFAIK, 8-STABLE and -HEAD have a sysctl variable
> which directly allows you to manually set the preferred TXG size, and
> I'm pretty sure I've seen some patches on the mailing lists to add this
> functionality to 8.0.
>
> Hope this helps.
>
> - Sincerely,
> Dan Naumov

Thanks for the explanation and hints. As I said, it's already a lot
better with a mirror instead of raidz; maybe I will try to adjust some
sysctl parameters as you suggested. But I'm still a bit puzzled why it
is possible at all that one simple operation can stall the system so
much. In my naive view I just compare it to CPU scheduling.
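For reference, the striped layout of two raidz vdevs that Dan sketches above, together with the loader.conf tunables he mentions, could look roughly like this. This is a sketch only: the pool name "tank", the device names ada0-ada5, and the tunable values are assumptions for illustration, not taken from the thread.

```sh
# Sketch: one pool built from two raidz vdevs in a stripe, as Dan suggests.
# Pool name and device names are placeholders; substitute your own disks.
zpool create tank \
    raidz ada0 ada1 ada2 \
    raidz ada3 ada4 ada5

# The tunables Dan mentions are set in /boot/loader.conf (values below are
# illustrative, not recommendations):
#   vfs.zfs.txg.timeout="5"         # commit TXGs more often, in smaller chunks
#   vfs.zfs.vdev.max_pending="10"   # queue fewer I/Os per vdev (default 35)
```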
Even when I have a process that consumes 100% of the CPU, if I start
another small process in parallel that needs only very little CPU time,
there is virtually no slowdown for it. A normal fair scheduler would
assign 50% of the CPU to each process, so the small one still has plenty
of resources, and doubling the execution time of an already very
short-running process is barely noticeable. Of course this changes when
there are lots of processes, so that even a small process only gets a
fraction of the CPU. But I guess this is not how I/O scheduling or ZFS
works. Maybe this goes more into the topic of I/O scheduling priority of
processes.

Anselm
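Anselm's CPU analogy, and why it breaks down for batched TXG commits, can be made concrete with a little arithmetic. This is a toy model of my own, not ZFS internals; all numbers (pool bandwidth, TXG size, read latency) are illustrative assumptions.

```python
# Toy model contrasting fair CPU sharing with an all-at-once TXG flush.

# CPU: under perfectly fair scheduling, n runnable processes each get 1/n
# of the CPU, so a tiny task beside one hog merely has its runtime doubled.
def fair_share_runtime(cpu_seconds, runnable_procs):
    return cpu_seconds * runnable_procs

# Disk: if a whole transaction group is flushed at once, a small read that
# arrives mid-flush can end up waiting for the entire flush to drain.
def worst_case_read_latency(read_seconds, txg_mb, pool_mb_per_s):
    return txg_mb / pool_mb_per_s + read_seconds

tiny_task = fair_share_runtime(0.01, 2)                   # ~0.02 s: barely felt
stalled_read = worst_case_read_latency(0.01, 3000, 100)   # ~30 s: very much felt
print(tiny_task, stalled_read)
```

The point of the contrast: fair sharing degrades a small job proportionally (2x), while a deferred bulk commit can delay a 10 ms read by the full flush duration, which matches the multi-second 'ls' stalls described above.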