From owner-freebsd-hackers@FreeBSD.ORG Tue Dec 15 20:11:10 2009
Date: Tue, 15 Dec 2009 20:57:03 +0100
Message-ID: <2ae8edf30912151157t53267adek85af80b1e31fb4b@mail.gmail.com>
From: Wiktor Niesiobedzki <google@vink.pl>
To: Ivan Voras
Cc: freebsd-fs, freebsd-hackers
Subject: Re: ZFS, compression, system load, pauses (livelocks?)

2009/12/15 Ivan Voras :
> The context of this post is file servers running FreeBSD 8 and ZFS with
> compressed file systems on low-end hardware, or actually high-end hardware
> on VMware ESX 3.5 and 4, which kind of makes it low-end as far as storage
> is concerned. The servers are standby backup mirrors of production servers
> - thus many writes, few reads.
>
> Running this setup I notice two things:
>
> 1) load averages get very high, though the only usage these systems get is
> file system usage;
> 2) long pauses, at what look like vfs.zfs.txg.timeout-second intervals,
> which seemingly block everything, or at least the entire userland. These
> pauses are sometimes so long that file transfers fail, which must be
> avoided.
>
> Looking at the list of processes, it looks like a large number of kernel
> and userland processes are woken up at once. From the kernel side there
> are regularly all the g_* threads, but also unrelated threads like
> bufdaemon, softdepflush, etc., and from the userland - top, syslog, cron,
> etc. It is as if ZFS livelocks everything else.
>
> Any ideas on the "pauses" issue?
>

Hi,

I've trimmed your post a bit. This is something of a "me too" message (more
details here:
http://lists.freebsd.org/pipermail/freebsd-geom/2009-December/003810.html).

What I've figured out so far is that lowering the kernel thread priority
(as pjd@ suggested) gives quite promising results (no livelocks at all),
though my bottleneck was caused by the GELI thread. The pattern there is
like this:

sched_prio(curthread, PRIBIO);
[...]
msleep(sc, &sc->sc_queue_mtx, PDROP | PRIBIO, "geli:w", 0);

Right now I'm running a changed version, where I have:

msleep(sc, &sc->sc_queue_mtx, PDROP, "geli:w", 0);

so I don't change the initial thread priority. It doesn't give the same
results as using the PUSER priority, but I fear that using PUSER may cause
livelocks in some other cases.
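To make the change concrete, here is a rough sketch of the two variants of
the worker loop, simplified from sys/geom/eli/g_eli.c (queue handling and
error paths are elided, the function name is made up, and the softc fields
are abbreviated from memory - treat it as an illustration, not a drop-in
patch):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/sched.h>

#include <geom/eli/g_eli.h>

static void
g_eli_worker_sketch(void *arg)
{
        struct g_eli_softc *sc = arg;
        struct bio *bp;

        /*
         * Stock code boosts the worker to an I/O priority up front:
         *      sched_prio(curthread, PRIBIO);
         * In the variant described above the priority is left alone.
         */
        for (;;) {
                mtx_lock(&sc->sc_queue_mtx);
                bp = bioq_takefirst(&sc->sc_queue);
                if (bp == NULL) {
                        /*
                         * Stock code also sleeps at PRIBIO, so the thread
                         * keeps competing ahead of userland:
                         *      msleep(sc, &sc->sc_queue_mtx,
                         *          PDROP | PRIBIO, "geli:w", 0);
                         * The changed version passes no priority, so the
                         * thread keeps its current (default) priority:
                         */
                        msleep(sc, &sc->sc_queue_mtx, PDROP, "geli:w", 0);
                        continue;
                }
                mtx_unlock(&sc->sc_queue_mtx);
                /* ... encrypt/decrypt bp and hand it back to GEOM ... */
        }
}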
This helps my case (GELI encryption and periodic lockups during ZFS
transaction commits), with some performance penalty, but I have similar
problems in other cases. When I run:

# zpool scrub tank

then the "kernel" system process/thread consumes most of the CPU (>95% in
system) and the load rises to 20+ for the duration of the scrub. During the
scrub my top screen looks like this:

last pid: 87570;  load averages:  8.26,  2.84,  1.68
199 processes: 3 running, 179 sleeping, 17 waiting
CPU:  2.4% user,  0.0% nice, 97.0% system,  0.6% interrupt,  0.0% idle
Mem: 66M Active, 6256K Inact, 1027M Wired, 104K Cache, 240K Buf, 839M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
    0 root       69  -8    0     0K   544K -      104:40  67.19% kernel
   24 root        1  -8    -     0K     8K geli:w   9:56   5.66% g_eli[0] ad6
   26 root        1  -8    -     0K     8K geli:w   9:50   5.47% g_eli[0] ad10
   25 root        1  -8    -     0K     8K geli:w   9:53   5.37% g_eli[0] ad8
    8 root       12  -8    -     0K   104K vgeom:  61:35   3.27% zfskern
    3 root        1  -8    -     0K     8K -        3:22   0.68% g_up
   11 root       17 -60    -     0K   136K WAIT    31:21   0.29% intr

The interesting thing is that 17 processes are reported as waiting for CPU,
though intr is the only one actually shown in the WAIT state (at least
among the top 40 processes displayed). I just wonder whether this might be
a scheduler-related issue. I'm thinking about giving SCHED_4BSD a try.

Cheers,

Wiktor Niesiobędzki