From owner-freebsd-hackers@FreeBSD.ORG Tue Dec 15 20:11:10 2009
Date: Tue, 15 Dec 2009 20:57:03 +0100
Message-ID: <2ae8edf30912151157t53267adek85af80b1e31fb4b@mail.gmail.com>
From: Wiktor Niesiobedzki <google@vink.pl>
To: Ivan Voras
Cc: freebsd-fs, freebsd-hackers
Subject: Re: ZFS, compression, system load, pauses (livelocks?)

2009/12/15 Ivan Voras :
> The context of this post is file servers running FreeBSD 8 and ZFS with
> compressed file systems on low-end hardware, or actually high-end hardware
> on VMware ESX 3.5 and 4, which kind of makes it low-end as far as storage
> is concerned. The servers are standby backup mirrors of production servers
> - thus many writes, few reads.
>
> Running this setup I notice two things:
>
> 1) load averages get very high, though the only usage these systems get is
> file system usage;
> 2) long pauses, at what look like vfs.zfs.txg.timeout-second intervals,
> which seemingly block everything, or at least the entire userland. These
> pauses are sometimes so long that file transfers fail, which must be
> avoided.
>
> Looking at the list of processes, it looks like a large number of kernel
> and userland processes are woken up at once. From the kernel side there
> are regularly all the g_* threads, but also unrelated threads like
> bufdaemon, softdepflush, etc., and from the userland - top, syslog, cron,
> etc. It is as if ZFS livelocks everything else.
>
> Any ideas on the "pauses" issue?
>

Hi,

I've trimmed your post a bit. This is something of a "me too" message (more
details here:
http://lists.freebsd.org/pipermail/freebsd-geom/2009-December/003810.html).

What I've figured out so far is that lowering the kernel thread priority
(as pjd@ suggested) gives quite promising results (no livelocks at all),
though my bottleneck was caused by the GELI thread. The pattern there is
like this:

sched_prio(curthread, PRIBIO);
[...]
msleep(sc, &sc->sc_queue_mtx, PDROP | PRIBIO, "geli:w", 0);

Right now I'm running a changed version, where I have:

msleep(sc, &sc->sc_queue_mtx, PDROP, "geli:w", 0);

so I don't change the initial thread priority. It doesn't give the same
results as using the PUSER priority, but I fear that using PUSER may cause
livelocks in some other cases.
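To make the change concrete, here is a rough sketch of the two variants of
the worker loop, simplified from sys/geom/eli/g_eli.c (queue handling and
error paths are elided, the function name is made up, and the softc fields
are abbreviated from memory - treat it as an illustration, not a drop-in
patch):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/sched.h>

#include <geom/eli/g_eli.h>

static void
g_eli_worker_sketch(void *arg)
{
        struct g_eli_softc *sc = arg;
        struct bio *bp;

        /*
         * Stock code boosts the worker to an I/O priority up front:
         *      sched_prio(curthread, PRIBIO);
         * In the variant described above the priority is left alone.
         */
        for (;;) {
                mtx_lock(&sc->sc_queue_mtx);
                bp = bioq_takefirst(&sc->sc_queue);
                if (bp == NULL) {
                        /*
                         * Stock code also sleeps at PRIBIO, so the thread
                         * keeps competing ahead of userland:
                         *      msleep(sc, &sc->sc_queue_mtx,
                         *          PDROP | PRIBIO, "geli:w", 0);
                         * The changed version passes no priority, so the
                         * thread keeps its current (default) priority:
                         */
                        msleep(sc, &sc->sc_queue_mtx, PDROP, "geli:w", 0);
                        continue;
                }
                mtx_unlock(&sc->sc_queue_mtx);
                /* ... encrypt/decrypt bp and hand it back to GEOM ... */
        }
}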
This helps my case (GELI encryption and periodic lockups during ZFS
transaction commits), with some performance penalty, but I have similar
problems in other cases. When I run:

# zpool scrub tank

then the "kernel" system process/thread consumes most of the CPU (>95% in
system) and the load rises to 20+ for the duration of the scrub. During the
scrub my top screen looks like this:

last pid: 87570;  load averages:  8.26,  2.84,  1.68
199 processes: 3 running, 179 sleeping, 17 waiting
CPU:  2.4% user,  0.0% nice, 97.0% system,  0.6% interrupt,  0.0% idle
Mem: 66M Active, 6256K Inact, 1027M Wired, 104K Cache, 240K Buf, 839M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
    0 root       69  -8    0     0K   544K -      104:40  67.19% kernel
   24 root        1  -8    -     0K     8K geli:w   9:56   5.66% g_eli[0] ad6
   26 root        1  -8    -     0K     8K geli:w   9:50   5.47% g_eli[0] ad10
   25 root        1  -8    -     0K     8K geli:w   9:53   5.37% g_eli[0] ad8
    8 root       12  -8    -     0K   104K vgeom:  61:35   3.27% zfskern
    3 root        1  -8    -     0K     8K -        3:22   0.68% g_up
   11 root       17 -60    -     0K   136K WAIT    31:21   0.29% intr

The interesting thing is that 17 processes are reported as waiting for CPU,
though intr is the only one actually shown in the WAIT state (at least
among the top 40 processes displayed). I just wonder whether this might be
a scheduler-related issue. I'm thinking about giving SCHED_4BSD a try.

Cheers,

Wiktor Niesiobędzki