Subject: Re: thread taskq / unp_gc() using 100% cpu and stalling unix socket IPC
From: Markus Gebert <markus.gebert@hostpoint.ch>
Date: Mon, 19 Nov 2012 18:03:28 +0100
To: Konstantin Belousov
Cc: freebsd-stable <freebsd-stable@freebsd.org>

On 14.11.2012, at 11:50, Markus Gebert wrote:

> On 14.11.2012, at 08:21, Konstantin Belousov wrote:
> 
>> On Wed, Nov 14, 2012 at 01:41:04AM +0100, Markus Gebert wrote:
>>> 
>>> On 13.11.2012, at 19:30, Markus Gebert wrote:
>>> 
>>>> To me it looks like the unix socket GC is triggered way too often and/or running too long, which uses CPU and, worse, causes a lot of contention around the unp_list_lock, which in turn causes delays for all processes relying on unix sockets for IPC.
>>>> 
>>>> I don't know why unp_gc() is called so often and what's triggering this.
>>> 
>>> I have a guess now. Dovecot and relayd both use unix sockets heavily. According to dtrace, uipc_detach() gets called quite often by dovecot closing unix sockets. Each time uipc_detach() is called, unp_gc_task is taskqueue_enqueue()d if fds are inflight.
>>> 
>>> In uipc_detach():
>>> 
>>> 682         if (local_unp_rights)
>>> 683                 taskqueue_enqueue(taskqueue_thread, &unp_gc_task);
>>> 
>>> We use relayd in a way that keeps the source address of the client when connecting to the backend server (transparent load balancing). This requires IP_BINDANY on the socket, which cannot be set by unprivileged processes, so relayd sends the socket fd to the parent process just to set the socket option and send it back. This means an fd gets transferred twice for every new backend connection.
>>> 
>>> So we have dovecot calling uipc_detach() often and relayd making it likely that fds are inflight (unp_rights > 0). With a certain amount of load this could cause unp_gc_task to be added to the thread taskq too often, slowing down everything unix socket related by holding global locks in unp_gc().
>>> 
>>> I don't know if the slowdown can even cause a negative feedback loop at some point by increasing the chance of fds being inflight.
>>> This would explain why sometimes the condition goes away by itself and sometimes requires intervention (taking load away for a moment).
>>> 
>>> I'll look into a way to (dis)prove all this tomorrow. Ideas still welcome :-).
>> 
>> If the only issue is indeed too aggressive scheduling of the taskqueue, then postponing it to the next tick could do it. The patch below tries to schedule the gc taskqueue for the next tick if it is not yet scheduled. Could you try it?
> 
> Sounds like a good idea, thanks! I'm testing the patch right now. It could take a few days to know for sure that it works. I'll get back to you soon.

We haven't had any problems since I booted the patched kernel. So the assumption that the gc gets scheduled too often in that situation seems correct.

I realize we're creating an edge case with relayd passing around so many fds (see the sketch below). On the other hand, I think the patch makes the unix socket code more robust without hurting anyone. So do you see any chance to get it committed?

Markus
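
P.S. In case it helps to illustrate the fd passing: below is a minimal userland sketch (made-up names, not relayd's actual code) of handing a socket fd to another process over a unix socket with SCM_RIGHTS. While such a message sits in the receiver's socket buffer, the descriptor counts as inflight and unp_rights stays above zero, which is exactly the condition that makes uipc_detach() enqueue unp_gc_task.

/*
 * Sketch only: send the descriptor 'fd' over the connected unix
 * socket 'chan' (e.g. one end of a socketpair(2)) as SCM_RIGHTS
 * ancillary data.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

static int
send_fd(int chan, int fd)
{
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char dummy = 0;

	memset(&cmsgbuf, 0, sizeof(cmsgbuf));
	memset(&msg, 0, sizeof(msg));

	/* sendmsg() wants at least one byte of regular data. */
	iov.iov_base = &dummy;
	iov.iov_len = sizeof(dummy);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	/* The descriptor itself travels as SCM_RIGHTS control data. */
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	/*
	 * From here until the peer picks the message up with recvmsg(),
	 * the descriptor is "inflight".
	 */
	return (sendmsg(chan, &msg, 0) == -1 ? -1 : 0);
}

The receiver does the reverse with recvmsg() and CMSG_DATA(). In our relayd setup this round trip happens twice per backend connection (child to parent to set IP_BINDANY, then back), so under load there is almost always at least one fd inflight.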