From owner-freebsd-bugs@FreeBSD.ORG  Sun Feb 15 06:43:28 2015
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
Delivered-To: freebsd-bugs@FreeBSD.org
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 192889] accept4 socket hangs in CLOSED (memcached)
Date: Sun, 15 Feb 2015 06:43:25 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.0-STABLE
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: mp39590@gmail.com
X-Bugzilla-Status: New
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-192889-8-KNz9DlRU3G@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-192889-8@https.bugs.freebsd.org/bugzilla/>
References: <bug-192889-8@https.bugs.freebsd.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=192889

mp39590@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mp39590@gmail.com

--- Comment #15 from mp39590@gmail.com ---
The reason for this bug lies not in the network stack but in the
capabilities subsystem.

Memcached consists of a dispatcher thread and several worker threads, which
communicate through pipes. For example, when a new connection is accepted, the
dispatcher writes 'c' to the pipe of a selected worker thread (workers are
chosen in round-robin order); the worker thread then pops the connection from
its queue and serves it.
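The dispatcher/worker hand-off described above can be sketched roughly as
follows. This is a simplified model, not memcached's code: the names (struct
worker, worker_main, dispatch, run_demo), the fixed queue size and the
integers standing in for accepted sockets are all hypothetical, and a plain
blocking read() replaces the libevent loop a real worker runs.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

#define NWORKERS 3
#define QCAP 64

/* Hypothetical per-worker state: a notification pipe plus a connection queue. */
struct worker {
    int notify[2];      /* notify[0]: worker reads, notify[1]: dispatcher writes */
    int queue[QCAP];    /* stand-ins for accepted socket descriptors */
    int head, tail;     /* head: next to pop, tail: next free slot */
    int served;         /* connections this worker has picked up */
    pthread_mutex_t lock;
    pthread_t tid;
};

static struct worker workers[NWORKERS];
static int next_worker;

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    char cmd;

    /* Block until the dispatcher writes a command byte; EOF means shutdown. */
    while (read(w->notify[0], &cmd, 1) == 1) {
        if (cmd != 'c')
            continue;
        pthread_mutex_lock(&w->lock);
        int conn = w->queue[w->head++ % QCAP];   /* pop the queued connection */
        pthread_mutex_unlock(&w->lock);
        (void)conn;   /* a real worker would register conn with its event loop */
        w->served++;
    }
    return NULL;
}

/* Dispatcher side: enqueue the connection, then wake the chosen worker with 'c'. */
static void dispatch(int conn)
{
    struct worker *w = &workers[next_worker];
    next_worker = (next_worker + 1) % NWORKERS;  /* round-robin selection */

    pthread_mutex_lock(&w->lock);
    assert(w->tail - w->head < QCAP);            /* sketch: no overflow handling */
    w->queue[w->tail++ % QCAP] = conn;
    pthread_mutex_unlock(&w->lock);

    ssize_t n = write(w->notify[1], "c", 1);
    assert(n == 1);
    (void)n;
}

/* Start the workers, dispatch nconns fake connections, shut down, and
 * return the total number of connections served (-1 on setup failure). */
static int run_demo(int nconns)
{
    next_worker = 0;
    for (int i = 0; i < NWORKERS; i++) {
        if (pipe(workers[i].notify) != 0)
            return -1;
        workers[i].head = workers[i].tail = workers[i].served = 0;
        pthread_mutex_init(&workers[i].lock, NULL);
        pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]);
    }
    for (int fd = 100; fd < 100 + nconns; fd++)
        dispatch(fd);

    int total = 0;
    for (int i = 0; i < NWORKERS; i++) {
        close(workers[i].notify[1]);   /* EOF on the pipe tells the worker to exit */
        pthread_join(workers[i].tid, NULL);
        close(workers[i].notify[0]);
        total += workers[i].served;
    }
    return total;
}
```

With three workers, run_demo(30) hands ten connections to each worker in turn.
Note that the queue entry is written before the 'c' byte, so a worker woken by
the pipe always finds something to pop.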

Due to a subtle race condition in the capabilities code, kevent() may
sometimes return spurious ENOTCAPABLE errors for descriptors. This makes
libevent abort the loop that handles the connections and return. Memcached
doesn't expect this to happen, so the worker thread silently returns[1] and
dies. You can see this with the procstat command by comparing the thread count
in the normal and failing situations: in the failing case there is one thread
fewer.
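The failure mode above can be modelled without libevent or kqueue. In the
sketch below, fake_loop() is a hypothetical stand-in for the event loop that
fails once with ENOTCAPABLE partway through its work; worker_as_reported()
mirrors the pattern described in this comment (run the loop once, ignore its
result, return), while worker_retrying() is a hypothetical defensive variant
that is not taken from memcached. The ENOTCAPABLE fallback value is FreeBSD's
and is defined only so the sketch also compiles on other systems.

```c
#include <errno.h>
#include <stdio.h>

#ifndef ENOTCAPABLE
#define ENOTCAPABLE 93   /* FreeBSD's value; fallback so the sketch builds elsewhere */
#endif

/* Hypothetical event-loop state: pending events and a one-shot spurious failure. */
struct fake_base {
    int pending;   /* events still waiting to be handled */
    int handled;   /* events handled so far */
    int fail_at;   /* handled count at which the spurious error fires; -1 = never */
};

/* Handle events until none are left (returns 0), or fail once with
 * errno = ENOTCAPABLE (returns -1) when handled reaches fail_at. */
static int fake_loop(struct fake_base *b)
{
    while (b->pending > 0) {
        if (b->handled == b->fail_at) {
            b->fail_at = -1;     /* spurious: fires only once */
            errno = ENOTCAPABLE;
            return -1;
        }
        b->pending--;
        b->handled++;
    }
    return 0;
}

/* The pattern described above: the loop's result is ignored and the thread
 * function simply returns, so one spurious error ends the worker. */
static int worker_as_reported(struct fake_base *b)
{
    fake_loop(b);
    return b->handled;
}

/* Hypothetical defensive variant: re-enter the loop after the spurious error. */
static int worker_retrying(struct fake_base *b)
{
    while (fake_loop(b) == -1 && errno == ENOTCAPABLE)
        fprintf(stderr, "event loop failed with ENOTCAPABLE, re-entering\n");
    return b->handled;
}
```

With ten pending events and the failure set at the fourth, the first worker
stops after handling four events, while the retrying one handles all ten.
Retrying like this would only paper over the problem in userland; the fix
referenced below is in the kernel.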

The dispatcher is not aware of this catastrophic event and therefore continues
to write 'c' notifications about new connections to the pipe of the already
dead thread. Of course, no one will ever serve those connections, and they are
left hanging.
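Why the dispatcher gets no error: descriptors belong to the process, not to
the thread, so the dead worker's end of the pipe stays open and writes keep
succeeding. The minimal sketch below (notify_dead_worker is a hypothetical
name) writes n 'c' bytes into a pipe that nobody reads and uses the FIONREAD
ioctl to count the bytes left unread.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Write n notification bytes into a pipe that nobody reads, as the dispatcher
 * keeps doing for the dead worker; return how many bytes sit unread, or -1. */
static int notify_dead_worker(int n)
{
    int p[2], pending = -1;

    if (pipe(p) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        if (write(p[1], "c", 1) != 1)   /* succeeds: the read end is still open */
            goto out;
    if (ioctl(p[0], FIONREAD, &pending) != 0)
        pending = -1;
out:
    close(p[0]);
    close(p[1]);
    return pending;
}
```

There is no EPIPE or SIGPIPE here, because that only happens once every read
end is closed; the notifications simply accumulate in the pipe buffer.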

The reason you see a massive number of connections in CLOSED/CLOSE_WAIT is
simply that the client, whether through a timeout or for any other reason,
decided to close() its connection. The network stack receives the FIN packet
and expects our application to issue close() on the descriptor, but since the
thread is already dead, that will never happen.
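The CLOSE_WAIT part can be reproduced over loopback TCP. In this sketch
(peer_close_demo is a hypothetical name) the client closes its end, the
accepted socket sees EOF (recv() returns 0), and from that point it stays in
CLOSE_WAIT until the application calls close() on it, which is exactly the
call a dead worker never makes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns recv()'s result on the accepted socket after the client has closed
 * its end: 0 means the FIN arrived. Returns -1 if the setup fails. */
static ssize_t peer_close_demo(void)
{
    struct sockaddr_in addr = {0};
    socklen_t len = sizeof addr;
    char buf[1];
    ssize_t r = -1;
    int lsn = -1, cli = -1, srv = -1;

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                          /* let the kernel pick a port */
    if ((lsn = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        bind(lsn, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lsn, 1) != 0 ||
        getsockname(lsn, (struct sockaddr *)&addr, &len) != 0 ||
        (cli = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(cli, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        (srv = accept(lsn, NULL, NULL)) < 0)
        goto out;

    close(cli);                                 /* client sends FIN */
    cli = -1;
    r = recv(srv, buf, sizeof buf, 0);          /* 0: EOF, srv is now in CLOSE_WAIT */
    /* Only close(srv) takes the socket out of CLOSE_WAIT. */
out:
    if (srv >= 0)
        close(srv);
    if (cli >= 0)
        close(cli);
    if (lsn >= 0)
        close(lsn);
    return r;
}
```

If the final close(srv) were skipped, the descriptor would stay open and the
socket would remain in CLOSE_WAIT for the life of the process.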

This bug was addressed by Mateusz in r273137[2].

[1] - https://github.com/memcached/memcached/blob/master/thread.c#L369
[2] - https://svnweb.freebsd.org/base?view=revision&revision=273137
