From: Hans Petter Selasky
Date: Wed, 21 Jan 2015 09:32:11 +0100
To: sbruno@freebsd.org, "K. Macy"
Cc: Adrian Chadd, src-committers@freebsd.org, Jason Wolfe, svn-src-all@freebsd.org, svn-src-head@freebsd.org, Gleb Smirnoff, Konstantin Belousov
Subject: Re: svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys

On 01/21/15 00:53, Sean Bruno wrote:
> Unknown to me. Nor am I aware of anyone else who ever hit our panics
> either. Our environment, and the failure, was only seen in the Intel
> 10GE space (ixgbe). This is an artifact of our use cases, and hasn't
> been expanded nor tested in our environment with other vendor interfaces.
>
> sean

Hi,

I've seen this with Mellanox hardware (40GE) when running some special tests, but not during regular use yet. That was the reason for going into the callout subsystem in the first place.

Also, I would like to mention in the heat of this discussion that during X-mas this year I had a very heavy discussion with Attilio and a few other FreeBSD developers whose names were on a patch (r220456) that changed how the return value of "callout_active()" works. "callout_active()" is heavily used inside the TCP stack, and what was found is that there is a potential race related to migrating the callout from one CPU to another, which in turn might give symptoms other than a spinlock hang.

FYI: https://svnweb.freebsd.org/base?view=revision&revision=225057

Cite: "If the newly scheduled thread wants to acquire the old queue it will just spin forever."

This description reminds me very much of what Jason Wolfe, others and I have seen.

Konstantin, you're responsible for r220456 (Approved by: kib). I would like to ask what investigation you did to ensure that you solved the problem as described in the commit message and didn't introduce a new one.

In r220456 the "callout_reset_on()" function was changed in a way that directly conflicts with how the TCP stack works, by not always ensuring that "callout_active()" returns non-zero after a callout is restarted! See the return at line 821:

> https://svnweb.freebsd.org/base/head/sys/kern/kern_timeout.c?revision=225057&view=markup&pathrev=225057#l821

Kib: Any comments?

--HPS
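
To illustrate the concern about "callout_active()" after a restart, here is a minimal userspace sketch in C. It is not the kernel code: the struct, the flag values and all model_* functions are hypothetical stand-ins, and the early-return branch only models the kind of path described above, not the actual logic at line 821 of kern_timeout.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only (not taken from sys/callout.h). */
#define MODEL_ACTIVE	0x1
#define MODEL_PENDING	0x2

struct model_callout {
	int flags;
};

static bool
model_active(const struct model_callout *c)
{
	return ((c->flags & MODEL_ACTIVE) != 0);
}

static void
model_deactivate(struct model_callout *c)
{
	c->flags &= ~MODEL_ACTIVE;
}

/*
 * Model of a restart. With early_return set, the function returns before
 * touching the flags; this is an assumed stand-in for a deferred path.
 * Returns true if the flags were updated.
 */
static bool
model_reset(struct model_callout *c, bool early_return)
{
	if (early_return)
		return (false);
	c->flags |= MODEL_ACTIVE | MODEL_PENDING;
	return (true);
}

/*
 * Arm the timer, let the handler run once, restart the timer, and report
 * what a later model_active() check observes.
 */
static bool
model_active_after_restart(bool early_return)
{
	struct model_callout c = { 0 };

	model_reset(&c, false);
	assert(model_active(&c));	/* a normal arm marks it active */

	c.flags &= ~MODEL_PENDING;	/* dispatch: no longer pending */
	model_deactivate(&c);		/* handler marks itself as done */

	model_reset(&c, early_return);	/* restart is requested */
	return (model_active(&c));
}
```

In this model, model_active_after_restart(false) reports the callout as active, while model_active_after_restart(true) reports it as inactive even though a restart was requested. A caller that reads "active" as "the timer is armed" would draw the wrong conclusion in the second case.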