Date:      Wed, 21 Jan 2015 09:32:11 +0100
From:      Hans Petter Selasky <hps@selasky.org>
To:        sbruno@freebsd.org, "K. Macy" <kmacy@freebsd.org>
Cc:        Adrian Chadd <adrian@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, Jason Wolfe <nitroboost@gmail.com>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, Gleb Smirnoff <glebius@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>
Subject:   Re: svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys
Message-ID:  <54BF640B.6000700@selasky.org>
In-Reply-To: <54BEEA7F.1070301@ignoranthack.me>
References:  <201501151532.t0FFWV2Y037455@svn.freebsd.org>	<CAJ-Vmok0GXZoojyi=jE=b5D-d338APztaf3Pw0_AAQ-173XSWw@mail.gmail.com>	<54BDD9E1.6090505@selasky.org>	<20150120075126.GA42409@kib.kiev.ua>	<20150120211137.GY15484@FreeBSD.org>	<54BED6FB.8060401@selasky.org>	<54BEE62D.2060703@ignoranthack.me>	<CAHM0Q_MDJN_8sTvTDXfqA7UtJVO3Y8S8%2BNRCs_=6Nj4dkTzjOA@mail.gmail.com>	<54BEE8E6.3080009@ignoranthack.me> <CAHM0Q_N_53BM-6RvXu8UpjfDzQHEn5oXZo1Nn8RO0cuOUhe8tg@mail.gmail.com> <54BEEA7F.1070301@ignoranthack.me>

On 01/21/15 00:53, Sean Bruno wrote:
> Unknown to me.  Nor am I aware of anyone else who ever hit our panics
> either.  Our environment, and the failure, was only seen in the Intel
> 10GE space (ixgbe).  This is an artifact of our use cases, and hasn't
> been expanded nor tested in our environment with other vendor interfaces.
>
> sean

Hi,

I've seen this with Mellanox 40GE hardware when running some special
tests, but not during regular use yet. That was the reason for going into
the callout subsystem in the first place.

Also, in the heat of this discussion, I would like to mention that over
X-mas I had a very heated discussion with Attilio and a few other FreeBSD
developers whose names were on a patch (r220456) that changed how the
return value of "callout_active()" works. "callout_active()" is heavily
used inside the TCP stack, and what was found is that there is a
potential race related to migrating the callout from one CPU to another,
which in turn might give symptoms other than a spinlock hang.

FYI:

https://svnweb.freebsd.org/base?view=revision&revision=225057

Cite: "If the newly scheduled thread wants to acquire the old queue it 
will just spin forever."

This description reminds me very much of what Jason Wolfe, others and I
have seen.

Konstantin, you're responsible for r220456 (Approved by: kib). I would 
like to ask what investigation you did to ensure that you solved the 
problem as described in the commit message and didn't introduce a new one?

In r220456 the "callout_reset_on()" function was changed in a way that 
directly conflicts with how the TCP stack works, by not always ensuring 
that "callout_active()" returns non-zero after a callout is restarted! 
See return at line 821:

> https://svnweb.freebsd.org/base/head/sys/kern/kern_timeout.c?revision=225057&view=markup&pathrev=225057#l821
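
To make the concern concrete, here is a minimal userland sketch of the
pattern, not the kernel code itself. The flag names follow the macros in
sys/sys/callout.h, but the flag values, the "model_*" helpers and the
"early_return" parameter are assumptions made up for illustration. The
sketch models a reset that returns before marking the callout active, and
a handler in the style of the TCP timers that gives up when the callout
is not active.

```c
#include <stdbool.h>

/* Illustrative flag values; only the names follow sys/sys/callout.h. */
#define CALLOUT_ACTIVE  0x0002
#define CALLOUT_PENDING 0x0004

/* Hypothetical stand-in for struct callout, reduced to its flags. */
struct model_callout {
    int flags;
};

static bool model_active(const struct model_callout *c)
{
    return (c->flags & CALLOUT_ACTIVE) != 0;
}

static bool model_pending(const struct model_callout *c)
{
    return (c->flags & CALLOUT_PENDING) != 0;
}

static void model_deactivate(struct model_callout *c)
{
    c->flags &= ~CALLOUT_ACTIVE;
}

/*
 * Hypothetical reset. With early_return set, it models a path that
 * returns before the callout is marked active and pending.
 */
static void model_reset(struct model_callout *c, bool early_return)
{
    if (early_return)
        return;
    c->flags |= CALLOUT_ACTIVE | CALLOUT_PENDING;
}

/*
 * Handler in the style of the TCP timers: the caller clears the pending
 * flag before invoking it, and the handler bails out unless the callout
 * is still active. Returns true if the timer work would have run.
 */
static bool model_timer_handler(struct model_callout *c)
{
    c->flags &= ~CALLOUT_PENDING;
    if (model_pending(c) || !model_active(c))
        return false;
    model_deactivate(c);
    return true;
}
```

In this model, a reset through the normal path leaves the callout active
and the handler does its work. A reset through the early return path
leaves the active flag clear, so a caller relying on "callout_active()"
would conclude that the timer was never restarted.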

Kib: Any comments?

--HPS


