Date: Wed, 21 Jan 2015 09:32:11 +0100
From: Hans Petter Selasky <hps@selasky.org>
To: sbruno@freebsd.org, "K. Macy" <kmacy@freebsd.org>
Cc: Adrian Chadd <adrian@freebsd.org>,
    "src-committers@freebsd.org" <src-committers@freebsd.org>,
    Jason Wolfe <nitroboost@gmail.com>,
    "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>,
    "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>,
    Gleb Smirnoff <glebius@freebsd.org>,
    Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys
Message-ID: <54BF640B.6000700@selasky.org>
In-Reply-To: <54BEEA7F.1070301@ignoranthack.me>
References: <201501151532.t0FFWV2Y037455@svn.freebsd.org>
    <CAJ-Vmok0GXZoojyi=jE=b5D-d338APztaf3Pw0_AAQ-173XSWw@mail.gmail.com>
    <54BDD9E1.6090505@selasky.org>
    <20150120075126.GA42409@kib.kiev.ua>
    <20150120211137.GY15484@FreeBSD.org>
    <54BED6FB.8060401@selasky.org>
    <54BEE62D.2060703@ignoranthack.me>
    <CAHM0Q_MDJN_8sTvTDXfqA7UtJVO3Y8S8%2BNRCs_=6Nj4dkTzjOA@mail.gmail.com>
    <54BEE8E6.3080009@ignoranthack.me>
    <CAHM0Q_N_53BM-6RvXu8UpjfDzQHEn5oXZo1Nn8RO0cuOUhe8tg@mail.gmail.com>
    <54BEEA7F.1070301@ignoranthack.me>

On 01/21/15 00:53, Sean Bruno wrote:
> Unknown to me. Nor am I aware of anyone else who ever hit our panics
> either. Our environment, and the failure, was only seen in the Intel
> 10GE space (ixgbe). This is an artifact of our use cases, and hasn't
> been expanded nor tested in our environment with other vendor interfaces.
>
> sean

Hi,

I've seen this with Mellanox hardware (40GE) when running some special tests, but not during regular use yet. That was the reason for going into the callout subsystem in the first place.

Also, in the heat of this discussion, I would like to mention that during X-mas this year I had a very heavy discussion with Attilio and a few other FreeBSD developers whose names were on a patch (r220456) that changed how the return value of "callout_active()" works. "callout_active()" is heavily used inside the TCP stack, and what was found is that there is a potential race related to migrating the callout from one CPU to another, which in turn might give other symptoms than a spinlock hang.

FYI: https://svnweb.freebsd.org/base?view=revision&revision=225057

Cite: "If the newly scheduled thread wants to acquire the old queue it will just spin forever."

This description reminds me very much of what Jason Wolfe, others and I have seen.

Konstantin, you're responsible for r220456 (Approved by: kib). I would like to ask what investigation you did to ensure that you solved the problem as described in the commit message and didn't introduce a new one?

In r220456 the "callout_reset_on()" function was changed in a way that directly conflicts with how the TCP stack works, by not always ensuring that "callout_active()" returns non-zero after a callout is restarted!

See return at line 821:
> https://svnweb.freebsd.org/base/head/sys/kern/kern_timeout.c?revision=225057&view=markup&pathrev=225057#l821

Kib: Any comments?

--HPS
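
To make the concern about "callout_active()" concrete, here is a minimal userspace sketch of the kind of mismatch described in the message. It is not the code from kern_timeout.c. Every name in it (struct model_callout, the MODEL_* flags, the "migrating" field and the model_* functions) is hypothetical and exists only for illustration. It assumes that the early return leaves the flags untouched while a migration is pending, which is an assumption of the model and not something verified against r225057.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only; not taken from sys/callout.h. */
#define MODEL_ACTIVE	0x1
#define MODEL_PENDING	0x2

struct model_callout {
	int	flags;		/* combination of MODEL_ACTIVE and MODEL_PENDING */
	bool	migrating;	/* assumption: set while a move to another CPU is deferred */
};

/* Models a reset that always leaves the callout marked active and pending. */
void
model_reset_always_active(struct model_callout *c)
{
	c->flags |= MODEL_ACTIVE | MODEL_PENDING;
}

/*
 * Models a reset with an early return. When "migrating" is set, the function
 * returns before touching the flags, so the callout is not marked active.
 */
void
model_reset_early_return(struct model_callout *c)
{
	if (c->migrating)
		return;
	c->flags |= MODEL_ACTIVE | MODEL_PENDING;
}

/* Models a caller that asks "is this timer armed?" after restarting it. */
bool
model_is_active(const struct model_callout *c)
{
	return ((c->flags & MODEL_ACTIVE) != 0);
}
```

In this model, a caller that restarts the timer through model_reset_early_return() while "migrating" is set and then consults model_is_active() is told the timer is not armed, whereas the same sequence through model_reset_always_active() reports it as armed. Code that relies on the second behaviour would be misled by the first.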