Date: Thu, 22 Jan 2015 12:27:40 +0200
From: Konstantin Belousov
To: Hans Petter Selasky
Subject: Re: [RFC] kern/kern_timeout.c rewrite in progress
Message-ID: <20150122102740.GZ42409@kib.kiev.ua>
Cc: FreeBSD Current, freebsd-arch@freebsd.org

On Thu, Jan 22, 2015 at 11:16:41AM +0100, Hans Petter Selasky wrote:
> On 01/20/15 11:47, Slawa Olhovchenkov wrote:
> > On Tue, Jan 20, 2015 at 08:29:47AM +0100, Hans Petter Selasky wrote:
> >
> >> On 01/17/15 23:18, Hans Petter Selasky wrote:
> >>> On 01/17/15 20:11, Jason Wolfe wrote:
> >>>>
> >>>> HPS,
> >>>>
> >>>> Just to give a quick status update, this patch has most certainly
> >>>> resolved our "spin lock held too long" panics on stable/10.
> >>>>
> >>>> Thank you to JHB for spending some time digging into the issue and
> >>>> leading us to td_slpcallout as the culprit, and to HPS for your
> >>>> rewrite. I had heard rumors of others being affected by similar
> >>>> issues, so this seems like a fine candidate for an MFC if possible.
> >>>>
> >>>> Jason
> >>>>
> >>>
> >>> Hi Jason,
> >>>
> >>> I'm glad to hear that my patch has resolved your issue and I'm happy
> >>> we now have a more stable system.
> >>>
> >>> It was actually a co-worker who wrote some bad code, which I started
> >>> debugging, which then led me to look at the callout subsystem.
> >>> One bug kills the other ;-)
> >>>
> >>> I'm planning an MFC to 10-stable - yes, and will possibly add the
> >>> _callout_stop_safe() function as part of the MFC, so as not to break
> >>> binary compatibility with existing drivers.
> >>>
> >>> --HPS
> >>
> >> Hi,
> >>
> >> Here is a follow-up patch for the TCP stack, as I mentioned at the
> >> beginning of the work on the callout subsystem:
> >>
> >> https://reviews.freebsd.org/D1563
> >>
> >> If someone has a setup for massive TCP testing, please give it a spin.
> >
> > I have one on 10.1 (with r261906 applied).
>
> FYI:
>
> r277213 is going to be pulled out of -current within a few hours at
> most, because developers need more time to review patches in
> surrounding areas, such as the TCP stack, that restore the distribution
> of callouts across multiple CPUs when using MPSAFE callouts, in order
> to avoid congestion in the TCP stack.

No, r277213 was not requested to be reverted because of TCP issues.
The main complaint is that you left an unknown number of cases
degraded, and there is no analysis of each such case, nor even a list
of the cases that need to be fixed (or an argument for why a given
consumer of the callout KPI can be left as is). Just providing a fix
for one place is not enough.