From owner-freebsd-arch@FreeBSD.ORG Thu Jan 22 10:15:53 2015
Message-ID: <54C0CE09.500@selasky.org>
Date: Thu, 22 Jan 2015 11:16:41 +0100
From: Hans Petter Selasky
To: Slawa Olhovchenkov
Cc: Adrian Chadd, FreeBSD Current, Jason Wolfe,
 "freebsd-arch@freebsd.org"
Subject: Re: [RFC] kern/kern_timeout.c rewrite in progress
References: <54A9A71E.70609@selasky.org> <54B29A49.3080600@selasky.org>
 <54B67DA7.3070106@selasky.org> <54B7DECF.8070209@selasky.org>
 <54BADFB3.3030405@selasky.org> <54BE03EB.2070604@selasky.org>
 <20150120104736.GA78629@zxy.spb.ru>
In-Reply-To: <20150120104736.GA78629@zxy.spb.ru>

On 01/20/15 11:47, Slawa Olhovchenkov wrote:
> On Tue, Jan 20, 2015 at 08:29:47AM +0100, Hans Petter Selasky wrote:
>
>> On 01/17/15 23:18, Hans Petter Selasky wrote:
>>> On 01/17/15 20:11, Jason Wolfe wrote:
>>>>
>>>> HPS,
>>>>
>>>> Just to give a quick status update, this patch has most certainly
>>>> resolved our spin lock held too long panics on stable/10.
>>>>
>>>> Thank you to JHB for spending some time digging into the issue and
>>>> leading us to td_slpcallout as the culprit, and HPS for your rewrite.
>>>> I had heard rumors of others being affected by similar issues, so this
>>>> seems like a fine candidate for an MFC if possible.
>>>>
>>>> Jason
>>>>
>>>
>>> Hi Jason,
>>>
>>> I'm glad to hear that my patch has resolved your issue and I'm happy we
>>> now have a more stable system.
>>>
>>> It was actually a co-worker at work which wrote some bad code which I
>>> started debugging which then led me to look at the callout subsystem.
>>> One bug kills the other ;-)
>>>
>>> I'm planning a MFC to 10-stable - yes, and will possibly add the
>>> _callout_stop_safe() function to not break binary compatibility with
>>> existing drivers as part of the MFC.
>>>
>>> --HPS
>>
>> Hi,
>>
>> Here is a followup patch for the TCP stack like I mentioned in the
>> beginning of the work done on the callout subsystem:
>>
>> https://reviews.freebsd.org/D1563
>>
>> If someone has a setup for massive TCP testing please give it a spin.
>
> I have on 10.1 (with applied r261906).

FYI: r277213 is going to be backed out of -current within the next few
hours at most. Developers need more time to review patches in the
surrounding areas, such as the TCP stack. Those patches are needed to
restore the distribution of callouts across multiple CPUs when using
MPSAFE callouts, which avoids congestion in the TCP stack.

--HPS