From: Attilio Rao
To: Andriy Gapon
Cc: freebsd-hackers@freebsd.org, Ryan Stone
Date: Sun, 25 Nov 2012 14:01:16 +0000
Subject: Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
In-Reply-To: <50B21545.5060807@FreeBSD.org>
References: <50A5F12C.1050902@FreeBSD.org> <50A63D1D.9090500@FreeBSD.org> <50A65208.4050804@FreeBSD.org> <50B21545.5060807@FreeBSD.org>
Reply-To: attilio@FreeBSD.org

On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon wrote:
> on 25/11/2012 14:29 Attilio Rao said the following:
>> I think the patch you propose makes such effects even worse, because
>> it disables interrupts in generic_stop_cpus().
>> What I suggest to do, is the following:
>> - The CPU which wins the race for generic_stop_cpus also signals the
>> CPUs it is willing to stop on a global mask
>> - Another CPU entering generic_stop_cpus() and losing the race,
>> checks the mask of cpus which might be stopped and stops itself if
>> necessary (ie. not yet done). We must be careful with races here, but
>> I'm confident this can be done easily enough.
>
> I think that you either misunderstood my patch or I misunderstand your
> suggestion, because my patch does exactly what you wrote above.

The patch is somewhat incomplete:
- I don't think that we need specific checks in cpustop_handler() (and
  if you have added them to prevent races, I don't think they are
  enough, see below)
- The "stopping_cpus" map must be set atomically with, or before, the
  stopper_cpu cpuid; otherwise some CPUs may end up using a NULL mask
  in the check
- Did you consider the races when a stop and a restart request happen
  just after the CPU_ISSET() check? I think CPUs can deadlock there.
- I'm very dubious about the spinlock_enter() stuff; I think it can
  just make the problem worse atm.

However, you are right: the concept of your patch is the same one I
was after; we may just need to polish it a bit.

In the meantime I also double-checked suspended_cpus and I don't think
there are real showstoppers to having it in the stopped_cpus map.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
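
A minimal user-space sketch of the publication order argued for above, written with C11 atomics rather than the kernel's cpuset_t and atomic(9) primitives. The names stop_claim, claim_stop() and must_self_stop() are hypothetical and do not come from the patch; stopping_cpus and stopper_cpu only mirror the names used in this thread, and a 64-bit word stands in for the CPU mask. It illustrates only the "mask before stopper id" ordering and does not address the stop/restart race after the CPU_ISSET() check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_CPU (-1)

/* Hypothetical stand-ins for the kernel state discussed in the thread. */
static atomic_int       stop_claim = NO_CPU;   /* decides who wins the race */
static _Atomic uint64_t stopping_cpus;         /* CPUs the winner wants stopped */
static atomic_int       stopper_cpu = NO_CPU;  /* published only after the mask */

/*
 * Try to become the stopper. Only the CPU that wins the compare-and-swap
 * writes the mask, and it writes the mask BEFORE publishing stopper_cpu.
 * The release store pairs with the acquire load in must_self_stop(), so a
 * CPU that observes stopper_cpu also observes the complete mask.
 */
bool
claim_stop(int self, uint64_t targets)
{
    int expected = NO_CPU;

    assert(self >= 0 && self < 64);
    if (!atomic_compare_exchange_strong(&stop_claim, &expected, self))
        return false;                           /* lost the race */
    atomic_store_explicit(&stopping_cpus, targets, memory_order_relaxed);
    atomic_store_explicit(&stopper_cpu, self, memory_order_release);
    return true;
}

/*
 * Called by a CPU that lost the race. Returns true when a published mask
 * names this CPU. Returns false when no stopper has been published yet
 * (the caller would retry) or when this CPU is not in the mask.
 */
bool
must_self_stop(int self)
{
    uint64_t mask;

    assert(self >= 0 && self < 64);
    if (atomic_load_explicit(&stopper_cpu, memory_order_acquire) == NO_CPU)
        return false;
    mask = atomic_load_explicit(&stopping_cpus, memory_order_relaxed);
    return (mask & (UINT64_C(1) << self)) != 0;
}
```

With this ordering a losing CPU never tests an empty mask once it has seen a valid stopper_cpu, which is the failure mode described in the second bullet.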