From: Attilio Rao
To: Arnaud Lacombe
Cc: freebsd-stable
Date: Wed, 29 Feb 2012 19:22:40 +0000
Subject: Re: Complete hang on 9.0-RELEASE
List-Id: Production branch of FreeBSD source code

2012/2/29, Arnaud Lacombe :
> Hi,
>
> On Wed, Feb 29, 2012 at 1:44 PM, Attilio Rao wrote:
>> 2012/2/29, Arnaud Lacombe :
>>> Hi,
>>>
>>> On Wed, Feb 29, 2012 at 12:59 PM, Arnaud Lacombe wrote:
>>>> Hi,
>>>>
>>>> On Mon, Feb 27, 2012 at 12:48 PM, Arnaud Lacombe wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Feb 27, 2012 at 10:36 AM, Attilio Rao wrote:
>>>>>> 2012/2/27, Arnaud Lacombe :
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Tue, Feb 14, 2012 at 11:41 AM, Arnaud Lacombe wrote:
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> For the records, I was running some tests yesterday on top of a
>>>>>>>> 9.0-RELEASE, amd64, kernel when the box hanged. At the time of the
>>>>>>>> hang, the box was running a process with about 2800 threads with heavy
>>>>>>>> IPC between 1400 writers and 1400 readers. The box was in single user
>>>>>>>> mode (/bin/sh coming from FreeBSD 7.4-STABLE). Here is the beginning
>>>>>>>> of the dmesg:
>>>>>>>>
>>>>>>> This happened a second time, now with FreeBSD 8.2-RELEASE. Complete
>>>>>>> machine hang. The machine was running about 4000 threads in a single
>>>>>>> process, all the other condition are the same.
>>>>>>
>>>>>> Arnaud,
>>>>>> can you please break in your kernel via KDB, collect the following
>>>>>> informations from the DDB prompt:
>>>>>> - ps
>>>>>> - alltrace
>>>>>> - show allpcpu
>>>>>> - possibly get a coredump with 'call doadump'
>>>>>>
>>>>> Will do, but I'll need to rebuild a kernel to include DDB.
>>>>>
>>>>>> and in the end provide all those along with kernel binary and possibly
>>>>>> sources somewhere?
>>>>>>
>>>>> I'll be testing a bare `release/8.2.0' with the following patch:
>>>>>
>>>>> diff --git a/sys/amd64/conf/GENERIC b/sys/amd64/conf/GENERIC
>>>>> index c3e0095..7bd997f 100644
>>>>> --- a/sys/amd64/conf/GENERIC
>>>>> +++ b/sys/amd64/conf/GENERIC
>>>>> @@ -79,6 +79,10 @@ options INCLUDE_CONFIG_FILE # Include this file in kernel
>>>>>
>>>>> options KDB # Kernel debugger related code
>>>>> options KDB_TRACE # Print a stack trace for a panic
>>>>> +options DDB
>>>>> +options BREAK_TO_DEBUGGER
>>>>> +options ALT_BREAK_TO_DEBUGGER
>>>>>
>>>>> # Make an SMP-capable kernel by default
>>>>> options SMP # Symmetric MultiProcessor Kernel
>>>>>
>>>> ok, it happened again after 2 days, the process was running about 3200
>>>> threads. I'm trying to break into DDB and let you know, I'm not that
>>>> successful for now...
>>>>
>>> No luck. None of BREAK or ALT_BREAK are responding. I will not touch
>>> the system in the next few hours if you want me to test something on
>>> it. In the event of 8.2-RELEASE or 9.0-RELEASE are not meant to work
>>> reliably on top of a 7.4-RELEASE userland, I will re-setup the test to
>>> occurs on a clean 9.0-RELEASE system and re-try.
>>
>> We allow to break KBI when new releases happens, thus this may cause a
>> breakage for you, even if a deadlock is really not something you want.
>>
>> Can you try enabling SW_WATCHDOG, DEADLKRES and possibly arm your ichwd?
>> if the breakage involves clocks or interrupt sources there are still
>> chances they will be able to catch it though.
>>
>> However, it doesn't seem you are setup with a proper serial console?
> The serial console is working definitively fine. I can break into DDB
> at will when the test is running. I did not test with ALT_BREAK
> per-se, but BREAK does work.

So if you try to break into DDB via serial break, it doesn't work?
That is definitely very bad...

Can you try with the options I mentioned earlier and see if something changes?
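
For reference, a minimal sketch of how the options mentioned earlier (SW_WATCHDOG, DEADLKRES, ichwd) might be enabled. The option and device names are the ones named in this thread; the placement on top of GENERIC, the comments, and the use of watchdogd via rc.conf to arm the watchdog are assumptions and have not been tested on the affected system. Check that each option is available on the release under test before relying on it.

```
# Kernel configuration additions (assumed to go alongside the DDB options
# already added to sys/amd64/conf/GENERIC)
options 	SW_WATCHDOG	# software watchdog
options 	DEADLKRES	# deadlock resolver
device		ichwd		# Intel ICH hardware watchdog

# /etc/rc.conf (assumption: watchdogd is what arms and pats the watchdog)
watchdogd_enable="YES"
```

The idea is that a watchdog firing during the hang may still get control to the debugger or force a panic when the serial break does not respond.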

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein