From: Rui Paulo
Date: Tue, 20 Apr 2010 11:56:36 +0100
To: "C. Jayachandran"
Cc: freebsd-mips@freebsd.org
Subject: Re: SMP support for XLR processors.
References: <3BCD65EB-B997-449D-864C-CA24C7B19026@freebsd.org> <6BDB3874-D779-45A6-ABAE-4C331D78A189@lakerest.net> <7BEFA3F5-97AE-477C-9DD3-EF1C4B7DCEB0@freebsd.org>
List-Id: Porting FreeBSD to MIPS

On 20 Apr 2010, at 11:49, C. Jayachandran wrote:

> On Tue, Apr 20, 2010 at 4:03 PM, Rui Paulo wrote:
>> On 20 Apr 2010, at 11:05, Rui Paulo wrote:
>>
>>> On 20 Apr 2010, at 10:52, C. Jayachandran wrote:
>>>
>>>> On Mon, Apr 19, 2010 at 7:27 PM, C. Jayachandran wrote:
>>>>> I have a possible cause for the panic with invariants - we should not
>>>>> schedule the msgring threads unless the smp is completely up.
>>>>> I guess we start getting message ring interrupts before the message ring
>>>>> threads can be scheduled. I am trying out some changes for this -
>>>>> will send you a patch if this fixes it.
>>>>
>>>> I've attached a patch that should fix the issue. The cause was the way
>>>> message ring threads are started on individual cores and the way
>>>> interrupts are enabled in the core. I've moved starting message ring
>>>> threads on other cpus to be a SYSINIT after SMP is started. I'd
>>>> thought originally that it was due to some clash with the changes in
>>>> HEAD - but looks like I was completely off-track there.
>>>>
>>>> Please let me know if you don't get multi-user with 32 cpus with this
>>>> patch. There is still the original hang in buildworld, but that should
>>>> be a bug elsewhere.
>>>>
>>>> I have a copy at http://sites.google.com/site/cjayachandran/files too
>>>
>>> This works perfectly, thanks!
>>
>> On further inspection, I noticed that the load avg is now 7.
>>
>> last pid:  1613;  load averages:  6.99,  6.97,  6.08  up 0+00:30:11  10:32:48
>> 108 processes: 40 running, 24 sleeping, 44 waiting
>> CPU:  0.0% user,  0.0% nice, 21.9% system,  0.0% interrupt, 78.1% idle
>> Mem: 8444K Active, 6028K Inact, 37M Wired, 308K Cache, 6800K Buf, 3190M Free
>> Swap:
>>
>>   PID USERNAME  THR PRI NICE  SIZE  RES STATE  C    TIME     WCPU COMMAND
>>    10 root       32 171 ki31    0G   0G CPU0   0  263:26 2500.00% idle
>>    17 root        1 -16    -    0K   0G CPU12  2    0:00  100.00% msg_intr12
>>    15 root        1 -16    -    0K   0G CPU4   2    0:00  100.00% msg_intr4
>>    16 root        1 -16    -    0K   0G CPU8   2    0:00  100.00% msg_intr8
>>    20 root        1 -16    -    0K   0G CPU24  1    0:00  100.00% msg_intr24
>>    19 root        1 -16    -    0K   0G CPU20  1    0:00  100.00% msg_intr20
>>    21 root        1 -16    -    0K   0G CPU28  1    0:00  100.00% msg_intr28
>>    18 root        1 -16    -    0K   0G CPU16  1    0:00  100.00% msg_intr16
>>
>> What are these msg_intrXX kprocs doing?
>
> They should really be sleeping unless there is a lot of network
> traffic :) The msg_intr threads are interrupt handlers which we run
> one per core, in the first thread of each core. They were modelled
> after interrupt threads (in FreeBSD 6). These should be sleeping until
> there is a message ring interrupt (which tells us that an IO has sent
> data to our core over the message ring).
>
> Thanks for the report - I will look at the sleep logic.

There's almost no network traffic and only one rge (rge1) is connected.
BTW, I'm not using the rge patch you sent yesterday.

Regards,
--
Rui Paulo
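
For anyone reading this thread in the archive: the behaviour described in the quoted reply (a handler that sleeps until a message ring interrupt arrives, rather than sitting at 100% WCPU) can be illustrated with a small userspace sketch. This is not the XLR kernel code or the patch from this thread. It uses POSIX threads as a stand-in, and every name in it (msgring_handler, msgring_thread, msgring_post, msgring_demo) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for one per-core message ring handler. */
struct msgring_handler {
	pthread_mutex_t	lock;
	pthread_cond_t	wakeup;
	int		pending;	/* messages posted but not yet drained */
	int		handled;	/* messages drained so far */
	bool		stop;		/* set to ask the handler to exit */
};

/* Handler loop: block while there is no work instead of spinning. */
static void *
msgring_thread(void *arg)
{
	struct msgring_handler *h = arg;

	pthread_mutex_lock(&h->lock);
	for (;;) {
		/* Sleep until a message is posted or a stop is requested. */
		while (h->pending == 0 && !h->stop)
			pthread_cond_wait(&h->wakeup, &h->lock);
		if (h->pending == 0 && h->stop)
			break;
		/* Drain everything that arrived while we were asleep. */
		h->handled += h->pending;
		h->pending = 0;
	}
	pthread_mutex_unlock(&h->lock);
	return (NULL);
}

/* Stand-in for the interrupt: record one message and wake the handler. */
static void
msgring_post(struct msgring_handler *h)
{
	pthread_mutex_lock(&h->lock);
	h->pending++;
	pthread_cond_signal(&h->wakeup);
	pthread_mutex_unlock(&h->lock);
}

/* Start a handler, post n messages, stop it, return how many it handled. */
static int
msgring_demo(int n)
{
	struct msgring_handler h;
	pthread_t tid;
	int i, rc;

	rc = pthread_mutex_init(&h.lock, NULL);
	assert(rc == 0);
	rc = pthread_cond_init(&h.wakeup, NULL);
	assert(rc == 0);
	h.pending = 0;
	h.handled = 0;
	h.stop = false;

	rc = pthread_create(&tid, NULL, msgring_thread, &h);
	assert(rc == 0);
	for (i = 0; i < n; i++)
		msgring_post(&h);

	/* Request shutdown; the handler drains leftovers before exiting. */
	pthread_mutex_lock(&h.lock);
	h.stop = true;
	pthread_cond_signal(&h.wakeup);
	pthread_mutex_unlock(&h.lock);
	pthread_join(tid, NULL);

	pthread_cond_destroy(&h.wakeup);
	pthread_mutex_destroy(&h.lock);
	return (h.handled);
}
```

Calling msgring_demo(5) from a test program built with -pthread should return 5. The point of the sketch is the inner while loop: the handler re-checks its condition under the lock and blocks in pthread_cond_wait() when nothing is pending, so an idle handler uses no CPU. A handler that skipped that wait and polled instead would show up in top much like the msg_intr entries above, although whether that is the actual cause here is not established in this thread.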