From: John Baldwin
To: Ivan Voras
Cc: mdf@freebsd.org, freebsd-hackers@freebsd.org
Date: Mon, 29 Aug 2011 14:15:32 -0400
Subject: Re: Large machine test ideas
Message-Id: <201108291415.32605.jhb@freebsd.org>

On Monday, August 29, 2011 1:28:37 pm Ivan Voras wrote:
> On 29 August 2011 18:33, wrote:
> > On Mon, Aug 29, 2011 at 7:46 AM, Ivan Voras wrote:
> >> On 26/08/2011 19:44, Garrett Cooper wrote:
> >>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras wrote:
> >>>
> >>> ...
> >>>
> >>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
> >>>> right?
> >>>
> >>> A 9.0-BETA1 snapshot, yes.
> >>
> >> Well, I'll leave it another half an hour but the 9.0-beta1 snapshot
> >> froze on boot after showing a "SRAT: No CPU found for memory domain 4".
> >
> > This message implies the memory affinity information coming from ACPI
> > is either nonsensical, or you have an unexpected physical setup where
> > there really are CPUs with no memory in the local sockets.
> >
> > You should be able to boot with something like hint.srat.0="disabled"
> > at the boot loader prompt.
>
> Unfortunately, neither the memtest nor the srat disabling tunables
> worked (I also tried disabling srat.4).
>
> My time with the machine is over, so I can't do more testing.

The hint to set would be 'hint.srat.0.disabled=1'.  However, the SRAT code
just ignores the table when it encounters an issue like this; it doesn't hang.
Something else later in the boot must have hung.

-- 
John Baldwin