From owner-freebsd-arm@FreeBSD.ORG Fri May 15 22:43:02 2015
Subject: Re: UMA initialization failure with 48 core ARM64
From: Stanislav Sedov
Date: Fri, 15 May 2015 15:42:49 -0700
To: Michał Stanek
Cc: freebsd-current@freebsd.org, freebsd-arm@freebsd.org
Message-Id: <2A6C7643-0C10-4451-B547-9D50EA6809B8@freebsd.org>

> On May 15, 2015, at 11:30 AM, Michał Stanek wrote:
>
> Hi,
>
> I am experiencing an early failure of UMA on an ARM64 platform with 48
> cores enabled. I get a kernel panic during initialization of VM. Here is
> the boot log (lines with 'MST:' are my own debug printfs).
>
> Copyright (c) 1992-2015 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 11.0-CURRENT #333 52fd91e(smp_48)-dirty: Fri May 15 18:26:56 CEST 2015
> mst@arm64-prime:/usr/home/mst/freebsd_v8/obj_kernel/arm64.aarch64/usr/home/mst/freebsd_v8/kernel/sys/THUNDER-88XX arm64
> FreeBSD clang version 3.6.0 (tags/RELEASE_360/final 230434) 20150225
> MST: in vm_mem_init()
> MST: in vmem_init() with param *vm == kernel_arena
> MST: in vmem_xalloc() with param *vm == kernel_arena
> MST: in vmem_xalloc() with param *vm == kmem_arena
> panic: mtx_lock() of spin mutex (null) @ /usr/home/mst/freebsd_v8/kernel/sys/kern/subr_vmem.c:1165
> cpuid = 0
> KDB: enter: panic
> [ thread pid 0 tid 0 ]
> Stopped at 0xffffff80001f4f80:
>
> The kernel boots fine when MAXCPU is set to 30 or lower, but the error
> above always appears when it is set to a higher value.
>
> The panic is triggered by a KASSERT in __mtx_lock_flags() which is called
> with the macro VMEM_LOCK(vm) in vmem_xalloc(). This is line 1143 in
> subr_vmem.c (log shows different line number due to added printfs).
> It looks like the lock belongs to 'kmem_arena' which is uninitialized at
> this point (kmeminit() has not been called yet).
>
> While debugging, I tried modifying VM code as a quick workaround. I
> replaced the number of cores to 1 wherever mp_ncpus, mp_maxid or MAXCPU
> (and others) are read. This, I believe, limits UMA per-cpu caches to just
> one, while the rest of the OS (scheduler, etc) sees all 48 cores.
> In addition, I changed UMA_BOOT_PAGES in sys/vm/uma_int.h to 512 (default
> was 64).
> With these tweaks, I got a successful (but not really stable) boot with 48
> cores. Of course these are dirty hacks and a proper solution is needed.
>
> I am a bit surprised that the kernel fails with MAXCPU==48 as the amd64
> arch has this value set to '256' and I have read posts that other platforms
> with even more cores have worked fine.
> Perhaps I need to tweak some other
> VM parameters, apart from UMA_BOOT_PAGES (AKA vm.boot_pages), but I am not
> sure how.
>
> I included a full stacktrace and a more verbose log (with UMA_DEBUG macros
> enabled) in the attachment. There is also a diff of the hacks I used while
> debugging.
>
>

Hi, Michal!

It looks like the log attachment didn’t make it through the mailing
list. Can you please resend it? The panic suggests that a mutex was
left uninitialized...

-- 
ST4096-RIPE
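
For anyone following the thread, the failure mode suggested above (a lock
being taken before it has been initialized) can be modelled in user space.
This is only a sketch under stated assumptions: demo_lock, demo_lock_init(),
demo_lock_acquire() and demo_lock_release() are hypothetical names invented
for illustration, not FreeBSD's mutex API, and the check merely stands in for
the kind of assertion that fired in the quoted log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a kernel lock (not struct mtx). */
struct demo_lock {
	const char *name;	/* stays NULL until demo_lock_init() runs */
	bool initialized;	/* set once by demo_lock_init() */
	bool held;		/* true while the lock is owned */
};

enum demo_result {
	DEMO_OK,
	DEMO_UNINITIALIZED,	/* lock used before its init routine ran */
	DEMO_ALREADY_HELD
};

/* Give the lock a name and mark it usable. */
void
demo_lock_init(struct demo_lock *l, const char *name)
{
	assert(l != NULL);
	l->name = name;
	l->initialized = true;
	l->held = false;
}

/* Refuse to take a lock that was never initialized instead of using it. */
enum demo_result
demo_lock_acquire(struct demo_lock *l)
{
	if (!l->initialized)
		return (DEMO_UNINITIALIZED);
	if (l->held)
		return (DEMO_ALREADY_HELD);
	l->held = true;
	return (DEMO_OK);
}

/* Drop ownership of the lock. */
void
demo_lock_release(struct demo_lock *l)
{
	l->held = false;
}
```

A zero-filled struct demo_lock has a NULL name and initialized == false, so
demo_lock_acquire() on it reports DEMO_UNINITIALIZED. That is loosely
analogous to the "(null)" lock name in the panic message, which would be
consistent with the arena's lock still being zeroed memory at that point,
although the log alone does not prove that.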