From: Andriy Gapon <avg@FreeBSD.org>
To: FreeBSD Current <current@FreeBSD.org>
Date: Thu, 24 Oct 2013 13:47:31 +0300
Message-ID: <5268FAC3.5070803@FreeBSD.org>
Subject: some experience with a many core machine: event timer, hwpmc

I don't think I have seen observations like the following posted before.
I had some brief contact with a 48-core Opteron system (4 packages).

Observation #1.
The event timer subsystem picked an HPET timer as its source. This
resulted in a lot of inter-core / inter-package traffic to redistribute
timer interrupts. It also caused contention on a lock used internally by
the kern_et code for the single-global-timer case, because many CPUs
tried to grab it concurrently. Additionally, I saw some statistics
artifacts; for example, top(1) reported weird and unstable results.
I believe there should be some logic to prefer per-CPU timers over
global timers as the number of CPUs increases.

Observation #2.
hwpmc was quite unusable on that system. Attempts to use it resulted in
lockups or panics such as "waiting too long on spinlock". It appears
that hwpmc performs some actions on each CPU, driven by timer
interrupts, and that those actions use a single global lock for
arbitration. Contention on that lock seems to make hwpmc unusable.
For what it's worth, this was the case even after I switched the timer
to per-CPU LAPIC timers. HZ was the default 1000, so perhaps
1 ms / 48 (~21 us) was not enough for hwpmc to complete its per-tick,
per-CPU actions before the next tick. The contention appeared to be in
pmclog_reserve() (called from pmclog_process_callchain()).

Some details about the hardware, just in case:

CPU: AMD Opteron(tm) Processor 6172 (2100.07-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f91  Family = 0x10  Model = 0x9  Stepping = 1
  Features=0x178bfbff
  Features2=0x802009
  AMD Features=0xee500800
  AMD Features2=0x837ff
TSC: P-state invariant
FreeBSD/SMP: Multiprocessor System Detected: 48 CPUs
FreeBSD/SMP: 4 package(s) x 12 core(s)

--
Andriy Gapon
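
A note on Observation #1: the event timer in use can be inspected and
overridden through sysctl, so a per-CPU LAPIC timer can be forced by
hand even while the default selection heuristic prefers the HPET. The
commands below are only a sketch; the available timer names and their
priorities depend on the hardware (kern.eventtimer.choice lists them).

  # list the available event timers and their priorities
  sysctl kern.eventtimer.choice
  # show which timer currently drives event timer interrupts
  sysctl kern.eventtimer.timer
  # switch to the per-CPU LAPIC timer at run time
  sysctl kern.eventtimer.timer=LAPIC

The same choice can be made persistent from /boot/loader.conf, assuming
the tunable is honored at boot on the version in question:

  kern.eventtimer.timer="LAPIC"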
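
A note on Observation #2: hwpmc(4) documents a few loader tunables that
size its sampling and logging buffers and limit callchain depth;
together with a lower sampling rate they might reduce the pressure on
pmclog_reserve(). This is only a sketch of knobs that could be worth
experimenting with, not something verified on that machine; the tunable
names come from hwpmc(4) and the values are arbitrary examples.

  # /boot/loader.conf -- illustrative values only
  kern.hwpmc.nbuffers="256"        # number of log buffers
  kern.hwpmc.logbuffersize="16"    # size of each log buffer, in KB
  kern.hwpmc.nsamples="1024"       # PC samples kept per CPU
  kern.hwpmc.callchaindepth="8"    # capture shallower callchains

When sampling, a larger count between samples also means fewer
interrupts per CPU, e.g. (the event alias "instructions" is just an
example):

  pmcstat -n 65536 -S instructions -O /tmp/samples.out sleep 10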