Date: Tue, 28 Apr 2015 16:32:29 -0700
From: Adrian Chadd
To: Rui Paulo
Cc: "freebsd-arch@freebsd.org"
Subject: Re: RFT: numa policy branch

On 28 April 2015 at 14:39, Rui Paulo wrote:
> On Apr 26, 2015, at 01:30 PM, Adrian Chadd wrote:
>
> Hi!
>
> Another update:
>
> * updated to recent -HEAD;
> * numactl can now set memory policy and cpuset domain information, so
>   it's easy to say "this runs in memory domain X and cpu domain Y" in
>   one pass with it;
>
> That works, but --mempolicy=first-touch should ignore the --memdomain
> argument (or print an error) if it's present.

Ok.

> * the locality matrix is now available. Here's an example from Scott's
>   2x Haswell v3, with cluster-on-die enabled:
>
> vm.phys_locality:
> 0: 10 21 31 31
> 1: 21 10 31 31
> 2: 31 31 10 21
> 3: 31 31 21 10
>
> And on the Westmere-EX box, with no SLIT table:
>
> vm.phys_locality:
> 0: -1 -1 -1 -1
> 1: -1 -1 -1 -1
> 2: -1 -1 -1 -1
> 3: -1 -1 -1 -1
>
> This worked for us on Ivy Bridge with a SLIT table.

Cool.

> * I've tested it on Westmere-EX (4x socket), Sandy Bridge, Ivy Bridge,
>   Haswell v3, and Haswell v3 with cluster-on-die.
> * I've discovered that our implementation of libgomp (from gcc-4.2) is
>   very old and doesn't include some of the thread control environment
>   variables, grr.
> * .. and that the gcc libgomp code doesn't have FreeBSD thread
>   affinity routines at all, so I added them to gcc-4.8.
>
> I used gcc 4.9.
>
> I'd appreciate any reviews / testing people are able to provide. I'm
> about at the functionality point where I'd like to submit it for
> formal review and try to land it in -HEAD.
>
> There's a bug in the default sysctl policy. You're calling strcat() on
> an uninitialised string, so it produces garbage output. We also hit a
> panic when our application starts allocating many GBs of memory. In
> this case, the memory is split between two sockets and I think it's
> crashing like you described on IRC.

I'll fix the former soon, thanks for pointing that out.
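To illustrate the strcat() issue, a minimal standalone sketch (this is
not the actual sysctl handler code, just the failure mode): strcat()
appends at the first NUL byte it finds, so the destination has to be a
valid empty string first:

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            char buf[128];

            /*
             * BUG: with buf uninitialised, strcat() scans whatever
             * garbage is on the stack for the first NUL byte and
             * appends there, producing garbage output.
             *
             * strcat(buf, "0: 10 21 31 31\n");
             */

            /* FIX: make buf an empty string before appending. */
            buf[0] = '\0';
            strcat(buf, "0: 10 21 31 31\n");
            printf("%s", buf);
            return (0);
    }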
As for the crash - yeah, I reproduced it and sent a patch to alc for
review. It's because vm_page_alloc() doesn't expect calls to vm_phys
to fail a second time around.

Trouble is - the VM thresholds are all global. Failing an allocation in
one domain does cause the pagedaemon for that domain to start up, but
no paging actually occurs: since the thresholds are global, the pager
still thinks there's plenty of memory available, so it doesn't know it
needs to run. There's a pagedaemon per domain, but no per-domain
thresholds or per-domain paging targets.

I don't think we're going to be able to fix that this pass - I'd rather
get this, or something like it, into the kernel so at least
first-touch-rr, fixed-domain-rr and rr work (see the sketch in the
P.S. below). Then, yes, the VM will need some updating.

-adrian
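P.S. For anyone poking at the first-touch policies, a minimal userland
sketch of the idea. Two assumptions, flagged in the comments: that CPU
0 sits in memory domain 0, and that the branch's first-touch policy
places a page in the domain of the CPU that first writes it. This is
illustrative, not code from the branch:

    #include <sys/param.h>
    #include <sys/cpuset.h>

    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
            cpuset_t mask;
            size_t sz = 1UL << 30;      /* 1GB */
            char *p;

            /*
             * Pin the current thread to CPU 0 (assumed here to be in
             * memory domain 0).
             */
            CPU_ZERO(&mask);
            CPU_SET(0, &mask);
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                sizeof(mask), &mask) == -1)
                    return (1);

            /*
             * First touch happens here: under a first-touch policy the
             * pages backing this region should come from domain 0.
             */
            p = malloc(sz);
            if (p == NULL)
                    return (1);
            memset(p, 0, sz);

            free(p);
            return (0);
    }

Under rr or fixed-domain-rr, placement should instead follow the
configured round-robin policy regardless of which CPU touches the
pages first.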