From: Adrian Chadd <adrian.chadd@gmail.com>
To: freebsd-arch@freebsd.org
Date: Sat, 7 Mar 2015 17:35:27 -0800
Subject: A quick dumpster dive through the busdma allocation path..

Hi,

On a lark, I spent today doing a NUMA dumpster dive through the busdma
allocation path. The intent was to tag a dmat with a NUMA domain early on
(say in pci), so devices would inherit a domain tag when they created
tags, and busdma allocations could occur from that domain.

Here's how it looks so far, dirty as it is:

* I've grabbed the vm_phys first-touch allocator stuff from -9 and
  shoehorned it into -head;
* I've added iterators to the vm_phys code, so there's some concept of
  configurable policies;
* and there's an iterator init function that can take a specific domain,
  rather than PCPU_GET(domain) (rough sketch of the interface below).

That works well enough for first-touch and round-robin userland page
allocation. It'd be easy to extend it so each proc/thread had a NUMA
allocation policy, but I haven't done that. I've run memory bandwidth /
math benchmarks and abused pcm.x to check whether the allocations are
working, and the first-touch allocator is indeed doing what's expected.

But I'm much more interested in device allocation in the kernel for
${WORK}, so I wanted to give that a whirl and see what the minimum
amount of work to support it is.
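To make the iterator bit above concrete, the interface is roughly the
shape below. (This is a hand-wavy sketch to show the idea, not the
literal code in my patch; the names are illustrative and the policy
selection is elided.)

/*
 * Sketch of a vm_phys domain iterator: start at an explicitly requested
 * domain (or the current CPU's domain if the caller passes -1), then
 * walk the remaining domains round-robin.
 */
struct vm_domain_iterator {
        int     vdi_domain;     /* domain to start from */
        int     vdi_n;          /* how many domains handed out so far */
};

static void
vm_domain_iterator_init(struct vm_domain_iterator *vi, int domain)
{
        vi->vdi_domain = (domain != -1) ? domain : PCPU_GET(domain);
        vi->vdi_n = 0;
}

/*
 * Hand back the next domain to try; ENOENT once every domain has been
 * offered, so the caller knows to fall back or fail the allocation.
 */
static int
vm_domain_iterator_get(struct vm_domain_iterator *vi, int *domain)
{
        if (vi->vdi_n >= vm_ndomains)
                return (ENOENT);
        *domain = (vi->vdi_domain + vi->vdi_n) % vm_ndomains;
        vi->vdi_n++;
        return (0);
}

Anyway, back to the kernel/device side of things. The changes so far: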
* The vm_phys routines now have _domain() versions that take a domain id,
  or -1 for "system/thread default";
* the kmem_alloc routines now have _domain() versions that take a domain
  id, or -1;
* malloc() has a _domain() version that takes a domain id, or -1;
* busdma for x86 has a 'domain' tag that I'm populating as part of PCI,
  based on bus_get_domain(). That's just a total hack, but hey, it worked
  well enough for testing.

I've plumbed the domain id down through UMA far enough for large page
allocation, and that all worked fine. However, I hit a roadblock here:

t5nex0: alloc_ring: numa domain: 1; alloc len: 65536
bounce_bus_dmamem_alloc: dmat domain: 1
bounce_bus_dmamem_alloc: kmem_alloc_contig_domain
vm_page_alloc_contig_domain: called; domain=1

.. so that's okay. Then vm_page_alloc_contig_domain() calls
vm_reserv_alloc_contig(), and that's returning an existing region, so
vm_page_alloc_contig_domain() never gets to call
vm_phys_alloc_contig_domain() to get the physical memory itself.

That's where I'm stuck. Since there's no domain awareness in the
vm_reserv code, it just hands back an existing region from whatever
domain that particular region happened to come from.

So I'm done with the dive for now. It looks like the VM reservation code
may need to learn about the existence of domains? Or would it be
cleaner/possible to have multiple kmem_objects / kernel_objects, one per
domain?

Thanks,

-adrian
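P.S. For the curious, here's roughly the shape of the busdma plumbing
described above. It's illustrative pseudo-C rather than the literal
diff: bus_get_domain() is the real hook, but the helper names and the
kmem_alloc_contig_domain() signature here are written from memory /
made up for the example.

/* At PCI attach time: figure out which NUMA domain the device lives in. */
static int
pci_dmat_domain(device_t dev)
{
        int domain;

        if (bus_get_domain(dev, &domain) != 0)
                domain = -1;    /* unknown; -1 means "system default" */
        return (domain);
}

/*
 * At allocation time: hand the tag's domain straight down to the contig
 * allocator.  kmem_alloc_contig_domain() is the _domain() flavour
 * mentioned above; it ends up in vm_page_alloc_contig_domain(), which
 * is where vm_reserv_alloc_contig() currently short-circuits things
 * before the domain-aware vm_phys code ever runs.
 */
static int
bounce_dmamem_alloc_sketch(int domain, bus_size_t size, int flags,
    void **vaddr)
{
        *vaddr = (void *)kmem_alloc_contig_domain(domain, size, flags,
            0, BUS_SPACE_MAXADDR, PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
        return (*vaddr == NULL ? ENOMEM : 0);
}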