From owner-freebsd-mips@FreeBSD.ORG Wed Aug 29 05:25:17 2012
From: Alan Cox <alc@rice.edu>
To: "Jayachandran C."
Cc: mips@freebsd.org
Subject: Re: mips pmap patch
Date: Wed, 29 Aug 2012 00:25:14 -0500
Message-ID: <503DA7BA.3030102@rice.edu>
References: <50228F5C.1000408@rice.edu> <50269AD4.9050804@rice.edu>
 <5029635A.4050209@rice.edu> <502D2271.6080105@rice.edu>
 <50325DC3.3090201@rice.edu>

On 08/27/2012 10:24, Jayachandran C. wrote:
> On Mon, Aug 20, 2012 at 9:24 PM, Alan Cox wrote:
>> On 08/20/2012 05:36, Jayachandran C. wrote:
>>> On Thu, Aug 16, 2012 at 10:10 PM, Alan Cox wrote:
>>>> On 08/15/2012 17:21, Jayachandran C. wrote:
>>>>> On Tue, Aug 14, 2012 at 1:58 AM, Alan Cox wrote:
>>>>>> On 08/13/2012 11:37, Jayachandran C. wrote:
>>> [...]
>>>>>>> I could not test for more than an hour on 32-bit due to another
>>>>>>> problem (freelist 1, containing the direct-mapped pages, runs out
>>>>>>> of pages after about an hour of compile testing). This issue has
>>>>>>> been there for a long time; I am planning to look at it when I
>>>>>>> get a chance.
>>>>>>
>>>>>> What exactly happens? panic? deadlock?
>>>>>
>>>>> The build slows down to a crawl and hangs when it runs out of pages
>>>>> in the freelist.
>>>>
>>>> I'd like to see the output of "sysctl vm.phys_segs" and "sysctl
>>>> vm.phys_free" from this machine. Even better would be running
>>>> "sysctl vm.phys_free" every 60 seconds during the buildworld.
>>>> Finally, I'd like to know whether or not either "ps" or "top" shows
>>>> any threads blocked on the "swwrt" wait channel once things slow to
>>>> a crawl.
>>>
>>> I spent some time looking at this issue. I use a very large kernel
>>> image with a built-in root filesystem, and this takes about 120 MB
>>> out of the direct mapped area.
>>> The remaining pages (~64 MB) are not enough for the build process.
>>> If I increase free memory in this area, either by reducing the
>>> rootfs size or by adding a few more memory segments to this area,
>>> the build goes through fine.
>>
>> I'm still curious to see what "sysctl vm.phys_segs" says. It sounds
>> like roughly half of the direct map region is going to DRAM and half
>> to memory-mapped I/O devices. Is that correct?
>
> Yes, about half of the direct mapped region in 32-bit is taken by
> flash, PCIe and other memory-mapped IO. I also made the problem even
> worse by not reclaiming some bootloader areas in the direct mapped
> region, which reduced the available direct mapped memory.
>
> Here's the output of the sysctls:
>
> root@testboard:/root # sysctl vm.phys_segs
> vm.phys_segs:
> SEGMENT 0:
>
> start:     0x887e000
> end:       0xc000000
> domain:    0
> free list: 0x887a407c
>
> SEGMENT 1:
>
> start:     0x1d000000
> end:       0x1fc00000
> domain:    0
> free list: 0x887a407c
>
> SEGMENT 2:
>
> start:     0x20000000
> end:       0xbc0b3000
> domain:    0
> free list: 0x887a3f38
>
> SEGMENT 3:
>
> start:     0xe0000000
> end:       0xfffff000
> domain:    0
> free list: 0x887a3f38
>
> root@testboard:/root # sysctl vm.phys_free
> vm.phys_free:
> FREE LIST 0:
>
> ORDER (SIZE) |             NUMBER
>              | POOL 0 | POOL 1 | POOL 2
> -- -- -- -- -- -- -- --
>  8 ( 1024K)  |   2877 |      0 |      0
>  7 (  512K)  |      0 |      1 |      0
>  6 (  256K)  |      1 |      0 |      0
>  5 (  128K)  |      0 |      1 |      0
>  4 (   64K)  |      0 |      1 |      0
>  3 (   32K)  |      0 |      1 |      0
>  2 (   16K)  |      0 |      1 |      0
>  1 (    8K)  |      0 |      0 |      0
>  0 (    4K)  |      0 |      0 |      0
>
> FREE LIST 1:
>
> ORDER (SIZE) |             NUMBER
>              | POOL 0 | POOL 1 | POOL 2
> -- -- -- -- -- -- -- --
>  8 ( 1024K)  |     66 |      0 |      0
>  7 (  512K)  |      1 |      1 |      0
>  6 (  256K)  |      0 |      0 |      0
>  5 (  128K)  |      0 |      0 |      0
>  4 (   64K)  |      0 |      1 |      0
>  3 (   32K)  |      0 |      0 |      0
>  2 (   16K)  |      0 |      0 |      0
>  1 (    8K)  |      1 |      1 |      0
>  0 (    4K)  |      0 |      1 |      0
>
>>> I also found that when the build slows down, most of the pages taken
>>> from freelist 1 are allocated by the UMA subsystem, which seems to
>>> keep quite a few pages allocated.
>>
>> At worst, it may be necessary to disable the use of uma_small_alloc()
>> for this machine configuration. At best, uma_small_alloc() could be
>> revised to opportunistically use pages in the direct map region, but
>> have the ability to fall back to pages that have to be mapped.
>
> I think this probably is not a bug, but a configuration problem (we
> cannot have such a huge built-in root filesystem when the direct
> mapped area is this small). Anyway, I have checked in code to recover
> more areas from the bootloader, and this mostly solves the issue for
> me. The above output was taken before the check-in.

I'm afraid that exhaustion of freelist 1 is still highly likely to occur
under some workloads that require the allocation of a lot of small
objects in the kernel's heap.

Alan
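P.S. To make the uma_small_alloc() suggestion above concrete, here is a
rough, untested sketch of what the opportunistic version might look
like on mips. The direct-map path mirrors what the current code does;
the kmem_malloc()/UMA_SLAB_KMEM fallback is just one possible shape for
the mapped case, and a real patch would also have to teach
uma_small_free() to look at the slab flags before freeing.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>

#include <vm/vm.h>
#include <vm/vm_extern.h>
#include <vm/vm_kern.h>
#include <vm/vm_page.h>
#include <vm/uma.h>

#include <machine/cpuregs.h>
#include <machine/vmparam.h>

void *
uma_small_alloc(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
{
	vm_page_t m;
	void *va;
	int pflags;

	pflags = VM_ALLOC_WIRED;
	if ((wait & M_NOWAIT) != 0)
		pflags |= VM_ALLOC_INTERRUPT;
	else
		pflags |= VM_ALLOC_SYSTEM;

	/*
	 * Preferred case: a page from freelist 1, i.e., physical
	 * memory that the direct map (KSEG0 on 32-bit) can reach
	 * without creating a mapping.
	 */
	m = vm_page_alloc_freelist(VM_FREELIST_DIRECT, pflags);
	if (m != NULL) {
		*flags = UMA_SLAB_PRIV;
		va = (void *)MIPS_PHYS_TO_DIRECT(VM_PAGE_TO_PHYS(m));
		if ((wait & M_ZERO) != 0 && (m->flags & PG_ZERO) == 0)
			bzero(va, PAGE_SIZE);
		return (va);
	}

	/*
	 * Fallback: take a TLB-mapped page from kmem_map instead of
	 * failing or sleeping on the exhausted direct-mapped
	 * freelist.  UMA_SLAB_KMEM records where the memory came
	 * from, so a matching change in uma_small_free() can release
	 * it with kmem_free() rather than treating it as a
	 * direct-mapped page.
	 */
	*flags = UMA_SLAB_KMEM;
	return ((void *)kmem_malloc(kmem_map, PAGE_SIZE, wait));
}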