From owner-freebsd-bugs@FreeBSD.ORG Thu Aug 5 14:20:12 2010 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 86E9B106567B for ; Thu, 5 Aug 2010 14:20:12 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 47B118FC21 for ; Thu, 5 Aug 2010 14:20:12 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o75EKC0X065276 for ; Thu, 5 Aug 2010 14:20:12 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o75EKCgv065268; Thu, 5 Aug 2010 14:20:12 GMT (envelope-from gnats) Resent-Date: Thu, 5 Aug 2010 14:20:12 GMT Resent-Message-Id: <201008051420.o75EKCgv065268@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Robert Blayzor Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E42E0106567C for ; Thu, 5 Aug 2010 14:15:08 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id D0BFA8FC1F for ; Thu, 5 Aug 2010 14:15:08 +0000 (UTC) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o75EF8ah084825 for ; Thu, 5 Aug 2010 14:15:08 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.14.3/8.14.3/Submit) id o75EF8RC084824; Thu, 5 Aug 2010 14:15:08 GMT (envelope-from nobody) Message-Id: <201008051415.o75EF8RC084824@www.freebsd.org> Date: Thu, 5 Aug 2010 14:15:08 GMT From: Robert Blayzor To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-3.1 Cc: Subject: kern/149323: Applications with large memory footprint failing with mmap - memory allocation errors X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Aug 2010 14:20:12 -0000 >Number: 149323 >Category: kern >Synopsis: Applications with large memory footprint failing with mmap - memory allocation errors >Confidential: no >Severity: serious >Priority: high >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Thu Aug 05 14:20:11 UTC 2010 >Closed-Date: >Last-Modified: >Originator: Robert Blayzor >Release: 8.1-RELEASE >Organization: INOC, LLC >Environment: FreeBSD dcc0.albyny.inoc.net 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Thu Aug 5 12:49:45 UTC 2010 root@dcc0.albyny.inoc.net:/usr/obj/mnt/pub/FreeBSD/8.1-RELEASE/sys/PE1750 i386 >Description: System previously ran 7.3-RELEASE with no issues. We are seeing a problem with an application which uses are rather large memory footprint. The binary is "dccd" from the port /usr/ports/mail/dcc-dccd. Typically the binary would run ata bout 1800-1900MB of RAM under 7.3, with no problems. This is the only application which runs on this server other than an ntp daemon. After upgrading to FreeBSD 8.1-RELEASE we started noticing that the dccd daemon would start failing with memory allocation errors after the process grew to about 850-900MB of RAM. There are no FreeBSD kernel or userland errors, but dccd was logging in massive numbers to syslog: Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x49b72000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x49b72000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x48c84000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x48c84000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x47d96000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x47d96000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x46ea8000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x46ea8000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x45fba000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x45fba000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x450cc000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x450cc000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x441de000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x441de000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x432f0000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x432f0000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x42402000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x42402000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x41514000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x41514000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x40626000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x40626000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x3e84a000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x3e84a000) ok Aug 4 14:18:16 dcc0 dccd[742]: try #1 mmap(dcc_db 0xeee000,0x3,0x801,11,0x3d95c000): Cannot allocate memory Aug 4 14:18:16 dcc0 dccd[742]: try #2 mmap(dcc_db,0xeee000,0x3d95c000) ok We first noticed this problem after immediately booting into the 8.1 GENERIC kernel. We tried compiling our custom kernel (which included PAE) and also rebuilt all ports against the new 8.1 libraries. The result was the same. When the above happens, we can see the following in top: last pid: 13710; load averages: 0.00, 0.00, 0.00 up 0+06:54:40 23:02:38 17 processes: 1 running, 16 sleeping CPU: 0.0% user, 0.0% nice, 0.2% system, 0.0% interrupt, 99.8% idle Mem: 867M Active, 54M Inact, 153M Wired, 164K Cache, 112M Buf, 2685M Free Swap: 3915M Total, 3915M Free PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 12179 dcc 1 44 0 885M 886M select 0 2:35 0.00% dccd >How-To-Repeat: Unsure at this time. We can repeat the problem by simply running our dccd daemon and letting it build it's database up in memory until it hits the breaking point. We did not experience this problem under 7.3. Some of our system configurations: [dcc0:~] cat /boot/loader.conf kern.maxdsiz="2048M" [dcc0:~] cat /etc/sysctl.conf kern.maxfiles=131072 kern.maxfilesperproc=32768 # kern.ipc.nmbclusters=65536 kern.ipc.maxsockbuf=4194304 kern.ipc.somaxconn=2048 kern.ipc.maxsockets=65536 net.inet.udp.recvspace=131072 net.inet.udp.maxdgram=16384 net.inet.tcp.msl=7500 # net.inet.icmp.log_redirect=1 net.inet.icmp.drop_redirect=1 net.inet.tcp.delayed_ack=0 net.inet.ip.redirect=0 net.inet6.ip6.redirect=0 net.link.ether.inet.log_arp_wrong_iface=0 kern.sugid_coredump=1 net.inet.tcp.keepidle=120000 net.inet.tcp.keepintvl=10000 Custom kernel parts: cpu I486_CPU cpu I586_CPU cpu I686_CPU ident PE1750 options PAE makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols makeoptions NO_MODULES=yes options IPFIREWALL options IPFIREWALL_VERBOSE options IPFIREWALL_VERBOSE_LIMIT=100 device bge # Broadcom BCM570xx Gigabit Ethernet device scbus # SCSI bus (required for SCSI) #device ch # SCSI media changers device da # Direct Access (disks) #device sa # Sequential Access (tape etc) device pass # Passthrough device (direct SCSI access) device ses # SCSI Environmental Services (and SAF-TE) # RAID controllers interfaced to the SCSI subsystem device amr # AMI MegaRAID dmesg output: Copyright (c) 1992-2010 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 8.1-RELEASE #0: Thu Aug 5 12:49:45 UTC 2010 root@dcc0.albyny.inoc.net:/usr/obj/mnt/pub/FreeBSD/8.1-RELEASE/sys/PE1750 i386 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2781.67-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0xf29 Family = f Model = 2 Stepping = 9 Features=0xbfebfbff Features2=0x4400 real memory = 4294967296 (4096 MB) avail memory = 3936641024 (3754 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs FreeBSD/SMP: 2 package(s) x 1 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 6 ioapic0: Changing APIC ID to 8 ioapic1: Changing APIC ID to 9 ioapic2: Changing APIC ID to 10 MADT: Forcing active-low polarity and level trigger for SCI ioapic0 irqs 0-15 on motherboard ioapic1 irqs 16-31 on motherboard ioapic2 irqs 32-47 on motherboard kbd1 at kbdmux0 acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 cpu0: on acpi0 cpu1: on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 vgapci0: port 0xec00-0xecff mem 0xfd000000-0xfdffffff,0xfe101000-0xfe101fff at device 14.0 on pci0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x8b0-0x8bf at device 15.1 on pci0 ata0: on atapci0 ata0: [ITHREAD] ata1: on atapci0 ata1: [ITHREAD] ohci0: mem 0xfe100000-0xfe100fff irq 11 at device 15.2 on pci0 ohci0: [ITHREAD] usbus0: on ohci0 isab0: at device 15.3 on pci0 isa0: on isab0 pcib1: on acpi0 pci4: on pcib1 amr0: mem 0xf0000000-0xf7ffffff,0xfcd00000-0xfcd3ffff irq 18 at device 3.0 on pci4 amr0: Using 64-bit DMA amr0: [ITHREAD] amr0: delete logical drives supported by controller amr0: Firmware 412W, BIOS H406, 128MB RAM pcib2: on acpi0 pci3: on pcib2 pcib3: on acpi0 pci2: on pcib3 bge0: mem 0xfcf30000-0xfcf3ffff,0xfcf20000-0xfcf2ffff irq 16 at device 0.0 on pci2 miibus0: on bge0 brgphy0: PHY 1 on miibus0 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge0: Ethernet address: 00:0f:1f:66:a8:03 bge0: [ITHREAD] bge1: mem 0xfcf10000-0xfcf1ffff,0xfcf00000-0xfcf0ffff irq 17 at device 0.1 on pci2 miibus1: on bge1 brgphy1: PHY 1 on miibus1 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge1: Ethernet address: 00:0f:1f:66:a8:04 bge1: [ITHREAD] pcib4: on acpi0 pci1: on pcib4 fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 fdc0: [FILTER] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart0: [FILTER] atrtc0: port 0x70-0x7f irq 8 on acpi0 orm0: at iomem 0xc0000-0xc7fff,0xec000-0xeffff pnpid ORM0000 on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 atkbdc0: at port 0x60,0x64 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] p4tcc0: on cpu0 p4tcc1: on cpu1 Timecounters tick every 1.000 msec ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to deny, logging disabled usbus0: 12Mbps Full Speed USB v1.0 ugen0.1: <(0x1166)> at usbus0 uhub0: <(0x1166) OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0 acd0: CDROM at ata1-master UDMA33 amr0: delete logical drives supported by controller amrd0: on amr0 amrd0: 17280MB (35389440 sectors) RAID 1 (optimal) SMP: AP CPU #1 Launched! uhub0: 4 ports with 4 removable, self powered Trying to mount root from ufs:/dev/amrd0s1a >Fix: The only workaround we've been able to find was recommended by the DCCD author Vernon Schryver to set the built time option in the port: --with-max-db-mem=800 That limited the database memory footprint under the breaking point. One thing we did try, but did not work was upping the vm.max_proc_mmap value in sysctl. vm.max_proc_mmap=65535 Which had no effect at all. >Release-Note: >Audit-Trail: >Unformatted: