From owner-freebsd-stable@FreeBSD.ORG Mon Nov 24 14:59:40 2008
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 698DB1065673
	for ; Mon, 24 Nov 2008 14:59:40 +0000 (UTC)
	(envelope-from michael.grant@gmail.com)
Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.159])
	by mx1.freebsd.org (Postfix) with ESMTP id 923D98FC12
	for ; Mon, 24 Nov 2008 14:59:39 +0000 (UTC)
	(envelope-from michael.grant@gmail.com)
Received: by fg-out-1718.google.com with SMTP id l26so1562681fgb.35
	for ; Mon, 24 Nov 2008 06:59:38 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:sender
	 :to:subject:in-reply-to:mime-version:content-type
	 :content-transfer-encoding:content-disposition:references
	 :x-google-sender-auth;
	bh=T+yB2IgYMvVs7GmQSOh90vXLPgpaNI7ALK3dcRlsS0s=;
	b=Ut3r6d34FGNUGqY6UNCoBwemLmHKZGGOpD4Snk+hMTnlgEp7H0Jsl/UXeUnop489kC
	 y9TIm5MfVACzVTARKtudqphUI6FFFz14+VECI3b8snNZeTGOQTRJToXWzRPHV2gM3hvb
	 uhcXs3eWZv+nBYbeolW46xYcglsH/i4pkYPZE=
DomainKey-Signature: a=rsa-sha1; c=nofws;
	d=gmail.com; s=gamma;
	h=message-id:date:from:sender:to:subject:in-reply-to:mime-version
	 :content-type:content-transfer-encoding:content-disposition
	 :references:x-google-sender-auth;
	b=Zfy7hg+K1sZqyBwV9sAs+OQ+W7ZAA8kdZV3c5Mwy3qFn/chB6itTyak6sDg7MD1Gg4
	 XZrg0HgYWKCtLbcqScoTjf7z8CgAGvs4CBs9DcthrimnECMAU668Y6GyB93NZocicrpg
	 IzJjdh4ertRJvNxLrm9tbDOLNx407jRRUVmCo=
Received: by 10.181.20.13 with SMTP id x13mr1151800bki.164.1227538778234;
	Mon, 24 Nov 2008 06:59:38 -0800 (PST)
Received: by 10.181.30.1 with HTTP; Mon, 24 Nov 2008 06:59:38 -0800 (PST)
Message-ID: <62b856460811240659v4e8a8dfx601e5bc9a4e69c7e@mail.gmail.com>
Date: Mon, 24 Nov 2008 09:59:38 -0500
From: "Michael Grant"
Sender: michael.grant@gmail.com
To: "FreeBSD Stable List"
In-Reply-To: <20080911105631.GB25493@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080716203900.5jt4qce17gg0og0o@mail.basicnets.co.uk>
	<62b856460807241309k3cea60dbh24eea677cd6751f7@mail.gmail.com>
	<4888E207.4020606@FreeBSD.org>
	<62b856460809110138o5fb10171h9832ac8b964fa3f6@mail.gmail.com>
	<20080911092047.GA24499@icarus.home.lan>
	<62b856460809110308sa44f057mc08189a97efa9d0c@mail.gmail.com>
	<20080911105631.GB25493@icarus.home.lan>
X-Google-Sender-Auth: ef6d94e920acb365
Subject: Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 24 Nov 2008 14:59:40 -0000

On Thu, Sep 11, 2008 at 11:56 AM, Jeremy Chadwick wrote:
> On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
>> On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick wrote:
>> > On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
>> >> My box crashed again:
>> >>
>> >> panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
>> >> cpuid = 0
>> >> Uptime: 33d11h12m58s
>> >> Dumping 3327 MB (2 chunks)
>> >>   chunk 0: 1MB (151 pages) ... ok
>> >>   chunk 1: 3327MB (851568 pages) <---hung here
>> >>
>> >> Still no valid dump.
>> >>
>> >> There is 4gig of physical memory in the machine.
>> >>
>> >> In /boot/loader.conf, I currently have the following:
>> >>
>> >> vm.kmem_size=1G
>> >> vm.kmem_size_max=1G
>> >> vm.kmem_size_scale=2
>> >>
>> >> and in my kernel conf file I have:
>> >>
>> >> options KVA_PAGES=512
>> >>
>> >> It stayed up for 33 days this time. Is there anything else I can do?
>> >
>> > First and foremost: are you using ZFS on this machine? If so, there are
>> > many tunables you can apply to try and limit this; I'm willing to bet
>> > it's ARC which is doing it. See below.
>> >
>> > In general, it appears that you need to increase the maximum range of
>> > kmem. The kernel attempted to utilise more than 1GB, and your limit is
>> > 1G. My machines running RELENG_7 on amd64, with only 2GB of RAM
>> > installed, use the following tunables in loader.conf:
>> >
>> > vm.kmem_size="1536M"
>> > vm.kmem_size_max="1536M"
>> >
>> > If ZFS is in use, I recommend these as well:
>> >
>> > vfs.zfs.arc_min="16M"
>> > vfs.zfs.arc_max="64M"
>> > vfs.zfs.prefetch_disable="1"
>> >
>> > Do not increase kmem_size any larger than 1.5GB; the amount of RAM you
>> > have in the machine, with regards to RELENG_7, will not help. This is a
>> > known limitation which has been fixed in HEAD/CURRENT (where the limit
>> > has been increased to 512GB). See the "Kernel" section below; you'll
>> > see the applicable item.
>> >
>> > http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
>> >
>> > Your only solution may be to run HEAD/CURRENT.
>>
>> I am not running ZFS. My file systems are ufs.
>>
>> This feels like some sort of memory leak in the kernel. Giving it
>> more and more memory just seems to delay the crash. Are you saying
>> the crash is fixed in HEAD/CURRENT?
>
> It's an intentional crash, not "the program tried to access NULL, which
> crashed the machine" crash. The kernel wants more memory to accomplish
> a certain thing, and it's not available. kris@ can explain this in
> better terms than I can.
>
> First and foremost, it would be good to find out what all you are
> running on this machine (process-wise). A process could be tickling
> something in the kernel which requires a large amount of memory to be
> required. I can imagine something like MySQL would require this.
>
> Ideally what needs to happen is to debug the kernel or get a full map
> of kmem to find out what's using what. I believe vmstat -m or vmstat -z
> output might help.
>
> Obviously since the machine panics, you won't be able to run those
> commands after the fact. I would recommend you set up a cronjob that
> runs every 1-2 minutes and logs the output of both of those commands
> to a file. When the panic happens, restart the system and look at
> the logfile to see if you can figure out if anything suddenly starts
> taking up a large amount of memory, or if it's a gradual thing
> (indicating a memory leak).
>
> If you can figure out what might be tickling the problem, you can
> ultimately figure out if increasing kmem is the right thing to do, or if
> there's a greater problem here.
>
>> I'm running 6.3 by the way.
>>
>> I have put your changes into my loader.conf, we'll see how long it
>> goes this time. I'm not quite in position to update everything to 7.x
>> at the moment.
>
> Our production webservers run RELENG_6 and RELENG_7, and we don't
> encounter this kind of problem. I'm not saying what you're experiencing
> is indicative of hardware issues or something like that -- I'm simply
> saying I have loaded systems which don't ever hit that condition. So
> figuring out what's causing it in your case would be good.
>

This appears to be too high as the machine reboots immediately after the fsck:

>> > vm.kmem_size="1536M"
>> > vm.kmem_size_max="1536M"

Returning it to 1G, it panics again about a month later.
Here's vmstat -m and -z roughly 1 minute before it crashed (I was
logging to a file every minute via cron):

Fri Nov 21 15:15:00 EST 2008

Type                   InUse   MemUse HighUse   Requests  Size(s)
pfs_vncache                2       1K       -     864205  32
GEOM                     168      24K       -     416279  16,32,64,128,256,512,1024,2048,4096
isadev                    17       2K       -         17  64
CAM periph                 1       1K       -          1  128
cdev                      26       4K       -         26  128
CAM queue                  3       1K       -          3  16
file desc                739     474K       -  284943537  16,32,64,256,512,1024,2048,4096
sigio                      3       1K       -       4802  32
kenv                     116       8K       -        118  16,32,64,4096
kqueue                   246     154K       -   17652506  256,1024
proc-args                153      10K       -  107101480  16,32,64,128,256
zombie                     0       0K       -   99871925  128
ithread                  147      15K       -        147  16,64,128
KTRACE                   100      13K       -     265722  16,32,64,128,256,512,1024,2048,4096
linker                   178     453K       -        475  16,32,256,512,1024,2048,4096
lockf                     18       2K       - 7774966702  64
devbuf                   594    1779K       -        598  16,32,64,128,256,512,1024,2048,4096
temp                 3170780  795024K       -  684086094  16,32,64,128,256,512,1024,2048,4096
ip6opt                     1       1K       -          1  128
ip6ndp                     7       1K       -          8  64,128
module                   403      26K       -        403  64,128
mtx_pool                   1       8K       -          1
CAM dev queue              1       1K       -          1  64
pgrp                      90       6K       -     785669  64
session                   65       9K       -     681185  128
proc                       2       8K       -          2  4096
subproc                 1307    1576K       -   99873232  256,4096
cred                     268      34K       - 1054173599  128
ata_generic                9       9K       -          9  1024
plimit                    44      11K       -    5647664  256
uidinfo                   29       2K       -     384426  32,1024
sysctl                     0       0K       -    2200402  16,32,64
sysctloid               3411     104K       -       3411  16,32,64
sysctltmp                  0       0K       -    2662228  16,32,128
umtx                    1750     110K       -       3360  64
SWAP                       2    2189K       -          2  64
bus                     1090      46K       -       7017  16,32,64,128,1024
bus-sc                    79      28K       -       3015  16,32,64,128,256,512,1024,2048,4096
devstat                   12      25K       -         12  16,4096
eventhandler              51       3K       -         51  32,128
CAM SIM                    1       1K       -          1  64
kobj                     257     514K       -        315  2048
CAM XPT                   10       1K       -         17  16,64,512
ad_driver                  8       1K       -          8  32
ata_dma                   10       2K       -         10  128
rman                     193      13K       -        707  16,64
sbuf                       0       0K       -    5350749  16,32,64,128,256,512,1024,2048,4096
ar_driver                  0       0K       -         34  512,2048
taskqueue                 11       1K       -         11  16,128
Unitno                    18       1K       -  160999938  16,64
ioctlops                   0       0K       -   31916658  16,32,64,128,256,512,1024
iov                        0       0K       -  323400897  16,32,64,128,256,4096
msg                        4      25K       -          4  1024,4096
sem                        4       7K       -          4  512,1024,4096
shm                      124     135K       -      65027  1024
ttys                    2337     328K       -     100279  128,1024
ptys                      21       3K       -         21  128
accf                      35       1K       -      12157  16,32
mbextcnt                  11       1K       -   87164975  16
mbuf_tag                   0       0K       -   17517357  32
soname                    73       9K       -  276614136  16,32,128
pcb                      106       6K       -   18574167  16,32,64,2048
BIO buffer                28      56K       -   11612611  1024,2048
vfscache                   1     512K       -          1
cluster_save buffer        0       0K       -    3154212  32,64
VFS hash                   1     256K       -          1
vnodes                    11       1K       -        669  16,128
mount                    171       5K       -       7997  16,32,64,128,2048
vnodemarker                0       0K       -    2210275  512
BPF                        6       1K       -       3103  16,64,128,256
ifnet                      7       7K       -          8  256,1024
ifaddr                    86      19K       -        105  16,32,64,128,256,512,2048
ether_multi               22       1K       -         26  16,32,64
clone                      6      24K       -          6  4096
arpcom                     3       1K       -          3  16
lo                         1       1K       -          1  16
acd_driver                 1       2K       -          1  2048
ppbusdev                   3       1K       -          3  128
routetbl                 212      41K       -      16997  16,32,64,128,256
in_multi                   4       1K       -          5  32
IpFw/IpAcct                1       1K       -          1  64
ip_moptions                1       1K       -          1  128
hostcache                  1      24K       -          1
syncache                   1       8K       -          1
in6_multi                 16       1K       -         16  16,32,64
NFS req                    0       0K       -  250799856  128
NFSV3 diroff               0       0K       -     183024  512
NFS daemon                 1       8K       -          1
p1003.1b                   1       1K       -          1  16
pagedep                    1      64K       -          1
inodedep                   1     256K       -          1
newblk                     1       1K       -          1  256
UFS dirhash              770     175K       -   10023288  16,32,64,128,256,512,1024,2048,4096
UFS mount                 12     245K       -         15  256,2048
UMAHash                    9      42K       -         46  256,512,1024,2048,4096
entropy                 1024      64K       -       1024  64
USB                       49       5K       -         49  16,32,64,128,256
USBdev                     4       1K       -         13  16,128,512
VM pgdata                  2      65K       -          2  64
DEVFS2                   152       3K       -        203  16
atkbddev                   2       1K       -          2  32
DEVFS3                   494      62K       -        501  128
DEVFS1                   152      38K       -        154  256
DEVFS_RULE                34       8K       -         34  32,256
DEVFS                     38       1K       -         42  16,128
I/O APIC                   4       4K       -          4  1024
memdesc                    1       4K       -          1  4096
nexusdev                   3       1K       -          3  16
pfs_nodes                 20       3K       -         20  128
acpica                  1207      66K       -      26775  16,32,64,128,256,512,1024,2048
acpitask                   0       0K       -          1  32
PCI Link                  16       2K       -         16  32,64,128
acpisem                   22       2K       -         22  64
acpidev                   58       2K       -         58  32
raid3_data                 4       2K       -    2597361  16,32,256,512
NULLFS node              182       3K       - 1548645220  16
NULLFS hash                1       1K       -          1  64
NULLFS mount               5       1K       -          5  16
vlan                       2       1K       -          2  16,64
netgraph_msg               0       0K       -       6464  64,128,256,512,1024
netgraph_node              5       2K       -       2521  256
netgraph_hook             16       2K       -        156  128
netgraph                   1       8K       -         18  512
netgraph_sock              1       1K       -       2453  64
netgraph_path              0       0K       -       6464  16,32
netgraph_iface             1       1K       -          2  64
netgraph_ppp               1       2K       -          2  2048
netgraph_bpf               6       2K       -        144  64,128,256,512
netgraph_ksock             0       0K       -         16  64
netgraph_mppc              0       0K       -         28  1024

ITEM                  SIZE LIMIT USED FREE REQUESTS FAILURES
UMA Kegs:             140, 0, 77, 19, 77, 0
UMA Zones:            480, 0, 77, 3, 77, 0
UMA Slabs:            64, 0, 6484, 1304, 30596118, 0
UMA RCntSlabs:        104, 0, 625, 189, 1205420, 0
UMA Hash:             128, 0, 3, 27, 12, 0
16 Bucket:            76, 0, 39, 111, 186, 0
32 Bucket:            140, 0, 66, 74, 208, 0
64 Bucket:            268, 0, 118, 36, 459, 9
128 Bucket:           524, 0, 10974, 261, 984992, 4546474
VM OBJECT:            132, 0, 42296, 60016, 2315027163, 0
MAP:                  192, 0, 7, 13, 7, 0
KMAP ENTRY:           68, 90104, 160, 7512, 98287339, 0
MAP ENTRY:            68, 0, 36757, 15379, 4327383373, 0
PV ENTRY:             24, 2067410, 626121, 1238434, 52068959685, 0
DP fakepg:            72, 0, 0, 0, 0, 0
mt_zone:              1024, 0, 219, 237, 219, 0
16:                   16, 0, 3875, 1606, 2218944237, 0
32:                   32, 0, 2007, 3643, 157755404, 0
64:                   64, 0, 5655, 1012, 8091390625, 0
128:                  128, 0, 4065, 1245, 1507077079, 0
256:                  256, 0, 3169837, 458, 269064785, 0
512:                  512, 0, 928, 1288, 12048433, 0
1024:                 1024, 0, 2493, 1407, 405766834, 0
2048:                 2048, 0, 512, 788, 103888082, 0
4096:                 4096, 0, 399, 533, 114531797, 0
Files:                72, 0, 1799, 2070, 2326899098, 0
TURNSTILE:            52, 0, 1751, 373, 3361, 0
PROC:                 536, 0, 332, 641, 99872258, 0
THREAD:               384, 0, 1114, 636, 80501077, 0
KSEGRP:               88, 0, 994, 606, 2875793, 0
UPCALL:               44, 0, 72, 630, 3421747, 0
SLEEPQUEUE:           32, 0, 1751, 509, 3361, 0
VMSPACE:              296, 0, 282, 836, 99798265, 0
mbuf_packet:          256, 0, 288, 804, 1623032273, 0
mbuf:                 256, 0, 29, 649, 7723849747, 0
mbuf_cluster:         2048, 25600, 1092, 158, 41217209, 0
mbuf_jumbo_pagesize:  4096, 0, 0, 0, 0, 0
mbuf_jumbo_9k:        9216, 0, 0, 0, 0, 0
mbuf_jumbo_16k:       16384, 0, 0, 0, 0, 0
ACL UMA zone:         388, 0, 0, 0, 0, 0
g_bio:                132, 0, 0, 1218, 4046376667, 2
ata_request:          204, 0, 0, 798, 1167883416, 2
ata_composite:        196, 0, 0, 0, 0, 0
VNODE:                272, 0, 32878, 67432, 4286504460, 0
VNODEPOLL:            76, 0, 2, 248, 39, 0
S VFS Cache:          68, 0, 32722, 65670, 3034960875, 0
L VFS Cache:          291, 0, 629, 2790, 66550626, 0
NAMEI:                1024, 0, 1, 667, 9997159801, 0
DIRHASH:              1024, 0, 1850, 434, 21697253, 0
NFSMOUNT:             480, 0, 1, 7, 2, 0
NFSNODE:              464, 0, 1, 3943, 221540609, 0
PIPE:                 408, 0, 24, 543, 54218876, 0
KNOTE:                68, 0, 4132, 796, 110922846, 0
socket:               356, 12331, 349, 1081, 47659527, 0
ipq:                  32, 904, 0, 904, 2259778, 0
udpcb:                180, 12342, 46, 218, 14346034, 0
inpcb:                180, 12342, 260, 1170, 17672886, 0
tcpcb:                464, 12328, 142, 690, 17672886, 0
tcptw:                48, 2496, 118, 1442, 8139533, 0
syncache:             100, 15366, 2, 622, 12257432, 0
hostcache:            76, 15400, 1220, 1130, 1125859, 0
tcpreass:             20, 1690, 0, 845, 564503, 0
sackhole:             20, 0, 1, 675, 2544305, 0
ripcb:                180, 12342, 1, 153, 637466, 0
unpcb:                144, 12339, 158, 787, 15000640, 0
rtentry:              132, 0, 50, 182, 5723, 0
IPFW dynamic rule:    108, 0, 0, 0, 0, 0
SWAPMETA:             276, 121576, 14525, 22393, 55820649, 0
Mountpoints:          664, 0, 15, 21, 17, 0
FFS inode:            132, 0, 32619, 58383, 2515314975, 0
FFS1 dinode:          128, 0, 0, 0, 0, 0
FFS2 dinode:          256, 0, 32619, 56586, 2515314975, 0
gr3:64k:              65536, 0, 0, 292, 10698518, 178524
gr3:16k:              16384, 0, 0, 348, 53407139, 2817786
gr3:4k:               4096, 0, 0, 284, 53870651, 3399
gr3:64k:              65536, 0, 0, 434, 30935972, 267730
gr3:16k:              16384, 0, 0, 722, 253649141, 26383756
gr3:4k:               4096, 0, 0, 659, 86316934, 4074
NetGraph items:       36, 546, 0, 312, 176587, 0
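
For anyone wanting to go through a log like this one, here is a rough Python sketch that ranks the malloc types in a single vmstat -m snapshot by MemUse. It is only an illustration: the function names (parse_vmstat_m, top_consumers) and the regular expression are my own assumptions about the layout shown above, not part of vmstat, and the sample rows are copied from the snapshot in this message. Run against the full snapshot, it should put "temp" first at 795024K, which lines up with the 256-byte zone in the vmstat -z output (3169837 items in use, about 792459K). That may point at a leak of 256-byte "temp" allocations, though the log alone does not say which code is making them.

```python
import re

# One vmstat -m row as it appears in this log: a type name (which may
# contain spaces), InUse, MemUse in K, a "-" in the HighUse column,
# Requests, and an optional list of allocation sizes.
# The pattern is an assumption based on the output above.
ROW = re.compile(
    r"^\s*(?P<type>.+?)\s+(?P<inuse>\d+)\s+(?P<memuse>\d+)K\s+-\s+"
    r"(?P<requests>\d+)(?:\s+(?P<sizes>[\d,\s]+))?\s*$"
)


def parse_vmstat_m(text):
    """Return (type, inuse, memuse_kb) tuples for every row that parses."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # the header line and blank lines simply do not match
            rows.append((m.group("type"), int(m.group("inuse")),
                         int(m.group("memuse"))))
    return rows


def top_consumers(text, n=3):
    """Return the n malloc types with the largest MemUse, largest first."""
    return sorted(parse_vmstat_m(text), key=lambda r: r[2], reverse=True)[:n]


# A few rows copied from the snapshot in this message.
SAMPLE = """\
Type InUse MemUse HighUse Requests Size(s)
devbuf 594 1779K - 598 16,32,64,128,256,512,1024,2048,4096
temp 3170780 795024K - 684086094 16,32,64,128,256,512,1024,2048,4096
SWAP 2 2189K - 2 64
subproc 1307 1576K - 99873232 256,4096
file desc 739 474K - 284943537 16,32,64,256,512,1024,2048,4096
"""

for name, inuse, kb in top_consumers(SAMPLE):
    print(f"{name:<12} {inuse:>9} in use {kb:>8}K")
```

Running the same ranking over each per-minute snapshot and comparing the top entries over time would show whether a type grows gradually or jumps suddenly before the panic.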