Date: Wed, 10 Jan 2007 16:52:07 -0500
From: "Brad L. Chisholm" <blc@bsdwins.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Kernel hang on 6.x
Message-ID: <20070110215207.GA85834@bsdone.bsdwins.com>
On Thu, 4 Jan 2007 12:53:47 -0500, John Baldwin wrote:
> On Thursday 04 January 2007 10:27, Brian Dean wrote:
> >
> > I believe that I can generate a kernel dump.  We tried this yesterday
> > but didn't have a dump device configured.
>
> If this is 6.x, turn on minidumps via the sysctl.  The dump size normally is
> the size of RAM.  With minidumps it can be a lot smaller.  If you get a dump,
> let me know and I'll point you at some gdb scripts to generate 'ps' type
> output, etc.
>

I work with Brian, and have been helping him analyze this problem.  We have
been able to generate kernel dumps, and have also done some additional
analysis under ddb.  Here is a summary of our analysis so far.  Suggestions
as to how to proceed from here are most welcome.

Summary
-------

There appears to be a VM deadlock, not due to any actual locks or mutexes
being held, but due to an inability to satisfy page faults.  In our
scenario, one huge process is essentially consuming most of the memory on
the machine.  Depending upon the physical memory configured, that process
will consume 1.5-3 times physical memory.  When the machine hangs, however,
there is still plenty of swap space available.

One possibly interesting observation is that the hang always seems to occur
when ~14G of swap has been consumed, regardless of the size of physmem
(6G, 8G, 16G) or amount of swap (32G, 64G) configured.

Dropping to DDB on the serial console and running "show locks",
"show alllocks", and "show lockedvnods" usually indicates that only an sio
lock is held (probably for the serial console).  A "ps" indicates that the
pagedaemon is in VMWait state, while a number of other processes (including
the huge memory process) are in pfault state.

We have been able to generate a full core dump, as well as a couple of
minidumps with differing hw.physmem settings.  We can make these dumps
available, if that would help, or we can provide information from gdb on
request.
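For anyone following along, the dump setup John refers to amounts to
something like the following sketch (the device name /dev/da0s1b is an
example, not the partition from our machine; debug.minidump is the
FreeBSD 6.2+ sysctl):

```shell
# Enable minidumps so the dump covers only in-use pages, not all of RAM
# (assumption: running FreeBSD 6.2 or later, where debug.minidump exists).
sysctl debug.minidump=1

# Point the kernel at a swap partition to receive the dump.
# /dev/da0s1b is a hypothetical device name; substitute your swap device.
dumpon /dev/da0s1b

# Make both settings persistent across reboots:
echo 'debug.minidump=1'       >> /etc/sysctl.conf
echo 'dumpdev="/dev/da0s1b"'  >> /etc/rc.conf

# After a crash, savecore(8) runs at boot and writes the dump to /var/crash.
```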
We can usually reproduce the hang within 1.5-2 hours, and can provide
additional information from ddb, as well.

Details
-------

Here is an example of a 'top' session that was running when the machine
became unresponsive.  Note the one huge 'sdsbuild' process.

last pid:  3331;  load averages:  0.00,  0.27,  0.44   up 0+03:22:40  17:06:05
42 processes:  1 running, 41 sleeping
CPU states:  0.0% user,  0.0% nice,  0.8% system,  0.0% interrupt, 99.2% idle
Mem: 14G Active, 1115M Inact, 535M Wired, 39M Cache, 214M Buf, 21M Free
Swap: 64G Total, 14G Used, 50G Free, 21% Inuse

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 2698 blc         1 114    0 29579M 14635M pfault  30:40  0.00% sdsbuild
 3159 blc         1   5    0  7492K   856K ttyin    0:23  0.00% systat
 3158 root        1  96    0  6984K   744K RUN      0:04  0.00% top
 2096 root        1  96    0  3656K   104K select   0:02  0.00% rlogind
 2699 blc         1  -8    0  2492K   212K piperd   0:01  0.00% tee
 1092 root        1  96    0  2692K   168K select   0:01  0.00% cvd
  773 root        1  96    0  7712K   600K select   0:01  0.00% ntpd
  736 root        1  97    0  3984K     0K select   0:00  0.00% <mountd>
 2525 root        1  96    0  3656K   104K select   0:00  0.00% rlogind
 1986 root        1  96    0  3656K   104K select   0:00  0.00% rlogind
  707 root        1  96    0  9760K   352K select   0:00  0.00% amd
  817 root        1  96    0  9340K   564K select   0:00  0.00% sendmail
  657 root        1  96    0  3644K   304K select   0:00  0.00% syslogd
 2000 root        1  20    0  9796K     0K pause    0:00  0.00% <csh>
  825 root        1 -16    0  3664K   336K vmwait   0:00  0.00% cron
 2527 blc         1  20    0   960K     0K pause    0:00  0.00% <ksh>
  676 root        1  96    0  4940K   352K select   0:00  0.00% rpcbind
 1956 root        1  96    0  6108K     0K select   0:00  0.00% <inetd>
  738 root        1 109    0  2724K     0K select   0:00  0.00% <nfsd>
 1987 root        1   8    0 18408K     0K wait     0:00  0.00% <login>
 2097 root        1   8    0 18408K     0K wait     0:00  0.00% <login>
 2526 root        1   8    0 18408K     0K wait     0:00  0.00% <login>
 2098 blc         1  20    0   960K     0K pause    0:00  0.00% <ksh>
  793 root        1  96    0  2516K   152K select   0:00  0.00% usbd
 1999 blc         1   8    0 18400K     0K wait     0:00  0.00% <su>
 1988 blc         1  20    0   960K     0K pause    0:00  0.00% <ksh>
 2682 blc         1  20    0   964K     0K pause    0:00  0.00% <ksh>
 1982 root        1   5    0  3600K     0K ttyin    0:00  0.00% <getty>
 1974 root        1   5    0  3600K     0K ttyin    0:00  0.00% <getty>
 1981 root        1   5    0  3600K     0K ttyin    0:00  0.00% <getty>
 1978 root        1   5    0  3600K     0K ttyin    0:00  0.00% <getty>
 1977 root        1   5    0  3600K     0K ttyin    0:00  0.00% <getty>
 1976 root        1   5    0  3600K     0K ttyin    0:00  0.00% <getty>

Here is a 'ps' from ddb corresponding to the 'top' session above:

db> ps
  pid   ppid  pgrp   uid  state  wmesg    wchan               cmd
 3332    817   817     0  N                                   sendmail
 3331    825   825     0  N                                   cron
 3159   2527  3159   207  S+     ttyin    0xffffff02574d4c10  systat
 3158   2000  3158     0  SL+    pfault   0xffffffff809326f8  top
 3155      0     0     0  SL     mdwait   0xffffff030c9e7800  [md0]
 2699   2682  2682   207  S+     piperd   0xffffff004e432000  tee
 2698   2682  2682   207  SL+    pfault   0xffffffff809326f8  sdsbuild
 2682   2098  2682   207  SW+    pause    0xffffff03253933c0  ksh
 2527   2526  2527   207  SW+    pause    0xffffff030b906068  ksh
 2526   2525  2526     0  SWs+   wait     0xffffff031af246b0  login
 2525   1956  2525     0  Ss     select   0xffffffff8091fe50  rlogind
 2098   2097  2098   207  SW+    pause    0xffffff030bbc7a70  ksh
 2097   2096  2097     0  SWs+   wait     0xffffff030bbc7358  login
 2096   1956  2096     0  Ss     select   0xffffffff8091fe50  rlogind
 2000   1999  2000     0  SW+    pause    0xffffff030663b718  csh
 1999   1988  1999   207  SW+    wait     0xffffff030663ba08  su
 1988   1987  1988   207  SW+    pause    0xffffff031b2e4718  ksh
 1987   1986  1987     0  SWs+   wait     0xffffff031b2e4a08  login
 1986   1956  1986     0  Ss     select   0xffffffff8091fe50  rlogind
 1982      1  1982     0  ?s+                                 getty
 1981      1  1981     0  SWs+   ttyin    0xffffff03e0c42410  getty
 1980      1  1980     0  SWs+   ttyin    0xffffff03e0c42010  getty
 1979      1  1979     0  SWs+   ttyin    0xffffff03dd03c010  getty
 1978      1  1978     0  SWs+   ttyin    0xffffff03e14b6410  getty
 1977      1  1977     0  SWs+   ttyin    0xffffff03e0921810  getty
 1976      1  1976     0  SWs+   ttyin    0xffffff03e0921410  getty
 1975      1  1975     0  SWs+   ttyin    0xffffff03e0200410  getty
 1974      1  1974     0  SWs+   ttyin    0xffffff03e0200010  getty
 1956      1  1956     0  SWs    select   0xffffffff8091fe50  inetd
 1092      1  1092     0  Ss     select   0xffffffff8091fe50  cvd
  825      1   825     0  SLs    vmwait   0xffffffff809326f8  cron
  817      1   817     0  SLs    vmwait   0xffffffff809326f8  sendmail
  811      1   811     0  SWs    select   0xffffffff8091fe50  sshd
  793      1   793     0  Ss     select   0xffffffff8091fe50  usbd
  773      1   773     0  SLs    pfault   0xffffffff809326f8  ntpd
  745    738   738     0  SW     -        0xffffff03e068bc00  nfsd
  742    738   738     0  SW     -        0xffffff0000f47600  nfsd
  741    738   738     0  SW     -        0xffffff03e068b400  nfsd
  740    738   738     0  SW     -        0xffffff03e068b600  nfsd
  738      1   738     0  SWs    select   0xffffffff8091fe50  nfsd
  736      1   736     0  SWs    select   0xffffffff8091fe50  mountd
  707      1   707     0  SLs    pfault   0xffffffff809326f8  amd
  676      1   676     0  Ss     select   0xffffffff8091fe50  rpcbind
  657      1   657     0  Ss     select   0xffffffff8091fe50  syslogd
  547      1   547     0  SWs    select   0xffffffff8091fe50  devd
   41      0     0     0  SL     -        0xffffffffbf5ddbe4  [schedcpu]
   40      0     0     0  SL     sdflush  0xffffffff80931ae0  [softdepflush]
   39      0     0     0  SL     syncer   0xffffffff8089aa20  [syncer]
   38      0     0     0  SL     vlruwt   0xffffff03dccfc6b0  [vnlru]
   37      0     0     0  SL     psleep   0xffffffff80920718  [bufdaemon]
   36      0     0     0  SL     pgzero   0xffffffff80933460  [pagezero]
   35      0     0     0  SL     psleep   0xffffffff80932b2c  [vmdaemon]
   34      0     0     0  SL     VMWait   0xffffffff80932ae0  [pagedaemon]
   33      0     0     0  WL                                  [swi0: sio]
   32      0     0     0  WL                                  [irq15: ata1]
   31      0     0     0  WL                                  [irq14: ata0]
   30      0     0     0  SL     usbevt   0xffffffff94066420  [usb1]
   29      0     0     0  SL     usbtsk   0xffffffff80895750  [usbtask]
   28      0     0     0  SL     usbevt   0xffffffff94064420  [usb0]
   27      0     0     0  WL                                  [irq19: ohci0 ohci1 ]
   26      0     0     0  SL     idle     0xffffffff940562e0  [mpt_raid0]
   25      0     0     0  SL     idle     0xffffffff94056000  [mpt_recovery0]
   24      0     0     0  WL                                  [irq28: mpt0]
   23      0     0     0  SL     -        0xffffff0000e2d700  [em3 taskq]
   22      0     0     0  SL     -        0xffffff0000e2dd00  [em2 taskq]
   21      0     0     0  SL     -        0xffffff0000e2a200  [em1 taskq]
   20      0     0     0  SL     -        0xffffff0000e36300  [em0 taskq]
   19      0     0     0  WL                                  [irq9: acpi0]
   18      0     0     0  WL                                  [swi6: Giant taskq]
    9      0     0     0  SL     -        0xffffff0000db5000  [thread taskq]
   17      0     0     0  WL                                  [swi5: +]
    8      0     0     0  SL     -        0xffffff0000db5300  [acpi_task_2]
    7      0     0     0  SL     -        0xffffff0000db5300  [acpi_task_1]
    6      0     0     0  SL     -        0xffffff0000db5300  [acpi_task_0]
    5      0     0     0  SL     -        0xffffff0000db5400  [kqueue taskq]
   16      0     0     0  WL                                  [swi2: cambio]
   15      0     0     0  WL                                  [swi6: task queue]
   14      0     0     0  SL     -        0xffffffff80893300  [yarrow]
    4      0     0     0  SL     -        0xffffffff80896468  [g_down]
    3      0     0     0  SL     -        0xffffffff80896460  [g_up]
    2      0     0     0  SL     -        0xffffffff80896450  [g_event]
   13      0     0     0  WL                                  [swi1: net]
   12      0     0     0  WL                                  [swi3: vm]
   11      0     0     0  WL                                  [swi4: clock sio]
   10      0     0     0  RL     CPU 0                        [idle]
    1      0     1     0  SLs    wait     0xffffff03e1566000  [init]
    0      0     0     0  SLs    vmwait   0xffffffff809326f8  [swapper]

Here is where the pagedaemon process appears to be whenever the hang
occurs:

db> tr 34
Tracing pid 34 tid 100027 td 0xffffff03e1557000
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
sleepq_wait() at sleepq_wait+0x2e
msleep() at msleep+0x191
swp_pager_meta_build() at swp_pager_meta_build+0x13b
swap_pager_putpages() at swap_pager_putpages+0x261
vm_pageout_flush() at vm_pageout_flush+0xde
vm_pageout() at vm_pageout+0x1668
fork_exit() at fork_exit+0x87
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffbf56ed00, rbp = 0 ---

Here is the lock info from ddb:

db> show locks
exclusive spin mutex sio r = 0 (0xffffffff8097a100) locked @ /usr/src/sys/dev/sio/sio.c:1390
db> show alllocks
db> show lockedvnods
Locked vnodes
db> show pcpu
cpuid        = 0
curthread    = 0xffffff017f421720: pid 10 "idle"
curpcb       = 0xffffffffb6ce2d10
fpcurthread  = none
idlethread   = 0xffffff017f421720: pid 10 "idle"
spin locks held:
exclusive spin mutex sio r = 0 (0xffffffff8097a100) locked @ /usr/src/sys/dev/sio/sio.c:1390
db> show allpcpu
Current CPU: 0
cpuid        = 0
curthread    = 0xffffff017f421720: pid 10 "idle"
curpcb       = 0xffffffffb6ce2d10
fpcurthread  = none
idlethread   = 0xffffff017f421720: pid 10 "idle"
spin locks held:
exclusive spin mutex sio r = 0 (0xffffffff8097a100) locked @ /usr/src/sys/dev/sio/sio.c:1390

Here is a 'vmstat -s' snapshot taken as the machine hung:

 45794937 cpu context switches
 20220635 device interrupts
   810196 software interrupts
 66195618 traps
  6615004 system calls
       61 kernel threads created
     5940  fork() calls
      560 vfork() calls
        0 rfork() calls
     1851 swap pager pageins
    13980 swap pager pages paged in
   234711 swap pager pageouts
  3670484 swap pager pages paged out
     1088 vnode pager pageins
     5591 vnode pager pages paged in
        0 vnode pager pageouts
        0 vnode pager pages paged out
      744 page daemon wakeups
301408821 pages examined by the page daemon
  3228902 pages reactivated
   233903 copy-on-write faults
      425 copy-on-write optimized faults
 55595560 zero fill pages zeroed
 10623188 zero fill pages prezeroed
     1878 intransit blocking page faults
 66078208 total VM faults taken
        0 pages affected by kernel thread creation
   377526 pages affected by fork()
  1282533 pages affected by vfork()
        0 pages affected by rfork()
 71450207 pages freed
     3805 pages freed by daemon
  7974727 pages freed by exiting processes
  3577239 pages active
   294202 pages inactive
    13662 pages in VM cache
   106557 pages wired down
    10548 pages free
     4096 bytes per page
  3665219 total name lookups
          cache hits (75% pos + 0% neg) system 1% per-directory
          deletions 0%, falsehits 0%, toolong 0%

Here is a snapshot of 'sysctl vm' as the machine hung:

vm.vmtotal:
System wide totals computed every five seconds: (values in kilobytes)
===============================================
Processes:              (RUNQ: 2 Disk Wait: 1 Page Wait: 1 Sleep: 43)
Virtual Memory:         (Total: 35242K, Active 30153728K)
Real Memory:            (Total: 15917288K Active 15540600K)
Shared Virtual Memory:  (Total: 7128K Active: 6576K)
Shared Real Memory:     (Total: 2368K Active: 2368K)
Free Memory Pages:      96860K

vm.loadavg: { 2.67 1.84 1.53 }
vm.v_free_min: 25512
vm.v_free_target: 107384
vm.v_free_reserved: 5336
vm.v_inactive_target: 161076
vm.v_cache_min: 107384
vm.v_cache_max: 214768
vm.v_pageout_free_min: 34
vm.pageout_algorithm: 0
vm.swap_enabled: 1
vm.kmem_size_scale: 3
vm.kmem_size_max: 419430400
vm.kmem_size: 419430400
vm.nswapdev: 1
vm.dmmax: 32
vm.swap_async_max: 4
vm.zone_count: 69
vm.zone:
ITEM            SIZE     LIMIT     USED    FREE  REQUESTS
FFS2 dinode:     256,        0,   38765,  25060,    633124
FFS1 dinode:     128,        0,       0,      0,         0
FFS inode:       192,        0,   38765,  26355,    633124
Mountpoints:     936,        0,     181,     11,       181
SWAPMETA:        288,   116519,  116519,      0,    116551
rtentry:         264,        0,      40,     30,        72
ripcb:           304,    33792,       0,     24,        18
sackhole:         32,        0,       0,      0,         0
tcpreass:         40,     2184,       0,      0,         0
hostcache:       136,    15372,      17,     39,        20
syncache:        128,    15370,       0,     58,        12
tcptw:            80,     6795,       0,    810,      1595
tcpcb:           752,    33795,      34,     21,      1655
inpcb:           304,    33792,      34,    794,      1655
udpcb:           304,    33792,     204,     24,      3543
ipq:              56,     1071,       0,    126,      2604
unpcb:           192,    33800,      12,     28,       577
socket:          616,    33792,     250,     26,      5794
KNOTE:           120,        0,       0,     62,      6251
PIPE:            768,        0,       2,     23,      2416
NFSNODE:         816,        0,    9012,   7898,     65144
NFSMOUNT:        584,        0,     174,     15,       174
DIRHASH:        1024,        0,    1219,    293,      2098
L VFS Cache:     327,        0,     729,    687,     10160
S VFS Cache:     104,        0,   52929,  30231,    714353
NAMEI:          1024,        0,       4,      8,   1395238
VNODEPOLL:       152,        0,       0,      0,         0
VNODE:           648,        0,   47992,  34748,    698512
ata_composit:    376,        0,       0,      0,         0
ata_request:     336,        0,       0,     22,        24
g_bio:           216,        0,       0,   1818,   1418264
ACL UMA zone:    388,        0,       0,      0,         0
mbuf_jumbo_1:  16384,        0,       0,      0,         0
mbuf_jumbo_9:   9216,        0,       0,      0,         0
mbuf_jumbo_p:   4096,        0,       0,      0,         0
mbuf_cluster:   2048,    33792,    1152,     72,      7441
mbuf:            256,        0,    1169,    121,   2197657
mbuf_packet:     256,        0,    1152,    138,   1780924
VMSPACE:         544,        0,      47,     16,      6504
UPCALL:           88,        0,       0,      0,         0
KSEGRP:          136,        0,     112,     18,       112
THREAD:          608,        0,     112,      8,       112
PROC:            856,        0,      88,     24,      6562
Files:           120,        0,     149,     68,    223724
4096:           4096,        0,     351,      8,      9244
2048:           2048,        0,      29,    137,      3815
1024:           1024,        0,     727,     17,   1059555
512:             512,        0,     593,    149,     16584
256:             256,        0,     753,    117,     57787
128:             128,        0,    5084,   1325,    291338
64:               64,        0,    3596,    436,     53093
32:               32,        0,    1895,    226,     27977
16:               16,        0,    3166,    194,    389649
mt_zone:          64,        0,     203,     77,       203
DP fakepg:       120,        0,       0,      0,         0
PV ENTRY:         48,  5678064, 3753306, 116118, 106702024
MAP ENTRY:       112,        0,    1707,    207,    273770
KMAP ENTRY:      112,    98406,      16,    116,     11646
MAP:             352,        0,       7,     15,         7
VM OBJECT:       224,        0,   41676,   8678,    178636
128 Bucket:     1048,        0,     240,    306,      1081
64 Bucket:       536,        0,      38,     32,        73
32 Bucket:       280,        0,      16,     26,        39
16 Bucket:       152,        0,      30,     20,        43
UMA Hash:        256,        0,       5,     10,         8
UMA RCntSlab:    128,        0,     612,     26,       619
UMA Slabs:       128,        0,    4452,   3668,     14083
UMA Zones:       184,        0,      67,     13,        67
UMA Kegs:        240,        0,      67,      8,        67

vm.old_contigmalloc: 0
vm.swap_idle_threshold2: 10
vm.swap_idle_threshold1: 2
vm.exec_map_entries: 16
vm.stats.misc.zero_page_count: 1415
vm.stats.misc.cnt_prezero: 6936472
vm.stats.vm.v_kthreadpages: 0
vm.stats.vm.v_rforkpages: 0
vm.stats.vm.v_vforkpages: 1282533
vm.stats.vm.v_forkpages: 377616
vm.stats.vm.v_kthreads: 61
vm.stats.vm.v_rforks: 0
vm.stats.vm.v_vforks: 560
vm.stats.vm.v_forks: 5941
vm.stats.vm.v_interrupt_free_min: 2
vm.stats.vm.v_pageout_free_min: 34
vm.stats.vm.v_cache_max: 214768
vm.stats.vm.v_cache_min: 107384
vm.stats.vm.v_cache_count: 13662
vm.stats.vm.v_inactive_count: 294202
vm.stats.vm.v_inactive_target: 161076
vm.stats.vm.v_active_count: 3577241
vm.stats.vm.v_wire_count: 106556
vm.stats.vm.v_free_count: 10547
vm.stats.vm.v_free_min: 25512
vm.stats.vm.v_free_target: 107384
vm.stats.vm.v_free_reserved: 5336
vm.stats.vm.v_page_count: 4035617
vm.stats.vm.v_page_size: 4096
vm.stats.vm.v_tfree: 71450323
vm.stats.vm.v_pfree: 7974792
vm.stats.vm.v_dfree: 3805
vm.stats.vm.v_pdpages: 301408821
vm.stats.vm.v_pdwakeups: 744
vm.stats.vm.v_reactivated: 3228902
vm.stats.vm.v_intrans: 1878
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 5591
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 1088
vm.stats.vm.v_swappgsout: 3670484
vm.stats.vm.v_swappgsin: 13980
vm.stats.vm.v_swapout: 234711
vm.stats.vm.v_swapin: 1851
vm.stats.vm.v_ozfod: 10623195
vm.stats.vm.v_zfod: 55595610
vm.stats.vm.v_cow_optim: 425
vm.stats.vm.v_cow_faults: 233938
vm.stats.vm.v_vm_faults: 66078331
vm.stats.sys.v_soft: 810461
vm.stats.sys.v_intr: 20220649
vm.stats.sys.v_syscall: 6616609
vm.stats.sys.v_trap: 66195742
vm.stats.sys.v_swtch: 45795521
vm.v_free_severe: 15424
vm.max_proc_mmap: 37449
vm.old_msync: 0
vm.msync_flush_flags: 3
vm.boot_pages: 48
vm.pageout_lock_miss: 0
vm.disable_swapspace_pageouts: 0
vm.defer_swapspace_pageouts: 0
vm.swap_idle_enabled: 0
vm.pageout_stats_interval: 5
vm.pageout_full_stats_interval: 20
vm.pageout_stats_max: 107384
vm.max_launder: 32
vm.idlezero_maxrun: 16
vm.idlezero_enable: 1
vm.kvm_free: 1042280448
vm.kvm_size: 2147479552

---
Brad Chisholm
blc@bsdwins.com