Date: Fri, 26 Oct 2007 23:02:47 +0300 From: "Gleb Kozyrev" <gkozyrev@gmail.com> To: freebsd-current@freebsd.org Cc: Kostik Belousov <kostikbel@gmail.com>, Robert Watson <rwatson@freebsd.org> Subject: Re: Deadlock, exclusive sx so_rcv_sx, amd64 Message-ID: <ca661e8a0710261302j188a05c4h63728eab67f6068@mail.gmail.com> In-Reply-To: <20071026184536.GE37471@deviant.kiev.zoral.com.ua> References: <ca661e8a0710211529y2a48b4a9h78da4211b3f09274@mail.gmail.com> <200710251435.58984.jhb@freebsd.org> <ca661e8a0710260252u4f5c1c3di22828143e81e3df2@mail.gmail.com> <200710261222.28656.jhb@freebsd.org> <20071026184536.GE37471@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On 26/10/2007, Kostik Belousov <kostikbel@gmail.com> wrote: > On Fri, Oct 26, 2007 at 12:22:28PM -0400, John Baldwin wrote: > > "sbwait" is waiting for data to come in on a socket and "pfault" is > > waiting on disk I/O. It is a bit odd that 1187 is holding a lock while > No, pfault means that the process handles page faults, and sleeps > waiting for some page to become available (either from cache or free list). > > > sleeping though that is permitted with an sx lock. Still, if it's > > supposed to be protect socket's receive buffer that is odd. Maybe get > > a trace of the process blocked in "sbwait" (tr <pid>) and bug rwatson@ > > about it. > Thank you Konstantin and John for looking into this. I didn't do alltrace and friends this time so here's info from a previous occurence: http://debug.files.googlepages.com/Serial-20071022-004528.txt Some of it is below as requested: db> show alllocks Process 1406 (systat) thread 0xffffff002aa016a0 (100181) shared sx allproc r = 0 (0xffffffff80a72820) locked @ /usr/src/sys/vm/vm_meter.c:130 exclusive sx sysctl lock r = 0 (0xffffffff80a72f40) locked @ /usr/src/sys/kern/kern_sysctl.c:1396 Process 1129 (rtorrent) thread 0xffffff000a6ae000 (100163) exclusive sx user map r = 0 (0xffffff0001056890) locked @ /usr/src/sys/vm/vm_map.c:2195 Process 1105 (sshd) thread 0xffffff000a6ae350 (100162) exclusive sx so_rcv_sx r = 0 (0xffffff000a78a928) locked @ /usr/src/sys/kern/uipc_sockbuf.c:145 Process 1040 (cron) thread 0xffffff0001c61350 (100130) shared sx proctree r = 0 (0xffffffff80a72860) locked @ /usr/src/sys/kern/kern_fork.c:286 Process 998 (mysqld) thread 0xffffff002aa119f0 (100192) exclusive sx so_rcv_sx r = 0 (0xffffff0003858670) locked @ /usr/src/sys/kern/uipc_sockbuf.c:145 Process 998 (mysqld) thread 0xffffff002aa12000 (100191) exclusive sx so_rcv_sx r = 0 (0xffffff000abbb928) locked @ /usr/src/sys/kern/uipc_sockbuf.c:145 Process 932 (smbd) thread 0xffffff000140c000 (100068) shared sx proctree r = 0 (0xffffffff80a72860) locked @ /usr/src/sys/kern/kern_fork.c:286 db> ps pid ppid pgrp uid state wmesg wchan cmd 1416 1414 1416 1001 S+ zfs:(&tx 0xffffff000152cd90 ftp 1414 1111 1414 1001 Ss+ pause 0xffffff000abff520 tcsh 1406 1127 1406 1001 S+ user map 0xffffff0001056890 systat 1134 932 932 1001 SL pfault 0xffffffff80b1359c smbd 1133 1017 1017 80 S zfs 0xffffff000a399288 httpd 1129 1113 1129 1001 S+ vmwait 0xffffffff80b1359c rtorrent 1127 1111 1127 1001 Ss+ pause 0xffffff000a1f90c0 tcsh 1113 1111 1113 1001 Ss+ pause 0xffffff0001c5b520 tcsh 1111 1110 1111 1001 SLs pfault 0xffffffff80b1359c screen 1110 1108 1108 1001 S+ pause 0xffffff000a1fa980 screen 1108 1107 1108 1001 Ss+ pause 0xffffff000a6a9520 tcsh 1107 1105 1105 1001 S select 0xffffffff80af79d0 sshd 1105 1033 1105 0 Ss sbwait 0xffffff000a78a984 sshd ... Tracing command sshd pid 1105 tid 100162 td 0xffffff000a6ae350 sched_switch() at sched_switch+0x18c mi_switch() at mi_switch+0x22f sleepq_switch() at sleepq_switch+0xc7 sleepq_catch_signals() at sleepq_catch_signals+0x25a sleepq_wait_sig() at sleepq_wait_sig+0x16 _sleep() at _sleep+0x309 soreceive_generic() at soreceive_generic+0x101c dofileread() at dofileread+0xa1 kern_readv() at kern_readv+0x4c read() at read+0x54 syscall() at syscall+0x1ce Xfast_syscall() at Xfast_syscall+0xab --- syscall (3, FreeBSD ELF64, read), rip = 0x8016f528c, rsp = 0x7fffffffe558, rbp = 0x4 --- Tracing command rtorrent pid 1129 tid 100163 td 0xffffff000a6ae000 sched_switch() at sched_switch+0x18c mi_switch() at mi_switch+0x22f sleepq_switch() at sleepq_switch+0xc7 sleepq_wait() at sleepq_wait+0x44 _sleep() at _sleep+0x351 kmem_malloc() at kmem_malloc+0x2a2 uma_large_malloc() at uma_large_malloc+0x4a malloc() at malloc+0x12d arc_get_data_buf() at arc_get_data_buf+0x36e arc_buf_alloc() at arc_buf_alloc+0xe4 arc_read() at arc_read+0xfa dbuf_prefetch() at dbuf_prefetch+0x137 dmu_zfetch_dofetch() at dmu_zfetch_dofetch+0x10b dmu_zfetch() at dmu_zfetch+0x4aa dbuf_read() at dbuf_read+0x535 dmu_tx_check_ioerr() at dmu_tx_check_ioerr+0x9a dmu_tx_count_write() at dmu_tx_count_write+0x178 dmu_tx_hold_write() at dmu_tx_hold_write+0x4a zfs_freebsd_write() at zfs_freebsd_write+0x399 VOP_WRITE_APV() at VOP_WRITE_APV+0x10b vnode_pager_generic_putpages() at vnode_pager_generic_putpages+0x19c vnode_pager_putpages() at vnode_pager_putpages+0x97 vm_pageout_flush() at vm_pageout_flush+0x14b vm_object_page_collect_flush() at vm_object_page_collect_flush+0x2d1 vm_object_page_clean() at vm_object_page_clean+0x165 vm_object_sync() at vm_object_sync+0x1f0 vm_map_sync() at vm_map_sync+0x11e msync() at msync+0x62 syscall() at syscall+0x1ce Xfast_syscall() at Xfast_syscall+0xab --- syscall (65, FreeBSD ELF64, msync), rip = 0x8016c37ec, rsp = 0x7fffffffe408, rbp = 0x1 --- (kgdb) print cnt $1 = {v_swtch = 0, v_trap = 0, v_syscall = 0, v_intr = 0, v_soft = 0, v_vm_faults = 0, v_cow_faults = 0, v_cow_optim = 0, v_zfod = 0, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 0, v_vnodeout = 0, v_vnodepgsin = 0, v_vnodepgsout = 0, v_intrans = 0, v_reactivated = 255459, v_pdwakeups = 507, v_pdpages = 34896653, v_tcached = 19148245, v_dfree = 0, v_pfree = 74993, v_tfree = 0, v_page_size = 4096, v_page_count = 246565, v_free_reserved = 371, v_free_target = 6779, v_free_min = 1602, v_free_count = 890, v_wire_count = 43890, v_active_count = 187459, v_inactive_target = 10168, v_inactive_count = 14025, v_cache_count = 0, v_cache_min = 6779, v_cache_max = 13558, v_pageout_free_min = 34, v_interrupt_free_min = 2, v_free_severe = 986, v_forks = 0, v_vforks = 0, v_rforks = 0, v_kthreads = 0, v_forkpages = 0, v_vforkpages = 0, v_rforkpages = 0, v_kthreadpages = 0} I managed to find a kernel message that precedes the freeze: Approaching the limit on PV entries, consider increasing sysctl vm.pmap.shpgperproc or vm.pmap.pv_entry_max rtorrent likes to mmap large files. After increasing (one of) these sysctls everything seems ok. Probably, setting rtorrent's memory limit could have helped either. Is there a way to know if some process is becoming close to that limit? -- With best regards, Gleb Kozyrev.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ca661e8a0710261302j188a05c4h63728eab67f6068>