Date: Mon, 08 Sep 2014 08:34:42 +0000 From: bugzilla-noreply@freebsd.org To: freebsd-bugs@FreeBSD.org Subject: [Bug 193457] New: Enabling RACCT/RCTL causes system hang Message-ID: <bug-193457-8@https.bugs.freebsd.org/bugzilla/>
next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193457 Bug ID: 193457 Summary: Enabling RACCT/RCTL causes system hang Product: Base System Version: 9.3-RELEASE Hardware: amd64 OS: Any Status: Needs Triage Severity: Affects Some People Priority: --- Component: kern Assignee: freebsd-bugs@FreeBSD.org Reporter: j.david.lists@gmail.com When compiling a 9.3 kernel with the RACCT & RCTL options for rctl, the resulting system hangs repeatably. To reproduce this, create a persistent jail and run some processes in it. Nothing needs to be set up for rctl limits. Once there are no more processes in the jail, run rctl on the jail: # rctl -hu jail:jailname The entire system will hang. db> show threads 100285 (0xfffffe0040a8a900) (stack 0xffffff812beaa000) sched_switch() at sched_switch+0x250/frame 0xffffff812bead770 ... probably over 100 more sched_switch() lines 100046 (0xfffffe0002d03000) (stack 0xffffff81222a8000) sched_switch() at sched_switch+0x250/frame 0xffffff81222abb60 100045 (0xfffffe0002d03480) (stack 0xffffff81222a3000) fork_trampoline() at fork_trampoline 100044 (0xfffffe0002d03900) (stack 0xffffff812229e000) sched_switch() at sched_switch+0x250/frame 0xffffff81222a1b60 100041 (0xfffffe0002d05900) (stack 0xffffff812228f000) sched_switch() at sched_switch+0x250/frame 0xffffff8122292b60 100040 (0xfffffe0002d06000) (stack 0xffffff812228a000) sched_switch() at sched_switch+0x250/frame 0xffffff812228db60 100039 (0xfffffe0002c01000) (stack 0xffffff8122285000) fork_trampoline() at fork_trampoline 100036 (0xfffffe0002c0b000) (stack 0xffffff80f77ec000) sched_switch() at sched_switch+0x250/frame 0xffffff80f77efb60 100035 (0xfffffe0002c0b480) (stack 0xffffff80f77e7000) sched_switch() at sched_switch+0x250/frame 0xffffff80f77eab60 100034 (0xfffffe0002c0b900) (stack 0xffffff80f77e2000) fork_trampoline() at fork_trampoline 100033 (0xfffffe0002c0c000) (stack 0xffffff80f77d5000) sched_switch() at sched_switch+0x250/frame 0xffffff80f77d8b60 100032 (0xfffffe0002c0c480) (stack 0xffffff80f77d0000) fork_trampoline() at fork_trampoline 100026 (0xfffffe0002b28900) (stack 0xffffff80f77ac000) fork_trampoline() at fork_trampoline 100025 (0xfffffe0002b29000) (stack 0xffffff80002f5000) fork_trampoline() at fork_trampoline 100024 (0xfffffe0002b29480) (stack 0xffffff80002cd000) fork_trampoline() at fork_trampoline 100023 (0xfffffe0002b29900) (stack 0xffffff80002a5000) sched_switch() at sched_switch+0x250/frame 0xffffff80002a8b60 100022 (0xfffffe0002b23000) (stack 0xffffff80002a0000) sched_switch() at sched_switch+0x250/frame 0xffffff80002a3b60 100016 (0xfffffe0002b26000) (stack 0xffffff8000282000) fork_trampoline() at fork_trampoline 100014 (0xfffffe000296f480) (stack 0xffffff8000278000) sched_switch() at sched_switch+0x250/frame 0xffffff800027bb60 100008 (0xfffffe000295c900) (stack 0xffffff800025a000) fork_trampoline() at fork_trampoline 100007 (0xfffffe000296c000) (stack 0xffffff8000255000) sched_switch() at sched_switch+0x250/frame 0xffffff8000258b60 100006 (0xfffffe000296c480) (stack 0xffffff8000250000) sched_switch() at sched_switch+0x250/frame 0xffffff8000253b60 100005 (0xfffffe000296c900) (stack 0xffffff800024b000) sched_switch() at sched_switch+0x250/frame 0xffffff800024eb60 100004 (0xfffffe000295b000) (stack 0xffffff8000246000) cpustop_handler() at cpustop_handler+0x27/frame 0xffffff8000233cf0 100003 (0xfffffe000295b480) (stack 0xffffff8000241000) kdb_break() at kdb_break+0x55/frame 0xffffff8000244990 100002 (0xfffffe000295b900) (stack 0xffffff800023c000) sched_switch() at sched_switch+0x250/frame 0xffffff800023f720 ... another dozen or so sched_switch() threads 100000 (0xffffffff8152eb80) (stack 0xffffffff817ff000) sched_switch() at sched_switch+0x250/frame 0xffffffff81802bb0 db> show all procs pid ppid pgrp uid state wmesg wchan cmd 7255 645 7255 106 DsJ allpriso 0xffffffff8152f340 jail_start 7252 7233 7233 0 D allproc 0xffffffff815326d0 php 7233 7231 7233 0 Ss wait 0xfffffe004070f000 sh 7231 709 709 0 S piperd 0xfffffe0002f312d8 cron 1634 1616 1634 1000 Ds+ allproc 0xffffffff815326d0 bash 1616 1527 1527 1000 S select 0xfffffe0040a60dc0 sshd 1527 689 1527 0 Ss select 0xfffffe00405656c0 sshd 937 645 937 25000 SsJ select 0xfffffe0002f1c740 python2.7 931 632 627 0 S select 0xfffffe00409a7740 php 929 0 0 0 DL proctree 0xffffffff81532658 [newnfs 1] 928 0 0 0 DL proctree 0xffffffff81532658 [newnfs 0] 925 924 925 1000 Ss+ ttyin 0xfffffe0002cf4ca8 bash 924 921 921 1000 S select 0xfffffe0002ded640 sshd 921 689 921 0 Ss select 0xfffffe00403f12c0 sshd 860 643 627 25000 S (threaded) httpd 100260 S accept 0xfffffe004041e30e httpd 100259 S uwait 0xfffffe00409a9280 httpd .... lots of httpd's waiting around, probably not relevant .... 100132 S uwait 0xfffffe0040567500 httpd 100130 S piperd 0xfffffe0040757888 httpd 855 643 627 0 S piperd 0xfffffe0002f08b60 apache_log_client 854 643 627 65534 S piperd 0xfffffe0040757b60 clogd 852 634 627 59 S kqread 0xfffffe00408a9300 unbound 751 1 751 0 Ss+ ttyin 0xfffffe0002cf78a8 getty 750 1 750 0 Ss+ ttyin 0xfffffe0002d0d4a8 getty 749 1 749 0 Ss+ ttyin 0xfffffe0002d0d8a8 getty 748 1 748 0 Ss+ ttyin 0xfffffe0002d0f0a8 getty 747 1 747 0 Ss+ ttyin 0xfffffe0002d0f4a8 getty 746 1 746 0 Ss+ ttyin 0xfffffe0002d0fca8 getty 745 1 745 0 Ss+ ttyin 0xfffffe0002d104a8 getty 744 1 744 0 Ss+ ttyin 0xfffffe0002d100a8 getty 743 1 743 0 Ss+ ttyin 0xfffffe0002d0f8a8 getty 709 1 709 0 Ss nanslp 0xffffffff81499470 cron 705 1 705 25 Ss pause 0xfffffe004043c548 sendmail 701 1 701 0 Ss select 0xfffffe0040565cc0 sendmail 689 1 689 0 Ss select 0xfffffe00402581c0 sshd 656 1 656 181 Ss select 0xfffffe0040258ac0 nrpe2 643 637 627 0 D proctree 0xffffffff81532658 httpd 639 633 627 0 S select 0xfffffe00409a8340 ntpd 638 629 627 0 S select 0xfffffe0002f1cdc0 supervise 637 629 627 0 S select 0xfffffe0002d857c0 supervise 636 629 627 0 S select 0xfffffe00403f15c0 supervise 634 629 627 0 S select 0xfffffe0002f1a4c0 supervise 633 629 627 0 S select 0xfffffe0002f1bb40 supervise 632 629 627 0 S select 0xfffffe0002f1a240 supervise 630 1 627 0 S piperd 0xfffffe00401cf000 readproctitle 629 1 627 0 D proctree 0xffffffff81532658 svscan 571 1 571 0 Ss rpcsvc 0xfffffe0002f1c6a8 NLM: master 568 1 568 0 Ss select 0xfffffe0002f1c3c0 rpc.statd 545 0 0 0 DL mdwait 0xfffffe004020d800 [md0] 534 1 534 0 Ss select 0xfffffe0002f1a6c0 amd 523 1 523 0 Ss select 0xfffffe0002dee4c0 rpcbind 497 1 497 0 Ss select 0xfffffe0002f1bbc0 syslogd 379 1 379 0 Ss select 0xfffffe0002f1a840 devd 20 0 0 0 DL - 0xffffffff81498320 [schedcpu] 19 0 0 0 DL - 0xffffffff81498320 [racctd] 18 0 0 0 DL sdflush 0xffffffff814c9594 [softdepflush] 17 0 0 0 DL syncer 0xffffffff814c2998 [syncer] 16 0 0 0 DL vlruwt 0xfffffe0002d8c000 [vnlru] 9 0 0 0 DL psleep 0xffffffff814c25bc [bufdaemon] 8 0 0 0 DL pgzero 0xffffffff814cae5c [pagezero] 7 0 0 0 DL pollid 0xffffffff81497e70 [idlepoll] 6 0 0 0 DL psleep 0xffffffff814ca0a0 [vmdaemon] 5 0 0 0 DL psleep 0xffffffff81541e84 [pagedaemon] 4 0 0 0 DL - 0xffffffff81460130 [xpt_thrd] 3 0 0 0 DL waiting_ 0xffffffff81535d48 [sctp_iterator] 2 0 0 0 DL pftm 0xffffffff80336bc0 [pfpurge] 15 0 0 0 DL (threaded) [usb] 100030 D - 0xffffff8000765ef0 [usbus0] 100029 D - 0xffffff8000765e98 [usbus0] 100028 D - 0xffffff8000765e40 [usbus0] 100027 D - 0xffffff8000765de8 [usbus0] 14 0 0 0 DL - 0xffffffff81498320 [yarrow] 13 0 0 0 DL (threaded) [geom] 100011 D - 0xffffffff8152e5d8 [g_down] 100010 D - 0xffffffff8152e5d0 [g_up] 100009 D - 0xffffffff8152e5c8 [g_event] 12 0 0 0 WL (threaded) [intr] 100046 I [swi0: uart] 100045 I [irq12: psm0] 100044 I [irq1: atkbd0] 100041 I [irq263: virtio_pci3] 100040 I [irq262: virtio_pci3] 100039 I [irq261: virtio_pci3] 100036 I [irq260: virtio_pci2] 100035 I [irq259: virtio_pci2] 100034 I [irq258: virtio_pci2] 100033 I [irq257: virtio_pci1] 100032 I [irq256: virtio_pci1] 100026 I [irq11: uhci0+] 100025 I [irq15: ata1] 100024 I [irq14: ata0] 100023 I [swi6: task queue] 100022 I [swi2: cambio] 100016 I [swi5: fast taskq] 100014 I [swi6: Giant taskq] 100008 I [swi3: vm] 100007 I [swi1: netisr 0] 100006 I [swi4: clock] 100005 I [swi4: clock] 11 0 0 0 RL (threaded) [idle] 100004 Run CPU 1 [idle: cpu1] 100003 Run CPU 0 [idle: cpu0] 1 0 1 0 SLs wait 0xfffffe0002958950 [init] 10 0 0 0 DL audit_wo 0xffffffff81539480 [audit] 0 0 0 0 DLs (threaded) [kernel] 100047 D - 0xfffffe0002d1dc00 [mca taskq] 100043 D - 0xfffffe0002ca8000 [vtnet1 txq 0] 100042 D - 0xfffffe0002ca8080 [vtnet1 rxq 0] 100038 D - 0xfffffe0002c0aa80 [vtnet0 txq 0] 100037 D - 0xfffffe0002c0ab00 [vtnet0 rxq 0] 100031 D vtbslp 0xfffffe0002c00080 [virtio_balloon] 100021 D - 0xfffffe0002b0b380 [acpi_task_2] 100020 D - 0xfffffe0002b0b380 [acpi_task_1] 100019 D - 0xfffffe0002b0b380 [acpi_task_0] 100018 D - 0xfffffe0002b0b400 [kqueue taskq] 100017 D - 0xfffffe0002b0b480 [ffs_trim taskq] 100015 D - 0xfffffe0002b0c080 [thread taskq] 100012 D - 0xfffffe0002970980 [firmware taskq] 100000 D swapin 0xffffffff8152e6c8 [swapper] The stuck processes above are those that show state "D" with allproc, proctree, or allpriso. Here is the info about racctd: db> show proc 19 Process 19 (racctd) at 0xfffffe0002d89000: state: NORMAL uid: 0 gids: 0 parent: pid 0 at 0xffffffff8152e6c8 ABI: null threads: 1 100059 D - 0xffffffff81498320 [racctd] And here are the stack traces for some stuck processes: db> trace 1634 Tracing pid 1634 tid 100074 td 0xfffffe0002ea4480 sched_switch() at sched_switch+0x250/frame 0xffffff812bab1930 mi_switch() at mi_switch+0xe1/frame 0xffffff812bab1970 sleepq_wait() at sleepq_wait+0x3a/frame 0xffffff812bab19a0 _sx_xlock_hard() at _sx_xlock_hard+0x4b9/frame 0xffffff812bab1a50 fork1() at fork1+0x31e/frame 0xffffff812bab1b00 sys_fork() at sys_fork+0x19/frame 0xffffff812bab1b20 amd64_syscall() at amd64_syscall+0x338/frame 0xffffff812bab1c30 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff812bab1c30 --- syscall (2, FreeBSD ELF64, sys_fork), rip = 0x800bb308c, rsp = 0x7fffffffe4e8, rbp = 0x7fffffffe530 --- db> trace 7255 Tracing pid 7255 tid 100285 td 0xfffffe0040a8a900 sched_switch() at sched_switch+0x250/frame 0xffffff812bead770 mi_switch() at mi_switch+0xe1/frame 0xffffff812bead7b0 sleepq_wait() at sleepq_wait+0x3a/frame 0xffffff812bead7e0 _sx_xlock_hard() at _sx_xlock_hard+0x4b9/frame 0xffffff812bead890 kern_jail_set() at kern_jail_set+0x3eb8/frame 0xffffff812beadaf0 sys_jail_set() at sys_jail_set+0x41/frame 0xffffff812beadb20 amd64_syscall() at amd64_syscall+0x338/frame 0xffffff812beadc30 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff812beadc30 --- syscall (507, FreeBSD ELF64, sys_jail_set), rip = 0x80191ab1c, rsp = 0x7fffffffe268, rbp = 0x7fffffffe310 --- So it looks like racctd is holding onto a lock that is deadlocking any other thread or process from touching the process or jail tables. The underlying cause is probably related to racctd trying to handle the empty jail. This issue is extremely reproducible for us, so if any other information would be helpful, please let me know. -- You are receiving this mail because: You are the assignee for the bug.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-193457-8>