Date: Thu, 27 Dec 2012 12:28:54 +0100 From: Andreas Longwitz <longwitz@incore.de> To: freebsd-stable@freebsd.org Subject: FS hang with suspfs when creating snapshot on a UFS + GJOURNAL setup Message-ID: <50DC30F6.1050904@incore.de>
next in thread | raw e-mail | index | archive | help
On a FreeBSD 8-Stable machine with UFS + GJOURNAL (no SU) I observed the same behaviour as described for UFS+SU+J in lists.freebsd.org/pipermail/freebsd-current/2012-January/030937.html. The snapshot was initiated by amanda with dump 3ubLshf 64 1048576 0 - /dev/mirror/gm0p10.journal (pid 50347) and the process creating the snapshot is /sbin/mksnap_ffs /home /home/.snap/dump_snapshot (pid 50350). The process 50350 hangs and also all following processes that tried to access the home partition, some of them work, but don't come to an end, e.g. a shell after (Ctrl T): load: 0.61 cmd: sh 52670 [suspfs] 43.46r 0.00u 0.00s 0% 2272k. All write I/O's to all gjournaled partitions are stopped. Under normal circumstances the snapshot is taken in five seconds, so I have definitiv not the problem I have described in lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html. My system disk with root,var,usr and home is completely mirrored and gjournaled with journals in extra partitions: gpart show mirror/gm0 --> => 34 286749420 mirror/gm0 GPT (136G) 34 128 1 freebsd-boot (64k) 162 8601600 2 freebsd-swap (4.1G) 8601762 2097152 3 freebsd-swap (1.0G) 10698914 4194304 4 freebsd-swap (2.0G) 14893218 4194304 5 freebsd-swap (2.0G) 19087522 4194304 6 freebsd-swap (2.0G) 23281826 2097152 7 freebsd-ufs (1.0G) 25378978 8388608 8 freebsd-ufs (4.0G) 33767586 67108864 9 freebsd-ufs (32G) 100876450 185873004 10 freebsd-ufs (88G) df -h -t ufs --> Filesystem Size Used Avail Capacity Mounted on /dev/mirror/gm0p7.journal 989M 313M 596M 34% / /dev/mirror/gm0p8.journal 3.9G 2.2G 1.4G 61% /var /dev/mirror/gm0p9.journal 31G 8.6G 19G 30% /usr /dev/mirror/gm0p10.journal 85G 17G 62G 22% /home gmirror status --> Name Status Components mirror/gm0 COMPLETE da6 (ACTIVE) da7 (ACTIVE) gjournal status --> Name Status Components mirror/gm0p7.journal N/A mirror/gm0p3 mirror/gm0p7 mirror/gm0p8.journal N/A mirror/gm0p4 mirror/gm0p8 mirror/gm0p9.journal N/A mirror/gm0p5 mirror/gm0p9 mirror/gm0p10.journal N/A mirror/gm0p6 mirror/gm0p10 I got some information from the hanging system with DDB: KDB: enter: Break to debugger [thread pid 11 tid 100004 ] Stopped at kdb_enter+0x3b: movq $0,0x483332(%rip) db> show pcpu cpuid = 2 dynamic pcpu = 0xffffff807f7d0080 curthread = 0xffffff000235c000: pid 11 "idle: cpu2" curpcb = 0xffffff8000051d00 fpcurthread = none idlethread = 0xffffff000235c000: tid 100004 "idle: cpu2" curpmap = 0xffffffff80889170 tssp = 0xffffffff808f65d0 commontssp = 0xffffffff808f65d0 rsp0 = 0xffffff8000051d00 gs32p = 0xffffffff808f5408 ldt = 0xffffffff808f5448 tss = 0xffffffff808f5438 db> show allpcpu Current CPU: 2 cpuid = 0 dynamic pcpu = 0x449080 curthread = 0xffffff0002368470: pid 11 "idle: cpu0" curpcb = 0xffffff800005bd00 fpcurthread = none idlethread = 0xffffff0002368470: tid 100006 "idle: cpu0" curpmap = 0xffffffff80889170 tssp = 0xffffffff808f6500 commontssp = 0xffffffff808f6500 rsp0 = 0xffffff800005bd00 gs32p = 0xffffffff808f5338 ldt = 0xffffffff808f5378 tss = 0xffffffff808f5368 cpuid = 1 dynamic pcpu = 0xffffff807f7c9080 curthread = 0xffffff00023688e0: pid 11 "idle: cpu1" curpcb = 0xffffff8000056d00 fpcurthread = none idlethread = 0xffffff00023688e0: tid 100005 "idle: cpu1" curpmap = 0xffffffff80889170 tssp = 0xffffffff808f6568 commontssp = 0xffffffff808f6568 rsp0 = 0xffffff8000056d00 gs32p = 0xffffffff808f53a0 ldt = 0xffffffff808f53e0 tss = 0xffffffff808f53d0 cpuid = 2 dynamic pcpu = 0xffffff807f7d0080 curthread = 0xffffff000235c000: pid 11 "idle: cpu2" curpcb = 0xffffff8000051d00 fpcurthread = none idlethread = 0xffffff000235c000: tid 100004 "idle: cpu2" curpmap = 0xffffffff80889170 tssp = 0xffffffff808f65d0 commontssp = 0xffffffff808f65d0 rsp0 = 0xffffff8000051d00 gs32p = 0xffffffff808f5408 ldt = 0xffffffff808f5448 tss = 0xffffffff808f5438 cpuid = 3 dynamic pcpu = 0xffffff807f7d7080 curthread = 0xffffff000235c470: pid 11 "idle: cpu3" curpcb = 0xffffff800004cd00 fpcurthread = none idlethread = 0xffffff000235c470: tid 100003 "idle: cpu3" curpmap = 0xffffffff80889170 tssp = 0xffffffff808f6638 commontssp = 0xffffffff808f6638 rsp0 = 0xffffff800004cd00 gs32p = 0xffffffff808f5470 ldt = 0xffffffff808f54b0 tss = 0xffffffff808f54a0 db> trace Tracing pid 11 tid 100004 td 0xffffff000235c000 kdb_enter() at kdb_enter+0x3b kdb_alt_break_internal() at kdb_alt_break_internal+0x8a uart_intr() at uart_intr+0x2cd intr_event_handle() at intr_event_handle+0x62 intr_execute_handlers() at intr_execute_handlers+0x5f lapic_handle_intr() at lapic_handle_intr+0x37 Xapic_isr1() at Xapic_isr1+0xa5 --- interrupt, rip = 0xffffffff805ae2c6, rsp = 0xffffff8000051b10, rbp = 0xffffff8000051b20 --- acpi_cpu_c1() at acpi_cpu_c1+0x6 acpi_cpu_idle() at acpi_cpu_idle+0x20a sched_idletd() at sched_idletd+0x11f fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff8000051cf0, rbp = 0 --- db> show lock Giant class: sleep mutex name: Giant flags: {DEF, RECURSE} state: {UNOWNED} db> show lockedvnods Locked vnodes 0xffffff012a730588: tag ufs, type VREG usecount 1, writecount 0, refcount 2 mountedhere 0 flags (VV_SYSTEM) lock type snaplk: EXCL by thread 0xffffff00847cc000 (pid 50350) with exclusive waiters pending ino 23552, on dev mirror/gm0p10.journal db> show lockchain thread 100004 (pid 11, idle: cpu2) running on CPU 2 db> show sleepchain thread 100004 (pid 11, idle: cpu2) running on CPU 2 db> show thread Thread 100004 at 0xffffff000235c000: proc (pid 11): 0xffffff000235a470 name: idle: cpu2 stack: 0xffffff800004e000-0xffffff8000051fff flags: 0x50024 pflags: 0x200000 state: RUNNING (CPU 2) priority: 255 container lock: sched lock 2 (0xffffffff80894d80) db> show proc 50350 Process 50350 (mksnap_ffs) at 0xffffff00847b68e0: state: NORMAL uid: 0 gids: 5 parent: pid 50347 at 0xffffff00842d48e0 ABI: FreeBSD ELF64 arguments: /sbin/mksnap_ffs threads: 1 100215 D suspfs 0xffffff0002f2907c mksnap_ffs db> show threads 100111 (0xffffff0002d85000) (stack 0xffffff8245057000) sched_switch() at sched_switch+0xde 100166 (0xffffff0002f6b000) (stack 0xffffff824516a000) sched_switch() at sched_switch+0xde 100622 (0xffffff006a8d2000) (stack 0xffffff8245a52000) sched_switch() at sched_switch+0xde .... 100215 (0xffffff00847cc000) (stack 0xffffff824525f000) sched_switch() at sched_switch+0xde .... It seems there is a deadlock on the suspfs lock, but I could not figure out who holds this lock. Any hints how to get better diagnostic information for next time the error occurs are welcome. -- Andreas Longwitz
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?50DC30F6.1050904>