Date:      Thu, 27 Dec 2012 12:28:54 +0100
From:      Andreas Longwitz <longwitz@incore.de>
To:        freebsd-stable@freebsd.org
Subject:   FS hang with suspfs when creating snapshot on a UFS + GJOURNAL setup
Message-ID:  <50DC30F6.1050904@incore.de>

On a FreeBSD 8-Stable machine with UFS + GJOURNAL (no SU) I observed the
same behaviour as described for UFS+SU+J in
  lists.freebsd.org/pipermail/freebsd-current/2012-January/030937.html.

The snapshot was initiated by amanda with
  dump 3ubLshf 64 1048576 0 - /dev/mirror/gm0p10.journal (pid 50347)
and the process creating the snapshot is
  /sbin/mksnap_ffs /home /home/.snap/dump_snapshot (pid 50350).
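
For what it is worth, the same snapshot step can also be reproduced by
hand outside amanda; the snapshot file name below is only an example:
  # this is the step that dump -L performs before dumping the snapshot
  /sbin/mksnap_ffs /home /home/.snap/test_snapshot
  # the snapshot is an ordinary file and can simply be removed afterwards
  rm /home/.snap/test_snapshot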

Process 50350 hangs, and so do all subsequent processes that try to
access the home partition; some of them keep running but never finish,
e.g. a shell after Ctrl-T:
  load: 0.61  cmd: sh 52670 [suspfs] 43.46r 0.00u 0.00s 0% 2272k.
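
Next time I will also try to grab the kernel stack of such a stuck
process from userland before going to DDB, e.g. for the shell above
(assuming procstat still answers while the file system is suspended):
  # -kk prints the kernel stacks of the process' threads
  procstat -kk 52670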

All write I/O to all gjournaled partitions is stopped. Under normal
circumstances the snapshot is taken within five seconds, so this is
definitely not the problem I described in
  lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html.

My system disk with root, var, usr and home is completely mirrored and
gjournaled, with the journals in separate partitions:
gpart show mirror/gm0 -->
=>       34  286749420  mirror/gm0  GPT  (136G)
         34        128           1  freebsd-boot  (64k)
        162    8601600           2  freebsd-swap  (4.1G)
    8601762    2097152           3  freebsd-swap  (1.0G)
   10698914    4194304           4  freebsd-swap  (2.0G)
   14893218    4194304           5  freebsd-swap  (2.0G)
   19087522    4194304           6  freebsd-swap  (2.0G)
   23281826    2097152           7  freebsd-ufs  (1.0G)
   25378978    8388608           8  freebsd-ufs  (4.0G)
   33767586   67108864           9  freebsd-ufs  (32G)
  100876450  185873004          10  freebsd-ufs  (88G)
df -h -t ufs -->
Filesystem                    Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0p7.journal     989M    313M    596M    34%    /
/dev/mirror/gm0p8.journal     3.9G    2.2G    1.4G    61%    /var
/dev/mirror/gm0p9.journal      31G    8.6G     19G    30%    /usr
/dev/mirror/gm0p10.journal     85G     17G     62G    22%    /home
gmirror status -->
      Name    Status  Components
mirror/gm0  COMPLETE  da6 (ACTIVE)
                      da7 (ACTIVE)
gjournal status -->
                 Name  Status  Components
 mirror/gm0p7.journal     N/A  mirror/gm0p3
                               mirror/gm0p7
 mirror/gm0p8.journal     N/A  mirror/gm0p4
                               mirror/gm0p8
 mirror/gm0p9.journal     N/A  mirror/gm0p5
                               mirror/gm0p9
mirror/gm0p10.journal     N/A  mirror/gm0p6
                               mirror/gm0p10
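
For reference, the journaled providers were set up roughly like this
(data provider first, then journal provider; shown only for /home, the
other file systems are analogous):
  # glue data partition gm0p10 and journal partition gm0p6 together
  gjournal label mirror/gm0p10 mirror/gm0p6
  # create the file system with the gjournal flag set
  newfs -J /dev/mirror/gm0p10.journal
  # gjournal file systems are mounted async
  mount -o async /dev/mirror/gm0p10.journal /home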

I got some information from the hanging system with DDB:
KDB: enter: Break to debugger
[thread pid 11 tid 100004 ]
Stopped at      kdb_enter+0x3b: movq    $0,0x483332(%rip)
db> show pcpu
cpuid        = 2
dynamic pcpu = 0xffffff807f7d0080
curthread    = 0xffffff000235c000: pid 11 "idle: cpu2"
curpcb       = 0xffffff8000051d00
fpcurthread  = none
idlethread   = 0xffffff000235c000: tid 100004 "idle: cpu2"
curpmap      = 0xffffffff80889170
tssp         = 0xffffffff808f65d0
commontssp   = 0xffffffff808f65d0
rsp0         = 0xffffff8000051d00
gs32p        = 0xffffffff808f5408
ldt          = 0xffffffff808f5448
tss          = 0xffffffff808f5438
db> show allpcpu
Current CPU: 2

cpuid        = 0
dynamic pcpu = 0x449080
curthread    = 0xffffff0002368470: pid 11 "idle: cpu0"
curpcb       = 0xffffff800005bd00
fpcurthread  = none
idlethread   = 0xffffff0002368470: tid 100006 "idle: cpu0"
curpmap      = 0xffffffff80889170
tssp         = 0xffffffff808f6500
commontssp   = 0xffffffff808f6500
rsp0         = 0xffffff800005bd00
gs32p        = 0xffffffff808f5338
ldt          = 0xffffffff808f5378
tss          = 0xffffffff808f5368

cpuid        = 1
dynamic pcpu = 0xffffff807f7c9080
curthread    = 0xffffff00023688e0: pid 11 "idle: cpu1"
curpcb       = 0xffffff8000056d00
fpcurthread  = none
idlethread   = 0xffffff00023688e0: tid 100005 "idle: cpu1"
curpmap      = 0xffffffff80889170
tssp         = 0xffffffff808f6568
commontssp   = 0xffffffff808f6568
rsp0         = 0xffffff8000056d00
gs32p        = 0xffffffff808f53a0
ldt          = 0xffffffff808f53e0
tss          = 0xffffffff808f53d0

cpuid        = 2
dynamic pcpu = 0xffffff807f7d0080
curthread    = 0xffffff000235c000: pid 11 "idle: cpu2"
curpcb       = 0xffffff8000051d00
fpcurthread  = none
idlethread   = 0xffffff000235c000: tid 100004 "idle: cpu2"
curpmap      = 0xffffffff80889170
tssp         = 0xffffffff808f65d0
commontssp   = 0xffffffff808f65d0
rsp0         = 0xffffff8000051d00
gs32p        = 0xffffffff808f5408
ldt          = 0xffffffff808f5448
tss          = 0xffffffff808f5438

cpuid        = 3
dynamic pcpu = 0xffffff807f7d7080
curthread    = 0xffffff000235c470: pid 11 "idle: cpu3"
curpcb       = 0xffffff800004cd00
fpcurthread  = none
idlethread   = 0xffffff000235c470: tid 100003 "idle: cpu3"
curpmap      = 0xffffffff80889170
tssp         = 0xffffffff808f6638
commontssp   = 0xffffffff808f6638
rsp0         = 0xffffff800004cd00
gs32p        = 0xffffffff808f5470
ldt          = 0xffffffff808f54b0
tss          = 0xffffffff808f54a0

db> trace
Tracing pid 11 tid 100004 td 0xffffff000235c000
kdb_enter() at kdb_enter+0x3b
kdb_alt_break_internal() at kdb_alt_break_internal+0x8a
uart_intr() at uart_intr+0x2cd
intr_event_handle() at intr_event_handle+0x62
intr_execute_handlers() at intr_execute_handlers+0x5f
lapic_handle_intr() at lapic_handle_intr+0x37
Xapic_isr1() at Xapic_isr1+0xa5
--- interrupt, rip = 0xffffffff805ae2c6, rsp = 0xffffff8000051b10, rbp =
0xffffff8000051b20 ---
acpi_cpu_c1() at acpi_cpu_c1+0x6
acpi_cpu_idle() at acpi_cpu_idle+0x20a
sched_idletd() at sched_idletd+0x11f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8000051cf0, rbp = 0 ---

db> show lock Giant
 class: sleep mutex
 name: Giant
 flags: {DEF, RECURSE}
 state: {UNOWNED}

db> show lockedvnods
Locked vnodes

0xffffff012a730588: tag ufs, type VREG
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags (VV_SYSTEM)
    lock type snaplk: EXCL by thread 0xffffff00847cc000 (pid 50350)
 with exclusive waiters pending
        ino 23552, on dev mirror/gm0p10.journal

db> show lockchain
thread 100004 (pid 11, idle: cpu2) running on CPU 2

db> show sleepchain
thread 100004 (pid 11, idle: cpu2) running on CPU 2

db> show thread
Thread 100004 at 0xffffff000235c000:
 proc (pid 11): 0xffffff000235a470
 name: idle: cpu2
 stack: 0xffffff800004e000-0xffffff8000051fff
 flags: 0x50024  pflags: 0x200000
 state: RUNNING (CPU 2)
 priority: 255
 container lock: sched lock 2 (0xffffffff80894d80)

db> show proc 50350
Process 50350 (mksnap_ffs) at 0xffffff00847b68e0:
 state: NORMAL
 uid: 0  gids: 5
 parent: pid 50347 at 0xffffff00842d48e0
 ABI: FreeBSD ELF64
 arguments: /sbin/mksnap_ffs
 threads: 1
100215                   D       suspfs   0xffffff0002f2907c mksnap_ffs

db> show threads
  100111 (0xffffff0002d85000) (stack 0xffffff8245057000)  sched_switch()
at sched_switch+0xde
  100166 (0xffffff0002f6b000) (stack 0xffffff824516a000)  sched_switch()
at sched_switch+0xde
  100622 (0xffffff006a8d2000) (stack 0xffffff8245a52000)  sched_switch()
at sched_switch+0xde
  ....
  100215 (0xffffff00847cc000) (stack 0xffffff824525f000)  sched_switch()
at sched_switch+0xde
  ....

It seems there is a deadlock on the suspfs lock, but I could not figure
out who holds this lock.
Any hints on how to get better diagnostic information the next time the
error occurs are welcome.
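
One thing I did not do and plan to try next time is to look at the
stuck mksnap_ffs thread directly in DDB instead of the idle thread,
along the lines of (tid 100215 from the show proc output above):
  db> trace 50350
  db> show lockchain 100215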

-- 
Andreas Longwitz
