Date: Thu, 27 Dec 2012 15:33:55 +0200 From: Konstantin Belousov <kostikbel@gmail.com> To: Andreas Longwitz <longwitz@incore.de> Cc: freebsd-stable@freebsd.org Subject: Re: FS hang with suspfs when creating snapshot on a UFS + GJOURNAL setup Message-ID: <20121227133355.GI82219@kib.kiev.ua> In-Reply-To: <50DC30F6.1050904@incore.de> References: <50DC30F6.1050904@incore.de>
next in thread | previous in thread | raw e-mail | index | archive | help
--Op27XXJsWz80g3oF Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Dec 27, 2012 at 12:28:54PM +0100, Andreas Longwitz wrote: > On a FreeBSD 8-Stable machine with UFS + GJOURNAL (no SU) I observed the > same behaviour as described for UFS+SU+J in > lists.freebsd.org/pipermail/freebsd-current/2012-January/030937.html. >=20 > The snapshot was initiated by amanda with > dump 3ubLshf 64 1048576 0 - /dev/mirror/gm0p10.journal (pid 50347) > and the process creating the snapshot is > /sbin/mksnap_ffs /home /home/.snap/dump_snapshot (pid 50350). >=20 > The process 50350 hangs and also all following processes that tried to > access the home partition, some of them work, but don't come to an end, > e.g. a shell after (Ctrl T): > load: 0.61 cmd: sh 52670 [suspfs] 43.46r 0.00u 0.00s 0% 2272k. >=20 > All write I/O's to all gjournaled partitions are stopped. Under normal > circumstances the snapshot is taken in five seconds, so I have definitiv > not the problem I have described in > lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html. >=20 > My system disk with root,var,usr and home is completely mirrored and > gjournaled with journals in extra partitions: > gpart show mirror/gm0 --> > =3D> 34 286749420 mirror/gm0 GPT (136G) > 34 128 1 freebsd-boot (64k) > 162 8601600 2 freebsd-swap (4.1G) > 8601762 2097152 3 freebsd-swap (1.0G) > 10698914 4194304 4 freebsd-swap (2.0G) > 14893218 4194304 5 freebsd-swap (2.0G) > 19087522 4194304 6 freebsd-swap (2.0G) > 23281826 2097152 7 freebsd-ufs (1.0G) > 25378978 8388608 8 freebsd-ufs (4.0G) > 33767586 67108864 9 freebsd-ufs (32G) > 100876450 185873004 10 freebsd-ufs (88G) > df -h -t ufs --> > Filesystem Size Used Avail Capacity Mounted on > /dev/mirror/gm0p7.journal 989M 313M 596M 34% / > /dev/mirror/gm0p8.journal 3.9G 2.2G 1.4G 61% /var > /dev/mirror/gm0p9.journal 31G 8.6G 19G 30% /usr > /dev/mirror/gm0p10.journal 85G 17G 62G 22% /home > gmirror status --> > Name Status Components > mirror/gm0 COMPLETE da6 (ACTIVE) > da7 (ACTIVE) > gjournal status --> > Name Status Components > mirror/gm0p7.journal N/A mirror/gm0p3 > mirror/gm0p7 > mirror/gm0p8.journal N/A mirror/gm0p4 > mirror/gm0p8 > mirror/gm0p9.journal N/A mirror/gm0p5 > mirror/gm0p9 > mirror/gm0p10.journal N/A mirror/gm0p6 > mirror/gm0p10 >=20 > I got some information from the hanging system with DDB: > KDB: enter: Break to debugger > [thread pid 11 tid 100004 ] > Stopped at kdb_enter+0x3b: movq $0,0x483332(%rip) > db> show pcpu > cpuid =3D 2 > dynamic pcpu =3D 0xffffff807f7d0080 > curthread =3D 0xffffff000235c000: pid 11 "idle: cpu2" > curpcb =3D 0xffffff8000051d00 > fpcurthread =3D none > idlethread =3D 0xffffff000235c000: tid 100004 "idle: cpu2" > curpmap =3D 0xffffffff80889170 > tssp =3D 0xffffffff808f65d0 > commontssp =3D 0xffffffff808f65d0 > rsp0 =3D 0xffffff8000051d00 > gs32p =3D 0xffffffff808f5408 > ldt =3D 0xffffffff808f5448 > tss =3D 0xffffffff808f5438 > db> show allpcpu > Current CPU: 2 >=20 > cpuid =3D 0 > dynamic pcpu =3D 0x449080 > curthread =3D 0xffffff0002368470: pid 11 "idle: cpu0" > curpcb =3D 0xffffff800005bd00 > fpcurthread =3D none > idlethread =3D 0xffffff0002368470: tid 100006 "idle: cpu0" > curpmap =3D 0xffffffff80889170 > tssp =3D 0xffffffff808f6500 > commontssp =3D 0xffffffff808f6500 > rsp0 =3D 0xffffff800005bd00 > gs32p =3D 0xffffffff808f5338 > ldt =3D 0xffffffff808f5378 > tss =3D 0xffffffff808f5368 >=20 > cpuid =3D 1 > dynamic pcpu =3D 0xffffff807f7c9080 > curthread =3D 0xffffff00023688e0: pid 11 "idle: cpu1" > curpcb =3D 0xffffff8000056d00 > fpcurthread =3D none > idlethread =3D 0xffffff00023688e0: tid 100005 "idle: cpu1" > curpmap =3D 0xffffffff80889170 > tssp =3D 0xffffffff808f6568 > commontssp =3D 0xffffffff808f6568 > rsp0 =3D 0xffffff8000056d00 > gs32p =3D 0xffffffff808f53a0 > ldt =3D 0xffffffff808f53e0 > tss =3D 0xffffffff808f53d0 >=20 > cpuid =3D 2 > dynamic pcpu =3D 0xffffff807f7d0080 > curthread =3D 0xffffff000235c000: pid 11 "idle: cpu2" > curpcb =3D 0xffffff8000051d00 > fpcurthread =3D none > idlethread =3D 0xffffff000235c000: tid 100004 "idle: cpu2" > curpmap =3D 0xffffffff80889170 > tssp =3D 0xffffffff808f65d0 > commontssp =3D 0xffffffff808f65d0 > rsp0 =3D 0xffffff8000051d00 > gs32p =3D 0xffffffff808f5408 > ldt =3D 0xffffffff808f5448 > tss =3D 0xffffffff808f5438 >=20 > cpuid =3D 3 > dynamic pcpu =3D 0xffffff807f7d7080 > curthread =3D 0xffffff000235c470: pid 11 "idle: cpu3" > curpcb =3D 0xffffff800004cd00 > fpcurthread =3D none > idlethread =3D 0xffffff000235c470: tid 100003 "idle: cpu3" > curpmap =3D 0xffffffff80889170 > tssp =3D 0xffffffff808f6638 > commontssp =3D 0xffffffff808f6638 > rsp0 =3D 0xffffff800004cd00 > gs32p =3D 0xffffffff808f5470 > ldt =3D 0xffffffff808f54b0 > tss =3D 0xffffffff808f54a0 >=20 > db> trace > Tracing pid 11 tid 100004 td 0xffffff000235c000 > kdb_enter() at kdb_enter+0x3b > kdb_alt_break_internal() at kdb_alt_break_internal+0x8a > uart_intr() at uart_intr+0x2cd > intr_event_handle() at intr_event_handle+0x62 > intr_execute_handlers() at intr_execute_handlers+0x5f > lapic_handle_intr() at lapic_handle_intr+0x37 > Xapic_isr1() at Xapic_isr1+0xa5 > --- interrupt, rip =3D 0xffffffff805ae2c6, rsp =3D 0xffffff8000051b10, rb= p =3D > 0xffffff8000051b20 --- > acpi_cpu_c1() at acpi_cpu_c1+0x6 > acpi_cpu_idle() at acpi_cpu_idle+0x20a > sched_idletd() at sched_idletd+0x11f > fork_exit() at fork_exit+0x11f > fork_trampoline() at fork_trampoline+0xe > --- trap 0, rip =3D 0, rsp =3D 0xffffff8000051cf0, rbp =3D 0 --- >=20 > db> show lock Giant > class: sleep mutex > name: Giant > flags: {DEF, RECURSE} > state: {UNOWNED} >=20 > db> show lockedvnods > Locked vnodes >=20 > 0xffffff012a730588: tag ufs, type VREG > usecount 1, writecount 0, refcount 2 mountedhere 0 > flags (VV_SYSTEM) > lock type snaplk: EXCL by thread 0xffffff00847cc000 (pid 50350) > with exclusive waiters pending > ino 23552, on dev mirror/gm0p10.journal >=20 > db> show lockchain > thread 100004 (pid 11, idle: cpu2) running on CPU 2 >=20 > db> show sleepchain > thread 100004 (pid 11, idle: cpu2) running on CPU 2 >=20 > db> show thread > Thread 100004 at 0xffffff000235c000: > proc (pid 11): 0xffffff000235a470 > name: idle: cpu2 > stack: 0xffffff800004e000-0xffffff8000051fff > flags: 0x50024 pflags: 0x200000 > state: RUNNING (CPU 2) > priority: 255 > container lock: sched lock 2 (0xffffffff80894d80) >=20 > db> show proc 50350 > Process 50350 (mksnap_ffs) at 0xffffff00847b68e0: > state: NORMAL > uid: 0 gids: 5 > parent: pid 50347 at 0xffffff00842d48e0 > ABI: FreeBSD ELF64 > arguments: /sbin/mksnap_ffs > threads: 1 > 100215 D suspfs 0xffffff0002f2907c mksnap_ffs >=20 > db> show threads > 100111 (0xffffff0002d85000) (stack 0xffffff8245057000) sched_switch() > at sched_switch+0xde > 100166 (0xffffff0002f6b000) (stack 0xffffff824516a000) sched_switch() > at sched_switch+0xde > 100622 (0xffffff006a8d2000) (stack 0xffffff8245a52000) sched_switch() > at sched_switch+0xde > .... > 100215 (0xffffff00847cc000) (stack 0xffffff824525f000) sched_switch() > at sched_switch+0xde > .... >=20 > It seems there is a deadlock on the suspfs lock, but I could not figure > out who holds this lock. > Any hints how to get better diagnostic information for next time the > error occurs are welcome. The http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kernel= debug-deadlocks.html provides the instructions. The suspfs is owned by the snapshot creator. The question is, where is it blocked. --Op27XXJsWz80g3oF Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJQ3E5BAAoJEJDCuSvBvK1BkYYP/1jFnmAbMv+p7NYqZJD8jH+b To1OFkKq5+Q1L0n2NVW5bkrQukHnwrygriuaPtxzZuvUKo5QrD467CKPh1V/2YUv PbvOukRD0uI3/pgJzfr8c5loNpw/+2UqoGCYGGYOBQ1sSvvOcUxviG9NVs+JhFv5 /LUSLrq7k7b/AIpw2EyTJKeuViWVfTr2D0uTNgXA8Kol4Anwm9cwqSXRYRiNsiTl NZv8/DWPV2FmCf3tszHdQawhULEUsmwZEt528F5ERiIwrkBm3sanqYgAqAYE1OlP cLsuLXVwKfYPxqZs6hIeFH5pnFezQejKvQy8t28zsK27LPaJoBNGGNyen5uP+i07 UM+hbw9fHYagL4k2zqGAeLA9stsVJHTd/Od5j8PpD7YUsYapuddpyw+RyfHjkMMR liiZdtZsbDT546DAsQgMJVpOdjaMHs3wuTwiaoaMIELY22oivj8MRsticyVmVBZn xN5SidM4RQ0WAOahvmtk67fdG9TwZ/+czPC+fV8VA7bcyozt//0E/ZNRvn3N+Vlj AxquS1KlPb4h7IdJHflyiat+PCQR7jpUSQFTeW6zlZqH7lzK5eSwWkDaTiANKwMU obZMDlvM779Cqi6i9Bq9bPZHQ4jMYFX8rUEtMksJQ+X824fbkCyartOBoq6REckw 3DdEX+PYiJradlr7yfbs =lYy0 -----END PGP SIGNATURE----- --Op27XXJsWz80g3oF--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121227133355.GI82219>