Date:      Wed, 4 Oct 2006 20:16:32 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Vivek Khera <vivek@khera.org>
Cc:        stable@freebsd.org
Subject:   Re: ffs snapshot lockup
Message-ID:  <20061004171632.GI89654@deviant.kiev.zoral.com.ua>
In-Reply-To: <BB1FAD7A-1114-49D6-BC2E-C1B4B9D0C807@khera.org>
References:  <917B087C-5E13-4D7F-94FA-95CB0E5C1884@khera.org> <20060922190328.GA64849@xor.obsecurity.org> <555B84D2-520F-44D6-84D6-CF9CE7EE47C7@khera.org> <20060922203654.GA65693@xor.obsecurity.org> <847DD3A5-D5DD-4D3E-B755-64B13D1DA506@khera.org> <20061003084315.GA89654@deviant.kiev.zoral.com.ua> <DFEA4E5F-2337-4383-8765-F5901BDA49E9@khera.org> <20061004140808.GD89654@deviant.kiev.zoral.com.ua> <20061004163944.GA35412@xor.obsecurity.org> <BB1FAD7A-1114-49D6-BC2E-C1B4B9D0C807@khera.org>

On Wed, Oct 04, 2006 at 01:06:37PM -0400, Vivek Khera wrote:
>
> On Oct 4, 2006, at 12:39 PM, Kris Kennaway wrote:
>
> >>>
> >>>The only thing I think was running at the time would be a large file
> >>>copy from a remote system to this one using rsync.
> >>
> >>As I understand, you got the panic. Then, you shall post the panic
> >>message.
> >>If you have core file, then running kgdb on the core may show required
> >>information.
> >>(it shall be on the console exactly before en
> >>and backtrace (using the bt command of ddb) of the panicked thread.
> >
> >You can also do 'show msgbuf' from DDB.
> >
>
> i ran kgdb on the vmcore file.  since the dump was generated by
> calling doadump from DDB, the backtrace was showing the call stack of
> that.
>
> from what i read in the output from kgdb, it seems that something
> locked the kernel and we broke to debugger from the watchdog timeout
> (I enable software watchdog).
>
>
> When I fired up kgdb on my vmcore.19 file and ran the bt command, it
> said this:
>
>
> Unread portion of the kernel message buffer:
> interrupt                   total
> irq1: atkbd0                           2
> irq4: sio0                           348
> irq14: ata0                            1
> irq18: bge0                      3228387
> irq32: aac0                       235404
> irq34: ahc1                           74
> irq35: ahc0                           15
> cpu0: timer                     36123790
> Total                    39588021
> KDB: stack backtrace:
> hardclock() at hardclock+0x1bb
> lapic_handle_timer() at lapic_handle_timer+0x117
> Xtimerint() at Xtimerint+0x76
> ithread_loop() at ithread_loop+0x148
> fork_exit() at fork_exit+0xbb
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffa61d7d00, rbp = 0 ---
> KDB: enter: watchdog timeout
> Locked vnodes
>
> 0xffffff002df5b798: tag nfs, type VDIR
>     usecount 2, writecount 0, refcount 2 mountedhere 0
>     flags (VV_ROOT)
>      lock type nfs: EXCL (count 1) by thread 0xffffff002a6c5980 (pid 49843)
> #0 0xffffffff802442b4 at lockmgr+0x5b7
> #1 0xffffffff803a0573 at VOP_LOCK_APV+0x80
> #2 0xffffffff802be6e5 at vn_lock+0x65
> #3 0xffffffff802b2cbe at vget+0x8f
> #4 0xffffffff802a84e6 at vfs_hash_get+0xc4
> #5 0xffffffff8030a3cc at nfs_nget+0xb9
> #6 0xffffffff80310a9e at nfs_root+0x34
> #7 0xffffffff802a96d7 at lookup+0xa14
> #8 0xffffffff802a9d12 at namei+0x385
> #9 0xffffffff802b8b59 at kern_lstat+0x62
> #10 0xffffffff802b8e73 at lstat+0x2a
> #11 0xffffffff8037ac13 at syscall+0x470
> #12 0xffffffff80368aa8 at Xfast_syscall+0xa8
>
>         fileid 3 fsid 0x400ff02
> Dumping 1015 MB (2 chunks)
>   chunk 0: 1MB (160 pages) ... ok
>   chunk 1: 1015MB (259776 pages) 999 983 967 951 935 919 903 887 871
> 855 839 823 807 791 775 759 743 727 711 695 679 663 647 631 615 599
> 583 567 551 535 519 503 487 471 455 439 423 407 391 375 359 343 327
> 311 295 279 263 247 231 215 199 183 167 151 135 119 103 87 71 55 39 23 7
>
> #0  doadump () at pcpu.h:172
> 172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> (kgdb) bt
> #0  doadump () at pcpu.h:172
> #1  0xffffffff8017719b in db_fncall (dummy1=0, dummy2=0, dummy3=0,
>     dummy4=0x0)
>     at /usr/src/sys/ddb/db_command.c:492
> #2  0xffffffff801775bf in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:350
> #3  0xffffffff801792dd in db_trap (type=-1508017968, code=0)
>     at /usr/src/sys/ddb/db_main.c:221
> #4  0xffffffff8026c72c in kdb_trap (type=3, code=0,
>     tf=0xffffffffa61d79d0)
>     at /usr/src/sys/kern/subr_kdb.c:473
> #5  0xffffffff8037a4bf in trap (frame=
>       {tf_rdi = 0, tf_rsi = -2139025408, tf_rdx = 1, tf_rcx = 1057545,
>       tf_r8 = 1048064, tf_r9 = 10, tf_rax = 29, tf_rbx = 0,
>       tf_rbp = -1508017520, tf_r10 = -1508017760, tf_r11 = 10,
>       tf_r12 = -2141840192, tf_r13 = 0, tf_r14 = -1099502938944,
>       tf_r15 = -1099511596728, tf_trapno = 3, tf_addr = 0,
>       tf_flags = -1099511596728, tf_err = 0, tf_rip = -2144943427,
>       tf_cs = 8, tf_rflags = 134, tf_rsp = -1508017520, tf_ss = 16})
>     at /usr/src/sys/amd64/amd64/trap.c:442
> #6  0xffffffff8036890b in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:168
> #7  0xffffffff8026c2bd in kdb_enter (msg=0x0) at cpufunc.h:63
> #8  0xffffffff8036cc94 in lapic_handle_timer (frame=
>       {cf_rdi = -2036801520, cf_rsi = 1, cf_rdx = -1099502946304,
>       cf_rcx = -1095242940416, cf_r8 = -2143479528, cf_r9 = -2143559117,
>       cf_rax = 12582912, cf_rbx = -2036801536, cf_rbp = -1508017200,
>       cf_r10 = 0, cf_r11 = 4, cf_r12 = -1099511596800, cf_r13 = 0,
>       cf_r14 = -1099502938944, cf_r15 = -1099511596728,
>       cf_rip = -2145575931, cf_cs = 8, cf_rflags = 514,
>       cf_rsp = -1508017280, cf_ss = 16})
>     at /usr/src/sys/amd64/amd64/local_apic.c:635
> #9  0xffffffff80369166 in Xtimerint () at apic_vector.S:153
> #10 0xffffffff801d1c05 in bge_intr (xsc=0xffffffff8698e010) at bus.h:241
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Did you have any problems with your network card? This seems to be
quite a popular plot lately.
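
kgdb prints the saved registers in the trap and clock frames as signed
decimal numbers, which makes them hard to compare with the hex addresses
in the backtrace. A minimal Python sketch of the conversion follows; the
helper name to_hex64 is made up for illustration, and the two values are
copied from the quoted frames #5 and #8. If the arithmetic is right,
cf_rip should land on the address kgdb attributes to bge_intr in frame
#10, i.e. the timer interrupt arrived while that handler was running.

```python
# Mask that reinterprets a Python int as an unsigned 64-bit value.
MASK64 = (1 << 64) - 1


def to_hex64(value: int) -> str:
    """Format a signed 64-bit register value as an unsigned hex address.

    Hypothetical helper, not part of kgdb or any FreeBSD tool.
    """
    return f"0x{value & MASK64:016x}"


# Values copied verbatim from the quoted kgdb output.
tf_rip = -2144943427  # frame #5, trap(): where the breakpoint trap was taken
cf_rip = -2145575931  # frame #8, lapic_handle_timer(): interrupted code

print("tf_rip =", to_hex64(tf_rip))  # compare with frame #7 (kdb_enter)
print("cf_rip =", to_hex64(cf_rip))  # compare with frame #10 (bge_intr)
```

The same helper can be applied to any other field in the frame dumps,
for example tf_rsp or cf_rsp, to check them against the stack addresses
shown elsewhere in the output.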

I do not see anything in this deadlock except a wedged NFS request. It
seems that the NFS server is not responding.

Please include the output of "ps" in the scripts. It makes it much easier
to grok the situation quickly.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061004171632.GI89654>