From owner-freebsd-stable@FreeBSD.ORG Wed Oct 4 17:08:27 2006
In-Reply-To: <20061004163944.GA35412@xor.obsecurity.org>
References: <917B087C-5E13-4D7F-94FA-95CB0E5C1884@khera.org>
 <20060922190328.GA64849@xor.obsecurity.org>
 <555B84D2-520F-44D6-84D6-CF9CE7EE47C7@khera.org>
 <20060922203654.GA65693@xor.obsecurity.org>
 <847DD3A5-D5DD-4D3E-B755-64B13D1DA506@khera.org>
 <20061003084315.GA89654@deviant.kiev.zoral.com.ua>
 <20061004140808.GD89654@deviant.kiev.zoral.com.ua>
 <20061004163944.GA35412@xor.obsecurity.org>
Mime-Version: 1.0 (Apple Message framework v752.2)
Content-Type: multipart/signed; micalg=sha1;
 boundary=Apple-Mail-10--861712059;
 protocol="application/pkcs7-signature"
From: Vivek Khera
Date: Wed, 4 Oct 2006 13:06:37 -0400
To: Kostik Belousov
Cc: stable@freebsd.org
Subject: Re: ffs snapshot lockup
List-Id: Production branch of FreeBSD source code

--Apple-Mail-10--861712059
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

On Oct 4, 2006, at 12:39 PM,
Kris Kennaway wrote:

>>>
>>> The only thing I think was running at the time would be a large file
>>> copy from a remote system to this one using rsync.
>>
>> As I understand, you got the panic. Then, you shall post the panic
>> message.
>> If you have core file, then running kgdb on the core may show required
>> information.
>> (it shall be on the console exactly before en
>> and backtrace (using the bt command of ddb) of the paniced thread.
>
> You can also do 'show msgbuf' from DDB.
>

I ran kgdb on the vmcore file. Since the dump was generated by calling
doadump from DDB, the backtrace shows the call stack of the dump itself.
From what I read in the kgdb output, it seems that something locked up
the kernel and we broke into the debugger from the watchdog timeout (I
have the software watchdog enabled).

When I fired up kgdb on my vmcore.19 file and ran the bt command, it
said this:

Unread portion of the kernel message buffer:
interrupt                          total
irq1: atkbd0                           2
irq4: sio0                           348
irq14: ata0                            1
irq18: bge0                      3228387
irq32: aac0                       235404
irq34: ahc1                           74
irq35: ahc0                           15
cpu0: timer                     36123790
Total                           39588021
KDB: stack backtrace:
hardclock() at hardclock+0x1bb
lapic_handle_timer() at lapic_handle_timer+0x117
Xtimerint() at Xtimerint+0x76
ithread_loop() at ithread_loop+0x148
fork_exit() at fork_exit+0xbb
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffa61d7d00, rbp = 0 ---
KDB: enter: watchdog timeout
Locked vnodes

0xffffff002df5b798: tag nfs, type VDIR
    usecount 2, writecount 0, refcount 2 mountedhere 0
    flags (VV_ROOT)
     lock type nfs: EXCL (count 1) by thread 0xffffff002a6c5980 (pid 49843)
#0 0xffffffff802442b4 at lockmgr+0x5b7
#1 0xffffffff803a0573 at VOP_LOCK_APV+0x80
#2 0xffffffff802be6e5 at vn_lock+0x65
#3 0xffffffff802b2cbe at vget+0x8f
#4 0xffffffff802a84e6 at vfs_hash_get+0xc4
#5 0xffffffff8030a3cc at nfs_nget+0xb9
#6 0xffffffff80310a9e at nfs_root+0x34
#7 0xffffffff802a96d7 at lookup+0xa14
#8 0xffffffff802a9d12 at namei+0x385
#9 0xffffffff802b8b59 at kern_lstat+0x62
#10 0xffffffff802b8e73 at lstat+0x2a
#11 0xffffffff8037ac13 at syscall+0x470
#12 0xffffffff80368aa8 at Xfast_syscall+0xa8
        fileid 3 fsid 0x400ff02
Dumping 1015 MB (2 chunks)
  chunk 0: 1MB (160 pages) ... ok
  chunk 1: 1015MB (259776 pages) 999 983 967 951 935 919 903 887 871 855
    839 823 807 791 775 759 743 727 711 695 679 663 647 631 615 599 583
    567 551 535 519 503 487 471 455 439 423 407 391 375 359 343 327 311
    295 279 263 247 231 215 199 183 167 151 135 119 103 87 71 55 39 23 7

#0  doadump () at pcpu.h:172
172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0xffffffff8017719b in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0x0)
    at /usr/src/sys/ddb/db_command.c:492
#2  0xffffffff801775bf in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:350
#3  0xffffffff801792dd in db_trap (type=-1508017968, code=0)
    at /usr/src/sys/ddb/db_main.c:221
#4  0xffffffff8026c72c in kdb_trap (type=3, code=0, tf=0xffffffffa61d79d0)
    at /usr/src/sys/kern/subr_kdb.c:473
#5  0xffffffff8037a4bf in trap (frame=
      {tf_rdi = 0, tf_rsi = -2139025408, tf_rdx = 1, tf_rcx = 1057545,
       tf_r8 = 1048064, tf_r9 = 10, tf_rax = 29, tf_rbx = 0,
       tf_rbp = -1508017520, tf_r10 = -1508017760, tf_r11 = 10,
       tf_r12 = -2141840192, tf_r13 = 0, tf_r14 = -1099502938944,
       tf_r15 = -1099511596728, tf_trapno = 3, tf_addr = 0,
       tf_flags = -1099511596728, tf_err = 0, tf_rip = -2144943427,
       tf_cs = 8, tf_rflags = 134, tf_rsp = -1508017520, tf_ss = 16})
    at /usr/src/sys/amd64/amd64/trap.c:442
#6  0xffffffff8036890b in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:168
#7  0xffffffff8026c2bd in kdb_enter (msg=0x0) at cpufunc.h:63
#8  0xffffffff8036cc94 in lapic_handle_timer (frame=
      {cf_rdi = -2036801520, cf_rsi = 1, cf_rdx = -1099502946304,
       cf_rcx = -1095242940416, cf_r8 = -2143479528, cf_r9 = -2143559117,
       cf_rax = 12582912, cf_rbx = -2036801536, cf_rbp = -1508017200,
       cf_r10 = 0, cf_r11 = 4, cf_r12 = -1099511596800, cf_r13 = 0,
       cf_r14 = -1099502938944, cf_r15 = -1099511596728,
       cf_rip = -2145575931, cf_cs = 8, cf_rflags = 514,
       cf_rsp = -1508017280, cf_ss = 16})
    at /usr/src/sys/amd64/amd64/local_apic.c:635
#9  0xffffffff80369166 in Xtimerint () at apic_vector.S:153
#10 0xffffffff801d1c05 in bge_intr (xsc=0xffffffff8698e010) at bus.h:241
#11 0xffffffff8023aab5 in ithread_loop (arg=0xffffff00008494c0)
    at /usr/src/sys/kern/kern_intr.c:682
#12 0xffffffff8023992f in fork_exit (
    callout=0xffffffff8023a96d, arg=0xffffff00008494c0,
    frame=0xffffffffa61d7c50) at /usr/src/sys/kern/kern_fork.c:821
#13 0xffffffff80368c6e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:394
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000715000 in ?? ()
#47 0xffffffff00000001 in ?? ()
#48 0x0000000000000001 in ?? ()
#49 0xffffff003d0fb6b0 in ?? ()
#50 0xffffff002a6c5980 in ?? ()
#51 0xffffffffa61d7b80 in ?? ()
#52 0xffffffffa61d7b58 in ?? ()
#53 0xffffff003d0fd4c0 in ?? ()
#54 0xffffffff80264520 in sched_switch (td=0xffffff00008494c0,
    newtd=0xffffffff8023a96d, flags=0)
    at /usr/src/sys/kern/sched_4bsd.c:973
Previous frame inner to this frame (corrupt stack?)
(kgdb)

--Apple-Mail-10--861712059--
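
For anyone reading a backtrace like the one above: most of the frames
past fork_trampoline are unresolved ("?? ()") and carry no information.
Below is a minimal Python sketch, assuming the kgdb output has been
saved as text, that keeps only the frames with a resolved symbol. The
names FRAME_RE and resolved_frames are my own illustration and are not
part of kgdb or FreeBSD, and the pattern only covers the frame-line
shapes seen in this message.

```python
import re

# Matches the first line of a kgdb frame, for example:
#   "#7  0xffffffff8026c2bd in kdb_enter (msg=0x0) at cpufunc.h:63"
#   "#0  doadump () at pcpu.h:172"
# The address part is optional because frame #0 is printed without one.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+)\s+in\s+)?(\S+)\s*\(")


def resolved_frames(bt_text):
    """Return (frame_number, function_name) for frames with a known symbol."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            # Continuation line: arguments, source path, or pager prompt.
            continue
        number, _address, function = match.groups()
        if function == "??":
            # Unresolved frame, most likely leftover stack contents.
            continue
        frames.append((int(number), function))
    return frames


# A few lines copied from the backtrace in this message.
sample = """\
#0  doadump () at pcpu.h:172
#7  0xffffffff8026c2bd in kdb_enter (msg=0x0) at cpufunc.h:63
#14 0x0000000000000000 in ?? ()
#54 0xffffffff80264520 in sched_switch (td=0xffffff00008494c0,
    newtd=0xffffffff8023a96d, flags=0)
"""

for number, function in resolved_frames(sample):
    print(number, function)
```

Run against the full output, this would leave frames #0 through #13
plus #54, which is the part of the trace worth reading.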