Date:      Wed, 4 Oct 2006 13:06:37 -0400
From:      Vivek Khera <vivek@khera.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        stable@freebsd.org
Subject:   Re: ffs snapshot lockup
Message-ID:  <BB1FAD7A-1114-49D6-BC2E-C1B4B9D0C807@khera.org>
In-Reply-To: <20061004163944.GA35412@xor.obsecurity.org>
References:  <917B087C-5E13-4D7F-94FA-95CB0E5C1884@khera.org> <20060922190328.GA64849@xor.obsecurity.org> <555B84D2-520F-44D6-84D6-CF9CE7EE47C7@khera.org> <20060922203654.GA65693@xor.obsecurity.org> <847DD3A5-D5DD-4D3E-B755-64B13D1DA506@khera.org> <20061003084315.GA89654@deviant.kiev.zoral.com.ua> <DFEA4E5F-2337-4383-8765-F5901BDA49E9@khera.org> <20061004140808.GD89654@deviant.kiev.zoral.com.ua> <20061004163944.GA35412@xor.obsecurity.org>




On Oct 4, 2006, at 12:39 PM, Kris Kennaway wrote:

>>>
>>> The only thing I think was running at the time would be a large file
>>> copy from a remote system to this one using rsync.
>>
>> As I understand, you got the panic. Then, you shall post the panic  
>> message.
>> If you have core file, then running kgdb on the core may show  
>> required
>> information.
>> (it shall be on the console exactly before en
>> and backtrace (using the bt command of ddb) of the panicked thread.
>
> You can also do 'show msgbuf' from DDB.
>

I ran kgdb on the vmcore file.  Since the dump was generated by
calling doadump from DDB, the backtrace shows the call stack of
that call.

From what I read in the kgdb output, it seems that something
locked up the kernel and we broke into the debugger from the watchdog
timeout (I have the software watchdog enabled).


When I fired up kgdb on my vmcore.19 file and ran the bt command, it  
said this:


Unread portion of the kernel message buffer:
interrupt                   total
irq1: atkbd0                           2
irq4: sio0                           348
irq14: ata0                            1
irq18: bge0                      3228387
irq32: aac0                       235404
irq34: ahc1                           74
irq35: ahc0                           15
cpu0: timer                     36123790
Total                    39588021
KDB: stack backtrace:
hardclock() at hardclock+0x1bb
lapic_handle_timer() at lapic_handle_timer+0x117
Xtimerint() at Xtimerint+0x76
ithread_loop() at ithread_loop+0x148
fork_exit() at fork_exit+0xbb
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffa61d7d00, rbp = 0 ---
KDB: enter: watchdog timeout
Locked vnodes

0xffffff002df5b798: tag nfs, type VDIR
     usecount 2, writecount 0, refcount 2 mountedhere 0
     flags (VV_ROOT)
      lock type nfs: EXCL (count 1) by thread 0xffffff002a6c5980 (pid 49843)
#0 0xffffffff802442b4 at lockmgr+0x5b7
#1 0xffffffff803a0573 at VOP_LOCK_APV+0x80
#2 0xffffffff802be6e5 at vn_lock+0x65
#3 0xffffffff802b2cbe at vget+0x8f
#4 0xffffffff802a84e6 at vfs_hash_get+0xc4
#5 0xffffffff8030a3cc at nfs_nget+0xb9
#6 0xffffffff80310a9e at nfs_root+0x34
#7 0xffffffff802a96d7 at lookup+0xa14
#8 0xffffffff802a9d12 at namei+0x385
#9 0xffffffff802b8b59 at kern_lstat+0x62
#10 0xffffffff802b8e73 at lstat+0x2a
#11 0xffffffff8037ac13 at syscall+0x470
#12 0xffffffff80368aa8 at Xfast_syscall+0xa8

         fileid 3 fsid 0x400ff02
Dumping 1015 MB (2 chunks)
   chunk 0: 1MB (160 pages) ... ok
   chunk 1: 1015MB (259776 pages) 999 983 967 951 935 919 903 887 871  
855 839 823 807 791 775 759 743 727 711 695 679 663 647 631 615 599  
583 567 551 535 519 503 487 471 455 439 423 407 391 375 359 343 327  
311 295 279 263 247 231 215 199 183 167 151 135 119 103 87 71 55 39 23 7

#0  doadump () at pcpu.h:172
172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0xffffffff8017719b in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0x0)
     at /usr/src/sys/ddb/db_command.c:492
#2  0xffffffff801775bf in db_command_loop ()
     at /usr/src/sys/ddb/db_command.c:350
#3  0xffffffff801792dd in db_trap (type=-1508017968, code=0)
     at /usr/src/sys/ddb/db_main.c:221
#4  0xffffffff8026c72c in kdb_trap (type=3, code=0, tf=0xffffffffa61d79d0)
     at /usr/src/sys/kern/subr_kdb.c:473
#5  0xffffffff8037a4bf in trap (frame=
       {tf_rdi = 0, tf_rsi = -2139025408, tf_rdx = 1, tf_rcx =  
1057545, tf_r8 = 1048064, tf_r9 = 10, tf_rax = 29, tf_rbx = 0, tf_rbp  
= -1508017520, tf_r10 = -1508017760, tf_r11 = 10, tf_r12 =  
-2141840192, tf_r13 = 0, tf_r14 = -1099502938944, tf_r15 =  
-1099511596728, tf_trapno = 3, tf_addr = 0, tf_flags =  
-1099511596728, tf_err = 0, tf_rip = -2144943427, tf_cs = 8,  
tf_rflags = 134, tf_rsp = -1508017520, tf_ss = 16})
     at /usr/src/sys/amd64/amd64/trap.c:442
#6  0xffffffff8036890b in calltrap ()
     at /usr/src/sys/amd64/amd64/exception.S:168
#7  0xffffffff8026c2bd in kdb_enter (msg=0x0) at cpufunc.h:63
#8  0xffffffff8036cc94 in lapic_handle_timer (frame=
       {cf_rdi = -2036801520, cf_rsi = 1, cf_rdx = -1099502946304,  
cf_rcx = -1095242940416, cf_r8 = -2143479528, cf_r9 = -2143559117,  
cf_rax = 12582912, cf_rbx = -2036801536, cf_rbp = -1508017200, cf_r10  
= 0, cf_r11 = 4, cf_r12 = -1099511596800, cf_r13 = 0, cf_r14 =  
-1099502938944, cf_r15 = -1099511596728, cf_rip = -2145575931, cf_cs  
= 8, cf_rflags = 514, cf_rsp = -1508017280, cf_ss = 16})
     at /usr/src/sys/amd64/amd64/local_apic.c:635
#9  0xffffffff80369166 in Xtimerint () at apic_vector.S:153
#10 0xffffffff801d1c05 in bge_intr (xsc=0xffffffff8698e010) at bus.h:241
#11 0xffffffff8023aab5 in ithread_loop (arg=0xffffff00008494c0)
     at /usr/src/sys/kern/kern_intr.c:682
#12 0xffffffff8023992f in fork_exit (
     callout=0xffffffff8023a96d <ithread_loop>, arg=0xffffff00008494c0,
     frame=0xffffffffa61d7c50) at /usr/src/sys/kern/kern_fork.c:821
#13 0xffffffff80368c6e in fork_trampoline ()
     at /usr/src/sys/amd64/amd64/exception.S:394
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000715000 in ?? ()
#47 0xffffffff00000001 in ?? ()
#48 0x0000000000000001 in ?? ()
#49 0xffffff003d0fb6b0 in ?? ()
#50 0xffffff002a6c5980 in ?? ()
#51 0xffffffffa61d7b80 in ?? ()
#52 0xffffffffa61d7b58 in ?? ()
#53 0xffffff003d0fd4c0 in ?? ()
#54 0xffffffff80264520 in sched_switch (td=0xffffff00008494c0,
     newtd=0xffffffff8023a96d, flags=0)
     at /usr/src/sys/kern/sched_4bsd.c:973
Previous frame inner to this frame (corrupt stack?)
(kgdb)
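
The trap frames in the backtrace print their registers as signed decimal numbers (for example tf_rip = -2144943427), which makes them hard to compare against the hexadecimal frame addresses. The following is a minimal sketch, not part of the original kgdb session, that reinterprets such a value as an unsigned 64-bit address; the helper name to_kernel_address is made up for illustration.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # amd64 registers are 64 bits wide


def to_kernel_address(signed_value: int) -> str:
    """Reinterpret a signed 64-bit decimal (as kgdb prints it) as hex."""
    return f"0x{signed_value & MASK_64:016x}"


# Values copied verbatim from the trap frames in the backtrace.
print(to_kernel_address(-2144943427))  # tf_rip from frame #5
print(to_kernel_address(-2145575931))  # cf_rip from frame #8
```

The first value comes out as 0xffffffff8026c2bd, the address shown for kdb_enter in frame #7, and the second as 0xffffffff801d1c05, the address shown for bge_intr in frame #10, so the saved instruction pointers agree with the frames kgdb printed.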






