Date:      Mon, 23 Nov 2009 16:28:21 -0500 (EST)
From:      Charles Sprickman <spork@bway.net>
To:        stable@freebsd.org
Subject:   Re: panic in 7.2 (ffs_alloc.c?)
Message-ID:  <alpine.OSX.2.00.0911231625220.19128@hotlap.local>
In-Reply-To: <alpine.OSX.2.00.0911220037320.19128@hotlap.local>
References:  <alpine.OSX.2.00.0911220037320.19128@hotlap.local>

Just a follow-up...  The machine was waiting for a manual fsck.  This 
crash seemed to scramble things up pretty good: it hit the jail partition 
hard and also seemed to touch other partitions that were quiet at the time.

I'm re-running mstone with an even heavier load to see if I can 
reproduce this again.
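
For anyone who wants to poke at a similar write/read/delete pattern without 
setting up mstone, here is a rough Python sketch of maildir-style churn.  It 
is only a stand-in under my own assumptions: the function name, message size 
and file naming are made up for illustration, it is not how mstone or 
qmail/courier actually drive the disk, and there is no reason to expect it to 
trigger the panic by itself.

```python
import os
import tempfile


def churn_maildir(root, rounds=100, size=4096):
    """Write a message to tmp/, rename it into new/, read it, delete it.

    Hypothetical helper for illustration only; returns the number of
    messages cycled.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)

    payload = b"x" * size  # arbitrary message body
    for i in range(rounds):
        name = "%d.%d.churn" % (os.getpid(), i)
        tmp_path = os.path.join(root, "tmp", name)
        new_path = os.path.join(root, "new", name)

        # "Delivery": write under tmp/, then rename into new/.
        with open(tmp_path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp_path, new_path)

        # "POP client": read the message back, then delete it.
        with open(new_path, "rb") as f:
            if f.read() != payload:
                raise RuntimeError("message %s read back corrupted" % name)
        os.unlink(new_path)
    return rounds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("cycled", churn_maildir(d, rounds=10), "messages")
```

Pointing root at a scratch directory on the filesystem under test and running 
several copies in parallel would get closer to the concurrent load, though 
that is still only an approximation.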

Full verbose dmesg:  http://pastie.org/711839

Should I bother with a PR or anything on this?  Doesn't look like a 
hardware issue to me.  It seems like there could be a nasty bug waiting in 
the UFS2 code somewhere.  Does anyone want to pursue this at all?  I have 
the dump available for anyone who wants it.
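
In case it helps whoever looks at the dump: kgdb prints several of the 
ffs_realloccg arguments in decimal, which makes them hard to eyeball.  The 
small Python sketch below just re-prints the values from the quoted trace 
further down in hex.  The dictionary keys are labels I picked, the numbers 
are copied from the trace, and I'm not drawing any conclusion from the output.

```python
# Values copied from the quoted kgdb output; the names are labels only.
values = {
    "fault_virtual_address": 0x12d4b9f5c,
    "bprev": 6288224785898156086,
    "bpref": 593305256,
    "flags": 33619968,
    "nsize": 2048,
}


def as_hex(table):
    """Return a dict mapping each label to its value formatted as hex."""
    return {name: hex(value) for name, value in table.items()}


if __name__ == "__main__":
    for name, text in as_hex(values).items():
        print("%-22s %s" % (name, text))
```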

Thanks,

Charles

On Sun, 22 Nov 2009, Charles Sprickman wrote:

> Howdy,
>
> I'm no expert at getting info out of a dump, but I'll do my best to provide 
> some information.
>
> This is a Dell PE2970 w/PERC6/i RAID running FreeBSD 7.2/amd64.  Brand new 
> box, has been doing very light work for about two weeks.  Last night I 
> started a very long mstone run on a jailed mail server and found that, quite a 
> way into this burn-in, the box panicked.  I was going to put it in service 
> Monday (after punishing it all weekend).  Looking for some input on what the 
> root cause is and whether going to a -stable snapshot might be worthwhile.
>
> I can tell you there was a good deal of disk activity at the time in the jail 
> - mstone was simulating 100 POP and SMTP clients hitting the machine at once. 
> This is qmail+courier.  So messages are coming in, hitting the queue, hitting 
> a user's maildir, getting read and deleted via the POP "client" over and over 
> again.  I do see lots of "ffs_*" stuff in the backtrace, which is a little 
> scary.
>
> Here's my stab at a kgdb session (also @ pastie for easier reading: 
> http://pastie.org/709671):
>
> [root@bigmail /usr/obj/usr/src/sys/BWAY7-64]# kgdb kernel.debug 
> /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x12d4b9f5c
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x8:0xffffffff8050382e
> stack pointer           = 0x10:0xffffffff281a75b0
> frame pointer           = 0x10:0xffffff000455f800
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 6324 (vdelivermail)
> trap number             = 12
> panic: page fault
> cpuid = 0
> Uptime: 12d0h32m3s
> Physical memory: 6130 MB
> Dumping 725 MB: 710 694 678 662 646 630 614 598 582 566 550 534 518 502 486 
> 470 454 438 422 406 390 374 358 342 326 310 294 278 262 246 230 214 198 182 
> 166 150 134 118 102 86 70 54 38 22 6
>
> Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from 
> /boot/kernel/nullfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/nullfs.ko
> Reading symbols from /boot/kernel/fdescfs.ko...Reading symbols from 
> /boot/kernel/fdescfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/fdescfs.ko
> #0  doadump () at pcpu.h:195
> 195             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> #3  0xffffffff8034cba2 in panic (fmt=0x104 <Address 0x104 out of bounds>)
>    at /usr/src/sys/kern/kern_shutdown.c:574
> #4  0xffffffff80574823 in trap_fatal (frame=0xffffff00046c8000, eva=Variable 
> "eva" is not available.
> )
>    at /usr/src/sys/amd64/amd64/trap.c:757
> #5  0xffffffff80574bf5 in trap_pfault (frame=0xffffffff281a7500, usermode=0)
>    at /usr/src/sys/amd64/amd64/trap.c:673
> #6  0xffffffff80575534 in trap (frame=0xffffffff281a7500)
>    at /usr/src/sys/amd64/amd64/trap.c:444
> #7  0xffffffff8055969e in calltrap ()
>    at /usr/src/sys/amd64/amd64/exception.S:209
> #8  0xffffffff8050382e in ffs_realloccg (ip=0xffffff00267f75c0, lbprev=0,
>    bprev=6288224785898156086, bpref=593305256, osize=0, nsize=2048,
>    flags=33619968, cred=0xffffff00927fe800, bpp=0xffffffff281a7800)
>    at /usr/src/sys/ufs/ffs/ffs_alloc.c:1349
> #9  0xffffffff80506e8e in ffs_balloc_ufs2 (vp=0xffffff0027a64dc8, 
> startoffset=Variable "startoffset" is not available.
> )
>    at /usr/src/sys/ufs/ffs/ffs_balloc.c:692
> #10 0xffffffff805223e5 in ffs_write (ap=0xffffffff281a7a10)
>    at /usr/src/sys/ufs/ffs/ffs_vnops.c:724
> #11 0xffffffff805a0645 in VOP_WRITE_APV (vop=0xffffffff80793d20,
>    a=0xffffffff281a7a10) at vnode_if.c:691
> #12 0xffffffff803dd731 in vn_write (fp=0xffffff001027cd00,
>    uio=0xffffffff281a7b00, active_cred=Variable "active_cred" is not 
> available.
> ) at vnode_if.h:373
> #13 0xffffffff80388768 in dofilewrite (td=0xffffff00046c8000, fd=5,
>    fp=0xffffff001027cd00, auio=dwarf2_read_address: Corrupted DWARF 
> expression.
> ) at file.h:257
> #14 0xffffffff80388a6e in kern_writev (td=0xffffff00046c8000, fd=5,
>    auio=0xffffffff281a7b00) at /usr/src/sys/kern/sys_generic.c:402
> #15 0xffffffff80388aec in write (td=0x800, uap=0x12d4b9f50)
>    at /usr/src/sys/kern/sys_generic.c:318
> #16 0xffffffff80596a66 in ia32_syscall (frame=0xffffffff281a7c80)
>    at /usr/src/sys/amd64/ia32/ia32_syscall.c:182
> #17 0xffffffff80559ad0 in Xint0x80_syscall () at ia32_exception.S:65
> #18 0x0000000028167928 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
>
> Full dmesg, verbose boot and kernel config at pastie as well.  Actually no 
> verbose boot...  I rebooted the box after setting verbose boot with 
> "nextboot" and it didn't come back.  Hrmph.  No remote console, so I don't 
> know what's up, perhaps waiting on some manual fsck action.
>
> Thanks,
>
> Charles
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>


