Date: Fri, 22 Aug 2008 18:21:12 +0200
From: Kris Kennaway <kris@FreeBSD.org>
To: Eric Crist <ecrist@fesecurity.com>
Cc: User Questions <freebsd-questions@freebsd.org>
Subject: Re: Kernel Panic help.
Message-ID: <48AEE778.7020307@FreeBSD.org>
In-Reply-To: <8A82B9BE-FE5A-4195-9C36-A164369B8AF2@fesecurity.com>
References: <8A82B9BE-FE5A-4195-9C36-A164369B8AF2@fesecurity.com>
Eric Crist wrote:
> Hey folks,
>
> First, please 'reply-all' as I'm not on the list.
>
> I've got a backup server that, every night, offloads things to a
> secondary, USB attached hard disk. We've got two of these disks, which
> we rotate so as to have a fairly recent off-site version, in the event
> of a disaster. One of the two drives has started to cause the backup
> server to core dump and reboot. The other works fine. I tried taking
> the problematic drive and repartitioning and reformatting it, but the
> problems persist.
>
> Here is what I get from a kgdb:
>
> ecrist@leopard:/usr/obj/usr/src/sys/GENERIC-> sudo kgdb kernel.debug /var/crash/vmcore.17
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
> panic: softdep_deallocate_dependencies: dangling deps
> cpuid = 0
> Uptime: 11d20h37m38s
> Physical memory: 1011 MB
> Dumping 201 MB: 186 170 154 138 122 106 90 74 58 42 26 10
>
> #0 doadump () at pcpu.h:195
> 195 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>
>
> Any insight is appreciated. uname -a is:
>
> FreeBSD hostname 7.0-RELEASE-p3 FreeBSD 7.0-RELEASE-p3 #1: Tue Jul 15 13:53:28 CDT 2008 root@hostname:/usr/obj/usr/src/sys/GENERIC i386

See the FreeBSD Developers' Handbook for more details on how to report
panics (you also need to include the backtrace, and it may help to catch
the problem earlier if you turn on debugging).

However, this kind of panic can also happen if the drive is marginal,
e.g. if it loses or corrupts I/O in transit. Try compiling the
/usr/src/tools/regression/fsx tool and running it against the problem
disk for a few days, or even running multiple instances on different
files at once to really stress it. fsx does lots of I/O to a file and
verifies that the file remains consistent throughout. It won't touch
the whole drive, though, so if only parts of the disk are bad it won't
catch them.

For that, you could generate a large random file on another disk,
record its MD5 checksum, write many copies of it to the suspect disk
until it is full or nearly full, and then read back the MD5 checksum of
each copy and compare it against the original. A small script could
run this in a loop.

Yet another option would be to configure the disk as a geli (with data
authentication enabled) or ZFS volume, since either will validate
checksums on every read and so will catch data corruption anywhere on
the disk.

I'd rule those things out before investigating the panic itself.

Kris
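The fill-the-disk-and-compare-MD5 check described above could be scripted roughly as follows. This is a minimal Python sketch, not a tool from the FreeBSD source tree. The function names (make_reference, fill, verify, run_passes), the 1 MiB chunk size and the copyNNNNNN.bin file naming are hypothetical choices made for illustration. One caveat: copies read back right after they are written may be served from the buffer cache rather than from the disk. Filling the disk well beyond the machine's RAM, or unmounting and remounting the disk between the fill and verify steps, makes it more likely that the reads really exercise the drive.

```python
import hashlib
import os
import shutil

CHUNK = 1 << 20  # hypothetical choice: do all I/O in 1 MiB pieces


def md5_of(path):
    """Return the hex MD5 digest of a file, read in CHUNK-sized pieces."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def make_reference(path, size):
    """Write `size` random bytes to `path` (on a known-good disk); return its MD5."""
    h = hashlib.md5()
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            block = os.urandom(min(CHUNK, remaining))
            f.write(block)
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()


def fill(source, target_dir, max_copies, reserve=0):
    """Copy `source` into `target_dir` until `max_copies` copies exist or
    there is no longer room for another copy plus `reserve` bytes."""
    size = os.path.getsize(source)
    copies = []
    for i in range(max_copies):
        if shutil.disk_usage(target_dir).free < size + reserve:
            break  # the disk is full or almost full
        dst = os.path.join(target_dir, f"copy{i:06d}.bin")
        shutil.copyfile(source, dst)
        with open(dst, "r+b") as f:
            os.fsync(f.fileno())  # ask the OS to flush this copy to the device
        copies.append(dst)
    return copies


def verify(copies, reference_md5):
    """Re-read every copy and return the paths whose MD5 does not match."""
    return [p for p in copies if md5_of(p) != reference_md5]


def run_passes(source_dir, target_dir, size, passes, max_copies=1_000_000):
    """Repeat fill-and-verify `passes` times; return one mismatch list per pass."""
    source = os.path.join(source_dir, "reference.bin")
    ref = make_reference(source, size)
    results = []
    for n in range(passes):
        copies = fill(source, target_dir, max_copies)
        bad = verify(copies, ref)
        print(f"pass {n + 1}: {len(copies)} copies, {len(bad)} mismatched")
        for p in bad:
            print(f"  MISMATCH: {p}")
        results.append(bad)
        for p in copies:
            os.remove(p)  # clear the disk for the next pass
    return results
```

For example, run_passes("/var/tmp", "/mnt/usbdisk/check", size=1 << 30, passes=10) would keep a 1 GiB reference file on the good disk and repeatedly fill the suspect one; both paths are placeholders. Any non-empty mismatch list points at data that did not survive the round trip to the drive.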