Date: Wed, 15 Feb 2012 14:21:49 -0500
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: stable@freebsd.org
Subject: Re: ZFS + nullfs + Linuxulator = panic?
Message-ID: <274B6964-3CFF-4706-845C-61FA4F8D0617@gromit.dlib.vt.edu>
In-Reply-To: <20120215002351.GB9938@icarus.home.lan>
References: <CB455B5A-0583-4DFB-9712-6FFCC8B67AAB@gromit.dlib.vt.edu> <20120215002351.GB9938@icarus.home.lan>
On Feb 14, 2012, at 7:23 PM, Jeremy Chadwick wrote:

> On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
>> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC
>> kernel, last built 2012-02-08). It will panic during the daily
>> periodic scripts that run at 3am. Here is the most recent panic
>> message:
>>
>> Fatal trap 9: general protection fault while in kernel mode
>> cpuid = 0; apic id = 00
>> instruction pointer   = 0x20:0xffffffff8069d266
>> stack pointer         = 0x28:0xffffff8094b90390
>> frame pointer         = 0x28:0xffffff8094b903a0
>> code segment          = base 0x0, limit 0xfffff, type 0x1b
>>                       = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags      = resume, IOPL = 0
>> current process       = 72566 (ps)
>> trap number           = 9
>> panic: general protection fault
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
>> #1 0xffffffff805facd3 at panic+0x183
>> #2 0xffffffff808e6c20 at trap_fatal+0x290
>> #3 0xffffffff808e715a at trap+0x10a
>> #4 0xffffffff808cec64 at calltrap+0x8
>> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
>> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
>> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
>> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
>> #9 0xffffffff8060473f at sysctl_root+0x14f
>> #10 0xffffffff80604a2a at userland_sysctl+0x14a
>> #11 0xffffffff80604f1a at __sysctl+0xaa
>> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
>> #13 0xffffffff808cef5c at Xfast_syscall+0xfc
>> Uptime: 3d19h6m0s
>> Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
>> Dump complete
>> Automatic reboot in 15 seconds - press a key on the console to abort
>> Rebooting...
>>
>> The reason for the subject line is that I have another RELENG_8 system
>> that uses ZFS + nullfs but doesn't panic, leading me to believe that
>> ZFS + nullfs alone is not the problem. I am wondering if it is the
>> combination of the three that is deadly here.
>>
>> Both RELENG_8 systems are root-on-ZFS installs. Each night a separate
>> backup script runs and completes before the regular "periodic daily"
>> run. This script takes a recursive snapshot of the ZFS pool and then
>> mounts these snapshots via mount_nullfs to provide a coherent view of
>> the filesystem under /backup. The only difference between the two
>> RELENG_8 systems is that one uses rsync to back up /backup to another
>> machine and the other uses the Linux Tivoli TSM client to back up
>> /backup to a TSM server. After the backup is completed, a script runs
>> that unmounts the nullfs file systems and then destroys the ZFS
>> snapshot.
>>
>> The first (rsync backup) RELENG_8 system does not panic. It has been
>> running the ZFS + nullfs rsync backup job without incident for weeks
>> now. The second (Tivoli TSM) RELENG_8 system will reliably panic when
>> the subsequent "periodic daily" job runs. (It is using the 32-bit TSM
>> 6.2.4 Linux client running "dsmc schedule" via the linux_base-f10-10_4
>> package.) The actual ZFS + nullfs Tivoli TSM backup job appears to run
>> successfully, making me wonder whether it has some memory leak or
>> other subtle corruption that sets up the ensuing panic when the
>> "periodic daily" job later gives the system a workout.
>>
>> If I can provide more information about the panic, please let me know.
>> Despite the message about dumping in the panic output above, when the
>> system reboots I get a "No core dumps found" message during boot. (I
>> have dumpdev="AUTO" set in /etc/rc.conf.) My swap device is on
>> separate partitions but is mirrored using geom_mirror as
>> /dev/mirror/swap. Do crash dumps to gmirror devices work on RELENG_8?
>
> See the gmirror(8) man page, section NOTES. Read the full thing.

Thanks! I've changed the balance algorithm to "prefer", so hopefully I'll
get saved crash dumps to examine from now on.
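For readers reconstructing the setup, the nightly snapshot-and-nullfs flow described above could look roughly like the following sh(1) sketch. This is a hypothetical reconstruction, not the actual script from either system: the pool name (`tank`), snapshot label (`nightly`), and the exact /backup layout are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the nightly ZFS + nullfs backup flow.
# Pool name, snapshot label, and paths are assumptions, not the real script.

POOL="tank"
SNAP="nightly"
BACKUP="/backup"

take_snapshot() {
    # One recursive, atomic snapshot of the whole pool
    zfs snapshot -r "${POOL}@${SNAP}"
}

mount_snapshots() {
    # nullfs-mount each dataset's snapshot directory under /backup,
    # giving the backup client one coherent, frozen view of the tree
    zfs list -H -o name,mountpoint -r "${POOL}" |
    while read -r fs mp; do
        [ "${mp}" = "none" ] || [ "${mp}" = "-" ] && continue
        mkdir -p "${BACKUP}${mp}"
        mount_nullfs -o ro "${mp}/.zfs/snapshot/${SNAP}" "${BACKUP}${mp}"
    done
}

cleanup() {
    # Unmount deepest-first, then drop the snapshot
    mount -p | awk -v b="${BACKUP}" '$2 ~ ("^" b) {print $2}' | sort -r |
    while read -r m; do umount "${m}"; done
    zfs destroy -r "${POOL}@${SNAP}"
}

# On the real systems these would run in order around the backup job:
#   take_snapshot; mount_snapshots; <rsync or dsmc schedule>; cleanup
```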
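For anyone else hitting the same "No core dumps found" symptom with swap on gmirror, the knobs involved are roughly the following. This is a hedged summary of the fix described above; consult gmirror(8) NOTES for the authoritative details, since the exact priority handling may matter too.

```shell
# /etc/rc.conf -- already set, per the message above:
dumpdev="AUTO"

# Switch the swap mirror's balance algorithm to "prefer" so that
# savecore(8) reads the crash dump back from the same component the
# kernel wrote it to (see gmirror(8), section NOTES):
gmirror configure -b prefer swap

# After the next panic, savecore should deposit the dump in /var/crash,
# where it can be examined with, e.g.:
#   kgdb /boot/kernel/kernel /var/crash/vmcore.0
```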
>> Does anyone have any idea what is to blame for the panic, or how I can
>> fix or work around it?
>
> Does the panic always happen when "ps" is run? That's what's shown in
> the above panic message. Quoting:
>
>> current process = 72566 (ps)
>
> And I'm inclined to think it does, based on the backtrace:
>
>> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
>> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
>> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
>> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
>
> But if you can go through the previous panics and confirm that, it
> would be helpful to developers in tracking down the problem.

Just going by memory, at least one other time it panicked during "df",
but most of the time I remember the panic occurring during "ps".

Cheers,

Paul.