Date: Fri, 1 Nov 2013 17:30:57 +0000
From: Shawn Wallbridge <shawn.wallbridge@imaginaryforces.com>
To: Kirk McKusick <mckusick@mckusick.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: FFS Softdep Kernel Panic
Message-ID: <CE9932CD.10E03%shawn.wallbridge@imaginaryforces.com>
In-Reply-To: <201311011656.rA1GuWCp045991@chez.mckusick.com>
On 11/1/13 9:56 AM, "Kirk McKusick" <mckusick@mckusick.com> wrote:

>> From: Shawn Wallbridge <shawn.wallbridge@imaginaryforces.com>
>> To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
>> Subject: FFS Softdep Kernel Panic
>> Date: Fri, 1 Nov 2013 05:21:21 +0000
>>
>> I am running a large (71TB) file (NFS w/ some Samba) server
>> (9.2-RELEASE) and it has been crashing almost daily. I have been trying
>> to track down the issue, but I haven't had any luck.
>>
>> The panic is:
>>
>> panic: handle_workitem_remove: bad file delta
>> cpuid = 9
>> KDB: stack backtrace:
>> #0 0xffffffff80947986 at kdb_backtrace+0x66
>> #1 0xffffffff8090d9ae at panic+0x1ce
>> #2 0xffffffff80b4143f at handle_workitem_remove+0x46f
>> #3 0xffffffff80b4133a at handle_workitem_remove+0x36a
>> #4 0xffffffff80b4069d at process_worklist_item+0x2bd
>> #5 0xffffffff80b450da at softdep_process_worklist+0x8a
>> #6 0xffffffff80b47a4d at softdep_flush+0x1ad
>> #7 0xffffffff808db67f at fork_exit+0x11f
>> #8 0xffffffff80cdc23e at fork_trampoline+0xe
>>
>> I looked at the source for ffs_softdep.c and found this, which seems to
>> be the only place "bad file delta" shows up.
>>
>> /*
>>  * Normal file deletion.
>>  */
>> if ((dirrem->dm_state & RMDIR) == 0) {
>>     ip->i_nlink--;
>>     DIP_SET(ip, i_nlink, ip->i_nlink);
>>     ip->i_flag |= IN_CHANGE;
>>     if (ip->i_nlink < ip->i_effnlink)
>>         panic("handle_workitem_remove: bad file delta");
>>     if (ip->i_nlink == 0)
>>         unlinked_inodedep(mp, inodedep);
>>     inodedep->id_nlinkdelta = ip->i_nlink - ip->i_effnlink;
>>     KASSERT(LIST_EMPTY(&dirrem->dm_jwork),
>>         ("handle_workitem_remove: worklist not empty. %s",
>>         TYPENAME(LIST_FIRST(&dirrem->dm_jwork)->wk_type)));
>>     WORKITEM_FREE(dirrem, D_DIRREM);
>>     FREE_LOCK(&lk);
>>     goto out;
>> }
>>
>> I have created a PR, but I haven't had any response (no one has even
>> downloaded the crash dumps I linked to).
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183424
>>
>> Because this is a file server, in production, this is becoming a HUGE
>> problem and is costing us quite a bit of lost production each time it
>> crashes (and takes 4 hours to fsck).
>>
>> Thanks
>> shawn
>
>I have taken a look at your bug report and have a couple of questions
>about your system:
>
>Your kernel was built at the end of September. Has this problem
>persisted since that kernel was built, or has it shown up more recently?
>
>Are you running with journaled soft updates or just regular soft
>updates? You can use the mount command with no arguments to find out.
>
>    Kirk McKusick
>

Thank you.

This machine was originally built using 9.1-RELEASE, which had the problem as well, so I updated to 9.2-RELEASE to try to resolve the issue.

I am running both:

/dev/da0p1 on /sam (ufs, NFS exported, local, journaled soft-updates)

I am building a kernel with INVARIANTS enabled right now.

shawn
________________________________
This e-mail is intended only for the named person or entity to which it is addressed and contains valuable business information that is proprietary, privileged, confidential and/or otherwise protected from disclosure. If you received this e-mail in error, any review, use, dissemination, distribution or copying of this e-mail is strictly prohibited. Please notify us immediately of the error via e-mail to <ifpostmaster> postmaster@imaginaryforces.com and please delete the e-mail from your system, retaining no copies in any media. We appreciate your cooperation. ...imaginaryforces.com...