From owner-freebsd-fs@FreeBSD.ORG Fri Nov 1 17:31:08 2013
From: Shawn Wallbridge
To: Kirk McKusick, "freebsd-fs@freebsd.org"
Subject: Re: FFS Softdep Kernel Panic
Date: Fri, 1 Nov 2013 17:30:57 +0000
In-Reply-To: <201311011656.rA1GuWCp045991@chez.mckusick.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0

On 11/1/13 9:56 AM, "Kirk McKusick" wrote:

>> From: Shawn Wallbridge
>> To: "freebsd-fs@freebsd.org"
>> Subject: FFS Softdep Kernel Panic
>> Date: Fri, 1 Nov 2013 05:21:21 +0000
>>
>> I am running a large (71TB) file (NFS w/ some Samba) server
>> (9.2-RELEASE) and it has been crashing almost daily. I have been trying
>> to track down the issue, but I haven't had any luck.
>>
>> The panic is..
>>
>> panic: handle_workitem_remove: bad file delta
>> cpuid = 9
>> KDB: stack backtrace:
>> #0 0xffffffff80947986 at kdb_backtrace+0x66
>> #1 0xffffffff8090d9ae at panic+0x1ce
>> #2 0xffffffff80b4143f at handle_workitem_remove+0x46f
>> #3 0xffffffff80b4133a at handle_workitem_remove+0x36a
>> #4 0xffffffff80b4069d at process_worklist_item+0x2bd
>> #5 0xffffffff80b450da at softdep_process_worklist+0x8a
>> #6 0xffffffff80b47a4d at softdep_flush+0x1ad
>> #7 0xffffffff808db67f at fork_exit+0x11f
>> #8 0xffffffff80cdc23e at fork_trampoline+0xe
>>
>>
>> I looked at the source for ffs_softdep.c and found this, which seems to
>> be the only place "bad file delta" shows up.
>>
>> /*
>>  * Normal file deletion.
>>  */
>> if ((dirrem->dm_state & RMDIR) == 0) {
>>         ip->i_nlink--;
>>         DIP_SET(ip, i_nlink, ip->i_nlink);
>>         ip->i_flag |= IN_CHANGE;
>>         if (ip->i_nlink < ip->i_effnlink)
>>                 panic("handle_workitem_remove: bad file delta");
>>         if (ip->i_nlink == 0)
>>                 unlinked_inodedep(mp, inodedep);
>>         inodedep->id_nlinkdelta = ip->i_nlink - ip->i_effnlink;
>>         KASSERT(LIST_EMPTY(&dirrem->dm_jwork),
>>             ("handle_workitem_remove: worklist not empty. %s",
>>             TYPENAME(LIST_FIRST(&dirrem->dm_jwork)->wk_type)));
>>         WORKITEM_FREE(dirrem, D_DIRREM);
>>         FREE_LOCK(&lk);
>>         goto out;
>> }
>>
>> I have created a PR, but I haven't had any response (no one has even
>> downloaded the crash dumps I linked to).
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183424
>>
>> Because this is a file server, in production, this is becoming a HUGE
>> problem and is costing us quite a bit of lost production each time it
>> crashes (and takes 4hrs to fsck).
>>
>> Thanks
>> shawn
>
>I have taken a look at your bug report and have a couple of questions
>about your system:
>
>Your kernel was built at the end of September. Has this problem
>persisted since that kernel was built, or has it showed up more recently?
>
>Are you running with journaled soft updates or just regular soft
>updates? You can use the mount command with no arguments to find out.
>
> Kirk McKusick
>

Thank you.

This machine was originally built using 9.1-RELEASE, which had the
problem as well, so I updated to 9.2-RELEASE to try to resolve the issue.

I am running both,

/dev/da0p1 on /sam (ufs, NFS exported, local, journaled soft-updates)

I am building a kernel with invariants right now.

shawn

________________________________
This e-mail is intended only for the named person or entity to which it is addressed and contains valuable business information that is proprietary, privileged, confidential and/or otherwise protected from disclosure.
If you received this e-mail in error, any review, use, dissemination, distribution or copying of this e-mail is strictly prohibited. Please notify us immediately of the error via e-mail to postmaster@imaginaryforces.com and please delete the e-mail from your system, retaining no copies in any media. We appreciate your cooperation. ...imaginaryforces.com...