From: John Baldwin
To: Rick Macklem
Cc: Christian Kratzer, freebsd-stable@freebsd.org
Subject: Re: smbfs crashes since approx. 10.1-RELEASE
Date: Tue, 06 Oct 2015 17:08:54 -0700
Message-ID: <2148690.gx9M0ZzrG1@ralph.baldwin.cx>
In-Reply-To: <1721669289.24365403.1444083414400.JavaMail.zimbra@uoguelph.ca>

On Monday, October 05, 2015 06:16:54 PM Rick Macklem wrote:
> Christian Kratzer wrote:
> > Hi,
> >
> > I run a regular rsync job that runs from cron and copies stuff that gets
> > created on a Windows smbfs
> > share.
> >
> > Starting about 10.1-RELEASE the VM has become unstable and started panicking.
> >
> > I have narrowed the issue down to the aforementioned rsync job.
> >
> > When I move the job to a different VM the other VM starts crashing and
> > the VM without the job becomes stable again.
> >
> > I have panics and crashinfos stored in /var/crash if anybody is interested:
> >
> >     root@noc2:/var/crash # uname -a
> >     FreeBSD noc2.cksoft.de 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed
> >     Aug 12 15:26:37 UTC 2015
> >     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
> >     root@noc2:/var/crash # freebsd-version -u
> >     10.2-RELEASE-p5
> >     root@noc2:/var/crash # freebsd-version -k
> >     10.2-RELEASE
> >     root@noc2:/var/crash #
> >
> > This is what I have in /var/crash/core.txt.0
> >
> >     Fatal trap 12: page fault while in kernel mode
> >     cpuid = 0; apic id = 00
> >     fault virtual address   = 0x20
> >     fault code              = supervisor read data, page not present
> >     instruction pointer     = 0x20:0xffffffff80996c7c
> >     stack pointer           = 0x28:0xfffffe003d6c0ac0
> >     frame pointer           = 0x28:0xfffffe003d6c0af0
> >     code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                             = DPL 0, pres 1, long 1, def32 0, gran 1
> >     processor eflags        = resume, IOPL = 0
> >     current process         = 1349 (smbiod10)
> >     trap number             = 12
> >     panic: page fault
> >     cpuid = 0
> >     KDB: stack backtrace:
> >     #0 0xffffffff80984e30 at kdb_backtrace+0x60
> >     #1 0xffffffff809489e6 at vpanic+0x126
> >     #2 0xffffffff809488b3 at panic+0x43
> >     #3 0xffffffff80d4aadb at trap_fatal+0x36b
> >     #4 0xffffffff80d4addd at trap_pfault+0x2ed
> >     #5 0xffffffff80d4a47a at trap+0x47a
> >     #6 0xffffffff80d307f2 at calltrap+0x8
> >     #7 0xffffffff8092ebe0 at __mtx_unlock_sleep+0x60
> >     #8 0xffffffff8092eb69 at __mtx_unlock_flags+0x69
> >     #9 0xffffffff81a1b724 at smb_iod_thread+0xb4
> >     #10 0xffffffff8091244a at fork_exit+0x9a
> >     #11 0xffffffff80d30d2e at fork_trampoline+0xe
> >     Uptime: 2h43m55s
> >     Dumping 103 out of 999 MB: (CTRL-C to abort)
> >     ..16%..31%..47%..62%..78%..93%
> >
>
> This crash is occurring when doing an mtx_unlock(&Giant). Unfortunately, I'm not
> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight.
> If you don't get any responses, I'd suggest reposting to freebsd-current@ with
> "crashes in mtx_unlock(&Giant)" in the subject line.
>
> Btw John, the code does tsleep() in a loop before the mtx_unlock(&Giant). I do
> remember that was once allowed, but am not sure if it still is (ie a tsleep() call
> while holding Giant)?
>
> Hopefully someone who knows what is special about Giant that might cause this will
> respond.
>
> Good luck with it, rick

tsleep() with Giant is still allowed. However, this sort of panic usually
means you unlocked a mutex you didn't hold (and that INVARIANTS isn't
enabled, or you'd get an assertion failure earlier). I don't see anything
obviously wrong in smb_iod_thread(), however.

If you have the crashdump, can you please run this in kgdb:

    frame 9
    p (struct mtx *)c
    p *(struct mtx *)c

-- 
John Baldwin
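
To illustrate the point about unlocking a mutex you don't hold, here is a
minimal user-space sketch in C. It is not the FreeBSD struct mtx or the
mtx_unlock() implementation; struct toy_mtx and every toy_* function are
hypothetical names made up for this example. The checked unlock stands in
for what an assertion-enabled kernel would catch early, and the unchecked
one for a build that trusts the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel mutex; NOT the FreeBSD struct mtx. */
struct toy_mtx {
	uintptr_t owner;	/* TOY_UNOWNED, or an id for the owning thread */
};

#define TOY_UNOWNED ((uintptr_t)0)

/* Try to acquire: succeeds only when unowned (this sketch never blocks). */
bool
toy_lock(struct toy_mtx *m, uintptr_t self)
{
	assert(self != TOY_UNOWNED);	/* 0 is reserved for "no owner" */
	if (m->owner != TOY_UNOWNED)
		return (false);
	m->owner = self;
	return (true);
}

/* True when 'self' currently holds the lock. */
bool
toy_owned(const struct toy_mtx *m, uintptr_t self)
{
	return (m->owner == self);
}

/*
 * Checked unlock: refuses to release a lock the caller does not hold.
 * An assertion-enabled build would stop here instead of returning false.
 */
bool
toy_unlock_checked(struct toy_mtx *m, uintptr_t self)
{
	if (!toy_owned(m, self))
		return (false);
	m->owner = TOY_UNOWNED;
	return (true);
}

/*
 * Unchecked unlock: trusts the caller and clears the owner regardless.
 * If the caller was not the owner, the lock state is now wrong and the
 * failure shows up later, somewhere else.
 */
void
toy_unlock_unchecked(struct toy_mtx *m)
{
	m->owner = TOY_UNOWNED;
}
```

With the checked variant, a call such as toy_unlock_checked(&m, 2) while
thread 1 holds m is rejected at the call site, which is the kind of early
failure an assertion would give you. With the unchecked variant the same
mistake goes unnoticed until something later trips over the bad state.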